.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/evaluation/plot_qce_donut_overview.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_evaluation_plot_qce_donut_overview.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_evaluation_plot_qce_donut_overview.py:


Read quantile miscalibration with ``plot_qce_donut``
====================================================

This lesson explains how to use
``geoprior.plot.evaluation.plot_qce_donut`` when you
want to understand **which quantiles contribute most**
to calibration error.

Why this function matters
-------------------------
A calibration score such as the quantile calibration
error (QCE) is useful, but it is also compact: it folds
every quantile into one number. Once you know the
average error, the next question is usually:

*Which quantiles are causing the problem?*

That is exactly where the donut view helps.

Instead of showing only one overall number, this helper
breaks the calibration mismatch into one contribution
per quantile level. That makes it easier to answer
practical questions such as:

- Are the outer quantiles causing most of the problem?
- Is the median reasonably calibrated while the tails
  are not?
- Is one forecast variant globally better only because
  one bad quantile improved?
- Which quantile should I inspect first in a more
  detailed reliability diagram?

This page is written as a **teaching guide**, not only
as a quick API example. We will build realistic forecast
columns, draw donut charts, compare a better and a
worse forecast, and finish with a checklist for using
this helper on your own saved tables.

.. GENERATED FROM PYTHON SOURCE LINES 44-61

.. code-block:: Python


    from __future__ import annotations

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    from geoprior.plot.evaluation import plot_qce_donut

    pd.set_option("display.max_columns", 20)
    pd.set_option("display.width", 110)
    pd.set_option(
        "display.float_format",
        lambda v: f"{v:0.4f}",
    )


.. GENERATED FROM PYTHON SOURCE LINES 62-98

What this function expects
--------------------------

``plot_qce_donut`` works directly with a DataFrame.
That is an important difference from several other
evaluation helpers in this gallery, which usually take
NumPy arrays.

The required inputs are:

- ``df``: a DataFrame containing one column of observed
  values and several quantile prediction columns,
- ``actual_col``: the name of the observed-value column,
- ``quantile_cols``: an ordered list of the quantile
  prediction columns,
- ``quantile_levels``: the matching list of quantile
  levels.

The order matters. If your columns are
``['q10', 'q50', 'q90']``, the levels must be
``[0.10, 0.50, 0.90]`` in the same order.
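
One way to keep the two lists aligned is to build both
from a single mapping. The sketch below is only an
illustration: ``split_quantile_mapping`` is a hypothetical
helper, not part of ``geoprior``, and sorting by ascending
level is a choice made here rather than a documented
requirement of ``plot_qce_donut``.

.. code-block:: Python

    from __future__ import annotations


    def split_quantile_mapping(
        level_to_col: dict[float, str],
    ) -> tuple[list[str], list[float]]:
        """Return (columns, levels) sorted by level (hypothetical helper)."""
        for level in level_to_col:
            if not 0.0 < level < 1.0:
                raise ValueError(f"quantile level out of range: {level}")
        levels = sorted(level_to_col)
        cols = [level_to_col[level] for level in levels]
        return cols, levels


    # The mapping can be written in any order; the outputs stay paired.
    cols, levels = split_quantile_mapping(
        {0.90: "q90", 0.10: "q10", 0.50: "q50"}
    )
    print(cols)    # ['q10', 'q50', 'q90']
    print(levels)  # [0.1, 0.5, 0.9]

Because each column name is written next to its level,
the two lists cannot drift apart when a quantile is added
or removed.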

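The helper then computes, for each quantile,
the absolute gap:

``| observed proportion - nominal quantile level |``

where the observed proportion is the share of rows whose
observed value falls at or below the predicted quantile.
Those per-quantile gaps become the donut segments.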

The reading rules are simple:

- a larger donut segment means that quantile is making
  a larger contribution to average calibration error,
- the number in the center summarizes the average QCE,
- and the donut should usually be read **with** a
  reliability diagram, not instead of one.

.. GENERATED FROM PYTHON SOURCE LINES 101-111

Build a realistic forecast table
--------------------------------

For a teaching page, we want one forecast variant that
is reasonably calibrated and one that is visibly biased
and too narrow.

We use five quantiles because that is rich enough to
show where calibration error lives without making the
legend too crowded.
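
The quantile columns built below are a forecast center
plus a spread times standard normal z-scores, and those
z-scores are hard-coded. As a small sanity check that
relies only on the Python standard library, the same
values can be recovered with ``statistics.NormalDist``:

.. code-block:: Python

    from statistics import NormalDist

    quantile_levels = [0.10, 0.25, 0.50, 0.75, 0.90]

    # inv_cdf returns the z-score below which the given
    # probability mass of a standard normal distribution lies.
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    z_check = [round(std_normal.inv_cdf(p), 4) for p in quantile_levels]
    print(z_check)  # [-1.2816, -0.6745, 0.0, 0.6745, 1.2816]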

.. GENERATED FROM PYTHON SOURCE LINES 111-173

.. code-block:: Python


    rng = np.random.default_rng(34)

    n_samples = 260
    quantile_levels = [0.10, 0.25, 0.50, 0.75, 0.90]
    quantile_cols = [
        "subsidence_q10",
        "subsidence_q25",
        "subsidence_q50",
        "subsidence_q75",
        "subsidence_q90",
    ]

    # Approximate z-scores for the chosen quantiles.
    z = np.array([-1.2816, -0.6745, 0.0, 0.6745, 1.2816])

    x = np.linspace(0.0, 5.0 * np.pi, n_samples)
    y_true = (
        18.0
        + 2.3 * np.sin(x / 3.0)
        + 0.8 * np.cos(x / 7.0)
        + rng.normal(scale=0.95, size=n_samples)
    )

    center_good = (
        18.0
        + 2.2 * np.sin(x / 3.0)
        + 0.75 * np.cos(x / 7.0)
    )
    spread_good = 1.00 + 0.10 * np.sin(x / 4.0)
    q_good = center_good[:, None] + spread_good[:, None] * z[None, :]

    df_good = pd.DataFrame(
        {
            "subsidence_actual": y_true,
            quantile_cols[0]: q_good[:, 0],
            quantile_cols[1]: q_good[:, 1],
            quantile_cols[2]: q_good[:, 2],
            quantile_cols[3]: q_good[:, 3],
            quantile_cols[4]: q_good[:, 4],
        }
    )

    center_bad = center_good + 0.35
    spread_bad = 0.58 + 0.04 * np.cos(x / 2.4)
    q_bad = center_bad[:, None] + spread_bad[:, None] * z[None, :]

    df_bad = pd.DataFrame(
        {
            "subsidence_actual": y_true,
            quantile_cols[0]: q_bad[:, 0],
            quantile_cols[1]: q_bad[:, 1],
            quantile_cols[2]: q_bad[:, 2],
            quantile_cols[3]: q_bad[:, 3],
            quantile_cols[4]: q_bad[:, 4],
        }
    )

    print("Preview of the better-calibrated table")
    print(df_good.head(8))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Preview of the better-calibrated table
       subsidence_actual  subsidence_q10  subsidence_q25  subsidence_q50  subsidence_q75  subsidence_q90
    0            18.7621         17.4684         18.0755         18.7500         19.4245         20.0316
    1            17.6516         17.5109         18.1189         18.7944         19.4700         20.0780
    2            21.3382         17.5533         18.1623         18.8388         19.5154         20.1243
    3            19.3968         17.5957         18.2055         18.8831         19.5607         20.1705
    4            19.5967         17.6379         18.2487         18.9273         19.6058         20.2166
    5            18.8338         17.6800         18.2917         18.9713         19.6509         20.2626
    6            19.1326         17.7219         18.3346         19.0152         19.6958         20.3084
    7            19.4428         17.7637         18.3773         19.0589         19.7406         20.3541


.. GENERATED FROM PYTHON SOURCE LINES 174-186

Start with a single donut chart
--------------------------------

A single donut is the easiest way to learn the plot.
Here we draw one per forecast variant, side by side, so
the two can be compared directly.

Here the colored segments are not showing forecast
values themselves. They are showing the **size of the
calibration mismatch** for each quantile level.

If one segment is much larger than the others, that
quantile is contributing disproportionately to the
average QCE.

.. GENERATED FROM PYTHON SOURCE LINES 186-233

.. code-block:: Python


    fig, axes = plt.subplots(
        1,
        2,
        figsize=(11.2, 5.2),
        constrained_layout=True,
    )

    plot_qce_donut(
        df_good,
        actual_col="subsidence_actual",
        quantile_cols=quantile_cols,
        quantile_levels=quantile_levels,
        title="QCE donut: better-calibrated forecast",
        colors=[
            "#355C7D",
            "#4E79A7",
            "#59A14F",
            "#F28E2B",
            "#E15759",
        ],
        center_text_format="Avg QCE:\n{:.4f}",
        donut_width=0.42,
        legend_bbox_to_anchor=(1.02, 0.5),
        ax=axes[0],
    )

    plot_qce_donut(
        df_bad,
        actual_col="subsidence_actual",
        quantile_cols=quantile_cols,
        quantile_levels=quantile_levels,
        title="QCE donut: biased and narrow forecast",
        colors=[
            "#355C7D",
            "#4E79A7",
            "#59A14F",
            "#F28E2B",
            "#E15759",
        ],
        center_text_format="Avg QCE:\n{:.4f}",
        donut_width=0.42,
        legend_bbox_to_anchor=(1.02, 0.5),
        ax=axes[1],
    )


.. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_qce_donut_overview_001.png
   :alt: QCE donut: better-calibrated forecast, QCE donut: biased and narrow forecast
   :srcset: /auto_examples/evaluation/images/sphx_glr_plot_qce_donut_overview_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <Axes: title={'center': 'QCE donut: biased and narrow forecast'}>


.. GENERATED FROM PYTHON SOURCE LINES 234-254

How to read the donut correctly
-------------------------------

A useful reading order is:

1. read the number in the center,
2. compare the largest segments,
3. identify whether the problem is concentrated in the
   tails or spread across all quantiles,
4. then move to a reliability diagram if you want to
   know the direction of the error.

This last point is important.

The donut shows the **magnitude** of each quantile's
calibration mismatch, but not its sign. It tells you
*how much* a quantile is contributing to error, not
whether that quantile is systematically too high or too
low. That is why this plot works best as a companion to
``plot_quantile_calibration``.
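
For a quick numeric check of direction, the signed gap
can also be computed by hand. The sketch below uses
synthetic data and is not the helper's internal code; it
assumes the observed proportion is the share of rows with
``y <= q``, as in the hand computation later on this page.

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(0)
    levels = np.array([0.10, 0.50, 0.90])
    z = np.array([-1.2816, 0.0, 1.2816])

    # Observations are standard normal; the forecast is centered
    # correctly but uses half the true spread (too narrow).
    y = rng.normal(size=20_000)
    q = np.tile(0.5 * z, (y.size, 1))

    observed = np.mean(y[:, None] <= q, axis=0)
    signed_gap = observed - levels

    # Positive in the lower tail, negative in the upper tail:
    # the signature of intervals that are too narrow.
    print(np.round(signed_gap, 2))

The absolute values of ``signed_gap`` are what a donut
would display; the signs are the extra information a
reliability diagram adds.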

.. GENERATED FROM PYTHON SOURCE LINES 257-274

Highlight the practical comparison
----------------------------------

The two donuts above tell a compact story:

- the better forecast should show a smaller center QCE,
- the worse forecast should usually show one or more
  visibly dominant segments,
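- and the dominant segments often live in the lower or
  upper tails when the forecast is too narrow. A bias
  pushes the mismatch toward one side: here the worse
  forecast is shifted up by 0.35, so its lower quantiles
  carry most of the error.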

That makes the donut especially useful in reports. It
gives a compact answer to: *where is the calibration
problem concentrated?*

To reinforce that idea, we can print the empirical
proportions by hand.

.. GENERATED FROM PYTHON SOURCE LINES 274-312

.. code-block:: Python


    def empirical_props(
        df: pd.DataFrame,
        *,
        actual_col: str,
        quantile_cols: list[str],
    ) -> np.ndarray:
        y = df[actual_col].to_numpy()
        q = df[quantile_cols].to_numpy()
        return np.mean(y[:, None] <= q, axis=0)


    summary = pd.DataFrame(
        {
            "quantile": quantile_levels,
            "observed_good": empirical_props(
                df_good,
                actual_col="subsidence_actual",
                quantile_cols=quantile_cols,
            ),
            "observed_bad": empirical_props(
                df_bad,
                actual_col="subsidence_actual",
                quantile_cols=quantile_cols,
            ),
        }
    )
    summary["abs_gap_good"] = np.abs(
        summary["observed_good"] - summary["quantile"]
    )
    summary["abs_gap_bad"] = np.abs(
        summary["observed_bad"] - summary["quantile"]
    )

    print("\nPer-quantile calibration summary")
    print(summary)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Per-quantile calibration summary
       quantile  observed_good  observed_bad  abs_gap_good  abs_gap_bad
    0    0.1000         0.1000        0.3308        0.0000       0.2308
    1    0.2500         0.2192        0.4923        0.0308       0.2423
    2    0.5000         0.5077        0.6423        0.0077       0.1423
    3    0.7500         0.7923        0.8038        0.0423       0.0538
    4    0.9000         0.9462        0.9077        0.0462       0.0077
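
The printed gaps can be turned into the two numbers a
reader takes from the donut. This is a worked example
under two assumptions that the page does not spell out:
that the center value is the unweighted mean of the
per-quantile gaps, and that each segment's share is its
gap divided by the total.

.. code-block:: Python

    import numpy as np

    levels = [0.10, 0.25, 0.50, 0.75, 0.90]
    # Absolute gaps copied from the printed summary above.
    gap_good = np.array([0.0000, 0.0308, 0.0077, 0.0423, 0.0462])
    gap_bad = np.array([0.2308, 0.2423, 0.1423, 0.0538, 0.0077])

    # Assumption: the center value is the unweighted mean of the gaps.
    print(f"mean gap, better forecast: {gap_good.mean():.4f}")
    print(f"mean gap, worse forecast:  {gap_bad.mean():.4f}")

    # Assumption: each segment's share is its gap over the total.
    share_bad = gap_bad / gap_bad.sum()
    for level, share in zip(levels, share_bad):
        print(f"q{int(round(level * 100)):02d}: {share:6.1%}")

With these numbers the mean gap is about 0.025 for the
better forecast and about 0.135 for the worse one, and
the two lowest quantiles account for roughly 70% of the
worse forecast's total gap.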


.. GENERATED FROM PYTHON SOURCE LINES 313-327

Use a weighted version when some samples matter more
----------------------------------------------------

The helper also accepts ``sample_weight`` inside
``metric_kws``. That is useful when some rows should
count more heavily, for example:

- larger-population zones,
- higher-risk locations,
- or later forecast steps that matter more in a given
  decision setting.

Here we simulate weights that emphasize the largest
observed values: rows above the 70th percentile of
``y_true`` get weight 1.8, and all other rows get
weight 1.0.

.. GENERATED FROM PYTHON SOURCE LINES 327-373

.. code-block:: Python


    weights = 1.0 + 0.8 * (y_true > np.quantile(y_true, 0.70))

    fig, axes = plt.subplots(
        1,
        2,
        figsize=(11.0, 5.0),
        constrained_layout=True,
    )

    plot_qce_donut(
        df_bad,
        actual_col="subsidence_actual",
        quantile_cols=quantile_cols,
        quantile_levels=quantile_levels,
        title="Unweighted QCE contributions",
        colors=[
            "#2F4858",
            "#33658A",
            "#86BBD8",
            "#F6AE2D",
            "#F26419",
        ],
        donut_width=0.40,
        ax=axes[0],
    )

    plot_qce_donut(
        df_bad,
        actual_col="subsidence_actual",
        quantile_cols=quantile_cols,
        quantile_levels=quantile_levels,
        metric_kws={"sample_weight": weights},
        title="Weighted QCE contributions",
        colors=[
            "#2F4858",
            "#33658A",
            "#86BBD8",
            "#F6AE2D",
            "#F26419",
        ],
        donut_width=0.40,
        ax=axes[1],
    )


.. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_qce_donut_overview_002.png
   :alt: Unweighted QCE contributions, Weighted QCE contributions
   :srcset: /auto_examples/evaluation/images/sphx_glr_plot_qce_donut_overview_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <Axes: title={'center': 'Weighted QCE contributions'}>


.. GENERATED FROM PYTHON SOURCE LINES 374-388

Why the weighted view is worth teaching
---------------------------------------

The weighted and unweighted donuts may look similar,
but they answer slightly different questions:

- the unweighted donut asks how the model calibrates on
  average across rows,
- the weighted donut asks how the model calibrates when
  some rows are treated as more important.

This distinction is valuable in applied work. A model
may look acceptable overall while still misrepresenting
uncertainty in the cases that matter most.
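
The difference can be made concrete with a hand
computation. The sketch below assumes that weighting
replaces the plain row average of the indicator
``y <= q`` with a weighted average; whether
``plot_qce_donut`` does exactly this internally is not
confirmed here.

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(1)
    y = rng.normal(size=1_000)

    # A constant forecast of the 0.90 quantile of a standard normal.
    q90 = np.full(y.size, 1.2816)
    hit = (y <= q90).astype(float)

    # Emphasize the rows with the largest observed values.
    weights = 1.0 + 0.8 * (y > np.quantile(y, 0.70))

    unweighted = hit.mean()
    weighted = np.average(hit, weights=weights)
    print(f"unweighted proportion: {unweighted:.3f}")
    print(f"weighted proportion:   {weighted:.3f}")

Every miss (``y > q90``) falls among the up-weighted rows,
so the weighted proportion comes out lower than the
unweighted one even though the forecast is unchanged.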

.. GENERATED FROM PYTHON SOURCE LINES 391-401

Handle missing values deliberately
----------------------------------

In real forecast tables, some quantile values may be
missing for some rows. The helper supports ``nan_policy``
through ``metric_kws``.

For a teaching page, the safest policy to demonstrate is
``'omit'``, which removes rows with NaNs before the
contributions are calculated.
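
To see how many rows such a policy would discard, the
incomplete rows can be counted directly with pandas. The
sketch below assumes that ``'omit'`` drops a row as soon
as any of the columns in use contains a NaN; the column
names are placeholders.

.. code-block:: Python

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(
        {
            "actual": np.arange(12, dtype=float),
            "q10": np.arange(12, dtype=float) - 1.0,
            "q90": np.arange(12, dtype=float) + 1.0,
        }
    )

    # .loc slices are label-based and include both endpoints,
    # so 2:4 blanks three rows and 8:9 blanks two rows.
    df.loc[2:4, "q90"] = np.nan
    df.loc[8:9, "q10"] = np.nan

    used_cols = ["actual", "q10", "q90"]
    n_dropped = int(df[used_cols].isna().any(axis=1).sum())
    df_kept = df.dropna(subset=used_cols)
    print(n_dropped, len(df_kept))  # 5 7

By the same counting, the ``df_nan`` table built below
blanks 6 + 5 = 11 rows, so 249 of the 260 rows would
remain under this assumption.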

.. GENERATED FROM PYTHON SOURCE LINES 401-450

.. code-block:: Python


    df_nan = df_bad.copy()
    df_nan.loc[5:10, "subsidence_q90"] = np.nan
    df_nan.loc[40:44, "subsidence_q10"] = np.nan

    fig, axes = plt.subplots(
        1,
        2,
        figsize=(11.0, 5.0),
        constrained_layout=True,
    )

    plot_qce_donut(
        df_bad,
        actual_col="subsidence_actual",
        quantile_cols=quantile_cols,
        quantile_levels=quantile_levels,
        title="Original table",
        colors=[
            "#6C5B7B",
            "#C06C84",
            "#F67280",
            "#F8B195",
            "#355C7D",
        ],
        donut_width=0.40,
        ax=axes[0],
    )

    plot_qce_donut(
        df_nan,
        actual_col="subsidence_actual",
        quantile_cols=quantile_cols,
        quantile_levels=quantile_levels,
        metric_kws={"nan_policy": "omit"},
        title="Same table with NaNs omitted",
        colors=[
            "#6C5B7B",
            "#C06C84",
            "#F67280",
            "#F8B195",
            "#355C7D",
        ],
        donut_width=0.40,
        ax=axes[1],
    )


.. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_qce_donut_overview_003.png
   :alt: Original table, Same table with NaNs omitted
   :srcset: /auto_examples/evaluation/images/sphx_glr_plot_qce_donut_overview_003.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <Axes: title={'center': 'Same table with NaNs omitted'}>


.. GENERATED FROM PYTHON SOURCE LINES 451-487

How to use this function on your own data
-----------------------------------------

A practical workflow is:

1. start from a forecast table with one observed-value
   column and several quantile columns,
2. list the quantile columns in the same order as their levels,
3. pass the DataFrame directly to ``plot_qce_donut``,
4. use custom colors so the same quantile always keeps
   the same visual identity across pages,
5. read the center value first,
6. then inspect the largest donut segments,
7. then confirm the direction of the problem with
   ``plot_quantile_calibration``.

In many reports, that pair works very well:

- the donut explains **where** the calibration error is
  concentrated,
- the reliability diagram explains **how** it is wrong.

For a saved forecast table, the code often looks like:

.. code-block:: Python

    df = pd.read_csv("my_forecast_eval.csv")

    plot_qce_donut(
        df,
        actual_col="subsidence_actual",
        quantile_cols=[
            "subsidence_q10",
            "subsidence_q50",
            "subsidence_q90",
        ],
        quantile_levels=[0.10, 0.50, 0.90],
    )

That is all you need when the table is already in tidy
forecast form.
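
For step 4 of the workflow above, one simple option is a
fixed palette keyed by quantile level. ``LEVEL_COLORS``
and ``colors_for`` are hypothetical names used only for
this sketch; the hex values are the ones from the first
figure on this page.

.. code-block:: Python

    from __future__ import annotations

    # Hypothetical project-wide palette: one fixed color per level.
    LEVEL_COLORS: dict[float, str] = {
        0.10: "#355C7D",
        0.25: "#4E79A7",
        0.50: "#59A14F",
        0.75: "#F28E2B",
        0.90: "#E15759",
    }


    def colors_for(levels: list[float]) -> list[str]:
        """Look up the fixed color for each level, in the given order."""
        return [LEVEL_COLORS[level] for level in levels]


    print(colors_for([0.10, 0.50, 0.90]))
    # ['#355C7D', '#59A14F', '#E15759']

The returned list can then be passed as the ``colors``
argument used in the examples above, so a three-quantile
table and a five-quantile table show the median in the
same color.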


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.627 seconds)


.. _sphx_glr_download_auto_examples_evaluation_plot_qce_donut_overview.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_qce_donut_overview.ipynb <plot_qce_donut_overview.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_qce_donut_overview.py <plot_qce_donut_overview.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_qce_donut_overview.zip <plot_qce_donut_overview.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_