.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/evaluation/plot_quantile_calibration_overview.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_evaluation_plot_quantile_calibration_overview.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_evaluation_plot_quantile_calibration_overview.py:


Read quantile reliability with ``plot_quantile_calibration``
============================================================

This lesson explains how to use
``geoprior.plot.evaluation.plot_quantile_calibration`` when you want
to check whether predicted quantiles behave like they claim.

Why this function matters
-------------------------
A quantile forecast does not only need to look reasonable. It also
needs to be **calibrated**.

That means a forecasted 10th percentile should really behave like a
10th percentile, a 50th percentile should behave like a median, and a
90th percentile should really capture about 90 percent of the observed
values below it.

This distinction matters in practice:

- a forecast may look smooth and convincing while still being too
  narrow,
- a forecast may have good central accuracy while badly representing
  uncertainty,
- and two interval forecasts may look similar even though one is much
  better calibrated.

This page is written as a **teaching guide**, not only as a quick API
example. We will build small quantile-forecast examples, read the two
main plot styles, compare a better and a worse forecast, and end with a
simple checklist for adapting the helper to your own arrays or saved
forecast tables.

.. GENERATED FROM PYTHON SOURCE LINES 39-56

.. code-block:: Python


    from __future__ import annotations

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    from geoprior.plot.evaluation import plot_quantile_calibration

    pd.set_option("display.max_columns", 20)
    pd.set_option("display.width", 110)
    pd.set_option(
        "display.float_format",
        lambda v: f"{v:0.4f}",
    )


.. GENERATED FROM PYTHON SOURCE LINES 57-88

What this function expects
--------------------------

``plot_quantile_calibration`` is built for **quantile predictions**.
The required inputs are:

- ``y_true`` with shape ``(N,)`` for one output or ``(N, O)`` for
  multiple outputs,
- ``y_pred_quantiles`` with shape ``(N, Q)`` for one output or
  ``(N, O, Q)`` for multiple outputs,
- ``quantiles`` with shape ``(Q,)``.

In other words:

- ``N`` = number of samples,
- ``O`` = number of outputs,
- ``Q`` = number of quantile levels.

The helper offers two complementary views:

1. ``'reliability_diagram'`` shows **nominal** quantile levels on the
   x-axis and the **observed proportion** on the y-axis,
2. ``'summary_bar'`` shows the aggregated Quantile Calibration Error
   (QCE).

A key reading rule is simple:

- a curve close to the diagonal means better calibration,
- a lower QCE bar means better calibration,
- and calibration should always be read together with interval width
  or WIS, not in isolation.

.. GENERATED FROM PYTHON SOURCE LINES 91-102

Build a realistic single-output example
---------------------------------------

For a gallery lesson, we want two forecast behaviours that are easy to
distinguish:

- a better-calibrated forecast whose quantiles are centered close to
  the truth and have reasonable spread,
- and a worse forecast that is both biased and too narrow.

We use the same quantile levels for both so the comparison is fair.

.. GENERATED FROM PYTHON SOURCE LINES 102-152

.. code-block:: Python


    rng = np.random.default_rng(21)

    n_samples = 220
    quantiles = np.array([0.10, 0.25, 0.50, 0.75, 0.90])

    # Approximate standard-normal quantiles for the chosen levels.
    z = np.array([-1.2816, -0.6745, 0.0, 0.6745, 1.2816])

    x = np.linspace(0.0, 4.5 * np.pi, n_samples)
    y_true = (
        12.0
        + 1.4 * np.sin(x / 2.4)
        + 0.55 * np.cos(x / 5.2)
        + rng.normal(scale=0.75, size=n_samples)
    )

    center_good = (
        12.0
        + 1.35 * np.sin(x / 2.4)
        + 0.50 * np.cos(x / 5.2)
    )
    spread_good = 0.82 + 0.08 * np.sin(x / 3.0)
    y_pred_q_good = center_good[:, None] + spread_good[:, None] * z[None, :]

    center_bad = center_good + 0.35
    spread_bad = 0.48 + 0.04 * np.cos(x / 2.8)
    y_pred_q_bad = center_bad[:, None] + spread_bad[:, None] * z[None, :]

    preview = pd.DataFrame(
        {
            "y_true": y_true[:8],
            "q10_good": y_pred_q_good[:8, 0],
            "q50_good": y_pred_q_good[:8, 2],
            "q90_good": y_pred_q_good[:8, 4],
            "q10_bad": y_pred_q_bad[:8, 0],
            "q50_bad": y_pred_q_bad[:8, 2],
            "q90_bad": y_pred_q_bad[:8, 4],
        }
    )

    print("Single-output preview")
    print(preview)
    print("\nShapes")
    print(f"y_true shape          : {y_true.shape}")
    print(f"y_pred_q_good shape   : {y_pred_q_good.shape}")
    print(f"y_pred_q_bad shape    : {y_pred_q_bad.shape}")
    print(f"quantiles shape       : {quantiles.shape}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Single-output preview
       y_true  q10_good  q50_good  q90_good  q10_bad  q50_bad  q90_bad
    0 12.8191   11.4491   12.5000   13.5509  12.1836  12.8500  13.5164
    1 13.7206   11.4832   12.5363   13.5894  12.2198  12.8863  13.5527
    2 11.2854   11.5171   12.5724   13.6278  12.2561  12.9224  13.5888
    3 13.9274   11.5509   12.6085   13.6660  12.2922  12.9585  13.6248
    4 12.6642   11.5846   12.6443   13.7041  12.3281  12.9943  13.6606
    5 12.1367   11.6181   12.6800   13.7420  12.3640  13.0300  13.6961
    6 12.1712   11.6514   12.7155   13.7796  12.3996  13.0655  13.7315
    7 11.9979   11.6845   12.7508   13.8171  12.4350  13.1008  13.7666

    Shapes
    y_true shape          : (220,)
    y_pred_q_good shape   : (220, 5)
    y_pred_q_bad shape    : (220, 5)
    quantiles shape       : (5,)


.. GENERATED FROM PYTHON SOURCE LINES 153-164

Start with the reliability diagram
----------------------------------

This is the most intuitive entry point.

The x-axis shows the nominal quantile level, for example 0.10 or 0.90.
The y-axis shows the fraction of observed values that actually fell
below the predicted quantile.

A perfectly calibrated forecast would place the observed curve on the
diagonal. Deviations from that line tell you where calibration is off.

.. GENERATED FROM PYTHON SOURCE LINES 164-197

.. code-block:: Python


    fig, axes = plt.subplots(
        1,
        2,
        figsize=(11.6, 4.8),
        constrained_layout=True,
    )

    plot_quantile_calibration(
        y_true,
        y_pred_q_good,
        quantiles,
        kind="reliability_diagram",
        title="Reliability diagram: better forecast",
        perfect_calib_color="dimgray",
        observed_prop_color="teal",
        observed_prop_marker="o",
        ax=axes[0],
    )

    plot_quantile_calibration(
        y_true,
        y_pred_q_bad,
        quantiles,
        kind="reliability_diagram",
        title="Reliability diagram: biased and narrow forecast",
        perfect_calib_color="dimgray",
        observed_prop_color="darkorange",
        observed_prop_marker="s",
        ax=axes[1],
    )


.. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_quantile_calibration_overview_001.png
   :alt: Reliability diagram: better forecast (Avg. QCE: 0.0373), Reliability diagram: biased and narrow forecast (Avg. QCE: 0.1818)
   :srcset: /auto_examples/evaluation/images/sphx_glr_plot_quantile_calibration_overview_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <Axes: title={'center': 'Reliability diagram: biased and narrow forecast\n(Avg. QCE: 0.1818)'}, xlabel='Nominal Quantile Level (q)', ylabel='Observed Proportion (y <= Q_pred(q))'>


.. GENERATED FROM PYTHON SOURCE LINES 198-218

How to read the reliability diagram
-----------------------------------

A useful reading habit is to compare the observed curve against the
diagonal one quantile at a time.

In practical terms:

- if the observed proportion at q=0.90 is much lower than 0.90, the
  upper quantile is too low or the forecast is too narrow,
- if the observed proportion at q=0.10 is much higher than 0.10, the
  lower quantile is too high,
- if the whole curve is shifted upward or downward, that often signals
  a systematic bias,
- and if the middle quantiles look fine while the extremes drift away,
  the tails are the main problem.

This is why the reliability diagram is often the best first figure for
teaching or debugging quantile forecasts: it shows *where* the
calibration problem lives.

.. GENERATED FROM PYTHON SOURCE LINES 221-232

Move to the summary view with QCE bars
--------------------------------------

The reliability diagram is diagnostic, but the summary bar gives the
quick ranking.

Quantile Calibration Error averages the mismatch between the nominal
quantile level and the observed proportion. Lower is better.

The summary bar is especially useful when the user wants a compact
report figure or needs to compare several forecast variants.

.. GENERATED FROM PYTHON SOURCE LINES 232-261

.. code-block:: Python


    fig, axes = plt.subplots(
        1,
        2,
        figsize=(10.8, 4.5),
        constrained_layout=True,
    )

    plot_quantile_calibration(
        y_true,
        y_pred_q_good,
        quantiles,
        kind="summary_bar",
        title="QCE summary: better forecast",
        bar_color="seagreen",
        ax=axes[0],
    )

    plot_quantile_calibration(
        y_true,
        y_pred_q_bad,
        quantiles,
        kind="summary_bar",
        title="QCE summary: biased and narrow forecast",
        bar_color="firebrick",
        ax=axes[1],
    )


.. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_quantile_calibration_overview_002.png
   :alt: QCE summary: better forecast, QCE summary: biased and narrow forecast
   :srcset: /auto_examples/evaluation/images/sphx_glr_plot_quantile_calibration_overview_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <Axes: title={'center': 'QCE summary: biased and narrow forecast'}, ylabel='Quantile Calibration Error (QCE)'>


.. GENERATED FROM PYTHON SOURCE LINES 262-278

Why the bar and the diagram should be used together
---------------------------------------------------

The two views answer different questions.

The bar answers:

*Which forecast is better calibrated overall?*

The reliability diagram answers:

*Where exactly is the calibration error coming from?*

In practice, this means the bar is the headline and the diagram is the
explanation. When both are read together, users can make a much safer
decision than with either figure alone.

.. GENERATED FROM PYTHON SOURCE LINES 281-292

Multi-output forecasts need an explicit output choice
-----------------------------------------------------

For multi-output data, the helper supports:

- a per-output reliability diagram, where you must provide
  ``output_idx``,
- or a summary bar that can show one bar per output when
  ``metric_kws={'multioutput': 'raw_values'}`` is used.

We build a small two-output example to show both patterns.

.. GENERATED FROM PYTHON SOURCE LINES 292-353

.. code-block:: Python


    n_outputs = 2

    trend_1 = center_good
    trend_2 = 4.5 + 0.8 * np.cos(x / 3.2)

    y_true_multi = np.column_stack(
        [
            y_true,
            trend_2 + rng.normal(scale=0.55, size=n_samples),
        ]
    )

    spread_out1 = 0.80 + 0.05 * np.sin(x / 3.4)
    spread_out2 = 0.58 + 0.05 * np.cos(x / 4.0)

    y_pred_q_multi = np.stack(
        [
            trend_1[:, None] + spread_out1[:, None] * z[None, :],
            trend_2[:, None] + spread_out2[:, None] * z[None, :],
        ],
        axis=1,
    )

    print("\nMulti-output shapes")
    print(f"y_true_multi shape    : {y_true_multi.shape}")
    print(f"y_pred_q_multi shape  : {y_pred_q_multi.shape}")
    print(f"n_outputs             : {n_outputs}")

    fig, axes = plt.subplots(
        1,
        2,
        figsize=(11.2, 4.8),
        constrained_layout=True,
    )

    plot_quantile_calibration(
        y_true_multi,
        y_pred_q_multi,
        quantiles,
        kind="reliability_diagram",
        output_idx=0,
        title="Reliability diagram: output 0",
        perfect_calib_color="gray",
        observed_prop_color="royalblue",
        observed_prop_marker="D",
        ax=axes[0],
    )

    plot_quantile_calibration(
        y_true_multi,
        y_pred_q_multi,
        quantiles,
        kind="summary_bar",
        metric_kws={"multioutput": "raw_values"},
        title="QCE summary by output",
        bar_color=["mediumpurple", "goldenrod"],
        ax=axes[1],
    )


.. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_quantile_calibration_overview_003.png
   :alt: Reliability diagram: output 0 (Output 0) (Avg. QCE: 0.0318), QCE summary by output (Per Output)
   :srcset: /auto_examples/evaluation/images/sphx_glr_plot_quantile_calibration_overview_003.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Multi-output shapes
    y_true_multi shape    : (220, 2)
    y_pred_q_multi shape  : (220, 2, 5)
    n_outputs             : 2

    <Axes: title={'center': 'QCE summary by output (Per Output)'}, ylabel='Quantile Calibration Error (QCE)'>


.. GENERATED FROM PYTHON SOURCE LINES 354-363

What the multi-output summary teaches
-------------------------------------

A single aggregated score can hide differences between targets.

That is why the per-output summary is often the safer reporting view
when a model predicts multiple quantities. One output may be well
calibrated while another is still weak. The bar chart helps reveal
that imbalance immediately.

.. GENERATED FROM PYTHON SOURCE LINES 366-382

A practical pattern for users with DataFrame forecast tables
------------------------------------------------------------

Many users will not start from NumPy arrays. They will start from a
saved forecast DataFrame with columns such as:

- ``subsidence_actual``
- ``subsidence_q10``
- ``subsidence_q25``
- ``subsidence_q50``
- ``subsidence_q75``
- ``subsidence_q90``

The adaptation is straightforward: extract the actual column, stack the
quantile columns in the same order as the ``quantiles`` array, then
call the helper.

.. GENERATED FROM PYTHON SOURCE LINES 382-425

.. code-block:: Python


    forecast_df = pd.DataFrame(
        {
            "subsidence_actual": y_true,
            "subsidence_q10": y_pred_q_good[:, 0],
            "subsidence_q25": y_pred_q_good[:, 1],
            "subsidence_q50": y_pred_q_good[:, 2],
            "subsidence_q75": y_pred_q_good[:, 3],
            "subsidence_q90": y_pred_q_good[:, 4],
        }
    )

    q_cols = [
        "subsidence_q10",
        "subsidence_q25",
        "subsidence_q50",
        "subsidence_q75",
        "subsidence_q90",
    ]

    y_true_from_df = forecast_df["subsidence_actual"].to_numpy()
    y_pred_q_from_df = forecast_df[q_cols].to_numpy()

    print("\nDataFrame extraction preview")
    print(forecast_df.head())
    print("\nExtracted array shapes")
    print(f"y_true_from_df shape  : {y_true_from_df.shape}")
    print(f"y_pred_q_from_df shape: {y_pred_q_from_df.shape}")

    fig, ax = plt.subplots(figsize=(6.1, 5.1), constrained_layout=True)
    plot_quantile_calibration(
        y_true_from_df,
        y_pred_q_from_df,
        quantiles,
        kind="reliability_diagram",
        title="Reliability diagram from a forecast table",
        perfect_calib_color="black",
        observed_prop_color="crimson",
        observed_prop_marker="^",
        ax=ax,
    )


.. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_quantile_calibration_overview_004.png
   :alt: Reliability diagram from a forecast table (Avg. QCE: 0.0373)
   :srcset: /auto_examples/evaluation/images/sphx_glr_plot_quantile_calibration_overview_004.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    DataFrame extraction preview
       subsidence_actual  subsidence_q10  subsidence_q25  subsidence_q50  subsidence_q75  subsidence_q90
    0            12.8191         11.4491         11.9469         12.5000         13.0531         13.5509
    1            13.7206         11.4832         11.9820         12.5363         13.0905         13.5894
    2            11.2854         11.5171         12.0170         12.5724         13.1278         13.6278
    3            13.9274         11.5509         12.0519         12.6085         13.1650         13.6660
    4            12.6642         11.5846         12.0866         12.6443         13.2021         13.7041

    Extracted array shapes
    y_true_from_df shape  : (220,)
    y_pred_q_from_df shape: (220, 5)

    <Axes: title={'center': 'Reliability diagram from a forecast table\n(Avg. QCE: 0.0373)'}, xlabel='Nominal Quantile Level (q)', ylabel='Observed Proportion (y <= Q_pred(q))'>


.. GENERATED FROM PYTHON SOURCE LINES 426-448

A compact checklist for your own data
-------------------------------------

Before trusting the plot, check these points carefully:

1. the quantile columns are stacked in the same order as
   ``quantiles``,
2. the quantile values themselves are in ``(0, 1)``,
3. lower quantiles are really lower than upper quantiles in your
   forecast generation process,
4. you choose ``output_idx`` whenever the data have multiple outputs
   and you want a reliability diagram,
5. and you read calibration together with width-oriented plots such as
   mean interval width or WIS.

A final interpretation rule is worth keeping in mind:

*good calibration does not automatically mean good forecasting.*

A forecast can be calibrated because it is very wide. That is why this
page fits naturally after coverage and width lessons and before more
advanced calibration summaries.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.852 seconds)


.. _sphx_glr_download_auto_examples_evaluation_plot_quantile_calibration_overview.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_quantile_calibration_overview.ipynb <plot_quantile_calibration_overview.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_quantile_calibration_overview.py <plot_quantile_calibration_overview.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_quantile_calibration_overview.zip <plot_quantile_calibration_overview.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_