.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/uncertainty/plot_calibration_comparison_overview.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_uncertainty_plot_calibration_comparison_overview.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_uncertainty_plot_calibration_comparison_overview.py:


Compare raw and calibrated reliability with ``plot_calibration_comparison``
===========================================================================

This lesson explains how to use
``geoprior.plot.forecast.plot_calibration_comparison`` when you want to
go one step beyond a standard reliability diagram.

A reliability diagram answers:

*How well calibrated is the forecast right now?*

This helper answers a stronger question:

*Did the calibration step actually improve the forecast reliability?*

Why this function matters
-------------------------
Once a model produces quantiles or direct probabilities, many workflows
apply a calibration step. But calibration is not automatically useful.
It can help a lot, help a little, or change the curve in ways that are
not worth adopting downstream.

This helper overlays:

- the **raw** reliability curve,
- and the **calibrated** reliability curve.

It supports both:

- **quantile-based forecasts** via the quantile calibration workflow,
- **probability forecasts** via isotonic or logistic calibration.

This makes it a natural bridge lesson for the uncertainty gallery:
the page connects saved forecast tables to a practical calibration
decision.

This page is written as a **teaching guide**, not only as a quick API
demo. We start with the required input layout, compare raw and
calibrated quantile reliability, move to direct probabilities, look at
grouped calibration by forecast step, and end with a checklist for
adapting the helper to real saved forecast tables.

.. GENERATED FROM PYTHON SOURCE LINES 49-66

.. code-block:: Python


    from __future__ import annotations

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    from geoprior.plot.forecast import plot_calibration_comparison

    pd.set_option("display.max_columns", 20)
    pd.set_option("display.width", 110)
    pd.set_option(
        "display.float_format",
        lambda v: f"{v:0.4f}",
    )


.. GENERATED FROM PYTHON SOURCE LINES 67-106

What this function expects
--------------------------

``plot_calibration_comparison`` accepts one or more forecast tables as:

- one DataFrame,
- several DataFrames,
- a list of DataFrames,
- or a dict of named DataFrames.

It supports two workflows:

1. **quantile mode**

   Required inputs:

   - ``quantiles=[...]``
   - ``q_prefix='subsidence'``
   - ``actual_col='subsidence_actual'``

   The helper calibrates the quantile forecasts and overlays the raw
   and calibrated empirical reliability curves.

2. **probability mode**

   Required inputs:

   - ``prob_col='p_event'``
   - ``actual_col='event_flag'``

   The helper calibrates the direct probabilities with
   ``method='isotonic'`` or ``method='logistic'`` and compares the raw
   and calibrated curves.

A useful habit is:

- compare the raw curve to the diagonal,
- compare the calibrated curve to the diagonal,
- and only then decide whether calibration helped enough to matter.
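
Before calling the helper on a saved table, it can be worth checking
that the expected columns are present. The sketch below assumes that
quantile columns follow the ``<prefix>_q<percent>`` pattern used by the
example tables on this page (for instance ``subsidence_q10``). The
function names ``expected_quantile_columns`` and ``missing_columns``
are hypothetical and are not part of ``geoprior``.

.. code-block:: Python

    import pandas as pd


    def expected_quantile_columns(q_prefix, quantiles):
        """Build names such as 'subsidence_q10' (assumed naming pattern)."""
        return [f"{q_prefix}_q{round(q * 100):d}" for q in quantiles]


    def missing_columns(df, required):
        """Return the required column names that are absent from ``df``."""
        return [name for name in required if name not in df.columns]


    # A deliberately incomplete table: the q90 column is missing.
    table = pd.DataFrame(
        {
            "subsidence_q10": [5.8],
            "subsidence_q50": [6.4],
            "subsidence_actual": [6.3],
        }
    )

    required = expected_quantile_columns("subsidence", [0.10, 0.50, 0.90])
    required.append("subsidence_actual")
    print(missing_columns(table, required))

An empty list means the table has every column this check looks for. If
your saved tables use a different naming scheme, adjust the pattern in
``expected_quantile_columns`` accordingly.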

.. GENERATED FROM PYTHON SOURCE LINES 109-118

Build a quantile-forecast example
---------------------------------

We begin with a quantile table. This is the most direct way to show
the difference between a raw reliability problem and its post-processed
version.

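The forecast below is intentionally too narrow (its spread is about
0.48, while the noise in the observations has scale 0.78) and its
baseline is shifted upward by 0.1, so the calibration step has a real
job to do.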
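
The example code hard-codes the standard-normal z-values for the 10%,
50% and 90% levels. As a small aside, the sketch below shows one way to
derive them with the standard library, and then estimates how much
coverage to expect from a nominal 80% interval when the forecast spread
is narrower than the noise. The calculation uses a typical spread of
0.48 against a noise scale of 0.78 and ignores the small shift of the
forecast centre, so treat the result as a rough guide only.

.. code-block:: Python

    from statistics import NormalDist

    std_normal = NormalDist()  # mean 0, standard deviation 1

    quantiles = [0.10, 0.50, 0.90]
    z_values = [std_normal.inv_cdf(q) for q in quantiles]
    print([round(z, 4) for z in z_values])

    # Rough expected coverage of the nominal 80% interval (q10 to q90)
    # when the forecast spread is smaller than the noise scale.
    # Assumption: Gaussian errors and no bias in the forecast centre.
    forecast_spread, noise_scale = 0.48, 0.78
    half_width_in_sd = z_values[2] * forecast_spread / noise_scale
    expected_coverage = 2.0 * std_normal.cdf(half_width_in_sd) - 1.0
    print(round(expected_coverage, 3))

A value well below 0.80 here is the signature of an interval that is
too narrow, which is the kind of defect the calibration step is meant
to reduce.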

.. GENERATED FROM PYTHON SOURCE LINES 118-150

.. code-block:: Python


    rng = np.random.default_rng(7)

    n = 240
    x = np.linspace(0.0, 4.8 * np.pi, n)

    y_true = (
        6.0
        + 1.25 * np.sin(x / 2.1)
        + 0.38 * np.cos(x / 4.0)
        + rng.normal(scale=0.78, size=n)
    )

    quantiles = [0.10, 0.50, 0.90]
    z10, z50, z90 = -1.2816, 0.0, 1.2816

    center_raw = 6.1 + 1.20 * np.sin(x / 2.1) + 0.34 * np.cos(x / 4.0)
    spread_raw = 0.48 + 0.04 * np.sin(x / 3.2)

    quant_df = pd.DataFrame(
        {
            "subsidence_q10": center_raw + spread_raw * z10,
            "subsidence_q50": center_raw + spread_raw * z50,
            "subsidence_q90": center_raw + spread_raw * z90,
            "subsidence_actual": y_true,
        }
    )

    print("Quantile-table preview")
    print(quant_df.head(8))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Quantile-table preview
       subsidence_q10  subsidence_q50  subsidence_q90  subsidence_actual
    0          5.8248          6.4400          7.0552             6.3810
    1          5.8598          6.4760          7.0922             6.6505
    2          5.8947          6.5119          7.1291             6.2411
    3          5.9294          6.5476          7.1658             5.7974
    4          5.9640          6.5832          7.2024             6.1745
    5          5.9983          6.6185          7.2388             5.7924
    6          6.0324          6.6536          7.2749             6.6493
    7          6.0662          6.6885          7.3107             7.6840


.. GENERATED FROM PYTHON SOURCE LINES 151-162

Read raw versus calibrated reliability for quantiles
----------------------------------------------------

The raw curve tells you what the original forecast implied.
The calibrated curve tells you how the reliability changed after the
post-processing step.

The function adds the default title ``'Raw vs Calibrated Reliability'``
itself, so it is best treated as a one-figure helper rather than as a
building block for multi-panel subplots. The example below replaces
that title through the returned axes.

.. GENERATED FROM PYTHON SOURCE LINES 162-175

.. code-block:: Python


    ax = plot_calibration_comparison(
        quant_df,
        quantiles=quantiles,
        q_prefix="subsidence",
        actual_col="subsidence_actual",
        method="isotonic",
        figsize=(7.5, 5.2),
        grid_props={"linestyle": "--", "alpha": 0.5},
    )
    ax.set_title("Raw vs calibrated reliability for a quantile forecast")


.. image-sg:: /auto_examples/uncertainty/images/sphx_glr_plot_calibration_comparison_overview_001.png
   :alt: Raw vs calibrated reliability for a quantile forecast
   :srcset: /auto_examples/uncertainty/images/sphx_glr_plot_calibration_comparison_overview_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [INFO] Processing model 'model'

    Text(0.5, 1.0, 'Raw vs calibrated reliability for a quantile forecast')


.. GENERATED FROM PYTHON SOURCE LINES 176-194

How to read this comparison
---------------------------

A practical reading order is:

1. where is the raw curve relative to the diagonal?
2. where is the calibrated curve relative to the diagonal?
3. which parts of the curve improved most?

This shows whether calibration mainly fixed:

- the tails,
- the center,
- or only a small part of the reliability problem.

If the calibrated curve still sits far from the diagonal, then the
original forecast may need deeper model or spread changes, not only a
post-processing fix.
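
The reading order above can be backed by numbers. The sketch below
computes, for each nominal level, the observed fraction of actual
values at or below the forecast quantile, and the gap to the nominal
level. It is an independent check on synthetic data, not a view into
how ``plot_calibration_comparison`` computes its curves; the column
names follow the example tables on this page.

.. code-block:: Python

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 2000
    y_true = rng.normal(loc=6.0, scale=0.78, size=n)

    # Forecast quantiles from a distribution that is too narrow
    # (spread 0.48 against a true scale of 0.78).
    levels = [0.10, 0.50, 0.90]
    z_values = [-1.2816, 0.0, 1.2816]
    table = pd.DataFrame(
        {
            f"subsidence_q{round(q * 100)}": 6.0 + 0.48 * z
            for q, z in zip(levels, z_values)
        },
        index=range(n),
    )
    table["subsidence_actual"] = y_true

    rows = []
    for q in levels:
        col = f"subsidence_q{round(q * 100)}"
        # Share of observations at or below the forecast quantile.
        observed = float((table["subsidence_actual"] <= table[col]).mean())
        rows.append({"nominal": q, "observed": observed, "gap": observed - q})

    coverage = pd.DataFrame(rows)
    print(coverage)

For a forecast that is too narrow, the observed frequency sits above
the nominal level in the lower tail and below it in the upper tail,
while the median stays close to 0.5. Running the same table on
calibrated quantiles shows which of those gaps actually closed.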

.. GENERATED FROM PYTHON SOURCE LINES 197-204

Direct probabilities: isotonic calibration
------------------------------------------

In probability mode, the function compares raw and calibrated event
probabilities rather than quantile coverage.

We first build a forecast that overstates the event probability, most
strongly in the high-probability range.

.. GENERATED FROM PYTHON SOURCE LINES 204-231

.. code-block:: Python


    p_signal = 1.0 / (1.0 + np.exp(-(0.9 * np.sin(x / 2.6) + 0.2)))
    event_flag = rng.binomial(1, np.clip(p_signal, 0.03, 0.97), size=n)

    prob_df = pd.DataFrame(
        {
            "p_event": np.clip(1.25 * p_signal - 0.08, 0.01, 0.99),
            "event_flag": event_flag,
        }
    )

    print("\nProbability-table preview")
    print(prob_df.head(8))

    ax = plot_calibration_comparison(
        prob_df,
        prob_col="p_event",
        actual_col="event_flag",
        bins=12,
        bin_strategy="quantile",
        method="isotonic",
        figsize=(7.5, 5.2),
        grid_props={"linestyle": ":", "alpha": 0.6},
    )
    ax.set_title("Isotonic calibration on direct probabilities")


.. image-sg:: /auto_examples/uncertainty/images/sphx_glr_plot_calibration_comparison_overview_002.png
   :alt: Isotonic calibration on direct probabilities
   :srcset: /auto_examples/uncertainty/images/sphx_glr_plot_calibration_comparison_overview_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Probability-table preview
       p_event  event_flag
    0   0.6073           0
    1   0.6140           1
    2   0.6208           1
    3   0.6275           1
    4   0.6341           0
    5   0.6408           0
    6   0.6474           1
    7   0.6539           1
    [INFO] Processing model 'model'

    Text(0.5, 1.0, 'Isotonic calibration on direct probabilities')


.. GENERATED FROM PYTHON SOURCE LINES 232-240

Direct probabilities: logistic calibration
------------------------------------------

It is often useful to compare isotonic calibration with a smoother,
more parametric logistic calibration.

The same forecast table can therefore be inspected twice with
different calibration methods before choosing a workflow.

.. GENERATED FROM PYTHON SOURCE LINES 240-254

.. code-block:: Python


    ax = plot_calibration_comparison(
        prob_df,
        prob_col="p_event",
        actual_col="event_flag",
        bins=12,
        bin_strategy="quantile",
        method="logistic",
        figsize=(7.5, 5.2),
        grid_props={"linestyle": ":", "alpha": 0.6},
    )
    ax.set_title("Logistic calibration on direct probabilities")


.. image-sg:: /auto_examples/uncertainty/images/sphx_glr_plot_calibration_comparison_overview_003.png
   :alt: Logistic calibration on direct probabilities
   :srcset: /auto_examples/uncertainty/images/sphx_glr_plot_calibration_comparison_overview_003.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [INFO] Processing model 'model'

    Text(0.5, 1.0, 'Logistic calibration on direct probabilities')


.. GENERATED FROM PYTHON SOURCE LINES 255-267

What to look for in the probability case
----------------------------------------

In direct-probability mode, the most useful questions are:

- did the calibrated curve move toward the diagonal over the whole
  range or only at one end?
- are the highest-probability bins still overconfident?
- does isotonic help more than logistic, or vice versa?

This comparison helper is valuable precisely because it makes the
*effect* of calibration visible, not only the original problem.
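
To answer these questions with numbers as well as by eye, a binned
reliability table can help. The sketch below uses equal-count bins via
``pandas.qcut``, in the spirit of the ``bins=12`` and
``bin_strategy="quantile"`` settings used above. It is a stand-alone
illustration on synthetic data; the binning inside
``plot_calibration_comparison`` may differ in detail.

.. code-block:: Python

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    n = 3000
    p_true = rng.uniform(0.15, 0.85, size=n)
    event_flag = rng.binomial(1, p_true)

    # An overconfident forecast pushes probabilities away from 0.5.
    p_event = 0.5 + 1.4 * (p_true - 0.5)

    table = pd.DataFrame({"p_event": p_event, "event_flag": event_flag})
    table["bin"] = pd.qcut(table["p_event"], q=12, duplicates="drop")

    reliability = (
        table.groupby("bin", observed=True)
        .agg(
            mean_predicted=("p_event", "mean"),
            observed_rate=("event_flag", "mean"),
            count=("event_flag", "size"),
        )
        .reset_index(drop=True)
    )
    # Positive gap: events happen more often than predicted in that bin.
    reliability["gap"] = (
        reliability["observed_rate"] - reliability["mean_predicted"]
    )
    print(reliability)

Building the same table for raw and for calibrated probabilities makes
it easy to see whether the highest-probability bins are still
overconfident after calibration.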

.. GENERATED FROM PYTHON SOURCE LINES 270-280

A forecast-step teaching pattern
--------------------------------

The quantile calibration workflow also accepts ``group_by``. This is
helpful when a long-format forecast table contains a step or horizon
column and calibration should be learned separately per group.

The function still draws one overall raw-versus-calibrated comparison,
but fitting the calibration per group is often the right modeling
choice when early and late horizons have different uncertainty
behaviour.
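
One way to judge whether grouped fitting is worth it is to check the
raw interval coverage per step first. The sketch below does this with a
plain ``groupby`` on a synthetic long-format table. The column names
mirror the ones used on this page, and the per-step spreads are
illustrative assumptions, not values taken from a real model.

.. code-block:: Python

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(2)
    n = 1500
    y_true = rng.normal(loc=6.0, scale=0.78, size=n)

    # Later steps get narrower intervals, so coverage should drop.
    frames = []
    for step, spread in [(1, 0.78), (2, 0.60), (3, 0.45)]:
        frames.append(
            pd.DataFrame(
                {
                    "forecast_step": step,
                    "subsidence_q10": 6.0 - 1.2816 * spread,
                    "subsidence_q90": 6.0 + 1.2816 * spread,
                    "subsidence_actual": y_true,
                }
            )
        )
    long_table = pd.concat(frames, ignore_index=True)

    # Flag rows whose observation falls inside the nominal 80% interval.
    inside = long_table["subsidence_actual"].between(
        long_table["subsidence_q10"], long_table["subsidence_q90"]
    )
    coverage_by_step = inside.groupby(long_table["forecast_step"]).mean()
    print(coverage_by_step)

If the per-step coverage values differ clearly, as they do in this
synthetic table, a single pooled calibration is likely to over-correct
some steps and under-correct others, which is the case ``group_by`` is
meant to handle.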

.. GENERATED FROM PYTHON SOURCE LINES 280-315

.. code-block:: Python


    frames = []
    for step, bias, spread_scale in [
        (1, 0.05, 0.62),
        (2, 0.14, 0.54),
        (3, 0.26, 0.46),
    ]:
        center = 6.0 + 1.16 * np.sin(x / 2.1) + bias
        spread = spread_scale + 0.03 * np.cos(x / 3.5)
        frame = pd.DataFrame(
            {
                "forecast_step": step,
                "subsidence_q10": center + spread * z10,
                "subsidence_q50": center + spread * z50,
                "subsidence_q90": center + spread * z90,
                "subsidence_actual": y_true,
            }
        )
        frames.append(frame)

    step_df = pd.concat(frames, ignore_index=True)

    ax = plot_calibration_comparison(
        step_df,
        quantiles=quantiles,
        q_prefix="subsidence",
        actual_col="subsidence_actual",
        method="isotonic",
        group_by="forecast_step",
        figsize=(7.5, 5.2),
        grid_props={"linestyle": "--", "alpha": 0.45},
    )
    ax.set_title("Grouped calibration by forecast step")


.. image-sg:: /auto_examples/uncertainty/images/sphx_glr_plot_calibration_comparison_overview_004.png
   :alt: Grouped calibration by forecast step
   :srcset: /auto_examples/uncertainty/images/sphx_glr_plot_calibration_comparison_overview_004.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [INFO] Processing model 'model'

    Text(0.5, 1.0, 'Grouped calibration by forecast step')


.. GENERATED FROM PYTHON SOURCE LINES 316-335

Why this page belongs in uncertainty teaching
---------------------------------------------

``plot_reliability_diagram`` asks:

- how reliable is the forecast now?

``plot_calibration_comparison`` asks:

- did calibration improve that reliability enough to justify using the
  calibrated outputs?

That makes this helper ideal for a bridge page in the uncertainty
gallery:

- it starts from saved forecast tables,
- it teaches a calibration workflow,
- and it encourages evaluation-minded judgment instead of assuming
  calibration is always beneficial.

.. GENERATED FROM PYTHON SOURCE LINES 338-367

Practical checklist for your own data
-------------------------------------

When adapting this helper to a real project:

1. decide whether you are calibrating **quantiles** or **direct
   probabilities**,
2. confirm the required forecast columns exist,
3. compare the raw and calibrated curves against the diagonal,
4. inspect both ``method='isotonic'`` and ``method='logistic'`` when
   direct probabilities are available,
5. consider ``group_by`` for long-format quantile tables.

For quantile forecasts:

- pass the nominal levels in ``quantiles``,
- set ``q_prefix`` to the shared quantile column family,
- and point ``actual_col`` to the observed target.

For direct probabilities:

- pass ``prob_col`` and the binary event column,
- choose sensible bins,
- and judge the improvement, not only the existence of a calibrated
  curve.

This function is most useful after a reliability problem has already
been found and you need to decide whether calibration genuinely solved
it.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.339 seconds)


.. _sphx_glr_download_auto_examples_uncertainty_plot_calibration_comparison_overview.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_calibration_comparison_overview.ipynb <plot_calibration_comparison_overview.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_calibration_comparison_overview.py <plot_calibration_comparison_overview.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_calibration_comparison_overview.zip <plot_calibration_comparison_overview.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_