.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/uncertainty/plot_interval_calibration.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_uncertainty_plot_interval_calibration.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_uncertainty_plot_interval_calibration.py:


Interval calibration with ``calibrate_quantile_forecasts``
==========================================================

This lesson introduces one of the most important uncertainty utilities
in GeoPrior:
:func:`geoprior.utils.calibrate.calibrate_quantile_forecasts`.

Why this page matters
---------------------
A forecast can be visually impressive and still be poorly calibrated.

For quantile forecasts, a central question is:

**Do the predictive intervals contain the truth as often as they claim?**

For example, if we use the q10-q90 interval as an 80% interval, then
we expect the observed value to fall inside that band about 80% of the
time. If it falls inside much less often, the intervals are too narrow
and the forecast is overconfident.

That is the problem this utility addresses.

The calibration workflow in GeoPrior is built around three closely
connected helpers:

- ``fit_interval_factors_df``:
  fit a widening factor per horizon step so the interval coverage
  approaches a target;
- ``apply_interval_factors_df``:
  widen or shrink the quantile columns around the median;
- ``calibrate_quantile_forecasts``:
  the high-level wrapper that fits or accepts factors, applies them
  to evaluation and future forecasts, and returns summary statistics.

The lower-level factor fitter groups by forecast step, estimates a
factor per horizon, and targets a requested coverage level. The
application helper then rescales lower and upper quantiles around the
median and can enforce monotonic quantile ordering. The wrapper adds
automatic skip logic and before/after summaries. 

What this lesson teaches
------------------------
We will:

1. build a synthetic evaluation and future forecast table,
2. deliberately make the raw intervals too narrow,
3. measure coverage and sharpness before calibration,
4. fit horizon-wise interval factors,
5. apply them manually,
6. repeat the same workflow with the high-level wrapper,
7. build explanatory plots showing why the calibration is useful.

This lesson is intentionally synthetic so it remains fully executable
during the documentation build.

.. GENERATED FROM PYTHON SOURCE LINES 59-62

Imports
-------
We use the real uncertainty helpers from GeoPrior.

.. GENERATED FROM PYTHON SOURCE LINES 62-75

.. code-block:: Python


    from __future__ import annotations

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    from geoprior.utils.calibrate import (
        apply_interval_factors_df,
        calibrate_quantile_forecasts,
        fit_interval_factors_df,
    )


.. GENERATED FROM PYTHON SOURCE LINES 76-89

Step 1 - Build synthetic forecast tables
----------------------------------------
We create:

- ``df_eval``:
  evaluation rows with actual values available;
- ``df_future``:
  future rows with predictions only.

Each row represents one spatial sample at one forecast step.

We deliberately make the q10-q90 intervals too narrow so that the
initial empirical coverage is below the nominal 80% target.

.. GENERATED FROM PYTHON SOURCE LINES 89-191

.. code-block:: Python


    rng = np.random.default_rng(7)

    nx = 9
    ny = 6
    steps = [1, 2, 3]
    future_years = {1: 2024, 2: 2025, 3: 2026}

    xv = np.linspace(0.0, 12_000.0, nx)
    yv = np.linspace(0.0, 8_000.0, ny)
    X, Y = np.meshgrid(xv, yv)

    x_flat = X.ravel()
    y_flat = Y.ravel()
    n_sites = x_flat.size

    xn = (x_flat - x_flat.min()) / (x_flat.max() - x_flat.min())
    yn = (y_flat - y_flat.min()) / (y_flat.max() - y_flat.min())

    hotspot = np.exp(
        -(
            ((xn - 0.72) ** 2) / 0.020
            + ((yn - 0.34) ** 2) / 0.030
        )
    )
    ridge = 0.50 * np.exp(
        -(
            ((xn - 0.26) ** 2) / 0.030
            + ((yn - 0.74) ** 2) / 0.050
        )
    )
    gradient = 0.44 * xn + 0.20 * (1.0 - yn)

    eval_rows: list[dict[str, float | int]] = []
    future_rows: list[dict[str, float | int]] = []

    for step in steps:
        scale = {1: 1.00, 2: 1.18, 3: 1.40}[step]

        q50 = (
            2.2
            + 1.4 * gradient
            + 2.0 * hotspot
            + 0.9 * ridge
        ) * scale

        # Deliberately too narrow intervals
        width_raw = (
            0.12
            + 0.04 * xn
            + 0.02 * hotspot
            + 0.02 * step
        )

        q10 = q50 - width_raw
        q90 = q50 + width_raw

        # Truth is noisier than the interval expects -> undercoverage
        actual = q50 + rng.normal(0.0, 0.22 + 0.05 * step, size=n_sites)

        for i in range(n_sites):
            eval_rows.append(
                {
                    "sample_idx": i,
                    "forecast_step": step,
                    "coord_t": future_years[step],
                    "coord_x": float(x_flat[i]),
                    "coord_y": float(y_flat[i]),
                    "subsidence_q10": float(q10[i]),
                    "subsidence_q50": float(q50[i]),
                    "subsidence_q90": float(q90[i]),
                    "subsidence_actual": float(actual[i]),
                }
            )

        # Future forecast uses the same raw uncertainty pattern but no actuals
        q50_future = q50 * 1.06
        q10_future = q50_future - width_raw
        q90_future = q50_future + width_raw

        for i in range(n_sites):
            future_rows.append(
                {
                    "sample_idx": i,
                    "forecast_step": step,
                    "coord_t": future_years[step] + 1,
                    "coord_x": float(x_flat[i]),
                    "coord_y": float(y_flat[i]),
                    "subsidence_q10": float(q10_future[i]),
                    "subsidence_q50": float(q50_future[i]),
                    "subsidence_q90": float(q90_future[i]),
                }
            )

    df_eval = pd.DataFrame(eval_rows)
    df_future = pd.DataFrame(future_rows)

    print("df_eval shape:", df_eval.shape)
    print("df_future shape:", df_future.shape)
    print("")
    print(df_eval.head(8).to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    df_eval shape: (162, 9)
    df_future shape: (162, 8)

     sample_idx  forecast_step  coord_t    coord_x  coord_y  subsidence_q10  subsidence_q50  subsidence_q90  subsidence_actual
              0              1     2024     0.0000   0.0000          2.3400          2.4800          2.6200             2.4803
              1              1     2024  1500.0000   0.0000          2.4120          2.5570          2.7020             2.6377
              2              1     2024  3000.0000   0.0000          2.4840          2.6340          2.7840             2.5600
              3              1     2024  4500.0000   0.0000          2.5561          2.7111          2.8661             2.4707
              4              1     2024  6000.0000   0.0000          2.6317          2.7918          2.9518             2.6690
              5              1     2024  7500.0000   0.0000          2.7267          2.8920          3.0573             2.6243
              6              1     2024  9000.0000   0.0000          2.8121          2.9826          3.1530             2.9988
              7              1     2024 10500.0000   0.0000          2.8566          3.0318          3.2069             3.3936


.. GENERATED FROM PYTHON SOURCE LINES 192-211

Step 2 - Compute empirical coverage and sharpness before calibration
--------------------------------------------------------------------
Calibration is about matching observed coverage to nominal coverage.

For the q10-q90 interval we treat the nominal target as 80%.

We compute:

- empirical coverage:
  fraction of actual values inside [q10, q90],
- sharpness:
  average interval width q90 - q10.

These two numbers should always be read together.

- Wider intervals often improve coverage.
- Narrower intervals often reduce coverage.

Good uncertainty quantification balances both.

.. GENERATED FROM PYTHON SOURCE LINES 211-245

.. code-block:: Python


    def coverage_and_width(
        df: pd.DataFrame,
        *,
        lo_col: str = "subsidence_q10",
        hi_col: str = "subsidence_q90",
        actual_col: str = "subsidence_actual",
    ) -> tuple[float, float]:
        yt = df[actual_col].to_numpy(float)
        lo = df[lo_col].to_numpy(float)
        hi = df[hi_col].to_numpy(float)
        covered = (yt >= lo) & (yt <= hi)
        coverage = float(np.mean(covered))
        width = float(np.mean(hi - lo))
        return coverage, width


    before_rows = []
    for step, g in df_eval.groupby("forecast_step", sort=True):
        cov, wid = coverage_and_width(g)
        before_rows.append(
            {
                "forecast_step": int(step),
                "coverage_before": cov,
                "sharpness_before": wid,
            }
        )

    df_before = pd.DataFrame(before_rows)

    print("")
    print("Before calibration")
    print(df_before.to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Before calibration
     forecast_step  coverage_before  sharpness_before
                 1           0.4815            0.3223
                 2           0.4815            0.3623
                 3           0.4444            0.4023


.. GENERATED FROM PYTHON SOURCE LINES 246-253

Interpretation
--------------
In this synthetic example, the empirical coverage should be below the
intended 0.80 target for at least some horizon steps.

That is the uncertainty problem we want to fix: the forecast intervals
are too sharp relative to the observed residual spread.

.. GENERATED FROM PYTHON SOURCE LINES 255-266

Step 3 - Fit interval factors with the low-level helper
-------------------------------------------------------
``fit_interval_factors_df`` estimates one factor per horizon step.

Conceptually, each factor rescales the interval around the median:

- lower quantiles move farther below q50,
- upper quantiles move farther above q50.

The function groups by ``forecast_step`` and uses a bisection search
so the empirical coverage approaches the requested target. 

.. GENERATED FROM PYTHON SOURCE LINES 266-281

.. code-block:: Python


    factors = fit_interval_factors_df(
        df_eval,
        target_name="subsidence",
        step_col="forecast_step",
        interval=(0.1, 0.9),
        target_coverage=0.8,
        median_q=0.5,
        verbose=0,
    )

    print("")
    print("Fitted factors by horizon")
    print(factors)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Fitted factors by horizon
    {1: 2.037485122680664, 2: 2.0255374908447266, 3: 1.9699583053588867}


.. GENERATED FROM PYTHON SOURCE LINES 282-294

How to read the factors
-----------------------
The factor is interpreted around the median:

- factor > 1:
  widen the interval,
- factor = 1:
  leave the interval unchanged,
- factor < 1:
  shrink the interval.

In undercovered forecasts, we usually expect factors larger than 1.

.. GENERATED FROM PYTHON SOURCE LINES 296-304

Step 4 - Apply interval factors manually
----------------------------------------
``apply_interval_factors_df`` is the second low-level helper.

It rescales all quantiles around the median and can enforce monotonic
ordering of the quantiles using strategies such as ``cummax`` or
``sort``. It also stores the applied factor and marks the forecast as
calibrated. 

.. GENERATED FROM PYTHON SOURCE LINES 304-347

.. code-block:: Python


    df_eval_manual = apply_interval_factors_df(
        df_eval,
        factors,
        target_name="subsidence",
        step_col="forecast_step",
        keep_original=True,
        factor_col="calibration_factor",
        calibrated_col="is_calibrated",
        enforce_monotonic="cummax",
        verbose=0,
    )

    df_future_manual = apply_interval_factors_df(
        df_future,
        factors,
        target_name="subsidence",
        step_col="forecast_step",
        keep_original=True,
        factor_col="calibration_factor",
        calibrated_col="is_calibrated",
        enforce_monotonic="cummax",
        verbose=0,
    )

    after_rows = []
    for step, g in df_eval_manual.groupby("forecast_step", sort=True):
        cov, wid = coverage_and_width(g)
        after_rows.append(
            {
                "forecast_step": int(step),
                "coverage_after": cov,
                "sharpness_after": wid,
            }
        )

    df_after = pd.DataFrame(after_rows)

    df_compare = df_before.merge(df_after, on="forecast_step")
    print("")
    print("Before/after manual calibration")
    print(df_compare.to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Before/after manual calibration
     forecast_step  coverage_before  sharpness_before  coverage_after  sharpness_after
                 1           0.4815            0.3223          0.8148           0.6566
                 2           0.4815            0.3623          0.8148           0.7338
                 3           0.4444            0.4023          0.8148           0.7925


.. GENERATED FROM PYTHON SOURCE LINES 348-356

What changed
------------
After calibration:

- coverage should move closer to the target 0.80,
- sharpness will usually increase because the interval has widened.

This is the central trade-off in interval calibration.

.. GENERATED FROM PYTHON SOURCE LINES 358-370

Step 5 - Run the high-level wrapper
-----------------------------------
``calibrate_quantile_forecasts`` packages the full workflow:

- optional auto-skip if already calibrated,
- use user-supplied factors or fit from ``df_eval``,
- apply to both ``df_eval`` and ``df_future``,
- return a stats dictionary with before/after summaries.

This is the function that should be taught first in the uncertainty
gallery because it naturally explains the two helper functions as
part of one coherent pipeline. 

.. GENERATED FROM PYTHON SOURCE LINES 370-390

.. code-block:: Python


    df_eval_cal, df_future_cal, stats = calibrate_quantile_forecasts(
        df_eval=df_eval,
        df_future=df_future,
        target_name="subsidence",
        step_col="forecast_step",
        interval=(0.1, 0.9),
        target_coverage=0.8,
        median_q=0.5,
        use="auto",
        tol=0.02,
        keep_original=True,
        enforce_monotonic="cummax",
        verbose=0,
    )

    print("")
    print("Calibration stats")
    print(stats)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Calibration stats
    {'target': 0.8, 'interval': [0.1, 0.9], 'f_max': 5.0, 'tol': 0.02, 'overall_key': '__overall__', 'factors_source': 'fit', 'factors': {'1': 2.037485122680664, '2': 2.0255374908447266, '3': 1.9699583053588867}, 'eval_before': {'coverage': 0.4691358024691358, 'sharpness': 0.3622788281364486, 'per_horizon': {'1': {'coverage': 0.48148148148148145, 'sharpness': 0.3222788281364486}, '2': {'coverage': 0.48148148148148145, 'sharpness': 0.36227882813644857}, '3': {'coverage': 0.4444444444444444, 'sharpness': 0.4022788281364487}}}, 'eval_after': {'coverage': 0.8148148148148148, 'sharpness': 0.7276400615900265, 'per_horizon': {'1': {'coverage': 0.8148148148148148, 'sharpness': 0.6566383176829725}, '2': {'coverage': 0.8148148148148148, 'sharpness': 0.7338093485296703}, '3': {'coverage': 0.8148148148148148, 'sharpness': 0.792472518557437}}}}


.. GENERATED FROM PYTHON SOURCE LINES 391-401

Why this wrapper is useful
--------------------------
The wrapper adds several practical behaviors that matter in real
workflows:

- it can skip calibration if the evaluation intervals already look
  calibrated;
- it can use fixed user factors instead of refitting;
- it can save calibrated eval/future tables and JSON stats;
- it summarizes coverage and sharpness before and after calibration. 

.. GENERATED FROM PYTHON SOURCE LINES 403-415

Step 6 - Build compact explanatory plots
----------------------------------------
The calibration helpers themselves return DataFrames and dictionaries,
not figures. But lesson pages become much easier to understand when we
turn the returned information into small diagnostic plots.

We build two plots:

- coverage before vs after by horizon,
- sharpness before vs after by horizon.

This makes the calibration trade-off visible immediately.

.. GENERATED FROM PYTHON SOURCE LINES 415-475

.. code-block:: Python


    stats_before = stats["eval_before"]["per_horizon"]
    stats_after = stats["eval_after"]["per_horizon"]

    plot_rows = []
    for h in sorted(stats_before, key=lambda x: int(x)):
        plot_rows.append(
            {
                "forecast_step": int(h),
                "coverage_before": stats_before[h]["coverage"],
                "coverage_after": stats_after[h]["coverage"],
                "sharpness_before": stats_before[h]["sharpness"],
                "sharpness_after": stats_after[h]["sharpness"],
            }
        )

    df_plot = pd.DataFrame(plot_rows)

    fig, axes = plt.subplots(1, 2, figsize=(11, 4.2))

    axes[0].plot(
        df_plot["forecast_step"],
        df_plot["coverage_before"],
        marker="o",
        label="Before",
    )
    axes[0].plot(
        df_plot["forecast_step"],
        df_plot["coverage_after"],
        marker="s",
        label="After",
    )
    axes[0].axhline(0.80, linestyle="--", linewidth=1.5, label="Target 0.80")
    axes[0].set_xlabel("Forecast step")
    axes[0].set_ylabel("Empirical coverage")
    axes[0].set_title("Coverage moves toward the target")
    axes[0].grid(True, linestyle=":", alpha=0.6)
    axes[0].legend()

    axes[1].plot(
        df_plot["forecast_step"],
        df_plot["sharpness_before"],
        marker="o",
        label="Before",
    )
    axes[1].plot(
        df_plot["forecast_step"],
        df_plot["sharpness_after"],
        marker="s",
        label="After",
    )
    axes[1].set_xlabel("Forecast step")
    axes[1].set_ylabel("Mean interval width")
    axes[1].set_title("Sharper vs wider intervals")
    axes[1].grid(True, linestyle=":", alpha=0.6)
    axes[1].legend()

    plt.tight_layout()
    plt.show()


.. image-sg:: /auto_examples/uncertainty/images/sphx_glr_plot_interval_calibration_001.png
   :alt: Coverage moves toward the target, Sharper vs wider intervals
   :srcset: /auto_examples/uncertainty/images/sphx_glr_plot_interval_calibration_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 476-496

How to read these plots
-----------------------
Left panel
~~~~~~~~~~
The goal is not necessarily to hit the target exactly at every step,
but to move empirical coverage toward the desired level.

Right panel
~~~~~~~~~~~
Better coverage usually comes at the price of wider intervals.

That is why uncertainty pages should always talk about both:

- **coverage**:
  are we honest enough?
- **sharpness**:
  are we still informative?

Calibration is useful when it improves honesty without exploding the
interval width unnecessarily.

.. GENERATED FROM PYTHON SOURCE LINES 498-501

Step 7 - Inspect the calibrated tables directly
-----------------------------------------------
The resulting forecast tables now carry explicit calibration metadata.

.. GENERATED FROM PYTHON SOURCE LINES 501-510

.. code-block:: Python


    print("")
    print("Calibrated evaluation columns")
    print(df_eval_cal.columns.tolist())

    print("")
    print("Calibrated future head")
    print(df_future_cal.head(8).to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Calibrated evaluation columns
    ['sample_idx', 'forecast_step', 'coord_t', 'coord_x', 'coord_y', 'subsidence_q10', 'subsidence_q50', 'subsidence_q90', 'subsidence_actual', 'subsidence_q10_raw', 'subsidence_q50_raw', 'subsidence_q90_raw', 'calibration_factor', 'is_calibrated']

    Calibrated future head
     sample_idx  forecast_step  coord_t    coord_x  coord_y  subsidence_q10  subsidence_q50  subsidence_q90  subsidence_q10_raw  subsidence_q50_raw  subsidence_q90_raw  calibration_factor  is_calibrated
              0              1     2025     0.0000   0.0000          2.3436          2.6288          2.9140              2.4888              2.6288              2.7688              2.0375           True
              1              1     2025  1500.0000   0.0000          2.4150          2.7104          3.0059              2.5654              2.7104              2.8554              2.0375           True
              2              1     2025  3000.0000   0.0000          2.4864          2.7920          3.0977              2.6420              2.7920              2.9420              2.0375           True
              3              1     2025  4500.0000   0.0000          2.5580          2.8738          3.1896              2.7188              2.8738              3.0288              2.0375           True
              4              1     2025  6000.0000   0.0000          2.6332          2.9593          3.2854              2.7992              2.9593              3.1193              2.0375           True
              5              1     2025  7500.0000   0.0000          2.7288          3.0655          3.4023              2.9003              3.0655              3.2308              2.0375           True
              6              1     2025  9000.0000   0.0000          2.8143          3.1615          3.5087              2.9911              3.1615              3.3319              2.0375           True
              7              1     2025 10500.0000   0.0000          2.8568          3.2137          3.5705              3.0385              3.2137              3.3888              2.0375           True


.. GENERATED FROM PYTHON SOURCE LINES 511-520

Notice two practical columns:

- ``calibration_factor``:
  the factor actually used for that horizon;
- ``is_calibrated``:
  a flag that helps downstream logic detect that calibration was
  already applied.

This is also why the wrapper can auto-skip re-calibration later. 

.. GENERATED FROM PYTHON SOURCE LINES 522-539

Step 8 - Why this page should come first in uncertainty
-------------------------------------------------------
This page teaches the foundational uncertainty idea in the most
concrete possible way:

- forecasts can be overconfident,
- interval coverage can be measured directly,
- intervals can be recalibrated per horizon,
- future forecasts can inherit the same fitted factors.

That makes this the best first page in the uncertainty gallery.

Later pages can then build naturally on top of it:

- reliability diagrams,
- coverage-versus-sharpness trade-offs,
- probability calibration,
- exceedance-oriented uncertainty analysis.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.303 seconds)


.. _sphx_glr_download_auto_examples_uncertainty_plot_interval_calibration.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_interval_calibration.ipynb <plot_interval_calibration.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_interval_calibration.py <plot_interval_calibration.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_interval_calibration.zip <plot_interval_calibration.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_