.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/uncertainty/plot_interval_calibration.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_uncertainty_plot_interval_calibration.py: Interval calibration with ``calibrate_quantile_forecasts`` ========================================================== This lesson introduces one of the most important uncertainty utilities in GeoPrior: :func:`geoprior.utils.calibrate.calibrate_quantile_forecasts`. Why this page matters --------------------- A forecast can be visually impressive and still be poorly calibrated. For quantile forecasts, a central question is: **Do the predictive intervals contain the truth as often as they claim?** For example, if we use the q10-q90 interval as an 80% interval, then we expect the observed value to fall inside that band about 80% of the time. If it falls inside much less often, the intervals are too narrow and the forecast is overconfident. That is the problem this utility addresses. The calibration workflow in GeoPrior is built around three closely connected helpers: - ``fit_interval_factors_df``: fit a widening factor per horizon step so the interval coverage approaches a target; - ``apply_interval_factors_df``: widen or shrink the quantile columns around the median; - ``calibrate_quantile_forecasts``: the high-level wrapper that fits or accepts factors, applies them to evaluation and future forecasts, and returns summary statistics. The lower-level factor fitter groups by forecast step, estimates a factor per horizon, and targets a requested coverage level. The application helper then rescales lower and upper quantiles around the median and can enforce monotonic quantile ordering. The wrapper adds automatic skip logic and before/after summaries. What this lesson teaches ------------------------ We will: 1. build a synthetic evaluation and future forecast table, 2. deliberately make the raw intervals too narrow, 3. measure coverage and sharpness before calibration, 4. fit horizon-wise interval factors, 5. apply them manually, 6. repeat the same workflow with the high-level wrapper, 7. build explanatory plots showing why the calibration is useful. This lesson is intentionally synthetic so it remains fully executable during the documentation build. .. GENERATED FROM PYTHON SOURCE LINES 59-62 Imports ------- We use the real uncertainty helpers from GeoPrior. .. GENERATED FROM PYTHON SOURCE LINES 62-75 .. code-block:: Python from __future__ import annotations import numpy as np import pandas as pd import matplotlib.pyplot as plt from geoprior.utils.calibrate import ( apply_interval_factors_df, calibrate_quantile_forecasts, fit_interval_factors_df, ) .. GENERATED FROM PYTHON SOURCE LINES 76-89 Step 1 - Build synthetic forecast tables ---------------------------------------- We create: - ``df_eval``: evaluation rows with actual values available; - ``df_future``: future rows with predictions only. Each row represents one spatial sample at one forecast step. We deliberately make the q10-q90 intervals too narrow so that the initial empirical coverage is below the nominal 80% target. .. GENERATED FROM PYTHON SOURCE LINES 89-191 .. code-block:: Python rng = np.random.default_rng(7) nx = 9 ny = 6 steps = [1, 2, 3] future_years = {1: 2024, 2: 2025, 3: 2026} xv = np.linspace(0.0, 12_000.0, nx) yv = np.linspace(0.0, 8_000.0, ny) X, Y = np.meshgrid(xv, yv) x_flat = X.ravel() y_flat = Y.ravel() n_sites = x_flat.size xn = (x_flat - x_flat.min()) / (x_flat.max() - x_flat.min()) yn = (y_flat - y_flat.min()) / (y_flat.max() - y_flat.min()) hotspot = np.exp( -( ((xn - 0.72) ** 2) / 0.020 + ((yn - 0.34) ** 2) / 0.030 ) ) ridge = 0.50 * np.exp( -( ((xn - 0.26) ** 2) / 0.030 + ((yn - 0.74) ** 2) / 0.050 ) ) gradient = 0.44 * xn + 0.20 * (1.0 - yn) eval_rows: list[dict[str, float | int]] = [] future_rows: list[dict[str, float | int]] = [] for step in steps: scale = {1: 1.00, 2: 1.18, 3: 1.40}[step] q50 = ( 2.2 + 1.4 * gradient + 2.0 * hotspot + 0.9 * ridge ) * scale # Deliberately too narrow intervals width_raw = ( 0.12 + 0.04 * xn + 0.02 * hotspot + 0.02 * step ) q10 = q50 - width_raw q90 = q50 + width_raw # Truth is noisier than the interval expects -> undercoverage actual = q50 + rng.normal(0.0, 0.22 + 0.05 * step, size=n_sites) for i in range(n_sites): eval_rows.append( { "sample_idx": i, "forecast_step": step, "coord_t": future_years[step], "coord_x": float(x_flat[i]), "coord_y": float(y_flat[i]), "subsidence_q10": float(q10[i]), "subsidence_q50": float(q50[i]), "subsidence_q90": float(q90[i]), "subsidence_actual": float(actual[i]), } ) # Future forecast uses the same raw uncertainty pattern but no actuals q50_future = q50 * 1.06 q10_future = q50_future - width_raw q90_future = q50_future + width_raw for i in range(n_sites): future_rows.append( { "sample_idx": i, "forecast_step": step, "coord_t": future_years[step] + 1, "coord_x": float(x_flat[i]), "coord_y": float(y_flat[i]), "subsidence_q10": float(q10_future[i]), "subsidence_q50": float(q50_future[i]), "subsidence_q90": float(q90_future[i]), } ) df_eval = pd.DataFrame(eval_rows) df_future = pd.DataFrame(future_rows) print("df_eval shape:", df_eval.shape) print("df_future shape:", df_future.shape) print("") print(df_eval.head(8).to_string(index=False)) .. rst-class:: sphx-glr-script-out .. code-block:: none df_eval shape: (162, 9) df_future shape: (162, 8) sample_idx forecast_step coord_t coord_x coord_y subsidence_q10 subsidence_q50 subsidence_q90 subsidence_actual 0 1 2024 0.0000 0.0000 2.3400 2.4800 2.6200 2.4803 1 1 2024 1500.0000 0.0000 2.4120 2.5570 2.7020 2.6377 2 1 2024 3000.0000 0.0000 2.4840 2.6340 2.7840 2.5600 3 1 2024 4500.0000 0.0000 2.5561 2.7111 2.8661 2.4707 4 1 2024 6000.0000 0.0000 2.6317 2.7918 2.9518 2.6690 5 1 2024 7500.0000 0.0000 2.7267 2.8920 3.0573 2.6243 6 1 2024 9000.0000 0.0000 2.8121 2.9826 3.1530 2.9988 7 1 2024 10500.0000 0.0000 2.8566 3.0318 3.2069 3.3936 .. GENERATED FROM PYTHON SOURCE LINES 192-211 Step 2 - Compute empirical coverage and sharpness before calibration -------------------------------------------------------------------- Calibration is about matching observed coverage to nominal coverage. For the q10-q90 interval we treat the nominal target as 80%. We compute: - empirical coverage: fraction of actual values inside [q10, q90], - sharpness: average interval width q90 - q10. These two numbers should always be read together. - Wider intervals often improve coverage. - Narrower intervals often reduce coverage. Good uncertainty quantification balances both. .. GENERATED FROM PYTHON SOURCE LINES 211-245 .. code-block:: Python def coverage_and_width( df: pd.DataFrame, *, lo_col: str = "subsidence_q10", hi_col: str = "subsidence_q90", actual_col: str = "subsidence_actual", ) -> tuple[float, float]: yt = df[actual_col].to_numpy(float) lo = df[lo_col].to_numpy(float) hi = df[hi_col].to_numpy(float) covered = (yt >= lo) & (yt <= hi) coverage = float(np.mean(covered)) width = float(np.mean(hi - lo)) return coverage, width before_rows = [] for step, g in df_eval.groupby("forecast_step", sort=True): cov, wid = coverage_and_width(g) before_rows.append( { "forecast_step": int(step), "coverage_before": cov, "sharpness_before": wid, } ) df_before = pd.DataFrame(before_rows) print("") print("Before calibration") print(df_before.to_string(index=False)) .. rst-class:: sphx-glr-script-out .. code-block:: none Before calibration forecast_step coverage_before sharpness_before 1 0.4815 0.3223 2 0.4815 0.3623 3 0.4444 0.4023 .. GENERATED FROM PYTHON SOURCE LINES 246-253 Interpretation -------------- In this synthetic example, the empirical coverage should be below the intended 0.80 target for at least some horizon steps. That is the uncertainty problem we want to fix: the forecast intervals are too sharp relative to the observed residual spread. .. GENERATED FROM PYTHON SOURCE LINES 255-266 Step 3 - Fit interval factors with the low-level helper ------------------------------------------------------- ``fit_interval_factors_df`` estimates one factor per horizon step. Conceptually, each factor rescales the interval around the median: - lower quantiles move farther below q50, - upper quantiles move farther above q50. The function groups by ``forecast_step`` and uses a bisection search so the empirical coverage approaches the requested target. .. GENERATED FROM PYTHON SOURCE LINES 266-281 .. code-block:: Python factors = fit_interval_factors_df( df_eval, target_name="subsidence", step_col="forecast_step", interval=(0.1, 0.9), target_coverage=0.8, median_q=0.5, verbose=0, ) print("") print("Fitted factors by horizon") print(factors) .. rst-class:: sphx-glr-script-out .. code-block:: none Fitted factors by horizon {1: 2.037485122680664, 2: 2.0255374908447266, 3: 1.9699583053588867} .. GENERATED FROM PYTHON SOURCE LINES 282-294 How to read the factors ----------------------- The factor is interpreted around the median: - factor > 1: widen the interval, - factor = 1: leave the interval unchanged, - factor < 1: shrink the interval. In undercovered forecasts, we usually expect factors larger than 1. .. GENERATED FROM PYTHON SOURCE LINES 296-304 Step 4 - Apply interval factors manually ---------------------------------------- ``apply_interval_factors_df`` is the second low-level helper. It rescales all quantiles around the median and can enforce monotonic ordering of the quantiles using strategies such as ``cummax`` or ``sort``. It also stores the applied factor and marks the forecast as calibrated. .. GENERATED FROM PYTHON SOURCE LINES 304-347 .. code-block:: Python df_eval_manual = apply_interval_factors_df( df_eval, factors, target_name="subsidence", step_col="forecast_step", keep_original=True, factor_col="calibration_factor", calibrated_col="is_calibrated", enforce_monotonic="cummax", verbose=0, ) df_future_manual = apply_interval_factors_df( df_future, factors, target_name="subsidence", step_col="forecast_step", keep_original=True, factor_col="calibration_factor", calibrated_col="is_calibrated", enforce_monotonic="cummax", verbose=0, ) after_rows = [] for step, g in df_eval_manual.groupby("forecast_step", sort=True): cov, wid = coverage_and_width(g) after_rows.append( { "forecast_step": int(step), "coverage_after": cov, "sharpness_after": wid, } ) df_after = pd.DataFrame(after_rows) df_compare = df_before.merge(df_after, on="forecast_step") print("") print("Before/after manual calibration") print(df_compare.to_string(index=False)) .. rst-class:: sphx-glr-script-out .. code-block:: none Before/after manual calibration forecast_step coverage_before sharpness_before coverage_after sharpness_after 1 0.4815 0.3223 0.8148 0.6566 2 0.4815 0.3623 0.8148 0.7338 3 0.4444 0.4023 0.8148 0.7925 .. GENERATED FROM PYTHON SOURCE LINES 348-356 What changed ------------ After calibration: - coverage should move closer to the target 0.80, - sharpness will usually increase because the interval has widened. This is the central trade-off in interval calibration. .. GENERATED FROM PYTHON SOURCE LINES 358-370 Step 5 - Run the high-level wrapper ----------------------------------- ``calibrate_quantile_forecasts`` packages the full workflow: - optional auto-skip if already calibrated, - use user-supplied factors or fit from ``df_eval``, - apply to both ``df_eval`` and ``df_future``, - return a stats dictionary with before/after summaries. This is the function that should be taught first in the uncertainty gallery because it naturally explains the two helper functions as part of one coherent pipeline. .. GENERATED FROM PYTHON SOURCE LINES 370-390 .. code-block:: Python df_eval_cal, df_future_cal, stats = calibrate_quantile_forecasts( df_eval=df_eval, df_future=df_future, target_name="subsidence", step_col="forecast_step", interval=(0.1, 0.9), target_coverage=0.8, median_q=0.5, use="auto", tol=0.02, keep_original=True, enforce_monotonic="cummax", verbose=0, ) print("") print("Calibration stats") print(stats) .. rst-class:: sphx-glr-script-out .. code-block:: none Calibration stats {'target': 0.8, 'interval': [0.1, 0.9], 'f_max': 5.0, 'tol': 0.02, 'overall_key': '__overall__', 'factors_source': 'fit', 'factors': {'1': 2.037485122680664, '2': 2.0255374908447266, '3': 1.9699583053588867}, 'eval_before': {'coverage': 0.4691358024691358, 'sharpness': 0.3622788281364486, 'per_horizon': {'1': {'coverage': 0.48148148148148145, 'sharpness': 0.3222788281364486}, '2': {'coverage': 0.48148148148148145, 'sharpness': 0.36227882813644857}, '3': {'coverage': 0.4444444444444444, 'sharpness': 0.4022788281364487}}}, 'eval_after': {'coverage': 0.8148148148148148, 'sharpness': 0.7276400615900265, 'per_horizon': {'1': {'coverage': 0.8148148148148148, 'sharpness': 0.6566383176829725}, '2': {'coverage': 0.8148148148148148, 'sharpness': 0.7338093485296703}, '3': {'coverage': 0.8148148148148148, 'sharpness': 0.792472518557437}}}} .. GENERATED FROM PYTHON SOURCE LINES 391-401 Why this wrapper is useful -------------------------- The wrapper adds several practical behaviors that matter in real workflows: - it can skip calibration if the evaluation intervals already look calibrated; - it can use fixed user factors instead of refitting; - it can save calibrated eval/future tables and JSON stats; - it summarizes coverage and sharpness before and after calibration. .. GENERATED FROM PYTHON SOURCE LINES 403-415 Step 6 - Build compact explanatory plots ---------------------------------------- The calibration helpers themselves return DataFrames and dictionaries, not figures. But lesson pages become much easier to understand when we turn the returned information into small diagnostic plots. We build two plots: - coverage before vs after by horizon, - sharpness before vs after by horizon. This makes the calibration trade-off visible immediately. .. GENERATED FROM PYTHON SOURCE LINES 415-475 .. code-block:: Python stats_before = stats["eval_before"]["per_horizon"] stats_after = stats["eval_after"]["per_horizon"] plot_rows = [] for h in sorted(stats_before, key=lambda x: int(x)): plot_rows.append( { "forecast_step": int(h), "coverage_before": stats_before[h]["coverage"], "coverage_after": stats_after[h]["coverage"], "sharpness_before": stats_before[h]["sharpness"], "sharpness_after": stats_after[h]["sharpness"], } ) df_plot = pd.DataFrame(plot_rows) fig, axes = plt.subplots(1, 2, figsize=(11, 4.2)) axes[0].plot( df_plot["forecast_step"], df_plot["coverage_before"], marker="o", label="Before", ) axes[0].plot( df_plot["forecast_step"], df_plot["coverage_after"], marker="s", label="After", ) axes[0].axhline(0.80, linestyle="--", linewidth=1.5, label="Target 0.80") axes[0].set_xlabel("Forecast step") axes[0].set_ylabel("Empirical coverage") axes[0].set_title("Coverage moves toward the target") axes[0].grid(True, linestyle=":", alpha=0.6) axes[0].legend() axes[1].plot( df_plot["forecast_step"], df_plot["sharpness_before"], marker="o", label="Before", ) axes[1].plot( df_plot["forecast_step"], df_plot["sharpness_after"], marker="s", label="After", ) axes[1].set_xlabel("Forecast step") axes[1].set_ylabel("Mean interval width") axes[1].set_title("Sharper vs wider intervals") axes[1].grid(True, linestyle=":", alpha=0.6) axes[1].legend() plt.tight_layout() plt.show() .. image-sg:: /auto_examples/uncertainty/images/sphx_glr_plot_interval_calibration_001.png :alt: Coverage moves toward the target, Sharper vs wider intervals :srcset: /auto_examples/uncertainty/images/sphx_glr_plot_interval_calibration_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 476-496 How to read these plots ----------------------- Left panel ~~~~~~~~~~ The goal is not necessarily to hit the target exactly at every step, but to move empirical coverage toward the desired level. Right panel ~~~~~~~~~~~ Better coverage usually comes at the price of wider intervals. That is why uncertainty pages should always talk about both: - **coverage**: are we honest enough? - **sharpness**: are we still informative? Calibration is useful when it improves honesty without exploding the interval width unnecessarily. .. GENERATED FROM PYTHON SOURCE LINES 498-501 Step 7 - Inspect the calibrated tables directly ----------------------------------------------- The resulting forecast tables now carry explicit calibration metadata. .. GENERATED FROM PYTHON SOURCE LINES 501-510 .. code-block:: Python print("") print("Calibrated evaluation columns") print(df_eval_cal.columns.tolist()) print("") print("Calibrated future head") print(df_future_cal.head(8).to_string(index=False)) .. rst-class:: sphx-glr-script-out .. code-block:: none Calibrated evaluation columns ['sample_idx', 'forecast_step', 'coord_t', 'coord_x', 'coord_y', 'subsidence_q10', 'subsidence_q50', 'subsidence_q90', 'subsidence_actual', 'subsidence_q10_raw', 'subsidence_q50_raw', 'subsidence_q90_raw', 'calibration_factor', 'is_calibrated'] Calibrated future head sample_idx forecast_step coord_t coord_x coord_y subsidence_q10 subsidence_q50 subsidence_q90 subsidence_q10_raw subsidence_q50_raw subsidence_q90_raw calibration_factor is_calibrated 0 1 2025 0.0000 0.0000 2.3436 2.6288 2.9140 2.4888 2.6288 2.7688 2.0375 True 1 1 2025 1500.0000 0.0000 2.4150 2.7104 3.0059 2.5654 2.7104 2.8554 2.0375 True 2 1 2025 3000.0000 0.0000 2.4864 2.7920 3.0977 2.6420 2.7920 2.9420 2.0375 True 3 1 2025 4500.0000 0.0000 2.5580 2.8738 3.1896 2.7188 2.8738 3.0288 2.0375 True 4 1 2025 6000.0000 0.0000 2.6332 2.9593 3.2854 2.7992 2.9593 3.1193 2.0375 True 5 1 2025 7500.0000 0.0000 2.7288 3.0655 3.4023 2.9003 3.0655 3.2308 2.0375 True 6 1 2025 9000.0000 0.0000 2.8143 3.1615 3.5087 2.9911 3.1615 3.3319 2.0375 True 7 1 2025 10500.0000 0.0000 2.8568 3.2137 3.5705 3.0385 3.2137 3.3888 2.0375 True .. GENERATED FROM PYTHON SOURCE LINES 511-520 Notice two practical columns: - ``calibration_factor``: the factor actually used for that horizon; - ``is_calibrated``: a flag that helps downstream logic detect that calibration was already applied. This is also why the wrapper can auto-skip re-calibration later. .. GENERATED FROM PYTHON SOURCE LINES 522-539 Step 8 - Why this page should come first in uncertainty ------------------------------------------------------- This page teaches the foundational uncertainty idea in the most concrete possible way: - forecasts can be overconfident, - interval coverage can be measured directly, - intervals can be recalibrated per horizon, - future forecasts can inherit the same fitted factors. That makes this the best first page in the uncertainty gallery. Later pages can then build naturally on top of it: - reliability diagrams, - coverage-versus-sharpness trade-offs, - probability calibration, - exceedance-oriented uncertainty analysis. .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.303 seconds) .. _sphx_glr_download_auto_examples_uncertainty_plot_interval_calibration.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_interval_calibration.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_interval_calibration.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_interval_calibration.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_