.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/uncertainty/plot_coverage_vs_sharpness.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_uncertainty_plot_coverage_vs_sharpness.py: Coverage versus sharpness in probabilistic forecasts ==================================================== This lesson teaches one of the most important ideas in forecast uncertainty: **a good interval forecast must be both honest and useful.** In practice, that means balancing two competing goals: - **coverage**: does the interval contain the truth often enough? - **sharpness**: is the interval still narrow enough to be informative? Why this page matters --------------------- After interval calibration, the next question is no longer only "can we widen the intervals?" but: **how should we judge whether the resulting uncertainty is actually better?** That is the role of a coverage-versus-sharpness analysis. In GeoPrior, this lesson naturally combines three pieces: - :func:`geoprior.utils.forecast_utils.evaluate_forecast`, which computes coverage and sharpness from an evaluation forecast table in quantile mode; - :func:`geoprior.utils.calibrate.calibrate_quantile_forecasts`, which can recalibrate under-covered intervals; - :func:`geoprior.utils.forecast_utils.plot_reliability_diagram`, which visualizes empirical coverage against nominal probability for interval forecasts. The evaluator reports ``coverage80`` and ``sharpness80`` for a chosen interval such as q10-q90, and the reliability helper draws the perfect calibration line against empirical coverage from the forecast intervals. What this lesson teaches ------------------------ We will: 1. create one common synthetic truth field, 2. build several probabilistic forecast variants: overconfident, balanced, conservative, 3. recalibrate the overconfident forecast, 4. compare all models in coverage-sharpness space, 5. inspect how that trade-off changes by horizon, 6. finish with a reliability diagram. This page is intentionally synthetic so it remains fully executable during the documentation build. .. GENERATED FROM PYTHON SOURCE LINES 60-63 Imports ------- We use the real uncertainty and forecast-evaluation helpers. .. GENERATED FROM PYTHON SOURCE LINES 63-76 .. code-block:: Python from __future__ import annotations import numpy as np import pandas as pd import matplotlib.pyplot as plt from geoprior.utils.calibrate import calibrate_quantile_forecasts from geoprior.utils.forecast_utils import ( evaluate_forecast, plot_reliability_diagram, ) .. GENERATED FROM PYTHON SOURCE LINES 77-84 Step 1 - Build one shared synthetic truth field ----------------------------------------------- We mimic a compact spatial forecasting problem. The important design choice here is that *all* forecast variants will share the same true outcomes. This makes their uncertainty comparison fair. .. GENERATED FROM PYTHON SOURCE LINES 84-167 .. code-block:: Python rng = np.random.default_rng(21) nx = 9 ny = 6 steps = [1, 2, 3] years = {1: 2024, 2: 2025, 3: 2026} xv = np.linspace(0.0, 12_000.0, nx) yv = np.linspace(0.0, 8_000.0, ny) X, Y = np.meshgrid(xv, yv) x_flat = X.ravel() y_flat = Y.ravel() n_sites = x_flat.size xn = (x_flat - x_flat.min()) / (x_flat.max() - x_flat.min()) yn = (y_flat - y_flat.min()) / (y_flat.max() - y_flat.min()) hotspot = np.exp( -( ((xn - 0.72) ** 2) / 0.020 + ((yn - 0.34) ** 2) / 0.032 ) ) ridge = 0.52 * np.exp( -( ((xn - 0.28) ** 2) / 0.028 + ((yn - 0.72) ** 2) / 0.050 ) ) gradient = 0.44 * xn + 0.22 * (1.0 - yn) # Quantile levels available in the forecast tables. quantiles = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] # Standard-normal style z scores for symmetric quantiles. z_map = { 0.1: -1.2816, 0.2: -0.8416, 0.3: -0.5244, 0.4: -0.2533, 0.5: 0.0, 0.6: 0.2533, 0.7: 0.5244, 0.8: 0.8416, 0.9: 1.2816, } # Shared truth-generating process by horizon step. truth_mean_by_step: dict[int, np.ndarray] = {} truth_actual_by_step: dict[int, np.ndarray] = {} base_sigma_by_step: dict[int, np.ndarray] = {} for step in steps: step_scale = {1: 1.00, 2: 1.18, 3: 1.42}[step] mu = ( 2.2 + 1.45 * gradient + 2.05 * hotspot + 0.95 * ridge ) * step_scale # "Natural" uncertainty scale before model over/under-confidence. sigma = ( 0.22 + 0.07 * xn + 0.04 * hotspot + 0.03 * step ) # Shared actual values that all models will face. actual = mu + rng.normal(0.0, sigma, size=n_sites) truth_mean_by_step[step] = mu truth_actual_by_step[step] = actual base_sigma_by_step[step] = sigma print("Number of spatial samples:", n_sites) print("Forecast steps:", steps) print("Quantiles:", quantiles) .. rst-class:: sphx-glr-script-out .. code-block:: none Number of spatial samples: 54 Forecast steps: [1, 2, 3] Quantiles: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] .. GENERATED FROM PYTHON SOURCE LINES 168-181 Step 2 - Create forecast tables with different uncertainty behavior ------------------------------------------------------------------- We now build three forecast styles: - **overconfident**: intervals too narrow; - **balanced**: intervals reasonably matched to the data; - **conservative**: intervals too wide. All three share the same median forecast so the comparison focuses on uncertainty behavior rather than mean-forecast accuracy. .. GENERATED FROM PYTHON SOURCE LINES 181-224 .. code-block:: Python def make_quantile_df( *, width_scale: float, median_bias: float = 0.0, ) -> pd.DataFrame: rows: list[dict[str, float | int]] = [] for step in steps: mu = truth_mean_by_step[step] + median_bias actual = truth_actual_by_step[step] sigma = base_sigma_by_step[step] * width_scale for i in range(n_sites): row = { "sample_idx": i, "forecast_step": step, "coord_t": years[step], "coord_x": float(x_flat[i]), "coord_y": float(y_flat[i]), "subsidence_actual": float(actual[i]), } for q in quantiles: col = f"subsidence_q{int(q * 100):02d}" row[col] = float(mu[i] + z_map[q] * sigma[i]) rows.append(row) return pd.DataFrame(rows) df_over = make_quantile_df(width_scale=0.55) df_bal = make_quantile_df(width_scale=1.00) df_cons = make_quantile_df(width_scale=1.70) print("Overconfident rows:", len(df_over)) print("Balanced rows:", len(df_bal)) print("Conservative rows:", len(df_cons)) print("") print(df_over.head(6).to_string(index=False)) .. rst-class:: sphx-glr-script-out .. code-block:: none Overconfident rows: 162 Balanced rows: 162 Conservative rows: 162 sample_idx forecast_step coord_t coord_x coord_y subsidence_actual subsidence_q10 subsidence_q20 subsidence_q30 subsidence_q40 subsidence_q50 subsidence_q60 subsidence_q70 subsidence_q80 subsidence_q90 0 1 2024 0.0000 0.0000 2.6087 2.3428 2.4033 2.4469 2.4842 2.5190 2.5538 2.5911 2.6347 2.6952 1 1 2024 1500.0000 0.0000 2.9896 2.4164 2.4790 2.5241 2.5627 2.5988 2.6348 2.6734 2.7185 2.7811 2 1 2024 3000.0000 0.0000 2.2007 2.4900 2.5547 2.6014 2.6412 2.6785 2.7158 2.7557 2.8023 2.8671 3 1 2024 4500.0000 0.0000 3.2243 2.5637 2.6305 2.6787 2.7199 2.7584 2.7969 2.8381 2.8863 2.9531 4 1 2024 6000.0000 0.0000 2.8294 2.6420 2.7110 2.7607 2.8032 2.8429 2.8826 2.9251 2.9749 3.0439 5 1 2024 7500.0000 0.0000 2.7174 2.7454 2.8167 2.8681 2.9120 2.9530 2.9940 3.0379 3.0893 3.1605 .. GENERATED FROM PYTHON SOURCE LINES 225-233 Step 3 - Create a future forecast table for calibration transfer ---------------------------------------------------------------- The calibration wrapper expects both an evaluation table and, when available, a future table. The future table does not contain actuals. We only need one future table here because we are going to calibrate the overconfident forecast and compare the calibrated result to the other variants. .. GENERATED FROM PYTHON SOURCE LINES 233-265 .. code-block:: Python def make_future_df( *, width_scale: float, median_growth: float = 1.06, ) -> pd.DataFrame: rows: list[dict[str, float | int]] = [] for step in steps: mu = truth_mean_by_step[step] * median_growth sigma = base_sigma_by_step[step] * width_scale for i in range(n_sites): row = { "sample_idx": i, "forecast_step": step, "coord_t": years[step] + 1, "coord_x": float(x_flat[i]), "coord_y": float(y_flat[i]), } for q in quantiles: col = f"subsidence_q{int(q * 100):02d}" row[col] = float(mu[i] + z_map[q] * sigma[i]) rows.append(row) return pd.DataFrame(rows) df_future_over = make_future_df(width_scale=0.55) .. GENERATED FROM PYTHON SOURCE LINES 266-276 Step 4 - Calibrate the overconfident forecast --------------------------------------------- We now produce a fourth model: - **calibrated**: obtained by applying the real GeoPrior interval-calibration workflow to the overconfident forecast. The calibration wrapper fits horizon-wise widening factors on ``df_eval`` and applies them to both evaluation and future tables. .. GENERATED FROM PYTHON SOURCE LINES 276-294 .. code-block:: Python df_over_cal, df_future_over_cal, stats_cal = calibrate_quantile_forecasts( df_eval=df_over, df_future=df_future_over, target_name="subsidence", step_col="forecast_step", interval=(0.1, 0.9), target_coverage=0.8, median_q=0.5, use="auto", keep_original=True, enforce_monotonic="cummax", verbose=0, ) print("Calibration stats") print(stats_cal) .. rst-class:: sphx-glr-script-out .. code-block:: none Calibration stats {'target': 0.8, 'interval': [0.1, 0.9], 'f_max': 5.0, 'tol': 0.02, 'overall_key': '__overall__', 'factors_source': 'fit', 'factors': {'1': 1.9013090133666992, '2': 1.5369853973388672, '3': 1.7673044204711914}, 'eval_before': {'coverage': 0.49382716049382713, 'sharpness': 0.44739274608428753, 'per_horizon': {'1': {'coverage': 0.46296296296296297, 'sharpness': 0.40509994608428745}, '2': {'coverage': 0.5555555555555556, 'sharpness': 0.4473927460842876}, '3': {'coverage': 0.46296296296296297, 'sharpness': 0.48968554608428744}}}, 'eval_after': {'coverage': 0.8148148148148148, 'sharpness': 0.7744265755489718, 'per_horizon': {'1': {'coverage': 0.8148148148148148, 'sharpness': 0.7702201788044195}, '2': {'coverage': 0.8148148148148148, 'sharpness': 0.6876361176068855}, '3': {'coverage': 0.8148148148148148, 'sharpness': 0.8654234302356104}}}} .. GENERATED FROM PYTHON SOURCE LINES 295-303 Step 5 - Evaluate all models with the real evaluator ---------------------------------------------------- ``evaluate_forecast`` computes deterministic accuracy plus interval quality. In quantile mode it reports: - ``coverage80`` for the chosen interval, - ``sharpness80`` as mean interval width, - and optional per-horizon deterministic summaries. .. GENERATED FROM PYTHON SOURCE LINES 303-337 .. code-block:: Python model_tables = { "Overconfident": df_over, "Balanced": df_bal, "Conservative": df_cons, "Calibrated": df_over_cal, } overall_rows = [] for name, df_model in model_tables.items(): metrics = evaluate_forecast( df_model, target_name="subsidence", per_horizon=True, quantile_interval=(0.1, 0.9), verbose=0, ) overall = metrics["__overall__"] overall_rows.append( { "model": name, "overall_mae": overall["overall_mae"], "overall_rmse": overall["overall_rmse"], "overall_r2": overall["overall_r2"], "coverage80": overall["coverage80"], "sharpness80": overall["sharpness80"], } ) df_overall = pd.DataFrame(overall_rows) print("") print("Overall uncertainty summary") print(df_overall.to_string(index=False)) .. rst-class:: sphx-glr-script-out .. code-block:: none Overall uncertainty summary model overall_mae overall_rmse overall_r2 coverage80 sharpness80 Overconfident 0.2419 0.2952 0.8493 0.4938 0.4474 Balanced 0.2419 0.2952 0.8493 0.8210 0.8134 Conservative 0.2419 0.2952 0.8493 0.9877 1.3829 Calibrated 0.2419 0.2952 0.8493 0.8148 0.7744 .. GENERATED FROM PYTHON SOURCE LINES 338-346 Step 6 - Compute coverage and sharpness by horizon -------------------------------------------------- ``evaluate_forecast`` gives overall interval quality directly, but for a teaching page it is helpful to also build a small per-step table by hand. This lets us see whether uncertainty problems become worse at longer horizons. .. GENERATED FROM PYTHON SOURCE LINES 346-378 .. code-block:: Python def coverage_and_sharpness_by_step(df: pd.DataFrame) -> pd.DataFrame: rows = [] for step, g in df.groupby("forecast_step", sort=True): actual = g["subsidence_actual"].to_numpy(float) lo = g["subsidence_q10"].to_numpy(float) hi = g["subsidence_q90"].to_numpy(float) coverage = float(np.mean((actual >= lo) & (actual <= hi))) sharpness = float(np.mean(hi - lo)) rows.append( { "forecast_step": int(step), "coverage80": coverage, "sharpness80": sharpness, } ) return pd.DataFrame(rows) step_tables = { name: coverage_and_sharpness_by_step(df_model) for name, df_model in model_tables.items() } for name, df_step in step_tables.items(): print("") print(name) print(df_step.to_string(index=False)) .. rst-class:: sphx-glr-script-out .. code-block:: none Overconfident forecast_step coverage80 sharpness80 1 0.4630 0.4051 2 0.5556 0.4474 3 0.4630 0.4897 Balanced forecast_step coverage80 sharpness80 1 0.7778 0.7365 2 0.8519 0.8134 3 0.8333 0.8903 Conservative forecast_step coverage80 sharpness80 1 0.9815 1.2521 2 0.9815 1.3829 3 1.0000 1.5136 Calibrated forecast_step coverage80 sharpness80 1 0.8148 0.7702 2 0.8148 0.6876 3 0.8148 0.8654 .. GENERATED FROM PYTHON SOURCE LINES 379-389 Step 7 - Plot the coverage-versus-sharpness trade-off ----------------------------------------------------- This is the main figure of the lesson. Read this scatter plot as follows: - moving **up** improves empirical coverage; - moving **right** widens the interval; - the "best" point depends on the target use case, but typically we want enough coverage without becoming excessively wide. .. GENERATED FROM PYTHON SOURCE LINES 389-414 .. code-block:: Python fig, ax = plt.subplots(figsize=(7.2, 5.5)) for _, row in df_overall.iterrows(): ax.scatter( row["sharpness80"], row["coverage80"], s=90, label=row["model"], ) ax.annotate( row["model"], (row["sharpness80"], row["coverage80"]), xytext=(6, 6), textcoords="offset points", ) ax.axhline(0.80, linestyle="--", linewidth=1.5) ax.set_xlabel("Sharpness80 (mean interval width)") ax.set_ylabel("Coverage80 (empirical coverage)") ax.set_title("Coverage versus sharpness") ax.grid(True, linestyle=":", alpha=0.6) plt.tight_layout() plt.show() .. image-sg:: /auto_examples/uncertainty/images/sphx_glr_plot_coverage_vs_sharpness_001.png :alt: Coverage versus sharpness :srcset: /auto_examples/uncertainty/images/sphx_glr_plot_coverage_vs_sharpness_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 415-428 How to read this trade-off -------------------------- Typical interpretation: - **Overconfident**: sharp but under-covered; - **Conservative**: well-covered but very wide; - **Balanced**: closer to a useful compromise; - **Calibrated**: should move upward from the overconfident point, usually with some rightward movement because intervals widen. .. GENERATED FROM PYTHON SOURCE LINES 430-434 Step 8 - Plot horizon-wise coverage and sharpness ------------------------------------------------- A forecast can look acceptable overall and still behave badly at the longer horizons. So we now inspect the trade-off step by step. .. GENERATED FROM PYTHON SOURCE LINES 434-467 .. code-block:: Python fig, axes = plt.subplots(1, 2, figsize=(11.2, 4.2)) for name, df_step in step_tables.items(): axes[0].plot( df_step["forecast_step"], df_step["coverage80"], marker="o", label=name, ) axes[0].axhline(0.80, linestyle="--", linewidth=1.2) axes[0].set_xlabel("Forecast step") axes[0].set_ylabel("Coverage80") axes[0].set_title("Coverage by horizon") axes[0].grid(True, linestyle=":", alpha=0.6) axes[0].legend() for name, df_step in step_tables.items(): axes[1].plot( df_step["forecast_step"], df_step["sharpness80"], marker="o", label=name, ) axes[1].set_xlabel("Forecast step") axes[1].set_ylabel("Sharpness80") axes[1].set_title("Sharpness by horizon") axes[1].grid(True, linestyle=":", alpha=0.6) axes[1].legend() plt.tight_layout() plt.show() .. image-sg:: /auto_examples/uncertainty/images/sphx_glr_plot_coverage_vs_sharpness_002.png :alt: Coverage by horizon, Sharpness by horizon :srcset: /auto_examples/uncertainty/images/sphx_glr_plot_coverage_vs_sharpness_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 468-480 Why this horizon view matters ----------------------------- This panel often reveals the real uncertainty story. In many forecasting systems: - later steps need wider intervals, - undercoverage becomes more severe with horizon, - recalibration helps most at the longer steps. That is exactly why the first uncertainty lesson fit interval factors per horizon instead of using one global factor for the whole forecast. .. GENERATED FROM PYTHON SOURCE LINES 482-491 Step 9 - Add a reliability diagram ---------------------------------- The reliability diagram belongs naturally in this section because it compares nominal interval probability to empirical coverage. The helper can wrap simple DataFrames and draw the diagonal "perfect calibration" reference line. We compare the overconfident, balanced, conservative, and calibrated forecasts on the same observed outcomes. .. GENERATED FROM PYTHON SOURCE LINES 491-531 .. code-block:: Python y_true_series = pd.Series(df_over["subsidence_actual"].to_numpy()) rel_models = { "Overconfident": { "forecasts": df_over[ [f"subsidence_q{int(q * 100):02d}" for q in quantiles] ].reset_index(drop=True), "marker": "o", }, "Balanced": { "forecasts": df_bal[ [f"subsidence_q{int(q * 100):02d}" for q in quantiles] ].reset_index(drop=True), "marker": "s", }, "Conservative": { "forecasts": df_cons[ [f"subsidence_q{int(q * 100):02d}" for q in quantiles] ].reset_index(drop=True), "marker": "^", }, "Calibrated": { "forecasts": df_over_cal[ [f"subsidence_q{int(q * 100):02d}" for q in quantiles] ].reset_index(drop=True), "marker": "D", }, } plot_reliability_diagram( models_data=rel_models, y_true=y_true_series.reset_index(drop=True), prefix="subsidence", figsize=(7, 7), title="Reliability diagram across forecast variants", plot_style="default", verbose=0, ) .. image-sg:: /auto_examples/uncertainty/images/sphx_glr_plot_coverage_vs_sharpness_003.png :alt: Reliability diagram across forecast variants :srcset: /auto_examples/uncertainty/images/sphx_glr_plot_coverage_vs_sharpness_003.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 532-545 How to read the reliability diagram ----------------------------------- The diagonal represents perfect calibration. A model below the diagonal is typically overconfident for those nominal probabilities, while a model above it is conservative. This is why the reliability diagram and the coverage-sharpness scatter complement each other: - the scatter summarizes one chosen interval, - the reliability diagram shows calibration over a range of nominal interval probabilities. .. GENERATED FROM PYTHON SOURCE LINES 547-564 Step 10 - Final practical takeaway ---------------------------------- A good uncertainty analysis should never report coverage alone. Why? Because coverage can always be improved by making intervals wider. That is not enough. The right question is: - how much coverage did we gain, - how much sharpness did we lose, - and does the new forecast sit closer to a useful operating point? That is why this page belongs early in the uncertainty section: it teaches the core trade-off that the later calibration and exceedance pages build on. .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.645 seconds) .. _sphx_glr_download_auto_examples_uncertainty_plot_coverage_vs_sharpness.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_coverage_vs_sharpness.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_coverage_vs_sharpness.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_coverage_vs_sharpness.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_