.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/evaluation/plot_mean_interval_width_overview.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_evaluation_plot_mean_interval_width_overview.py: Learn how to read forecast sharpness with ``plot_mean_interval_width`` ====================================================================== This lesson explains how to use ``geoprior.plot.evaluation.plot_mean_interval_width`` when you want to answer a practical uncertainty question: **How wide are my prediction intervals, and are they becoming so wide that they stop being useful?** Why this function matters ------------------------- Coverage tells you whether the truth falls inside the interval. Width tells you how much uncertainty the interval is claiming. That makes mean interval width one of the simplest sharpness checks in forecast evaluation. It is tempting to think that narrower is always better, but that is not correct. - extremely narrow intervals may look impressive but miss the truth, - extremely wide intervals may cover almost everything while being too vague for decisions, - and two models can have similar coverage while offering very different practical usefulness because their widths differ a lot. That is exactly why this helper is valuable. It gives two complementary views: - a **histogram** of individual interval widths, - and a **summary bar** for the mean width. This page is written as a teaching guide, not only as an API demo. We will build realistic interval arrays, inspect the full width distribution, compare narrower and wider regimes, look at multi-output behavior, and finish with a checklist for applying the function to your own forecast tables. .. GENERATED FROM PYTHON SOURCE LINES 46-63 .. code-block:: Python from __future__ import annotations import matplotlib.pyplot as plt import numpy as np import pandas as pd from geoprior.plot.evaluation import plot_mean_interval_width pd.set_option("display.max_columns", 20) pd.set_option("display.width", 110) pd.set_option( "display.float_format", lambda v: f"{v:0.4f}", ) .. GENERATED FROM PYTHON SOURCE LINES 64-103 What this function really expects --------------------------------- ``plot_mean_interval_width`` works directly with two aligned arrays: - ``y_lower`` - ``y_upper`` The arrays must have the same shape, and the implementation accepts only: - ``(N,)`` for one output, - ``(N, O)`` for multiple outputs. This helper does not need ``y_true`` because it is not checking coverage or error. It is checking **sharpness** only. It offers two viewing modes: - ``kind='widths_histogram'`` - ``kind='summary_bar'`` The histogram answers: *How are the individual interval widths distributed?* The summary bar answers: *What is the mean interval width overall?* For multi-output data, one important rule matters: - ``output_idx`` is required for the histogram, - while the summary bar can show one overall mean or one bar per output when ``metric_kws={'multioutput': 'raw_values'}`` is used. This is helpful because sharpness questions are often different at different outputs. A subsidence interval and a groundwater interval rarely live on the same scale. .. GENERATED FROM PYTHON SOURCE LINES 106-120 Build a realistic one-output interval example --------------------------------------------- We begin with a single output and 80 forecast cases. The intervals will be narrower in the early part of the sample axis and wider later on. This creates a realistic width distribution instead of one flat repeated value. The helper computes width simply as: ``y_upper - y_lower`` so the most important thing for a lesson example is to make the width pattern visible and interpretable. .. GENERATED FROM PYTHON SOURCE LINES 120-148 .. code-block:: Python rng = np.random.default_rng(2026) n_samples = 80 center = 20.0 + 1.6 * np.sin(np.linspace(0, 4 * np.pi, n_samples)) base_width = np.linspace(0.9, 3.1, n_samples) noise = rng.normal(0.0, 0.12, n_samples) width = np.clip(base_width + noise, 0.4, None) y_lower = center - 0.5 * width y_upper = center + 0.5 * width preview = pd.DataFrame( { "center": center, "y_lower": y_lower, "y_upper": y_upper, "interval_width": y_upper - y_lower, } ) print("One-output preview") print(preview.head(10)) print("\nWidth summary") print(preview["interval_width"].describe()) .. rst-class:: sphx-glr-script-out .. code-block:: none One-output preview center y_lower y_upper interval_width 0 20.0000 19.5976 20.4024 0.8048 1 20.2534 19.7751 20.7318 0.9567 2 20.5005 20.1364 20.8645 0.7281 3 20.7349 20.1594 21.3104 1.1510 4 20.9507 20.4067 21.4947 1.0880 5 21.1426 20.6405 21.6447 1.0042 6 21.3056 20.7907 21.8204 1.0297 7 21.4356 20.8699 22.0013 1.1314 8 21.5294 20.9840 22.0747 1.0907 9 21.5845 21.0228 22.1463 1.1235 Width summary count 80.0000 mean 2.0144 std 0.6611 min 0.7281 25% 1.4293 50% 1.9948 75% 2.5131 max 3.2266 Name: interval_width, dtype: float64 .. GENERATED FROM PYTHON SOURCE LINES 149-164 Start with the width histogram ------------------------------ The histogram is the best first plot when you want to understand the *distribution* of sharpness rather than only its average. This matters because two forecasting systems can have the same mean width while behaving differently: - one may keep widths tightly concentrated, - another may mix very narrow and very wide intervals, - and a third may have a long right tail only at difficult cases. In this example the right tail is expected because later cases were deliberately given wider intervals. .. GENERATED FROM PYTHON SOURCE LINES 164-178 .. code-block:: Python plot_mean_interval_width( y_lower=y_lower, y_upper=y_upper, kind="widths_histogram", figsize=(9.8, 5.4), title="Distribution of interval widths", hist_bins=14, hist_color="#7C3AED", hist_edgecolor="#312E81", show_score=True, ) .. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_mean_interval_width_overview_001.png :alt: Distribution of interval widths (Mean Width: 2.0144) :srcset: /auto_examples/evaluation/images/sphx_glr_plot_mean_interval_width_overview_001.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 179-194 How to read the histogram correctly ----------------------------------- A good reading order is: 1. check where the bulk of the widths sits, 2. inspect whether there is a long tail of much wider intervals, 3. then compare that picture with the mean width. In this demo, the histogram is not symmetric. That is a useful warning sign for real projects because it means the model is not expressing uncertainty uniformly across cases. That is not automatically bad, but it tells you uncertainty is being concentrated in some parts of the forecast set. .. GENERATED FROM PYTHON SOURCE LINES 197-208 The summary bar gives the compact sharpness number -------------------------------------------------- Once you understand the distribution, the summary bar gives the compact value you would compare in a table or across competing runs. This is the simplest sharpness statistic for quick reporting. A smaller mean interval width indicates sharper forecasts, but that only becomes a positive result when coverage or calibration remains acceptable as well. .. GENERATED FROM PYTHON SOURCE LINES 208-220 .. code-block:: Python plot_mean_interval_width( y_lower=y_lower, y_upper=y_upper, kind="summary_bar", figsize=(6.8, 5.0), title="Mean interval width", bar_color="#8B5CF6", score_annotation_format="{:.3f}", ) .. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_mean_interval_width_overview_002.png :alt: Mean interval width :srcset: /auto_examples/evaluation/images/sphx_glr_plot_mean_interval_width_overview_002.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 221-236 Why width must never be read alone ---------------------------------- A low width score may come from genuinely sharp forecasts, but it may also come from intervals that are too narrow and therefore unreliable. A high width score may indicate proper caution, but it may also signal overly diffuse uncertainty that is not operationally useful. So the best habit is: - use ``plot_mean_interval_width`` to study sharpness, - then read it together with coverage, WIS, or calibration plots. In practice, width is one side of the reliability–sharpness trade-off. .. GENERATED FROM PYTHON SOURCE LINES 239-257 Compare a narrower regime and a wider regime -------------------------------------------- A helpful teaching trick is to split the same forecast set into two regimes. Here we compare the first half and the second half of the sample axis. This answers a very practical question: *Is the forecast becoming less sharp in one part of the evaluation set?* In many real workflows that split could represent: - early vs late horizons, - dry vs wet seasons, - urban core vs peripheral cells, - or one city vs another. .. GENERATED FROM PYTHON SOURCE LINES 257-288 .. code-block:: Python fig, axes = plt.subplots( 1, 2, figsize=(12.2, 4.9), constrained_layout=True, ) plot_mean_interval_width( y_lower=y_lower[:40], y_upper=y_upper[:40], kind="summary_bar", ax=axes[0], title="Earlier regime: narrower intervals", bar_color="#0EA5E9", score_annotation_format="{:.3f}", ) plot_mean_interval_width( y_lower=y_lower[40:], y_upper=y_upper[40:], kind="summary_bar", ax=axes[1], title="Later regime: wider intervals", bar_color="#F97316", score_annotation_format="{:.3f}", ) plt.show() .. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_mean_interval_width_overview_003.png :alt: Earlier regime: narrower intervals, Later regime: wider intervals :srcset: /auto_examples/evaluation/images/sphx_glr_plot_mean_interval_width_overview_003.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 289-302 What this regime comparison teaches ----------------------------------- The second half should show a visibly larger mean width. That is exactly what we encoded in the synthetic data. The lesson is important for real evaluations: a single average width can hide regime drift. If your intervals widen sharply only at late horizons, difficult geologies, or a single study area, you need to know that before you report one global sharpness number. .. GENERATED FROM PYTHON SOURCE LINES 305-319 Multi-output interval widths are often the most realistic case -------------------------------------------------------------- Many GeoPrior workflows evaluate more than one target or more than one forecast family. The helper accepts 2D arrays of shape ``(N, O)`` for that situation. Here we create two outputs: - Output 0: relatively sharp intervals, - Output 1: visibly wider intervals. This is useful because the summary bar can now show one bar per output when we request raw values from the metric. .. GENERATED FROM PYTHON SOURCE LINES 319-349 .. code-block:: Python center_2 = 7.0 + 0.9 * np.cos(np.linspace(0, 3 * np.pi, n_samples)) width_1 = np.clip(0.8 + rng.normal(0.0, 0.08, n_samples), 0.35, None) width_2 = np.clip(2.0 + 0.45 * np.sin(np.linspace(0, 2 * np.pi, n_samples)) + rng.normal(0.0, 0.10, n_samples), 0.8, None) y_lower_2d = np.column_stack( [ center - 0.5 * width_1, center_2 - 0.5 * width_2, ] ) y_upper_2d = np.column_stack( [ center + 0.5 * width_1, center_2 + 0.5 * width_2, ] ) multi_preview = pd.DataFrame( { "output0_width": y_upper_2d[:, 0] - y_lower_2d[:, 0], "output1_width": y_upper_2d[:, 1] - y_lower_2d[:, 1], } ) print("\nTwo-output width preview") print(multi_preview.head(10)) .. rst-class:: sphx-glr-script-out .. code-block:: none Two-output width preview output0_width output1_width 0 0.9222 2.0621 1 0.7425 2.3276 2 0.8046 2.1653 3 0.8372 2.0122 4 0.8299 2.4071 5 0.7013 2.0822 6 0.7469 2.2856 7 0.7843 2.3121 8 0.7317 2.1794 9 0.8542 2.4348 .. GENERATED FROM PYTHON SOURCE LINES 350-359 Use one bar per output in the summary view ------------------------------------------ This is one of the most useful multi-output patterns. It produces one mean-width bar per output instead of collapsing them into one average. That is usually the better teaching choice because interval width is scale-sensitive. Different outputs can behave very differently. .. GENERATED FROM PYTHON SOURCE LINES 359-372 .. code-block:: Python plot_mean_interval_width( y_lower=y_lower_2d, y_upper=y_upper_2d, kind="summary_bar", metric_kws={"multioutput": "raw_values"}, figsize=(7.6, 5.0), title="Mean interval width per output", bar_color=["#14B8A6", "#EC4899"], score_annotation_format="{:.3f}", ) .. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_mean_interval_width_overview_004.png :alt: Mean interval width per output (Per Output) :srcset: /auto_examples/evaluation/images/sphx_glr_plot_mean_interval_width_overview_004.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 373-381 For the histogram, choose the output explicitly ----------------------------------------------- Histogram mode works on one selected output at a time when the data is multi-output. That is why ``output_idx`` is required. This makes the reading cleaner: the histogram answers the distribution question for one output without mixing scales. .. GENERATED FROM PYTHON SOURCE LINES 381-395 .. code-block:: Python plot_mean_interval_width( y_lower=y_lower_2d, y_upper=y_upper_2d, kind="widths_histogram", output_idx=1, hist_bins=12, figsize=(9.6, 5.2), title="Width distribution for output 1", hist_color="#F59E0B", hist_edgecolor="#7C2D12", ) .. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_mean_interval_width_overview_005.png :alt: Width distribution for output 1 (Output 1) (Mean Width: 2.0144) :srcset: /auto_examples/evaluation/images/sphx_glr_plot_mean_interval_width_overview_005.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none .. GENERATED FROM PYTHON SOURCE LINES 396-411 A practical reading rule for your own data ------------------------------------------ When you use this helper on real forecast outputs, a good workflow is: 1. start with the histogram, 2. check whether widths are tightly concentrated or widely spread, 3. then read the mean width from the summary bar, 4. compare outputs or regimes separately if needed, 5. and finally interpret width together with reliability metrics. In real forecast evaluation, ``plot_mean_interval_width`` is rarely a final decision plot by itself. It is a sharpness plot that becomes most useful when paired with coverage, calibration, and weighted interval score. .. GENERATED FROM PYTHON SOURCE LINES 414-446 How to adapt this lesson to your own forecast results ----------------------------------------------------- In your own project, the key step is simply to provide aligned lower and upper bounds. Typical sources are: - ``q10`` and ``q90`` columns for an 80% interval, - ``q05`` and ``q95`` for a 90% interval, - or calibrated lower/upper bounds written by a Stage-2 workflow. A practical adaptation pattern is: ``y_lower = df['subsidence_q10'].to_numpy()`` ``y_upper = df['subsidence_q90'].to_numpy()`` Then: - use ``kind='widths_histogram'`` to inspect the full spread, - use ``kind='summary_bar'`` to report the mean width, - use ``metric_kws={'multioutput': 'raw_values'}`` when each column is a separate output, - and use ``output_idx=...`` when you want one histogram for one output. If the result looks extremely narrow, do not celebrate too early. Check coverage next. If the result looks extremely wide, do not reject it too quickly. Check whether calibration or long-horizon uncertainty genuinely needed that width. .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.498 seconds) .. _sphx_glr_download_auto_examples_evaluation_plot_mean_interval_width_overview.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_mean_interval_width_overview.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_mean_interval_width_overview.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_mean_interval_width_overview.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_