.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/uncertainty/plot_brier_exceedance.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_uncertainty_plot_brier_exceedance.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_uncertainty_plot_brier_exceedance.py:


Exceedance probabilities and Brier score
========================================

This lesson teaches how to work with **exceedance probabilities**
in GeoPrior and how to calibrate them with
:func:`geoprior.utils.calibrate.calibrate_probability_forecast`.

Why this page matters
---------------------
Not every uncertainty question is about an interval such as q10-q90.

In many practical risk settings, the question is event-based:

- What is the probability that subsidence exceeds a critical threshold?
- Which zones have high exceedance risk next year?
- Are those probabilities trustworthy?

That is the setting of **exceedance probability forecasts**.

For binary events, a natural accuracy measure is the **Brier score**:

.. math::

   \mathrm{BS}
   =
   \frac{1}{n}
   \sum_{i=1}^{n}
   (p_i - y_i)^2

where:

- :math:`p_i` is the forecast probability of exceedance,
- :math:`y_i \in \{0,1\}` is the observed event outcome.

Smaller Brier scores are better.

What the real utility does
--------------------------
GeoPrior provides
:func:`geoprior.utils.calibrate.calibrate_probability_forecast`
to calibrate a probability column against a binary outcome column.

It supports:

- ``method="isotonic"``:
  non-parametric monotonic calibration,
- ``method="logistic"``:
  Platt-style logistic calibration,

and returns a copy of the input DataFrame with an added calibrated
probability column. This makes it the right core helper for an
exceedance-oriented uncertainty lesson.

What this lesson teaches
------------------------
We will:

1. build a synthetic spatial exceedance-risk dataset,
2. create deliberately miscalibrated raw probabilities,
3. compute Brier scores before calibration,
4. calibrate the probabilities with isotonic and logistic methods,
5. compare Brier scores after calibration,
6. build explanatory plots for:
   - reliability,
   - Brier score by horizon,
   - exceedance risk maps.

This page is synthetic so it stays fully executable during the
documentation build.

.. GENERATED FROM PYTHON SOURCE LINES 74-78

Imports
-------
We use the real GeoPrior calibration helper and build the rest of
the lesson around it.

.. GENERATED FROM PYTHON SOURCE LINES 78-85

.. code-block:: Python


    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    from geoprior.utils.calibrate import calibrate_probability_forecast


.. GENERATED FROM PYTHON SOURCE LINES 86-93

Step 1 - Build a compact synthetic spatial exceedance dataset
-------------------------------------------------------------
We mimic a spatial forecasting problem with three forecast steps.

The target is no longer a continuous forecast value. Instead, we
focus on whether the realized subsidence exceeds a critical
threshold.

.. GENERATED FROM PYTHON SOURCE LINES 93-184

.. code-block:: Python


    rng = np.random.default_rng(33)

    nx = 10
    ny = 7
    steps = [1, 2, 3]
    years = {1: 2024, 2: 2025, 3: 2026}

    xv = np.linspace(0.0, 12_000.0, nx)
    yv = np.linspace(0.0, 8_000.0, ny)
    X, Y = np.meshgrid(xv, yv)

    x_flat = X.ravel()
    y_flat = Y.ravel()
    n_sites = x_flat.size

    xn = (x_flat - x_flat.min()) / (x_flat.max() - x_flat.min())
    yn = (y_flat - y_flat.min()) / (y_flat.max() - y_flat.min())

    hotspot = np.exp(
        -(
            ((xn - 0.72) ** 2) / 0.020
            + ((yn - 0.34) ** 2) / 0.030
        )
    )
    ridge = 0.50 * np.exp(
        -(
            ((xn - 0.28) ** 2) / 0.030
            + ((yn - 0.74) ** 2) / 0.050
        )
    )
    gradient = 0.45 * xn + 0.20 * (1.0 - yn)

    threshold = 5.5

    rows: list[dict[str, float | int]] = []

    for step in steps:
        scale = {1: 1.00, 2: 1.20, 3: 1.45}[step]

        # latent mean field
        mu = (
            2.4
            + 1.5 * gradient
            + 2.1 * hotspot
            + 0.9 * ridge
        ) * scale

        # true uncertainty for realized outcomes
        sigma = (
            0.40
            + 0.10 * xn
            + 0.05 * hotspot
            + 0.04 * step
        )

        actual = mu + rng.normal(0.0, sigma, size=n_sites)

        # binary exceedance event
        exceed = (actual > threshold).astype(int)

        # "True" event probability under a latent Gaussian idea.
        # We avoid scipy here and use a logistic-style approximation.
        z = (mu - threshold) / np.maximum(sigma, 1e-6)
        p_true = 1.0 / (1.0 + np.exp(-1.7 * z))

        # Deliberately overconfident raw forecast probabilities:
        # push values too hard toward 0 and 1.
        p_raw = np.clip(p_true ** 0.65, 0.0, 1.0)

        for i in range(n_sites):
            rows.append(
                {
                    "sample_idx": i,
                    "forecast_step": step,
                    "coord_t": years[step],
                    "coord_x": float(x_flat[i]),
                    "coord_y": float(y_flat[i]),
                    "subsidence_actual": float(actual[i]),
                    "exceed_event": int(exceed[i]),
                    "prob_exceed_raw": float(p_raw[i]),
                    "prob_exceed_true_latent": float(p_true[i]),
                }
            )

    df = pd.DataFrame(rows)

    print("Dataset shape:", df.shape)
    print("")
    print(df.head(10).to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Dataset shape: (210, 9)

     sample_idx  forecast_step  coord_t    coord_x  coord_y  subsidence_actual  exceed_event  prob_exceed_raw  prob_exceed_true_latent
              0              1     2024     0.0000   0.0000             2.8753             0           0.0009                   0.0000
              1              1     2024  1333.3333   0.0000             2.5211             0           0.0013                   0.0000
              2              1     2024  2666.6667   0.0000             3.1222             0           0.0018                   0.0001
              3              1     2024  4000.0000   0.0000             2.9450             0           0.0025                   0.0001
              4              1     2024  5333.3333   0.0000             2.2400             0           0.0033                   0.0002
              5              1     2024  6666.6667   0.0000             3.5832             0           0.0046                   0.0003
              6              1     2024  8000.0000   0.0000             3.1390             0           0.0065                   0.0004
              7              1     2024  9333.3333   0.0000             3.5842             0           0.0085                   0.0007
              8              1     2024 10666.6667   0.0000             4.2826             0           0.0103                   0.0009
              9              1     2024 12000.0000   0.0000             3.5208             0           0.0129                   0.0012


.. GENERATED FROM PYTHON SOURCE LINES 185-193

Step 2 - Understand the event we are forecasting
------------------------------------------------
The binary event column is the main target for this lesson:

- ``exceed_event = 1`` if subsidence > threshold
- ``exceed_event = 0`` otherwise

We summarize the event rate by forecast step.

.. GENERATED FROM PYTHON SOURCE LINES 193-209

.. code-block:: Python


    event_summary = (
        df.groupby("forecast_step")
        .agg(
            year=("coord_t", "first"),
            event_rate=("exceed_event", "mean"),
            mean_prob_raw=("prob_exceed_raw", "mean"),
            mean_actual=("subsidence_actual", "mean"),
        )
        .reset_index()
    )

    print("")
    print("Event summary")
    print(event_summary.to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Event summary
     forecast_step  year  event_rate  mean_prob_raw  mean_actual
                 1  2024      0.0000         0.0134       3.0872
                 2  2025      0.0286         0.0535       3.6562
                 3  2026      0.0714         0.1740       4.4622


.. GENERATED FROM PYTHON SOURCE LINES 210-219

Step 3 - Define a small Brier score helper
------------------------------------------
The Brier score is the mean squared error for probabilities.

It is ideal for exceedance forecasting because it rewards:

- high probability when the event occurs,
- low probability when the event does not occur,
- and penalizes overconfident mistakes strongly.

.. GENERATED FROM PYTHON SOURCE LINES 219-237

.. code-block:: Python


    def brier_score(y_true: np.ndarray, p: np.ndarray) -> float:
        y_true = np.asarray(y_true, dtype=float)
        p = np.asarray(p, dtype=float)
        m = np.isfinite(y_true) & np.isfinite(p)
        if not np.any(m):
            return float("nan")
        return float(np.mean((p[m] - y_true[m]) ** 2))


    raw_brier = brier_score(
        df["exceed_event"].to_numpy(),
        df["prob_exceed_raw"].to_numpy(),
    )

    print("")
    print("Overall raw Brier score:", raw_brier)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Overall raw Brier score: 0.013873097981654217


.. GENERATED FROM PYTHON SOURCE LINES 238-247

Step 4 - Calibrate the probabilities with the real helper
---------------------------------------------------------
We apply the actual GeoPrior utility twice:

- isotonic calibration,
- logistic calibration.

The helper adds a calibrated probability column to a copy of the
input DataFrame.

.. GENERATED FROM PYTHON SOURCE LINES 247-268

.. code-block:: Python


    df_iso = calibrate_probability_forecast(
        df,
        prob_col="prob_exceed_raw",
        actual_col="exceed_event",
        method="isotonic",
        out_col="prob_exceed_iso",
    )

    df_log = calibrate_probability_forecast(
        df,
        prob_col="prob_exceed_raw",
        actual_col="exceed_event",
        method="logistic",
        out_col="prob_exceed_log",
    )

    print("")
    print("Columns after isotonic calibration")
    print(df_iso.columns.tolist())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Columns after isotonic calibration
    ['sample_idx', 'forecast_step', 'coord_t', 'coord_x', 'coord_y', 'subsidence_actual', 'exceed_event', 'prob_exceed_raw', 'prob_exceed_true_latent', 'prob_exceed_iso']


.. GENERATED FROM PYTHON SOURCE LINES 269-272

Step 5 - Compare Brier scores before and after calibration
----------------------------------------------------------
This is the first key numerical check.

.. GENERATED FROM PYTHON SOURCE LINES 272-303

.. code-block:: Python


    score_rows = [
        {
            "model": "Raw probability",
            "brier": brier_score(
                df["exceed_event"].to_numpy(),
                df["prob_exceed_raw"].to_numpy(),
            ),
        },
        {
            "model": "Isotonic calibrated",
            "brier": brier_score(
                df_iso["exceed_event"].to_numpy(),
                df_iso["prob_exceed_iso"].to_numpy(),
            ),
        },
        {
            "model": "Logistic calibrated",
            "brier": brier_score(
                df_log["exceed_event"].to_numpy(),
                df_log["prob_exceed_log"].to_numpy(),
            ),
        },
    ]

    df_scores = pd.DataFrame(score_rows)

    print("")
    print("Overall Brier score comparison")
    print(df_scores.to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Overall Brier score comparison
                  model  brier
        Raw probability 0.0139
    Isotonic calibrated 0.0048
    Logistic calibrated 0.0201


.. GENERATED FROM PYTHON SOURCE LINES 304-310

How to read these numbers
-------------------------
Lower Brier score is better.

A calibration method is useful here when it reduces the Brier score
without destroying the ranking of risk across the map.

.. GENERATED FROM PYTHON SOURCE LINES 312-317

Step 6 - Compare Brier score by forecast horizon
------------------------------------------------
Overall accuracy can hide horizon-specific problems.

So we also compute Brier score separately for each forecast step.

.. GENERATED FROM PYTHON SOURCE LINES 317-348

.. code-block:: Python


    step_rows = []
    for step, g in df.groupby("forecast_step", sort=True):
        g_iso = df_iso[df_iso["forecast_step"] == step]
        g_log = df_log[df_log["forecast_step"] == step]

        step_rows.append(
            {
                "forecast_step": int(step),
                "year": int(g["coord_t"].iloc[0]),
                "brier_raw": brier_score(
                    g["exceed_event"].to_numpy(),
                    g["prob_exceed_raw"].to_numpy(),
                ),
                "brier_iso": brier_score(
                    g_iso["exceed_event"].to_numpy(),
                    g_iso["prob_exceed_iso"].to_numpy(),
                ),
                "brier_log": brier_score(
                    g_log["exceed_event"].to_numpy(),
                    g_log["prob_exceed_log"].to_numpy(),
                ),
            }
        )

    df_step_scores = pd.DataFrame(step_rows)

    print("")
    print("Brier score by horizon")
    print(df_step_scores.to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Brier score by horizon
     forecast_step  year  brier_raw  brier_iso  brier_log
                 1  2024     0.0021     0.0000     0.0005
                 2  2025     0.0033     0.0000     0.0163
                 3  2026     0.0362     0.0143     0.0436


.. GENERATED FROM PYTHON SOURCE LINES 349-358

Step 7 - Build a simple reliability table
-----------------------------------------
For probability forecasts, we also want a coarse reliability view:

- group probabilities into bins,
- compare forecast probability with observed event frequency.

This is not a replacement for a dedicated reliability diagram page,
but it helps explain calibration visually.

.. GENERATED FROM PYTHON SOURCE LINES 358-408

.. code-block:: Python


    def reliability_bins(
        y_true: np.ndarray,
        p: np.ndarray,
        *,
        n_bins: int = 8,
    ) -> pd.DataFrame:
        y_true = np.asarray(y_true, dtype=float)
        p = np.asarray(p, dtype=float)

        bins = np.linspace(0.0, 1.0, n_bins + 1)
        idx = np.digitize(p, bins, right=True)
        idx = np.clip(idx, 1, n_bins)

        rows = []
        for b in range(1, n_bins + 1):
            m = idx == b
            if not np.any(m):
                continue
            rows.append(
                {
                    "bin": b,
                    "forecast_prob": float(np.mean(p[m])),
                    "observed_freq": float(np.mean(y_true[m])),
                    "count": int(np.sum(m)),
                }
            )

        return pd.DataFrame(rows)


    rel_raw = reliability_bins(
        df["exceed_event"].to_numpy(),
        df["prob_exceed_raw"].to_numpy(),
    )

    rel_iso = reliability_bins(
        df_iso["exceed_event"].to_numpy(),
        df_iso["prob_exceed_iso"].to_numpy(),
    )

    rel_log = reliability_bins(
        df_log["exceed_event"].to_numpy(),
        df_log["prob_exceed_log"].to_numpy(),
    )

    print("")
    print("Reliability bins for isotonic calibration")
    print(rel_iso.to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Reliability bins for isotonic calibration
     bin  forecast_prob  observed_freq  count
       1         0.0000         0.0000    201
       4         0.5000         0.5000      4
       8         1.0000         1.0000      5


.. GENERATED FROM PYTHON SOURCE LINES 409-413

Step 8 - Plot Brier score by horizon
------------------------------------
This first figure shows whether calibration helps consistently across
the horizon.

.. GENERATED FROM PYTHON SOURCE LINES 413-444

.. code-block:: Python


    fig, ax = plt.subplots(figsize=(7.4, 4.6))

    ax.plot(
        df_step_scores["forecast_step"],
        df_step_scores["brier_raw"],
        marker="o",
        label="Raw",
    )
    ax.plot(
        df_step_scores["forecast_step"],
        df_step_scores["brier_iso"],
        marker="s",
        label="Isotonic",
    )
    ax.plot(
        df_step_scores["forecast_step"],
        df_step_scores["brier_log"],
        marker="^",
        label="Logistic",
    )

    ax.set_xlabel("Forecast step")
    ax.set_ylabel("Brier score")
    ax.set_title("Brier score across the horizon")
    ax.grid(True, linestyle=":", alpha=0.6)
    ax.legend()

    plt.tight_layout()
    plt.show()


.. image-sg:: /auto_examples/uncertainty/images/sphx_glr_plot_brier_exceedance_001.png
   :alt: Brier score across the horizon
   :srcset: /auto_examples/uncertainty/images/sphx_glr_plot_brier_exceedance_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 445-451

How to read this panel
----------------------
Lower is better.

A useful uncertainty calibration should reduce the Brier score at
least for the steps where the raw forecast is clearly overconfident.

.. GENERATED FROM PYTHON SOURCE LINES 453-460

Step 9 - Plot reliability before and after calibration
------------------------------------------------------
This second figure shows the calibration effect directly.

- the diagonal is perfect reliability,
- points below the diagonal are overconfident,
- points above the diagonal are conservative.

.. GENERATED FROM PYTHON SOURCE LINES 460-496

.. code-block:: Python


    fig, ax = plt.subplots(figsize=(6.6, 6.2))

    ax.plot([0, 1], [0, 1], linestyle="--", linewidth=1.5)

    ax.plot(
        rel_raw["forecast_prob"],
        rel_raw["observed_freq"],
        marker="o",
        label="Raw",
    )
    ax.plot(
        rel_iso["forecast_prob"],
        rel_iso["observed_freq"],
        marker="s",
        label="Isotonic",
    )
    ax.plot(
        rel_log["forecast_prob"],
        rel_log["observed_freq"],
        marker="^",
        label="Logistic",
    )

    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.set_aspect("equal", adjustable="box")
    ax.set_xlabel("Forecast probability")
    ax.set_ylabel("Observed frequency")
    ax.set_title("Reliability of exceedance probabilities")
    ax.grid(True, linestyle=":", alpha=0.6)
    ax.legend()

    plt.tight_layout()
    plt.show()


.. image-sg:: /auto_examples/uncertainty/images/sphx_glr_plot_brier_exceedance_002.png
   :alt: Reliability of exceedance probabilities
   :srcset: /auto_examples/uncertainty/images/sphx_glr_plot_brier_exceedance_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 497-510

Why this plot matters
---------------------
Brier score is a single summary number.

Reliability reveals *how* the probabilities are wrong.

For example:

- a model may systematically overstate high probabilities,
- or it may compress everything toward the center.

Calibration is useful when it moves the points closer to the
diagonal while keeping the risk ranking meaningful.

.. GENERATED FROM PYTHON SOURCE LINES 512-524

Step 10 - Plot spatial exceedance risk maps
-------------------------------------------
A good uncertainty lesson should connect the scores back to the map.

We now compare:

- raw exceedance probability,
- isotonic calibrated probability,
- observed event outcome.

We show the last horizon because event risk usually becomes most
interesting there.

.. GENERATED FROM PYTHON SOURCE LINES 524-572

.. code-block:: Python


    step_to_plot = 3

    g_raw = df[df["forecast_step"] == step_to_plot].copy()
    g_iso = df_iso[df_iso["forecast_step"] == step_to_plot].copy()

    fig, axes = plt.subplots(1, 3, figsize=(12.5, 4.1))

    sc0 = axes[0].scatter(
        g_raw["coord_x"],
        g_raw["coord_y"],
        c=g_raw["prob_exceed_raw"],
        s=28,
    )
    axes[0].set_title("Raw exceedance probability")
    axes[0].set_xlabel("coord_x")
    axes[0].set_ylabel("coord_y")
    axes[0].grid(True, linestyle=":", alpha=0.4)

    sc1 = axes[1].scatter(
        g_iso["coord_x"],
        g_iso["coord_y"],
        c=g_iso["prob_exceed_iso"],
        s=28,
    )
    axes[1].set_title("Isotonic calibrated probability")
    axes[1].set_xlabel("coord_x")
    axes[1].set_ylabel("coord_y")
    axes[1].grid(True, linestyle=":", alpha=0.4)

    sc2 = axes[2].scatter(
        g_raw["coord_x"],
        g_raw["coord_y"],
        c=g_raw["exceed_event"],
        s=28,
    )
    axes[2].set_title("Observed exceedance event")
    axes[2].set_xlabel("coord_x")
    axes[2].set_ylabel("coord_y")
    axes[2].grid(True, linestyle=":", alpha=0.4)

    fig.colorbar(sc0, ax=axes[0], shrink=0.85)
    fig.colorbar(sc1, ax=axes[1], shrink=0.85)
    fig.colorbar(sc2, ax=axes[2], shrink=0.85)

    plt.tight_layout()
    plt.show()


.. image-sg:: /auto_examples/uncertainty/images/sphx_glr_plot_brier_exceedance_003.png
   :alt: Raw exceedance probability, Isotonic calibrated probability, Observed exceedance event
   :srcset: /auto_examples/uncertainty/images/sphx_glr_plot_brier_exceedance_003.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 573-588

How to read the map view
------------------------
The raw and calibrated maps should not be read the same way as a
binary classification map.

They express *risk intensity*:

- 0.1 means low exceedance probability,
- 0.8 means high exceedance probability.

Calibration changes how trustworthy those numbers are, not just
how pretty the map looks.

That is why the Brier and reliability panels should be read before
over-interpreting any single risk map.

.. GENERATED FROM PYTHON SOURCE LINES 590-594

Step 11 - Compare event-rate calibration numerically
----------------------------------------------------
One final compact table helps connect the forecast means to the
actual event frequency by horizon.

.. GENERATED FROM PYTHON SOURCE LINES 594-617

.. code-block:: Python


    summary_rows = []
    for step, g in df.groupby("forecast_step", sort=True):
        g_iso = df_iso[df_iso["forecast_step"] == step]
        g_log = df_log[df_log["forecast_step"] == step]

        summary_rows.append(
            {
                "forecast_step": int(step),
                "year": int(g["coord_t"].iloc[0]),
                "observed_event_rate": float(g["exceed_event"].mean()),
                "mean_raw_prob": float(g["prob_exceed_raw"].mean()),
                "mean_iso_prob": float(g_iso["prob_exceed_iso"].mean()),
                "mean_log_prob": float(g_log["prob_exceed_log"].mean()),
            }
        )

    df_prob_summary = pd.DataFrame(summary_rows)

    print("")
    print("Event-rate summary by horizon")
    print(df_prob_summary.to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Event-rate summary by horizon
     forecast_step  year  observed_event_rate  mean_raw_prob  mean_iso_prob  mean_log_prob
                 1  2024               0.0000         0.0134         0.0000         0.0214
                 2  2025               0.0286         0.0535         0.0286         0.0292
                 3  2026               0.0714         0.1740         0.0714         0.0495


.. GENERATED FROM PYTHON SOURCE LINES 618-637

Final takeaway
--------------
This page teaches a different uncertainty question from interval
calibration.

Instead of asking:

- "how wide should the interval be?"

it asks:

- "how good are the probabilities of an important event?"

That is why exceedance probability analysis deserves its own page.

Once users understand this lesson, they are ready for:

- dedicated probability calibration comparisons,
- quantile-to-probability calibration utilities,
- and richer reliability-diagram pages.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.699 seconds)


.. _sphx_glr_download_auto_examples_uncertainty_plot_brier_exceedance.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_brier_exceedance.ipynb <plot_brier_exceedance.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_brier_exceedance.py <plot_brier_exceedance.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_brier_exceedance.zip <plot_brier_exceedance.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_