Note

Go to the end to download the full example code.

Exceedance probabilities and Brier score#

This lesson teaches how to work with exceedance probabilities in GeoPrior and how to calibrate them with geoprior.utils.calibrate.calibrate_probability_forecast().

Why this page matters#

Not every uncertainty question is about an interval such as q10-q90.

In many practical risk settings, the question is event-based:

What is the probability that subsidence exceeds a critical threshold?
Which zones have high exceedance risk next year?
Are those probabilities trustworthy?

That is the setting of exceedance probability forecasts.

For binary events, a natural accuracy measure is the Brier score:

(1)#\[\mathrm{BS} = \frac{1}{n} \sum_{i=1}^{n} (p_i - y_i)^2\]

where:

\(p_i\) is the forecast probability of exceedance,
\(y_i \in \{0,1\}\) is the observed event outcome.

Smaller Brier scores are better.

What the real utility does#

GeoPrior provides geoprior.utils.calibrate.calibrate_probability_forecast() to calibrate a probability column against a binary outcome column.

It supports:

method="isotonic": non-parametric monotonic calibration,
method="logistic": Platt-style logistic calibration,

and returns a copy of the input DataFrame with an added calibrated probability column. This makes it the right core helper for an exceedance-oriented uncertainty lesson.

What this lesson teaches#

We will:

build a synthetic spatial exceedance-risk dataset,
create deliberately miscalibrated raw probabilities,
compute Brier scores before calibration,
calibrate the probabilities with isotonic and logistic methods,
compare Brier scores after calibration,
build explanatory plots for: - reliability, - Brier score by horizon, - exceedance risk maps.

This page is synthetic so it stays fully executable during the documentation build.

Imports#

We use the real GeoPrior calibration helper and build the rest of the lesson around it.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from geoprior.utils.calibrate import calibrate_probability_forecast

Step 1 - Build a compact synthetic spatial exceedance dataset#

We mimic a spatial forecasting problem with three forecast steps.

The target is no longer a continuous forecast value. Instead, we focus on whether the realized subsidence exceeds a critical threshold.

rng = np.random.default_rng(33)

nx = 10
ny = 7
steps = [1, 2, 3]
years = {1: 2024, 2: 2025, 3: 2026}

xv = np.linspace(0.0, 12_000.0, nx)
yv = np.linspace(0.0, 8_000.0, ny)
X, Y = np.meshgrid(xv, yv)

x_flat = X.ravel()
y_flat = Y.ravel()
n_sites = x_flat.size

xn = (x_flat - x_flat.min()) / (x_flat.max() - x_flat.min())
yn = (y_flat - y_flat.min()) / (y_flat.max() - y_flat.min())

hotspot = np.exp(
    -(
        ((xn - 0.72) ** 2) / 0.020
        + ((yn - 0.34) ** 2) / 0.030
    )
)
ridge = 0.50 * np.exp(
    -(
        ((xn - 0.28) ** 2) / 0.030
        + ((yn - 0.74) ** 2) / 0.050
    )
)
gradient = 0.45 * xn + 0.20 * (1.0 - yn)

threshold = 5.5

rows: list[dict[str, float | int]] = []

for step in steps:
    scale = {1: 1.00, 2: 1.20, 3: 1.45}[step]

    # latent mean field
    mu = (
        2.4
        + 1.5 * gradient
        + 2.1 * hotspot
        + 0.9 * ridge
    ) * scale

    # true uncertainty for realized outcomes
    sigma = (
        0.40
        + 0.10 * xn
        + 0.05 * hotspot
        + 0.04 * step
    )

    actual = mu + rng.normal(0.0, sigma, size=n_sites)

    # binary exceedance event
    exceed = (actual > threshold).astype(int)

    # "True" event probability under a latent Gaussian idea.
    # We avoid scipy here and use a logistic-style approximation.
    z = (mu - threshold) / np.maximum(sigma, 1e-6)
    p_true = 1.0 / (1.0 + np.exp(-1.7 * z))

    # Deliberately overconfident raw forecast probabilities:
    # push values too hard toward 0 and 1.
    p_raw = np.clip(p_true ** 0.65, 0.0, 1.0)

    for i in range(n_sites):
        rows.append(
            {
                "sample_idx": i,
                "forecast_step": step,
                "coord_t": years[step],
                "coord_x": float(x_flat[i]),
                "coord_y": float(y_flat[i]),
                "subsidence_actual": float(actual[i]),
                "exceed_event": int(exceed[i]),
                "prob_exceed_raw": float(p_raw[i]),
                "prob_exceed_true_latent": float(p_true[i]),
            }
        )

df = pd.DataFrame(rows)

print("Dataset shape:", df.shape)
print("")
print(df.head(10).to_string(index=False))

Dataset shape: (210, 9)

 sample_idx  forecast_step  coord_t    coord_x  coord_y  subsidence_actual  exceed_event  prob_exceed_raw  prob_exceed_true_latent
          0              1     2024     0.0000   0.0000             2.8753             0           0.0009                   0.0000
          1              1     2024  1333.3333   0.0000             2.5211             0           0.0013                   0.0000
          2              1     2024  2666.6667   0.0000             3.1222             0           0.0018                   0.0001
          3              1     2024  4000.0000   0.0000             2.9450             0           0.0025                   0.0001
          4              1     2024  5333.3333   0.0000             2.2400             0           0.0033                   0.0002
          5              1     2024  6666.6667   0.0000             3.5832             0           0.0046                   0.0003
          6              1     2024  8000.0000   0.0000             3.1390             0           0.0065                   0.0004
          7              1     2024  9333.3333   0.0000             3.5842             0           0.0085                   0.0007
          8              1     2024 10666.6667   0.0000             4.2826             0           0.0103                   0.0009
          9              1     2024 12000.0000   0.0000             3.5208             0           0.0129                   0.0012

Step 2 - Understand the event we are forecasting#

The binary event column is the main target for this lesson:

exceed_event = 1 if subsidence > threshold
exceed_event = 0 otherwise

We summarize the event rate by forecast step.

event_summary = (
    df.groupby("forecast_step")
    .agg(
        year=("coord_t", "first"),
        event_rate=("exceed_event", "mean"),
        mean_prob_raw=("prob_exceed_raw", "mean"),
        mean_actual=("subsidence_actual", "mean"),
    )
    .reset_index()
)

print("")
print("Event summary")
print(event_summary.to_string(index=False))

Event summary
 forecast_step  year  event_rate  mean_prob_raw  mean_actual
             1  2024      0.0000         0.0134       3.0872
             2  2025      0.0286         0.0535       3.6562
             3  2026      0.0714         0.1740       4.4622

Step 3 - Define a small Brier score helper#

The Brier score is the mean squared error for probabilities.

It is ideal for exceedance forecasting because it rewards:

high probability when the event occurs,
low probability when the event does not occur,
and penalizes overconfident mistakes strongly.

def brier_score(y_true: np.ndarray, p: np.ndarray) -> float:
    y_true = np.asarray(y_true, dtype=float)
    p = np.asarray(p, dtype=float)
    m = np.isfinite(y_true) & np.isfinite(p)
    if not np.any(m):
        return float("nan")
    return float(np.mean((p[m] - y_true[m]) ** 2))


raw_brier = brier_score(
    df["exceed_event"].to_numpy(),
    df["prob_exceed_raw"].to_numpy(),
)

print("")
print("Overall raw Brier score:", raw_brier)

Overall raw Brier score: 0.013873097981654217

Step 4 - Calibrate the probabilities with the real helper#

We apply the actual GeoPrior utility twice:

isotonic calibration,
logistic calibration.

The helper adds a calibrated probability column to a copy of the input DataFrame.

df_iso = calibrate_probability_forecast(
    df,
    prob_col="prob_exceed_raw",
    actual_col="exceed_event",
    method="isotonic",
    out_col="prob_exceed_iso",
)

df_log = calibrate_probability_forecast(
    df,
    prob_col="prob_exceed_raw",
    actual_col="exceed_event",
    method="logistic",
    out_col="prob_exceed_log",
)

print("")
print("Columns after isotonic calibration")
print(df_iso.columns.tolist())

Columns after isotonic calibration
['sample_idx', 'forecast_step', 'coord_t', 'coord_x', 'coord_y', 'subsidence_actual', 'exceed_event', 'prob_exceed_raw', 'prob_exceed_true_latent', 'prob_exceed_iso']

Step 5 - Compare Brier scores before and after calibration#

This is the first key numerical check.

score_rows = [
    {
        "model": "Raw probability",
        "brier": brier_score(
            df["exceed_event"].to_numpy(),
            df["prob_exceed_raw"].to_numpy(),
        ),
    },
    {
        "model": "Isotonic calibrated",
        "brier": brier_score(
            df_iso["exceed_event"].to_numpy(),
            df_iso["prob_exceed_iso"].to_numpy(),
        ),
    },
    {
        "model": "Logistic calibrated",
        "brier": brier_score(
            df_log["exceed_event"].to_numpy(),
            df_log["prob_exceed_log"].to_numpy(),
        ),
    },
]

df_scores = pd.DataFrame(score_rows)

print("")
print("Overall Brier score comparison")
print(df_scores.to_string(index=False))

Overall Brier score comparison
              model  brier
    Raw probability 0.0139
Isotonic calibrated 0.0048
Logistic calibrated 0.0201

How to read these numbers#

Lower Brier score is better.

A calibration method is useful here when it reduces the Brier score without destroying the ranking of risk across the map.

Step 6 - Compare Brier score by forecast horizon#

Overall accuracy can hide horizon-specific problems.

So we also compute Brier score separately for each forecast step.

step_rows = []
for step, g in df.groupby("forecast_step", sort=True):
    g_iso = df_iso[df_iso["forecast_step"] == step]
    g_log = df_log[df_log["forecast_step"] == step]

    step_rows.append(
        {
            "forecast_step": int(step),
            "year": int(g["coord_t"].iloc[0]),
            "brier_raw": brier_score(
                g["exceed_event"].to_numpy(),
                g["prob_exceed_raw"].to_numpy(),
            ),
            "brier_iso": brier_score(
                g_iso["exceed_event"].to_numpy(),
                g_iso["prob_exceed_iso"].to_numpy(),
            ),
            "brier_log": brier_score(
                g_log["exceed_event"].to_numpy(),
                g_log["prob_exceed_log"].to_numpy(),
            ),
        }
    )

df_step_scores = pd.DataFrame(step_rows)

print("")
print("Brier score by horizon")
print(df_step_scores.to_string(index=False))

Brier score by horizon
 forecast_step  year  brier_raw  brier_iso  brier_log
             1  2024     0.0021     0.0000     0.0005
             2  2025     0.0033     0.0000     0.0163
             3  2026     0.0362     0.0143     0.0436

Step 7 - Build a simple reliability table#

For probability forecasts, we also want a coarse reliability view:

group probabilities into bins,
compare forecast probability with observed event frequency.

This is not a replacement for a dedicated reliability diagram page, but it helps explain calibration visually.

def reliability_bins(
    y_true: np.ndarray,
    p: np.ndarray,
    *,
    n_bins: int = 8,
) -> pd.DataFrame:
    y_true = np.asarray(y_true, dtype=float)
    p = np.asarray(p, dtype=float)

    bins = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(p, bins, right=True)
    idx = np.clip(idx, 1, n_bins)

    rows = []
    for b in range(1, n_bins + 1):
        m = idx == b
        if not np.any(m):
            continue
        rows.append(
            {
                "bin": b,
                "forecast_prob": float(np.mean(p[m])),
                "observed_freq": float(np.mean(y_true[m])),
                "count": int(np.sum(m)),
            }
        )

    return pd.DataFrame(rows)


rel_raw = reliability_bins(
    df["exceed_event"].to_numpy(),
    df["prob_exceed_raw"].to_numpy(),
)

rel_iso = reliability_bins(
    df_iso["exceed_event"].to_numpy(),
    df_iso["prob_exceed_iso"].to_numpy(),
)

rel_log = reliability_bins(
    df_log["exceed_event"].to_numpy(),
    df_log["prob_exceed_log"].to_numpy(),
)

print("")
print("Reliability bins for isotonic calibration")
print(rel_iso.to_string(index=False))

Reliability bins for isotonic calibration
 bin  forecast_prob  observed_freq  count
   1         0.0000         0.0000    201
   4         0.5000         0.5000      4
   8         1.0000         1.0000      5

Step 8 - Plot Brier score by horizon#

This first figure shows whether calibration helps consistently across the horizon.

fig, ax = plt.subplots(figsize=(7.4, 4.6))

ax.plot(
    df_step_scores["forecast_step"],
    df_step_scores["brier_raw"],
    marker="o",
    label="Raw",
)
ax.plot(
    df_step_scores["forecast_step"],
    df_step_scores["brier_iso"],
    marker="s",
    label="Isotonic",
)
ax.plot(
    df_step_scores["forecast_step"],
    df_step_scores["brier_log"],
    marker="^",
    label="Logistic",
)

ax.set_xlabel("Forecast step")
ax.set_ylabel("Brier score")
ax.set_title("Brier score across the horizon")
ax.grid(True, linestyle=":", alpha=0.6)
ax.legend()

plt.tight_layout()
plt.show()

How to read this panel#

Lower is better.

A useful uncertainty calibration should reduce the Brier score at least for the steps where the raw forecast is clearly overconfident.

Step 9 - Plot reliability before and after calibration#

This second figure shows the calibration effect directly.

the diagonal is perfect reliability,
points below the diagonal are overconfident,
points above the diagonal are conservative.

fig, ax = plt.subplots(figsize=(6.6, 6.2))

ax.plot([0, 1], [0, 1], linestyle="--", linewidth=1.5)

ax.plot(
    rel_raw["forecast_prob"],
    rel_raw["observed_freq"],
    marker="o",
    label="Raw",
)
ax.plot(
    rel_iso["forecast_prob"],
    rel_iso["observed_freq"],
    marker="s",
    label="Isotonic",
)
ax.plot(
    rel_log["forecast_prob"],
    rel_log["observed_freq"],
    marker="^",
    label="Logistic",
)

ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_aspect("equal", adjustable="box")
ax.set_xlabel("Forecast probability")
ax.set_ylabel("Observed frequency")
ax.set_title("Reliability of exceedance probabilities")
ax.grid(True, linestyle=":", alpha=0.6)
ax.legend()

plt.tight_layout()
plt.show()

Why this plot matters#

Brier score is a single summary number.

Reliability reveals how the probabilities are wrong.

For example:

a model may systematically overstate high probabilities,
or it may compress everything toward the center.

Calibration is useful when it moves the points closer to the diagonal while keeping the risk ranking meaningful.

Step 10 - Plot spatial exceedance risk maps#

A good uncertainty lesson should connect the scores back to the map.

We now compare:

raw exceedance probability,
isotonic calibrated probability,
observed event outcome.

We show the last horizon because event risk usually becomes most interesting there.

step_to_plot = 3

g_raw = df[df["forecast_step"] == step_to_plot].copy()
g_iso = df_iso[df_iso["forecast_step"] == step_to_plot].copy()

fig, axes = plt.subplots(1, 3, figsize=(12.5, 4.1))

sc0 = axes[0].scatter(
    g_raw["coord_x"],
    g_raw["coord_y"],
    c=g_raw["prob_exceed_raw"],
    s=28,
)
axes[0].set_title("Raw exceedance probability")
axes[0].set_xlabel("coord_x")
axes[0].set_ylabel("coord_y")
axes[0].grid(True, linestyle=":", alpha=0.4)

sc1 = axes[1].scatter(
    g_iso["coord_x"],
    g_iso["coord_y"],
    c=g_iso["prob_exceed_iso"],
    s=28,
)
axes[1].set_title("Isotonic calibrated probability")
axes[1].set_xlabel("coord_x")
axes[1].set_ylabel("coord_y")
axes[1].grid(True, linestyle=":", alpha=0.4)

sc2 = axes[2].scatter(
    g_raw["coord_x"],
    g_raw["coord_y"],
    c=g_raw["exceed_event"],
    s=28,
)
axes[2].set_title("Observed exceedance event")
axes[2].set_xlabel("coord_x")
axes[2].set_ylabel("coord_y")
axes[2].grid(True, linestyle=":", alpha=0.4)

fig.colorbar(sc0, ax=axes[0], shrink=0.85)
fig.colorbar(sc1, ax=axes[1], shrink=0.85)
fig.colorbar(sc2, ax=axes[2], shrink=0.85)

plt.tight_layout()
plt.show()

Raw exceedance probability, Isotonic calibrated probability, Observed exceedance event

How to read the map view#

The raw and calibrated maps should not be read the same way as a binary classification map.

They express risk intensity:

0.1 means low exceedance probability,
0.8 means high exceedance probability.

Calibration changes how trustworthy those numbers are, not just how pretty the map looks.

That is why the Brier and reliability panels should be read before over-interpreting any single risk map.

Step 11 - Compare event-rate calibration numerically#

One final compact table helps connect the forecast means to the actual event frequency by horizon.

summary_rows = []
for step, g in df.groupby("forecast_step", sort=True):
    g_iso = df_iso[df_iso["forecast_step"] == step]
    g_log = df_log[df_log["forecast_step"] == step]

    summary_rows.append(
        {
            "forecast_step": int(step),
            "year": int(g["coord_t"].iloc[0]),
            "observed_event_rate": float(g["exceed_event"].mean()),
            "mean_raw_prob": float(g["prob_exceed_raw"].mean()),
            "mean_iso_prob": float(g_iso["prob_exceed_iso"].mean()),
            "mean_log_prob": float(g_log["prob_exceed_log"].mean()),
        }
    )

df_prob_summary = pd.DataFrame(summary_rows)

print("")
print("Event-rate summary by horizon")
print(df_prob_summary.to_string(index=False))

Event-rate summary by horizon
 forecast_step  year  observed_event_rate  mean_raw_prob  mean_iso_prob  mean_log_prob
             1  2024               0.0000         0.0134         0.0000         0.0214
             2  2025               0.0286         0.0535         0.0286         0.0292
             3  2026               0.0714         0.1740         0.0714         0.0495

Final takeaway#

This page teaches a different uncertainty question from interval calibration.

Instead of asking:

“how wide should the interval be?”

it asks:

“how good are the probabilities of an important event?”

That is why exceedance probability analysis deserves its own page.

Once users understand this lesson, they are ready for:

dedicated probability calibration comparisons,
quantile-to-probability calibration utilities,
and richer reliability-diagram pages.

Total running time of the script: (0 minutes 0.699 seconds)

Gallery generated by Sphinx-Gallery