Note

Go to the end to download the full example code.

SM3 identifiability: learning when recovery is accurate and when parameters slide along a ridge#

This example teaches you how to read the GeoPrior SM3 identifiability figure.

This page is different from the city-scale map figures. Here we are not asking:

where the model predicts large subsidence,
or which district looks riskier.

Instead, we are asking a more fundamental scientific question:

When the synthetic physics is known, can the model recover the effective parameters in a uniquely interpretable way, or do some parameters slide along a degeneracy ridge?

That is exactly what the SM3 identifiability figure is designed to show.

What the four panels mean#

The plotting backend builds four panels:

timescale recovery \(\log_{10}( au_{true})\) versus \(\log_{10}(\hat{ au})\), with optional overlays for the closure-implied :math:` au_{cl}` and the prior timescale.
permeability recovery either direct \(\log_{10}(K_{true})\) versus \(\log_{10}(\hat{K})\), or closure-implied \(\log_{10}(K_{true})\) versus \(\log_{10}(K_{cl}(\hat{ au}))\).
degeneracy ridge check \(\delta_K\) versus \(\delta_{S_s} + 2\delta_{H_d}\).
error versus identifiability metric \(|\Delta \log_{10} au|\) versus one chosen diagnostic such as ridge_resid, eps_prior_rms, or closure_consistency_rms.

Why this matters#

In poroelastic closures, several parameter combinations can produce very similar behaviour. That means a model may look good on one target while still being only weakly identifiable.

This figure helps the reader separate two ideas that are often confused:

recovery quality: did we estimate the right effective value?
identifiability quality: did we estimate it for a stable, interpretable reason rather than by sliding along a trade-off ridge?

This gallery page builds a synthetic SM3-style table so the example is fully executable during the docs build.

Imports#

We use the real plotting backend from the project script. So this gallery page teaches the actual figure generator, not a separate demo implementation.

import json
import tempfile
from pathlib import Path

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from geoprior.scripts.plot_sm3_identifiability import (
    plot_sm3_identifiability,
)

Step 1 - Define the synthetic lithology groups#

The real SM3 figure colors points by lith_idx and uses a different marker for each lithology group.

We therefore create four lithology families:

Fine
Mixed
Coarse
Rock

Each family gets a characteristic effective parameter scale.

This is not meant to be a perfect physical truth model. It is meant to create a synthetic table that behaves like a useful identifiability experiment.

SEC_PER_YEAR = 365.25 * 24.0 * 3600.0
rng = np.random.default_rng(42)

lith_cfg = {
    0: {
        "name": "Fine",
        "tau_year": 7.5,
        "K_mps": 1.5e-7,
        "Ss": 6.0e-4,
        "Hd_m": 26.0,
    },
    1: {
        "name": "Mixed",
        "tau_year": 4.0,
        "K_mps": 5.0e-7,
        "Ss": 3.5e-4,
        "Hd_m": 22.0,
    },
    2: {
        "name": "Coarse",
        "tau_year": 1.7,
        "K_mps": 1.8e-6,
        "Ss": 1.6e-4,
        "Hd_m": 18.0,
    },
    3: {
        "name": "Rock",
        "tau_year": 0.9,
        "K_mps": 6.5e-6,
        "Ss": 7.5e-5,
        "Hd_m": 14.0,
    },
}

n_per_lith = 120

Step 2 - Build a synthetic SM3 results table#

The real plotting function requires a very specific table structure. Its base columns include:

true / prior / estimated tau
true / prior / estimated K
true / prior / estimated Ss
true / prior / estimated Hd
the log-offset summaries relative to the true values
eps_prior_rms
closure_consistency_rms

The key teaching idea here is the latent ridge variable u.

When u changes:

K is pushed one way,
Ss and Hd can partly compensate,
and the system can still look “acceptable” unless we inspect the ridge structure explicitly.

That is exactly the kind of ambiguity panel (c) is meant to expose.

rows: list[dict[str, float | int | str]] = []

for lith_idx, cfg0 in lith_cfg.items():
    for _ in range(n_per_lith):
        tau_true_year = cfg0["tau_year"] * (
            10.0 ** rng.normal(0.0, 0.05)
        )
        K_true_mps = cfg0["K_mps"] * (
            10.0 ** rng.normal(0.0, 0.07)
        )
        Ss_true = cfg0["Ss"] * (
            10.0 ** rng.normal(0.0, 0.05)
        )
        Hd_true = cfg0["Hd_m"] * (
            10.0 ** rng.normal(0.0, 0.04)
        )

        # Lithology-based priors are deliberately broader and
        # slightly biased relative to the truth.
        tau_prior_year = tau_true_year * (
            10.0 ** rng.normal(0.08, 0.10)
        )
        K_prior_mps = K_true_mps * (
            10.0 ** rng.normal(-0.05, 0.16)
        )
        Ss_prior = Ss_true * (
            10.0 ** rng.normal(0.04, 0.14)
        )
        Hd_prior = Hd_true * (
            10.0 ** rng.normal(0.03, 0.10)
        )

        # Latent ridge direction.
        u = rng.normal(0.0, 0.22)

        # A small orthogonal estimation error for tau.
        e_tau = rng.normal(0.0, 0.07)

        tau_est_year = tau_true_year * (
            10.0 ** (0.10 * u + e_tau)
        )

        # Three coordinated offsets that create a ridge-like
        # trade-off structure.
        delta_K_log10 = 0.68 * u + rng.normal(0.0, 0.06)
        delta_Ss_log10 = 0.36 * u + rng.normal(0.0, 0.05)
        delta_Hd_log10 = 0.16 * u + rng.normal(0.0, 0.04)

        K_est_mps = K_true_mps * (10.0 ** delta_K_log10)
        Ss_est = Ss_true * (10.0 ** delta_Ss_log10)
        Hd_est = Hd_true * (10.0 ** delta_Hd_log10)

        ridge_resid_log10 = (
            delta_K_log10
            - (delta_Ss_log10 + 2.0 * delta_Hd_log10)
        )

        # Positive diagnostics used in panel (d).
        eps_prior_rms = (
            0.03
            + 0.45 * abs(ridge_resid_log10)
            + 0.18 * abs(e_tau)
            + rng.uniform(0.0, 0.01)
        )
        closure_consistency_rms = (
            0.025
            + 0.35 * abs(ridge_resid_log10)
            + 0.24 * abs(e_tau)
            + rng.uniform(0.0, 0.01)
        )

        rows.append(
            {
                "identify": "both",
                "nx": 21,
                "lith_idx": int(lith_idx),
                "kappa_b": 1.0,
                "tau_true_year": float(tau_true_year),
                "tau_prior_year": float(tau_prior_year),
                "tau_est_med_year": float(tau_est_year),
                "tau_true_sec": float(
                    tau_true_year * SEC_PER_YEAR
                ),
                "tau_prior_sec": float(
                    tau_prior_year * SEC_PER_YEAR
                ),
                "tau_est_med_sec": float(
                    tau_est_year * SEC_PER_YEAR
                ),
                "K_true_mps": float(K_true_mps),
                "K_prior_mps": float(K_prior_mps),
                "K_est_med_mps": float(K_est_mps),
                "K_est_med_m_per_year": float(
                    K_est_mps * SEC_PER_YEAR
                ),
                "Ss_true": float(Ss_true),
                "Ss_prior": float(Ss_prior),
                "Ss_est_med": float(Ss_est),
                "Hd_true": float(Hd_true),
                "Hd_prior": float(Hd_prior),
                "Hd_est_med": float(Hd_est),
                # Important: the real script divides these by ln(10),
                # so we store natural-log offsets here.
                "vs_true_delta_K_q50": float(
                    np.log(K_est_mps / K_true_mps)
                ),
                "vs_true_delta_Ss_q50": float(
                    np.log(Ss_est / Ss_true)
                ),
                "vs_true_delta_Hd_q50": float(
                    np.log(Hd_est / Hd_true)
                ),
                "eps_prior_rms": float(eps_prior_rms),
                "closure_consistency_rms": float(
                    closure_consistency_rms
                ),
            }
        )

df = pd.DataFrame(rows)

print("Synthetic SM3 table")
print(df.head().to_string(index=False))

Synthetic SM3 table
identify  nx  lith_idx  kappa_b  tau_true_year  tau_prior_year  tau_est_med_year   tau_true_sec  tau_prior_sec  tau_est_med_sec  K_true_mps  K_prior_mps  K_est_med_mps  K_est_med_m_per_year  Ss_true  Ss_prior  Ss_est_med  Hd_true  Hd_prior  Hd_est_med  vs_true_delta_K_q50  vs_true_delta_Ss_q50  vs_true_delta_Hd_q50  eps_prior_rms  closure_consistency_rms
    both  21         0   1.0000         7.7678          5.9593            6.7642 245132607.2674 188060490.9750   213460923.5387      0.0000       0.0000         0.0000                4.4941   0.0007    0.0007      0.0007  28.3528   28.2470     28.4869               0.1157                0.0865                0.0047         0.0528                   0.0468
    both  21         0   1.0000         6.7935          8.0743            6.2909 214388066.4050 254805041.2998   198526282.9055      0.0000       0.0000         0.0000                4.5369   0.0005    0.0005      0.0006  28.1911   40.0281     28.7932              -0.1019                0.0331                0.0211         0.0716                   0.0660
    both  21         0   1.0000         9.5972         13.2966           10.2223 302863969.7583 419609732.5377   322589716.5510      0.0000       0.0000         0.0000                3.6983   0.0006    0.0006      0.0005  24.1225   21.3013     21.2217              -0.1813               -0.0878               -0.1281         0.0781                   0.0645
    both  21         0   1.0000         7.6912          9.3919            7.0151 242716700.8611 296385919.2480   221381235.9562      0.0000       0.0000         0.0000                4.4672   0.0006    0.0008      0.0006  27.6777   21.2038     30.9510              -0.1984               -0.0900                0.1118         0.1075                   0.0881
    both  21         0   1.0000         6.1790          8.7507            6.2575 194994221.6425 276149588.1645   197472174.5160      0.0000       0.0000         0.0000                5.0532   0.0006    0.0006      0.0006  27.4424   26.4355     27.0299               0.1193                0.0260               -0.0151         0.0594                   0.0499

Step 3 - Read the key synthetic signals before plotting#

A good teaching page should show the logic before the figure.

Here we compute two helpful summaries:

the median ridge residual by lithology,
the median tau recovery error by lithology.

These are not required by the plotting function itself, but they help the reader understand what the figure is about.

ridge_log10 = (
    df["vs_true_delta_K_q50"].to_numpy() / np.log(10.0)
    - (
        df["vs_true_delta_Ss_q50"].to_numpy() / np.log(10.0)
        + 2.0 * df["vs_true_delta_Hd_q50"].to_numpy() / np.log(10.0)
    )
)

tau_err_log10 = np.abs(
    np.log10(df["tau_est_med_year"].to_numpy())
    - np.log10(df["tau_true_year"].to_numpy())
)

summary_rows = []

for lith_idx, sub in df.groupby("lith_idx", sort=True):
    mask = df["lith_idx"].eq(lith_idx).to_numpy()
    summary_rows.append(
        {
            "lith_idx": int(lith_idx),
            "median_|ridge|_log10": float(
                np.median(np.abs(ridge_log10[mask]))
            ),
            "median_|dlog10_tau|": float(
                np.median(tau_err_log10[mask])
            ),
        }
    )

summary_df = pd.DataFrame(summary_rows)
print("")
print("Lithology summaries")
print(summary_df.to_string(index=False))

Lithology summaries
 lith_idx  median_|ridge|_log10  median_|dlog10_tau|
        0                0.0700               0.0520
        1                0.0914               0.0534
        2                0.0717               0.0499
        3                0.0741               0.0517

Step 4 - Render the real SM3 identifiability figure#

We call the actual plotting backend.

A few choices are worth explaining:

tau_units="year" is friendlier for the first lesson,
metric="eps_prior_rms" makes panel (d) different from panel (c), so the page teaches two kinds of identifiability evidence rather than repeating the same one,
k_from_tau=False keeps panel (b) as direct permeability recovery,
show_tau_cl=True and show_prior=True make panel (a) richer and more instructive,
n_boot is reduced from the paper default so the docs build stays lightweight.

tmp_dir = Path(
    tempfile.mkdtemp(prefix="gp_sg_sm3_ident_")
)

fig_out = tmp_dir / "sm3_identifiability_gallery"
csv_out = tmp_dir / "sm3_identifiability_summary.csv"
json_out = tmp_dir / "sm3_identifiability_summary.json"

plot_sm3_identifiability(
    df,
    tau_units="year",
    metric="eps_prior_rms",
    k_from_tau=False,
    k_cl_source="prior",
    show_prior=True,
    show_tau_cl=True,
    show_legend=True,
    show_labels=True,
    show_ticklabels=True,
    show_panel_titles=True,
    show_title=True,
    title=(
        "Synthetic SM3 identifiability: recovery, ridge "
        "structure, and identifiability error"
    ),
    show_stats=True,
    show_ci=True,
    n_boot=300,
    ci=0.95,
    boot_seed=7,
    out=str(fig_out),
    out_csv=str(csv_out),
    out_json=str(json_out),
    dpi=160,
)

[OK] wrote /tmp/gp_sg_sm3_ident__5vv7370/sm3_identifiability_gallery (eps,pdf,png,svg)

Step 5 - Show the actual saved figure in the gallery page#

The plotting backend saves the figure and summary files, then closes the figure. We reload the PNG so Sphinx-Gallery shows the real result produced by the project code.

img = mpimg.imread(str(fig_out) + ".png")

fig, ax = plt.subplots(figsize=(8.6, 5.2))
ax.imshow(img)
ax.axis("off")

(np.float64(-0.5), np.float64(1143.5), np.float64(664.5), np.float64(-0.5))

Step 6 - Read the summary table produced by the script#

The real script writes a compact summary CSV and JSON. This is useful because the figure is not just visual: it also exports auditable statistics.

stats_df = pd.read_csv(csv_out)

print("")
print("Summary statistics written by the plotting backend")
print(stats_df.to_string(index=False))

with open(json_out, "r", encoding="utf-8") as f:
    payload = json.load(f)

print("")
print("JSON meta block")
print(json.dumps(payload["meta"], indent=2))

Summary statistics written by the plotting backend
                  name  estimate  ci_lo  ci_hi  nd
         tau_mae_log10    0.0611 0.0573 0.0656   2
                tau_r2    0.9579 0.9517 0.9629   3
             tau_slope    0.9995 0.9840 1.0204   2
           K_mae_log10    0.1213 0.1130 0.1297   2
                  K_r2    0.9405 0.9315 0.9487   3
               K_slope    0.9948 0.9750 1.0155   2
        ridge_spearman    0.7276 0.6803 0.7692   2
err_vs_metric_spearman    0.2027 0.1178 0.2901   2

JSON meta block
{
  "boot_seed": 7,
  "ci": 0.95,
  "k_cl_source": "prior",
  "k_from_tau": false,
  "metric": "eps_prior_rms",
  "n_boot": 300,
  "n_rows": 480,
  "ridge_q50": 0.07668299297833492,
  "ridge_q95": 0.20492275662679227,
  "tau_units": "year"
}

Step 7 - Learn how to read panel (a)#

Panel (a) is the timescale recovery panel.

The dashed diagonal is the ideal line:

estimated tau = true tau

Points near that line mean the effective relaxation time has been recovered well.

Two optional overlays make the panel more informative:

the open circles show tau_cl, the closure-implied timescale computed from the estimated fields,
the open triangles show the prior timescale.

This means panel (a) is teaching three ideas at once:

truth,
estimate,
and physics-guided reference structure.

Step 8 - Learn how to read panel (b)#

Panel (b) is the permeability recovery panel.

In this lesson we keep k_from_tau=False, so the panel shows:

log10(K_true) versus log10(K_est)

That makes it the direct permeability recovery view.

But the real script can switch this panel into a closure-based interpretation with k_from_tau=True. In that mode it plots K implied by tau through the closure formula instead of the directly estimated K.

That option is scientifically useful because it lets the user ask:

“Is tau correct merely as a number, or is it also consistent with the permeability implied by the closure?”

Step 9 - Learn how to read panel (c)#

Panel (c) is the heart of the identifiability lesson.

It plots:

delta_K versus (delta_Ss + 2 delta_Hd)

in log10 space.

If parameter recovery is sliding along the poroelastic ridge, the cloud will tend to align with the 1:1 line, because shifts in K can be compensated by shifts in Ss and Hd.

So this panel should not be read as a simple accuracy plot. It should be read as a trade-off structure plot.

A strong alignment here can mean:

the model is finding a family of nearly equivalent solutions,
the closure is allowing compensation between parameters,
and some parameters may be only weakly identifiable even if the main target still looks acceptable.

Step 10 - Learn how to read panel (d)#

Panel (d) connects recovery error to one chosen identifiability diagnostic.

In this page we used:

eps_prior_rms

so the y-axis shows how the tau error changes as the prior-side identifiability metric worsens.

The real script can instead use:

ridge_resid
eps_prior_rms
closure_consistency_rms

This is powerful because it lets the same figure answer slightly different scientific questions:

does tau error grow with ridge ambiguity?
does tau error grow with prior inconsistency?
does tau error grow with closure inconsistency?

Step 11 - Practical takeaway#

This figure is useful because it separates two ideas that are often mixed together:

good prediction of tau,
good identifiability of the underlying physics.

A run can recover tau reasonably well while still living on a parameter ridge. That is exactly why the ridge panel and the error-versus-metric panel matter.

In practice, this figure is best used when:

designing SM3 experiments,
comparing identifiability regimes,
and explaining why one regime is more scientifically stable than another.

Command-line version#

The same figure can be produced from the command line.

The real script supports:

--csv for the SM3 summary table,
--tau-units with year or sec,
--metric with ridge_resid | eps_prior_rms | closure_consistency_rms,
--k-from-tau with auto | true | false,
--k-cl-source with prior | true | est,
optional filtering with --only-identify and --nx-min,
bootstrap settings through --n-boot, --ci, and --boot-seed,
the usual plot text toggles,
and summary exports through --out-csv and --out-json.

Legacy dispatcher:

python -m scripts plot-sm3-identifiability \
  --csv results/sm3_synth_1d/sm3_synth_runs.csv \
  --tau-units year \
  --metric ridge_resid \
  --k-from-tau auto \
  --k-cl-source prior \
  --show-prior true \
  --show-tau-cl true \
  --n-boot 2000 \
  --ci 0.95 \
  --out sm3-identifiability

Focus on only one identifiability regime:

python -m scripts plot-sm3-identifiability \
  --csv results/sm3_synth_1d/sm3_synth_runs.csv \
  --only-identify both \
  --nx-min 21 \
  --metric eps_prior_rms \
  --out sm3-identifiability-both

Modern CLI:

geoprior plot sm3-identifiability \
  --csv results/sm3_synth_1d/sm3_synth_runs.csv \
  --metric ridge_resid \
  --out sm3-identifiability

The gallery page teaches the figure. The command line reproduces it in a workflow.

Total running time of the script: (0 minutes 8.175 seconds)

Gallery generated by Sphinx-Gallery