SM3 identifiability: learning when recovery is accurate and when parameters slide along a ridge#

This example teaches you how to read the GeoPrior SM3 identifiability figure.

This page is different from the city-scale map figures. Here we are not asking:

  • where the model predicts large subsidence,

  • or which district looks riskier.

Instead, we are asking a more fundamental scientific question:

When the synthetic physics is known, can the model recover the effective parameters in a uniquely interpretable way, or do some parameters slide along a degeneracy ridge?

That is exactly what the SM3 identifiability figure is designed to show.

What the four panels mean#

The plotting backend builds four panels:

  1. timescale recovery \(\log_{10}( au_{true})\) versus \(\log_{10}(\hat{ au})\), with optional overlays for the closure-implied :math:` au_{cl}` and the prior timescale.

  2. permeability recovery either direct \(\log_{10}(K_{true})\) versus \(\log_{10}(\hat{K})\), or closure-implied \(\log_{10}(K_{true})\) versus \(\log_{10}(K_{cl}(\hat{ au}))\).

  3. degeneracy ridge check \(\delta_K\) versus \(\delta_{S_s} + 2\delta_{H_d}\).

  4. error versus identifiability metric \(|\Delta \log_{10} au|\) versus one chosen diagnostic such as ridge_resid, eps_prior_rms, or closure_consistency_rms.

Why this matters#

In poroelastic closures, several parameter combinations can produce very similar behaviour. That means a model may look good on one target while still being only weakly identifiable.

This figure helps the reader separate two ideas that are often confused:

  • recovery quality: did we estimate the right effective value?

  • identifiability quality: did we estimate it for a stable, interpretable reason rather than by sliding along a trade-off ridge?

This gallery page builds a synthetic SM3-style table so the example is fully executable during the docs build.

Imports#

We use the real plotting backend from the project script. So this gallery page teaches the actual figure generator, not a separate demo implementation.

import json
import tempfile
from pathlib import Path

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from geoprior.scripts.plot_sm3_identifiability import (
    plot_sm3_identifiability,
)

Step 1 - Define the synthetic lithology groups#

The real SM3 figure colors points by lith_idx and uses a different marker for each lithology group.

We therefore create four lithology families:

  • Fine

  • Mixed

  • Coarse

  • Rock

Each family gets a characteristic effective parameter scale.

This is not meant to be a perfect physical truth model. It is meant to create a synthetic table that behaves like a useful identifiability experiment.

SEC_PER_YEAR = 365.25 * 24.0 * 3600.0
rng = np.random.default_rng(42)

lith_cfg = {
    0: {
        "name": "Fine",
        "tau_year": 7.5,
        "K_mps": 1.5e-7,
        "Ss": 6.0e-4,
        "Hd_m": 26.0,
    },
    1: {
        "name": "Mixed",
        "tau_year": 4.0,
        "K_mps": 5.0e-7,
        "Ss": 3.5e-4,
        "Hd_m": 22.0,
    },
    2: {
        "name": "Coarse",
        "tau_year": 1.7,
        "K_mps": 1.8e-6,
        "Ss": 1.6e-4,
        "Hd_m": 18.0,
    },
    3: {
        "name": "Rock",
        "tau_year": 0.9,
        "K_mps": 6.5e-6,
        "Ss": 7.5e-5,
        "Hd_m": 14.0,
    },
}

n_per_lith = 120

Step 2 - Build a synthetic SM3 results table#

The real plotting function requires a very specific table structure. Its base columns include:

  • true / prior / estimated tau

  • true / prior / estimated K

  • true / prior / estimated Ss

  • true / prior / estimated Hd

  • the log-offset summaries relative to the true values

  • eps_prior_rms

  • closure_consistency_rms

The key teaching idea here is the latent ridge variable u.

When u changes:

  • K is pushed one way,

  • Ss and Hd can partly compensate,

  • and the system can still look “acceptable” unless we inspect the ridge structure explicitly.

That is exactly the kind of ambiguity panel (c) is meant to expose.

rows: list[dict[str, float | int | str]] = []

for lith_idx, cfg0 in lith_cfg.items():
    for _ in range(n_per_lith):
        tau_true_year = cfg0["tau_year"] * (
            10.0 ** rng.normal(0.0, 0.05)
        )
        K_true_mps = cfg0["K_mps"] * (
            10.0 ** rng.normal(0.0, 0.07)
        )
        Ss_true = cfg0["Ss"] * (
            10.0 ** rng.normal(0.0, 0.05)
        )
        Hd_true = cfg0["Hd_m"] * (
            10.0 ** rng.normal(0.0, 0.04)
        )

        # Lithology-based priors are deliberately broader and
        # slightly biased relative to the truth.
        tau_prior_year = tau_true_year * (
            10.0 ** rng.normal(0.08, 0.10)
        )
        K_prior_mps = K_true_mps * (
            10.0 ** rng.normal(-0.05, 0.16)
        )
        Ss_prior = Ss_true * (
            10.0 ** rng.normal(0.04, 0.14)
        )
        Hd_prior = Hd_true * (
            10.0 ** rng.normal(0.03, 0.10)
        )

        # Latent ridge direction.
        u = rng.normal(0.0, 0.22)

        # A small orthogonal estimation error for tau.
        e_tau = rng.normal(0.0, 0.07)

        tau_est_year = tau_true_year * (
            10.0 ** (0.10 * u + e_tau)
        )

        # Three coordinated offsets that create a ridge-like
        # trade-off structure.
        delta_K_log10 = 0.68 * u + rng.normal(0.0, 0.06)
        delta_Ss_log10 = 0.36 * u + rng.normal(0.0, 0.05)
        delta_Hd_log10 = 0.16 * u + rng.normal(0.0, 0.04)

        K_est_mps = K_true_mps * (10.0 ** delta_K_log10)
        Ss_est = Ss_true * (10.0 ** delta_Ss_log10)
        Hd_est = Hd_true * (10.0 ** delta_Hd_log10)

        ridge_resid_log10 = (
            delta_K_log10
            - (delta_Ss_log10 + 2.0 * delta_Hd_log10)
        )

        # Positive diagnostics used in panel (d).
        eps_prior_rms = (
            0.03
            + 0.45 * abs(ridge_resid_log10)
            + 0.18 * abs(e_tau)
            + rng.uniform(0.0, 0.01)
        )
        closure_consistency_rms = (
            0.025
            + 0.35 * abs(ridge_resid_log10)
            + 0.24 * abs(e_tau)
            + rng.uniform(0.0, 0.01)
        )

        rows.append(
            {
                "identify": "both",
                "nx": 21,
                "lith_idx": int(lith_idx),
                "kappa_b": 1.0,
                "tau_true_year": float(tau_true_year),
                "tau_prior_year": float(tau_prior_year),
                "tau_est_med_year": float(tau_est_year),
                "tau_true_sec": float(
                    tau_true_year * SEC_PER_YEAR
                ),
                "tau_prior_sec": float(
                    tau_prior_year * SEC_PER_YEAR
                ),
                "tau_est_med_sec": float(
                    tau_est_year * SEC_PER_YEAR
                ),
                "K_true_mps": float(K_true_mps),
                "K_prior_mps": float(K_prior_mps),
                "K_est_med_mps": float(K_est_mps),
                "K_est_med_m_per_year": float(
                    K_est_mps * SEC_PER_YEAR
                ),
                "Ss_true": float(Ss_true),
                "Ss_prior": float(Ss_prior),
                "Ss_est_med": float(Ss_est),
                "Hd_true": float(Hd_true),
                "Hd_prior": float(Hd_prior),
                "Hd_est_med": float(Hd_est),
                # Important: the real script divides these by ln(10),
                # so we store natural-log offsets here.
                "vs_true_delta_K_q50": float(
                    np.log(K_est_mps / K_true_mps)
                ),
                "vs_true_delta_Ss_q50": float(
                    np.log(Ss_est / Ss_true)
                ),
                "vs_true_delta_Hd_q50": float(
                    np.log(Hd_est / Hd_true)
                ),
                "eps_prior_rms": float(eps_prior_rms),
                "closure_consistency_rms": float(
                    closure_consistency_rms
                ),
            }
        )

df = pd.DataFrame(rows)

print("Synthetic SM3 table")
print(df.head().to_string(index=False))
Synthetic SM3 table
identify  nx  lith_idx  kappa_b  tau_true_year  tau_prior_year  tau_est_med_year   tau_true_sec  tau_prior_sec  tau_est_med_sec  K_true_mps  K_prior_mps  K_est_med_mps  K_est_med_m_per_year  Ss_true  Ss_prior  Ss_est_med  Hd_true  Hd_prior  Hd_est_med  vs_true_delta_K_q50  vs_true_delta_Ss_q50  vs_true_delta_Hd_q50  eps_prior_rms  closure_consistency_rms
    both  21         0   1.0000         7.7678          5.9593            6.7642 245132607.2674 188060490.9750   213460923.5387      0.0000       0.0000         0.0000                4.4941   0.0007    0.0007      0.0007  28.3528   28.2470     28.4869               0.1157                0.0865                0.0047         0.0528                   0.0468
    both  21         0   1.0000         6.7935          8.0743            6.2909 214388066.4050 254805041.2998   198526282.9055      0.0000       0.0000         0.0000                4.5369   0.0005    0.0005      0.0006  28.1911   40.0281     28.7932              -0.1019                0.0331                0.0211         0.0716                   0.0660
    both  21         0   1.0000         9.5972         13.2966           10.2223 302863969.7583 419609732.5377   322589716.5510      0.0000       0.0000         0.0000                3.6983   0.0006    0.0006      0.0005  24.1225   21.3013     21.2217              -0.1813               -0.0878               -0.1281         0.0781                   0.0645
    both  21         0   1.0000         7.6912          9.3919            7.0151 242716700.8611 296385919.2480   221381235.9562      0.0000       0.0000         0.0000                4.4672   0.0006    0.0008      0.0006  27.6777   21.2038     30.9510              -0.1984               -0.0900                0.1118         0.1075                   0.0881
    both  21         0   1.0000         6.1790          8.7507            6.2575 194994221.6425 276149588.1645   197472174.5160      0.0000       0.0000         0.0000                5.0532   0.0006    0.0006      0.0006  27.4424   26.4355     27.0299               0.1193                0.0260               -0.0151         0.0594                   0.0499

Step 3 - Read the key synthetic signals before plotting#

A good teaching page should show the logic before the figure.

Here we compute two helpful summaries:

  • the median ridge residual by lithology,

  • the median tau recovery error by lithology.

These are not required by the plotting function itself, but they help the reader understand what the figure is about.

ridge_log10 = (
    df["vs_true_delta_K_q50"].to_numpy() / np.log(10.0)
    - (
        df["vs_true_delta_Ss_q50"].to_numpy() / np.log(10.0)
        + 2.0 * df["vs_true_delta_Hd_q50"].to_numpy() / np.log(10.0)
    )
)

tau_err_log10 = np.abs(
    np.log10(df["tau_est_med_year"].to_numpy())
    - np.log10(df["tau_true_year"].to_numpy())
)

summary_rows = []

for lith_idx, sub in df.groupby("lith_idx", sort=True):
    mask = df["lith_idx"].eq(lith_idx).to_numpy()
    summary_rows.append(
        {
            "lith_idx": int(lith_idx),
            "median_|ridge|_log10": float(
                np.median(np.abs(ridge_log10[mask]))
            ),
            "median_|dlog10_tau|": float(
                np.median(tau_err_log10[mask])
            ),
        }
    )

summary_df = pd.DataFrame(summary_rows)
print("")
print("Lithology summaries")
print(summary_df.to_string(index=False))
Lithology summaries
 lith_idx  median_|ridge|_log10  median_|dlog10_tau|
        0                0.0700               0.0520
        1                0.0914               0.0534
        2                0.0717               0.0499
        3                0.0741               0.0517

Step 4 - Render the real SM3 identifiability figure#

We call the actual plotting backend.

A few choices are worth explaining:

  • tau_units="year" is friendlier for the first lesson,

  • metric="eps_prior_rms" makes panel (d) different from panel (c), so the page teaches two kinds of identifiability evidence rather than repeating the same one,

  • k_from_tau=False keeps panel (b) as direct permeability recovery,

  • show_tau_cl=True and show_prior=True make panel (a) richer and more instructive,

  • n_boot is reduced from the paper default so the docs build stays lightweight.

tmp_dir = Path(
    tempfile.mkdtemp(prefix="gp_sg_sm3_ident_")
)

fig_out = tmp_dir / "sm3_identifiability_gallery"
csv_out = tmp_dir / "sm3_identifiability_summary.csv"
json_out = tmp_dir / "sm3_identifiability_summary.json"

plot_sm3_identifiability(
    df,
    tau_units="year",
    metric="eps_prior_rms",
    k_from_tau=False,
    k_cl_source="prior",
    show_prior=True,
    show_tau_cl=True,
    show_legend=True,
    show_labels=True,
    show_ticklabels=True,
    show_panel_titles=True,
    show_title=True,
    title=(
        "Synthetic SM3 identifiability: recovery, ridge "
        "structure, and identifiability error"
    ),
    show_stats=True,
    show_ci=True,
    n_boot=300,
    ci=0.95,
    boot_seed=7,
    out=str(fig_out),
    out_csv=str(csv_out),
    out_json=str(json_out),
    dpi=160,
)
[OK] wrote /tmp/gp_sg_sm3_ident__5vv7370/sm3_identifiability_gallery (eps,pdf,png,svg)

Step 6 - Read the summary table produced by the script#

The real script writes a compact summary CSV and JSON. This is useful because the figure is not just visual: it also exports auditable statistics.

stats_df = pd.read_csv(csv_out)

print("")
print("Summary statistics written by the plotting backend")
print(stats_df.to_string(index=False))

with open(json_out, "r", encoding="utf-8") as f:
    payload = json.load(f)

print("")
print("JSON meta block")
print(json.dumps(payload["meta"], indent=2))
Summary statistics written by the plotting backend
                  name  estimate  ci_lo  ci_hi  nd
         tau_mae_log10    0.0611 0.0573 0.0656   2
                tau_r2    0.9579 0.9517 0.9629   3
             tau_slope    0.9995 0.9840 1.0204   2
           K_mae_log10    0.1213 0.1130 0.1297   2
                  K_r2    0.9405 0.9315 0.9487   3
               K_slope    0.9948 0.9750 1.0155   2
        ridge_spearman    0.7276 0.6803 0.7692   2
err_vs_metric_spearman    0.2027 0.1178 0.2901   2

JSON meta block
{
  "boot_seed": 7,
  "ci": 0.95,
  "k_cl_source": "prior",
  "k_from_tau": false,
  "metric": "eps_prior_rms",
  "n_boot": 300,
  "n_rows": 480,
  "ridge_q50": 0.07668299297833492,
  "ridge_q95": 0.20492275662679227,
  "tau_units": "year"
}

Step 7 - Learn how to read panel (a)#

Panel (a) is the timescale recovery panel.

The dashed diagonal is the ideal line:

estimated tau = true tau

Points near that line mean the effective relaxation time has been recovered well.

Two optional overlays make the panel more informative:

  • the open circles show tau_cl, the closure-implied timescale computed from the estimated fields,

  • the open triangles show the prior timescale.

This means panel (a) is teaching three ideas at once:

  • truth,

  • estimate,

  • and physics-guided reference structure.

Step 8 - Learn how to read panel (b)#

Panel (b) is the permeability recovery panel.

In this lesson we keep k_from_tau=False, so the panel shows:

log10(K_true) versus log10(K_est)

That makes it the direct permeability recovery view.

But the real script can switch this panel into a closure-based interpretation with k_from_tau=True. In that mode it plots K implied by tau through the closure formula instead of the directly estimated K.

That option is scientifically useful because it lets the user ask:

“Is tau correct merely as a number, or is it also consistent with the permeability implied by the closure?”

Step 9 - Learn how to read panel (c)#

Panel (c) is the heart of the identifiability lesson.

It plots:

delta_K versus (delta_Ss + 2 delta_Hd)

in log10 space.

If parameter recovery is sliding along the poroelastic ridge, the cloud will tend to align with the 1:1 line, because shifts in K can be compensated by shifts in Ss and Hd.

So this panel should not be read as a simple accuracy plot. It should be read as a trade-off structure plot.

A strong alignment here can mean:

  • the model is finding a family of nearly equivalent solutions,

  • the closure is allowing compensation between parameters,

  • and some parameters may be only weakly identifiable even if the main target still looks acceptable.

Step 10 - Learn how to read panel (d)#

Panel (d) connects recovery error to one chosen identifiability diagnostic.

In this page we used:

eps_prior_rms

so the y-axis shows how the tau error changes as the prior-side identifiability metric worsens.

The real script can instead use:

  • ridge_resid

  • eps_prior_rms

  • closure_consistency_rms

This is powerful because it lets the same figure answer slightly different scientific questions:

  • does tau error grow with ridge ambiguity?

  • does tau error grow with prior inconsistency?

  • does tau error grow with closure inconsistency?

Step 11 - Practical takeaway#

This figure is useful because it separates two ideas that are often mixed together:

  • good prediction of tau,

  • good identifiability of the underlying physics.

A run can recover tau reasonably well while still living on a parameter ridge. That is exactly why the ridge panel and the error-versus-metric panel matter.

In practice, this figure is best used when:

  • designing SM3 experiments,

  • comparing identifiability regimes,

  • and explaining why one regime is more scientifically stable than another.

Command-line version#

The same figure can be produced from the command line.

The real script supports:

  • --csv for the SM3 summary table,

  • --tau-units with year or sec,

  • --metric with ridge_resid | eps_prior_rms | closure_consistency_rms,

  • --k-from-tau with auto | true | false,

  • --k-cl-source with prior | true | est,

  • optional filtering with --only-identify and --nx-min,

  • bootstrap settings through --n-boot, --ci, and --boot-seed,

  • the usual plot text toggles,

  • and summary exports through --out-csv and --out-json.

Legacy dispatcher:

python -m scripts plot-sm3-identifiability \
  --csv results/sm3_synth_1d/sm3_synth_runs.csv \
  --tau-units year \
  --metric ridge_resid \
  --k-from-tau auto \
  --k-cl-source prior \
  --show-prior true \
  --show-tau-cl true \
  --n-boot 2000 \
  --ci 0.95 \
  --out sm3-identifiability

Focus on only one identifiability regime:

python -m scripts plot-sm3-identifiability \
  --csv results/sm3_synth_1d/sm3_synth_runs.csv \
  --only-identify both \
  --nx-min 21 \
  --metric eps_prior_rms \
  --out sm3-identifiability-both

Modern CLI:

geoprior plot sm3-identifiability \
  --csv results/sm3_synth_1d/sm3_synth_runs.csv \
  --metric ridge_resid \
  --out sm3-identifiability

The gallery page teaches the figure. The command line reproduces it in a workflow.

Total running time of the script: (0 minutes 8.175 seconds)

Gallery generated by Sphinx-Gallery