Identifiability#

GeoPrior-v3 treats identifiability as part of the model design, not as a post hoc warning in the discussion section.

This matters because the flagship model does not only predict subsidence and groundwater trajectories. It also learns effective physical fields such as:

\(K(x,y)\)
\(S_s(x,y)\)
\(\tau(x,y)\)
\(H_d(x,y)\)

and uses them inside a coupled residual system. In problems like this, multiple parameter combinations can often explain similar observed behavior. That is a classic inverse-problem difficulty, and it remains central in physics-informed and poroelastic surrogate models [3, 4, 10, 13].

GeoPrior addresses this directly by exposing explicit identifiability regimes, parameter locks, closure anchors, and diagnostic summaries.

Why identifiability matters here#

A physics-guided model can fit observations well while still learning physically ambiguous or weakly constrained internal fields.

For example, in the GeoPrior closure

(1)#\[\tau_{phys} \approx \frac{H_d^2 \, S_s}{\pi^2 \, \kappa \, K},\]

different combinations of:

\(K\)
\(S_s\)
\(H_d\)
\(\tau\)

can produce similar effective timescales.

This creates ridge-like trade-offs in parameter space. The supervised outputs may still look good, but the inferred fields can become hard to interpret if the problem is not constrained carefully.

That is the core identifiability issue in GeoPrior.
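One way to see the ridge is to take logarithms of closure (1). The block below is a worked restatement of that closure, not an additional model equation:

```latex
\[
\log \tau_{phys} \approx 2\log H_d + \log S_s - \log K - \log(\pi^2 \kappa)
\]
```

Any perturbation with \(2\,\delta\log H_d + \delta\log S_s - \delta\log K = 0\) leaves \(\tau_{phys}\) unchanged, so the closure on its own pins down only one linear combination of the log-parameters.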

A useful intuition#

A practical way to think about the problem is:

data fit alone
     ↓
can constrain observable trajectories
     ↓
but may not uniquely constrain K, Ss, tau, Hd
     ↓
so the model needs explicit identifiability structure

This is why GeoPrior has a dedicated identifiability layer.

Where identifiability enters the model#

Identifiability in GeoPrior is not handled in a single place.

It enters through:

the closure design itself,
the scaling and bounds contract,
the compile-time weight defaults,
the training warmup and ramp schedules,
the ability to freeze or lock selected heads,
and the diagnostic audit layer.

So the identifiability story is distributed across:

model construction,
scaling kwargs,
compile defaults,
training behavior,
and post hoc diagnostics.

This is why the identifiability page belongs in the scientific foundations section rather than in a narrow debugging appendix.

The main source of non-identifiability#

The most important structural source of non-identifiability in GeoPrior is the interaction between:

the closure-based timescale,
the compaction equilibrium,
and the learned physical fields.

Timescale ridge#

The closure prior says, conceptually:

(2)#\[\tau_{phys} \propto \frac{H_d^2 S_s}{K}.\]

So increasing \(H_d\), increasing \(S_s\), or decreasing \(K\) can all lengthen the implied timescale. Without additional constraints, several different field combinations may generate similar effective dynamics.
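The ridge can be checked numerically. The following is a minimal sketch, not GeoPrior code: `tau_closure` is a hypothetical helper that evaluates closure (1) with an assumed \(\kappa = 1\), and the parameter values are made up for illustration (units are left unspecified).

```python
import math


def tau_closure(K, Ss, Hd, kappa=1.0):
    """Closure timescale: Hd^2 * Ss / (pi^2 * kappa * K)."""
    return Hd**2 * Ss / (math.pi**2 * kappa * K)


# Two made-up parameter sets: quadrupling K is offset by doubling Hd.
tau_a = tau_closure(K=1e-6, Ss=1e-4, Hd=20.0)
tau_b = tau_closure(K=4e-6, Ss=1e-4, Hd=40.0)

# Both sets imply the same timescale up to floating-point rounding.
same_tau = math.isclose(tau_a, tau_b, rel_tol=1e-9)
print(same_tau)
```

A data fit that only sees the effective timescale cannot distinguish these two sets.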

Settlement ridge#

The equilibrium settlement proxy is:

(3)#\[s_{eq}(h) \approx S_s \, \Delta h \, H.\]

So the same observed compaction tendency can also be traded off among:

storage-like response,
drawdown magnitude,
and thickness.

Together, these relations mean the model can have enough freedom to fit the forecast targets while still remaining ambiguous in how it explains them internally.

That is why identifiability must be designed into the model.
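Even taken together, the two relations leave a family of equivalent solutions. The sketch below assumes, purely for illustration, that the thickness \(H\) in (3) equals the drainage thickness \(H_d\) in (2); the helper names and numbers are hypothetical.

```python
import math


def tau_closure(K, Ss, H, kappa=1.0):
    """Timescale relation (2), with an assumed kappa and pi^2 prefactor."""
    return H**2 * Ss / (math.pi**2 * kappa * K)


def s_eq(Ss, delta_h, H):
    """Equilibrium settlement proxy (3)."""
    return Ss * delta_h * H


delta_h = 10.0  # made-up drawdown
base = {"K": 2e-6, "Ss": 1e-4, "H": 30.0}
# Double Ss, halve H, halve K: both relations are left unchanged.
alt = {"K": 1e-6, "Ss": 2e-4, "H": 15.0}


def observables(p):
    """Return the (timescale, equilibrium settlement) pair for a parameter set."""
    return tau_closure(p["K"], p["Ss"], p["H"]), s_eq(p["Ss"], delta_h, p["H"])


tau_base, s_base = observables(base)
tau_alt, s_alt = observables(alt)
```

Here `base` and `alt` differ by a factor of two in every field yet imply the same timescale and the same equilibrium settlement, which is why extra structure (bounds, priors, locks) is needed.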

The GeoPrior strategy#

GeoPrior uses three main strategies to improve identifiability.

1. Constrain by construction

Use closure-based relationships so parameters are linked instead of floating independently.

2. Control training freedom

Freeze or relax selected parts of the physics branch through named regimes, locks, and warmup schedules.

3. Audit what was actually learned

Export JSON-safe audits and payload-based summaries so the model’s internal physical interpretation can be inspected after the run.

This combination goes further than simply noting that “inverse problems can be ill-posed.”

Identifiability regimes#

GeoPrior exposes identifiability through the public model argument:

identifiability_regime="base"

The current model validation and constructor signature accept four named regimes:

base
anchored
closure_locked
data_relaxed

These are not decorative presets. They change defaults in both:

the scaling / closure behavior,
and the compile-time physics weights.

How regimes are applied#

The identifiability helper init_identifiability(...):

normalizes the regime name,
loads the matching profile,
merges profile defaults into scaling_kwargs,
does not override user-provided keys,
ensures the bounds-loss config exists in dict form.

That “do not override user-provided keys” rule is important. It means the regime behaves like a scientific default profile, not like a hard-coded override that destroys experiment-specific settings.
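The merge rule can be sketched in a few lines. This is an illustration of the documented behavior, not the body of init_identifiability(...): the function names are hypothetical, and the exact normalization (strip and lower-case) is an assumption.

```python
VALID_REGIMES = ("base", "anchored", "closure_locked", "data_relaxed")


def normalize_regime(name):
    """Strip and lower-case the name, then validate it (assumed behavior)."""
    key = str(name).strip().lower()
    if key not in VALID_REGIMES:
        raise ValueError(f"unknown identifiability regime: {name!r}")
    return key


def merge_profile_defaults(scaling_kwargs, profile_defaults):
    """Fill in profile defaults without overriding user-provided keys."""
    merged = dict(scaling_kwargs or {})
    for key, value in profile_defaults.items():
        merged.setdefault(key, value)
    return merged


# Two defaults that the base profile is documented to set.
base_defaults = {
    "freeze_physics_fields_over_time": True,
    "bounds_loss_kind": "barrier",
}
# The user explicitly opts out of one of them.
user_kwargs = {"freeze_physics_fields_over_time": False}

merged = merge_profile_defaults(user_kwargs, base_defaults)
```

In `merged`, the user's `freeze_physics_fields_over_time = False` survives, while `bounds_loss_kind` is filled in from the profile.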

The four built-in regimes#

Base#

base is the conservative default regime.

Its current profile sets:

freeze_physics_fields_over_time = True
bounds_loss_kind = "barrier"
moderate physics warmup and ramp defaults
compile defaults such as lambda_bounds = 1.0 and lambda_prior = 1.0.

Conceptually, base says:

keep the physics fields time-stable,
apply a standard closure prior,
and use moderate bounds / prior pressure.

This is the right regime when you want a scientifically disciplined baseline without very aggressive locking.

Anchored#

anchored makes the physics branch substantially more constrained.

Its profile currently adds or strengthens settings such as:

freeze_physics_fields_over_time = True
allow_subs_residual = False
barrier-style bounds
stronger bounds and prior penalties
stop_grad_ref = True
drawdown_zero_at_origin = True
shorter warmup but longer ramp
stronger compile defaults, for example lambda_bounds = 10.0, lambda_prior = 10.0, lambda_cons = 50.0.

Conceptually, anchored says:

keep the closure stable,
stop the model from using overly flexible subsidence residual freedom,
and strongly discourage parameter drift away from the prior structure.

This is useful when you want a more interpretable but less free inverse solution.

Closure-locked#

closure_locked is the strongest closure-oriented regime in the current built-in set.

It keeps many of the anchored constraints and additionally locks the tau_head by setting:

"locks": {"tau_head": True}

in the profile. It also uses a very strong prior default, currently:

lambda_prior = 100.0

Conceptually, this regime says:

keep the learned timescale extremely close to the closure,
strongly suppress free timescale drift,
and interpret the model more like a closure-driven hydrogeological surrogate than a free inverse learner.

This is especially useful for:

ablation studies,
identifiability experiments,
synthetic validation,
and cases where you want to isolate the effect of the closure itself.

Data-relaxed#

data_relaxed is the least constrained built-in regime.

Its profile currently sets:

freeze_physics_fields_over_time = False
allow_subs_residual = True
a residual-style bounds loss instead of barrier-style
weaker bounds and prior defaults
stop_grad_ref = False
drawdown_zero_at_origin = False
much longer warmup and ramp
compile defaults such as lambda_prior = 0.0 and lambda_gw = 0.0.

Conceptually, this regime says:

allow the model more freedom to follow the data,
relax strong prior control,
and tolerate greater flexibility in the subsidence pathway.

This is useful for exploratory forecasting or when the user cares more about predictive fit than about aggressively constrained internal fields.

A regime comparison map#

A useful summary is:

base
  moderate prior + bounds, time-frozen fields

anchored
  stronger prior + stronger bounds, no subs residual freedom

closure_locked
  anchored-like + tau head locked close to closure

data_relaxed
  weaker prior, more data freedom, slower physics ramp

Why `tau` is central#

In the current implementation, the most important parameter for identifiability is usually \(\tau\).

That is because \(\tau\) sits at the intersection of:

settlement delay,
hydraulic conductivity,
storage,
and effective drainage thickness.

If \(\tau\) is left completely free, it can absorb closure mismatch and make the inferred \(K\) / \(S_s\) relationship harder to interpret.

That is why GeoPrior provides:

closure-based \(\tau_{phys}\),
log-space prior mismatch,
closure_locked behavior,
and even scenario utilities where \(K\) is derived from \(\tau\) instead of learned freely.

Locks and parameter freezing#

GeoPrior can apply explicit parameter-head locks through apply_ident_locks(...).

The current locking logic can freeze any of:

tau_head
K_head
Ss_head

when the active profile requests it. The built-in closure_locked regime currently freezes tau_head.

This is scientifically meaningful because it lets the user separate two different questions:

Can the model fit the data with closure-driven fields?
What changes when one field is allowed to move freely?

That is the essence of a good identifiability ablation.
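The locking idea can be illustrated with plain Python objects. This is a sketch only: `Head` and `apply_locks` are hypothetical stand-ins, not the real apply_ident_locks(...), and the `trainable` flag mimics the attribute that Keras-style layers expose.

```python
from dataclasses import dataclass


@dataclass
class Head:
    """Stand-in for a parameter head with a Keras-style trainable flag."""
    name: str
    trainable: bool = True


def apply_locks(heads, locks):
    """Freeze every head that the profile's lock table marks as True."""
    frozen = []
    for name, locked in locks.items():
        if locked and name in heads:
            heads[name].trainable = False
            frozen.append(name)
    return frozen


heads = {n: Head(n) for n in ("tau_head", "K_head", "Ss_head")}
# Lock table as documented for the closure_locked profile.
frozen = apply_locks(heads, {"tau_head": True})
```

After the call only `tau_head` is frozen; `K_head` and `Ss_head` remain trainable, which is the setup needed to ask what changes when one field is allowed to move.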

The tau-only scenario#

The helper module also implements an explicit identifiability scenario:

- learn tau only
- derive K from tau via closure
- freeze (or fix) Ss and Hd

This is implemented by scenario_tau_only_derive_K(...). The corresponding closure is:

(4)#\[\tau = \frac{H_d^2\,S_s}{\pi^2\,\kappa_b\,K}\]

so that

(5)#\[K = \frac{H_d^2\,S_s}{\pi^2\,\kappa_b\,\tau}.\]

This is one of the clearest examples of “breaking non-identifiability ridges by construction,” which is exactly how the module describes its purpose.
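Equations (4) and (5) are algebraic inverses of each other, which a short round-trip check makes concrete. The helpers below are hypothetical scalar versions written for this page; they are not the signatures of scenario_tau_only_derive_K(...) or derive_K_from_tau_np(...), and \(\kappa_b = 1\) is an assumed default.

```python
import math


def tau_from_k(K, Ss, Hd, kappa_b=1.0):
    """Closure (4): tau = Hd^2 * Ss / (pi^2 * kappa_b * K)."""
    return Hd**2 * Ss / (math.pi**2 * kappa_b * K)


def k_from_tau(tau, Ss, Hd, kappa_b=1.0):
    """Inverted closure (5): K = Hd^2 * Ss / (pi^2 * kappa_b * tau)."""
    return Hd**2 * Ss / (math.pi**2 * kappa_b * tau)


# Made-up fixed fields and a made-up "true" conductivity.
Ss, Hd, K_true = 1e-4, 25.0, 3e-6

tau = tau_from_k(K_true, Ss, Hd)      # timescale implied by the fields
K_back = k_from_tau(tau, Ss, Hd)      # conductivity recovered from tau
```

With \(S_s\) and \(H_d\) held fixed, each value of \(\tau\) maps to exactly one \(K\), so the ridge collapses to a single point.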

Why this scenario is useful#

The tau-only scenario is especially useful when you want to ask:

if \(S_s\) and \(H_d\) are fixed or trusted, what timescale is the data asking for?
once that timescale is known, what effective conductivity \(K\) does the closure imply?

This is a much more identifiable question than trying to learn \(K\), \(S_s\), \(H_d\), and \(\tau\) all completely freely from the same observed trajectory.

Compile-time identifiability defaults#

Identifiability in GeoPrior also enters through the compile defaults.

The helper resolve_compile_weights(...) merges the active profile’s recommended compile weights with any user-specified values. The selection rule is:

if the user provides a value explicitly, that wins;
otherwise the profile default is used;
otherwise a fallback default is used.

This design keeps the model scientifically guided while leaving every built-in regime overridable.

A useful consequence is that identifiability is not only about parameter constraints. It is also about which residuals are given enough optimization pressure to matter.
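The three-tier precedence can be sketched with `collections.ChainMap`. This is an illustration, not the body of resolve_compile_weights(...): the function name is hypothetical, treating `None` as "not provided" is an assumption, and the fallback values are invented. The profile values are the ones listed for anchored on this page.

```python
from collections import ChainMap


def resolve_weights(user, profile, fallback):
    """Pick each weight from user values first, then profile, then fallback."""
    # Assumption: a None entry means the user did not set that weight.
    explicit = {k: v for k, v in (user or {}).items() if v is not None}
    return dict(ChainMap(explicit, profile, fallback))


anchored_profile = {"lambda_bounds": 10.0, "lambda_prior": 10.0, "lambda_cons": 50.0}
fallback = {"lambda_gw": 1.0, "lambda_prior": 1.0}  # hypothetical fallbacks
user = {"lambda_prior": 5.0, "lambda_cons": None}

weights = resolve_weights(user, anchored_profile, fallback)
```

Here `lambda_prior` comes from the user, `lambda_cons` and `lambda_bounds` from the profile, and `lambda_gw` from the fallback table.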

How regimes affect training behavior#

The current profiles do more than change lambdas.

They also change scaling-side and training-side behaviors such as:

freeze_physics_fields_over_time
allow_subs_residual
bounds_loss_kind
bounds_beta
bounds_guard
bounds_w
bounds_include_tau
bounds_tau_w
stop_grad_ref
drawdown_zero_at_origin
physics_warmup_steps
physics_ramp_steps

These settings matter because identifiability is not only about closed-form algebra. It is also about how much freedom the network is allowed to exploit during training.

Examples#

A few especially important examples are:

freeze_physics_fields_over_time: Prevents the model from explaining the same site with arbitrarily time-varying physical fields.
allow_subs_residual = False: Reduces freedom in the settlement branch, pushing the model to follow the closure more closely.
stop_grad_ref = True: Stops the model from “solving” drawdown inconsistencies by moving the reference head itself.
drawdown_zero_at_origin = True: Anchors the drawdown convention more strongly near the reference point.
physics_warmup_steps and physics_ramp_steps: Control how quickly physics pressure becomes active, which strongly affects whether the model finds a stable interpretable solution or a flexible but ambiguous one.
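To make the warmup and ramp settings concrete, the sketch below shows one plausible schedule: no physics pressure during warmup, then a linear ramp to full strength. The linear shape and the function name are assumptions made for illustration; the schedule GeoPrior actually applies may differ.

```python
def physics_multiplier(step, warmup_steps, ramp_steps):
    """Return a factor in [0, 1] that scales the physics loss at a given step."""
    if step < warmup_steps:
        return 0.0                      # warmup: data terms only
    if ramp_steps <= 0:
        return 1.0                      # no ramp: switch on immediately
    progress = (step - warmup_steps) / ramp_steps
    return min(1.0, progress)           # linear ramp, then hold at 1


# Made-up schedule: 100 warmup steps followed by a 200-step ramp.
schedule = [physics_multiplier(s, 100, 200) for s in (0, 100, 200, 300, 1000)]
print(schedule)
```

A longer warmup lets the forecasting backbone settle before the physics terms start constraining the fields; a longer ramp makes that transition more gradual.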

Identifiability audits#

GeoPrior includes a JSON-safe audit helper:

from geoprior.models.subsidence import ident_audit_dict

The audit records:

the active regime name,
whether identifiability is enabled,
the profile keys used,
active locks,
the trainable status of heads such as tau_head, K_head, and Ss_head,
the effective lambda weights,
selected scaling keys taken from the profile,
selected scaling keys actually active on the model,
and the current bounds-loss form.

This is extremely useful for experiment logs, manifests, and evaluation JSON because it lets the user confirm what regime was actually active rather than what they intended to run.

Best practice

Save the output of ident_audit_dict(model) for every serious identifiability experiment.

It records which regime, locks, and weights were actually in force, so the run can be interpreted correctly later.
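Because the audit is JSON-safe, persisting it needs only the standard library. The sketch below uses a stand-in dictionary so that it runs without a trained model; its keys are illustrative and do not reproduce the real audit schema. `save_audit` is a hypothetical helper.

```python
import json
import os
import tempfile


def save_audit(audit, path):
    """Write an audit dictionary to disk as stable, human-readable JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(audit, fh, indent=2, sort_keys=True)


# In a real run this would be: audit = ident_audit_dict(model)
audit = {"regime": "closure_locked", "locks": {"tau_head": True}}  # stand-in

out_path = os.path.join(tempfile.mkdtemp(), "ident_audit.json")
save_audit(audit, out_path)

# Reload to confirm the file round-trips.
with open(out_path, encoding="utf-8") as fh:
    reloaded = json.load(fh)
```

Sorting the keys keeps the file diff-friendly when audits from different runs are compared.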

Payload-based identifiability diagnostics#

GeoPrior also exposes post hoc identifiability diagnostics through the exported physics payload.

The public helpers include:

identifiability_diagnostics_from_payload(...)
summarise_effective_params(...)
derive_K_from_tau_np(...)

These are especially useful for synthetic or controlled experiments where “true” effective parameters are known.

What the payload diagnostics measure#

The current payload diagnostics explicitly compute three blocks:

relative error in \(\tau\)
log-timescale residual for the closure
log-offsets of \(K\), \(S_s\), and \(H_d\) relative to true values and priors.

That is scientifically meaningful because it separates three different questions:

did the model recover the effective timescale?
did the closure stay consistent with the learned fields?
how far did the learned physical fields drift away from the truth or from the priors?
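The three blocks can be written as scalar formulas. The following is a sketch under stated assumptions, not the implementation of identifiability_diagnostics_from_payload(...): the function names, the natural-log convention, the sign of each residual, and \(\kappa = 1\) are all choices made for illustration.

```python
import math


def closure_tau(K, Ss, Hd, kappa=1.0):
    """Timescale implied by the closure for a given set of fields."""
    return Hd**2 * Ss / (math.pi**2 * kappa * K)


def tau_rel_error(tau_est, tau_true):
    """Block 1: relative error of the learned timescale."""
    return abs(tau_est - tau_true) / abs(tau_true)


def closure_log_resid(tau, K, Ss, Hd, kappa=1.0):
    """Block 2: log(tau) minus log of the closure timescale."""
    return math.log(tau) - math.log(closure_tau(K, Ss, Hd, kappa))


def log_offset(estimate, reference):
    """Block 3: log-ratio of a learned field to its true or prior value."""
    return math.log(estimate / reference)


# Made-up synthetic case: learned K is twice the true K, other fields match.
Ss, Hd, K_true, K_est = 1e-4, 20.0, 1e-6, 2e-6
tau_true = closure_tau(K_true, Ss, Hd)
tau_est = closure_tau(K_est, Ss, Hd)   # learned tau follows the learned fields

rel_err = tau_rel_error(tau_est, tau_true)
resid = closure_log_resid(tau_est, K_est, Ss, Hd)
k_offset = log_offset(K_est, K_true)
```

In this example the closure residual is zero while the timescale is off by half and \(K\) is offset by \(\log 2\): the closure is internally consistent, yet the fields have not been recovered. That is the distinction the three blocks are meant to expose.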

Summarising effective parameters#

For 1D or synthetic-column studies, the helper summarise_effective_params(...) collapses 1D arrays of payload quantities such as:

tau
tau_prior
K
Ss
Hd

into representative scalar summaries, currently using the median of finite values.

This makes it much easier to compare runs when the real scientific question is about one effective column-scale parameter set rather than about a large spatial field.
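The reduction itself is simple. Below is a minimal stdlib sketch of "median of finite values"; `median_finite` is a hypothetical name, and returning NaN for an all-non-finite input is an assumption rather than documented behavior of summarise_effective_params(...).

```python
import math
import statistics


def median_finite(values):
    """Median of the finite entries; NaN if none are finite (assumed)."""
    finite = [v for v in values if math.isfinite(v)]
    return statistics.median(finite) if finite else math.nan


# Made-up 1D payload column with a NaN and an infinity mixed in.
tau_column = [1.0, math.nan, 3.0, math.inf, 2.0]
tau_eff = median_finite(tau_column)
```

Using the median rather than the mean keeps a few extreme cells from dominating the column-scale summary.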

A good synthetic identifiability workflow#

A useful synthetic workflow is:

train with a chosen identifiability regime
     ↓
export physics payload
     ↓
summarise effective params
     ↓
compare tau, tau_prior, K, Ss, Hd with true and prior values
     ↓
inspect closure_log_resid and parameter offsets

This is exactly the sort of loop that turns identifiability from an abstract concern into a measurable property of the model.

How to think about the regimes scientifically#

A good conceptual spectrum is:

data_relaxed favors predictive freedom;
base gives a balanced default;
anchored favors interpretable closure-constrained behavior;
closure_locked is closest to a closure-dominant identifiability test.

This means the regimes are not merely “more or less strict.” They are really different scientific stances about which degrees of freedom are allowed to explain the data.

When to use each regime#

Use `base` when#

you want the default scientific baseline;
you want moderate prior and bounds pressure;
you want time-frozen physical fields without very strong closure locking.

Use `anchored` when#

you want stronger physical discipline;
you want to suppress free subsidence residual behavior;
you want a more interpretable but less flexible inverse solution.

Use `closure_locked` when#

you want to test closure-dominant behavior;
you want \(\tau\) to remain tightly linked to the closure;
you are running synthetic recovery or ablation studies.

Use `data_relaxed` when#

predictive fit matters more than strict physical interpretability;
you want more data freedom;
you want a softer starting point for exploratory runs.

Common failure modes#

Good prediction, poor parameter identifiability#

This is the most common failure mode.

The model can still predict well because the observable state trajectory is easier to constrain than the internal fields.

Typical remedies are to:

move from data_relaxed to base or anchored;
inspect the closure log residual;
add or strengthen bounds and prior terms;
freeze more of the physics structure.

Closure looks good, fields still drift#

This can happen when the closure itself is satisfied but the field decomposition remains ambiguous.

Typical remedies are to:

compare offsets vs true values and vs prior values;
use a tau-only or partially frozen scenario;
inspect whether freeze_physics_fields_over_time should be enabled.

Bounds dominate too early#

If bounds and prior penalties are too aggressive too early, the model may become overly rigid before the forecasting backbone has stabilized.

Typical remedies are to:

lengthen warmup and ramp;
reduce bounds strength;
move from anchored to base first.

Data-relaxed solutions look better but are harder to trust#

This is expected. More flexibility can improve apparent fit while reducing interpretability.

That is why identifiability should be reported explicitly, not inferred from forecast quality alone.

A compact identifiability map#

The GeoPrior identifiability story can be summarized as:

closure:
  tau ~ Hd^2 * Ss / (pi^2 * kappa * K)
     ↓
creates ridge-like trade-offs among K, Ss, Hd, tau
     ↓
identifiability regimes constrain those trade-offs
     ↓
locks, bounds, priors, and warmups shape training freedom
     ↓
audit helpers record the active regime
     ↓
payload diagnostics measure recovery and closure consistency

Relationship to the rest of the docs#

This page explains how GeoPrior controls and audits inverse-problem ambiguity.

The companion pages explain related pieces:

Physics formulation explains the closure and residual system that create the identifiability challenge;
Residual assembly explains how those terms are built in the physics core;
Losses and training explains how the compile-time lambdas and physics multiplier affect optimization pressure;
Scaling and conventions (scaling_kwargs) explains the bounds and semantic conventions that support identifiability;
Stage-3 and Stage-5 are especially relevant when identifiability is studied through tuning or transfer experiments.
