Configuration
GeoPrior-v3 is a configuration-driven workflow framework.
That means you are not meant to control a run by editing arbitrary values inside stage scripts. Instead, the workflow is designed so that:
a configuration file defines the main project settings,
commands can install or override that configuration,
the effective runtime configuration is persisted,
and later stages can inspect what was actually used.
This is one of the key design choices in GeoPrior-v3. It helps keep runs reproducible, auditable, and easier to debug across the Stage-1 → Stage-5 workflow.
Why configuration matters
In GeoPrior-v3, many workflow outcomes depend on settings that are easy to get wrong if they remain implicit.
Examples include:
city and dataset identity,
input file locations,
time-window layout,
forecast horizon,
column semantics,
groundwater conventions,
scaling behavior,
model defaults,
results directories,
and physics-related switches.
If these values live only in scattered code edits, it becomes hard to answer basic questions later, such as:
Which city did this run really use?
Which horizon and time-step layout were active?
Which config values were overridden from the CLI?
Did this stage use the same assumptions as the previous stage?
Can this run be reproduced on another machine?
GeoPrior-v3 uses explicit configuration so those questions have clear answers.
The two main config artifacts
GeoPrior-v3 uses two closely related configuration artifacts:
1. nat.com/config.py
This is the main human-edited configuration file.
2. nat.com/config.json
This is the persisted effective runtime snapshot used to
record the currently active config state after initialization
or CLI overrides.
A useful mental model is:
config.py is the source you edit; config.json is the resolved state the workflow records.
This separation is important because a run often combines:
the installed base config,
mapped CLI fields such as --city or --model,
repeated --set KEY=VALUE overrides,
and any refresh logic used to derive dependent fields.
Where the config lives
By default, GeoPrior-v3 uses a config root directory named:
nat.com
Within that root, the expected config artifacts are:
nat.com/config.py
nat.com/config.json
The shared CLI helper layer also supports:
--config-root
so advanced users can point commands at an alternate config root when needed.
This is useful for cases such as:
comparing multiple experimental setups,
isolating one project from another,
testing a new config without touching the default one,
or running CI / automation with a dedicated config root.
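The sketch below shows how a tool might resolve the two artifact paths for a given config root and read the settings out of config.py. It assumes that settings are module-level UPPER_CASE constants (as with CITY_NAME or TIME_STEPS) and that config.py can be executed as a normal Python module; config_paths and load_config_py are hypothetical helpers, not GeoPrior-v3 functions.

```python
import importlib.util
from pathlib import Path
from typing import Any

# Default root named in this guide; --config-root would point elsewhere.
DEFAULT_CONFIG_ROOT = "nat.com"


def config_paths(config_root: str = DEFAULT_CONFIG_ROOT) -> tuple[Path, Path]:
    """Return the (config.py, config.json) paths inside a config root."""
    root = Path(config_root)
    return root / "config.py", root / "config.json"


def load_config_py(path: Path) -> dict[str, Any]:
    """Execute a config.py file and collect its UPPER_CASE settings."""
    spec = importlib.util.spec_from_file_location("_geoprior_config_sketch", str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load config from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Assumption: settings are module-level constants such as CITY_NAME;
    # lowercase helpers and dunder names are ignored.
    return {name: getattr(module, name) for name in dir(module) if name.isupper()}
```

Pointing a run at a dedicated experiment root then only changes the argument passed to config_paths.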
How configuration enters the workflow
There are three main ways configuration enters a GeoPrior command.
1. Interactive or scripted initialization
The command:
geoprior-init
creates or refreshes the active config.py and also
refreshes the JSON-side runtime snapshot.
2. Install a user-supplied config
Many commands support:
--config path/to/config.py
When supplied, the user config is copied into the active config root before the command runs.
3. Apply one-off runtime overrides
Many commands support:
--set KEY=VALUE
repeated as needed. These values are merged into the active configuration for the current run and then persisted into the runtime JSON snapshot.
This layered model is one of the strengths of the CLI design.
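The second entry point, installing a user-supplied config, amounts to copying a file into the active config root. A minimal sketch, assuming the installed file simply replaces config.py in that root; install_user_config is a hypothetical name, not part of the GeoPrior-v3 API.

```python
import shutil
from pathlib import Path


def install_user_config(user_config: str, config_root: str = "nat.com") -> Path:
    """Copy a user-supplied config.py into the active config root."""
    src = Path(user_config)
    if not src.is_file():
        raise FileNotFoundError(f"config file not found: {src}")
    dest_dir = Path(config_root)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / "config.py"
    # Copying the active file onto itself would raise shutil.SameFileError.
    if dest.exists() and dest.resolve() == src.resolve():
        return dest
    shutil.copy2(src, dest)  # the copy becomes the active config.py
    return dest
```

Because the copy happens before the command runs, everything downstream sees the installed file as the active config.py.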
Configuration precedence
In practice, the effective configuration is built in a simple and predictable order.
A good way to think about precedence is:
1. Start from the active config.py in the chosen config root. If --config is provided, that file is first installed into the config root and becomes the active config.py.
2. Map selected explicit CLI fields, such as --city or --model, onto their corresponding config keys.
3. Apply repeated --set KEY=VALUE overrides.
4. Refresh any derived config fields, if the command uses a refresh helper.
5. Persist the effective result to config.json.
This means that CLI overrides do not merely affect a local in-memory object. They are also recorded into the runtime config snapshot.
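The precedence steps can be sketched as a small pure function. This is an illustration under assumptions, not GeoPrior-v3 code: CLI_FIELD_MAP only covers the --city and --model mappings named in this guide, and refresh stands in for whatever refresh helper a command may use.

```python
from typing import Any, Callable, Optional

# Hypothetical mapping of explicit CLI flags onto config keys.
CLI_FIELD_MAP = {"city": "CITY_NAME", "model": "MODEL_NAME"}


def resolve_effective_config(
    base: dict[str, Any],
    cli_fields: dict[str, Any],
    set_overrides: dict[str, Any],
    refresh: Optional[Callable[[dict[str, Any]], dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Layer CLI inputs over the active config.py values, in order."""
    cfg = dict(base)  # copy so the loaded base config is never mutated
    for flag, key in CLI_FIELD_MAP.items():
        value = cli_fields.get(flag)
        if value is not None:  # only flags the user actually passed
            cfg[key] = value
    cfg.update(set_overrides)  # --set is applied after the mapped fields
    if refresh is not None:
        cfg = refresh(cfg)  # dependent fields are recomputed last
    return cfg
```

Since --set is applied after the mapped fields in this ordering, an explicit --set CITY_NAME=... would win over --city in this sketch.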
Best practice
Treat the persisted runtime config as the record of what a command actually used.
This is especially important when the run was launched with one-off overrides.
How --set values are parsed
GeoPrior-v3 parses --set KEY=VALUE conservatively so that
common value types work without forcing everything to remain
a raw string.
The parser tries values in this order:
1. case-insensitive true, false, and none;
2. integers;
3. floats;
4. ast.literal_eval(...) for values such as lists, tuples, dicts, and quoted strings;
5. fallback to the stripped input string.
This gives practical behavior such as:
--set TIME_STEPS=6
--set USE_VSN=True
--set QUANTILES=[0.1,0.5,0.9]
--set SOME_OPTION=None
without forcing the user to implement a custom parser for every stage command.
Warning
Every --set item must use the form KEY=VALUE.
Empty keys are rejected, and malformed items exit early.
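The parsing order and the KEY=VALUE checks above can be sketched in a few lines of standard-library Python. The helper names are hypothetical, and raising SystemExit is one assumed way to "exit early"; the real GeoPrior-v3 parser may report errors differently.

```python
import ast
from typing import Any


def parse_set_value(raw: str) -> Any:
    """Coerce the VALUE part of KEY=VALUE, trying types in a fixed order."""
    text = raw.strip()
    lowered = text.lower()
    # 1. case-insensitive booleans and None
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    if lowered == "none":
        return None
    # 2. integers, then 3. floats
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    # 4. Python literals: lists, tuples, dicts, quoted strings
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        pass
    # 5. fallback: the stripped string itself
    return text


def parse_set_items(items: list[str]) -> dict[str, Any]:
    """Turn repeated --set items into a dict, rejecting malformed entries."""
    overrides: dict[str, Any] = {}
    for item in items:
        if "=" not in item:
            raise SystemExit(f"--set expects KEY=VALUE, got {item!r}")
        key, value = item.split("=", 1)  # values may themselves contain '='
        key = key.strip()
        if not key:
            raise SystemExit(f"--set has an empty key: {item!r}")
        overrides[key] = parse_set_value(value)
    return overrides
```

With this order, TIME_STEPS=6 becomes the integer 6, QUANTILES=[0.1,0.5,0.9] becomes a list, and an unquoted word such as zhongshan falls through to a plain string.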
What geoprior-init creates
The initialization command generates a NATCOM-style
config.py template for GeoPrior-v3.
The generated file is fairly complete: it already organizes the configuration into meaningful blocks such as:
dataset identity and file naming,
time layout,
required columns,
physics conventions and target semantics,
feature groups,
training defaults,
model defaults,
scaling and bookkeeping.
This is a deliberate design choice: it makes the config read like a project control file rather than a loose set of unrelated constants.
Typical invocations include:
geoprior-init
geoprior-init --yes
geoprior-init --city zhongshan --time-steps 6
The initializer also prints suggested next commands such as preprocess, train, tune, infer, and transfer.
The configuration template structure
The generated config template is organized into practical sections that mirror how the workflow is used.
Dataset identity and file naming
This section usually defines core identifiers such as:
CITY_NAME,
MODEL_NAME,
DATA_DIR,
dataset variant and file naming choices.
These values control the basic identity of the run.
Time layout
This section defines the temporal structure of the workflow, for example:
TRAIN_END_YEAR,
FORECAST_START_YEAR,
FORECAST_HORIZON_YEARS,
TIME_STEPS,
MODE.
These values are especially important because Stage-1, Stage-2, Stage-3, and Stage-4 all depend on a consistent understanding of horizon and sequence structure.
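As an illustration only, the identity and time-layout blocks of a config.py might look like the excerpt below. The key names come from this guide, and CITY_NAME, MODEL_NAME, TIME_STEPS, and FORECAST_HORIZON_YEARS reuse values from the examples here; DATA_DIR, both years, and the derived FORECAST_YEARS list are hypothetical, and the meaning assigned to TIME_STEPS is an assumption.

```python
# Illustrative config.py excerpt for the identity and time-layout blocks.
# Values are examples only, not the generated template's defaults.
CITY_NAME = "zhongshan"
MODEL_NAME = "GeoPriorSubsNet"
DATA_DIR = "data"              # hypothetical path

TRAIN_END_YEAR = 2022          # hypothetical value
FORECAST_START_YEAR = 2023     # hypothetical value
FORECAST_HORIZON_YEARS = 3
TIME_STEPS = 6                 # assumed: length of the input sequence window
# MODE is left out because its accepted values are not described here.

# Hypothetical derived field, assuming one forecast step per calendar year.
FORECAST_YEARS = list(
    range(FORECAST_START_YEAR, FORECAST_START_YEAR + FORECAST_HORIZON_YEARS)
)
```

A derived list like FORECAST_YEARS is the kind of value a refresh step would need to recompute whenever FORECAST_HORIZON_YEARS is overridden with --set.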
Required columns
This section defines the core dataset columns used by the workflow, for example:
time column,
longitude and latitude columns,
subsidence column,
groundwater column,
thickness column,
surface-elevation or head-related columns.
This is where raw tabular data begins to become a scientific contract for the pipeline.
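One practical consequence of treating columns as a contract is that a dataset can be checked against the config before any stage runs. A minimal sketch; the config key names and column names below are hypothetical, not the template's actual keys.

```python
from typing import Iterable

# Hypothetical config keys and column names; the generated template may differ.
REQUIRED_COLUMNS = {
    "TIME_COL": "year",
    "LON_COL": "longitude",
    "LAT_COL": "latitude",
    "SUBSIDENCE_COL": "subsidence",
    "GWL_COL": "groundwater_level",
    "THICKNESS_COL": "soil_thickness",
}


def missing_required_columns(
    required: dict[str, str], available: Iterable[str]
) -> dict[str, str]:
    """Return the config keys whose column is absent from the dataset header."""
    present = set(available)
    return {key: col for key, col in required.items() if col not in present}
```

With pandas, available could simply be df.columns; an empty result means every configured column is present.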
Physics conventions and target semantics
This section is especially important for GeoPrior-v3.
It includes choices such as:
groundwater kind,
groundwater sign convention,
whether to use a head proxy,
PDE mode defaults,
coordinate normalization or retention,
scaling switches for groundwater, thickness, and surface elevation.
These settings strongly influence how the workflow interprets the data physically, not just statistically.
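To see why these switches matter, note that the same groundwater number can mean opposite things: a depth to water grows as the water table falls, while a hydraulic head shrinks. The sketch below converts both into a head-like quantity. The kind labels "head" and "depth" and the function name are hypothetical; GeoPrior-v3's actual switch names and conventions may differ.

```python
from typing import Optional


def groundwater_to_head(
    value: float, gw_kind: str, surface_elevation: Optional[float] = None
) -> float:
    """Express a groundwater reading as hydraulic head or a head proxy.

    Hypothetical conventions:
    - "head": value is already a hydraulic head (up is positive);
    - "depth": value is depth to water below ground surface (down is positive).
    """
    if gw_kind == "head":
        return value
    if gw_kind == "depth":
        if surface_elevation is None:
            # Head proxy relative to the ground surface: deeper water, lower head.
            return -value
        return surface_elevation - value
    raise ValueError(f"unknown groundwater kind: {gw_kind!r}")
```

If the kind or sign convention in the config does not match the data, the model still trains, but the physics terms would see a groundwater signal with the wrong orientation.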
Feature groups
The config also organizes model-facing features into groups, such as:
static features,
dynamic features,
future-known features,
optional categorical or numeric groups,
and bookkeeping lists used later in manifests and audits.
This section matters because later stages depend on stable feature semantics and ordering.
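One lightweight way to notice that feature semantics or ordering changed between stages is to fingerprint the feature groups. A minimal sketch, assuming the groups are plain lists of column names; feature_signature and the group names in the usage are hypothetical.

```python
import hashlib
import json


def feature_signature(feature_groups: dict[str, list[str]]) -> str:
    """Stable SHA-256 fingerprint of feature-group contents and order."""
    # sort_keys makes the order of group *names* irrelevant, while list order
    # is kept, so reordering features inside a group changes the signature.
    blob = json.dumps(feature_groups, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

Recording such a signature with Stage-1 outputs and recomputing it in a later stage would flag a reordered or edited feature list before it silently misaligns model inputs.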
Training defaults
This section usually stores common training-time defaults such as:
batch size,
epoch count,
learning rate,
optimizer-related defaults.
These settings become most visible in Stage-2 and Stage-3.
Model defaults
This section typically contains model architecture defaults, for example:
hidden units,
LSTM units,
attention units,
number of heads,
dropout rate,
batch normalization usage,
VSN usage.
These values give the workflow a usable default model without forcing every run to spell out the entire architecture.
Scaling and bookkeeping
The template also contains scaling and audit-oriented values, for example:
subsidence scale,
groundwater scale,
verbose or audit-stage settings.
This helps keep the workflow explicit about units, scaling, and stage tracing.
How CLI fields map back into config
Some CLI arguments are not only command-local inputs. They are also mapped back into config keys.
Examples include mappings such as:
--city → CITY_NAME,
--model → MODEL_NAME,
--results-dir → a results-root config key.
This design is useful because it lets a single command override a small number of important fields without forcing the user to modify the underlying config file by hand.
Typical pattern:
geoprior-run stage1-preprocess \
--city zhongshan \
--model GeoPriorSubsNet
or:
geoprior-run stage1-preprocess \
--config my_config.py \
--set TIME_STEPS=6 \
--set FORECAST_HORIZON_YEARS=3
The effective values are then written into the persisted runtime config snapshot.
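The shared flags described in this guide map naturally onto Python's argparse. The parser below is a hypothetical sketch, not the real geoprior-run parser: it shows how action="append" collects repeated --set items and how unset identity flags can be skipped when mapping onto config keys.

```python
import argparse
from typing import Any


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser with the shared flags described in this guide."""
    parser = argparse.ArgumentParser(prog="stage-command-sketch")
    parser.add_argument("--config", help="config.py to install before running")
    parser.add_argument("--config-root", default="nat.com")
    parser.add_argument("--city")
    parser.add_argument("--model")
    # action="append" collects every repeated --set into one list.
    parser.add_argument(
        "--set", dest="set_items", action="append", default=[], metavar="KEY=VALUE"
    )
    return parser


def mapped_config_fields(args: argparse.Namespace) -> dict[str, Any]:
    """Translate explicit CLI flags into config keys, skipping unset ones."""
    mapping = {"city": "CITY_NAME", "model": "MODEL_NAME"}
    return {
        key: getattr(args, flag)
        for flag, key in mapping.items()
        if getattr(args, flag) is not None
    }
```

Skipping None values matters: a flag the user did not pass should leave the corresponding config.py value untouched.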
Why config.json exists
It is natural to ask why GeoPrior-v3 uses both config.py
and config.json.
The short answer is that they serve different roles.
config.py is convenient for:
human editing,
comments,
structured grouping of settings,
readable project control.
config.json is convenient for:
persisting the effective merged runtime config,
recording CLI-applied overrides,
lightweight downstream inspection,
and providing a machine-friendly snapshot.
The persisted JSON payload also records key identity fields such as city and model alongside the config dictionary itself.
This makes it easier for later stages or tools to answer: what was the actual effective runtime config for this run?
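A minimal sketch of such a snapshot, assuming a payload that stores city and model next to the full config dictionary. The payload keys "city", "model", and "config" are hypothetical; the real config.json layout may differ.

```python
import json
from pathlib import Path
from typing import Any, Union

PathLike = Union[str, Path]


def write_runtime_snapshot(config: dict[str, Any], path: PathLike) -> Path:
    """Persist the effective config plus identity fields as JSON."""
    payload = {
        "city": config.get("CITY_NAME"),    # hypothetical payload key
        "model": config.get("MODEL_NAME"),  # hypothetical payload key
        "config": config,
    }
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # default=str stores values JSON cannot encode (e.g. Path objects) as strings.
    out.write_text(json.dumps(payload, indent=2, default=str), encoding="utf-8")
    return out


def read_runtime_snapshot(path: PathLike) -> dict[str, Any]:
    """Load a snapshot written by write_runtime_snapshot."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

One side effect of JSON is worth remembering when inspecting a snapshot: tuples come back as lists, so a QUANTILES tuple in config.py reads back as a list from config.json.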
Configuration across stages
Configuration is not a Stage-1-only concern.
Across the workflow:
Stage-1 uses config to define preprocessing, features, splits, and export structure.
Stage-2 uses config together with the Stage-1 manifest to define training, scaling, model construction, and export.
Stage-3 uses config to define the tuning search procedure on top of the Stage-1 contract.
Stage-4 uses config to control model reuse, calibration, and inference behavior.
Stage-5 uses config to define city-pair transfer defaults, results directories, and experiment behavior.
This is why the config page belongs in the user guide rather than in a small appendix.
A practical lifecycle
A good mental model for GeoPrior configuration is:
initialize config
↓
edit config.py for project-level defaults
↓
run one command with optional --config and --set overrides
↓
persist effective runtime config to config.json
↓
inspect artifacts and manifests
↓
reuse or refine config for later stages
This lifecycle is much safer than editing stage scripts directly every time something changes.
Recommended workflow
A reliable configuration workflow looks like this:
1. Bootstrap once
geoprior-init --yes
2. Edit nat.com/config.py
Adjust city, paths, time layout, columns, feature lists, training defaults, and physics conventions.
3. Run the first stage
geoprior-run preprocess
4. Use one-off overrides sparingly
When you want a temporary change, use:
--set KEY=VALUE
instead of editing the config file for every short-lived experiment.
5. Inspect the persisted runtime snapshot
Review nat.com/config.json when you need to confirm what
a particular command actually used.
Common configuration mistakes
Mixing permanent defaults and temporary overrides
Not every change belongs in config.py. Temporary
experiments are usually better expressed through --set.
Forgetting that stages depend on shared semantics
A harmless-looking change to horizon, feature lists, or groundwater semantics can affect multiple later stages.
Editing code instead of config
If a value is really part of the run contract, it should usually live in config.
Treating config.json as the main editing surface
The JSON file is best treated as a persisted runtime snapshot, not as the main human-authored config.
Using stale config with new artifacts
If a config changed materially, be careful about reusing old Stage-1 or Stage-2 artifacts whose semantics may no longer match.
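Before reusing older artifacts, it helps to list exactly which settings differ between the config they were built with (for example, the config section of the config.json recorded at the time) and the current one. A minimal sketch; config_drift is a hypothetical helper, and which keys count as "material" is a project decision.

```python
from typing import Any, Iterable, Optional

MISSING = "<missing>"


def config_drift(
    recorded: dict[str, Any],
    current: dict[str, Any],
    keys: Optional[Iterable[str]] = None,
) -> dict[str, tuple[Any, Any]]:
    """Return {key: (recorded_value, current_value)} for every key that differs."""
    names = list(keys) if keys is not None else sorted(set(recorded) | set(current))
    drift = {}
    for name in names:
        old = recorded.get(name, MISSING)
        new = current.get(name, MISSING)
        if old != new:
            drift[name] = (old, new)
    return drift
```

Restricting keys to contract-level settings such as TIME_STEPS or FORECAST_HORIZON_YEARS separates changes that invalidate Stage-1 artifacts from harmless ones.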
Configuration and reproducibility
Configuration is one of the main reasons GeoPrior-v3 can support reproducible staged workflows.
A run can be understood much more clearly when you can trace:
the authored config.py,
the effective config.json,
the stage manifest,
the generated artifacts,
and the CLI command that launched the stage.
Together, these provide a much stronger provenance trail than a notebook or script with hidden local edits.
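A small provenance record can tie these pieces together by hashing the files and storing the exact command line. This is a hypothetical sketch using only the standard library, not a GeoPrior-v3 feature; the record's field names are illustrative.

```python
import hashlib
import shlex
import sys
from pathlib import Path
from typing import Optional, Sequence, Union

PathLike = Union[str, Path]


def sha256_of(path: PathLike) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def provenance_record(
    config_py: PathLike,
    config_json: PathLike,
    manifest: Optional[PathLike] = None,
    argv: Optional[Sequence[str]] = None,
) -> dict[str, str]:
    """Link the authored config, the effective snapshot, and the command."""
    argv = sys.argv if argv is None else argv
    record = {
        "command": shlex.join(argv),  # shell-quoted, copy-pasteable command line
        "config_py_sha256": sha256_of(config_py),
        "config_json_sha256": sha256_of(config_json),
    }
    if manifest is not None:
        record["manifest_sha256"] = sha256_of(manifest)
    return record
```

Matching hashes across two runs show that both used byte-identical authored and effective configs.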
Best practices
Best practice
Keep config.py as the project-level source of truth.
Use it for stable defaults, not for one-off experiment clutter.
Best practice
Use --set for temporary variations.
This keeps the project config readable while still making experiments easy.
Best practice
Inspect config.json when debugging a run.
It is often the fastest way to see what effective values a CLI command actually used.
Best practice
Change configuration before the stage runs, not after.
Later stages assume earlier artifact contracts are already fixed.
Best practice
Keep configuration, manifests, and outputs conceptually aligned.
If you change the config materially, consider whether the old artifacts should still be trusted.
A compact configuration map
The GeoPrior-v3 configuration system can be summarized like this:
geoprior-init
↓
nat.com/config.py
↓
optional --config install
↓
optional mapped CLI fields (--city, --model, ...)
↓
optional repeated --set KEY=VALUE
↓
refresh derived config fields
↓
nat.com/config.json
↓
stage command reads effective config
↓
stage manifest + workflow artifacts record the result
A few useful examples
Initialize the default config
geoprior-init --yes
Run Stage-1 with one-off horizon overrides
geoprior-run preprocess \
--set TIME_STEPS=6 \
--set FORECAST_HORIZON_YEARS=3
Use an alternate authored config file
geoprior-run train \
--config my_project_config.py
Override identity fields directly from the CLI
geoprior-run stage1-preprocess \
--city zhongshan \
--model GeoPriorSubsNet
Work from an alternate config root
geoprior-run tune \
--config-root natcom_experiment_a
Read next
The best next pages after this one are:
See how configuration is installed, overridden, and persisted from the CLI.
Learn how configuration first becomes concrete workflow artifacts.
Review how config participates in the staged pipeline.
Understand how config, manifests, and diagnostics work together when debugging a run.