Stage-2
=======

Stage-2 is the **training, calibration, and forecast-export stage** of the GeoPrior-v3 workflow.

Where Stage-1 establishes the preprocessing and tensor handoff contract, Stage-2 consumes those exported artifacts, rebuilds the training-ready workflow state, trains the selected model variant, evaluates the resulting model, calibrates interval forecasts, and exports the first model-facing scientific outputs.

In practice, Stage-2 is where the workflow moves from **prepared data artifacts** to **model execution and forecasting artifacts**.

What Stage-2 does
-----------------

Stage-2 is responsible for the following high-level tasks:

- loading the Stage-1 manifest and NPZ bundles;
- validating the Stage-1 → Stage-2 handshake;
- resolving the hybrid configuration used for training;
- rebuilding train and validation datasets;
- constructing and compiling the model;
- training with callbacks, checkpoints, and logs;
- saving model bundles and training summaries;
- reloading a clean inference model;
- calibrating predictive intervals on the validation set;
- generating forecast CSV outputs and diagnostics;
- exporting physics payloads and physical parameter summaries.

This makes Stage-2 the main bridge between **data preparation** and **scientific forecast outputs**.

Why Stage-2 matters
-------------------

Stage-2 is the first stage where three things come together at once:

- the **Stage-1 artifact contract**,
- the **runtime configuration**,
- and the **physics-guided model behavior**.

That combination is powerful, but it also makes Stage-2 a sensitive point in the workflow. A training run can complete successfully even when the scientific assumptions are wrong, for example because of:

- unit mismatches,
- wrong coordinate conventions,
- inconsistent feature ordering,
- incorrect groundwater semantics,
- stale artifacts from another run,
- or badly chosen physics settings.

For that reason, Stage-2 is not just ``model.fit()``. It is a structured training stage with explicit audits, manifests, and export logic.

Stage-2 entry point
-------------------

Stage-2 is exposed through the modern CLI wrapper rather than only through the older legacy script body. The wrapper lets the GeoPrior CLI dispatch Stage-2 safely while still reusing the established training pipeline underneath.

At the wrapper level, Stage-2 supports a configuration-driven launch style with options such as:

- ``--config``
- ``--config-root``
- ``--city``
- ``--model``
- ``--data-dir``
- ``--stage1-manifest``
- repeated ``--set KEY=VALUE`` overrides

A common pattern is to start with the live CLI help:

.. code-block:: bash

   geoprior-run --help

and then inspect the Stage-2-specific help in your installed environment.

.. note::

   The exact command form should always be taken from the current local CLI help output. The documentation explains the workflow contract, but the installed command surface is the authoritative reference for invocation syntax.

What Stage-2 consumes from Stage-1
----------------------------------

Stage-2 does not rebuild preprocessing from scratch. It is designed to **consume Stage-1 artifacts directly**.

The required inputs come from the Stage-1 manifest and its exported NPZ bundles. In particular, Stage-2 expects the Stage-1 handoff to provide at least the core training and validation bundles.

Conceptually, Stage-2 needs:

- the Stage-1 ``manifest.json``,
- training input arrays,
- training target arrays,
- validation input arrays,
- validation target arrays.

Depending on the Stage-1 run, additional artifacts may also be available and used later in the workflow, such as:

- test NPZ bundles,
- future-sequence bundles,
- scaler and encoder metadata,
- split summaries,
- optional holdout artifacts.

If the user provides ``--stage1-manifest``, Stage-2 uses that specific Stage-1 manifest explicitly. Otherwise, it resolves the most appropriate Stage-1 manifest from the results area.

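
The explicit-versus-automatic resolution rule can be sketched in Python. This is a minimal illustration under stated assumptions, not GeoPrior's resolver: the function name ``resolve_stage1_manifest``, the ``<results_dir>/<run>/manifest.json`` layout, the ``"stage"`` and ``"city"`` manifest keys, and the "newest matching manifest wins" tie-break are all hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_stage1_manifest(
    results_dir: Path,
    city: str,
    explicit: Optional[Path] = None,
) -> Path:
    """Hypothetical resolver: explicit manifest first, else newest match.

    Assumes each Stage-1 run writes ``<results_dir>/<run>/manifest.json``
    containing ``"stage"`` and ``"city"`` keys (illustrative layout only).
    """
    if explicit is not None:
        # An explicitly passed manifest is used as-is; only check it exists.
        if not explicit.is_file():
            raise FileNotFoundError(f"Stage-1 manifest not found: {explicit}")
        return explicit

    candidates = []
    for path in results_dir.glob("*/manifest.json"):
        meta = json.loads(path.read_text(encoding="utf-8"))
        # Keep only Stage-1 manifests written for the active city.
        if meta.get("stage") == "stage1" and meta.get("city") == city:
            candidates.append(path)

    if not candidates:
        raise FileNotFoundError(
            f"No Stage-1 manifest for city={city!r} under {results_dir}"
        )
    # Assumed tie-break: the most recently modified manifest wins.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Passing ``--stage1-manifest`` explicitly sidesteps any such tie-break, which is why an explicit manifest is the safer choice for serious runs.
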

Stage-1 → Stage-2 handshake
---------------------------

One of the most important Stage-2 responsibilities is to validate the handoff coming from Stage-1.

This includes checks such as:

- the manifest really corresponds to Stage-1,
- the city in the manifest matches the active configuration,
- the required NPZ bundle paths exist,
- the tensor last-dimension sizes match the recorded feature lists,
- the sequence horizon and time-step settings are consistent,
- the scaling metadata is present and usable,
- and the coordinate treatment is compatible with the model setup.

This handshake is not a cosmetic extra. It is one of the main defenses against silent workflow drift.

.. admonition:: Best practice

   Do not treat Stage-1 and Stage-2 as loosely connected scripts. Stage-2 is designed to trust Stage-1 as the authoritative preprocessing boundary, but only after the handshake audit confirms that the exported tensors, feature names, and scaling information remain internally consistent.

How configuration is resolved
-----------------------------

Stage-2 combines two configuration sources:

1. the live or installed GeoPrior configuration,
2. the Stage-1 manifest configuration snapshot.

These are not merged blindly. The training workflow follows a **hybrid configuration** pattern in which Stage-1 provenance remains authoritative for the parts of the workflow that must reflect what was actually used during preprocessing, while the live configuration can still control training- and physics-facing options.

In practical terms, this usually means:

- Stage-1 remains the source of truth for what was exported;
- training and physics choices can still come from the live config;
- overlapping keys are resolved intentionally rather than by accidental overwrite.

This is important because Stage-2 should not silently train a model under assumptions that no longer match the tensors it received.

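
The hybrid pattern can be illustrated with a small merge function. This is a sketch, not the GeoPrior implementation: the function name, the idea of an explicit set of "Stage-1-owned" keys, and the example key names such as ``time_steps`` are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, Tuple


def resolve_hybrid_config(
    live: Dict[str, Any],
    stage1_snapshot: Dict[str, Any],
    stage1_owned: Iterable[str],
) -> Tuple[Dict[str, Any], Dict[str, Tuple[Any, Any]]]:
    """Hypothetical hybrid merge in which Stage-1 owns data-contract keys.

    Returns the effective config and a record of every overlapping key
    whose live value was replaced by the Stage-1 snapshot value.
    """
    # Training- and physics-facing options start from the live config.
    effective = dict(live)
    overridden: Dict[str, Tuple[Any, Any]] = {}
    for key in stage1_owned:
        if key not in stage1_snapshot:
            continue  # nothing recorded by Stage-1; keep the live value
        stage1_value = stage1_snapshot[key]
        if key in live and live[key] != stage1_value:
            # Record the conflict explicitly instead of overwriting silently.
            overridden[key] = (live[key], stage1_value)
        effective[key] = stage1_value
    return effective, overridden
```

With this shape, a live ``time_steps=6`` facing a Stage-1 snapshot of ``time_steps=5`` yields an effective value of 5 plus a logged ``(6, 5)`` conflict, while keys that Stage-1 does not own (for example, a hypothetical ``epochs``) keep their live values.
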

Internal structure of Stage-2
-----------------------------

The user does not need to know every helper function inside the training script, but it is useful to understand the overall flow.

**1. Resolve manifest and hybrid config**
   Stage-2 locates or receives a Stage-1 manifest, verifies that it is valid, and resolves the effective configuration that will control the training run.

**2. Load NPZ bundles and metadata**
   The stage reads the exported train and validation arrays, loads scaling and encoder metadata, and reconstructs the feature-space contract needed by the model.

**3. Finalize scaling and semantics**
   Groundwater semantics, coordinate treatment, SI conversion rules, and feature naming are consolidated into the final ``scaling_kwargs`` and related runtime metadata.

**4. Run the Stage-2 handshake audit**
   If auditing is enabled, the stage records a Stage-2 handshake report before the model is built.

**5. Build and compile the model**
   The configured model class is instantiated with the resolved input dimensions, forecast horizon, quantiles, PDE mode, and identifiability regime.

**6. Train with callbacks and checkpoints**
   Training uses datasets built from the Stage-1 tensors and runs with checkpointing, early stopping, CSV logging, NaN termination, and optional scheduling of physics-related weights.

**7. Save model artifacts**
   Stage-2 writes multiple model-facing artifacts, including weights, best-model bundles, architecture JSON, training summaries, manifests, and plots.

**8. Reload inference model and calibrate intervals**
   A clean inference model is loaded, validation-based interval calibration is fitted, and interval factors are saved.

**9. Forecast, evaluate, and export**
   The trained model is used to produce forecast tables, calibrated forecast tables, metrics JSON files, and physics payload exports.

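
Steps 2 to 4 largely come down to shape and identity checks. The sketch below shows what such an audit might look like with NumPy. The function name, the dict-of-arrays layout, and the assumed target shape ``(samples, horizon, outputs)`` are illustrative assumptions, not GeoPrior's actual handshake code or array layout.

```python
from typing import List, Mapping, Sequence

import numpy as np


def audit_handshake(
    inputs: Mapping[str, np.ndarray],
    recorded_features: Mapping[str, Sequence[str]],
    targets: Mapping[str, np.ndarray],
    horizon: int,
    manifest_city: str,
    active_city: str,
) -> List[str]:
    """Hypothetical Stage-1 -> Stage-2 audit that returns a list of problems."""
    problems: List[str] = []

    # 1. The manifest must belong to the city being trained.
    if manifest_city != active_city:
        problems.append(
            f"city mismatch: manifest={manifest_city!r}, config={active_city!r}"
        )

    # 2. Each input tensor's last dimension must match its recorded feature list.
    for key, names in recorded_features.items():
        if key not in inputs:
            problems.append(f"missing input array {key!r}")
        elif inputs[key].shape[-1] != len(names):
            problems.append(
                f"{key}: last dim {inputs[key].shape[-1]} "
                f"!= {len(names)} recorded features"
            )

    # 3. Assumed target layout (samples, horizon, outputs): axis 1 is the horizon.
    for key, arr in targets.items():
        if arr.ndim < 2 or arr.shape[1] != horizon:
            problems.append(
                f"{key}: shape {arr.shape} inconsistent with horizon={horizon}"
            )

    return problems
```

Collecting every problem instead of raising on the first one makes it easy to write the full list into a handshake report before deciding whether to abort the run.
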

Model construction in Stage-2
-----------------------------

Stage-2 builds the configured model using the dimensions derived from the Stage-1 tensors and the resolved runtime configuration. The stage is primarily designed around the GeoPrior model family and can support different model flavors depending on the active configuration.

The model is instantiated with items such as:

- static, dynamic, and future feature dimensions,
- output dimensions,
- forecast horizon,
- quantiles,
- PDE mode,
- identifiability regime,
- finalized scaling kwargs,
- and model-specific physical parameters.

Because the model structure is tied directly to what Stage-1 exported, Stage-2 should never be run against stale or ambiguous Stage-1 artifacts.

Compile-time behavior
---------------------

Once the model is built, Stage-2 compiles it using the resolved supervised losses, metrics, and physics-related loss weights.

This stage may include:

- supervised losses for subsidence and groundwater targets,
- quantile-aware objectives when probabilistic outputs are enabled,
- physics-related residual penalties,
- optional prior, smoothness, or bounds terms,
- and model-flavor-specific compile adjustments.

Stage-2 therefore sits at the point where the workflow switches from **prepared tensors** to **scientific training behavior**.

Training behavior and callbacks
-------------------------------

Stage-2 does not just train and discard the result. It uses a tracked training workflow with persistent outputs.

Typical callback behavior includes:

- best full-model checkpoint saving,
- best weights saving,
- early stopping,
- CSV training log export,
- termination on NaNs,
- and optional offset scheduling for physics-related terms.

This makes the stage more reproducible and easier to inspect after a run completes.

Artifacts produced during training
----------------------------------

Stage-2 writes several important training-facing artifacts.


**Model init manifest**
   A lightweight ``model_init_manifest.json`` is created so the model can be reconstructed cleanly for later inference.

**Best model bundles**
   Stage-2 saves checkpointed best-model artifacts, usually in both full-model and weights-oriented forms.

**Final model bundle**
   A final Keras model export is also written when possible.

**Architecture JSON**
   The model architecture is saved separately for inspection or reconstruction workflows.

**Training summary**
   A JSON summary captures the best epoch, tracked metrics, compile settings, environment details, and key hyperparameter information.

**Run manifest**
   Stage-2 writes a small run manifest that downstream stages or scripts can read without parsing the full training summary.

**Scaling snapshot**
   The finalized ``scaling_kwargs`` are saved explicitly so the training-time interpretation can be audited later.

These artifacts make Stage-2 durable: the stage is not merely a transient training step.

Forecasting and calibration
---------------------------

After training, Stage-2 performs a second critical role: **forecast preparation and calibration**.

The workflow first reloads or rebuilds a clean inference model, then fits an interval calibrator on the validation data before formatting forecast DataFrames.

This ordering is important. The intended logic is:

1. enforce physics during training,
2. fit interval calibration on validation outputs,
3. apply calibration to forecast quantiles,
4. export calibrated and uncalibrated forecast products.

By doing this, Stage-2 keeps a clear distinction between:

- **physics-guided training behavior**, and
- **post-training interval calibration behavior**.

Forecast target split
---------------------

When a test NPZ bundle is available, Stage-2 forecasts on the exported test split. Otherwise, it can fall back to the validation split for forecast-oriented export.

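
The calibrate-then-forecast ordering, together with the test-or-validation fallback, can be sketched with one common technique: per-horizon scaling of quantile half-widths around the median, fitted on validation data so that empirical coverage approximately reaches a target level. This is an illustration under assumptions, not necessarily the calibrator GeoPrior uses. The function names, the ``(samples, horizon)`` array layout, monotone quantiles (``q_lo <= q_med <= q_hi``), and the 0.8 target coverage (for example, a q10 to q90 interval) are all hypothetical.

```python
from typing import Optional, Tuple

import numpy as np


def pick_forecast_split(
    test: Optional[np.ndarray], val: np.ndarray
) -> Tuple[str, np.ndarray]:
    """Prefer the test split; fall back to validation when it is absent."""
    return ("test", test) if test is not None else ("val", val)


def fit_interval_factors(
    y: np.ndarray,
    q_lo: np.ndarray,
    q_med: np.ndarray,
    q_hi: np.ndarray,
    target_coverage: float = 0.8,
) -> np.ndarray:
    """Fit one half-width scaling factor per horizon step on validation data.

    Arrays are assumed to have shape (samples, horizon) with monotone
    quantiles. ``needed`` is the smallest factor f that puts y inside
    [q_med - f * (q_med - q_lo), q_med + f * (q_hi - q_med)].
    """
    eps = 1e-12  # avoid division by zero for degenerate intervals
    below = (q_med - y) / np.maximum(q_med - q_lo, eps)
    above = (y - q_med) / np.maximum(q_hi - q_med, eps)
    needed = np.maximum(np.maximum(below, above), 0.0)
    # The target-coverage quantile of `needed` covers roughly that share of points.
    return np.quantile(needed, target_coverage, axis=0)


def apply_interval_factors(
    q_lo: np.ndarray, q_med: np.ndarray, q_hi: np.ndarray, factors: np.ndarray
) -> Tuple[np.ndarray, np.ndarray]:
    """Widen (factor > 1) or narrow (factor < 1) intervals around the median."""
    lo = q_med - factors * (q_med - q_lo)
    hi = q_med + factors * (q_hi - q_med)
    return lo, hi
```

Because the factors are fitted only on validation outputs and applied afterwards, calibration stays separate from the physics-guided training objective. The fitted factor array is also a natural candidate for the saved interval-factor file.
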

Falling back to the validation split gives the stage a practical default behavior while keeping it compatible with runs that do not yet expose a dedicated test handoff.

Forecast and evaluation exports
-------------------------------

Stage-2 writes a rich set of evaluation-facing outputs.

Typical exports include:

- evaluation forecast CSV,
- future forecast CSV,
- calibrated evaluation forecast CSV,
- calibrated future forecast CSV,
- calibration statistics JSON,
- evaluation diagnostics JSON,
- calibrated evaluation diagnostics JSON,
- interval calibration factors,
- and physics payload exports.

These outputs are what make Stage-2 more than a pure training step. It is the stage where the workflow begins to produce scientifically interpretable forecast products.

Physics diagnostics and payloads
--------------------------------

Stage-2 also evaluates and exports physics-facing diagnostics.

Depending on the run and model behavior, this may include:

- evaluation-time physics diagnostics such as epsilon-style residual summaries,
- physics payload export in NPZ form,
- extracted physical parameter tables,
- and optional plots of physical parameter values.

This is especially useful for later scientific analysis, because it lets the user inspect not only forecast quality but also how physically consistent the model's internal behavior is.

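
A quick first look at a physics payload is to summarize its residual arrays. The sketch below assumes the payload is a standard NumPy ``.npz`` archive and that residual arrays use keys beginning with ``eps``. That key convention and the function name are hypothetical, so list the actual keys of your payload (``np.load(path).files``) before relying on it.

```python
from pathlib import Path
from typing import Dict

import numpy as np


def summarize_physics_payload(
    path: Path, prefix: str = "eps"
) -> Dict[str, Dict[str, float]]:
    """Summarize residual-like arrays in a physics payload NPZ.

    Assumes residual arrays are stored under keys starting with ``prefix``
    (a hypothetical naming convention; check your payload's keys).
    """
    summary: Dict[str, Dict[str, float]] = {}
    with np.load(path) as payload:
        for key in payload.files:
            if not key.startswith(prefix):
                continue
            values = np.asarray(payload[key], dtype=float).ravel()
            values = values[np.isfinite(values)]  # ignore NaN/inf entries
            if values.size == 0:
                continue
            summary[key] = {
                "mean_abs": float(np.mean(np.abs(values))),
                "rms": float(np.sqrt(np.mean(values ** 2))),
                "max_abs": float(np.max(np.abs(values))),
            }
    return summary
```

Comparing these summaries across runs is a cheap way to spot a run whose forecasts look plausible but whose physics residuals have drifted.
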

Typical files you should expect after Stage-2
---------------------------------------------

A successful Stage-2 run will often leave behind files such as:

- ``model_init_manifest.json``
- training log CSV
- best-model checkpoint bundle
- best weights file
- final Keras model
- architecture JSON
- training summary JSON
- Stage-2 run manifest
- ``scaling_kwargs.json``
- training history plots
- physical parameter CSV
- interval factor ``.npy`` file
- evaluation and future forecast CSVs
- calibrated forecast CSVs
- evaluation diagnostics JSON
- physics payload NPZ

The exact file names depend on the run naming convention, but the presence of this artifact family is a good sign that Stage-2 completed the full training-and-export workflow.

What to inspect after Stage-2 completes
---------------------------------------

Do not move forward immediately after training finishes. Inspect the outputs first.

At minimum, review:

- the Stage-2 run directory,
- the training summary JSON,
- the run manifest,
- the CSV training log,
- the best-model and final-model files,
- the saved scaling kwargs,
- the forecast CSVs,
- the calibrated forecast products,
- the evaluation diagnostics JSON,
- and the physics payload export.

You should be able to answer questions such as:

- Did Stage-2 use the intended Stage-1 manifest?
- Do the recorded feature dimensions match expectations?
- Did the best epoch occur reasonably early or late?
- Were calibrated forecast outputs written?
- Did the evaluation diagnostics look plausible?
- Was a physics payload exported successfully?

Common Stage-2 mistakes
-----------------------

The most common Stage-2 problems are not purely algorithmic.

**Using the wrong Stage-1 manifest**
   This can silently train the model on artifacts from another city, run, or feature contract.

**Ignoring city mismatch**
   Stage-2 is designed to reject obvious city mismatch between the active config and the Stage-1 manifest. Do not work around that unless you are intentionally debugging.

**Treating Stage-2 as only training**
   Stage-2 is also a calibration and export stage. A run that produces only model weights but no clean forecast artifacts is usually incomplete.

**Skipping artifact inspection**
   A completed ``fit()`` call does not guarantee trustworthy forecasting outputs.

**Forgetting that scaling semantics matter**
   Stage-2 carries forward the SI conversion and coordinate rules from Stage-1. If those assumptions are wrong, later physics interpretation can also be wrong.

Best practices
--------------

.. admonition:: Best practice

   Treat the Stage-1 manifest as the authoritative handoff contract. If you need to force a specific preprocessing run, pass the exact manifest explicitly rather than relying on accidental auto-resolution.

.. admonition:: Best practice

   Inspect ``training_summary.json`` and the Stage-2 run manifest after every serious run. These are the fastest way to confirm what was actually trained, which settings were used, and which artifacts were saved.

.. admonition:: Best practice

   Review the calibrated forecast exports, not only the raw model outputs. Stage-2 is explicitly designed to calibrate interval forecasts before downstream interpretation.

.. admonition:: Best practice

   Keep Stage-2 outputs grouped by run directory. Mixing model bundles, forecast CSVs, or diagnostics from different Stage-2 runs is a common source of silent confusion.

A compact Stage-2 map
---------------------

The Stage-2 workflow can be summarized like this:

.. code-block:: text

   Stage-1 manifest + NPZ bundles
       ↓
   resolve hybrid config
       ↓
   validate Stage-1 → Stage-2 handshake
       ↓
   build datasets + finalize scaling semantics
       ↓
   build and compile model
       ↓
   train with checkpoints and logs
       ↓
   save model bundles + summaries + manifests
       ↓
   reload clean inference model
       ↓
   fit interval calibration on validation set
       ↓
   forecast + evaluate + export calibrated outputs
       ↓
   export physics payload and diagnostics

Read next
---------

The next pages after Stage-2 are:

.. grid:: 1 1 2 2
   :gutter: 3

   .. grid-item-card:: Stage-3
      :link: stage3
      :link-type: doc
      :class-card: sd-shadow-sm

      Continue from training into the next downstream workflow stage.

   .. grid-item-card:: Diagnostics
      :link: diagnostics
      :link-type: doc
      :class-card: sd-shadow-sm

      Learn how to interpret the diagnostics and audit outputs produced during later workflow steps.

   .. grid-item-card:: Inference and export
      :link: inference_and_export
      :link-type: doc
      :class-card: sd-shadow-sm

      Review the broader inference and export logic that builds on the artifacts produced here.

   .. grid-item-card:: CLI guide
      :link: cli
      :link-type: doc
      :class-card: sd-shadow-sm card--cli

      Move from the stage narrative to the full command reference.

.. seealso::

   - :doc:`workflow_overview`
   - :doc:`stage1`
   - :doc:`stage3`
   - :doc:`configuration`
   - :doc:`cli`
   - :doc:`diagnostics`
   - :doc:`inference_and_export`
   - :doc:`../scientific_foundations/data_and_units`
   - :doc:`../scientific_foundations/scaling`
   - :doc:`../scientific_foundations/identifiability`