Diagnostics

This gallery focuses on workflow diagnostics, split credibility, training behavior, tuning interpretation, and physics-consistency checks in GeoPrior.

The forecasting gallery explains how forecast outputs are built and read, and the uncertainty gallery explains how probabilistic forecasts are calibrated and interpreted. The pages collected here are organized around a different practical question:

How should a user check whether the workflow itself is behaving sensibly before trusting later forecast and physics results?

The emphasis is therefore on diagnostic credibility. These examples show how GeoPrior turns preprocessing logic, training histories, tuning artifacts, and physics summaries into readable workflow diagnostics such as:

group-validity masks,
holdout split designs,
training-curve summaries,
hyperparameter tuning summaries,
bridge diagnostics from training to physics inspection.

In other words, this gallery is about checking the workflow. It does not train the final model, and it does not yet produce the final forecast or physics figures.

What this gallery teaches

Most pages in this section follow the same broad pattern:

create a compact synthetic workflow artifact,
call the real GeoPrior helper or mimic its output contract,
inspect the returned tables, splits, curves, or summaries,
explain how to read and interpret the diagnostic result.
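The four-step pattern above can be sketched with a toy training history. This is a minimal illustration only: `make_synthetic_history`, `summarize_history`, and the `loss` / `val_loss` keys are hypothetical names chosen for this sketch, not GeoPrior helpers or GeoPrior's actual output contract.

```python
import math


def make_synthetic_history(n_epochs=20):
    """Step 1: build a compact synthetic artifact (hypothetical schema)."""
    history = {"loss": [], "val_loss": []}
    for epoch in range(n_epochs):
        decay = math.exp(-0.3 * epoch)
        history["loss"].append(decay + 0.05)
        # Validation loss bottoms out, then drifts upward to imitate overfitting.
        history["val_loss"].append(decay + 0.10 + 0.01 * max(0, epoch - 10))
    return history


def summarize_history(history):
    """Step 2: stand-in for a diagnostic helper; returns a small summary dict."""
    val = history["val_loss"]
    best_epoch = min(range(len(val)), key=val.__getitem__)
    return {
        "n_epochs": len(val),
        "best_epoch": best_epoch,
        "best_val_loss": val[best_epoch],
        # Gap between validation and training loss at the last epoch.
        "final_gap": val[-1] - history["loss"][-1],
    }


history = make_synthetic_history()
summary = summarize_history(history)

# Step 3: inspect the returned summary.
for key, value in summary.items():
    print(f"{key}: {value}")
```

Step 4 is the interpretation. In this synthetic setup the best validation epoch is 11 of 20 and the validation loss ends above the training loss, which is the kind of pattern a training-curve page would flag as late-epoch overfitting.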

Even when a page prints small tables or summaries, the main goal remains the same: to explain the diagnostic artifact and its interpretation.

Module guide

Module	Main output	Purpose
`holdout_group_masks.py`	Group-validity masks	Compute which spatial groups are valid for training and which remain usable only for forecasting, then filter the raw table accordingly.
`spatial_block_holdout.py`	Holdout split design	Compare random and spatial-block train/validation/test group splits and explain why spatial-block holdout is often more credible for geospatial forecasting.
`plot_stage1_data_checks.py`	Stage-1 data-validity and split diagnostics	Check which spatial groups are valid for training or only for forecasting, filter the raw table accordingly, and compare random versus spatial-block holdout credibility.
`plot_stage2_training_curves.py`	Training-history diagnostics	Read Stage-2 loss curves, physics-loss components, validation behavior, and warmup / scaling controls.
`plot_stage3_tuning_summary.py`	Tuning-summary diagnostics	Inspect Stage-3 hyperparameter search results, top trials, search progression, and parameter-versus-score structure.
`physics_diagnostic_bridge.py`	Physics bridge diagnostics	Connect training diagnostics to later physics inspection using timescale consistency, field distributions, residual summaries, and payload-based diagnostics.
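The contrast between random and spatial-block holdout mentioned for `spatial_block_holdout.py` can be sketched in plain Python. Everything here is an assumption made for illustration: the 10 x 10 grid of groups, the function names, the block size, and the nearest-training-distance measure are not taken from GeoPrior.

```python
import math
import random

# Hypothetical groups: 100 sites on a 10 x 10 grid, keyed by group id.
groups = {gid: (gid % 10, gid // 10) for gid in range(100)}


def random_group_split(groups, n_train=70, n_val=15, seed=0):
    """Shuffle group ids and cut them into train/val/test lists."""
    ids = sorted(groups)
    random.Random(seed).shuffle(ids)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


def spatial_block_split(groups, block_size=5, n_train_blocks=2, n_val_blocks=1):
    """Assign whole spatial blocks, not individual groups, to each split."""
    blocks = {}
    for gid, (x, y) in groups.items():
        blocks.setdefault((x // block_size, y // block_size), []).append(gid)

    def collect(keys):
        return sorted(g for key in keys for g in blocks[key])

    # Deterministic block order keeps the sketch reproducible; a real
    # design might shuffle or stratify the blocks instead.
    ordered = sorted(blocks)
    cut = n_train_blocks + n_val_blocks
    return {
        "train": collect(ordered[:n_train_blocks]),
        "val": collect(ordered[n_train_blocks:cut]),
        "test": collect(ordered[cut:]),
    }


def mean_nearest_train_distance(groups, split):
    """Average distance from each test group to its closest training group."""
    train_xy = [groups[g] for g in split["train"]]
    nearest = [
        min(math.dist(groups[g], xy) for xy in train_xy) for g in split["test"]
    ]
    return sum(nearest) / len(nearest)


random_split = random_group_split(groups)
block_split = spatial_block_split(groups)
random_distance = mean_nearest_train_distance(groups, random_split)
block_distance = mean_nearest_train_distance(groups, block_split)
print(f"random split: {random_distance:.2f} grid units to nearest training group")
print(f"block split:  {block_distance:.2f} grid units to nearest training group")
```

In this setup the block split places test groups an average of 3.0 grid units from the nearest training group, while a random split usually leaves most test groups adjacent to a training group. A larger separation makes the test score a harder check of spatial generalization, which is one reason spatial-block holdout is often considered more credible.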

Reading path

A useful way to move through this gallery is to follow the logic of a complete workflow check:

begin by checking whether the data groups are valid and whether the holdout split is credible,
inspect Stage-2 training behavior,
inspect Stage-3 tuning behavior,
finish with the bridge from optimization diagnostics to physics diagnostics.

That is why the examples are grouped by workflow-diagnostic purpose rather than by utility or plotting function alone.

Gallery organization

Stage-1 validity checks

These examples are the best place to start when you want to know whether the data are even usable for the intended workflow.

They focus on questions such as:

which groups contain all required training years,
which groups contain enough years only for forecasting,
how much of the dataset survives early filtering.
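The three questions above can be answered with simple set comparisons. The sketch below is an assumption-laden illustration: the row layout, the year ranges, and the `group_validity_masks` function are hypothetical and do not reproduce GeoPrior's helper or its schema.

```python
from collections import defaultdict

# Hypothetical long-format rows: (group_id, year).
rows = [
    ("A", 2015), ("A", 2016), ("A", 2017), ("A", 2018), ("A", 2019),
    ("B", 2017), ("B", 2018), ("B", 2019),
    ("C", 2015), ("C", 2016),
]
# Illustrative requirements, not GeoPrior defaults.
train_years = {2015, 2016, 2017, 2018, 2019}  # all required for training
forecast_years = {2017, 2018, 2019}           # enough history to forecast from


def group_validity_masks(rows, train_years, forecast_years):
    """Flag each group as valid for training, forecast-only, or neither."""
    years_by_group = defaultdict(set)
    for group_id, year in rows:
        years_by_group[group_id].add(year)

    masks = {}
    for group_id, years in sorted(years_by_group.items()):
        valid_train = train_years <= years        # subset test
        valid_forecast = forecast_years <= years
        masks[group_id] = {
            "train": valid_train,
            "forecast_only": valid_forecast and not valid_train,
        }
    return masks


masks = group_validity_masks(rows, train_years, forecast_years)

# Filter the raw table and report how much of it survives.
train_rows = [row for row in rows if masks[row[0]]["train"]]
usable_rows = [
    row for row in rows
    if masks[row[0]]["train"] or masks[row[0]]["forecast_only"]
]
train_fraction = len(train_rows) / len(rows)
usable_fraction = len(usable_rows) / len(rows)

for group_id, flags in masks.items():
    print(group_id, flags)
print(f"rows kept for training: {train_fraction:.0%}")
print(f"rows kept for training or forecasting: {usable_fraction:.0%}")
```

Here group A is valid for training, group B is usable only for forecasting, and group C is dropped. Half of the rows survive the training filter and 80% remain usable overall.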