.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/inspection/plot_training_summary_overview.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_inspection_plot_training_summary_overview.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_inspection_plot_training_summary_overview.py:


Inspect a training summary before trusting a Stage-2 run
==========================================================

This lesson explains how to inspect the Stage-2
``training_summary.json`` artifact.

A training summary is one of the most decision-oriented
artifacts in the GeoPrior workflow. It is more than a metric
snapshot: it is the compact record where a user can verify:

- which run was trained,
- which epoch was selected as best,
- whether training and validation tell a similar story,
- whether physics losses stayed small or started to dominate,
- whether interval quality looks plausible,
- and whether the run exported the files needed downstream.

This page therefore does more than call helper functions. It
shows how to read the artifact step by step and decide whether
the run looks healthy enough for later evaluation, calibration,
inference, or export.

.. GENERATED FROM PYTHON SOURCE LINES 29-121

.. code-block:: Python


    from __future__ import annotations

    import json
    import tempfile
    from pathlib import Path
    from pprint import pprint

    import matplotlib.pyplot as plt
    import pandas as pd

    from geoprior.utils.inspect import (
        generate_training_summary,
        inspect_training_summary,
        load_training_summary,
        plot_training_best_metrics,
        plot_training_boolean_summary,
        plot_training_final_metrics,
        plot_training_loss_family,
        plot_training_metric_deltas,
        summarize_training_summary,
        training_compile_frame,
        training_env_frame,
        training_hp_frame,
        training_metrics_frame,
        training_paths_frame,
    )

    pd.set_option("display.max_columns", 30)
    pd.set_option("display.width", 108)
    pd.set_option("display.max_colwidth", 80)

    TRAIN_COLORS = {
        "best": "#2563eb",
        "final": "#0f766e",
        "loss": "#7c3aed",
        "accent": "#d97706",
        "pass": "#16a34a",
        "fail": "#dc2626",
        "ink": "#0f172a",
        "muted": "#64748b",
        "grid": "#cbd5e1",
        "face": "#f8fafc",
    }


    def _style_axes(ax: plt.Axes, *, facecolor: str = TRAIN_COLORS["face"]) -> None:
        ax.set_facecolor(facecolor)
        ax.set_axisbelow(True)
        ax.grid(True, color=TRAIN_COLORS["grid"], alpha=0.45, linewidth=0.8)
        for side in ("top", "right"):
            ax.spines[side].set_visible(False)
        for side in ("left", "bottom"):
            ax.spines[side].set_color("#94a3b8")
            ax.spines[side].set_linewidth(0.9)
        ax.tick_params(colors=TRAIN_COLORS["ink"], labelsize=9)
        if ax.get_title():
            ax.set_title(
                ax.get_title(),
                fontsize=11,
                fontweight="bold",
                color=TRAIN_COLORS["ink"],
                pad=12,
            )
        if ax.get_xlabel():
            ax.set_xlabel(ax.get_xlabel(), fontsize=9.5, color=TRAIN_COLORS["muted"])
        if ax.get_ylabel():
            ax.set_ylabel(ax.get_ylabel(), fontsize=9.5, color=TRAIN_COLORS["muted"])


    def _style_bars(
        ax: plt.Axes,
        colors: list[str],
        *,
        edgecolor: str = TRAIN_COLORS["ink"],
        linewidth: float = 1.1,
        alpha: float = 0.95,
    ) -> None:
        for idx, patch in enumerate(ax.patches):
            patch.set_facecolor(colors[idx % len(colors)])
            patch.set_edgecolor(edgecolor)
            patch.set_linewidth(linewidth)
            patch.set_alpha(alpha)


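    def _bar_values(ax: plt.Axes) -> list[tuple]:
        # Prefer the data values Matplotlib stores on each BarContainer
        # (``datavalues``, Matplotlib >= 3.4). This works for vertical and
        # horizontal bars and keeps a zero-valued bar at 0 instead of
        # falling back to the bar's thickness.
        pairs = []
        for container in ax.containers:
            values = getattr(container, "datavalues", None)
            if values is not None:
                pairs.extend(zip(container.patches, values))
        if pairs:
            return pairs
        # Fallback when bars were not drawn through ``ax.bar``/``ax.barh``:
        # keep the original width/height heuristic.
        fallback = []
        for patch in ax.patches:
            width = patch.get_width()
            fallback.append((patch, width if width else patch.get_height()))
        return fallback


    def _style_delta_bars(ax: plt.Axes) -> None:
        # Non-positive deltas (final <= best) use the "final" color;
        # increases are highlighted with the accent color.
        for patch, value in _bar_values(ax):
            color = TRAIN_COLORS["final"] if value <= 0 else TRAIN_COLORS["accent"]
            patch.set_facecolor(color)
            patch.set_edgecolor(TRAIN_COLORS["ink"])
            patch.set_linewidth(1.1)
            patch.set_alpha(0.95)


    def _style_boolean_bars(ax: plt.Axes) -> None:
        # Assumes each check is drawn as a 1 (pass) or 0 (fail) bar.
        for patch, value in _bar_values(ax):
            color = TRAIN_COLORS["pass"] if value >= 0.5 else TRAIN_COLORS["fail"]
            patch.set_facecolor(color)
            patch.set_edgecolor(TRAIN_COLORS["ink"])
            patch.set_linewidth(1.1)
            patch.set_alpha(0.95)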


.. GENERATED FROM PYTHON SOURCE LINES 122-140

Why this artifact matters
-------------------------

The training summary is one of the best files to inspect
immediately after Stage-2 training. It is smaller than a full
history log, but still rich enough to answer practical review
questions:

1. Which epoch was chosen as the best checkpoint?
2. Did validation improve in the same direction as training?
3. Did the run keep physics terms under control?
4. Did interval diagnostics such as coverage and sharpness look
   plausible at the selected checkpoint?
5. Were the expected model files and logs exported?

In other words, this artifact helps the user decide whether a
run is ready for deeper evaluation or whether it should be
revisited before spending more time downstream.

.. GENERATED FROM PYTHON SOURCE LINES 143-157

Create a realistic demo training summary
----------------------------------------

For documentation pages, we want a stable artifact that behaves
like a real training summary without rerunning a training job.
The ``generate_training_summary`` helper serves exactly that purpose.

Here we create a realistic summary with:

- a clear best epoch,
- train and validation metrics,
- compile settings,
- initialization hyperparameters,
- and a realistic output-path bundle.

.. GENERATED FROM PYTHON SOURCE LINES 157-183

.. code-block:: Python


    out_dir = Path(tempfile.mkdtemp(prefix="gp_training_summary_"))
    summary_path = out_dir / "nansha_training_summary.json"

    generate_training_summary(
        output_path=summary_path,
        city="nansha",
        model="GeoPriorSubsNet",
        horizon=3,
        best_epoch=17,
        timestamp="2026-03-29 09:18:44",
        optimizer="Adam",
        learning_rate=8e-4,
        time_steps=5,
        pde_mode="on",
        offset_mode="mul",
        coords_normalized=True,
        coord_ranges={"t": 7.0, "x": 44447.0, "y": 39275.0},
        run_dir=(
            "results/nansha_GeoPriorSubsNet_stage1/"
            "train_20260329-091844"
        ),
    )

    print("Written training-summary file")
    print(f" - {summary_path}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Written training-summary file
     - /tmp/gp_training_summary_ce1z2xtr/nansha_training_summary.json


.. GENERATED FROM PYTHON SOURCE LINES 184-190

Load the artifact with the real reader
--------------------------------------

Even in a lesson, it is worth using the same entry point a
real workflow would use. That keeps the example close to a
user's actual inspection path.
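
When GeoPrior is not installed (for example on a reviewer's
machine that only has the JSON file), the standard library is
enough for a first look. The sketch below assumes the artifact is
a single JSON object whose metric sections are stored under the
top-level keys ``metrics_at_best`` and ``final_epoch_metrics``;
that layout is an assumption, and ``peek_training_summary`` is a
hypothetical helper, not part of ``geoprior.utils.inspect``.
``load_training_summary`` remains the supported reader.

.. code-block:: Python

    import json
    import tempfile
    from pathlib import Path

    # Assumed top-level keys; the real artifact layout may differ.
    EXPECTED_SECTIONS = ("metrics_at_best", "final_epoch_metrics")


    def peek_training_summary(path):
        """Hypothetical stdlib-only reader returning (payload, missing sections)."""
        with Path(path).open(encoding="utf-8") as fh:
            payload = json.load(fh)
        if not isinstance(payload, dict):
            raise ValueError(f"{path} does not contain a JSON object")
        missing = [name for name in EXPECTED_SECTIONS if name not in payload]
        return payload, missing


    # Demo on a throwaway file that only has one of the two metric sections.
    demo_path = Path(tempfile.mkdtemp()) / "demo_training_summary.json"
    demo_path.write_text(
        json.dumps({"best_epoch": 17, "metrics_at_best": {}}),
        encoding="utf-8",
    )
    payload, missing = peek_training_summary(demo_path)
    print(payload["best_epoch"], missing)  # 17 ['final_epoch_metrics']

A non-empty ``missing`` list is a cue to stop and check how the
file was produced before reading any numbers from it.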

.. GENERATED FROM PYTHON SOURCE LINES 190-204

.. code-block:: Python


    summary_record = load_training_summary(summary_path)

    print("\nArtifact header")
    pprint(
        {
            "kind": summary_record.kind,
            "stage": summary_record.stage,
            "city": summary_record.city,
            "model": summary_record.model,
            "path": str(summary_record.path),
        }
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Artifact header
    {'city': 'nansha',
     'kind': 'training_summary',
     'model': 'GeoPriorSubsNet',
     'path': '/tmp/gp_training_summary_ce1z2xtr/nansha_training_summary.json',
     'stage': None}


.. GENERATED FROM PYTHON SOURCE LINES 205-220

Start with the compact semantic summary
---------------------------------------

Before reading every nested section, start from the compact
semantic summary.

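This is the first decision checkpoint because it condenses the
artifact into four practical blocks:

- ``brief``: a short identity block (city, model, timestamp,
  horizon, and best epoch),
- ``core_metrics``: the metrics that matter most for review,
- ``compile``: the optimizer, learning rate, ``lambda_offset``,
  and loss-weight keys,
- and ``checks``: boolean checks that answer whether the run
  looks complete.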

When users inspect many runs, this summary is often the fastest
way to decide which run deserves deeper analysis.
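
Because every compact summary has the same shape, ranking many
runs is a small dictionary exercise. The sketch below ranks runs
by ``core_metrics.best_val_loss`` and carries the final-minus-best
drift along. Only the ``nansha`` values come from the output
printed below; ``run_b`` and ``run_c`` are hypothetical, and
``rank_by_best_val_loss`` is an illustrative helper, not a
GeoPrior function.

.. code-block:: Python

    def _core(best_val_loss, drift):
        """Build a minimal dict shaped like summary["core_metrics"]."""
        return {
            "core_metrics": {
                "best_val_loss": best_val_loss,
                "delta_final_minus_best_val_loss": drift,
            }
        }


    summaries = {
        "nansha": _core(0.049416, 9.4e-05),  # values from the printed summary
        "run_b": _core(0.0612, 0.0031),      # hypothetical
        "run_c": _core(0.0471, 0.0150),      # hypothetical
    }


    def rank_by_best_val_loss(summaries):
        """Return (run, best_val_loss, drift) tuples, lowest loss first."""
        rows = []
        for name, summary in summaries.items():
            core = summary.get("core_metrics", {})
            best = core.get("best_val_loss")
            if best is None:
                continue  # a run without a validation loss cannot be ranked
            drift = core.get("delta_final_minus_best_val_loss", float("nan"))
            rows.append((name, best, drift))
        return sorted(rows, key=lambda row: row[1])


    ranked = rank_by_best_val_loss(summaries)
    for name, best, drift in ranked:
        print(f"{name:<8} best_val_loss={best:.6f} drift={drift:+.6f}")

In this toy ranking ``run_c`` has the lowest validation loss but
also the largest drift, which is why the drift is kept next to the
loss instead of ranking on loss alone.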

.. GENERATED FROM PYTHON SOURCE LINES 220-226

.. code-block:: Python


    summary = summarize_training_summary(summary_record)

    print("\nCompact summary")
    print(json.dumps(summary, indent=2))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Compact summary
    {
      "brief": {
        "kind": "training_summary",
        "city": "nansha",
        "model": "GeoPriorSubsNet",
        "timestamp": "2026-03-29 09:18:44",
        "horizon": 3,
        "best_epoch": 17
      },
      "core_metrics": {
        "best_train_loss": 0.0568,
        "best_val_loss": 0.049416,
        "final_train_loss": 0.05445,
        "final_val_loss": 0.04951,
        "best_train_subs_mae_q50": 0.0101,
        "best_val_subs_mae_q50": 0.008787,
        "best_train_gwl_mae_q50": 0.24,
        "best_val_gwl_mae_q50": 0.20879999999999999,
        "best_val_coverage80": 0.8160000000000001,
        "best_val_sharpness80": 0.026709,
        "delta_final_minus_best_val_loss": 9.399999999999686e-05
      },
      "compile": {
        "optimizer": "Adam",
        "learning_rate": 0.0008,
        "lambda_offset": 1.0,
        "loss_weight_keys": [
          "subs_pred",
          "gwl_pred"
        ]
      },
      "checks": {
        "has_best_metrics": true,
        "has_final_metrics": true,
        "has_validation_metrics": true,
        "has_physics_metrics": true,
        "best_epoch_is_positive": true,
        "lambda_offset_stable": true,
        "quantiles_defined": true,
        "has_scaling_kwargs": true,
        "has_saved_model_paths": true,
        "has_optimizer": true
      }
    }


.. GENERATED FROM PYTHON SOURCE LINES 227-244

Read the metric tables carefully
--------------------------------

The training-summary artifact stores two metric sections:

- ``metrics_at_best``
- ``final_epoch_metrics``

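This distinction matters.

The *best* block records the metrics at the epoch selected as the
best checkpoint (epoch 17 in this demo), so it explains why that
checkpoint was chosen. The *final* block records the metrics at
the last epoch, so it tells us where the run ended.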

If these two blocks are close, the training process probably
stayed stable after the best epoch.
If they diverge strongly, the run may have drifted, overfit,
or become numerically less healthy toward the end.
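
The comparison itself is simple arithmetic: for each metric
present in both blocks, take final minus best and, where the best
value is non-zero, the relative change. The sketch below uses four
validation values copied from the tables printed below;
``metric_drift`` is a hypothetical helper used only to show the
arithmetic.

.. code-block:: Python

    # Validation values copied from the best/final tables on this page.
    best_val_values = {
        "data_loss": 0.049416,
        "gwl_pred_mae_q50": 0.208800,
        "epsilon_prior": 0.000670,
        "epsilon_cons": 0.005046,
    }
    final_val_values = {
        "data_loss": 0.049480,
        "gwl_pred_mae_q50": 0.216000,
        "epsilon_prior": 0.000693,
        "epsilon_cons": 0.005220,
    }


    def metric_drift(best, final):
        """Return {metric: (final - best, relative change)} for shared metrics."""
        drift = {}
        for name in sorted(best.keys() & final.keys()):
            delta = final[name] - best[name]
            # Relative change is undefined when the best value is exactly
            # zero (e.g. ``bounds_loss`` in the tables below).
            rel = delta / abs(best[name]) if best[name] else float("nan")
            drift[name] = (delta, rel)
        return drift


    drift = metric_drift(best_val_values, final_val_values)
    for name, (delta, rel) in drift.items():
        print(f"{name:<18} delta={delta:+.6f} rel={rel:+.2%}")

Here ``data_loss`` moves by about 0.13%, while
``gwl_pred_mae_q50`` and the two epsilon terms each rise by
roughly 3.4%.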

.. GENERATED FROM PYTHON SOURCE LINES 244-278

.. code-block:: Python


    best_all = training_metrics_frame(
        summary_record,
        section="metrics_at_best",
        split="all",
    )
    final_all = training_metrics_frame(
        summary_record,
        section="final_epoch_metrics",
        split="all",
    )
    best_val = training_metrics_frame(
        summary_record,
        section="metrics_at_best",
        split="val",
    )
    final_val = training_metrics_frame(
        summary_record,
        section="final_epoch_metrics",
        split="val",
    )

    print("\nBest metrics (first rows)")
    print(best_all.head(16))

    print("\nFinal metrics (first rows)")
    print(final_all.head(16))

    print("\nBest validation metrics")
    print(best_val.head(12))

    print("\nFinal validation metrics")
    print(final_val.head(12))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Best metrics (first rows)
                section  split               metric    value
    0   metrics_at_best  train          bounds_loss 0.000000
    1   metrics_at_best  train   consolidation_loss 0.000017
    2   metrics_at_best  train            data_loss 0.056800
    3   metrics_at_best  train         epsilon_cons 0.005800
    4   metrics_at_best  train     epsilon_cons_raw 0.000000
    5   metrics_at_best  train           epsilon_gw 0.000001
    6   metrics_at_best  train       epsilon_gw_raw 0.000000
    7   metrics_at_best  train        epsilon_prior 0.000770
    8   metrics_at_best  train         gw_flow_loss 0.000000
    9   metrics_at_best  train     gwl_pred_mae_q50 0.240000
    10  metrics_at_best  train     gwl_pred_mse_q50 0.077700
    11  metrics_at_best  train        lambda_offset 1.000000
    12  metrics_at_best  train                 loss 0.056800
    13  metrics_at_best  train        mv_prior_loss 0.000001
    14  metrics_at_best  train         physics_loss 0.000017
    15  metrics_at_best  train  physics_loss_scaled 0.000017

    Final metrics (first rows)
                    section  split               metric    value
    0   final_epoch_metrics  train          bounds_loss 0.000000
    1   final_epoch_metrics  train   consolidation_loss 0.000017
    2   final_epoch_metrics  train            data_loss 0.054410
    3   final_epoch_metrics  train         epsilon_cons 0.005800
    4   final_epoch_metrics  train     epsilon_cons_raw 0.000000
    5   final_epoch_metrics  train           epsilon_gw 0.000001
    6   final_epoch_metrics  train       epsilon_gw_raw 0.000000
    7   final_epoch_metrics  train        epsilon_prior 0.000589
    8   final_epoch_metrics  train         gw_flow_loss 0.000000
    9   final_epoch_metrics  train     gwl_pred_mae_q50 0.240000
    10  final_epoch_metrics  train     gwl_pred_mse_q50 0.077700
    11  final_epoch_metrics  train        lambda_offset 1.000000
    12  final_epoch_metrics  train                 loss 0.054450
    13  final_epoch_metrics  train        mv_prior_loss 0.000001
    14  final_epoch_metrics  train         physics_loss 0.000035
    15  final_epoch_metrics  train  physics_loss_scaled 0.000035

    Best validation metrics
                section split              metric    value
    0   metrics_at_best   val         bounds_loss 0.000000
    1   metrics_at_best   val  consolidation_loss 0.000015
    2   metrics_at_best   val           data_loss 0.049416
    3   metrics_at_best   val        epsilon_cons 0.005046
    4   metrics_at_best   val    epsilon_cons_raw 0.000000
    5   metrics_at_best   val          epsilon_gw 0.000000
    6   metrics_at_best   val      epsilon_gw_raw 0.000000
    7   metrics_at_best   val       epsilon_prior 0.000670
    8   metrics_at_best   val        gw_flow_loss 0.000000
    9   metrics_at_best   val    gwl_pred_mae_q50 0.208800
    10  metrics_at_best   val    gwl_pred_mse_q50 0.067599
    11  metrics_at_best   val       lambda_offset 1.000000

    Final validation metrics
                    section split              metric    value
    0   final_epoch_metrics   val         bounds_loss 0.000000
    1   final_epoch_metrics   val  consolidation_loss 0.000015
    2   final_epoch_metrics   val           data_loss 0.049480
    3   final_epoch_metrics   val        epsilon_cons 0.005220
    4   final_epoch_metrics   val    epsilon_cons_raw 0.000000
    5   final_epoch_metrics   val          epsilon_gw 0.000000
    6   final_epoch_metrics   val      epsilon_gw_raw 0.000000
    7   final_epoch_metrics   val       epsilon_prior 0.000693
    8   final_epoch_metrics   val        gw_flow_loss 0.000000
    9   final_epoch_metrics   val    gwl_pred_mae_q50 0.216000
    10  final_epoch_metrics   val    gwl_pred_mse_q50 0.069930
    11  final_epoch_metrics   val       lambda_offset 1.000000


.. GENERATED FROM PYTHON SOURCE LINES 279-299

What is important in these metric blocks?
-----------------------------------------

For a first pass, a user usually does not need to read every
scalar. Instead, a robust reading order is:

1. ``loss`` and ``data_loss``
2. ``physics_loss`` and ``physics_loss_scaled``
3. forecast quality such as ``subs_pred_mae_q50`` and
   ``gwl_pred_mae_q50``
4. interval diagnostics such as ``subs_pred_coverage80`` and
   ``subs_pred_sharpness80``
5. epsilon diagnostics such as ``epsilon_prior``,
   ``epsilon_cons``, and ``epsilon_gw``

This reading order helps answer three separate questions:

- Did the run fit the observed data?
- Did the physics penalty remain controlled?
- Did uncertainty intervals remain usable?
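
One way to keep these three questions separate is to bucket the
metric names up front. The grouping below mirrors the reading
order above (epsilon diagnostics are filed under the physics
question); both ``REVIEW_GROUPS`` and ``group_metrics`` are
hypothetical, and the values are the best-epoch validation numbers
printed below.

.. code-block:: Python

    # Hypothetical mapping from review question to metric names.
    REVIEW_GROUPS = {
        "fit to observed data": (
            "loss", "data_loss", "subs_pred_mae_q50", "gwl_pred_mae_q50",
        ),
        "physics penalty": (
            "physics_loss", "physics_loss_scaled",
            "epsilon_prior", "epsilon_cons", "epsilon_gw",
        ),
        "uncertainty intervals": (
            "subs_pred_coverage80", "subs_pred_sharpness80",
        ),
    }


    def group_metrics(metrics):
        """Split a flat {metric: value} dict into the review groups."""
        return {
            question: {n: metrics[n] for n in names if n in metrics}
            for question, names in REVIEW_GROUPS.items()
        }


    best_val_metrics = {
        "loss": 0.049416, "data_loss": 0.049416,
        "physics_loss": 0.000015, "physics_loss_scaled": 0.000015,
        "subs_pred_mae_q50": 0.008787, "gwl_pred_mae_q50": 0.2088,
        "subs_pred_coverage80": 0.816, "subs_pred_sharpness80": 0.026709,
        "epsilon_prior": 0.00067, "epsilon_cons": 0.005046, "epsilon_gw": 0.0,
    }
    grouped = group_metrics(best_val_metrics)
    for question, values in grouped.items():
        print(f"{question}: {sorted(values)}")

Metrics missing from a run simply drop out of their group, so an
empty group is itself a signal worth checking.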

.. GENERATED FROM PYTHON SOURCE LINES 299-321

.. code-block:: Python


    selected_best_val = best_val.loc[
        best_val["metric"].isin(
            [
                "loss",
                "data_loss",
                "physics_loss",
                "physics_loss_scaled",
                "subs_pred_mae_q50",
                "gwl_pred_mae_q50",
                "subs_pred_coverage80",
                "subs_pred_sharpness80",
                "epsilon_prior",
                "epsilon_cons",
                "epsilon_gw",
            ]
        )
    ]

    print("\nSelected best validation metrics")
    print(selected_best_val)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Selected best validation metrics
                section split                 metric    value
    2   metrics_at_best   val              data_loss 0.049416
    3   metrics_at_best   val           epsilon_cons 0.005046
    5   metrics_at_best   val             epsilon_gw 0.000000
    7   metrics_at_best   val          epsilon_prior 0.000670
    9   metrics_at_best   val       gwl_pred_mae_q50 0.208800
    12  metrics_at_best   val                   loss 0.049416
    14  metrics_at_best   val           physics_loss 0.000015
    15  metrics_at_best   val    physics_loss_scaled 0.000015
    22  metrics_at_best   val   subs_pred_coverage80 0.816000
    23  metrics_at_best   val      subs_pred_mae_q50 0.008787
    25  metrics_at_best   val  subs_pred_sharpness80 0.026709


.. GENERATED FROM PYTHON SOURCE LINES 322-337

Inspect compile settings
------------------------

Metrics never tell the full story by themselves. We also need to
know *how* the run was compiled.

In particular, the compile block tells us:

- which optimizer and learning rate were used,
- how the two data outputs (``subs_pred`` and ``gwl_pred``) were
  weighted,
- and how the major physics loss terms were weighted.

This is important because two runs with similar losses can still
have very different physics/data tradeoffs if their compile
weights differ a lot.
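
To see whether two runs differ in their physics/data tradeoff,
compare their flattened compile weights key by key. Run A below
uses values from the compile table printed below; run B is
hypothetical, and ``diff_weights`` is an illustrative helper
rather than a GeoPrior API.

.. code-block:: Python

    def diff_weights(run_a, run_b, rel_tol=0.0):
        """Return {key: (a, b)} for weights that are missing or differ."""
        changed = {}
        for key in sorted(run_a.keys() | run_b.keys()):
            a, b = run_a.get(key), run_b.get(key)
            if a is None or b is None:
                changed[key] = (a, b)  # weight recorded in only one run
            elif abs(a - b) > rel_tol * max(abs(a), abs(b)):
                changed[key] = (a, b)
        return changed


    run_a = {
        "loss_weights.subs_pred": 1.0,
        "loss_weights.gwl_pred": 0.8,
        "physics_loss_weights.lambda_cons": 1.0,
        "physics_loss_weights.lambda_gw": 0.1,
        "physics_loss_weights.lambda_prior": 0.2,
    }
    # Hypothetical second run with a much weaker consolidation weight.
    run_b = {**run_a, "physics_loss_weights.lambda_cons": 0.1}
    print(diff_weights(run_a, run_b))

A non-zero ``rel_tol`` ignores tiny differences, which can help
when weights were written with different float rounding.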

.. GENERATED FROM PYTHON SOURCE LINES 337-343

.. code-block:: Python


    compile_frame = training_compile_frame(summary_record)

    print("\nCompile settings")
    print(compile_frame.head(24))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Compile settings
                                       key                                      value  is_numeric
    0                            optimizer                                       Adam       False
    1                        learning_rate                                   0.000800        True
    2               loss_weights.subs_pred                                   1.000000        True
    3                loss_weights.gwl_pred                                   0.800000        True
    4                    metrics.subs_pred  [MAEQ50, MSEQ50, Coverage80, Sharpness80]       False
    5                     metrics.gwl_pred                           [MAEQ50, MSEQ50]       False
    6     physics_loss_weights.lambda_cons                                   1.000000        True
    7       physics_loss_weights.lambda_gw                                   0.100000        True
    8    physics_loss_weights.lambda_prior                                   0.200000        True
    9   physics_loss_weights.lambda_smooth                                   0.010000        True
    10  physics_loss_weights.lambda_bounds                                   0.050000        True
    11      physics_loss_weights.lambda_mv                                   0.010000        True
    12     physics_loss_weights.mv_lr_mult                                   1.000000        True
    13  physics_loss_weights.lambda_offset                                   1.000000        True
    14  physics_loss_weights.kappa_lr_mult                                   5.000000        True
    15       physics_loss_weights.lambda_q                                   0.000500        True
    16                       lambda_offset                                   1.000000        True


.. GENERATED FROM PYTHON SOURCE LINES 344-354

Inspect initialization and hyperparameters
------------------------------------------

The ``hp_init`` block is the bridge back to model construction.
It tells us which quantiles, attention levels, PDE mode, and
scaling settings were active when the run was initialized.

This matters because a training summary should not be read in
isolation. If a run looks unusual, the hyperparameter and init
context often explains why.
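
The table below uses dotted keys such as
``model_init_params.embed_dim`` and ``subs_weights.0.1``. A
recursive flattening of nested dictionaries produces keys of that
shape. The sketch illustrates the idea only; it is an assumption,
not the implementation of ``training_hp_frame``. Lists stay whole,
which matches how ``quantiles`` and ``attention_levels`` appear as
single rows, and booleans are not counted as numeric, which
matches the ``is_numeric`` column for ``use_batch_norm``.

.. code-block:: Python

    def flatten(mapping, parent=""):
        """Flatten nested dicts into {"a.b.c": value}; lists stay whole."""
        flat = {}
        for key, value in mapping.items():
            dotted = f"{parent}.{key}" if parent else str(key)
            if isinstance(value, dict):
                flat.update(flatten(value, dotted))
            else:
                flat[dotted] = value
        return flat


    def is_numeric(value):
        """Numbers count as numeric; bools (a subclass of int) do not."""
        return isinstance(value, (int, float)) and not isinstance(value, bool)


    # A nested fragment rebuilt from a few rows of the table below.
    hp_init = {
        "quantiles": [0.1, 0.5, 0.9],
        "subs_weights": {"0.1": 3.0, "0.5": 1.0, "0.9": 3.0},
        "use_batch_norm": False,
        "model_init_params": {
            "embed_dim": 32,
            "scaling_kwargs": {"time_units": "year"},
        },
    }
    flat = flatten(hp_init)
    for key, value in flat.items():
        print(f"{key:<42} {value!r:<18} numeric={is_numeric(value)}")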

.. GENERATED FROM PYTHON SOURCE LINES 354-360

.. code-block:: Python


    hp_frame = training_hp_frame(summary_record)

    print("\nHP / init settings (first rows)")
    print(hp_frame.head(30))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    HP / init settings (first rows)
                                                key                          value  is_numeric
    0                                     quantiles                [0.1, 0.5, 0.9]       False
    1                              subs_weights.0.1                       3.000000        True
    2                              subs_weights.0.5                       1.000000        True
    3                              subs_weights.0.9                       3.000000        True
    4                               gwl_weights.0.1                       1.500000        True
    5                               gwl_weights.0.5                       1.000000        True
    6                               gwl_weights.0.9                       1.500000        True
    7                              attention_levels  [cross, hierarchical, memory]       False
    8                                      pde_mode                             on       False
    9                                    time_steps                              5        True
    10                               use_batch_norm                          False       False
    11                                      use_vsn                           True       False
    12                                    vsn_units                             32        True
    13                                         mode                       tft_like       False
    14                  model_init_params.embed_dim                             32        True
    15               model_init_params.hidden_units                             64        True
    16                 model_init_params.lstm_units                             64        True
    17            model_init_params.attention_units                             64        True
    18                  model_init_params.num_heads                              2        True
    19               model_init_params.dropout_rate                       0.100000        True
    20                model_init_params.memory_size                             50        True
    21                     model_init_params.scales                         [1, 2]       False
    22              model_init_params.use_residuals                           True       False
    23             model_init_params.use_batch_norm                          False       False
    24                    model_init_params.use_vsn                           True       False
    25                  model_init_params.vsn_units                             32        True
    26                       model_init_params.mode                       tft_like       False
    27           model_init_params.attention_levels  [cross, hierarchical, memory]       False
    28        model_init_params.scale_pde_residuals                           True       False
    29  model_init_params.scaling_kwargs.time_units                           year       False


.. GENERATED FROM PYTHON SOURCE LINES 361-372

Inspect environment and paths
-----------------------------

Good workflow inspection is not only about metrics. The
environment block helps with reproducibility, while the paths
block tells us whether the run exported the files we need for the
next stages.

A training run can look excellent numerically but still be hard
to reuse if key files such as the best model, final model, or log
CSV are missing.
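
A quick existence check catches runs whose summary lists paths
that were never written or were moved. The sketch assumes path
keys named as in the table printed below (``best_keras``,
``final_keras``, ``csv_log``); ``missing_exports`` and the demo
file names are hypothetical. Relative paths are resolved against
the current working directory, so run the check from the
directory the training was launched from.

.. code-block:: Python

    import tempfile
    from pathlib import Path


    def missing_exports(paths, required=("best_keras", "final_keras", "csv_log")):
        """Return required keys that are absent or point to nothing on disk."""
        missing = []
        for key in required:
            value = paths.get(key)
            if not value or not Path(value).exists():
                missing.append(key)
        return missing


    # Demo: only the best model file exists; the log path is listed but unwritten.
    run_dir = Path(tempfile.mkdtemp())
    (run_dir / "best.keras").write_text("")
    paths = {
        "run_dir": str(run_dir),
        "best_keras": str(run_dir / "best.keras"),
        "csv_log": str(run_dir / "training_log.csv"),
    }
    missing = missing_exports(paths)
    print(missing)  # ['final_keras', 'csv_log']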

.. GENERATED FROM PYTHON SOURCE LINES 372-382

.. code-block:: Python


    env_frame = training_env_frame(summary_record)
    paths_frame = training_paths_frame(summary_record)

    print("\nEnvironment info")
    print(env_frame.head(16))

    print("\nSaved paths")
    print(paths_frame)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Environment info
                                 key            value  is_numeric
    0                         python          3.10.19       False
    1                     tensorflow           2.20.0       False
    2                          numpy            2.0.2       False
    3                       platform  Windows-10-demo       False
    4                  device.has_tf             True       False
    5   device.device_mode_requested             auto       False
    6   device.device_mode_effective              cpu       False
    7                device.num_cpus                1        True
    8                device.num_gpus                0        True
    9            device.visible_gpus               []       False
    10          device.intra_threads             None       False
    11          device.inter_threads             None       False
    12      device.gpu_memory_growth             None       False
    13    device.gpu_memory_limit_mb             None       False

    Saved paths
                       key                                                                            value
    0              run_dir                      results/nansha_GeoPriorSubsNet_stage1/train_20260329-091844
    1           weights_h5  results/nansha_GeoPriorSubsNet_stage1/train_20260329-091844/nansha_GeoPriorS...
    2            arch_json  results/nansha_GeoPriorSubsNet_stage1/train_20260329-091844/nansha_GeoPriorS...
    3              csv_log  results/nansha_GeoPriorSubsNet_stage1/train_20260329-091844/nansha_GeoPriorS...
    4           best_keras  results/nansha_GeoPriorSubsNet_stage1/train_20260329-091844/nansha_GeoPriorS...
    5         best_weights  results/nansha_GeoPriorSubsNet_stage1/train_20260329-091844/nansha_GeoPriorS...
    6  model_init_manifest  results/nansha_GeoPriorSubsNet_stage1/train_20260329-091844/model_init_manif...
    7          final_keras  results/nansha_GeoPriorSubsNet_stage1/train_20260329-091844/nansha_GeoPriorS...


.. GENERATED FROM PYTHON SOURCE LINES 383-396

Use the all-in-one inspector when you want the main views together
------------------------------------------------------------------

``inspect_training_summary(...)`` is useful when you want one
normalized bundle that contains:

- the semantic summary,
- the major tidy frames,
- and optionally saved figure paths.

This is especially convenient for reports, gallery generation,
and later CLI tools that may inspect many training summaries at
once.
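
Before a tool can inspect many summaries at once, it has to find
them. The sketch below collects every JSON file whose name ends in
``training_summary.json`` under a directory tree; the glob pattern
and the ``load_summaries`` helper are assumptions for
illustration. In a real workflow you would pass each path to
``load_training_summary`` instead of ``json.loads``.

.. code-block:: Python

    import json
    import tempfile
    from pathlib import Path


    def load_summaries(root, pattern="*training_summary.json"):
        """Load matching JSON files under ``root`` into {file stem: payload}."""
        loaded, failed = {}, []
        for path in sorted(Path(root).rglob(pattern)):
            try:
                loaded[path.stem] = json.loads(path.read_text(encoding="utf-8"))
            except (OSError, json.JSONDecodeError):
                failed.append(str(path))  # keep going; report unreadable files
        return loaded, failed


    # Demo: one valid summary and one corrupted file in a scratch directory.
    root = Path(tempfile.mkdtemp())
    (root / "nansha_training_summary.json").write_text(json.dumps({"best_epoch": 17}))
    (root / "broken_training_summary.json").write_text("{not json")
    loaded, failed = load_summaries(root)
    print(sorted(loaded), [Path(p).name for p in failed])
    # ['nansha_training_summary'] ['broken_training_summary.json']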

.. GENERATED FROM PYTHON SOURCE LINES 396-405

.. code-block:: Python


    bundle = inspect_training_summary(summary_record)

    print("\nInspector bundle keys")
    print(sorted(bundle))

    print("\nBundle frame keys")
    print(sorted(bundle["frames"]))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Inspector bundle keys
    ['figure_paths', 'frames', 'summary']

    Bundle frame keys
    ['compile', 'env', 'final_epoch_metrics', 'hp_init', 'metrics_at_best', 'paths']


.. GENERATED FROM PYTHON SOURCE LINES 406-420

Plot the core review views
--------------------------

A strong first visual review usually needs four views:

1. best validation metrics,
2. final validation metrics,
3. metric deltas between final and best,
4. the validation loss family.

Together these answer a very practical question:

*Did the run stay healthy after the selected best epoch, or did
it begin to drift?*

.. GENERATED FROM PYTHON SOURCE LINES 420-465

.. code-block:: Python


    fig, axes = plt.subplots(
        2,
        2,
        figsize=(12.8, 8.8),
        constrained_layout=True,
    )

    plot_training_best_metrics(
        summary_record,
        split="val",
        ax=axes[0, 0],
        title="Best validation metrics",
    )
    _style_axes(axes[0, 0], facecolor="#f8fbff")
    _style_bars(axes[0, 0], [TRAIN_COLORS["best"]])

    plot_training_final_metrics(
        summary_record,
        split="val",
        ax=axes[0, 1],
        title="Final validation metrics",
    )
    _style_axes(axes[0, 1], facecolor="#f8fffc")
    _style_bars(axes[0, 1], [TRAIN_COLORS["final"]])

    plot_training_metric_deltas(
        summary_record,
        split="val",
        ax=axes[1, 0],
        title="Final - best validation deltas",
    )
    _style_axes(axes[1, 0], facecolor="#fffdf8")
    _style_delta_bars(axes[1, 0])

    plot_training_loss_family(
        summary_record,
        section="metrics_at_best",
        split="val",
        ax=axes[1, 1],
        title="Best validation loss family",
    )
    _style_axes(axes[1, 1], facecolor="#fcfbff")
    _style_bars(axes[1, 1], [TRAIN_COLORS["loss"]])


.. image-sg:: /auto_examples/inspection/images/sphx_glr_plot_training_summary_overview_001.png
   :alt: Best validation metrics, Final validation metrics, Final - best validation deltas, Best validation loss family
   :srcset: /auto_examples/inspection/images/sphx_glr_plot_training_summary_overview_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 466-488

How to interpret these plots
----------------------------

The first two panels should be read comparatively.

- If final validation metrics remain close to best validation
  metrics, the run probably stayed stable.
- If final values become much worse, the run may have drifted
  after the best checkpoint.

The delta plot makes that easier to see:

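- values near zero usually mean stability,
- large positive deltas for losses, MAE, or MSE often suggest
  degradation,
- for coverage, the sign alone is not informative: a final value
  closer to the nominal 80% level is an improvement even if the
  delta is positive,
- and sharp changes in epsilon metrics may indicate that the
  physics side became less consistent late in training.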

The loss-family plot should also be read structurally. In many
healthy runs, the main loss is still dominated by data loss, with
physics-side terms remaining smaller and controlled. If one
auxiliary term suddenly dominates, that is often worth auditing.
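
The sign of a delta only means "worse" for metrics where lower is
better. The sketch below makes that explicit: losses, MAE/MSE, and
epsilon terms are judged by their raw change, while
``subs_pred_coverage80`` is judged by its distance from the
nominal 80% level. The tolerances and the ``judge_delta`` helper
are hypothetical; treating ``sharpness80`` as lower-is-better is
also an assumption, since narrower intervals only help if coverage
holds.

.. code-block:: Python

    # Nominal level implied by the metric name (an 80% interval).
    NOMINAL_COVERAGE = {"subs_pred_coverage80": 0.80}


    def judge_delta(metric, best, final, rel_tol=0.05, cov_tol=0.01):
        """Label a best -> final change as 'stable', 'better' or 'worse'."""
        if metric in NOMINAL_COVERAGE:
            target = NOMINAL_COVERAGE[metric]
            # Positive change: the final value sits farther from nominal.
            change = abs(final - target) - abs(best - target)
            tol = cov_tol
        else:
            # Lower is better for losses, MAE/MSE, epsilons (and, assumed,
            # sharpness); the tolerance scales with the best value.
            change = final - best
            tol = rel_tol * abs(best)
        if abs(change) <= tol:
            return "stable"
        return "worse" if change > 0 else "better"


    print(judge_delta("data_loss", 0.049416, 0.049480))     # stable
    print(judge_delta("gwl_pred_mae_q50", 0.2088, 0.216))   # stable (about +3.4%)
    # Hypothetical final coverage: higher than best, but farther from 0.80.
    print(judge_delta("subs_pred_coverage80", 0.816, 0.84))  # worse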

.. GENERATED FROM PYTHON SOURCE LINES 490-499

Plot the structural checks separately
-------------------------------------

The boolean summary compresses the most important semantic checks
into a pass/fail style view.

This is helpful when triaging many runs because it immediately
answers whether the summary contains the minimum structural pieces
needed for a trustworthy review.

.. GENERATED FROM PYTHON SOURCE LINES 499-512

.. code-block:: Python


    fig, ax = plt.subplots(
        figsize=(8.2, 4.6),
        constrained_layout=True,
    )
    plot_training_boolean_summary(
        summary_record,
        ax=ax,
        title="Training summary decision checks",
    )
    _style_axes(ax, facecolor="#f8fafc")
    _style_boolean_bars(ax)


.. image-sg:: /auto_examples/inspection/images/sphx_glr_plot_training_summary_overview_002.png
   :alt: Training summary decision checks
   :srcset: /auto_examples/inspection/images/sphx_glr_plot_training_summary_overview_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 513-519

Save a full inspection bundle
-----------------------------

The all-in-one inspector can also write a compact figure bundle.
This pattern is useful for later reporting and for gallery pages
that want reproducible saved outputs in addition to inline plots.

.. GENERATED FROM PYTHON SOURCE LINES 519-532

.. code-block:: Python


    bundle_dir = out_dir / "inspection_bundle"
    bundle_with_figs = inspect_training_summary(
        summary_record,
        output_dir=bundle_dir,
        stem="lesson_training_summary",
        save_figures=True,
    )

    print("\nSaved inspection figures")
    for name, path in bundle_with_figs["figure_paths"].items():
        print(f" - {name}: {path}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Saved inspection figures
     - lesson_training_summary_best_val_metrics.png: /tmp/gp_training_summary_ce1z2xtr/inspection_bundle/lesson_training_summary_best_val_metrics.png
     - lesson_training_summary_final_val_metrics.png: /tmp/gp_training_summary_ce1z2xtr/inspection_bundle/lesson_training_summary_final_val_metrics.png
     - lesson_training_summary_delta_val_metrics.png: /tmp/gp_training_summary_ce1z2xtr/inspection_bundle/lesson_training_summary_delta_val_metrics.png
     - lesson_training_summary_best_val_losses.png: /tmp/gp_training_summary_ce1z2xtr/inspection_bundle/lesson_training_summary_best_val_losses.png
     - lesson_training_summary_checks.png: /tmp/gp_training_summary_ce1z2xtr/inspection_bundle/lesson_training_summary_checks.png


.. GENERATED FROM PYTHON SOURCE LINES 533-555

A practical reading rule
------------------------

A simple decision rule for this artifact family is to require
that:

- both best and final metrics exist,
- validation metrics exist,
- physics metrics exist,
- the optimizer is recorded,
- scaling kwargs are preserved,
- and the main saved model paths are present.

If those checks pass, then the training summary is usually rich
enough to support downstream evaluation decisions.

After that, interpret the run qualitatively:

- small best-vs-final drifts usually suggest stable training,
- coverage close to the nominal 80% level (0.816 at the best
  epoch here) without overly wide intervals suggests usable
  interval behavior,
- and controlled epsilon metrics suggest the physics side did not
  become pathological.
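
The qualitative part of the rule can also be written down, as long
as the thresholds are treated as local conventions rather than
universal limits. The sketch below flags validation-loss drift
above 5% and coverage more than 5 percentage points from the
nominal 80% level; both thresholds and the ``qualitative_notes``
helper are hypothetical. Sharpness has no absolute target, so it
is better compared across runs than thresholded here.

.. code-block:: Python

    def qualitative_notes(core, max_rel_drift=0.05, nominal=0.80, max_cov_gap=0.05):
        """Return human-readable warnings for a ``core_metrics`` dictionary."""
        notes = []
        rel_drift = core["delta_final_minus_best_val_loss"] / core["best_val_loss"]
        if rel_drift > max_rel_drift:
            notes.append(f"val loss drifted {rel_drift:.1%} after the best epoch")
        gap = core["best_val_coverage80"] - nominal
        if abs(gap) > max_cov_gap:
            side = "over" if gap > 0 else "under"
            notes.append(
                f"{side}-coverage: {core['best_val_coverage80']:.3f} vs {nominal:.2f}"
            )
        return notes


    # Values from the compact summary printed earlier on this page.
    nansha_core = {
        "best_val_loss": 0.049416,
        "delta_final_minus_best_val_loss": 9.4e-05,
        "best_val_coverage80": 0.816,
    }
    # A hypothetical run that drifts and under-covers.
    drifting_core = {
        "best_val_loss": 0.05,
        "delta_final_minus_best_val_loss": 0.005,
        "best_val_coverage80": 0.70,
    }
    print(qualitative_notes(nansha_core))  # []
    print(qualitative_notes(drifting_core))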

.. GENERATED FROM PYTHON SOURCE LINES 555-579

.. code-block:: Python


    checks = summary["checks"]
    must_pass = [
        "has_best_metrics",
        "has_final_metrics",
        "has_validation_metrics",
        "has_physics_metrics",
        "has_saved_model_paths",
        "has_optimizer",
        "has_scaling_kwargs",
    ]
    ready = all(bool(checks.get(name, False)) for name in must_pass)

    print("\nDecision note")
    if ready:
        print(
            "This demo training summary looks structurally ready "
            "for deeper evaluation and downstream workflow review."
        )
    else:
        print(
            "This training summary needs attention before you rely "
            "on the run downstream."
        )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Decision note
    This demo training summary looks structurally ready for deeper evaluation and downstream workflow review.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.556 seconds)


.. _sphx_glr_download_auto_examples_inspection_plot_training_summary_overview.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_training_summary_overview.ipynb <plot_training_summary_overview.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_training_summary_overview.py <plot_training_summary_overview.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_training_summary_overview.zip <plot_training_summary_overview.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_