.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/inspection/plot_eval_diagnostics_overview.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_inspection_plot_eval_diagnostics_overview.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_inspection_plot_eval_diagnostics_overview.py:


Inspect compact evaluation diagnostics before trusting forecast quality
===========================================================================

This lesson explains how to inspect the compact
``eval_diagnostics.json`` artifact produced by the evaluation workflow.

This file is intentionally smaller than the richer interpretable
Stage-2 evaluation payload. That is exactly why it is useful: it gives
users a fast way to review forecast quality at three complementary
levels:

- the overall block ``__overall__``,
- the per-year evaluation blocks,
- the per-horizon point-metric maps.

The goal of this page is therefore not only to call helper functions.
It is to teach how to read the artifact step by step, understand what
its small set of metrics is saying, and decide whether the forecast
looks strong enough for reporting, calibration, or further diagnosis.

.. GENERATED FROM PYTHON SOURCE LINES 27-66

.. code-block:: Python


    from __future__ import annotations

    import json
    import tempfile
    from pathlib import Path
    from pprint import pprint

    import matplotlib.pyplot as plt
    import pandas as pd

    from geoprior.utils.inspect import (
        eval_overall_frame,
        eval_per_horizon_frame,
        eval_years_frame,
        generate_eval_diagnostics,
        inspect_eval_diagnostics,
        load_eval_diagnostics,
        plot_eval_boolean_summary,
        plot_eval_overall_metrics,
        plot_eval_per_horizon_metrics,
        plot_eval_year_metric_trend,
        summarize_eval_diagnostics,
    )

    pd.set_option("display.max_columns", 24)
    pd.set_option("display.width", 108)
    pd.set_option("display.float_format", lambda v: f"{v:0.6f}")

    EVAL_DIAG_PALETTE = {
        "overall": ["#1D3557", "#457B9D", "#2A9D8F", "#E9C46A", "#F4A261", "#E76F51"],
        "mae_trend": "#5A189A",
        "pss_trend": "#C1121F",
        "rmse_h": ["#277DA1", "#577590", "#F9844A"],
        "r2_h": ["#43AA8B", "#90BE6D", "#4D908E"],
        "checks": "#6D597A",
    }


.. GENERATED FROM PYTHON SOURCE LINES 67-88

Why this artifact matters
-------------------------

The compact evaluation-diagnostics artifact is one of the easiest
files to inspect after running forecast evaluation.

It is smaller than the interpretable physics-evaluation JSON, but it
still preserves the information most users need for a first-pass
decision:

1. how forecast quality behaves year by year,
2. how the overall aggregated forecast behaves,
3. whether later horizons degrade sharply,
4. whether interval diagnostics stay believable,
5. whether the prediction-stability score (PSS) stays acceptable.

In other words, this artifact is a quick decision file. It helps the
user answer a practical question before spending more time downstream:

*Does this forecast already look good enough to trust, or does it need
calibration, model comparison, or deeper physics inspection first?*

.. GENERATED FROM PYTHON SOURCE LINES 91-109

Create a realistic demo diagnostics artifact
--------------------------------------------

For documentation, we usually want a stable evaluation artifact that
behaves like a real saved file but does not require re-running a full
model evaluation.

In this example we intentionally encode a common forecasting pattern:

- early horizons perform well,
- later horizons degrade,
- coverage moves from slightly conservative toward slightly under the
  nominal 80% level,
- and PSS becomes worse as the forecast reaches further into the
  future.

That makes the lesson easier to interpret because the metrics tell a
coherent story instead of looking randomly generated.

.. GENERATED FROM PYTHON SOURCE LINES 109-127

.. code-block:: Python


    out_dir = Path(tempfile.mkdtemp(prefix="gp_eval_diag_"))
    diag_path = out_dir / "nansha_eval_diagnostics.json"

    generate_eval_diagnostics(
        output_path=diag_path,
        years=[2023, 2024, 2025],
        per_horizon_mae=[3.84, 8.96, 17.42],
        per_horizon_mse=[26.10, 166.40, 742.25],
        per_horizon_rmse=[5.109795, 12.899612, 27.244266],
        per_horizon_r2=[0.912, 0.887, 0.841],
        coverage80=[0.888, 0.821, 0.768],
        sharpness80=[21.84, 28.46, 47.30],
        pss=[34.20, 52.75, 78.10],
    )

    print("Written evaluation-diagnostics file")
    print(f" - {diag_path}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Written evaluation-diagnostics file
     - /tmp/gp_eval_diag_82pbg5fb/nansha_eval_diagnostics.json


.. GENERATED FROM PYTHON SOURCE LINES 128-134

Load the artifact with the real reader
--------------------------------------

Even in a gallery lesson, it is useful to follow the same loading
path a real user would take with a saved ``*_eval_diagnostics``
JSON under ``results/...``.

.. GENERATED FROM PYTHON SOURCE LINES 134-148

.. code-block:: Python


    diag_record = load_eval_diagnostics(diag_path)

    print("\nArtifact header")
    pprint(
        {
            "kind": diag_record.kind,
            "stage": diag_record.stage,
            "city": diag_record.city,
            "model": diag_record.model,
            "path": str(diag_record.path),
        }
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Artifact header
    {'city': None,
     'kind': 'eval_diagnostics',
     'model': None,
     'path': '/tmp/gp_eval_diag_82pbg5fb/nansha_eval_diagnostics.json',
     'stage': None}


.. GENERATED FROM PYTHON SOURCE LINES 149-163

Start with the compact semantic summary
---------------------------------------

A good inspection habit is to begin with the semantic summary.
It answers the first structural questions before you look at any
plots:

- does the artifact contain the required ``__overall__`` block?
- are year blocks present?
- are the per-horizon maps complete?
- does each year expose interval and stability diagnostics?

When those checks fail, it is usually not worth over-interpreting
the numbers yet. Structural completeness comes first.

.. GENERATED FROM PYTHON SOURCE LINES 163-169

.. code-block:: Python


    summary = summarize_eval_diagnostics(diag_record)

    print("\nCompact summary")
    print(json.dumps(summary, indent=2))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Compact summary
    {
      "brief": {
        "kind": "eval_diagnostics",
        "n_year_blocks": 3,
        "year_keys": [
          "2023.0",
          "2024.0",
          "2025.0"
        ],
        "n_horizons": 3
      },
      "overall": {
        "overall_mae": 10.073333333333334,
        "overall_rmse": 15.084557666666667,
        "overall_r2": 0.8799999999999999,
        "coverage80": 0.8256666666666668,
        "sharpness80": 32.53333333333333,
        "pss": 55.01666666666667
      },
      "checks": {
        "has_overall_block": true,
        "has_year_blocks": true,
        "overall_has_core_metrics": true,
        "overall_has_per_horizon_mae": true,
        "overall_has_per_horizon_rmse": true,
        "overall_has_per_horizon_r2": true,
        "all_years_have_pss": true,
        "all_years_have_coverage80": true,
        "all_years_have_sharpness80": true,
        "horizon_count_matches_year_count": true
      }
    }


.. GENERATED FROM PYTHON SOURCE LINES 170-185

Read the three core table views
-------------------------------

The compact diagnostics file becomes much easier to read when we
separate it into three tidy views:

1. one row per evaluated year,
2. one-row overall aggregate,
3. one row per horizon from the aggregated block.

These three levels answer different questions:

- the year table shows *temporal drift*,
- the overall table shows the *headline quality*,
- the per-horizon table shows *forecast-range degradation*.

.. GENERATED FROM PYTHON SOURCE LINES 185-199

.. code-block:: Python


    years_frame = eval_years_frame(diag_record)
    overall_frame = eval_overall_frame(diag_record)
    per_h_frame = eval_per_horizon_frame(diag_record)

    print("\nPer-year diagnostics")
    print(years_frame)

    print("\nOverall diagnostics")
    print(overall_frame)

    print("\nPer-horizon diagnostics")
    print(per_h_frame)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Per-year diagnostics
      year_key        year  overall_mae  overall_mse  overall_rmse  overall_r2  coverage80  sharpness80  \
    0   2023.0 2023.000000     3.840000    26.100000      5.109795    0.912000    0.888000    21.840000   
    1   2024.0 2024.000000     8.960000   166.400000     12.899612    0.887000    0.821000    28.460000   
    2   2025.0 2025.000000    17.420000   742.250000     27.244266    0.841000    0.768000    47.300000   

            pss  
    0 34.200000  
    1 52.750000  
    2 78.100000  

    Overall diagnostics
       overall_mae  overall_mse  overall_rmse  overall_r2  coverage80  sharpness80       pss  n_horizons  \
    0    10.073333   311.583333     15.084558    0.880000    0.825667    32.533333 55.016667           3   

       n_year_blocks  
    0              3  

    Per-horizon diagnostics
       horizon       mae        mse      rmse       r2
    0        1  3.840000  26.100000  5.109795 0.912000
    1        2  8.960000 166.400000 12.899612 0.887000
    2        3 17.420000 742.250000 27.244266 0.841000


.. GENERATED FROM PYTHON SOURCE LINES 200-222

How to interpret the per-year view
----------------------------------

The year table is useful when evaluation is reported separately for
each forecast year. A robust reading order is:

1. ``overall_mae`` or ``overall_rmse`` for absolute error,
2. ``overall_r2`` for explained variance,
3. ``coverage80`` and ``sharpness80`` for interval usefulness,
4. ``pss`` for temporal stability.

In this demo, the later years look harder:

- MAE and RMSE rise,
- R2 declines,
- coverage drifts downward,
- sharpness widens,
- and PSS increases.

That combination is a common forecasting warning sign: the model is
still producing forecasts, but confidence quality and stability are
deteriorating with forecast range.

.. GENERATED FROM PYTHON SOURCE LINES 222-238

.. code-block:: Python


    year_view = years_frame.copy()
    if not year_view.empty:
        first_mae = float(year_view["overall_mae"].iloc[0])
        first_pss = float(year_view["pss"].iloc[0])
        year_view["mae_vs_first"] = (
            year_view["overall_mae"] - first_mae
        )
        year_view["pss_vs_first"] = year_view["pss"] - first_pss
        year_view["coverage80_error"] = (
            year_view["coverage80"] - 0.80
        ).abs()

    print("\nPer-year interpretation table")
    print(year_view)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Per-year interpretation table
      year_key        year  overall_mae  overall_mse  overall_rmse  overall_r2  coverage80  sharpness80  \
    0   2023.0 2023.000000     3.840000    26.100000      5.109795    0.912000    0.888000    21.840000   
    1   2024.0 2024.000000     8.960000   166.400000     12.899612    0.887000    0.821000    28.460000   
    2   2025.0 2025.000000    17.420000   742.250000     27.244266    0.841000    0.768000    47.300000   

            pss  mae_vs_first  pss_vs_first  coverage80_error  
    0 34.200000      0.000000      0.000000          0.088000  
    1 52.750000      5.120000     18.550000          0.021000  
    2 78.100000     13.580000     43.900000          0.032000  


.. GENERATED FROM PYTHON SOURCE LINES 239-257

How to interpret the overall block
----------------------------------

The overall block is the headline view many users look at first.
That is useful, but it can also be misleading if read alone.

A good interpretation pattern is:

- lower ``overall_mae`` and ``overall_rmse`` are better,
- higher ``overall_r2`` is better,
- ``coverage80`` should usually sit reasonably close to 0.80,
- lower ``sharpness80`` means narrower intervals,
- lower ``pss`` means more temporally stable predictions.

Notice the tension here: a model can have decent RMSE yet still
show suspicious interval behavior or poor temporal stability.
That is exactly why this compact artifact includes all of these
metrics together.

.. GENERATED FROM PYTHON SOURCE LINES 257-275

.. code-block:: Python


    overall_view = overall_frame.copy()
    if not overall_view.empty:
        overall_view["coverage80_error"] = (
            overall_view["coverage80"] - 0.80
        ).abs()
        overall_view["interval_comment"] = [
            (
                "close to nominal 80%"
                if abs(float(overall_view.loc[0, "coverage80"]) - 0.80)
                <= 0.03
                else "coverage drift worth reviewing"
            )
        ]

    print("\nOverall interpretation table")
    print(overall_view)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Overall interpretation table
       overall_mae  overall_mse  overall_rmse  overall_r2  coverage80  sharpness80       pss  n_horizons  \
    0    10.073333   311.583333     15.084558    0.880000    0.825667    32.533333 55.016667           3   

       n_year_blocks  coverage80_error      interval_comment  
    0              3          0.025667  close to nominal 80%  


.. GENERATED FROM PYTHON SOURCE LINES 276-292

How to interpret the per-horizon block
--------------------------------------

The per-horizon table is often the most actionable part of the
whole artifact because it shows whether the forecast degrades in a
smooth and believable way.

A common healthy pattern is:

- MAE/RMSE gradually increase,
- R2 gradually decreases,
- but no horizon collapses abruptly.

A common warning pattern is a very sharp jump at the last horizon.
That often suggests the model is extrapolating beyond what the
training information can support well.

.. GENERATED FROM PYTHON SOURCE LINES 292-304

.. code-block:: Python


    per_h_view = per_h_frame.copy()
    if not per_h_view.empty:
        per_h_view["rmse_step_change"] = per_h_view["rmse"].diff()
        per_h_view["r2_step_change"] = per_h_view["r2"].diff()
        per_h_view["rmse_growth_ratio"] = (
            per_h_view["rmse"] / float(per_h_view["rmse"].iloc[0])
        )

    print("\nPer-horizon interpretation table")
    print(per_h_view)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Per-horizon interpretation table
       horizon       mae        mse      rmse       r2  rmse_step_change  r2_step_change  rmse_growth_ratio
    0        1  3.840000  26.100000  5.109795 0.912000               NaN             NaN           1.000000
    1        2  8.960000 166.400000 12.899612 0.887000          7.789817       -0.025000           2.524487
    2        3 17.420000 742.250000 27.244266 0.841000         14.344654       -0.046000           5.331773


.. GENERATED FROM PYTHON SOURCE LINES 305-312

Use the all-in-one inspector when you want the main bundle together
-------------------------------------------------------------------

``inspect_eval_diagnostics(...)`` is convenient when you want the
semantic summary, tidy frames, and saved figure paths in one call.
That makes it useful for gallery generation, report pipelines, or
future CLI helpers that inspect folders of evaluation artifacts.

.. GENERATED FROM PYTHON SOURCE LINES 312-318

.. code-block:: Python


    bundle = inspect_eval_diagnostics(diag_record)

    print("\nInspector bundle keys")
    print(sorted(bundle))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Inspector bundle keys
    ['figure_paths', 'frames', 'summary']


.. GENERATED FROM PYTHON SOURCE LINES 319-331

Plot the main compact views
---------------------------

A compact first-pass review usually benefits from four plots:

1. the overall aggregated metrics,
2. the year trend for absolute error,
3. the year trend for prediction stability,
4. the per-horizon RMSE curve.

These four views work well together because they combine one
headline panel with three structural trend panels.

.. GENERATED FROM PYTHON SOURCE LINES 331-379

.. code-block:: Python


    fig, axes = plt.subplots(
        2,
        2,
        figsize=(12.4, 8.8),
        constrained_layout=True,
    )

    plot_eval_overall_metrics(
        diag_record,
        ax=axes[0, 0],
        title="Overall compact evaluation metrics",
        color=EVAL_DIAG_PALETTE["overall"],
        edgecolor="#1F2937",
        linewidth=0.9,
        alpha=0.92,
    )
    plot_eval_year_metric_trend(
        diag_record,
        metric="overall_mae",
        ax=axes[0, 1],
        title="Year trend: overall MAE",
        color=EVAL_DIAG_PALETTE["mae_trend"],
        marker="o",
        linewidth=2.4,
        markersize=7,
    )
    plot_eval_year_metric_trend(
        diag_record,
        metric="pss",
        ax=axes[1, 0],
        title="Year trend: PSS",
        color=EVAL_DIAG_PALETTE["pss_trend"],
        marker="D",
        linewidth=2.4,
        markersize=6.5,
    )
    plot_eval_per_horizon_metrics(
        diag_record,
        metric="rmse",
        ax=axes[1, 1],
        title="Per-horizon RMSE",
        color=EVAL_DIAG_PALETTE["rmse_h"],
        edgecolor="#1F2937",
        linewidth=0.9,
        alpha=0.92,
    )


.. image-sg:: /auto_examples/inspection/images/sphx_glr_plot_eval_diagnostics_overview_001.png
   :alt: Overall compact evaluation metrics, Year trend: overall MAE, Year trend: PSS, Per-horizon RMSE
   :srcset: /auto_examples/inspection/images/sphx_glr_plot_eval_diagnostics_overview_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <Axes: title={'center': 'Per-horizon RMSE'}, xlabel='horizon', ylabel='rmse'>


.. GENERATED FROM PYTHON SOURCE LINES 380-396

How to read these plots
-----------------------

These plots are meant to be read together, not in isolation.

- The overall bar plot tells us the headline quality at a glance.
- The MAE year trend reveals whether forecast error is worsening as
  the target year moves away from the observed window.
- The PSS trend reveals whether the prediction path is becoming less
  stable over time.
- The per-horizon RMSE bar plot shows whether deterioration happens
  gradually or whether one horizon becomes disproportionately hard.

In this demo, the answer is fairly coherent: later years and later
horizons are clearly harder, and the degradation is not just an
interval issue. The point forecast itself is also weakening.

.. GENERATED FROM PYTHON SOURCE LINES 398-404

Inspect another horizon-specific view
-------------------------------------

RMSE is not the whole story. The same horizon may have rising error
but still preserve some explained variance. Looking at R2 gives a
second perspective on horizon degradation.

.. GENERATED FROM PYTHON SOURCE LINES 404-420

.. code-block:: Python


    fig, ax = plt.subplots(
        figsize=(8.0, 4.6),
        constrained_layout=True,
    )
    plot_eval_per_horizon_metrics(
        diag_record,
        metric="r2",
        ax=ax,
        title="Per-horizon R²",
        color=EVAL_DIAG_PALETTE["r2_h"],
        edgecolor="#1F2937",
        linewidth=0.9,
        alpha=0.92,
    )


.. image-sg:: /auto_examples/inspection/images/sphx_glr_plot_eval_diagnostics_overview_002.png
   :alt: Per-horizon R²
   :srcset: /auto_examples/inspection/images/sphx_glr_plot_eval_diagnostics_overview_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <Axes: title={'center': 'Per-horizon R²'}, xlabel='horizon', ylabel='r2'>


.. GENERATED FROM PYTHON SOURCE LINES 421-428

Plot the structural checks separately
-------------------------------------

The boolean summary converts the most important structural checks
into a fast pass/fail view. This is especially useful when many
evaluation files are being compared and the user wants to avoid
manually checking whether every block is present.

.. GENERATED FROM PYTHON SOURCE LINES 428-443

.. code-block:: Python


    fig, ax = plt.subplots(
        figsize=(8.2, 4.6),
        constrained_layout=True,
    )
    plot_eval_boolean_summary(
        diag_record,
        ax=ax,
        title="Evaluation diagnostics decision checks",
        color=EVAL_DIAG_PALETTE["checks"],
        edgecolor="#1F2937",
        linewidth=0.8,
        alpha=0.9,
    )


.. image-sg:: /auto_examples/inspection/images/sphx_glr_plot_eval_diagnostics_overview_003.png
   :alt: Evaluation diagnostics decision checks
   :srcset: /auto_examples/inspection/images/sphx_glr_plot_eval_diagnostics_overview_003.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <Axes: title={'center': 'Evaluation diagnostics decision checks'}>


.. GENERATED FROM PYTHON SOURCE LINES 444-450

Save a full inspection bundle
-----------------------------

When you want more than live figures, the bundle helper can also
save the core inspection plots to disk. This is useful for gallery
generation and later report automation.

.. GENERATED FROM PYTHON SOURCE LINES 450-463

.. code-block:: Python


    bundle_dir = out_dir / "inspection_bundle"
    saved = inspect_eval_diagnostics(
        diag_record,
        output_dir=bundle_dir,
        stem="lesson_eval_diag",
        save_figures=True,
    )

    print("\nSaved inspection figures")
    for name, path in saved["figure_paths"].items():
        print(f" - {name}: {path}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Saved inspection figures
     - lesson_eval_diag_overall_metrics.png: /tmp/gp_eval_diag_82pbg5fb/inspection_bundle/lesson_eval_diag_overall_metrics.png
     - lesson_eval_diag_year_overall_mae.png: /tmp/gp_eval_diag_82pbg5fb/inspection_bundle/lesson_eval_diag_year_overall_mae.png
     - lesson_eval_diag_per_horizon_rmse.png: /tmp/gp_eval_diag_82pbg5fb/inspection_bundle/lesson_eval_diag_per_horizon_rmse.png
     - lesson_eval_diag_checks.png: /tmp/gp_eval_diag_82pbg5fb/inspection_bundle/lesson_eval_diag_checks.png


.. GENERATED FROM PYTHON SOURCE LINES 464-484

A practical reading rule
------------------------

For this artifact family, a compact decision rule can be:

- the artifact contains the overall block,
- year blocks are present,
- per-horizon maps are complete,
- overall coverage80 is not too far from the nominal 0.80 level,
- later horizons degrade, but not in a catastrophic jump,
- and PSS stays within a range you consider operationally usable.

The exact thresholds depend on your project. But even without fixed
thresholds, this artifact already helps you separate three common
cases:

1. forecast looks healthy,
2. forecast is usable but should be calibrated or compared,
3. forecast quality is weak enough that retraining or redesign is
   probably needed.

.. GENERATED FROM PYTHON SOURCE LINES 484-519

.. code-block:: Python


    checks = summary["checks"]
    coverage_ok = abs(float(summary["overall"]["coverage80"] or 0.0) - 0.80) <= 0.05
    horizon_jump_ok = True
    if not per_h_view.empty and per_h_view["rmse_step_change"].notna().any():
        max_jump = float(per_h_view["rmse_step_change"].dropna().max())
        horizon_jump_ok = max_jump < 20.0

    pss_ok = float(summary["overall"]["pss"] or 0.0) < 80.0

    ready = all(
        bool(checks.get(name, False))
        for name in [
            "has_overall_block",
            "has_year_blocks",
            "overall_has_core_metrics",
            "overall_has_per_horizon_mae",
            "overall_has_per_horizon_rmse",
            "overall_has_per_horizon_r2",
        ]
    )
    ready = ready and coverage_ok and horizon_jump_ok and pss_ok

    print("\nDecision note")
    if ready:
        print(
            "This demo evaluation artifact looks coherent enough for "
            "reporting or for use as the input to a deeper comparison "
            "and calibration review."
        )
    else:
        print(
            "This evaluation artifact suggests that the forecast should "
            "be reviewed more carefully before being trusted downstream."
        )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Decision note
    This demo evaluation artifact looks coherent enough for reporting or for use as the input to a deeper comparison and calibration review.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.186 seconds)


.. _sphx_glr_download_auto_examples_inspection_plot_eval_diagnostics_overview.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_eval_diagnostics_overview.ipynb <plot_eval_diagnostics_overview.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_eval_diagnostics_overview.py <plot_eval_diagnostics_overview.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_eval_diagnostics_overview.zip <plot_eval_diagnostics_overview.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_