.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/evaluation/plot_crps_overview.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_evaluation_plot_crps_overview.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_evaluation_plot_crps_overview.py:


Read ensemble forecast quality with ``plot_crps``
=================================================

This lesson explains how to use
``geoprior.plot.evaluation.plot_crps`` when you want to judge the
quality of **ensemble forecasts** instead of only a single point
prediction.

Why this function matters
-------------------------
A probabilistic forecast is not only about getting the center right.
It must also represent uncertainty well. Two ensemble forecasts may
have a similar mean prediction while behaving very differently:

- one may be sharp but overconfident,
- another may be wide but realistic,
- and a third may simply be noisy without adding useful information.

The Continuous Ranked Probability Score (CRPS) helps summarize that
trade-off. Lower CRPS is better because it rewards ensembles that stay
close to the truth while also avoiding unnecessary spread.

This page is written as a **teaching guide**, not only as an API demo.
We will build a small ensemble-forecast example, inspect the required
array shapes, use the three main plotting modes, and end with a simple
checklist for adapting the helper to your own data.

.. GENERATED FROM PYTHON SOURCE LINES 34-51

.. code-block:: Python


    from __future__ import annotations

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    from geoprior.plot.evaluation import plot_crps

    pd.set_option("display.max_columns", 20)
    pd.set_option("display.width", 108)
    pd.set_option(
        "display.float_format",
        lambda v: f"{v:0.4f}",
    )


.. GENERATED FROM PYTHON SOURCE LINES 52-79

What this function expects
--------------------------

``plot_crps`` works with observed values and an **ensemble** of
predictions.

The accepted shapes are simple but important:

- ``y_true`` can be ``(N,)`` for one output or ``(N, O)`` for multiple
  outputs,
- ``y_pred_ensemble`` can be ``(N, M)`` for one output or ``(N, O, M)``
  for multiple outputs,
- where ``N`` is the number of samples,
- ``O`` is the number of outputs,
- and ``M`` is the number of ensemble members.

The helper offers three complementary views:

1. ``'summary_bar'`` for one overall score,
2. ``'scores_histogram'`` for the distribution of sample-wise CRPS,
3. ``'ensemble_ecdf'`` for a single-sample visual inspection.

A good habit is to treat these as three different questions:

- *How good is the ensemble overall?*
- *Is performance stable across samples?*
- *What does one concrete ensemble look like against the truth?*

.. GENERATED FROM PYTHON SOURCE LINES 82-96

Build a realistic single-output ensemble example
------------------------------------------------

For a gallery lesson, we want an ensemble that behaves like a real
forecast rather than a random cloud.

We therefore create:

- one observed target series,
- a "better" ensemble that is centered near the truth,
- and a noisier ensemble that spreads too widely.

This gives the lesson a meaningful comparison because CRPS should
prefer the first ensemble over the second.

.. GENERATED FROM PYTHON SOURCE LINES 96-133

.. code-block:: Python


    rng = np.random.default_rng(7)

    n_samples = 140
    n_members = 25

    x = np.linspace(0.0, 6.0 * np.pi, n_samples)
    y_true = 22.0 + 1.8 * np.sin(x / 2.2) + 0.5 * np.cos(x / 5.0)

    ensemble_good = (
        y_true[:, None]
        + rng.normal(loc=0.0, scale=0.65, size=(n_samples, n_members))
    )
    ensemble_wide = (
        y_true[:, None]
        + 0.30
        + rng.normal(loc=0.0, scale=1.25, size=(n_samples, n_members))
    )

    preview = pd.DataFrame(
        {
            "y_true": y_true[:8],
            "ens_mean_good": ensemble_good[:8].mean(axis=1),
            "ens_std_good": ensemble_good[:8].std(axis=1),
            "ens_mean_wide": ensemble_wide[:8].mean(axis=1),
            "ens_std_wide": ensemble_wide[:8].std(axis=1),
        }
    )

    print("Single-output preview")
    print(preview)
    print("\nShapes")
    print(f"y_true shape          : {y_true.shape}")
    print(f"ensemble_good shape   : {ensemble_good.shape}")
    print(f"ensemble_wide shape   : {ensemble_wide.shape}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Single-output preview
       y_true  ens_mean_good  ens_std_good  ens_mean_wide  ens_std_wide
    0 22.5000        22.2596        0.5090        22.5760        1.2165
    1 22.6107        22.4706        0.6307        22.6413        1.1233
    2 22.7206        22.7012        0.5601        23.0352        0.9607
    3 22.8293        22.7796        0.5426        23.3260        1.3125
    4 22.9364        22.8317        0.5382        22.7034        1.1575
    5 23.0414        22.9823        0.6353        23.4183        1.1686
    6 23.1440        22.9329        0.5193        23.8532        1.4453
    7 23.2438        23.3822        0.4943        23.4207        1.3120

    Shapes
    y_true shape          : (140,)
    ensemble_good shape   : (140, 25)
    ensemble_wide shape   : (140, 25)


.. GENERATED FROM PYTHON SOURCE LINES 134-145

Start with the headline view: one overall CRPS bar
--------------------------------------------------

The simplest reading is the overall summary.

This is the right first step when the user wants to compare two
ensemble designs quickly. A lower bar means the ensemble distribution
is, on average, more useful.

We draw the two cases side by side by calling the helper twice on two
axes. The contrast is more instructive than looking at a single run.

.. GENERATED FROM PYTHON SOURCE LINES 145-167

.. code-block:: Python


    fig, axes = plt.subplots(1, 2, figsize=(11, 4.8), constrained_layout=True)

    plot_crps(
        y_true,
        ensemble_good,
        kind="summary_bar",
        title="CRPS summary: better ensemble",
        bar_color="teal",
        ax=axes[0],
    )

    plot_crps(
        y_true,
        ensemble_wide,
        kind="summary_bar",
        title="CRPS summary: wider ensemble",
        bar_color="darkorange",
        ax=axes[1],
    )


.. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_crps_overview_001.png
   :alt: CRPS summary: better ensemble, CRPS summary: wider ensemble
   :srcset: /auto_examples/evaluation/images/sphx_glr_plot_crps_overview_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <Axes: title={'center': 'CRPS summary: wider ensemble'}, ylabel='CRPS'>


.. GENERATED FROM PYTHON SOURCE LINES 168-184

How to read the summary bar
---------------------------

This first figure answers a direct ranking question:

*Which ensemble is better overall?*

But it does not yet explain *why* one score is better.

That is why CRPS is best read in layers:

1. start with the summary bar,
2. inspect the histogram of sample-wise scores,
3. then inspect one concrete ensemble with the ECDF view.

The bar gives the verdict. The next two views explain the verdict.

.. GENERATED FROM PYTHON SOURCE LINES 187-196

Move to the distribution view with a histogram
----------------------------------------------

The histogram reveals whether the average score is stable across
samples or dominated by a few very bad cases.

This matters because two ensembles can share a similar mean CRPS while
one behaves consistently and the other alternates between excellent and
poor uncertainty estimates.

.. GENERATED FROM PYTHON SOURCE LINES 196-220

.. code-block:: Python


    fig, axes = plt.subplots(1, 2, figsize=(11, 4.8), constrained_layout=True)

    plot_crps(
        y_true,
        ensemble_good,
        kind="scores_histogram",
        title="CRPS distribution: better ensemble",
        hist_color="slateblue",
        hist_edgecolor="white",
        ax=axes[0],
    )

    plot_crps(
        y_true,
        ensemble_wide,
        kind="scores_histogram",
        title="CRPS distribution: wider ensemble",
        hist_color="indianred",
        hist_edgecolor="white",
        ax=axes[1],
    )


.. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_crps_overview_002.png
   :alt: CRPS distribution: better ensemble (Mean CRPS: 0.1646), CRPS distribution: wider ensemble (Mean CRPS: 0.3481)
   :srcset: /auto_examples/evaluation/images/sphx_glr_plot_crps_overview_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <Axes: title={'center': 'CRPS distribution: wider ensemble\n(Mean CRPS: 0.3481)'}, xlabel='CRPS per Sample', ylabel='Frequency'>


.. GENERATED FROM PYTHON SOURCE LINES 221-234

What the histogram teaches that the mean cannot
-----------------------------------------------

The histogram helps answer a different operational question:

*Is the ensemble usually reasonable, or only good on average?*

A compact histogram concentrated near lower values suggests more
stable probabilistic behaviour.

A wide right tail usually means some samples are much harder, or the
ensemble is poorly matched to those cases. In practice, this is often
the figure that tells you whether a good average score is trustworthy.

.. GENERATED FROM PYTHON SOURCE LINES 237-246

Inspect one concrete sample with the ECDF view
----------------------------------------------

The ECDF mode is the most intuitive teaching view because it shows one
ensemble directly against the true value.

Use it when you want to explain what a "good" or "bad" probabilistic
forecast looks like for one case. Here we inspect the same sample in
both ensembles.

.. GENERATED FROM PYTHON SOURCE LINES 246-276

.. code-block:: Python


    sample_idx = 62

    fig, axes = plt.subplots(1, 2, figsize=(11.4, 4.8), constrained_layout=True)

    plot_crps(
        y_true,
        ensemble_good,
        kind="ensemble_ecdf",
        sample_idx=sample_idx,
        title=f"ECDF view: better ensemble (sample {sample_idx})",
        ecdf_color="navy",
        true_value_color="darkgreen",
        ensemble_marker_color="steelblue",
        ax=axes[0],
    )

    plot_crps(
        y_true,
        ensemble_wide,
        kind="ensemble_ecdf",
        sample_idx=sample_idx,
        title=f"ECDF view: wider ensemble (sample {sample_idx})",
        ecdf_color="darkorange",
        true_value_color="firebrick",
        ensemble_marker_color="peru",
        ax=axes[1],
    )


.. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_crps_overview_003.png
   :alt: ECDF view: better ensemble (sample 62) (CRPS: 0.1378), ECDF view: wider ensemble (sample 62) (CRPS: 0.3506)
   :srcset: /auto_examples/evaluation/images/sphx_glr_plot_crps_overview_003.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <Axes: title={'center': 'ECDF view: wider ensemble (sample 62)\n(CRPS: 0.3506)'}, xlabel='Value', ylabel='Cumulative Probability'>


.. GENERATED FROM PYTHON SOURCE LINES 277-291

How to interpret the ECDF view
------------------------------

Read this plot in a simple order:

1. Where is the vertical true-value line?
2. Does it fall inside the bulk of the ensemble members?
3. Is the ensemble tightly concentrated or excessively spread?

A useful ensemble is not necessarily the narrowest one. It should be
*appropriately* spread around the truth. If the truth sits far from
most ensemble members, CRPS usually worsens. If the ensemble is much
wider than needed, CRPS also worsens because unnecessary dispersion is
penalized.

.. GENERATED FROM PYTHON SOURCE LINES 294-307

Multi-output data: the shape changes, not the idea
--------------------------------------------------

Many real workflows have more than one target. The helper supports
that directly by moving from ``(N,)`` and ``(N, M)`` to ``(N, O)`` and
``(N, O, M)``.

Here we create two outputs:

- output 0 behaves like a cleaner target,
- output 1 is noisier and intentionally harder.

This is useful because the summary bar can then show per-output CRPS.

.. GENERATED FROM PYTHON SOURCE LINES 307-337

.. code-block:: Python


    output_0 = y_true
    output_1 = 11.0 + 0.6 * y_true + 0.4 * np.sin(x / 1.8)
    y_true_multi = np.column_stack([output_0, output_1])

    ensemble_multi = np.empty((n_samples, 2, n_members), dtype=float)
    ensemble_multi[:, 0, :] = (
        output_0[:, None]
        + rng.normal(loc=0.0, scale=0.60, size=(n_samples, n_members))
    )
    ensemble_multi[:, 1, :] = (
        output_1[:, None]
        + 0.15
        + rng.normal(loc=0.0, scale=1.10, size=(n_samples, n_members))
    )

    print("\nMulti-output shapes")
    print(f"y_true_multi shape      : {y_true_multi.shape}")
    print(f"ensemble_multi shape    : {ensemble_multi.shape}")

    plot_crps(
        y_true_multi,
        ensemble_multi,
        kind="summary_bar",
        metric_kws={"multioutput": "raw_values"},
        title="CRPS summary by output",
        bar_color=["mediumseagreen", "mediumpurple"],
    )


.. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_crps_overview_004.png
   :alt: CRPS summary by output (Per Output)
   :srcset: /auto_examples/evaluation/images/sphx_glr_plot_crps_overview_004.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Multi-output shapes
    y_true_multi shape      : (140, 2)
    ensemble_multi shape    : (140, 2, 25)

    <Axes: title={'center': 'CRPS summary by output (Per Output)'}, ylabel='CRPS'>


.. GENERATED FROM PYTHON SOURCE LINES 338-352

Why the per-output summary matters
----------------------------------

A single combined CRPS can hide important structure across targets.
When users work with more than one output, a per-output summary is a
much safer first diagnostic.

In a real project, this plot helps answer questions such as:

- Is groundwater uncertainty modeled better than subsidence?
- Does one target need recalibration while another already behaves
  well?
- Is a single global uncertainty setting hurting one output more than
  another?

.. GENERATED FROM PYTHON SOURCE LINES 355-361

Histogram for a chosen output
-----------------------------

For multi-output data, the histogram becomes easier to interpret when
we focus on one output at a time. The helper supports that through
``output_idx``.

.. GENERATED FROM PYTHON SOURCE LINES 361-387

.. code-block:: Python


    fig, axes = plt.subplots(1, 2, figsize=(11, 4.8), constrained_layout=True)

    plot_crps(
        y_true_multi,
        ensemble_multi,
        kind="scores_histogram",
        output_idx=0,
        title="CRPS histogram: output 0",
        hist_color="royalblue",
        hist_edgecolor="white",
        ax=axes[0],
    )

    plot_crps(
        y_true_multi,
        ensemble_multi,
        kind="scores_histogram",
        output_idx=1,
        title="CRPS histogram: output 1",
        hist_color="mediumorchid",
        hist_edgecolor="white",
        ax=axes[1],
    )


.. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_crps_overview_005.png
   :alt: CRPS histogram: output 0 (Mean CRPS: 0.1525), CRPS histogram: output 1 (Mean CRPS: 0.2863)
   :srcset: /auto_examples/evaluation/images/sphx_glr_plot_crps_overview_005.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <Axes: title={'center': 'CRPS histogram: output 1\n(Mean CRPS: 0.2863)'}, xlabel='CRPS per Sample', ylabel='Frequency'>


.. GENERATED FROM PYTHON SOURCE LINES 388-398

Using precomputed CRPS values in reports
----------------------------------------

Sometimes the user already computed CRPS elsewhere and only wants the
plotting helper for presentation. The helper supports that through
``crps_values``.

Here we mimic that reporting workflow with two precomputed values. The
point of this section is not to replace the internal computation. It
is to show that the plotting API can also act as a presentation layer.

.. GENERATED FROM PYTHON SOURCE LINES 398-411

.. code-block:: Python


    precomputed_scores = np.array([0.4120, 0.7280])

    plot_crps(
        y_true_multi,
        ensemble_multi,
        crps_values=precomputed_scores,
        kind="summary_bar",
        title="Precomputed CRPS report by output",
        bar_color=["cadetblue", "tomato"],
    )


.. image-sg:: /auto_examples/evaluation/images/sphx_glr_plot_crps_overview_006.png
   :alt: Precomputed CRPS report by output
   :srcset: /auto_examples/evaluation/images/sphx_glr_plot_crps_overview_006.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <Axes: title={'center': 'Precomputed CRPS report by output'}, ylabel='CRPS'>


.. GENERATED FROM PYTHON SOURCE LINES 412-440

How to adapt this lesson to your own data
-----------------------------------------

In a real project, you usually already have arrays from your model
output. The practical mapping is simple:

- observed values -> ``y_true``
- ensemble draws or members -> ``y_pred_ensemble``

Then choose the plot based on your question:

- use ``kind='summary_bar'`` for a quick overall comparison,
- use ``kind='scores_histogram'`` when you want to inspect the spread
  of sample-wise forecast quality,
- use ``kind='ensemble_ecdf'`` when you want to teach or diagnose one
  concrete example.

A good workflow is therefore:

1. compare overall CRPS,
2. inspect the distribution,
3. inspect one or two representative samples,
4. then decide whether the ensemble needs recalibration, more members,
   or a better predictive distribution.

The key idea is simple: CRPS is not only a number. With the three plot
modes together, it becomes a small visual language for judging whether
uncertainty forecasts are believable and useful.

.. GENERATED FROM PYTHON SOURCE LINES 440-442

.. code-block:: Python


    plt.show()


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.371 seconds)


.. _sphx_glr_download_auto_examples_evaluation_plot_crps_overview.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_crps_overview.ipynb <plot_crps_overview.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_crps_overview.py <plot_crps_overview.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_crps_overview.zip <plot_crps_overview.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_