
Compare independent regression pairs with `plot_r2_in`

This lesson explains how to use geoprior.plot.r2.plot_r2_in() when you have several separate (y_true, y_pred) pairs instead of one shared truth vector.

Why this function matters

There are many situations where each prediction task has its own target vector:

train vs validation predictions,
city A vs city B,
one forecast year vs another,
one output variable vs another,
or one experiment split vs another.

In those cases, the helper plot_r2() is not the most natural fit, because it assumes one common y_true and many competing prediction arrays.

plot_r2_in solves that problem by accepting an alternating sequence of pairs:

y_true_1, y_pred_1, y_true_2, y_pred_2, ...

and then building one diagnostic subplot for each pair.
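The pairing rule can be sketched in a few lines. The helper `split_alternating_pairs` below is hypothetical: it only illustrates how a flat `y_true_1, y_pred_1, ...` sequence maps to one pair per subplot, and it is not the validation logic inside `plot_r2_in`.

```python
import numpy as np


def split_alternating_pairs(*arrays):
    """Group a flat (y_true_1, y_pred_1, y_true_2, y_pred_2, ...) sequence.

    Hypothetical helper for illustration only; it is not part of geoprior.
    """
    if len(arrays) == 0 or len(arrays) % 2 != 0:
        raise ValueError("Expected an even, non-zero number of arrays.")
    pairs = []
    # Even positions hold truths, odd positions hold the matching predictions.
    for y_true, y_pred in zip(arrays[0::2], arrays[1::2]):
        y_true = np.asarray(y_true, dtype=float).ravel()
        y_pred = np.asarray(y_pred, dtype=float).ravel()
        if y_true.shape != y_pred.shape:
            raise ValueError("Each y_true must have the same length as its y_pred.")
        pairs.append((y_true, y_pred))
    return pairs


# Two pairs of different lengths: allowed, because each pair is independent.
pairs = split_alternating_pairs([1, 2, 3], [1.1, 1.9, 3.2], [5, 6], [4.8, 6.3])
print(len(pairs))  # prints 2, i.e. two subplots
```

Only the lengths within a pair must match; different pairs can have different sizes, just as the training (110) and validation (90) splits in this lesson do.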

What makes this helper special

Compared with plot_r2, this version adds one useful diagnostic feature: an optional fitted regression line, with its equation displayed on the subplot. That turns each panel into more than a score report, because it also helps you see whether a pair suffers from:

slope shrinkage,
amplitude inflation,
offset bias,
or broader scatter.
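To make these patterns concrete, here is a minimal sketch that fits a straight line with `numpy.polyfit` and formats its equation. It assumes the line is fitted as `y_pred ≈ slope * y_true + intercept`; the exact fitting method and label format used by `plot_r2_in` may differ, and `fitted_line` is a hypothetical name.

```python
import numpy as np


def fitted_line(y_true, y_pred):
    """Least-squares line y_pred ~ slope * y_true + intercept (illustrative)."""
    slope, intercept = np.polyfit(y_true, y_pred, deg=1)
    sign = "+" if intercept >= 0 else "-"
    label = f"y = {slope:.2f}x {sign} {abs(intercept):.2f}"
    return float(slope), float(intercept), label


y_true = np.linspace(0.0, 50.0, 11)
# Noise-free predictions with compressed amplitude (slope 0.8) and a +4 offset.
y_pred = 0.8 * y_true + 4.0
slope, intercept, label = fitted_line(y_true, y_pred)
print(label)  # prints: y = 0.80x + 4.00
```

A slope below 1 flags slope shrinkage, a slope above 1 flags amplitude inflation, and a non-zero intercept flags offset bias; broader scatter shows up as residual spread around the line rather than in its coefficients.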

from __future__ import annotations

import matplotlib.pyplot as plt
import numpy as np

from geoprior.plot.r2 import plot_r2_in

Build three independent regression tasks

To make the purpose of plot_r2_in clear, we create three different truth/prediction pairs rather than one shared truth vector.

Here we mimic:

a training split,
a validation split,
and a more difficult external split.

This is exactly the kind of situation where alternating pairs are more natural than the single-truth design of plot_r2.

rng = np.random.default_rng(7)

n_train = 110
n_valid = 90
n_external = 100

x_train = np.linspace(0.0, 1.0, n_train)
x_valid = np.linspace(0.0, 1.0, n_valid)
x_external = np.linspace(0.0, 1.0, n_external)

y_true_train = 12.0 + 45.0 * x_train + 4.0 * np.sin(3.0 * np.pi * x_train)
y_true_valid = 11.0 + 46.0 * x_valid + 4.5 * np.sin(3.0 * np.pi * x_valid)
y_true_external = 10.0 + 48.0 * x_external + 5.5 * np.sin(3.2 * np.pi * x_external)

y_pred_train = y_true_train + rng.normal(0.0, 1.8, n_train)
y_pred_valid = 0.95 * y_true_valid + 1.2 + rng.normal(0.0, 2.8, n_valid)
y_pred_external = 0.86 * y_true_external + 3.0 + rng.normal(0.0, 4.5, n_external)

Start with the simplest independent-pairs comparison

The first use case is to pass the alternating pairs directly. This is the core mental model of the function.

Read the resulting figure in this order:

compare the R² annotations,
compare the width of the scatter clouds,
check whether the fitted line stays close to the perfect-fit line,
inspect whether slope and intercept suggest systematic bias.
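For the first step, it helps to know what the R² annotation measures. This sketch assumes it is the coefficient of determination, R² = 1 - SS_res / SS_tot, which is what `sklearn.metrics.r2_score` computes; check the geoprior source if you need the exact definition. `r2_score_np` is a hypothetical helper.

```python
import numpy as np


def r2_score_np(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


y_true = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_score_np(y_true, y_true))                        # prints 1.0
print(r2_score_np(y_true, np.full(4, y_true.mean())))     # prints 0.0
print(r2_score_np(y_true, y_true[::-1]))                  # prints -3.0
```

Under this definition, R² is 1 for a perfect fit, 0 for a model that only predicts the mean, and negative when the predictions are worse than that baseline.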

fig = plot_r2_in(
    y_true_train,
    y_pred_train,
    y_true_valid,
    y_pred_valid,
    y_true_external,
    y_pred_external,
    titles=["Training split", "Validation split", "External split"],
    xlabel="Observed subsidence",
    ylabel="Predicted subsidence",
    scatter_colors=["#1f77b4", "#ff7f0e", "#d62728"],
    line_colors=["#3b3b3b", "#3b3b3b", "#3b3b3b"],
    line_styles=["--", "--", "--"],
    annotate=True,
    fit_eq=True,
    fit_line_color="#2ca02c",
    show_grid=True,
    max_cols=2,
)

Figure: three panels titled "Training split", "Validation split" and "External split".

Why the fitted line matters

This helper is especially valuable when the fitted line tells a story that the R² score alone does not.

For example:

a slope clearly below 1 suggests amplitude compression,
a large intercept suggests offset bias,
and a fitted line that diverges from the 1:1 line reveals systematic distortion even when the cloud still looks correlated.
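The last point is easy to check numerically. In this sketch, the predictions are an exact linear function of the truth, so the cloud is perfectly correlated, yet the fitted slope of 0.5 and intercept of 10 reveal strong distortion. The squared Pearson correlation is used here only to show that correlation alone cannot detect this kind of miscalibration.

```python
import numpy as np

y_true = np.linspace(0.0, 60.0, 31)
# Perfectly correlated but badly calibrated: half the amplitude plus an offset.
y_pred = 0.5 * y_true + 10.0

r_squared_corr = np.corrcoef(y_true, y_pred)[0, 1] ** 2
slope, intercept = np.polyfit(y_true, y_pred, deg=1)

print(f"squared correlation = {r_squared_corr:.3f}")  # prints 1.000
print(f"fitted line: slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A scatter plot of this pair would look like a perfectly tight cloud, which is exactly why the fitted line and its 1:1 reference are worth drawing.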

In the external split above, the synthetic predictions were generated with a slope of 0.86 and an offset of +3.0, so the fitted line is the quickest way to see that the model is not only noisier but also less well calibrated in amplitude.

Add RMSE and MAE to each pair

plot_r2_in can annotate extra metrics on each subplot. This is useful when each pair represents a different operational condition and you want a compact diagnostic card for each one.
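For reference, this sketch assumes that `other_metrics=["rmse", "mae"]` maps to the standard definitions of root mean squared error and mean absolute error. Because RMSE squares the errors, it reacts more strongly than MAE to a few large misses.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(err)))


y_true = np.zeros(4)
y_pred = np.array([1.0, -1.0, 1.0, 5.0])  # three small misses and one large one
print(rmse(y_true, y_pred), mae(y_true, y_pred))  # RMSE = sqrt(7), MAE = 2.0
```

When RMSE is much larger than MAE for one pair but not the others, that pair likely has a few outlying errors rather than uniformly wider scatter.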

fig = plot_r2_in(
    y_true_train,
    y_pred_train,
    y_true_valid,
    y_pred_valid,
    y_true_external,
    y_pred_external,
    titles=["Training split", "Validation split", "External split"],
    xlabel="Observed subsidence",
    ylabel="Predicted subsidence",
    scatter_colors=["#17becf", "#9467bd", "#8c564b"],
    line_colors=["#444444", "#444444", "#444444"],
    line_styles=[":", ":", ":"],
    other_metrics=["rmse", "mae"],
    annotate=True,
    fit_eq=True,
    fit_line_color="#e377c2",
    show_grid=True,
    max_cols=3,
)

Use the function without the fitted equation when you want a cleaner page

When many pairs are compared, the figure can become text-heavy. In that case, disabling the fitted equation is often the better teaching and reporting choice.

fig = plot_r2_in(
    y_true_train,
    y_pred_train,
    y_true_valid,
    y_pred_valid,
    y_true_external,
    y_pred_external,
    titles=["Training split", "Validation split", "External split"],
    xlabel="Observed subsidence",
    ylabel="Predicted subsidence",
    scatter_colors=["#4daf4a", "#377eb8", "#e41a1c"],
    line_colors=["#222222", "#222222", "#222222"],
    line_styles=["--", "--", "--"],
    other_metrics=["rmse"],
    annotate=True,
    fit_eq=False,
    show_grid=True,
    max_cols=3,
)

Demonstrate pair-wise cleaning with imperfect arrays

The implementation uses process_y_pairs to validate and clean the alternating pairs. That is helpful when one pair contains missing values but the others remain usable.

Here we introduce a few NaNs into one pair only.
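A plausible model of pair-wise cleaning is to drop every position where either the truth or the prediction is NaN, inside that pair only. The sketch below assumes that behaviour; `drop_nan_rows` is a hypothetical helper, not the code of `process_y_pairs`, which may apply further checks. Under this rule, the NaNs introduced below would remove positions 10, 17 and 25 from the validation pair, leaving 87 of its 90 points.

```python
import numpy as np


def drop_nan_rows(y_true, y_pred):
    """Keep only positions where neither array is NaN (illustrative sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    keep = ~(np.isnan(y_true) | np.isnan(y_pred))  # drop if either value is NaN
    return y_true[keep], y_pred[keep]


t = np.arange(90, dtype=float)
p = t.copy()
t[[10, 17]] = np.nan  # missing observations
p[[17, 25]] = np.nan  # missing predictions (position 17 overlaps)
t_clean, p_clean = drop_nan_rows(t, p)
print(t_clean.size)  # prints 87
```

Because the mask is computed per pair, the training and external pairs would keep all of their points.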

y_true_valid_nan = y_true_valid.copy()
y_pred_valid_nan = y_pred_valid.copy()
y_true_valid_nan[[10, 17]] = np.nan
y_pred_valid_nan[[17, 25]] = np.nan

fig = plot_r2_in(
    y_true_train,
    y_pred_train,
    y_true_valid_nan,
    y_pred_valid_nan,
    y_true_external,
    y_pred_external,
    titles=["Training split", "Validation split with NaNs", "External split"],
    xlabel="Observed subsidence",
    ylabel="Predicted subsidence",
    scatter_colors=["#a6cee3", "#fb9a99", "#b2df8a"],
    line_colors=["#4c4c4c", "#4c4c4c", "#4c4c4c"],
    line_styles=["--", "--", "--"],
    other_metrics=["rmse"],
    annotate=True,
    fit_eq=True,
    fit_line_color="#1b9e77",
    show_grid=True,
    max_cols=2,
)

Figure: three panels titled "Training split", "Validation split with NaNs" and "External split".

When should you choose `plot_r2_in` instead of `plot_r2`?

Use plot_r2_in when each subplot should represent its own independent diagnostic pair.

Good examples:

train, validation, and test predictions,
Nansha, Zhongshan, and an external city,
groundwater and subsidence outputs,
different forecast years evaluated separately.

A simple rule is:

shared truth, many predictions -> plot_r2
many independent truth/prediction pairs -> plot_r2_in
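If you hold predictions from several models against one shared truth but still want the per-pair layout (for example, to get a fitted equation on every panel), one option is to repeat the truth in the alternating order. This sketch only builds the argument list; the model names are illustrative, and the commented-out call assumes the keyword names used earlier in this lesson.

```python
import numpy as np

y_true = np.linspace(0.0, 10.0, 5)
predictions = {
    "Model A": y_true + 0.5,
    "Model B": 0.9 * y_true,
}

# Repeat the shared truth before each prediction: y_true, pred_A, y_true, pred_B.
args = []
for y_pred in predictions.values():
    args.extend([y_true, y_pred])
titles = list(predictions)

# plot_r2_in(*args, titles=titles, fit_eq=True)  # requires geoprior
print(len(args), titles)  # prints 4 ['Model A', 'Model B']
```

For a plain side-by-side comparison against one truth, plot_r2 remains the simpler choice.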

How to use it on your own data

A realistic workflow might look like this:

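from geoprior.plot.r2 import plot_r2_in

# Assumption: train_df, valid_df and test_df are pandas DataFrames that hold the
# observed subsidence ("subsidence_actual") and the median (q50) forecast
# ("subsidence_q50") for each split.
plot_r2_in(
    train_df["subsidence_actual"].to_numpy(),
    train_df["subsidence_q50"].to_numpy(),
    valid_df["subsidence_actual"].to_numpy(),
    valid_df["subsidence_q50"].to_numpy(),
    test_df["subsidence_actual"].to_numpy(),
    test_df["subsidence_q50"].to_numpy(),
    titles=["Train", "Validation", "Test"],
    other_metrics=["rmse", "mae"],
    fit_eq=True,
    max_cols=2,
)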

This is particularly useful in diagnostics pages because it makes the quality difference across operating conditions visible at a glance.

A compact reading checklist

When reading a plot_r2_in figure, ask:

Which pair has the strongest R²?
Which pair has the narrowest scatter cloud?
Does the fitted line remain close to the perfect-fit line?
Is the slope near 1 and the intercept near 0?
Do RMSE and MAE confirm the same ranking?

That combination gives a much richer diagnostic than a single R² table.
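The last checklist question can be automated. This sketch computes R² (assumed to be the coefficient of determination), RMSE and MAE for each pair and checks whether the three metrics rank the pairs the same way. The pair names and arrays are illustrative, not output from `plot_r2_in`, and `pair_metrics` is a hypothetical helper.

```python
import numpy as np


def pair_metrics(y_true, y_pred):
    """Return R² (coefficient of determination), RMSE and MAE for one pair."""
    y_true = np.asarray(y_true, dtype=float)
    err = np.asarray(y_pred, dtype=float) - y_true
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"r2": r2, "rmse": np.sqrt(np.mean(err ** 2)), "mae": np.mean(np.abs(err))}


truth = np.linspace(0.0, 50.0, 26)
signs = np.tile([1.0, -1.0], 13)  # alternating +/- unit errors
pairs = {
    "Train": (truth, truth + 1.0 * signs),
    "Validation": (truth, truth + 2.0 * signs),
    "External": (truth, truth + 4.0 * signs),
}

scores = {name: pair_metrics(t, p) for name, (t, p) in pairs.items()}
by_r2 = sorted(scores, key=lambda n: -scores[n]["r2"])     # higher R² is better
by_rmse = sorted(scores, key=lambda n: scores[n]["rmse"])  # lower RMSE is better
by_mae = sorted(scores, key=lambda n: scores[n]["mae"])    # lower MAE is better
print(by_r2, by_r2 == by_rmse == by_mae)
```

When the rankings disagree, look at the pair whose position changes: it often has a few large errors (RMSE penalizes them more than MAE) or a systematic bias that R² and the fitted line expose differently.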

plt.show()
