.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/inspection/plot_xfer_results_overview.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_inspection_plot_xfer_results_overview.py: Inspect transfer-learning results before trusting cross-city conclusions ============================================================================ This lesson explains how to inspect the ``xfer_results.json`` artifact. Unlike most artifacts in the inspection gallery, transfer results are usually stored as a **JSON list of records**, not as one single mapping. Each record describes one transfer job and combines: - the transfer direction, - the strategy and calibration choices, - overall evaluation metrics, - per-horizon behavior, - schema mismatch diagnostics, - warm-start details, - exported CSV locations. That makes this artifact especially useful when you want to answer workflow questions such as: - Which transfer direction behaves better? - Does performance collapse as the forecast horizon grows? - Are poor metrics possibly explained by schema mismatch? - Did the warm-start settings look sensible? - Is a cross-city result good enough to compare, report, or extend? The goal of this page is not only to call plotting helpers. It is to teach how to read transfer results as a **comparison artifact** and turn that reading into an informed workflow decision. .. GENERATED FROM PYTHON SOURCE LINES 37-161 .. code-block:: Python from __future__ import annotations import json import tempfile from pathlib import Path from pprint import pprint import matplotlib.pyplot as plt import pandas as pd from geoprior.utils.inspect import ( generate_xfer_results, inspect_xfer_results, load_xfer_results, plot_xfer_boolean_summary, plot_xfer_direction_metric, plot_xfer_overall_metrics, plot_xfer_per_horizon_metrics, plot_xfer_schema_counts, summarize_xfer_results, xfer_overall_frame, xfer_per_horizon_frame, xfer_schema_frame, xfer_warm_frame, ) pd.set_option("display.max_columns", 24) pd.set_option("display.width", 108) XFER_COLORS = { "primary": "#0f766e", "secondary": "#2563eb", "accent": "#f97316", "rose": "#e11d48", "gold": "#ca8a04", "ink": "#0f172a", "muted": "#64748b", "grid": "#cbd5e1", "face": "#f8fafc", "pass": "#16a34a", "fail": "#dc2626", } def _style_axes( ax: plt.Axes, *, facecolor: str = XFER_COLORS["face"] ) -> None: ax.set_facecolor(facecolor) ax.set_axisbelow(True) ax.grid( True, color=XFER_COLORS["grid"], alpha=0.45, linewidth=0.8 ) for side in ("top", "right"): ax.spines[side].set_visible(False) for side in ("left", "bottom"): ax.spines[side].set_color("#94a3b8") ax.spines[side].set_linewidth(0.9) ax.tick_params(colors=XFER_COLORS["ink"], labelsize=9) if ax.get_title(): ax.set_title( ax.get_title(), fontsize=11, fontweight="bold", color=XFER_COLORS["ink"], pad=12 ) if ax.get_xlabel(): ax.set_xlabel( ax.get_xlabel(), fontsize=9.5, color=XFER_COLORS["muted"] ) if ax.get_ylabel(): ax.set_ylabel( ax.get_ylabel(), fontsize=9.5, color=XFER_COLORS["muted"] ) def _style_bars( ax: plt.Axes, colors: list[str], *, edgecolor: str = XFER_COLORS["ink"], linewidth: float = 1.1, alpha: float = 0.94 ) -> None: for idx, patch in enumerate(ax.patches): patch.set_facecolor(colors[idx % len(colors)]) patch.set_edgecolor(edgecolor) patch.set_linewidth(linewidth) patch.set_alpha(alpha) def _style_lines( ax: plt.Axes, colors: list[str], *, markers: tuple[str, ...] = ("o", "s", "D", "^"), linewidth: float = 2.4, markersize: float = 7.0) -> None: for idx, line in enumerate(ax.lines): color = colors[idx % len(colors)] line.set_color(color) line.set_linewidth(linewidth) line.set_marker(markers[idx % len(markers)]) line.set_markersize(markersize) line.set_markerfacecolor(color) line.set_markeredgecolor("white") line.set_markeredgewidth(0.8) def _style_boolean_bars(ax: plt.Axes) -> None: for patch in ax.patches: value = patch.get_width() if patch.get_width() else patch.get_height() color = XFER_COLORS["pass"] if value >= 0.5 else XFER_COLORS["fail"] patch.set_facecolor(color) patch.set_edgecolor(XFER_COLORS["ink"]) patch.set_linewidth(1.1) patch.set_alpha(0.95) .. GENERATED FROM PYTHON SOURCE LINES 162-178 Why this artifact matters ------------------------- Transfer results are not only about raw accuracy. In a cross-city or cross-domain workflow, a record can look weak for several different reasons: 1. the transfer direction may be intrinsically harder, 2. horizon-wise error may grow too quickly, 3. coverage may be acceptable but intervals may be too wide, 4. schema mismatch may make the transfer unfair, 5. warm-start settings may be too small or too aggressive. Because this file keeps those pieces together, it becomes one of the best places to inspect a transfer experiment before drawing strong scientific or operational conclusions. .. GENERATED FROM PYTHON SOURCE LINES 181-191 Create a realistic demo artifact -------------------------------- For documentation lessons, we usually want a stable artifact that is rich enough to compare directions without rerunning the full transfer workflow. Here we generate a small but realistic transfer-results file and make the two directions slightly different on purpose, so the lesson can teach how to interpret comparisons instead of only printing tables. .. GENERATED FROM PYTHON SOURCE LINES 191-243 .. code-block:: Python workdir = Path(tempfile.mkdtemp(prefix="gp_xfer_results_")) out_dir = workdir xfer_path = out_dir / "xfer_results.json" generate_xfer_results( xfer_path, overrides=[ { "strategy": "warm", "calibration": "source", "rescale_mode": "strict", "coverage80": 0.872, "sharpness80": 49.80, "overall_mae": 13.90, "overall_rmse": 21.75, "overall_r2": 0.832, "per_horizon_rmse": { "H1": 13.90, "H2": 20.90, "H3": 27.80, }, "schema": { "static_missing_n": 7, "static_extra_n": 5, }, }, { "strategy": "warm", "calibration": "source", "rescale_mode": "strict", "coverage80": 0.821, "sharpness80": 109.20, "overall_mae": 37.10, "overall_rmse": 61.20, "overall_r2": 0.748, "per_horizon_rmse": { "H1": 18.40, "H2": 51.60, "H3": 91.50, }, "schema": { "static_missing_n": 10, "static_extra_n": 12, }, }, ], ) print("Written transfer-results file") print(f" - {xfer_path}") .. rst-class:: sphx-glr-script-out .. code-block:: none Written transfer-results file - /tmp/gp_xfer_results_db3h6qd7/xfer_results.json .. GENERATED FROM PYTHON SOURCE LINES 244-251 Load the artifact with the real reader -------------------------------------- This artifact family is slightly special: the top-level payload is a list of records rather than a single JSON object. That is already something worth teaching because users often assume every artifact in the workflow has the same shape. .. GENERATED FROM PYTHON SOURCE LINES 251-266 .. code-block:: Python records = load_xfer_results(xfer_path) print("\nTransfer artifact basics") pprint( { "type": type(records).__name__, "n_records": len(records), "first_record_keys": sorted(records[0])[:12], } ) print("\nFirst record preview") print(json.dumps(records[0], indent=2)[:1200] + "\n...") .. rst-class:: sphx-glr-script-out .. code-block:: none Transfer artifact basics {'first_record_keys': ['calibration', 'coverage80', 'csv_eval', 'csv_future', 'direction', 'hps_mode', 'job_index', 'job_total', 'metrics_source', 'metrics_unit', 'model_dir', 'model_name'], 'n_records': 2, 'type': 'list'} First record preview { "strategy": "warm", "rescale_mode": "strict", "warm": { "warm_split": "val", "warm_samples": 20000, "warm_frac": null, "warm_epochs": 3, "warm_lr": 0.0001, "warm_seed": 42, "schema": { "src_city": "nansha", "tgt_city": "zhongshan", "static_aligned": true, "dynamic_reordered": false, "future_reordered": false, "dynamic_order_mismatch": false, "future_order_mismatch": false, "static_missing_n": 9, "static_extra_n": 6 } }, "model_path": "results/nansha_GeoPriorSubsNet_stage1/train_20260222-141331/nansha_GeoPriorSubsNet_H3_best.keras", "split": "test", "calibration": "source", "quantiles": [ 0.1, 0.5, 0.9 ], "coverage80": 0.872, "sharpness80": 49.8, "overall_mae": 13.9, "overall_mse": 499.0390742467, "overall_rmse": 21.75, "overall_r2": 0.832, "per_horizon_mae": { "H1": 9.2531121755, "H2": 14.4207786362, "H3": 19.232098912 }, "per_horizon_mse": { "H1": 196.9444467984, "H2": 465.9660166509, "H3": 834.2067592909 }, "per_horizon_rmse": { "H1": 13.9, "H2": 20.9, "H3": 27.8 }, "per_horizon_r2": { "H1": 0.88807 ... .. GENERATED FROM PYTHON SOURCE LINES 267-277 Start with the workflow summary ------------------------------- The compact summary is the quickest way to answer broad questions: - how many transfer jobs are present, - which directions are represented, - which strategies and calibration modes appear, - which record currently looks best by RMSE or R², - and whether schema mismatch is a recurring pattern. .. GENERATED FROM PYTHON SOURCE LINES 277-283 .. code-block:: Python summary = summarize_xfer_results(records) print("\nCompact summary") print(json.dumps(summary, indent=2)) .. rst-class:: sphx-glr-script-out .. code-block:: none Compact summary { "n_records": 2, "directions": [ "A_to_B", "B_to_A" ], "strategies": [ "warm" ], "calibrations": [ "source" ], "rescale_modes": [ "strict" ], "best_overall_rmse": { "label": "A_to_B | warm | source | strict", "value": 21.75 }, "best_overall_r2": { "label": "A_to_B | warm | source | strict", "value": 0.832 }, "mean_coverage80": 0.8465, "mean_sharpness80": 79.5, "worst_horizon_rmse": { "label": "B_to_A | warm | source | strict", "horizon": "H3", "value": 91.5 }, "schema_pass_rates": { "dynamic_order_mismatch": 0.0, "dynamic_reordered": 0.0, "future_order_mismatch": 0.0, "future_reordered": 0.0, "static_aligned": 1.0 }, "schema_mean_counts": { "static_extra_n": 8.5, "static_missing_n": 8.5 } } .. GENERATED FROM PYTHON SOURCE LINES 284-299 Read the main comparison table ------------------------------ The overall frame is usually the first table to inspect because it puts the most important workflow columns side by side: - direction, - source and target city, - strategy, - calibration, - rescale mode, - overall quality metrics. This table is where you normally rank records before drilling into horizon-wise behavior. .. GENERATED FROM PYTHON SOURCE LINES 299-334 .. code-block:: Python overall = xfer_overall_frame(records) print("\nOverall comparison table") print( overall[ [ "label", "direction", "strategy", "calibration", "rescale_mode", "coverage80", "sharpness80", "overall_mae", "overall_rmse", "overall_r2", ] ] ) ranked = overall.sort_values("overall_rmse").reset_index(drop=True) best = ranked.iloc[0] worst = ranked.iloc[-1] print("\nInitial reading") print( f"Best overall record by RMSE: {best['label']} " f"(RMSE={best['overall_rmse']:.3f}, R²={best['overall_r2']:.3f})." ) print( f"Weakest overall record by RMSE: {worst['label']} " f"(RMSE={worst['overall_rmse']:.3f}, R²={worst['overall_r2']:.3f})." ) .. rst-class:: sphx-glr-script-out .. code-block:: none Overall comparison table label direction strategy calibration rescale_mode coverage80 sharpness80 \ 0 A_to_B | warm | source | strict A_to_B warm source strict 0.872000 49.800000 1 B_to_A | warm | source | strict B_to_A warm source strict 0.821000 109.200000 overall_mae overall_rmse overall_r2 0 13.900000 21.750000 0.832000 1 37.100000 61.200000 0.748000 Initial reading Best overall record by RMSE: A_to_B | warm | source | strict (RMSE=21.750, R²=0.832). Weakest overall record by RMSE: B_to_A | warm | source | strict (RMSE=61.200, R²=0.748). .. GENERATED FROM PYTHON SOURCE LINES 335-350 Interpret overall metrics carefully ----------------------------------- In transfer settings, no single metric is enough. A practical reading habit is: 1. look at RMSE or MAE for error magnitude, 2. look at R² for explained variation, 3. look at coverage80 to see whether intervals are too narrow, 4. look at sharpness80 to see whether acceptable coverage was paid for with very wide intervals. This last point is especially important in transfer workflows: wide intervals can make a model look safer than it really is. .. GENERATED FROM PYTHON SOURCE LINES 350-364 .. code-block:: Python coverage_gap = float(best["coverage80"] - worst["coverage80"]) sharpness_gap = float(worst["sharpness80"] - best["sharpness80"]) print("\nCoverage / sharpness reading") print( f"Coverage difference between strongest and weakest record: " f"{coverage_gap:.3f}." ) print( f"Sharpness penalty of the weaker record relative to the stronger one: " f"{sharpness_gap:.3f}." ) .. rst-class:: sphx-glr-script-out .. code-block:: none Coverage / sharpness reading Coverage difference between strongest and weakest record: 0.051. Sharpness penalty of the weaker record relative to the stronger one: 59.400. .. GENERATED FROM PYTHON SOURCE LINES 365-375 Inspect per-horizon degradation ------------------------------- Transfer experiments often look acceptable overall while hiding a severe collapse at the farthest horizon. That is why the per-horizon frame is one of the most important views in this artifact family. We focus on RMSE here because it is easy to compare across directions, but the same table also contains MAE, MSE, and R². .. GENERATED FROM PYTHON SOURCE LINES 375-403 .. code-block:: Python per_h = xfer_per_horizon_frame(records) print("\nPer-horizon comparison (RMSE only)") print( per_h.loc[ per_h["metric"] == "rmse", ["label", "direction", "horizon", "value"], ] ) rmse_h = per_h[per_h["metric"] == "rmse"].copy() horizon_spread = ( rmse_h.groupby("label")["value"].agg(["min", "max"]) .assign(range=lambda d: d["max"] - d["min"]) .sort_values("range", ascending=False) ) print("\nHorizon spread in RMSE") print(horizon_spread) print( "\nInterpretation:\n" "A large RMSE range across H1→H3 usually means the transfer is\n" "less reliable for longer horizons, even if the overall metric\n" "still looks acceptable." ) .. rst-class:: sphx-glr-script-out .. code-block:: none Per-horizon comparison (RMSE only) label direction horizon value 18 A_to_B | warm | source | strict A_to_B H1 13.900000 19 A_to_B | warm | source | strict A_to_B H2 20.900000 20 A_to_B | warm | source | strict A_to_B H3 27.800000 21 B_to_A | warm | source | strict B_to_A H1 18.400000 22 B_to_A | warm | source | strict B_to_A H2 51.600000 23 B_to_A | warm | source | strict B_to_A H3 91.500000 Horizon spread in RMSE min max range label B_to_A | warm | source | strict 18.400000 91.500000 73.100000 A_to_B | warm | source | strict 13.900000 27.800000 13.900000 Interpretation: A large RMSE range across H1→H3 usually means the transfer is less reliable for longer horizons, even if the overall metric still looks acceptable. .. GENERATED FROM PYTHON SOURCE LINES 404-414 Inspect schema mismatch diagnostics ----------------------------------- One of the most helpful parts of this artifact is that it keeps schema alignment diagnostics next to the metrics. This is very useful when comparing cities with different static-feature support. A poor transfer result does not automatically mean the learning strategy failed. It may partly reflect missing or extra feature channels between source and target. .. GENERATED FROM PYTHON SOURCE LINES 414-430 .. code-block:: Python schema = xfer_schema_frame(records) print("\nSchema diagnostics") print(schema) schema_counts = schema[schema["kind"] == "count"].copy() if not schema_counts.empty: schema_pivot = schema_counts.pivot( index="label", columns="name", values="value", ).fillna(0.0) print("\nSchema mismatch count table") print(schema_pivot) .. rst-class:: sphx-glr-script-out .. code-block:: none Schema diagnostics label direction source_city target_city strategy kind \ 0 A_to_B | warm | source | strict A_to_B nansha zhongshan warm bool 1 A_to_B | warm | source | strict A_to_B nansha zhongshan warm bool 2 A_to_B | warm | source | strict A_to_B nansha zhongshan warm bool 3 A_to_B | warm | source | strict A_to_B nansha zhongshan warm bool 4 A_to_B | warm | source | strict A_to_B nansha zhongshan warm bool 5 A_to_B | warm | source | strict A_to_B nansha zhongshan warm count 6 A_to_B | warm | source | strict A_to_B nansha zhongshan warm count 7 B_to_A | warm | source | strict B_to_A zhongshan nansha warm bool 8 B_to_A | warm | source | strict B_to_A zhongshan nansha warm bool 9 B_to_A | warm | source | strict B_to_A zhongshan nansha warm bool 10 B_to_A | warm | source | strict B_to_A zhongshan nansha warm bool 11 B_to_A | warm | source | strict B_to_A zhongshan nansha warm bool 12 B_to_A | warm | source | strict B_to_A zhongshan nansha warm count 13 B_to_A | warm | source | strict B_to_A zhongshan nansha warm count name value 0 static_aligned True 1 dynamic_reordered False 2 future_reordered False 3 dynamic_order_mismatch False 4 future_order_mismatch False 5 static_missing_n 7.000000 6 static_extra_n 5.000000 7 static_aligned True 8 dynamic_reordered False 9 future_reordered False 10 dynamic_order_mismatch False 11 future_order_mismatch False 12 static_missing_n 10.000000 13 static_extra_n 12.000000 Schema mismatch count table name static_extra_n static_missing_n label A_to_B | warm | source | strict 5.000000 7.000000 B_to_A | warm | source | strict 12.000000 10.000000 .. GENERATED FROM PYTHON SOURCE LINES 431-438 Inspect warm-start details -------------------------- The warm-start frame is useful for reproducibility. When two runs are compared, we should confirm they were warmed under comparable settings before attributing differences entirely to city direction or schema quality. .. GENERATED FROM PYTHON SOURCE LINES 438-444 .. code-block:: Python warm = xfer_warm_frame(records) print("\nWarm-start settings") print(warm) .. rst-class:: sphx-glr-script-out .. code-block:: none Warm-start settings label direction strategy warm_split warm_samples warm_frac warm_epochs \ 0 A_to_B | warm | source | strict A_to_B warm val 20000.000000 None 3.000000 1 B_to_A | warm | source | strict B_to_A warm val 20000.000000 None 3.000000 warm_lr warm_seed 0 0.000100 42.000000 1 0.000100 42.000000 .. GENERATED FROM PYTHON SOURCE LINES 445-451 Use the all-in-one inspector when you want all core views together ------------------------------------------------------------------ ``inspect_xfer_results(...)`` is convenient when you want the semantic summary plus the main comparison tables returned in one normalized bundle. .. GENERATED FROM PYTHON SOURCE LINES 451-457 .. code-block:: Python bundle = inspect_xfer_results(records) print("\nInspector bundle keys") print(sorted(bundle)) .. rst-class:: sphx-glr-script-out .. code-block:: none Inspector bundle keys ['overall', 'per_horizon', 'schema', 'summary', 'warm'] .. GENERATED FROM PYTHON SOURCE LINES 458-470 Plot the main comparison views ------------------------------ A compact reading session usually benefits from four visuals: 1. overall metrics for each record, 2. one direct direction comparison, 3. per-horizon RMSE trajectories, 4. schema mismatch counts. Together, these answer the most common question in transfer work: *Which direction looks better, and why?* .. GENERATED FROM PYTHON SOURCE LINES 470-528 .. code-block:: Python fig, ax = plot_xfer_overall_metrics( records, metrics=["overall_rmse", "overall_r2"], figsize=(9.2, 4.8), ) _style_axes(ax, facecolor="#f8fafc") _style_bars( ax, [ XFER_COLORS["secondary"], XFER_COLORS["secondary"], XFER_COLORS["accent"], XFER_COLORS["accent"], ], ) ax.set_title("Transfer overview: RMSE and R²") if ax.get_legend() is not None: ax.get_legend().get_frame().set_facecolor("white") ax.get_legend().get_frame().set_edgecolor("#cbd5e1") fig, ax = plot_xfer_direction_metric( records, metric="overall_rmse", figsize=(8.2, 4.6), ) _style_axes(ax, facecolor="#fcfcfd") _style_bars(ax, [XFER_COLORS["primary"], XFER_COLORS["rose"]]) ax.set_title("Direction comparison: overall RMSE") fig, ax = plot_xfer_per_horizon_metrics( records, metric="rmse", figsize=(8.8, 5.0), ) _style_axes(ax, facecolor="#f8fafc") _style_lines(ax, [XFER_COLORS["primary"], XFER_COLORS["rose"]]) ax.set_title("Per-horizon transfer RMSE trajectories") if ax.get_legend() is not None: ax.get_legend().get_frame().set_facecolor("white") ax.get_legend().get_frame().set_edgecolor("#cbd5e1") fig, ax = plot_xfer_schema_counts(records, figsize=(8.8, 4.9)) _style_axes(ax, facecolor="#fffdf8") _style_bars( ax, [ XFER_COLORS["gold"], XFER_COLORS["gold"], XFER_COLORS["accent"], XFER_COLORS["accent"], ], ) ax.set_title("Schema mismatch counts across transfer jobs") if ax.get_legend() is not None: ax.get_legend().get_frame().set_facecolor("white") ax.get_legend().get_frame().set_edgecolor("#cbd5e1") .. rst-class:: sphx-glr-horizontal * .. image-sg:: /auto_examples/inspection/images/sphx_glr_plot_xfer_results_overview_001.png :alt: Transfer overview: RMSE and R² :srcset: /auto_examples/inspection/images/sphx_glr_plot_xfer_results_overview_001.png :class: sphx-glr-multi-img * .. image-sg:: /auto_examples/inspection/images/sphx_glr_plot_xfer_results_overview_002.png :alt: Direction comparison: overall RMSE :srcset: /auto_examples/inspection/images/sphx_glr_plot_xfer_results_overview_002.png :class: sphx-glr-multi-img * .. image-sg:: /auto_examples/inspection/images/sphx_glr_plot_xfer_results_overview_003.png :alt: Per-horizon transfer RMSE trajectories :srcset: /auto_examples/inspection/images/sphx_glr_plot_xfer_results_overview_003.png :class: sphx-glr-multi-img * .. image-sg:: /auto_examples/inspection/images/sphx_glr_plot_xfer_results_overview_004.png :alt: Schema mismatch counts across transfer jobs :srcset: /auto_examples/inspection/images/sphx_glr_plot_xfer_results_overview_004.png :class: sphx-glr-multi-img .. GENERATED FROM PYTHON SOURCE LINES 529-543 How to read these plots ----------------------- A practical interpretation pattern is: - If one direction is better on overall RMSE *and* keeps a flatter per-horizon error curve, it is usually the stronger transfer. - If two directions look similar in metrics but one has far fewer schema mismatches, that direction is usually easier to trust. - If coverage is decent only because sharpness is extremely large, the intervals may be too conservative to be useful. - If H3 error grows sharply while H1 looks fine, the model may be acceptable for short-horizon adaptation but not for longer-horizon forecasting. .. GENERATED FROM PYTHON SOURCE LINES 545-551 Plot the boolean summary separately ----------------------------------- The boolean summary aggregates the schema pass/fail checks into one compact view. It is not a performance plot; it is a structural plausibility check for the transfer setup. .. GENERATED FROM PYTHON SOURCE LINES 551-557 .. code-block:: Python fig, ax = plot_xfer_boolean_summary(records, figsize=(8.6, 4.8)) _style_axes(ax, facecolor="#f8fafc") _style_boolean_bars(ax) ax.set_title("Transfer setup pass/fail summary") .. image-sg:: /auto_examples/inspection/images/sphx_glr_plot_xfer_results_overview_005.png :alt: Transfer setup pass/fail summary :srcset: /auto_examples/inspection/images/sphx_glr_plot_xfer_results_overview_005.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none Text(0.5, 1.0, 'Transfer setup pass/fail summary') .. GENERATED FROM PYTHON SOURCE LINES 558-573 A simple decision rule ---------------------- For transfer results, a pragmatic reading rule can be: - prefer the direction with the lower overall RMSE, - check that its farthest-horizon RMSE does not explode, - prefer runs with smaller schema mismatch counts, - and be cautious when sharpness becomes very large relative to the better direction. This is not a theorem. It is a good workflow habit for deciding whether a cross-city result is ready for reporting or whether it still needs schema cleanup, rescaling changes, or a different transfer strategy. .. GENERATED FROM PYTHON SOURCE LINES 573-612 .. code-block:: Python winner = ranked.iloc[0] loser = ranked.iloc[-1] winner_schema = schema_counts[ schema_counts["label"] == winner["label"] ]["value"].sum() loser_schema = schema_counts[ schema_counts["label"] == loser["label"] ]["value"].sum() winner_h3 = rmse_h[ (rmse_h["label"] == winner["label"]) & (rmse_h["horizon"] == "H3") ]["value"] winner_h3 = float(winner_h3.iloc[0]) if not winner_h3.empty else None print("\nDecision note") if ( float(winner["overall_rmse"]) < float(loser["overall_rmse"]) and float(winner["sharpness80"]) <= float(loser["sharpness80"]) and float(winner_schema) <= float(loser_schema) ): print( "The stronger direction in this demo looks structurally and " "numerically preferable for follow-up transfer analysis." ) else: print( "The transfer comparison still looks ambiguous and deserves " "closer schema or calibration review before reporting." ) if winner_h3 is not None: print( f"The preferred direction still reaches H3 RMSE={winner_h3:.3f}; " "decide whether that is acceptable for your scientific or " "operational horizon." ) .. rst-class:: sphx-glr-script-out .. code-block:: none Decision note The stronger direction in this demo looks structurally and numerically preferable for follow-up transfer analysis. The preferred direction still reaches H3 RMSE=27.800; decide whether that is acceptable for your scientific or operational horizon. .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.563 seconds) .. _sphx_glr_download_auto_examples_inspection_plot_xfer_results_overview.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_xfer_results_overview.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_xfer_results_overview.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_xfer_results_overview.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_