.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/tables_and_summaries/build_full_inputs_npz.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_tables_and_summaries_build_full_inputs_npz.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_tables_and_summaries_build_full_inputs_npz.py:


Build one merged ``full_inputs.npz`` from Stage-1 split artifacts
=================================================================

This lesson teaches how to use GeoPrior's
``full-inputs-npz`` build command.

Unlike the earlier tabular builders, this command does not produce a
CSV or a summary table. Instead, it collects the **split NPZ inputs**
written by Stage-1 and merges them into one compressed archive such as
``full_inputs.npz``.

Why this matters
----------------
Many downstream tasks need a single bundle of arrays rather than three
separate train/validation/test NPZ files. For example, you may want to:

- inspect the full input payload quickly,
- pass one archive to another utility,
- archive a compact all-splits artifact,
- or debug shape mismatches across splits.

That is exactly what ``full-inputs-npz`` is for.

.. GENERATED FROM PYTHON SOURCE LINES 27-36

Imports
-------
We call the real production CLI entrypoint from the package.
For the synthetic lesson input, we build a minimal Stage-1 directory
with:

- three split NPZ files,
- one manifest.json,
- and one artifacts/ folder where the merged file will be written.

.. GENERATED FROM PYTHON SOURCE LINES 36-51

.. code-block:: Python


    from __future__ import annotations

    import json
    import tempfile
    from pathlib import Path

    import matplotlib.pyplot as plt
    import numpy as np

    from geoprior.cli.build_full_inputs_npz import (
        build_full_inputs_main,
    )


.. GENERATED FROM PYTHON SOURCE LINES 52-75

Step 1 - Build a compact synthetic Stage-1 directory
----------------------------------------------------
``full-inputs-npz`` is manifest-driven. The command does not guess
split file names directly from the filesystem; instead it reads a
Stage-1 ``manifest.json`` and looks under:

``manifest["artifacts"]["numpy"]["<split>_inputs_npz"]``

for each requested split.

We therefore create a tiny but realistic Stage-1 folder with three
split NPZ files and one manifest that points to them.

The synthetic arrays below mimic a typical forecasting payload:

- ``x_static``  : one row per sample, fixed features,
- ``x_dynamic`` : one row per sample with a short time axis,
- ``y``         : one target per sample,
- ``coords``    : sample coordinates,
- ``years``     : reference year per sample.

The parameter comments are intentionally explicit so the user sees how
the lesson data are constructed.

.. GENERATED FROM PYTHON SOURCE LINES 75-194

.. code-block:: Python


    rng = np.random.default_rng(2026)

    tmp_dir = Path(
        tempfile.mkdtemp(prefix="gp_sg_full_inputs_npz_")
    )
    stage1_dir = tmp_dir / "zhongshan_geoprior_stage1"
    stage1_dir.mkdir(parents=True, exist_ok=True)
    artifacts_dir = stage1_dir / "artifacts"
    artifacts_dir.mkdir(parents=True, exist_ok=True)


    # Small helper to build one split payload.
    #
    # Parameters
    # ----------
    # n:
    #     Number of samples in the split.
    # seed_shift:
    #     Small offset so train/val/test are not identical.
    # time_steps:
    #     Number of dynamic lookback steps.
    # n_static:
    #     Number of static features.
    # n_dynamic:
    #     Number of dynamic features.
    def make_split_payload(
        *,
        n: int,
        seed_shift: int,
        time_steps: int = 4,
        n_static: int = 3,
        n_dynamic: int = 2,
    ) -> dict[str, np.ndarray]:
        rr = np.random.default_rng(2026 + seed_shift)

        x_static = rr.normal(
            loc=0.0 + 0.05 * seed_shift,
            scale=1.0,
            size=(n, n_static),
        ).astype(np.float32)

        x_dynamic = rr.normal(
            loc=0.2 * seed_shift,
            scale=0.9,
            size=(n, time_steps, n_dynamic),
        ).astype(np.float32)

        coords = np.column_stack(
            [
                rr.uniform(113.18, 113.56, size=n),
                rr.uniform(22.28, 22.66, size=n),
            ]
        ).astype(np.float32)

        years = rr.integers(
            low=2019,
            high=2025,
            size=n,
            endpoint=False,
        ).astype(np.int32)

        # A simple synthetic target coupled loosely to the inputs.
        y = (
            0.45 * x_static[:, [0]]
            + 0.18 * x_dynamic[:, -1, [0]]
            + 0.08 * x_dynamic[:, -2, [1]]
            + rr.normal(0.0, 0.08, size=(n, 1))
        ).astype(np.float32)

        return {
            "x_static": x_static,
            "x_dynamic": x_dynamic,
            "y": y,
            "coords": coords,
            "years": years,
        }


    split_specs = {
        "train": {"n": 72, "seed_shift": 0},
        "val": {"n": 24, "seed_shift": 1},
        "test": {"n": 28, "seed_shift": 2},
    }

    split_paths: dict[str, Path] = {}
    for split, spec in split_specs.items():
        payload = make_split_payload(**spec)
        path = artifacts_dir / f"{split}_inputs.synthetic.npz"
        np.savez_compressed(path, **payload)
        split_paths[split] = path.resolve()

    manifest = {
        "stage": "stage1",
        "city": "Zhongshan",
        "model": "GeoPriorSubsNet",
        "artifacts": {
            "numpy": {
                "train_inputs_npz": str(split_paths["train"]),
                "val_inputs_npz": str(split_paths["val"]),
                "test_inputs_npz": str(split_paths["test"]),
            }
        },
    }

    manifest_path = stage1_dir / "manifest.json"
    manifest_path.write_text(
        json.dumps(manifest, indent=2),
        encoding="utf-8",
    )

    print("Synthetic Stage-1 directory")
    print(f" - stage1_dir : {stage1_dir}")
    print(f" - manifest   : {manifest_path}")
    print(" - split NPZ files")
    for split, path in split_paths.items():
        print(f"   - {split:5s}: {path.name}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Synthetic Stage-1 directory
     - stage1_dir : /tmp/gp_sg_full_inputs_npz_h79x20yz/zhongshan_geoprior_stage1
     - manifest   : /tmp/gp_sg_full_inputs_npz_h79x20yz/zhongshan_geoprior_stage1/manifest.json
     - split NPZ files
       - train: train_inputs.synthetic.npz
       - val  : val_inputs.synthetic.npz
       - test : test_inputs.synthetic.npz


.. GENERATED FROM PYTHON SOURCE LINES 195-201

Step 2 - Inspect the split payloads before merging
--------------------------------------------------
Before calling the builder, it is useful to confirm what each split
contains. Because ``full-inputs-npz`` concatenates arrays along
``axis=0``, the main thing to check is whether the keys and shapes are
compatible across splits.

.. GENERATED FROM PYTHON SOURCE LINES 201-215

.. code-block:: Python


    split_arrays: dict[str, dict[str, np.ndarray]] = {}
    for split, path in split_paths.items():
        with np.load(path, allow_pickle=False) as z:
            split_arrays[split] = {k: z[k] for k in z.files}

    print("")
    print("Split payload overview")
    for split, arrays in split_arrays.items():
        print(f"[{split}]")
        for key, value in sorted(arrays.items()):
            print(f" - {key:10s}: {tuple(value.shape)}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Split payload overview
    [train]
     - coords    : (72, 2)
     - x_dynamic : (72, 4, 2)
     - x_static  : (72, 3)
     - y         : (72, 1)
     - years     : (72,)
    [val]
     - coords    : (24, 2)
     - x_dynamic : (24, 4, 2)
     - x_static  : (24, 3)
     - y         : (24, 1)
     - years     : (24,)
    [test]
     - coords    : (28, 2)
     - x_dynamic : (28, 4, 2)
     - x_static  : (28, 3)
     - y         : (28, 1)
     - years     : (28,)


.. GENERATED FROM PYTHON SOURCE LINES 216-224

Step 3 - Run the real ``full-inputs-npz`` command
-------------------------------------------------
We now call the public CLI entrypoint exactly as a user would.

Here we use ``--stage1-dir`` because it is the clearest lesson path:
the command will resolve ``manifest.json``, read the split NPZ paths,
concatenate the arrays in split order, and write the merged file under
``<stage1_dir>/artifacts/`` when ``--output`` is omitted.

.. GENERATED FROM PYTHON SOURCE LINES 224-237

.. code-block:: Python


    build_full_inputs_main(
        [
            "--stage1-dir",
            str(stage1_dir),
            "--splits",
            "train",
            "val",
            "test",
        ]
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Saved: /tmp/gp_sg_full_inputs_npz_h79x20yz/zhongshan_geoprior_stage1/artifacts/full_inputs.npz
    coords: (124, 2)
    x_dynamic: (124, 4, 2)
    x_static: (124, 3)
    y: (124, 1)
    years: (124,)


.. GENERATED FROM PYTHON SOURCE LINES 238-243

Step 4 - Read the merged NPZ back in
------------------------------------
The default output name is ``full_inputs.npz`` under the Stage-1
``artifacts/`` folder. We load it back in and inspect the merged
shapes.

.. GENERATED FROM PYTHON SOURCE LINES 243-255

.. code-block:: Python


    merged_npz = artifacts_dir / "full_inputs.npz"
    with np.load(merged_npz, allow_pickle=False) as z:
        merged = {k: z[k] for k in z.files}

    print("")
    print(f"Merged output: {merged_npz}")
    print("Merged array shapes")
    for key, value in sorted(merged.items()):
        print(f" - {key:10s}: {tuple(value.shape)}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Merged output: /tmp/gp_sg_full_inputs_npz_h79x20yz/zhongshan_geoprior_stage1/artifacts/full_inputs.npz
    Merged array shapes
     - coords    : (124, 2)
     - x_dynamic : (124, 4, 2)
     - x_static  : (124, 3)
     - y         : (124, 1)
     - years     : (124,)


.. GENERATED FROM PYTHON SOURCE LINES 256-268

Step 5 - Verify that the concatenation did what we expect
---------------------------------------------------------
A good lesson should make the merge logic concrete. We therefore build
a compact check table showing:

- the train contribution,
- the validation contribution,
- the test contribution,
- and the merged axis-0 size.

For this builder, the merged axis-0 size should equal the sum of the
requested splits for every key.

.. GENERATED FROM PYTHON SOURCE LINES 268-298

.. code-block:: Python


    rows: list[dict[str, int | str]] = []
    for key in sorted(merged):
        train_n = int(split_arrays["train"][key].shape[0])
        val_n = int(split_arrays["val"][key].shape[0])
        test_n = int(split_arrays["test"][key].shape[0])
        merged_n = int(merged[key].shape[0])

        rows.append(
            {
                "array": key,
                "train": train_n,
                "val": val_n,
                "test": test_n,
                "merged": merged_n,
            }
        )

    print("")
    print("Axis-0 merge check")
    for row in rows:
        print(
            f" - {row['array']:10s}: "
            f"train={row['train']:3d}, "
            f"val={row['val']:3d}, "
            f"test={row['test']:3d}, "
            f"merged={row['merged']:3d}"
        )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Axis-0 merge check
     - coords    : train= 72, val= 24, test= 28, merged=124
     - x_dynamic : train= 72, val= 24, test= 28, merged=124
     - x_static  : train= 72, val= 24, test= 28, merged=124
     - y         : train= 72, val= 24, test= 28, merged=124
     - years     : train= 72, val= 24, test= 28, merged=124


.. GENERATED FROM PYTHON SOURCE LINES 299-309

Step 6 - Build one compact preview figure
-----------------------------------------
The command itself writes an NPZ file, not a figure. For the gallery,
we add one compact visual summary so the page is easier to read.

Left:
  sample counts by split and merged output.

Right:
  a small heatmap showing axis-0 sizes per array key.

.. GENERATED FROM PYTHON SOURCE LINES 309-375

.. code-block:: Python


    sample_counts = {
        "train": split_specs["train"]["n"],
        "val": split_specs["val"]["n"],
        "test": split_specs["test"]["n"],
        "merged": sum(spec["n"] for spec in split_specs.values()),
    }

    heat_keys = [row["array"] for row in rows]
    heat_data = np.array(
        [
            [row["train"], row["val"], row["test"], row["merged"]]
            for row in rows
        ],
        dtype=float,
    )

    fig, axes = plt.subplots(
        1,
        2,
        figsize=(12.4, 4.8),
        constrained_layout=True,
    )

    ax = axes[0]
    labels = list(sample_counts)
    values = [sample_counts[k] for k in labels]
    ax.bar(labels, values)
    ax.set_title("Sample counts by split")
    ax.set_ylabel("Rows / samples")
    ax.set_xlabel("Payload")
    for i, v in enumerate(values):
        ax.text(
            i,
            v + 1.5,
            str(v),
            ha="center",
            va="bottom",
            fontsize=9,
        )

    ax = axes[1]
    im = ax.imshow(heat_data, aspect="auto")
    ax.set_title("Axis-0 sizes by array key")
    ax.set_xticks(range(4))
    ax.set_xticklabels(["train", "val", "test", "merged"])
    ax.set_yticks(range(len(heat_keys)))
    ax.set_yticklabels(heat_keys)
    ax.set_xlabel("Payload")

    for i in range(heat_data.shape[0]):
        for j in range(heat_data.shape[1]):
            ax.text(
                j,
                i,
                f"{int(heat_data[i, j])}",
                ha="center",
                va="center",
                fontsize=8,
                color="white" if heat_data[i, j] > 60 else "black",
            )

    cbar = fig.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
    cbar.set_label("Axis-0 size")


.. image-sg:: /auto_examples/tables_and_summaries/images/sphx_glr_build_full_inputs_npz_001.png
   :alt: Sample counts by split, Axis-0 sizes by array key
   :srcset: /auto_examples/tables_and_summaries/images/sphx_glr_build_full_inputs_npz_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 376-390

How to read this result
-----------------------
The merged NPZ is not a mysterious black box. It is simply the split
payloads stacked in the order requested by ``--splits``.

In this lesson:

- each array key is preserved,
- the non-leading dimensions stay unchanged,
- and the leading dimension grows from
  ``train + val + test`` to one merged archive.

That is why this builder is a useful bridge between Stage-1 exports
and later inspection or packaging tasks.

.. GENERATED FROM PYTHON SOURCE LINES 393-407

Practical takeaway
------------------
Use ``full-inputs-npz`` when you already trust the split artifacts and
want one compact archive for debugging, exchange, or downstream tools.

Before running it on real data, it is worth checking two things:

1. the manifest points to the intended split NPZ files;
2. the splits expose compatible keys unless you deliberately use
   ``--allow-missing-keys``.

If the command raises an input-key mismatch, that is usually a useful
diagnostic rather than a nuisance: it means your Stage-1 splits were
not exported consistently.

.. GENERATED FROM PYTHON SOURCE LINES 410-450

Command-line version
--------------------
The same lesson can be reproduced from the terminal.

Most direct path:

.. code-block:: bash

   geoprior-build full-inputs-npz \
     --stage1-dir results/zhongshan_GeoPriorSubsNet_stage1 \
     --splits train val test

Equivalent root dispatcher:

.. code-block:: bash

   geoprior build full-inputs-npz \
     --stage1-dir results/zhongshan_GeoPriorSubsNet_stage1 \
     --splits train val test

Resolve the manifest automatically from ``results/`` + city + model:

.. code-block:: bash

   geoprior build full-inputs-npz \
     --results-dir results \
     --city zhongshan \
     --model GeoPriorSubsNet

Write to a custom path and allow non-identical split keys:

.. code-block:: bash

   geoprior-build full-inputs-npz \
     --stage1-dir results/zhongshan_GeoPriorSubsNet_stage1 \
     --output scripts/out/zhongshan_full_inputs_debug.npz \
     --allow-missing-keys

The gallery page teaches the command.
The terminal form reproduces it in a real workflow.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.271 seconds)


.. _sphx_glr_download_auto_examples_tables_and_summaries_build_full_inputs_npz.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: build_full_inputs_npz.ipynb <build_full_inputs_npz.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: build_full_inputs_npz.py <build_full_inputs_npz.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: build_full_inputs_npz.zip <build_full_inputs_npz.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_