.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/tables_and_summaries/compute_hotspots.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_tables_and_summaries_compute_hotspots.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_tables_and_summaries_compute_hotspots.py:


Compute hotspot summary tables from forecast CSVs
=================================================

This example teaches you how to use GeoPrior's
``compute-hotspots`` utility.

Unlike the figure-generation scripts, this command is an artifact
builder. It reads evaluation and future forecast CSVs, compares
future annual subsidence against a baseline year, and writes one
compact hotspot summary table per city and year.

Why this matters
----------------
Forecast maps can look dramatic, but downstream reporting usually
needs a much more compact artifact.

This builder converts forecast CSVs into a tidy hotspot table with:

- the number of hotspots,
- hotspot subsidence ranges,
- hotspot anomaly ranges relative to the baseline year,
- the baseline mean,
- and the percentile threshold used to define hotspots.

That makes it a strong lesson page for the
``tables_and_summaries`` gallery.

.. GENERATED FROM PYTHON SOURCE LINES 31-35

Imports
-------
We call the real production builder, then read its outputs back in
and build one compact teaching preview.

.. GENERATED FROM PYTHON SOURCE LINES 35-47

.. code-block:: Python


    from __future__ import annotations

    import tempfile
    from pathlib import Path

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    import geoprior.scripts.compute_hotspots as hotspots_mod


.. GENERATED FROM PYTHON SOURCE LINES 48-56

Compatibility shim
------------------
The current script forwards ``prog=...`` into ``parse_args()``,
but the helper does not yet accept that keyword.

For the lesson page, we patch the helper locally so the example
stays runnable. Once the script itself is patched, this shim can
be removed.

.. GENERATED FROM PYTHON SOURCE LINES 56-70

.. code-block:: Python


    _orig_parse_args = hotspots_mod.parse_args


    def _parse_args_compat(
        argv: list[str] | None = None,
        *,
        prog: str | None = None,
    ):
        return hotspots_mod.build_parser(prog=prog).parse_args(argv)


    hotspots_mod.parse_args = _parse_args_compat  # type: ignore[assignment]


.. GENERATED FROM PYTHON SOURCE LINES 71-99

Build compact synthetic cumulative forecast archives
----------------------------------------------------
The production builder can work from:

- explicit per-city eval and future CSVs,
- or auto-discovered city source folders.

For a self-contained lesson page, we use explicit CSVs.

We generate:

- Nansha
- Zhongshan

with:

- eval years 2021 and 2022,
- future years 2023, 2024, and 2025,
- cumulative subsidence columns,
- and a stronger hotspot trend in Zhongshan.

This is useful because the production script's default mode for
this builder is often:

- baseline year = 2022
- subsidence kind = cumulative
- hotspot quantile = q50
- hotspot threshold = 90th percentile of anomaly magnitude

.. GENERATED FROM PYTHON SOURCE LINES 99-229

.. code-block:: Python


    rng = np.random.default_rng(7)

    eval_years = [2021, 2022]
    future_years = [2023, 2024, 2025]
    n_samples = 120


    def _city_frames(
        *,
        city: str,
        level_shift: float,
        hotspot_boost: float,
    ) -> tuple[pd.DataFrame, pd.DataFrame]:
        rows_eval: list[dict[str, object]] = []
        rows_future: list[dict[str, object]] = []

        hotspot_ids = set(range(16))

        for sample_idx in range(n_samples):
            spatial = 4.5 * np.sin(sample_idx / 11.0)
            local = rng.normal(0.0, 1.4)

            annual_2021 = 24.0 + level_shift + 0.7 * spatial + local
            annual_2022 = 31.0 + level_shift + 0.9 * spatial + local

            is_hot = sample_idx in hotspot_ids
            hot_term = hotspot_boost if is_hot else 0.0

            annual_2023 = (
                annual_2022
                + 3.2
                + 0.6 * hot_term
                + rng.normal(0.0, 1.2)
            )
            annual_2024 = (
                annual_2023
                + 2.6
                + 0.8 * hot_term
                + rng.normal(0.0, 1.2)
            )
            annual_2025 = (
                annual_2024
                + 1.8
                + 1.0 * hot_term
                + rng.normal(0.0, 1.2)
            )

            # Build cumulative paths.
            q50_cum_2021 = annual_2021
            q50_cum_2022 = q50_cum_2021 + annual_2022
            q50_cum_2023 = q50_cum_2022 + annual_2023
            q50_cum_2024 = q50_cum_2023 + annual_2024
            q50_cum_2025 = q50_cum_2024 + annual_2025

            # Eval actuals are slightly noisy but remain monotone.
            act_annual_2021 = max(
                1.0,
                annual_2021 + rng.normal(0.0, 0.9),
            )
            act_annual_2022 = max(
                1.0,
                annual_2022 + rng.normal(0.0, 1.0),
            )
            act_cum_2021 = act_annual_2021
            act_cum_2022 = act_cum_2021 + act_annual_2022

            def _q_band(center: float, width: float) -> tuple[float, float]:
                return max(0.0, center - width), center + width

            # Eval rows.
            for year, q50, actual, width in [
                (2021, q50_cum_2021, act_cum_2021, 6.5),
                (2022, q50_cum_2022, act_cum_2022, 7.5),
            ]:
                q10, q90 = _q_band(q50, width)
                rows_eval.append(
                    {
                        "city": city,
                        "sample_idx": sample_idx,
                        "coord_t": year,
                        "subsidence_actual": float(actual),
                        "subsidence_q10": float(q10),
                        "subsidence_q50": float(q50),
                        "subsidence_q90": float(q90),
                        "subsidence_unit": "mm",
                    }
                )

            # Future rows.
            for year, q50, width in [
                (2023, q50_cum_2023, 9.0),
                (2024, q50_cum_2024, 10.5),
                (2025, q50_cum_2025, 12.0),
            ]:
                q10, q90 = _q_band(q50, width)
                rows_future.append(
                    {
                        "city": city,
                        "sample_idx": sample_idx,
                        "coord_t": year,
                        "subsidence_q10": float(q10),
                        "subsidence_q50": float(q50),
                        "subsidence_q90": float(q90),
                        "subsidence_unit": "mm",
                    }
                )

        return pd.DataFrame(rows_eval), pd.DataFrame(rows_future)


    ns_eval_df, ns_future_df = _city_frames(
        city="Nansha",
        level_shift=0.0,
        hotspot_boost=4.0,
    )

    zh_eval_df, zh_future_df = _city_frames(
        city="Zhongshan",
        level_shift=5.5,
        hotspot_boost=7.0,
    )

    print("Nansha eval preview")
    print(ns_eval_df.head(6).to_string(index=False))

    print("")
    print("Nansha future preview")
    print(ns_future_df.head(6).to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Nansha eval preview
      city  sample_idx  coord_t  subsidence_actual  subsidence_q10  subsidence_q50  subsidence_q90 subsidence_unit
    Nansha           0     2021            23.5925         17.5017         24.0017         30.5017              mm
    Nansha           0     2022            53.6026         47.5034         55.0034         62.5034              mm
    Nansha           1     2021            24.8110         17.8702         24.3702         30.8702              mm
    Nansha           1     2022            56.6198         48.3220         55.8220         63.3220              mm
    Nansha           2     2021            23.5074         18.2172         24.7172         31.2172              mm
    Nansha           2     2022            54.9296         49.0971         56.5971         64.0971              mm

    Nansha future preview
      city  sample_idx  coord_t  subsidence_q10  subsidence_q50  subsidence_q90 subsidence_unit
    Nansha           0     2023         82.9637         91.9637        100.9637              mm
    Nansha           0     2024        123.8949        134.3949        144.8949              mm
    Nansha           0     2025        169.5575        181.5575        193.5575              mm
    Nansha           1     2023         85.4822         94.4822        103.4822              mm
    Nansha           1     2024        127.8517        138.3517        148.8517              mm
    Nansha           1     2025        175.2766        187.2766        199.2766              mm


.. GENERATED FROM PYTHON SOURCE LINES 230-234

Write the lesson inputs
-----------------------
The production builder consumes CSV files, so we follow the real
workflow exactly.

.. GENERATED FROM PYTHON SOURCE LINES 234-259

.. code-block:: Python


    tmp_dir = Path(
        tempfile.mkdtemp(prefix="gp_sg_hotspots_")
    )

    ns_eval_csv = tmp_dir / "nansha_eval.csv"
    ns_future_csv = tmp_dir / "nansha_future.csv"
    zh_eval_csv = tmp_dir / "zhongshan_eval.csv"
    zh_future_csv = tmp_dir / "zhongshan_future.csv"

    ns_eval_df.to_csv(ns_eval_csv, index=False)
    ns_future_df.to_csv(ns_future_csv, index=False)
    zh_eval_df.to_csv(zh_eval_csv, index=False)
    zh_future_df.to_csv(zh_future_csv, index=False)

    print("")
    print("Input files")
    for p in [
        ns_eval_csv,
        ns_future_csv,
        zh_eval_csv,
        zh_future_csv,
    ]:
        print(" -", p.name)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Input files
     - nansha_eval.csv
     - nansha_future.csv
     - zhongshan_eval.csv
     - zhongshan_future.csv


.. GENERATED FROM PYTHON SOURCE LINES 260-271

Run the real hotspot builder
----------------------------
We ask the production command to:

- use 2022 as the baseline year,
- interpret the inputs as cumulative subsidence,
- use actual values for the baseline when available,
- summarize 2023, 2024, and 2025,
- and export both CSV and LaTeX.

The result is one compact city × year table.

.. GENERATED FROM PYTHON SOURCE LINES 271-306

.. code-block:: Python


    out_stem = tmp_dir / "hotspots_gallery"

    hotspots_mod.compute_hotspots_main(
        [
            "--ns-eval",
            str(ns_eval_csv),
            "--ns-future",
            str(ns_future_csv),
            "--zh-eval",
            str(zh_eval_csv),
            "--zh-future",
            str(zh_future_csv),
            "--baseline-year",
            "2022",
            "--percentile",
            "90",
            "--subsidence-kind",
            "cumulative",
            "--baseline-source",
            "actual",
            "--quantile",
            "q50",
            "--years",
            "2023",
            "2024",
            "2025",
            "--format",
            "both",
            "--out",
            str(out_stem),
        ],
        prog="compute-hotspots",
    )


.. GENERATED FROM PYTHON SOURCE LINES 307-315

Inspect the produced artifacts
------------------------------
The builder writes:

- one CSV summary table,
- one LaTeX sidewaystable.

That is the core artifact family for this command.

.. GENERATED FROM PYTHON SOURCE LINES 315-323

.. code-block:: Python


    written = sorted(tmp_dir.glob("hotspots_gallery*"))

    print("")
    print("Written files")
    for p in written:
        print(" -", p.name)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Written files
     - hotspots_gallery.csv
     - hotspots_gallery.tex


.. GENERATED FROM PYTHON SOURCE LINES 324-327

Read the generated hotspot table
--------------------------------
This is the main output users will usually inspect first.

.. GENERATED FROM PYTHON SOURCE LINES 327-342

.. code-block:: Python


    tab_csv = tmp_dir / "hotspots_gallery.csv"
    tab_tex = tmp_dir / "hotspots_gallery.tex"

    tab = pd.read_csv(tab_csv)

    print("")
    print("Hotspot summary table")
    print(tab.to_string(index=False))

    print("")
    print("LaTeX preview")
    for line in tab_tex.read_text(encoding="utf-8").splitlines()[:12]:
        print(line)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Hotspot summary table
         City  Year  Hotspots_n  s_hot_min  s_hot_mean  s_hot_max  d_hot_mean  d_hot_max  mean_2022   T_0p9
       Nansha  2023          12    31.9408     39.0185    44.0507      7.1779    10.3769    31.3303  6.1956
       Nansha  2024          12    42.1282     45.3584    47.5097     12.3683    14.1385    31.3303 10.4846
       Nansha  2025          12    47.1625     50.8033    53.7864     17.8132    19.7875    31.3303 15.3173
    Zhongshan  2023          12    40.1022     45.1890    50.6990      8.8040    12.0300    36.9405  7.6105
    Zhongshan  2024          12    50.9522     54.6610    60.0870     15.7146    18.1200    36.9405 13.4832
    Zhongshan  2025          12    60.0815     63.8033    69.7340     24.6216    27.7670    36.9405 21.5802

    LaTeX preview
    \begin{sidewaystable}[p]
    \centering
    \caption{Characteristics of forecast hotspot clusters. Hotspots are defined as locations where the absolute change in annual subsidence relative to 2022 exceeds the 90th percentile threshold $T_{0.9}$ within each city and year.}
    \label{tab:hotspots}
    \begin{tabular}{llrccccccc}
    \toprule
    & & & \multicolumn{3}{c}{Hotspot subsidence $s_{\mathrm{hot}}$} & \multicolumn{2}{c}{Hotspot anomaly $|\Delta s_{\mathrm{hot}}|$ vs.\ 2022} & Mean 2022 subsidence & $T_{0.9}$ \\
    City & Year & Hotspots & \multicolumn{3}{c}{(mm yr$^{-1}$)} & \multicolumn{2}{c}{(mm yr$^{-1}$)} & (mm yr$^{-1}$) & (mm yr$^{-1}$) \\
    \cmidrule(lr){4-6} \cmidrule(lr){7-8}
    & & $n$ & min & mean & max & mean & max & & \\
    \midrule
    Nansha & 2023 & 12 & 31.9 & 39.0 & 44.1 & 7.2 & 10.4 & 31.3 & 6.2 \\


.. GENERATED FROM PYTHON SOURCE LINES 343-358

Show how cumulative inputs become annual series
-----------------------------------------------
This builder supports both:

- rate / increment data, where annual values are already present,
- cumulative data, where annual values must be reconstructed.

For cumulative inputs, the script uses:

- baseline annual 2022 = cumulative_2022 - cumulative_2021
- annual 2023          = future_2023   - cumulative_2022
- annual 2024          = future_2024   - future_2023
- annual 2025          = future_2025   - future_2024

Below is a concrete one-sample demonstration for Nansha.

.. GENERATED FROM PYTHON SOURCE LINES 358-400

.. code-block:: Python


    demo_sid = 0

    ev0 = ns_eval_df.loc[
        ns_eval_df["sample_idx"] == demo_sid
    ].sort_values("coord_t")
    fu0 = ns_future_df.loc[
        ns_future_df["sample_idx"] == demo_sid
    ].sort_values("coord_t")

    cum_2021 = float(
        ev0.loc[ev0["coord_t"] == 2021, "subsidence_actual"].iloc[0]
    )
    cum_2022 = float(
        ev0.loc[ev0["coord_t"] == 2022, "subsidence_actual"].iloc[0]
    )
    fut_2023 = float(
        fu0.loc[fu0["coord_t"] == 2023, "subsidence_q50"].iloc[0]
    )
    fut_2024 = float(
        fu0.loc[fu0["coord_t"] == 2024, "subsidence_q50"].iloc[0]
    )
    fut_2025 = float(
        fu0.loc[fu0["coord_t"] == 2025, "subsidence_q50"].iloc[0]
    )

    annual_demo = pd.DataFrame(
        {
            "year": [2022, 2023, 2024, 2025],
            "annual_value_mm": [
                cum_2022 - cum_2021,
                fut_2023 - cum_2022,
                fut_2024 - fut_2023,
                fut_2025 - fut_2024,
            ],
        }
    )

    print("")
    print("Annual conversion demo for one sample")
    print(annual_demo.to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Annual conversion demo for one sample
     year  annual_value_mm
     2022          30.0101
     2023          38.3611
     2024          42.4313
     2025          47.1625


.. GENERATED FROM PYTHON SOURCE LINES 401-411

Build one compact visual preview
--------------------------------
This preview is not part of the production builder itself.
It is a teaching aid for the gallery page.

Left:
  hotspot counts by year and city.

Right:
  the threshold T_0p9 by year and city.

.. GENERATED FROM PYTHON SOURCE LINES 411-465

.. code-block:: Python


    fig, axes = plt.subplots(
        1,
        2,
        figsize=(9.2, 3.9),
        constrained_layout=True,
    )

    # Hotspot counts by year and city
    ax = axes[0]
    cities = ["Nansha", "Zhongshan"]
    years = sorted(tab["Year"].unique())
    x = np.arange(len(years))
    w = 0.35

    for i, city in enumerate(cities):
        vals = []
        for year in years:
            sub = tab.loc[
                (tab["City"] == city) & (tab["Year"] == year),
                "Hotspots_n",
            ]
            vals.append(float(sub.iloc[0]) if not sub.empty else np.nan)

        ax.bar(
            x + (i - 0.5) * w,
            vals,
            width=w,
            label=city,
        )

    ax.set_title("Hotspot counts")
    ax.set_xlabel("Year")
    ax.set_ylabel("Hotspots_n")
    ax.set_xticks(x)
    ax.set_xticklabels([str(y) for y in years])
    ax.legend(fontsize=8)

    # Threshold progression
    ax = axes[1]
    for city in cities:
        sub = tab.loc[tab["City"] == city].sort_values("Year")
        ax.plot(
            sub["Year"].to_numpy(int),
            sub["T_0p9"].to_numpy(float),
            marker="o",
            label=city,
        )

    ax.set_title(r"Threshold progression: $T_{0.9}$")
    ax.set_xlabel("Year")
    ax.set_ylabel("Threshold [mm/yr]")
    ax.legend(fontsize=8)


.. image-sg:: /auto_examples/tables_and_summaries/images/sphx_glr_compute_hotspots_001.png
   :alt: Hotspot counts, Threshold progression: $T_{0.9}$
   :srcset: /auto_examples/tables_and_summaries/images/sphx_glr_compute_hotspots_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    <matplotlib.legend.Legend object at 0x749e56deb0e0>


.. GENERATED FROM PYTHON SOURCE LINES 466-491

Learn how to read the table
---------------------------
Each row is one:

- city
- year

combination.

The most important columns are:

- ``Hotspots_n``:
  how many locations exceeded the anomaly threshold;
- ``s_hot_*``:
  the subsidence range among hotspot points;
- ``d_hot_*``:
  the anomaly range among hotspot points;
- ``mean_2022``:
  the city-wide baseline annual mean for 2022;
- ``T_0p9``:
  the percentile threshold used to define the hotspots.

So the table is not a geometry layer.
It is the compact numerical summary that later reporting pages or
figures can build on.

.. GENERATED FROM PYTHON SOURCE LINES 493-508

What the hotspot rule means
---------------------------
For each city and year, the builder computes:

- the annual baseline series for 2022,
- the annual forecast series for the requested year,
- the absolute anomaly:

    :math:`\lvert future\_annual - baseline\_{2022,\mathrm{annual}} \rvert`

Then it marks hotspots where that anomaly exceeds the chosen
percentile threshold.

With the default percentile of 90, this means the most anomalous
~10 percent of valid locations become hotspots for that city-year.

.. GENERATED FROM PYTHON SOURCE LINES 510-526

Why cumulative vs rate mode matters
-----------------------------------
This script supports two data conventions.

``rate`` or ``increment``
    the CSV values are already annual values.

``cumulative``
    the CSV values are cumulative totals, so the script first
    converts them back to annual increments before any hotspot
    calculation.

That makes the builder useful for both:

- already-annual forecast archives,
- and future forecast exports stored in cumulative form.

.. GENERATED FROM PYTHON SOURCE LINES 528-542

Why the baseline source matters
-------------------------------
The baseline year can come from:

- ``actual``:
  use observed 2022 subsidence when available;
- ``q50``:
  use the median forecast from the eval CSV.

In practice:

- ``actual`` is better when the eval archive includes observations;
- ``q50`` is a useful fallback when the builder is used on a purely
  predictive archive.

.. GENERATED FROM PYTHON SOURCE LINES 544-563

Why this command belongs in tables_and_summaries
------------------------------------------------
This builder is a bridge between raw forecast CSVs and later
presentation layers.

A useful workflow is:

1. generate or calibrate the forecast CSVs,
2. build the hotspot summary table,
3. inspect counts and thresholds by city and year,
4. only then move on to hotspot maps or narrative reporting.

That separation keeps:

- forecast generation,
- summary tabulation,
- and visualization

clearly distinct.

.. GENERATED FROM PYTHON SOURCE LINES 565-613

Command-line version
--------------------
The same lesson can be reproduced from the CLI.

Legacy dispatcher:

.. code-block:: bash

   python -m scripts compute-hotspots \
     --ns-eval results/nansha_eval.csv \
     --ns-future results/nansha_future.csv \
     --zh-eval results/zhongshan_eval.csv \
     --zh-future results/zhongshan_future.csv \
     --baseline-year 2022 \
     --subsidence-kind cumulative \
     --quantile q50 \
     --years 2023 2024 2025 \
     --format both \
     --out tab_hotspots

Modern CLI:

.. code-block:: bash

   geoprior build hotspots \
     --ns-eval results/nansha_eval.csv \
     --ns-future results/nansha_future.csv \
     --zh-eval results/zhongshan_eval.csv \
     --zh-future results/zhongshan_future.csv \
     --baseline-year 2022 \
     --subsidence-kind cumulative \
     --quantile q50 \
     --years 2023 2024 2025 \
     --format both \
     --out tab_hotspots

Auto-discovery by city source folder:

.. code-block:: bash

   geoprior build hotspots \
     --ns-src results/nansha_run \
     --zh-src results/zhongshan_run \
     --baseline-year 2022 \
     --n-years 2 \
     --format csv

The gallery page teaches the builder.
The command line reproduces it in a workflow.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.311 seconds)


.. _sphx_glr_download_auto_examples_tables_and_summaries_compute_hotspots.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: compute_hotspots.ipynb <compute_hotspots.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: compute_hotspots.py <compute_hotspots.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: compute_hotspots.zip <compute_hotspots.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_