.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/tables_and_summaries/build_extract_zones.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_tables_and_summaries_build_extract_zones.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_tables_and_summaries_build_extract_zones.py:


Extract threshold-based spatial zones with ``extract-zones``
============================================================

This lesson teaches how to build compact zone tables from a spatial
field using GeoPrior's ``extract-zones`` workflow.

Unlike ROI extraction, which cuts out one rectangular window, this
builder keeps only the points whose target value satisfies a threshold
rule. That makes it a natural tool for:

- hotspot extraction,
- low-score or vulnerable-zone screening,
- transition-band inspection,
- and small zone tables that can be passed to later summaries or plots.
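
The contrast with ROI extraction can be made concrete with a minimal
pandas sketch. It does not call GeoPrior; the toy table, the window
bounds, and the cutoff ``7.0`` are illustrative assumptions.

.. code-block:: Python

    import pandas as pd

    # Toy table: four points with coordinates and one score column.
    points = pd.DataFrame(
        {
            "coord_x": [0.0, 1.0, 2.0, 3.0],
            "coord_y": [0.0, 1.0, 2.0, 3.0],
            "score": [2.0, 9.0, 4.0, 8.0],
        }
    )

    # ROI-style selection: keep rows by position (a rectangular window).
    roi = points[
        points["coord_x"].between(0.5, 2.5)
        & points["coord_y"].between(0.5, 2.5)
    ]

    # Zone-style selection: keep rows by value (a threshold rule).
    zone = points[points["score"] > 7.0]

    print(roi["score"].tolist())   # [9.0, 4.0]
    print(zone["score"].tolist())  # [9.0, 8.0]

The window keeps a low-score neighbour and misses the high score at
``coord_x = 3.0``, while the threshold rule keeps exactly the high
scores wherever they are.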

Why this matters
----------------
Threshold-based extraction is one of the simplest and most practical
ways to turn a continuous spatial score into a reusable table of points.

In many applied workflows, the first useful question is not:

"What does the full field look like?"

It is:

"Which points are above a critical level?"

or

"Which points fall in the lower-risk tail?"

That is the role of ``extract-zones``.

What this lesson teaches
------------------------
We will:

1. build a realistic synthetic spatial table,
2. save it as two separate input files,
3. extract three different zone types,
4. compare their thresholds and sizes,
5. build one compact visual preview,
6. end with direct command-line examples.

.. GENERATED FROM PYTHON SOURCE LINES 49-53

Imports
-------
We use the real zone-extraction helper from GeoPrior and the shared
synthetic spatial helpers already used in the other gallery lessons.

.. GENERATED FROM PYTHON SOURCE LINES 53-66

.. code-block:: Python


    from __future__ import annotations

    import tempfile
    from pathlib import Path

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    from geoprior.utils.spatial_utils import extract_zones_from
    from geoprior.scripts import utils as script_utils


.. GENERATED FROM PYTHON SOURCE LINES 67-91

Step 1 - Build one reusable synthetic city support
--------------------------------------------------
Instead of generating plain random coordinates, we reuse the shared
support helper. That gives us a more city-like footprint and keeps
the lesson visually consistent with the other build examples.

Key parameters
~~~~~~~~~~~~~~
city:
    Only a label attached to the synthetic support.
center_x, center_y:
    Center of the support extent in projected coordinates.
span_x, span_y:
    Half-width and half-height of the support extent.
nx, ny:
    Mesh density before masking.
jitter_x, jitter_y:
    Small perturbations so the support is not a perfect mesh.
footprint:
    Shape of the retained support.
keep_frac:
    Fraction of masked points to keep.
seed:
    Reproducibility seed.

.. GENERATED FROM PYTHON SOURCE LINES 91-109

.. code-block:: Python


    support = script_utils.make_spatial_support(
        script_utils.SpatialSupportSpec(
            city="ZonesDemo",
            center_x=5_000.0,
            center_y=3_100.0,
            span_x=4_800.0,
            span_y=3_400.0,
            nx=35,
            ny=27,
            jitter_x=30.0,
            jitter_y=24.0,
            footprint="nansha_like",
            keep_frac=0.92,
            seed=303,
        )
    )


.. GENERATED FROM PYTHON SOURCE LINES 110-120

Step 2 - Turn the support into a synthetic spatial score table
--------------------------------------------------------------
``extract_zones_from`` works on one score variable ``z`` and optional
coordinate variables ``x`` and ``y``. To make the lesson realistic, we
build one continuous score field that looks like an interpretable urban
response surface.

The score can be read as a generic hazard, severity, or hotspot index.
The exact semantics are not important here. The important point is that
it is continuous, spatially structured, and suitable for thresholding.

.. GENERATED FROM PYTHON SOURCE LINES 120-167

.. code-block:: Python


    rng = np.random.default_rng(303)

    base_field = script_utils.make_spatial_field(
        support,
        amplitude=2.45,
        drift_x=1.10,
        drift_y=0.35,
        phase=0.50,
        local_weight=0.18,
    )

    spread_field = script_utils.make_spatial_scale(
        support,
        base=0.25,
        x_weight=0.10,
        hotspot_weight=0.07,
    )

    city_df = support.to_frame().copy()
    city_df["district"] = np.where(
        city_df["x_norm"] > 0.62,
        "East corridor",
        np.where(city_df["y_norm"] > 0.58, "North belt", "Urban core"),
    )
    city_df["susceptibility_score"] = (
        6.2
        + 1.85 * base_field
        + 1.10 * spread_field
        + rng.normal(0.0, 0.18, len(city_df))
    )
    city_df["rainfall_mm"] = (
        1180
        + 95 * city_df["y_norm"]
        + rng.normal(0.0, 10.0, len(city_df))
    )
    city_df["groundwater_depth_m"] = (
        8.4
        + 0.9 * city_df["x_norm"]
        + 0.5 * city_df["y_norm"]
        + rng.normal(0.0, 0.08, len(city_df))
    )

    print("Synthetic table shape:", city_df.shape)
    print("")
    print(city_df.head(10).to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Synthetic table shape: (414, 10)

     sample_idx   coord_x  coord_y  x_norm  y_norm      city      district  susceptibility_score  rainfall_mm  groundwater_depth_m
              0 3654.7573 725.7191  0.3621  0.1582 ZonesDemo    Urban core                7.8543    1196.5087               8.8016
              1 3879.0054 716.6930  0.3851  0.1569 ZonesDemo    Urban core                7.8991    1203.0117               8.8067
              2 4127.1475 743.3290  0.4106  0.1608 ZonesDemo    Urban core                7.8215    1196.6609               8.9188
              3 4530.8041 743.3618  0.4521  0.1608 ZonesDemo    Urban core                8.0307    1166.7556               8.7492
              4 4741.9471 747.9166  0.4737  0.1615 ZonesDemo    Urban core                7.8903    1208.5079               9.0510
              5 4987.8599 751.7753  0.4990  0.1620 ZonesDemo    Urban core                8.3513    1191.2383               8.8633
              6 5290.1053 760.6047  0.5300  0.1633 ZonesDemo    Urban core                7.9618    1193.3879               9.1051
              7 5554.3981 728.7704  0.5572  0.1587 ZonesDemo    Urban core                8.0325    1199.9865               9.0134
              8 5886.9462 751.2118  0.5913  0.1619 ZonesDemo    Urban core                8.1350    1184.2525               8.9630
              9 6363.0598 730.2099  0.6402  0.1589 ZonesDemo East corridor                8.5844    1211.7271               9.0817


.. GENERATED FROM PYTHON SOURCE LINES 168-173

Step 3 - Save the table as two input files
------------------------------------------
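The public build layer accepts one or many input tables. To make that
visible in the lesson, we split the synthetic city at the median
``coord_x`` into a west file and an east file. Together the two files
hold the same rows as the combined table. The extraction in Step 5
still runs on the in-memory ``city_df``.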
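
The round trip can be checked with plain pandas. The sketch below is an
assumption about how a multi-file reader might combine inputs (read
each file, then concatenate). It uses a small stand-in table rather
than ``city_df`` and does not call GeoPrior's reader.

.. code-block:: Python

    import tempfile
    from pathlib import Path

    import pandas as pd

    # Stand-in table with the same column names as the lesson.
    table = pd.DataFrame(
        {
            "coord_x": [10.0, 20.0, 30.0, 40.0],
            "coord_y": [1.0, 2.0, 3.0, 4.0],
            "susceptibility_score": [7.5, 8.5, 9.5, 10.5],
        }
    )

    with tempfile.TemporaryDirectory() as tmp:
        west = Path(tmp) / "west.csv"
        east = Path(tmp) / "east.csv"

        # Split at the median x coordinate, as in the lesson.
        x_mid = float(table["coord_x"].median())
        table[table["coord_x"] <= x_mid].to_csv(west, index=False)
        table[table["coord_x"] > x_mid].to_csv(east, index=False)

        # Read both parts back and stack them into one table.
        combined = pd.concat(
            [pd.read_csv(p) for p in (west, east)],
            ignore_index=True,
        )

    # Row order may differ after a split, so compare after sorting.
    restored = combined.sort_values("coord_x").reset_index(drop=True)
    print(len(combined), restored.equals(table))

Because the two masks ``<= x_mid`` and ``> x_mid`` are complementary,
every row lands in exactly one file and none is duplicated.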

.. GENERATED FROM PYTHON SOURCE LINES 173-197

.. code-block:: Python


    tmp_dir = Path(
        tempfile.mkdtemp(prefix="gp_sg_extract_zones_")
    )

    west_csv = tmp_dir / "zones_demo_west.csv"
    east_csv = tmp_dir / "zones_demo_east.csv"

    x_mid = float(city_df["coord_x"].median())
    city_df.loc[city_df["coord_x"] <= x_mid].to_csv(
        west_csv,
        index=False,
    )
    city_df.loc[city_df["coord_x"] > x_mid].to_csv(
        east_csv,
        index=False,
    )

    print("")
    print("Input files")
    print(" -", west_csv.name)
    print(" -", east_csv.name)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Input files
     - zones_demo_west.csv
     - zones_demo_east.csv


.. GENERATED FROM PYTHON SOURCE LINES 198-214

Step 4 - Define three thresholding stories
------------------------------------------
We extract three different kinds of zones from the same score field.
This makes the lesson more useful than a single one-off threshold.

1. ``low_auto``
   lower-tail automatic thresholding using a percentile.

2. ``hotspot``
   points above an explicit upper threshold.

3. ``transition``
   points inside an intermediate score band.

The explicit thresholds are derived from quantiles so the lesson stays
stable even if the synthetic field is adjusted later.
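
Deriving cutoffs from percentiles also fixes the share of points each
rule selects. The NumPy sketch below illustrates this on stand-in
normal scores, not on the lesson's field. The inclusive and strict
comparisons are assumptions; the helper may treat boundary values
differently.

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(0)
    # Stand-in scores; the lesson uses ``susceptibility_score`` instead.
    z = rng.normal(9.0, 1.5, size=1000)

    low_cut = np.percentile(z, 15)
    hot_cut = np.percentile(z, 82)
    band_lo, band_hi = np.percentile(z, [46, 64])

    # Share of points selected by each rule.
    frac_low = np.mean(z <= low_cut)
    frac_hot = np.mean(z > hot_cut)
    frac_band = np.mean((z >= band_lo) & (z <= band_hi))

    print(f"low={frac_low:.2f} hot={frac_hot:.2f} band={frac_band:.2f}")

The expected shares are about 15 %, 18 % (100 - 82), and 18 %
(64 - 46). On the lesson's 414-row table that corresponds to roughly
62, 75, and 75 rows, which matches the zone summary printed in Step 6.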

.. GENERATED FROM PYTHON SOURCE LINES 214-233

.. code-block:: Python


    z = city_df["susceptibility_score"].to_numpy(float)
    auto_percentile = 15
    auto_threshold = float(np.percentile(z, auto_percentile))
    hotspot_threshold = float(np.percentile(z, 82))
    transition_band = (
        float(np.percentile(z, 46)),
        float(np.percentile(z, 64)),
    )

    print("")
    print("Threshold design")
    print(f" - auto lower threshold (p{auto_percentile}) = {auto_threshold:.3f}")
    print(f" - hotspot threshold = {hotspot_threshold:.3f}")
    print(
        " - transition band = "
        f"({transition_band[0]:.3f}, {transition_band[1]:.3f})"
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Threshold design
     - auto lower threshold (p15) = 8.080
     - hotspot threshold = 10.528
     - transition band = (8.865, 9.539)


.. GENERATED FROM PYTHON SOURCE LINES 234-241

Step 5 - Extract the three zone tables
--------------------------------------
We call the real extraction helper directly. The public CLI command is
a thin convenience layer over the same threshold logic.

``extract_zones_from`` returns only the filtered coordinate and score
columns, which is exactly what we want for compact zone exports.
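
The threshold logic itself is small. The sketch below is a pandas-only
approximation written for this lesson, not the source of
``extract_zones_from``: the function ``select_zone``, its ``rule`` and
``value`` arguments, the ``lower_tail`` rule name, and the choice of
strict or inclusive comparisons are all assumptions.

.. code-block:: Python

    import numpy as np
    import pandas as pd


    def select_zone(data, *, x, y, z, rule, value):
        """Hypothetical filter returning only the x, y, and z columns."""
        scores = data[z]
        if rule == "above":
            mask = scores > value
        elif rule == "between":
            low, high = value
            mask = scores.between(low, high)  # inclusive on both ends
        elif rule == "lower_tail":
            # Here ``value`` is a percentile, not a score.
            mask = scores <= np.percentile(scores, value)
        else:
            raise ValueError(f"unknown rule: {rule!r}")
        return data.loc[mask, [x, y, z]].reset_index(drop=True)


    demo = pd.DataFrame(
        {
            "coord_x": [0.0, 1.0, 2.0, 3.0, 4.0],
            "coord_y": [5.0, 6.0, 7.0, 8.0, 9.0],
            "score": [1.0, 2.0, 3.0, 4.0, 5.0],
            "district": ["a", "a", "b", "b", "c"],
        }
    )
    cols = dict(x="coord_x", y="coord_y", z="score")

    hot = select_zone(demo, rule="above", value=3.5, **cols)
    band = select_zone(demo, rule="between", value=(2.0, 4.0), **cols)
    low = select_zone(demo, rule="lower_tail", value=25, **cols)

    print(hot["score"].tolist())   # [4.0, 5.0]
    print(band["score"].tolist())  # [2.0, 3.0, 4.0]
    print(low["score"].tolist())   # [1.0, 2.0]

Note that the extra ``district`` column is dropped, which mirrors the
compact three-column output described above.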

.. GENERATED FROM PYTHON SOURCE LINES 241-315

.. code-block:: Python


    def _zone_table(
        data: pd.DataFrame,
        *,
        threshold,
        condition: str,
        percentile: int | float = 10,
        use_negative_criteria: bool = True,
    ) -> pd.DataFrame:
        out = extract_zones_from(
            z="susceptibility_score",
            x="coord_x",
            y="coord_y",
            data=data,
            threshold=threshold,
            condition=condition,
            use_negative_criteria=use_negative_criteria,
            percentile=percentile,
            view=False,
        )
        out = out.copy()
        out.columns = ["coord_x", "coord_y", "susceptibility_score"]
        return out


    low_auto = _zone_table(
        city_df,
        threshold="auto",
        condition="auto",
        percentile=auto_percentile,
        use_negative_criteria=True,
    )

    hotspot = _zone_table(
        city_df,
        threshold=hotspot_threshold,
        condition="above",
        use_negative_criteria=False,
    )

    transition = _zone_table(
        city_df,
        threshold=transition_band,
        condition="between",
        use_negative_criteria=False,
    )

    low_csv = tmp_dir / "zones_demo_low_auto.csv"
    hotspot_csv = tmp_dir / "zones_demo_hotspot.csv"
    transition_csv = tmp_dir / "zones_demo_transition.csv"

    low_auto.to_csv(low_csv, index=False)
    hotspot.to_csv(hotspot_csv, index=False)
    transition.to_csv(transition_csv, index=False)

    print("")
    print("Written files")
    print(" -", low_csv.name)
    print(" -", hotspot_csv.name)
    print(" -", transition_csv.name)

    print("")
    print("Low-score auto zone")
    print(low_auto.head(8).to_string(index=False))

    print("")
    print("Hotspot zone")
    print(hotspot.head(8).to_string(index=False))

    print("")
    print("Transition band")
    print(transition.head(8).to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Written files
     - zones_demo_low_auto.csv
     - zones_demo_hotspot.csv
     - zones_demo_transition.csv

    Low-score auto zone
      coord_x   coord_y  susceptibility_score
    3654.7573  725.7191                7.8543
    3879.0054  716.6930                7.8991
    4127.1475  743.3290                7.8215
    4530.8041  743.3618                8.0307
    4741.9471  747.9166                7.8903
    5290.1053  760.6047                7.9618
    5554.3981  728.7704                8.0325
    2735.5954 1017.2204                7.8516

    Hotspot zone
      coord_x   coord_y  susceptibility_score
    5902.7448 1802.0862               11.1294
    6135.7027 1827.0407               11.6021
    6410.7167 1787.5672               11.4881
    6684.9167 1765.4587               11.1619
    6992.9990 1793.7800               11.4905
    7333.2870 1788.1407               10.5601
    5313.0400 2046.3373               10.6809
    5624.3402 2059.1097               11.0479

    Transition band
      coord_x   coord_y  susceptibility_score
    6143.2959  980.3361                9.0711
    6669.7483 1010.8933                9.0629
    6911.9625 1018.0694                9.4021
    7317.7457 1030.8998                8.8854
    5362.6798 1274.7406                9.1439
    6144.5651 1263.2123                9.3788
    6678.2034 1285.9056                9.3800
    6943.2837 1250.0980                9.2733


.. GENERATED FROM PYTHON SOURCE LINES 316-320

Step 6 - Summarize the three outputs
------------------------------------
A small summary table makes it easy to compare how the different
threshold rules behave.
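
When there are many zones, the same comparison can be built without a
per-zone helper. The following is an alternative sketch using
``pandas.concat`` with keys and a named aggregation; the two small
zone tables are made-up stand-ins, not outputs of this lesson.

.. code-block:: Python

    import pandas as pd

    # Made-up zone tables keyed by a label.
    zones = {
        "low": pd.DataFrame({"susceptibility_score": [7.1, 7.6, 8.0]}),
        "hotspot": pd.DataFrame({"susceptibility_score": [10.6, 11.4]}),
    }

    # Stack the tables; the dict keys become the outer index level.
    stacked = pd.concat(zones, names=["zone", "row"])

    zone_summary = (
        stacked.groupby(level="zone", sort=False)["susceptibility_score"]
        .agg(
            rows="count",
            score_min="min",
            score_mean="mean",
            score_max="max",
        )
        .reset_index()
    )

    print(zone_summary.to_string(index=False))

This form does not carry the human-readable ``threshold_rule`` text,
so the explicit helper remains the better fit when that label matters.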

.. GENERATED FROM PYTHON SOURCE LINES 320-365

.. code-block:: Python


    def summarize_zone(
        df: pd.DataFrame,
        *,
        label: str,
        threshold_text: str,
    ) -> dict[str, object]:
        return {
            "zone": label,
            "threshold_rule": threshold_text,
            "rows": int(len(df)),
            "score_min": float(df["susceptibility_score"].min()),
            "score_mean": float(df["susceptibility_score"].mean()),
            "score_max": float(df["susceptibility_score"].max()),
        }


    summary = pd.DataFrame(
        [
            summarize_zone(
                low_auto,
                label="low_auto",
                threshold_text=f"auto below p{auto_percentile}",
            ),
            summarize_zone(
                hotspot,
                label="hotspot",
                threshold_text=f"> {hotspot_threshold:.2f}",
            ),
            summarize_zone(
                transition,
                label="transition",
                threshold_text=(
                    f"between {transition_band[0]:.2f} and "
                    f"{transition_band[1]:.2f}"
                ),
            ),
        ]
    )

    print("")
    print("Zone summary")
    print(summary.to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Zone summary
          zone        threshold_rule  rows  score_min  score_mean  score_max
      low_auto        auto below p15    62     6.7745      7.6908     8.0779
       hotspot               > 10.53    75    10.5367     11.6053    13.4455
    transition between 8.87 and 9.54    75     8.8652      9.1966     9.5345


.. GENERATED FROM PYTHON SOURCE LINES 366-376

Step 7 - Build one compact visual preview
-----------------------------------------
Left:
  full score field with the low-score zone overlaid.

Middle:
  full score field with hotspot and transition points overlaid.

Right:
  score histogram with the three threshold rules annotated.

.. GENERATED FROM PYTHON SOURCE LINES 376-481

.. code-block:: Python


    fig, axes = plt.subplots(
        1,
        3,
        figsize=(15.0, 4.9),
        constrained_layout=True,
    )

    # Panel 1 - lower-tail zone.
    ax = axes[0]
    sc = ax.scatter(
        city_df["coord_x"],
        city_df["coord_y"],
        c=city_df["susceptibility_score"],
        s=14,
        cmap="viridis",
        alpha=0.86,
    )
    ax.scatter(
        low_auto["coord_x"],
        low_auto["coord_y"],
        s=32,
        facecolors="none",
        edgecolors="white",
        linewidths=1.2,
        label="low-score zone",
    )
    ax.set_title("Automatic lower-tail zone")
    ax.set_xlabel("coord_x")
    ax.set_ylabel("coord_y")
    ax.legend(frameon=False, fontsize=8)
    ax.grid(True, linestyle=":", alpha=0.35)
    ax.set_aspect("equal", adjustable="box")

    # Panel 2 - hotspot and transition zones.
    ax = axes[1]
    ax.scatter(
        city_df["coord_x"],
        city_df["coord_y"],
        c=city_df["susceptibility_score"],
        s=12,
        cmap="viridis",
        alpha=0.20,
    )
    ax.scatter(
        transition["coord_x"],
        transition["coord_y"],
        s=20,
        color="tab:orange",
        alpha=0.72,
        label="transition band",
    )
    ax.scatter(
        hotspot["coord_x"],
        hotspot["coord_y"],
        s=28,
        color="tab:red",
        alpha=0.88,
        label="hotspots",
    )
    ax.set_title("Intermediate band and hotspots")
    ax.set_xlabel("coord_x")
    ax.set_ylabel("coord_y")
    ax.legend(frameon=False, fontsize=8)
    ax.grid(True, linestyle=":", alpha=0.35)
    ax.set_aspect("equal", adjustable="box")

    # Panel 3 - score histogram and thresholds.
    ax = axes[2]
    ax.hist(
        city_df["susceptibility_score"],
        bins=24,
        alpha=0.75,
        edgecolor="white",
    )
    ax.axvline(
        auto_threshold,
        linestyle="--",
        linewidth=1.6,
        label=f"auto p{auto_percentile}",
    )
    ax.axvline(
        hotspot_threshold,
        linestyle="-.",
        linewidth=1.6,
        label="hotspot threshold",
    )
    ax.axvspan(
        transition_band[0],
        transition_band[1],
        alpha=0.20,
        label="transition band",
    )
    ax.set_title("Score distribution and thresholds")
    ax.set_xlabel("susceptibility_score")
    ax.set_ylabel("count")
    ax.legend(frameon=False, fontsize=8)
    ax.grid(True, axis="y", linestyle=":", alpha=0.35)

    # Add one shared colorbar for the spatial panels.
    cbar = fig.colorbar(sc, ax=axes[:2], fraction=0.035, pad=0.02)
    cbar.set_label("susceptibility_score")

    plt.show()


.. image-sg:: /auto_examples/tables_and_summaries/images/sphx_glr_build_extract_zones_001.png
   :alt: Automatic lower-tail zone, Intermediate band and hotspots, Score distribution and thresholds
   :srcset: /auto_examples/tables_and_summaries/images/sphx_glr_build_extract_zones_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 482-494

How to read this output
-----------------------
A useful reading order is:

1. inspect the continuous score field,
2. check which threshold rule was applied,
3. compare the selected point counts,
4. then compare the score ranges in the exported tables.

The key idea is that the builder is not estimating a new field. It is
simply turning one existing continuous score into one reusable point
table based on a threshold criterion.
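
One way to confirm that a zone table contains only rows copied from
the source table is a left merge with an indicator column. This is a
generic pandas check on a made-up table, offered as a sketch; it
relies on exact floating-point equality, which holds here because the
zone rows are taken directly from the source frame.

.. code-block:: Python

    import pandas as pd

    full = pd.DataFrame(
        {
            "coord_x": [0.0, 1.0, 2.0, 3.0],
            "coord_y": [4.0, 5.0, 6.0, 7.0],
            "score": [6.0, 9.0, 11.0, 12.0],
        }
    )
    keys = ["coord_x", "coord_y", "score"]

    # A threshold-defined subset of the source rows.
    zone = full.loc[full["score"] > 10.0, keys]

    # Every zone row should find a matching row in the source table.
    check = zone.merge(full, on=keys, how="left", indicator=True)
    all_from_source = bool((check["_merge"] == "both").all())

    print(len(zone), all_from_source)  # 2 True

If a zone table has been written to disk and read back, rounding in
the file format can break exact matches, so a tolerance-based
comparison may be needed instead.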

.. GENERATED FROM PYTHON SOURCE LINES 496-507

Why this builder is useful in practice
--------------------------------------
``extract-zones`` is most useful as a support-layer builder when you
need one threshold-defined subset rather than a full-city table.

Typical uses include:

- extracting hotspot candidates,
- isolating the lower-risk or lower-response tail,
- creating intermediate bands for sensitivity checks,
- or preparing compact point tables for later reports.

.. GENERATED FROM PYTHON SOURCE LINES 509-577

Command-line usage
------------------
The examples below show the same workflow as direct terminal commands.

Explicit hotspot extraction
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   geoprior-build extract-zones \
       zones_demo_west.csv zones_demo_east.csv \
       --x-col coord_x \
       --y-col coord_y \
       --z-col susceptibility_score \
       --threshold 10.528 \
       --condition above \
       --output zones_demo_hotspot.csv

The same command through the root dispatcher
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   geoprior build extract-zones \
       zones_demo_west.csv zones_demo_east.csv \
       --x-col coord_x \
       --y-col coord_y \
       --z-col susceptibility_score \
       --threshold 10.528 \
       --condition above \
       --output zones_demo_hotspot.csv

Intermediate transition band
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   geoprior-build extract-zones \
       zones_demo_west.csv zones_demo_east.csv \
       --x-col coord_x \
       --y-col coord_y \
       --z-col susceptibility_score \
       --threshold 8.865 9.539 \
       --condition between \
       --output zones_demo_transition.csv

Lower-tail automatic extraction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The underlying helper also supports percentile-driven automatic
thresholds. When your local wrapper exposes the same options, a
typical command looks like:

.. code-block:: bash

   geoprior-build extract-zones \
       zones_demo_west.csv zones_demo_east.csv \
       --x-col coord_x \
       --y-col coord_y \
       --z-col susceptibility_score \
       --threshold auto \
       --percentile 15 \
       --condition auto \
       --output zones_demo_low_auto.csv

The shared reader accepts one or many input tables, and the same
command family can also be used with CSV, TSV, Parquet, Excel, JSON,
Feather, or Pickle inputs.
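
A multi-format reader of this kind is often a dispatch on the file
suffix. The sketch below is hypothetical and is not GeoPrior's shared
reader: the ``READERS`` map, the ``read_many`` name, and the chosen
suffixes are assumptions. Only the pandas reader functions themselves
are real.

.. code-block:: Python

    import tempfile
    from pathlib import Path

    import pandas as pd

    # Hypothetical suffix-to-reader map built on pandas readers.
    READERS = {
        ".csv": pd.read_csv,
        ".tsv": lambda p: pd.read_csv(p, sep="\t"),
        ".parquet": pd.read_parquet,
        ".xlsx": pd.read_excel,
        ".json": pd.read_json,
        ".feather": pd.read_feather,
        ".pkl": pd.read_pickle,
    }


    def read_many(paths):
        """Read each input by suffix and stack the rows."""
        frames = []
        for path in map(Path, paths):
            reader = READERS.get(path.suffix.lower())
            if reader is None:
                raise ValueError(f"unsupported format: {path.suffix!r}")
            frames.append(reader(path))
        return pd.concat(frames, ignore_index=True)


    # Usage with two small files in different formats.
    part = pd.DataFrame({"coord_x": [1.0, 2.0], "score": [8.0, 9.0]})
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = Path(tmp) / "west.csv"
        tsv_path = Path(tmp) / "east.tsv"
        part.to_csv(csv_path, index=False)
        part.to_csv(tsv_path, index=False, sep="\t")
        combined = read_many([csv_path, tsv_path])

    print(combined.shape)  # (4, 2)

Parquet, Feather, and Excel readers need optional pandas dependencies,
so those branches only work when the matching engine is installed.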


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.526 seconds)


.. _sphx_glr_download_auto_examples_tables_and_summaries_build_extract_zones.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: build_extract_zones.ipynb <build_extract_zones.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: build_extract_zones.py <build_extract_zones.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: build_extract_zones.zip <build_extract_zones.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_