Extract threshold-based spatial zones with extract-zones#

This lesson shows how to build compact zone tables from a spatial field with GeoPrior’s extract-zones workflow.

Unlike ROI extraction, which cuts out one rectangular window, this builder keeps only the points whose target value satisfies a threshold rule. That makes it a natural tool for:

  • hotspot extraction,

  • low-score or vulnerable-zone screening,

  • transition-band inspection,

  • and small zone tables that can be passed to later summaries or plots.
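To make the contrast with ROI extraction concrete, here is a minimal pandas-only sketch. It does not call GeoPrior; the toy table, the window bounds, and the threshold value are all invented for illustration.

```python
import pandas as pd

# Toy table: five points with a continuous score (values are made up).
points = pd.DataFrame(
    {
        "coord_x": [0.0, 1.0, 2.0, 3.0, 4.0],
        "coord_y": [0.0, 1.0, 0.0, 1.0, 0.0],
        "score": [2.0, 9.0, 4.0, 8.5, 1.0],
    }
)

# ROI-style selection: keep everything inside one rectangular window,
# regardless of the score.
roi = points[
    points["coord_x"].between(0.5, 2.5)
    & points["coord_y"].between(-0.5, 1.5)
]

# Threshold-style selection: keep points anywhere in the domain,
# as long as the score satisfies the rule (here: score > 8).
zone = points[points["score"] > 8.0]

print(roi["score"].tolist())   # scores inside the window
print(zone["score"].tolist())  # scores above the threshold
```

The window keeps a low-score point because it only looks at position, while the threshold rule keeps two high-score points that are not spatial neighbours.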

Why this matters#

Threshold-based extraction is one of the simplest and most practical ways to turn a continuous spatial score into a reusable table of points.

In many applied workflows, the first useful question is not:

“What does the full field look like?”

It is:

“Which points are above a critical level?”

or

“Which points fall in the lower-risk tail?”

That is the role of extract-zones.

What this lesson teaches#

We will:

  1. build a realistic synthetic spatial table,

  2. save it as two separate input files,

  3. extract three different zone types,

  4. compare their thresholds and sizes,

  5. build one compact visual preview,

  6. end with direct command-line examples.

Imports#

We use the real zone-extraction helper from GeoPrior and the shared synthetic spatial helpers already used in the other gallery lessons.

from __future__ import annotations

import tempfile
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from geoprior.utils.spatial_utils import extract_zones_from
from geoprior.scripts import utils as script_utils

Step 1 - Build one reusable synthetic city support#

Instead of generating plain random coordinates, we reuse the shared support helper. That gives us a more city-like footprint and keeps the lesson visually consistent with the other build examples.

Key parameters#

city:

Only a label attached to the synthetic support.

center_x, center_y:

Center of the projected coordinate system.

span_x, span_y:

Half-width and half-height of the support extent.

nx, ny:

Mesh density before masking.

jitter_x, jitter_y:

Small perturbations so the support is not a perfect mesh.

footprint:

Shape of the retained support.

keep_frac:

Fraction of masked points to keep.

seed:

Reproducibility seed.

support = script_utils.make_spatial_support(
    script_utils.SpatialSupportSpec(
        city="ZonesDemo",
        center_x=5_000.0,
        center_y=3_100.0,
        span_x=4_800.0,
        span_y=3_400.0,
        nx=35,
        ny=27,
        jitter_x=30.0,
        jitter_y=24.0,
        footprint="nansha_like",
        keep_frac=0.92,
        seed=303,
    )
)
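For readers who want an intuition for what these parameters control, the sketch below builds a comparable support with NumPy alone. It is not GeoPrior's implementation: the function name `toy_support` is hypothetical, an ellipse stands in for the "nansha_like" footprint, and the resulting point count will differ from the lesson's table.

```python
import numpy as np


def toy_support(center_x, center_y, span_x, span_y,
                nx, ny, jitter_x, jitter_y, keep_frac, seed):
    """Jittered mesh, clipped to an ellipse, then randomly thinned."""
    rng = np.random.default_rng(seed)

    # Regular mesh covering center +/- span in each direction.
    gx, gy = np.meshgrid(
        np.linspace(center_x - span_x, center_x + span_x, nx),
        np.linspace(center_y - span_y, center_y + span_y, ny),
    )

    # Small perturbations so the support is not a perfect mesh.
    x = gx.ravel() + rng.normal(0.0, jitter_x, gx.size)
    y = gy.ravel() + rng.normal(0.0, jitter_y, gy.size)

    # Stand-in footprint: an ellipse (assumption, not the real shape).
    rx = (x - center_x) / span_x
    ry = (y - center_y) / span_y
    inside = rx ** 2 + ry ** 2 <= 1.0
    x, y = x[inside], y[inside]

    # Keep a random fraction of the masked points.
    keep = rng.random(x.size) < keep_frac
    return x[keep], y[keep]


x, y = toy_support(
    5_000.0, 3_100.0, 4_800.0, 3_400.0,
    35, 27, 30.0, 24.0, 0.92, 303,
)
print(len(x), "points kept out of", 35 * 27)
```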

Step 2 - Turn the support into a synthetic spatial score table#

extract_zones_from works on one score variable z and optional coordinate variables x and y. To make the lesson realistic, we build one continuous score field that looks like an interpretable urban response surface.

The score can be read as a generic hazard, severity, or hotspot index. The exact semantics are not important here. The important point is that it is continuous, spatially structured, and suitable for thresholding.

rng = np.random.default_rng(303)

base_field = script_utils.make_spatial_field(
    support,
    amplitude=2.45,
    drift_x=1.10,
    drift_y=0.35,
    phase=0.50,
    local_weight=0.18,
)

spread_field = script_utils.make_spatial_scale(
    support,
    base=0.25,
    x_weight=0.10,
    hotspot_weight=0.07,
)

city_df = support.to_frame().copy()
city_df["district"] = np.where(
    city_df["x_norm"] > 0.62,
    "East corridor",
    np.where(city_df["y_norm"] > 0.58, "North belt", "Urban core"),
)
city_df["susceptibility_score"] = (
    6.2
    + 1.85 * base_field
    + 1.10 * spread_field
    + rng.normal(0.0, 0.18, len(city_df))
)
city_df["rainfall_mm"] = (
    1180
    + 95 * city_df["y_norm"]
    + rng.normal(0.0, 10.0, len(city_df))
)
city_df["groundwater_depth_m"] = (
    8.4
    + 0.9 * city_df["x_norm"]
    + 0.5 * city_df["y_norm"]
    + rng.normal(0.0, 0.08, len(city_df))
)

print("Synthetic table shape:", city_df.shape)
print("")
print(city_df.head(10).to_string(index=False))

Synthetic table shape: (414, 10)

 sample_idx   coord_x  coord_y  x_norm  y_norm      city      district  susceptibility_score  rainfall_mm  groundwater_depth_m
          0 3654.7573 725.7191  0.3621  0.1582 ZonesDemo    Urban core                7.8543    1196.5087               8.8016
          1 3879.0054 716.6930  0.3851  0.1569 ZonesDemo    Urban core                7.8991    1203.0117               8.8067
          2 4127.1475 743.3290  0.4106  0.1608 ZonesDemo    Urban core                7.8215    1196.6609               8.9188
          3 4530.8041 743.3618  0.4521  0.1608 ZonesDemo    Urban core                8.0307    1166.7556               8.7492
          4 4741.9471 747.9166  0.4737  0.1615 ZonesDemo    Urban core                7.8903    1208.5079               9.0510
          5 4987.8599 751.7753  0.4990  0.1620 ZonesDemo    Urban core                8.3513    1191.2383               8.8633
          6 5290.1053 760.6047  0.5300  0.1633 ZonesDemo    Urban core                7.9618    1193.3879               9.1051
          7 5554.3981 728.7704  0.5572  0.1587 ZonesDemo    Urban core                8.0325    1199.9865               9.0134
          8 5886.9462 751.2118  0.5913  0.1619 ZonesDemo    Urban core                8.1350    1184.2525               8.9630
          9 6363.0598 730.2099  0.6402  0.1589 ZonesDemo East corridor                8.5844    1211.7271               9.0817

Step 3 - Save the table as two input files#

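The public build layer accepts one or many input tables. To make that visible in the lesson, we split the synthetic city into two files that together hold the same rows as the combined table. The Python steps below keep working on the in-memory table; the two files serve as inputs for the command-line examples at the end.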

tmp_dir = Path(
    tempfile.mkdtemp(prefix="gp_sg_extract_zones_")
)

west_csv = tmp_dir / "zones_demo_west.csv"
east_csv = tmp_dir / "zones_demo_east.csv"

x_mid = float(city_df["coord_x"].median())
city_df.loc[city_df["coord_x"] <= x_mid].to_csv(
    west_csv,
    index=False,
)
city_df.loc[city_df["coord_x"] > x_mid].to_csv(
    east_csv,
    index=False,
)

print("")
print("Input files")
print(" -", west_csv.name)
print(" -", east_csv.name)

Input files
 - zones_demo_west.csv
 - zones_demo_east.csv
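As a quick check that a median split loses nothing, the sketch below writes a toy table to two CSV files and stacks them back together with plain pandas. It only illustrates the round trip; how GeoPrior's reader combines multiple inputs internally is not shown in this lesson, and the toy values are made up.

```python
import tempfile
from pathlib import Path

import pandas as pd

table = pd.DataFrame(
    {
        "coord_x": [10.0, 40.0, 20.0, 30.0],
        "coord_y": [1.0, 2.0, 3.0, 4.0],
        "score": [7.5, 9.25, 8.0, 10.5],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    west = Path(tmp) / "west.csv"
    east = Path(tmp) / "east.csv"

    # Same split rule as the lesson: at or below the median goes west.
    x_mid = float(table["coord_x"].median())
    table[table["coord_x"] <= x_mid].to_csv(west, index=False)
    table[table["coord_x"] > x_mid].to_csv(east, index=False)

    # Stack the parts back together; row order may differ from the original.
    combined = pd.concat(
        [pd.read_csv(p) for p in (west, east)],
        ignore_index=True,
    )

# Compare after sorting on the coordinates so row order does not matter.
key = ["coord_x", "coord_y"]
same = (
    combined.sort_values(key).reset_index(drop=True)
    .equals(table.sort_values(key).reset_index(drop=True))
)
print(len(combined), same)
```

Using `<=` on one side and `>` on the other guarantees that a point sitting exactly on the median lands in exactly one file.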

Step 4 - Define three thresholding stories#

We extract three different kinds of zones from the same score field. This makes the lesson more useful than a single one-off threshold.

  1. low_auto: lower-tail automatic thresholding using a percentile.

  2. hotspot: points above an explicit upper threshold.

  3. transition: points inside an intermediate score band.

The explicit thresholds are derived from quantiles, so each zone keeps roughly the same share of points even if the synthetic field is adjusted later.

z = city_df["susceptibility_score"].to_numpy(float)
auto_percentile = 15
auto_threshold = float(np.percentile(z, auto_percentile))
hotspot_threshold = float(np.percentile(z, 82))
transition_band = (
    float(np.percentile(z, 46)),
    float(np.percentile(z, 64)),
)

print("")
print("Threshold design")
print(f" - auto lower threshold (p{auto_percentile}) = {auto_threshold:.3f}")
print(f" - hotspot threshold = {hotspot_threshold:.3f}")
print(
    " - transition band = "
    f"({transition_band[0]:.3f}, {transition_band[1]:.3f})"
)

Threshold design
 - auto lower threshold (p15) = 8.080
 - hotspot threshold = 10.528
 - transition band = (8.865, 9.539)
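Because the thresholds come from percentiles, the size of each zone is fixed by the table length rather than by the score values. The NumPy sketch below illustrates this with stand-in scores of the same length as the lesson table (414 rows). The random values are an assumption, and whether the real helper uses strict or inclusive comparisons is not stated in this lesson; with distinct values and these percentiles it makes no difference to the counts.

```python
import numpy as np

# Stand-in scores; only the length (414) matches the lesson table.
rng = np.random.default_rng(0)
scores = rng.normal(9.0, 1.5, 414)

p15 = np.percentile(scores, 15)
p82 = np.percentile(scores, 82)
lo, hi = np.percentile(scores, [46, 64])

n_low = int((scores < p15).sum())                     # lower 15 %
n_hot = int((scores > p82).sum())                     # upper 18 %
n_mid = int(((scores >= lo) & (scores <= hi)).sum())  # 18 % band

print(n_low, n_hot, n_mid)  # 62 75 75
```

These counts depend only on the percentiles and the number of rows, so any continuous field with 414 distinct values gives the same zone sizes.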

Step 5 - Extract the three zone tables#

We call the real extraction helper directly. The public CLI command is a thin convenience layer over the same threshold logic.

extract_zones_from returns only the filtered coordinate and score columns, which is exactly what we want for compact zone exports.

def _zone_table(
    data: pd.DataFrame,
    *,
    threshold,
    condition: str,
    percentile: int | float = 10,
    use_negative_criteria: bool = True,
) -> pd.DataFrame:
    out = extract_zones_from(
        z="susceptibility_score",
        x="coord_x",
        y="coord_y",
        data=data,
        threshold=threshold,
        condition=condition,
        use_negative_criteria=use_negative_criteria,
        percentile=percentile,
        view=False,
    )
    out = out.copy()
    # Rename by position; this assumes the helper returns x, y, z in that order.
    out.columns = ["coord_x", "coord_y", "susceptibility_score"]
    return out


low_auto = _zone_table(
    city_df,
    threshold="auto",
    condition="auto",
    percentile=auto_percentile,
    use_negative_criteria=True,
)

hotspot = _zone_table(
    city_df,
    threshold=hotspot_threshold,
    condition="above",
    use_negative_criteria=False,
)

transition = _zone_table(
    city_df,
    threshold=transition_band,
    condition="between",
    use_negative_criteria=False,
)

low_csv = tmp_dir / "zones_demo_low_auto.csv"
hotspot_csv = tmp_dir / "zones_demo_hotspot.csv"
transition_csv = tmp_dir / "zones_demo_transition.csv"

low_auto.to_csv(low_csv, index=False)
hotspot.to_csv(hotspot_csv, index=False)
transition.to_csv(transition_csv, index=False)

print("")
print("Written files")
print(" -", low_csv.name)
print(" -", hotspot_csv.name)
print(" -", transition_csv.name)

print("")
print("Low-score auto zone")
print(low_auto.head(8).to_string(index=False))

print("")
print("Hotspot zone")
print(hotspot.head(8).to_string(index=False))

print("")
print("Transition band")
print(transition.head(8).to_string(index=False))

Written files
 - zones_demo_low_auto.csv
 - zones_demo_hotspot.csv
 - zones_demo_transition.csv

Low-score auto zone
  coord_x   coord_y  susceptibility_score
3654.7573  725.7191                7.8543
3879.0054  716.6930                7.8991
4127.1475  743.3290                7.8215
4530.8041  743.3618                8.0307
4741.9471  747.9166                7.8903
5290.1053  760.6047                7.9618
5554.3981  728.7704                8.0325
2735.5954 1017.2204                7.8516

Hotspot zone
  coord_x   coord_y  susceptibility_score
5902.7448 1802.0862               11.1294
6135.7027 1827.0407               11.6021
6410.7167 1787.5672               11.4881
6684.9167 1765.4587               11.1619
6992.9990 1793.7800               11.4905
7333.2870 1788.1407               10.5601
5313.0400 2046.3373               10.6809
5624.3402 2059.1097               11.0479

Transition band
  coord_x   coord_y  susceptibility_score
6143.2959  980.3361                9.0711
6669.7483 1010.8933                9.0629
6911.9625 1018.0694                9.4021
7317.7457 1030.8998                8.8854
5362.6798 1274.7406                9.1439
6144.5651 1263.2123                9.3788
6678.2034 1285.9056                9.3800
6943.2837 1250.0980                9.2733

Step 6 - Summarize the three outputs#

A small summary table makes it easy to compare how the different threshold rules behave.

def summarize_zone(
    df: pd.DataFrame,
    *,
    label: str,
    threshold_text: str,
) -> dict[str, object]:
    return {
        "zone": label,
        "threshold_rule": threshold_text,
        "rows": int(len(df)),
        "score_min": float(df["susceptibility_score"].min()),
        "score_mean": float(df["susceptibility_score"].mean()),
        "score_max": float(df["susceptibility_score"].max()),
    }


summary = pd.DataFrame(
    [
        summarize_zone(
            low_auto,
            label="low_auto",
            threshold_text=f"auto below p{auto_percentile}",
        ),
        summarize_zone(
            hotspot,
            label="hotspot",
            threshold_text=f"> {hotspot_threshold:.2f}",
        ),
        summarize_zone(
            transition,
            label="transition",
            threshold_text=(
                f"between {transition_band[0]:.2f} and "
                f"{transition_band[1]:.2f}"
            ),
        ),
    ]
)

print("")
print("Zone summary")
print(summary.to_string(index=False))

Zone summary
      zone        threshold_rule  rows  score_min  score_mean  score_max
  low_auto        auto below p15    62     6.7745      7.6908     8.0779
   hotspot               > 10.53    75    10.5367     11.6053    13.4455
transition between 8.87 and 9.54    75     8.8652      9.1966     9.5345
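The summary also supports a simple consistency check: every exported score should respect its rule, and the three zones should not overlap. The sketch below runs that check on the minimum and maximum scores printed above, together with the thresholds from Step 4. The helper name `within` is hypothetical, and the two-row frames are stand-ins for the full zone tables.

```python
import pandas as pd


def within(zone, column, low=None, high=None):
    """True when every score in the zone lies inside [low, high]."""
    s = zone[column]
    ok_low = True if low is None else bool((s >= low).all())
    ok_high = True if high is None else bool((s <= high).all())
    return ok_low and ok_high


col = "susceptibility_score"

# Minimum and maximum scores copied from the zone summary above.
low_auto = pd.DataFrame({col: [6.7745, 8.0779]})
hotspot = pd.DataFrame({col: [10.5367, 13.4455]})
transition = pd.DataFrame({col: [8.8652, 9.5345]})

checks = {
    "low_auto below p15": within(low_auto, col, high=8.080),
    "hotspot above threshold": within(hotspot, col, low=10.528),
    "transition inside band": within(transition, col, low=8.865, high=9.539),
    "zones do not overlap": bool(
        low_auto[col].max() < transition[col].min()
        and transition[col].max() < hotspot[col].min()
    ),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```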

Step 7 - Build one compact visual preview#

Left:

full score field with the low-score zone overlaid.

Middle:

full score field with hotspot and transition points overlaid.

Right:

score histogram with the three threshold rules annotated.

fig, axes = plt.subplots(
    1,
    3,
    figsize=(15.0, 4.9),
    constrained_layout=True,
)

# Panel 1 - lower-tail zone.
ax = axes[0]
sc = ax.scatter(
    city_df["coord_x"],
    city_df["coord_y"],
    c=city_df["susceptibility_score"],
    s=14,
    cmap="viridis",
    alpha=0.86,
)
ax.scatter(
    low_auto["coord_x"],
    low_auto["coord_y"],
    s=32,
    facecolors="none",
    edgecolors="white",
    linewidths=1.2,
    label="low-score zone",
)
ax.set_title("Automatic lower-tail zone")
ax.set_xlabel("coord_x")
ax.set_ylabel("coord_y")
ax.legend(frameon=False, fontsize=8)
ax.grid(True, linestyle=":", alpha=0.35)
ax.set_aspect("equal", adjustable="box")

# Panel 2 - hotspot and transition zones.
ax = axes[1]
ax.scatter(
    city_df["coord_x"],
    city_df["coord_y"],
    c=city_df["susceptibility_score"],
    s=12,
    cmap="viridis",
    alpha=0.20,
)
ax.scatter(
    transition["coord_x"],
    transition["coord_y"],
    s=20,
    color="tab:orange",
    alpha=0.72,
    label="transition band",
)
ax.scatter(
    hotspot["coord_x"],
    hotspot["coord_y"],
    s=28,
    color="tab:red",
    alpha=0.88,
    label="hotspots",
)
ax.set_title("Intermediate band and hotspots")
ax.set_xlabel("coord_x")
ax.set_ylabel("coord_y")
ax.legend(frameon=False, fontsize=8)
ax.grid(True, linestyle=":", alpha=0.35)
ax.set_aspect("equal", adjustable="box")

# Panel 3 - score histogram and thresholds.
ax = axes[2]
ax.hist(
    city_df["susceptibility_score"],
    bins=24,
    alpha=0.75,
    edgecolor="white",
)
ax.axvline(
    auto_threshold,
    linestyle="--",
    linewidth=1.6,
    label=f"auto p{auto_percentile}",
)
ax.axvline(
    hotspot_threshold,
    linestyle="-.",
    linewidth=1.6,
    label="hotspot threshold",
)
ax.axvspan(
    transition_band[0],
    transition_band[1],
    alpha=0.20,
    label="transition band",
)
ax.set_title("Score distribution and thresholds")
ax.set_xlabel("susceptibility_score")
ax.set_ylabel("count")
ax.legend(frameon=False, fontsize=8)
ax.grid(True, axis="y", linestyle=":", alpha=0.35)

# Add one shared colorbar for the spatial panels.
cbar = fig.colorbar(sc, ax=axes[:2], fraction=0.035, pad=0.02)
cbar.set_label("susceptibility_score")

plt.show()
[Figure: three panels titled "Automatic lower-tail zone", "Intermediate band and hotspots", and "Score distribution and thresholds".]

How to read this output#

A useful reading order is:

  1. inspect the continuous score field,

  2. check which threshold rule was applied,

  3. compare the selected point counts,

  4. then compare the score ranges in the exported tables.

The key idea is that the builder is not estimating a new field. It is simply turning one existing continuous score into one reusable point table based on a threshold criterion.

Why this builder is useful in practice#

extract-zones is a strong support-layer builder when you need one threshold-defined subset rather than a full-city table.

Typical uses include:

  • extracting hotspot candidates,

  • isolating the lower-risk or lower-response tail,

  • creating intermediate bands for sensitivity checks,

  • or preparing compact point tables for later reports.

Command-line usage#

The examples below show the same workflow as direct terminal commands.

Explicit hotspot extraction#

geoprior-build extract-zones \
    zones_demo_west.csv zones_demo_east.csv \
    --x-col coord_x \
    --y-col coord_y \
    --z-col susceptibility_score \
    --threshold 10.528 \
    --condition above \
    --output zones_demo_hotspot.csv

The same command through the root dispatcher#

geoprior build extract-zones \
    zones_demo_west.csv zones_demo_east.csv \
    --x-col coord_x \
    --y-col coord_y \
    --z-col susceptibility_score \
    --threshold 10.528 \
    --condition above \
    --output zones_demo_hotspot.csv

Intermediate transition band#

geoprior-build extract-zones \
    zones_demo_west.csv zones_demo_east.csv \
    --x-col coord_x \
    --y-col coord_y \
    --z-col susceptibility_score \
    --threshold 8.865 9.539 \
    --condition between \
    --output zones_demo_transition.csv
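Hard-coded thresholds in shell commands drift easily when the score field changes. One way to keep them in sync is to assemble the command string from the Python values. The sketch below only builds and prints the text; it does not run anything. The flags are the ones shown in the commands above, the helper name `build_command` is hypothetical, and the threshold values are the ones printed in Step 4.

```python
import shlex


def build_command(inputs, threshold, condition, output):
    """Assemble an extract-zones command line from Python values."""
    # Accept either one number or a (low, high) pair.
    if isinstance(threshold, (tuple, list)):
        values = list(threshold)
    else:
        values = [threshold]
    argv = [
        "geoprior-build", "extract-zones", *inputs,
        "--x-col", "coord_x",
        "--y-col", "coord_y",
        "--z-col", "susceptibility_score",
        "--threshold", *[f"{v:.3f}" for v in values],
        "--condition", condition,
        "--output", output,
    ]
    # shlex.join quotes any argument that needs it.
    return shlex.join(argv)


inputs = ["zones_demo_west.csv", "zones_demo_east.csv"]
hotspot_cmd = build_command(
    inputs, 10.528, "above", "zones_demo_hotspot.csv"
)
band_cmd = build_command(
    inputs, (8.865, 9.539), "between", "zones_demo_transition.csv"
)

print(hotspot_cmd)
print(band_cmd)
```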

Lower-tail automatic extraction#

The underlying helper also supports percentile-driven automatic thresholds. If your local wrapper exposes the same options, a typical command looks like:

geoprior-build extract-zones \
    zones_demo_west.csv zones_demo_east.csv \
    --x-col coord_x \
    --y-col coord_y \
    --z-col susceptibility_score \
    --threshold auto \
    --percentile 15 \
    --condition auto \
    --output zones_demo_low_auto.csv

The shared reader accepts one or many input tables, and the same command family can also be used with CSV, TSV, Parquet, Excel, JSON, Feather, or Pickle inputs.
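A reader of this kind is often a small dispatch on the file suffix. The sketch below shows the idea with standard pandas readers. It is an assumption about the general pattern, not GeoPrior's actual reader: the `READERS` map, the suffix choices, and `read_many` are hypothetical, and some formats need optional dependencies that the demo does not exercise.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical suffix-to-reader map; the real shared reader may differ.
READERS = {
    ".csv": pd.read_csv,
    ".tsv": lambda p: pd.read_csv(p, sep="\t"),
    ".parquet": pd.read_parquet,
    ".xlsx": pd.read_excel,
    ".json": pd.read_json,
    ".feather": pd.read_feather,
    ".pkl": pd.read_pickle,
}


def read_many(paths):
    """Read each file with the reader for its suffix and stack the rows."""
    frames = []
    for p in map(Path, paths):
        reader = READERS.get(p.suffix.lower())
        if reader is None:
            raise ValueError(f"unsupported file type: {p.suffix}")
        frames.append(reader(p))
    return pd.concat(frames, ignore_index=True)


with tempfile.TemporaryDirectory() as tmp:
    west = Path(tmp) / "west.csv"
    east = Path(tmp) / "east.tsv"
    pd.DataFrame(
        {"coord_x": [1.0, 2.0], "score": [8.0, 9.5]}
    ).to_csv(west, index=False)
    pd.DataFrame(
        {"coord_x": [3.0, 4.0], "score": [7.0, 10.5]}
    ).to_csv(east, sep="\t", index=False)

    # Mixed formats are combined into one table.
    merged = read_many([west, east])

print(merged.shape)
```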

