.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/tables_and_summaries/build_spatial_sampling.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_tables_and_summaries_build_spatial_sampling.py:


Build a stratified spatial sample table
=======================================

This example teaches you how to use GeoPrior's
``spatial-sampling`` build utility.

Unlike figure-generation pages, this lesson starts from a tabular
workflow. The goal is to build a smaller table that still preserves:

- the broad spatial footprint,
- the balance across years,
- the balance across cities,
- and the balance across geological classes.

Why this matters
----------------
Large geospatial tables are often too heavy for quick prototyping,
debugging, teaching, or compact downstream examples.

A good spatial sample should be smaller, but it should not lose the
structure that makes the dataset useful.

That is exactly what ``spatial-sampling`` is for.

.. GENERATED FROM PYTHON SOURCE LINES 27-33

Imports
-------
We call the real production entrypoint from the project code.
For the synthetic dataset itself, we reuse the robust spatial support
helpers from ``geoprior.scripts.utils`` instead of generating ad hoc
random coordinates by hand.

.. GENERATED FROM PYTHON SOURCE LINES 33-53

.. code-block:: Python


    from __future__ import annotations

    import tempfile
    from pathlib import Path

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    from geoprior.cli.build_spatial_sampling import (
        build_spatial_sampling_main,
    )
    from geoprior.scripts.utils import (
        SpatialSupportSpec,
        make_spatial_field,
        make_spatial_scale,
        make_spatial_support,
    )

..
.. GENERATED FROM PYTHON SOURCE LINES 54-69

Build two synthetic city supports
---------------------------------
Each city support is created from ``SpatialSupportSpec``.
The parameters are described below so readers can see how the
spatial cloud is controlled.

Important ideas:

- ``center_x`` and ``center_y`` place the city support.
- ``span_x`` and ``span_y`` control the city size.
- ``nx`` and ``ny`` control the mesh density before masking.
- ``jitter_x`` and ``jitter_y`` add small coordinate irregularity.
- ``footprint`` chooses the synthetic city shape.
- ``keep_frac`` optionally thins the support after masking.
- ``seed`` keeps the support reproducible.

.. GENERATED FROM PYTHON SOURCE LINES 69-107

.. code-block:: Python


    ns_spec = SpatialSupportSpec(
        city="Nansha",
        center_x=113.52,
        center_y=22.74,
        span_x=0.17,
        span_y=0.11,
        nx=62,
        ny=48,
        jitter_x=0.0014,
        jitter_y=0.0012,
        footprint="nansha_like",
        keep_frac=0.84,
        seed=7,
    )

    zh_spec = SpatialSupportSpec(
        city="Zhongshan",
        center_x=113.38,
        center_y=22.53,
        span_x=0.19,
        span_y=0.13,
        nx=64,
        ny=50,
        jitter_x=0.0015,
        jitter_y=0.0012,
        footprint="zhongshan_like",
        keep_frac=0.82,
        seed=19,
    )

    ns_support = make_spatial_support(ns_spec)
    zh_support = make_spatial_support(zh_spec)

    print("Synthetic support sizes")
    print(f" - Nansha : {ns_support.sample_idx.size:,} points")
    print(f" - Zhongshan: {zh_support.sample_idx.size:,} points")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Synthetic support sizes
     - Nansha : 1,244 points
     - Zhongshan: 1,225 points

.. GENERATED FROM PYTHON SOURCE LINES 108-120

Convert the spatial supports into a panel-like dataset
------------------------------------------------------
``spatial-sampling`` works on ordinary tables, so we now turn the
supports into a richer DataFrame with:

- spatial coordinates,
- a time column,
- categorical stratification columns,
- and one continuous target-like field.

We keep the geometry synthetic but make the structure realistic enough
to show why stratified sampling matters.

.. GENERATED FROM PYTHON SOURCE LINES 120-198

.. code-block:: Python


    rng = np.random.default_rng(42)
    years = [2021, 2022, 2023, 2024]
    frames: list[pd.DataFrame] = []

    for support, amp0, drift_x, drift_y in [
        (ns_support, 7.4, 0.11, 0.08),
        (zh_support, 8.1, 0.07, 0.10),
    ]:
        base_mean = make_spatial_field(
            support,
            amplitude=amp0,
            drift_x=drift_x,
            drift_y=drift_y,
            phase=0.20,
            hotspot_weight=0.92,
            secondary_weight=0.56,
            ridge_weight=0.18,
            wave_weight=0.14,
            local_weight=0.05,
        )

        for year in years:
            step = year - years[0]
            scale = make_spatial_scale(
                support,
                base=0.35,
                x_weight=0.10,
                hotspot_weight=0.07,
                step_weight=0.03,
                step=step,
            )
            year_boost = 0.75 * step
            mean = base_mean + year_boost
            noise = rng.normal(0.0, scale * 0.55)

            frame = support.to_frame().rename(
                columns={
                    "coord_x": "longitude",
                    "coord_y": "latitude",
                }
            )
            frame["year"] = int(year)

            # A categorical class that varies across the city footprint.
            frame["lithology_class"] = np.where(
                frame["y_norm"] > 0.62,
                "Clay",
                np.where(frame["x_norm"] > 0.57, "Fill", "Sand"),
            )

            # A second categorical field can be useful later for other
            # builders, even though this lesson only stratifies on the
            # lithology class.
            frame["development_zone"] = np.where(
                frame["x_norm"] + frame["y_norm"] > 1.08,
                "Urban core",
                "Expansion belt",
            )

            frame["rainfall_mm"] = (
                1320
                + 85 * step
                + 45 * frame["y_norm"]
                + rng.normal(0.0, 14.0, len(frame))
            )
            frame["subsidence_mm"] = mean + noise
            frames.append(frame)

    full_df = pd.concat(frames, ignore_index=True)

    print("")
    print("Synthetic input table")
    print(full_df.head(8).to_string(index=False))

.. rst-class:: sphx-glr-script-out

..
.. code-block:: none

    Synthetic input table
    sample_idx  longitude  latitude  x_norm  y_norm    city  year  lithology_class  development_zone  rainfall_mm  subsidence_mm
             0   113.4994   22.6596  0.4389  0.1443  Nansha  2021             Sand    Expansion belt    1341.1237         0.0440
             1   113.5322   22.6589  0.5339  0.1412  Nansha  2021             Sand    Expansion belt    1327.3070        -0.4666
             2   113.4671   22.6647  0.3456  0.1670  Nansha  2021             Sand    Expansion belt    1321.7446         0.6510
             3   113.4837   22.6639  0.3936  0.1635  Nansha  2021             Sand    Expansion belt    1302.0500         0.4470
             4   113.4869   22.6646  0.4029  0.1664  Nansha  2021             Sand    Expansion belt    1325.0875        -0.2093
             5   113.4958   22.6624  0.4286  0.1566  Nansha  2021             Sand    Expansion belt    1305.2141        -0.2172
             6   113.5035   22.6645  0.4509  0.1660  Nansha  2021             Sand    Expansion belt    1341.0144         0.0566
             7   113.5207   22.6627  0.5004  0.1581  Nansha  2021             Sand    Expansion belt    1348.3502        -0.1689

.. GENERATED FROM PYTHON SOURCE LINES 199-203

Write one input file per city
-----------------------------
The shared build-reader utilities accept one or many input files, so
the lesson demonstrates the multi-file workflow directly.

.. GENERATED FROM PYTHON SOURCE LINES 203-225

.. code-block:: Python


    tmp_dir = Path(
        tempfile.mkdtemp(prefix="gp_sg_spatial_sampling_")
    )

    ns_csv = tmp_dir / "nansha_spatial_panel.csv"
    zh_csv = tmp_dir / "zhongshan_spatial_panel.csv"

    full_df.loc[full_df["city"] == "Nansha"].to_csv(
        ns_csv,
        index=False,
    )
    full_df.loc[full_df["city"] == "Zhongshan"].to_csv(
        zh_csv,
        index=False,
    )

    print("")
    print("Input files")
    print(" -", ns_csv.name)
    print(" -", zh_csv.name)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Input files
     - nansha_spatial_panel.csv
     - zhongshan_spatial_panel.csv

.. GENERATED FROM PYTHON SOURCE LINES 226-237

Run the real spatial-sampling builder
-------------------------------------
We ask the command to:

- read both city files,
- build spatial bins on longitude and latitude,
- preserve balance across city, year, and lithology class,
- and write a compact CSV sample.

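
The request listed above can be pictured with a small pandas sketch.
It is only a conceptual illustration and is not GeoPrior's
implementation: the helper name ``toy_stratified_spatial_sample`` and the
toy table are hypothetical, the bins are plain equal-width bins from
``pandas.cut``, and the builder's ``relative`` method and
``--min-relative-ratio`` option are not modelled here.

.. code-block:: Python

    import numpy as np
    import pandas as pd


    def toy_stratified_spatial_sample(
        df, frac, strata, spatial_cols, bins, seed=0
    ):
        """Hypothetical helper: sample per (stratum, spatial cell)."""
        work = df.copy()
        bin_cols = []
        for col, n_bins in zip(spatial_cols, bins):
            name = f"_{col}_bin"
            # Equal-width bins over the observed coordinate range.
            work[name] = pd.cut(work[col], bins=n_bins, labels=False)
            bin_cols.append(name)

        parts = []
        for _, group in work.groupby(list(strata) + bin_cols):
            # Same fraction from every group, but never fewer than
            # one row, so that small groups do not disappear.
            n_keep = max(1, round(frac * len(group)))
            parts.append(group.sample(n=n_keep, random_state=seed))
        return pd.concat(parts).drop(columns=bin_cols)


    # Toy table (assumed values, unrelated to the synthetic cities).
    rng = np.random.default_rng(0)
    n = 2000
    toy = pd.DataFrame(
        {
            "longitude": rng.uniform(113.3, 113.6, n),
            "latitude": rng.uniform(22.5, 22.8, n),
            "city": rng.choice(["A", "B"], n),
            "year": rng.choice([2021, 2022], n),
        }
    )

    sample = toy_stratified_spatial_sample(
        toy,
        frac=0.18,
        strata=["city", "year"],
        spatial_cols=["longitude", "latitude"],
        bins=[8, 7],
        seed=42,
    )
    print(len(toy), "->", len(sample))
    print(sample.groupby(["city", "year"]).size())

Because each group is rounded separately and keeps at least one row, the
overall sampled fraction in this sketch is only approximately ``frac``.
The point is the grouping key: stratification columns plus spatial cells.
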
We use ``relative`` mode here because it is especially helpful when
some stratification groups are smaller than others.

.. GENERATED FROM PYTHON SOURCE LINES 237-269

.. code-block:: Python


    out_csv = tmp_dir / "spatial_sampling_gallery.csv"

    build_spatial_sampling_main(
        [
            str(ns_csv),
            str(zh_csv),
            "--sample-size",
            "0.18",
            "--stratify-by",
            "city",
            "year",
            "lithology_class",
            "--spatial-cols",
            "longitude",
            "latitude",
            "--spatial-bins",
            "8",
            "7",
            "--method",
            "relative",
            "--min-relative-ratio",
            "0.03",
            "--random-state",
            "42",
            "--output",
            str(out_csv),
            "--verbose",
            "1",
        ]
    )

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Creating spat. bins: 2: 0%| | 0/2 [00:00`

.. container:: sphx-glr-download sphx-glr-download-python

    :download:`Download Python source code: build_spatial_sampling.py `

.. container:: sphx-glr-download sphx-glr-download-zip

    :download:`Download zipped: build_spatial_sampling.zip `

.. only:: html

    .. rst-class:: sphx-glr-signature

        `Gallery generated by Sphinx-Gallery `_