.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/tables_and_summaries/build_assign_boreholes.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_tables_and_summaries_build_assign_boreholes.py: Assign boreholes to the nearest city cloud ========================================== This example teaches how to use GeoPrior's ``assign-boreholes`` build utility. Unlike the earlier builders in ``tables_and_summaries``, this command is not about sampling, clustering, or extracting one subset from a single city table. Instead, it solves a very practical multi-city integration task: **given one borehole or external-validation table, decide which city point cloud each borehole belongs to.** Why this matters ---------------- Nearest-city assignment is useful when you need to: - separate one mixed validation file into per-city subsets, - route boreholes to the right Stage-1 processed footprint, - compare full-city validation metrics city by city, - inspect which boreholes sit close to the boundary between cities, - or prepare later command-line workflows that expect one file per city. That is exactly what ``assign-boreholes`` is for. What this lesson teaches ------------------------ We will: 1. build two realistic synthetic city clouds, 2. save them into Stage-1-like folders, 3. create one synthetic borehole validation table, 4. run the real ``assign-boreholes`` command, 5. inspect the combined and per-city CSV outputs, 6. visualize the nearest-city assignment, 7. finish with direct command-line examples. .. GENERATED FROM PYTHON SOURCE LINES 42-47 Imports ------- We call the real production CLI entrypoint from the project code. For the synthetic city geometry, we reuse the shared spatial-support helpers instead of drawing arbitrary random points. .. GENERATED FROM PYTHON SOURCE LINES 47-62 .. code-block:: Python from __future__ import annotations import tempfile from pathlib import Path import matplotlib.pyplot as plt import numpy as np import pandas as pd from geoprior.cli.build_assign_boreholes import ( build_assign_boreholes_main, ) from geoprior.scripts import utils as script_utils .. GENERATED FROM PYTHON SOURCE LINES 63-90 Step 1 - Build two synthetic city supports ------------------------------------------ The command ultimately assigns boreholes against one or more city *point clouds*. To make the lesson realistic, we generate two compact projected supports that behave like two different Stage-1 cities. The parameter comments are intentionally explicit so users can see how the synthetic geography is controlled. Key parameters -------------- city: Human-readable city label stored in the support. center_x, center_y: Center of the projected city footprint. span_x, span_y: Half-width and half-height of the support extent in metres. nx, ny: Mesh density before masking. jitter_x, jitter_y: Small perturbations so the support is not a perfect grid. footprint: Synthetic urban shape retained after masking. keep_frac: Fraction of masked support points to keep. seed: Keeps the support reproducible. .. GENERATED FROM PYTHON SOURCE LINES 90-125 .. code-block:: Python ns_support = script_utils.make_spatial_support( script_utils.SpatialSupportSpec( city="Nansha", center_x=12_500.0, center_y=8_200.0, span_x=3_400.0, span_y=2_500.0, nx=30, ny=22, jitter_x=36.0, jitter_y=32.0, footprint="nansha_like", keep_frac=0.90, seed=31, ) ) zh_support = script_utils.make_spatial_support( script_utils.SpatialSupportSpec( city="Zhongshan", center_x=24_800.0, center_y=9_400.0, span_x=3_800.0, span_y=2_800.0, nx=30, ny=22, jitter_x=40.0, jitter_y=34.0, footprint="zhongshan_like", keep_frac=0.90, seed=47, ) ) .. GENERATED FROM PYTHON SOURCE LINES 126-139 Step 2 - Create Stage-1-like processed city CSVs ------------------------------------------------ ``assign-boreholes`` can resolve city clouds from several input forms. In this lesson, we use repeated ``--stage1-dir`` arguments because that reflects a very natural workflow: - Stage-1 has already produced processed city tables, - we point the command at those folders, - the command finds the ``*_proc.csv`` file under each folder, - and it infers the city name from the file or folder name. The city cloud needs projected coordinates. By default the command expects ``x_m`` and ``y_m`` inside the processed city CSVs. .. GENERATED FROM PYTHON SOURCE LINES 139-204 .. code-block:: Python def make_city_proc_frame( support: script_utils.SpatialSupport, *, city_name: str, year: int, ) -> pd.DataFrame: frame = support.to_frame().copy() frame = frame.rename( columns={ "coord_x": "x_m", "coord_y": "y_m", } ) frame["year"] = int(year) frame["city"] = city_name frame["subsidence_mm"] = ( 4.0 + 2.5 * frame["x_norm"] + 1.8 * frame["y_norm"] ) frame["groundwater_depth_m"] = ( 8.0 + 0.6 * frame["x_norm"] + 0.4 * frame["y_norm"] ) return frame[[ "sample_idx", "city", "year", "x_m", "y_m", "subsidence_mm", "groundwater_depth_m", ]] tmp_dir = Path( tempfile.mkdtemp(prefix="gp_sg_assign_boreholes_") ) ns_stage1 = tmp_dir / "nansha_demo_stage1" zh_stage1 = tmp_dir / "zhongshan_demo_stage1" ns_stage1.mkdir(parents=True, exist_ok=True) zh_stage1.mkdir(parents=True, exist_ok=True) ns_proc = ns_stage1 / "nansha_proc.csv" zh_proc = zh_stage1 / "zhongshan_proc.csv" make_city_proc_frame( ns_support, city_name="nansha", year=2025, ).to_csv(ns_proc, index=False) make_city_proc_frame( zh_support, city_name="zhongshan", year=2025, ).to_csv(zh_proc, index=False) print("Stage-1 processed city files") print(" -", ns_proc) print(" -", zh_proc) .. rst-class:: sphx-glr-script-out .. code-block:: none Stage-1 processed city files - /tmp/gp_sg_assign_boreholes_lql3nsm7/nansha_demo_stage1/nansha_proc.csv - /tmp/gp_sg_assign_boreholes_lql3nsm7/zhongshan_demo_stage1/zhongshan_proc.csv .. GENERATED FROM PYTHON SOURCE LINES 205-221 Step 3 - Build one synthetic borehole table ------------------------------------------- The external borehole table uses a slightly different convention from the processed city cloud: - city points use ``x_m`` and ``y_m`` - boreholes use ``x`` and ``y`` by default That difference is worth making visible because the CLI has separate arguments for each coordinate schema. We create three borehole groups: - boreholes near Nansha, - boreholes near Zhongshan, - two boundary boreholes between both cities. .. GENERATED FROM PYTHON SOURCE LINES 221-316 .. code-block:: Python rng = np.random.default_rng(2026) def make_boreholes_near_support( support: script_utils.SpatialSupport, *, n: int, prefix: str, start_idx: int, x_jitter: float, y_jitter: float, pumping_low: float, pumping_high: float, ) -> pd.DataFrame: base = support.to_frame().sample( n=n, random_state=int(start_idx), ) out = pd.DataFrame( { "well_id": [ f"{prefix}{i:03d}" for i in range(start_idx, start_idx + n) ], "x": base["coord_x"].to_numpy(float) + rng.normal(0.0, x_jitter, n), "y": base["coord_y"].to_numpy(float) + rng.normal(0.0, y_jitter, n), "screen_depth_m": rng.uniform(35.0, 85.0, n), "pumping_m3_day": rng.uniform( pumping_low, pumping_high, n, ), "cluster_hint": prefix.rstrip("_").lower(), } ) return out ns_bh = make_boreholes_near_support( ns_support, n=7, prefix="NS_", start_idx=1, x_jitter=110.0, y_jitter=95.0, pumping_low=220.0, pumping_high=430.0, ) zh_bh = make_boreholes_near_support( zh_support, n=7, prefix="ZH_", start_idx=101, x_jitter=120.0, y_jitter=105.0, pumping_low=180.0, pumping_high=390.0, ) # Boundary boreholes placed between the two city supports. mid_x = 0.5 * ( float(ns_support.coord_x.max()) + float(zh_support.coord_x.min()) ) mid_y = 0.5 * ( float(ns_support.coord_y.mean()) + float(zh_support.coord_y.mean()) ) boundary_bh = pd.DataFrame( { "well_id": ["BD_001", "BD_002"], "x": [mid_x - 240.0, mid_x + 220.0], "y": [mid_y - 130.0, mid_y + 160.0], "screen_depth_m": [62.0, 68.0], "pumping_m3_day": [260.0, 275.0], "cluster_hint": ["boundary", "boundary"], } ) boreholes = pd.concat( [ns_bh, zh_bh, boundary_bh], ignore_index=True, ) borehole_csv = tmp_dir / "boreholes_validation.csv" boreholes.to_csv(borehole_csv, index=False) print("") print("Borehole input") print(boreholes.to_string(index=False)) .. rst-class:: sphx-glr-script-out .. code-block:: none Borehole input well_id x y screen_depth_m pumping_m3_day cluster_hint NS_001 14372.5689 7569.7904 60.7577 310.4915 ns NS_002 14252.2467 9522.0620 76.2948 359.2679 ns NS_003 10734.8601 7349.5597 57.4190 222.6965 ns NS_004 9696.4069 7885.9055 51.9406 314.0174 ns NS_005 14155.5003 9503.6353 48.8950 296.6880 ns NS_006 13750.0419 9017.3146 46.3167 261.0335 ns NS_007 11919.3457 7579.2496 61.2908 344.9218 ns ZH_101 21321.9623 9040.9550 46.3946 258.1503 zh ZH_102 23654.6433 7878.5219 59.6555 268.2406 zh ZH_103 24544.8808 10586.8949 64.0014 283.9142 zh ZH_104 23975.4820 7718.7054 44.4453 278.6939 zh ZH_105 24292.1819 9604.3266 71.5620 321.8842 zh ZH_106 26026.4843 9869.7395 62.4242 301.2073 zh ZH_107 22507.4152 8084.4834 66.0752 267.4168 zh BD_001 18215.5238 8534.2194 62.0000 260.0000 boundary BD_002 18675.5238 8824.2194 68.0000 275.0000 boundary .. GENERATED FROM PYTHON SOURCE LINES 317-325 Step 4 - Run the real ``assign-boreholes`` command -------------------------------------------------- We deliberately use the repeated ``--stage1-dir`` form here because it proves that the command can discover processed city CSVs from a Stage-1 layout without us passing explicit ``CITY=PATH`` mappings. We also set an explicit output directory and stem so the generated artifact family is easy to inspect. .. GENERATED FROM PYTHON SOURCE LINES 325-343 .. code-block:: Python out_dir = tmp_dir / "borehole_outputs" build_assign_boreholes_main( [ "--borehole-csv", str(borehole_csv), "--stage1-dir", str(ns_stage1), "--stage1-dir", str(zh_stage1), "--outdir", str(out_dir), "--output-stem", "borehole_demo", ] ) .. rst-class:: sphx-glr-script-out .. code-block:: none [OK] wrote: /tmp/gp_sg_assign_boreholes_lql3nsm7/borehole_outputs/borehole_demo_classified.csv [OK] wrote: /tmp/gp_sg_assign_boreholes_lql3nsm7/borehole_outputs/borehole_demo_nansha.csv (8 rows) [OK] wrote: /tmp/gp_sg_assign_boreholes_lql3nsm7/borehole_outputs/borehole_demo_zhongshan.csv (8 rows) well_id x y dist_to_nansha_m dist_to_zhongshan_m assigned_city NS_001 14372.5689 7569.7904 88.1023 7233.6257 nansha NS_002 14252.2467 9522.0620 36.6995 7238.4708 nansha NS_003 10734.8601 7349.5597 41.1068 10856.1826 nansha NS_004 9696.4069 7885.9055 168.0845 11822.8797 nansha NS_005 14155.5003 9503.6353 82.8432 7333.5721 nansha NS_006 13750.0419 9017.3146 32.6978 7719.6727 nansha NS_007 11919.3457 7579.2496 35.2621 9650.8179 nansha ZH_101 21321.9623 9040.9550 5926.2606 164.1728 zhongshan ZH_102 23654.6433 7878.5219 8214.8663 58.4016 zhongshan ZH_103 24544.8808 10586.8949 9292.5633 126.1985 zhongshan ZH_104 23975.4820 7718.7054 8540.2710 170.6737 zhongshan ZH_105 24292.1819 9604.3266 8926.2938 135.9719 zhongshan ZH_106 26026.4843 9869.7395 10675.2771 138.2030 zhongshan ZH_107 22507.4152 8084.4834 7066.0364 125.0597 zhongshan BD_001 18215.5238 8534.2194 2784.0184 3282.9678 nansha BD_002 18675.5238 8824.2194 3270.9630 2797.7926 zhongshan .. GENERATED FROM PYTHON SOURCE LINES 344-352 Step 5 - Read the generated outputs back in ------------------------------------------- The command writes: - one classified CSV with all boreholes, - one split CSV per assigned city. This is the key output family the lesson needs to make visible. .. GENERATED FROM PYTHON SOURCE LINES 352-371 .. code-block:: Python classified_csv = out_dir / "borehole_demo_classified.csv" ns_out = out_dir / "borehole_demo_nansha.csv" zh_out = out_dir / "borehole_demo_zhongshan.csv" classified = pd.read_csv(classified_csv) ns_part = pd.read_csv(ns_out) zh_part = pd.read_csv(zh_out) print("") print("Written files") print(" -", classified_csv.name) print(" -", ns_out.name) print(" -", zh_out.name) print("") print("Classified boreholes") print(classified.to_string(index=False)) .. rst-class:: sphx-glr-script-out .. code-block:: none Written files - borehole_demo_classified.csv - borehole_demo_nansha.csv - borehole_demo_zhongshan.csv Classified boreholes well_id x y screen_depth_m pumping_m3_day cluster_hint dist_to_nansha_m dist_to_zhongshan_m assigned_city NS_001 14372.5689 7569.7904 60.7577 310.4915 ns 88.1023 7233.6257 nansha NS_002 14252.2467 9522.0620 76.2948 359.2679 ns 36.6995 7238.4708 nansha NS_003 10734.8601 7349.5597 57.4190 222.6965 ns 41.1068 10856.1826 nansha NS_004 9696.4069 7885.9055 51.9406 314.0174 ns 168.0845 11822.8797 nansha NS_005 14155.5003 9503.6353 48.8950 296.6880 ns 82.8432 7333.5721 nansha NS_006 13750.0419 9017.3146 46.3167 261.0335 ns 32.6978 7719.6727 nansha NS_007 11919.3457 7579.2496 61.2908 344.9218 ns 35.2621 9650.8179 nansha ZH_101 21321.9623 9040.9550 46.3946 258.1503 zh 5926.2606 164.1728 zhongshan ZH_102 23654.6433 7878.5219 59.6555 268.2406 zh 8214.8663 58.4016 zhongshan ZH_103 24544.8808 10586.8949 64.0014 283.9142 zh 9292.5633 126.1985 zhongshan ZH_104 23975.4820 7718.7054 44.4453 278.6939 zh 8540.2710 170.6737 zhongshan ZH_105 24292.1819 9604.3266 71.5620 321.8842 zh 8926.2938 135.9719 zhongshan ZH_106 26026.4843 9869.7395 62.4242 301.2073 zh 10675.2771 138.2030 zhongshan ZH_107 22507.4152 8084.4834 66.0752 267.4168 zh 7066.0364 125.0597 zhongshan BD_001 18215.5238 8534.2194 62.0000 260.0000 boundary 2784.0184 3282.9678 nansha BD_002 18675.5238 8824.2194 68.0000 275.0000 boundary 3270.9630 2797.7926 zhongshan .. GENERATED FROM PYTHON SOURCE LINES 372-383 Step 6 - Build a compact assignment summary ------------------------------------------- The combined CSV stores one distance column per city cloud, for example ``dist_to_nansha_m`` and ``dist_to_zhongshan_m``. That means we can derive two useful diagnostics: - nearest distance, - assignment margin = second-best distance minus best distance. Large margins indicate clearer separation between candidate cities. .. GENERATED FROM PYTHON SOURCE LINES 383-424 .. code-block:: Python dist_cols = [ c for c in classified.columns if c.startswith("dist_to_") ] dist_mat = classified[dist_cols].to_numpy(dtype=float) order = np.argsort(dist_mat, axis=1) nearest = np.take_along_axis( dist_mat, order[:, [0]], axis=1, ).ravel() runner_up = np.take_along_axis( dist_mat, order[:, [1]], axis=1, ).ravel() classified["nearest_distance_m"] = nearest classified["runner_up_distance_m"] = runner_up classified["margin_m"] = ( classified["runner_up_distance_m"] - classified["nearest_distance_m"] ) summary = ( classified.groupby("assigned_city") .agg( n_boreholes=("well_id", "size"), mean_nearest_distance_m=( "nearest_distance_m", "mean", ), mean_margin_m=("margin_m", "mean"), ) .reset_index() ) print("") print("Assignment summary") print(summary.to_string(index=False)) .. rst-class:: sphx-glr-script-out .. code-block:: none Assignment summary assigned_city n_boreholes mean_nearest_distance_m mean_margin_m nansha 8 408.6018 7733.6718 zhongshan 8 464.5592 7274.5072 .. GENERATED FROM PYTHON SOURCE LINES 425-437 Step 7 - Build one compact visual preview ----------------------------------------- Left: full city clouds with boreholes colored by assigned city. Middle: assigned counts by city. Right: for each borehole, compare the nearest-city distance with the runner-up distance. The horizontal segment length is the assignment margin. .. GENERATED FROM PYTHON SOURCE LINES 437-563 .. code-block:: Python ns_cloud = pd.read_csv(ns_proc) zh_cloud = pd.read_csv(zh_proc) plot_df = classified.sort_values( "margin_m", ascending=True, ).reset_index(drop=True) pretty_city = { "nansha": "Nansha", "zhongshan": "Zhongshan", "tie": "Tie", } city_color = { "nansha": "tab:blue", "zhongshan": "tab:red", "tie": "tab:orange", } fig, axes = plt.subplots( 1, 3, figsize=(14.6, 4.9), constrained_layout=True, ) # Full map with assigned boreholes. ax = axes[0] ax.scatter( ns_cloud["x_m"], ns_cloud["y_m"], s=10, alpha=0.26, color="tab:blue", label="Nansha cloud", ) ax.scatter( zh_cloud["x_m"], zh_cloud["y_m"], s=10, alpha=0.26, color="tab:red", label="Zhongshan cloud", ) for city_key in ["nansha", "zhongshan", "tie"]: sub = classified.loc[ classified["assigned_city"] == city_key ] if sub.empty: continue ax.scatter( sub["x"], sub["y"], s=84, marker="^", edgecolor="black", linewidth=0.5, color=city_color[city_key], label=f"assigned to {pretty_city[city_key]}", ) for _, row in classified.iterrows(): ax.text( float(row["x"]) + 80.0, float(row["y"]) + 55.0, str(row["well_id"]), fontsize=7, ) ax.set_title("Nearest-city borehole assignment") ax.set_xlabel("projected x (m)") ax.set_ylabel("projected y (m)") ax.grid(True, linestyle=":", alpha=0.35) ax.set_aspect("equal", adjustable="box") ax.legend(frameon=False, fontsize=8, loc="upper left") # Assigned counts. ax = axes[1] bar_x = np.arange(len(summary)) counts = summary["n_boreholes"].to_numpy(int) labels = [pretty_city.get(v, v) for v in summary["assigned_city"]] bar_colors = [city_color.get(v, "0.5") for v in summary["assigned_city"]] ax.bar(bar_x, counts, color=bar_colors) ax.set_xticks(bar_x) ax.set_xticklabels(labels) ax.set_ylabel("count") ax.set_title("Boreholes per assigned city") ax.grid(True, axis="y", linestyle=":", alpha=0.35) for i, n in enumerate(counts): ax.text(i, n + 0.08, str(n), ha="center", va="bottom") # Nearest vs runner-up distance. ax = axes[2] y_pos = np.arange(len(plot_df)) ax.hlines( y=y_pos, xmin=plot_df["nearest_distance_m"], xmax=plot_df["runner_up_distance_m"], color="0.45", linewidth=1.6, ) ax.scatter( plot_df["nearest_distance_m"], y_pos, s=40, color="tab:green", label="nearest city", ) ax.scatter( plot_df["runner_up_distance_m"], y_pos, s=40, color="0.3", label="runner-up city", ) ax.set_yticks(y_pos) ax.set_yticklabels(plot_df["well_id"].tolist()) ax.set_xlabel("distance (m)") ax.set_title("Assignment separation per borehole") ax.grid(True, axis="x", linestyle=":", alpha=0.35) ax.legend(frameon=False, fontsize=8, loc="lower right") plt.show() .. image-sg:: /auto_examples/tables_and_summaries/images/sphx_glr_build_assign_boreholes_001.png :alt: Nearest-city borehole assignment, Boreholes per assigned city, Assignment separation per borehole :srcset: /auto_examples/tables_and_summaries/images/sphx_glr_build_assign_boreholes_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 564-576 How to read this output ----------------------- A practical reading order is: 1. inspect the full classified CSV, 2. check the per-city split files, 3. compare the nearest distances, 4. look at the assignment margin. In this lesson, most boreholes sit clearly closer to one city cloud, while the two boundary boreholes show smaller margins because both candidate cities are more competitive there. .. GENERATED FROM PYTHON SOURCE LINES 578-596 Why this builder is useful in practice -------------------------------------- ``assign-boreholes`` is a support-layer command that becomes very useful once a project starts mixing: - one external validation table, - several city-specific Stage-1 outputs, - and later city-specific evaluation workflows. It keeps the logic simple: - load one borehole table, - resolve one or more city clouds, - compute nearest-city distances, - export one combined CSV and optional split CSVs. That is often the cleanest way to prepare borehole validation files before later metrics or plotting commands. .. GENERATED FROM PYTHON SOURCE LINES 598-655 Command-line usage ------------------ The command is available through both public CLI styles: 1. the family-specific entrypoint, 2. the root ``geoprior`` dispatcher. The exact pattern used in this lesson is: .. code-block:: bash geoprior-build assign-boreholes \ --borehole-csv boreholes_validation.csv \ --stage1-dir results/nansha_demo_stage1 \ --stage1-dir results/zhongshan_demo_stage1 \ --outdir results/borehole_assignment \ --output-stem borehole_demo or equivalently: .. code-block:: bash geoprior build assign-boreholes \ --borehole-csv boreholes_validation.csv \ --stage1-dir results/nansha_demo_stage1 \ --stage1-dir results/zhongshan_demo_stage1 \ --outdir results/borehole_assignment \ --output-stem borehole_demo You can also provide city clouds explicitly: .. code-block:: bash geoprior-build assign-boreholes \ --borehole-csv boreholes_validation.csv \ --city-csv nansha=results/nansha_demo_stage1/nansha_proc.csv \ --city-csv zhongshan=results/zhongshan_demo_stage1/zhongshan_proc.csv Or resolve them from a results layout: .. code-block:: bash geoprior-build assign-boreholes \ --borehole-csv boreholes_validation.csv \ --results-dir results \ --cities nansha zhongshan Useful optional arguments include: - ``--borehole-x-col`` and ``--borehole-y-col`` when the borehole table uses different coordinate names, - ``--city-x-col`` and ``--city-y-col`` when the processed city CSV uses different coordinate names, - ``--classified-out`` for one explicit combined CSV path, - ``--no-split-files`` when you only want the combined output, - ``--tie-label`` and ``--tie-tol`` when near-equal distances should be treated as ties. .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.505 seconds) .. _sphx_glr_download_auto_examples_tables_and_summaries_build_assign_boreholes.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: build_assign_boreholes.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: build_assign_boreholes.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: build_assign_boreholes.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_