.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/figure_generation/plot_litho_parity.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_figure_generation_plot_litho_parity.py: Lithology parity: comparing the geological composition of the two cities =========================================================================== This example teaches you how to read the GeoPrior lithology parity figure. Before comparing forecasts, uncertainty, or physics fields between two cities, it is useful to ask a simpler question: **Are the two cities compositionally similar in their lithology mix, or are they built from meaningfully different geological proportions?** That is what the lithology-parity figure is for. What the figure shows --------------------- The real plotting backend builds a two-panel figure. Left panel ~~~~~~~~~~ Normalized composition bars for Nansha and Zhongshan across the selected lithology classes. Right panel ~~~~~~~~~~~ Difference bars showing: .. math:: \Delta p = p_{Zhongshan} - p_{Nansha} for each class. The title line also reports a simple association test: - :math:`\chi^2` - degrees of freedom - approximate p-value - Cramér's :math:`V` Why this matters ---------------- A city-to-city model comparison can be misleading if the two cities differ strongly in their underlying material composition. This figure helps the reader see: - which classes dominate each city, - whether the differences are broad or localized, - whether the overall composition gap is small or large, - and whether "other" categories should be grouped for a cleaner comparison. This gallery page creates two compact synthetic city tables so the example is fully executable during the documentation build. .. GENERATED FROM PYTHON SOURCE LINES 61-69 Imports ------- We use the real helper functions from the existing script: - compute_proportions(...) - draw_lithology_parity(...) That keeps the gallery page aligned with the real figure logic. .. GENERATED FROM PYTHON SOURCE LINES 69-86 .. code-block:: Python from __future__ import annotations import tempfile from pathlib import Path import matplotlib.image as mpimg import matplotlib.pyplot as plt import numpy as np import pandas as pd from geoprior.scripts.plot_litho_parity import ( chisq_cramers_v, compute_proportions, draw_lithology_parity, ) .. GENERATED FROM PYTHON SOURCE LINES 87-95 Step 1 - Build two synthetic city datasets ------------------------------------------ The real script reads one CSV per city and then counts the selected categorical column, which defaults to ``lithology_class``. For a teaching page, we create two small synthetic datasets with overlapping but not identical lithology compositions. .. GENERATED FROM PYTHON SOURCE LINES 95-158 .. code-block:: Python rng = np.random.default_rng(8) classes = [ "Clay", "Silt", "Fine sand", "Medium sand", "Coarse sand", "Gravel", "Weathered rock", "Bedrock", "Peat", "Fill", ] # Nansha: more soft sediments and fill. p_ns = np.array( [0.19, 0.16, 0.15, 0.10, 0.07, 0.05, 0.09, 0.06, 0.05, 0.08] ) # Zhongshan: slightly more coarse and rocky material. p_zh = np.array( [0.12, 0.12, 0.13, 0.12, 0.10, 0.09, 0.12, 0.11, 0.03, 0.06] ) n_ns = 2600 n_zh = 2400 ns = pd.DataFrame( { "year": rng.choice( [2021, 2022, 2023], size=n_ns, p=[0.25, 0.40, 0.35], ), "lithology_class": rng.choice( classes, size=n_ns, p=p_ns, ), } ) zh = pd.DataFrame( { "year": rng.choice( [2021, 2022, 2023], size=n_zh, p=[0.22, 0.43, 0.35], ), "lithology_class": rng.choice( classes, size=n_zh, p=p_zh, ), } ) print("Synthetic dataset sizes") print(f" Nansha: {len(ns)}") print(f" Zhongshan: {len(zh)}") .. rst-class:: sphx-glr-script-out .. code-block:: none Synthetic dataset sizes Nansha: 2600 Zhongshan: 2400 .. GENERATED FROM PYTHON SOURCE LINES 159-170 Step 2 - Select the comparison setup ------------------------------------ The real script supports: - a target categorical column, - optional year filtering, - top_n class selection, - optional grouping of the rest into "Others". Here we keep all years together and ask for the 8 most common classes, with the remainder grouped as ``Others``. .. GENERATED FROM PYTHON SOURCE LINES 170-187 .. code-block:: Python col = "lithology_class" top_n = 8 group_others = True dfp, core_classes, counts_mat = compute_proportions( ns, zh, col=col, top_n=top_n, group_others=group_others, ) print("") print("Proportion table preview") print(dfp.head(10).to_string(index=False)) .. rst-class:: sphx-glr-script-out .. code-block:: none Proportion table preview class city proportion Coarse sand Nansha 0.0627 Coarse sand Zhongshan 0.0992 Gravel Nansha 0.0523 Gravel Zhongshan 0.1025 Bedrock Nansha 0.0600 Bedrock Zhongshan 0.1050 Weathered rock Nansha 0.1004 Weathered rock Zhongshan 0.1092 Medium sand Nansha 0.0954 Medium sand Zhongshan 0.1313 .. GENERATED FROM PYTHON SOURCE LINES 188-193 Step 3 - Read the association statistics directly ------------------------------------------------- The real figure title includes the results of the 2×K composition test. We compute them here before plotting so the reader sees what those summary numbers mean. .. GENERATED FROM PYTHON SOURCE LINES 193-203 .. code-block:: Python chi2, pval, dof, cv = chisq_cramers_v(counts_mat) print("") print("Parity statistics") print(f" chi2 = {chi2:.4f}") print(f" dof = {dof}") print(f" p-value = {pval:.4f}") print(f" Cramer'sV = {cv:.4f}") .. rst-class:: sphx-glr-script-out .. code-block:: none Parity statistics chi2 = 181.9989 dof = 8 p-value = 0.0000 Cramer'sV = 0.1908 .. GENERATED FROM PYTHON SOURCE LINES 204-216 Step 4 - Render the real figure ------------------------------- The real plotting helper only needs: - the proportion table, - the 2×K count matrix, - the column name, - the output path, - and whether the two panels should share the y axis. It does not accept the generic plot text controls used by some of the other scripts, so we keep this call minimal and exact. .. GENERATED FROM PYTHON SOURCE LINES 216-230 .. code-block:: Python tmp_dir = Path( tempfile.mkdtemp(prefix="gp_sg_litho_par_") ) out_base = tmp_dir / "lithology_parity_gallery" draw_lithology_parity( dfp, counts_mat, col=col, outpath=out_base, sharey=True, ) .. rst-class:: sphx-glr-script-out .. code-block:: none [OK] Wrote /tmp/gp_sg_litho_par_elrak10n/lithology_parity_gallery.png/.svg .. GENERATED FROM PYTHON SOURCE LINES 231-235 Step 5 - Show the actual PNG produced by the backend ---------------------------------------------------- The real helper writes PNG and SVG. For the gallery page, we display the PNG result directly. .. GENERATED FROM PYTHON SOURCE LINES 235-242 .. code-block:: Python img = mpimg.imread(str(out_base) + ".png") fig, ax = plt.subplots(figsize=(8.2, 4.6)) ax.imshow(img) ax.axis("off") .. image-sg:: /auto_examples/figure_generation/images/sphx_glr_plot_litho_parity_001.png :alt: plot litho parity :srcset: /auto_examples/figure_generation/images/sphx_glr_plot_litho_parity_001.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none (np.float64(-0.5), np.float64(4000.5), np.float64(2454.5), np.float64(-0.5)) .. GENERATED FROM PYTHON SOURCE LINES 243-251 Step 6 - Quantify the largest class differences ----------------------------------------------- The right-hand panel shows: proportion(Zhongshan) - proportion(Nansha) Let us compute those differences explicitly so the figure and the numbers tell the same story. .. GENERATED FROM PYTHON SOURCE LINES 251-286 .. code-block:: Python classes_order = list(dfp["class"].cat.categories) ns_props = [] zh_props = [] for cls in classes_order: m_ns = (dfp["class"] == cls) & (dfp["city"] == "Nansha") m_zh = (dfp["class"] == cls) & (dfp["city"] == "Zhongshan") ns_props.append( float(dfp.loc[m_ns, "proportion"].iloc[0]) if m_ns.any() else 0.0 ) zh_props.append( float(dfp.loc[m_zh, "proportion"].iloc[0]) if m_zh.any() else 0.0 ) diff_df = pd.DataFrame( { "class": classes_order, "delta_prop": np.asarray(zh_props) - np.asarray(ns_props), } ).sort_values("delta_prop", ascending=False) print("") print("Largest positive differences (Zhongshan - Nansha)") print(diff_df.head(5).to_string(index=False)) print("") print("Largest negative differences (Zhongshan - Nansha)") print(diff_df.tail(5).to_string(index=False)) .. rst-class:: sphx-glr-script-out .. code-block:: none Largest positive differences (Zhongshan - Nansha) class delta_prop Gravel 0.0502 Bedrock 0.0450 Coarse sand 0.0365 Medium sand 0.0359 Weathered rock 0.0088 Largest negative differences (Zhongshan - Nansha) class delta_prop Weathered rock 0.0088 Fine sand -0.0324 Others -0.0344 Silt -0.0521 Clay -0.0574 .. GENERATED FROM PYTHON SOURCE LINES 287-305 Step 7 - Learn how to read the left panel ----------------------------------------- The left panel is the composition panel. It shows the normalized proportions for the two cities, not raw counts. That matters because the two datasets do not need to have exactly the same number of rows. A useful reading order is: 1. identify the largest bars in each city, 2. see which classes are dominant in both cities, 3. check whether the same top classes appear in roughly the same order. If the two composition profiles are broadly similar, then later differences in model behavior are less likely to be explained only by a gross mismatch in lithology mix. .. GENERATED FROM PYTHON SOURCE LINES 307-326 Step 8 - Learn how to read the right panel ------------------------------------------ The right panel is the parity-difference panel. Positive bars mean: Zhongshan has a larger proportion than Nansha Negative bars mean: Nansha has a larger proportion than Zhongshan This panel is often easier to interpret than the left one when the reader wants to know: "Which lithology classes are really driving the city-to-city difference?" It turns a two-profile comparison into a direct signed contrast. .. GENERATED FROM PYTHON SOURCE LINES 328-346 Step 9 - Learn how to read the title statistics ----------------------------------------------- The figure title includes chi-square style association information: - chi2 - dof - p - Cramer's V The most intuitive one here is often Cramer's V. It is an effect-size style measure for the overall difference in composition. A larger value means the two cities differ more strongly in their categorical makeup. So the title tells you not only *where* the bars differ, but also whether the overall composition gap is weak or strong. .. GENERATED FROM PYTHON SOURCE LINES 348-363 Step 10 - Practical takeaway ---------------------------- This figure is useful because it gives a fast geological context page before deeper model comparisons. It helps answer: - are the two cities compositionally similar? - which classes differ the most? - is the overall difference mild or substantial? In practice, this makes the later forecasting and physics figures easier to interpret, because the reader has already seen whether the two cities start from similar or different lithological foundations. .. GENERATED FROM PYTHON SOURCE LINES 365-419 Command-line version -------------------- The same figure can be produced from the command line. The real script supports: - ``--src`` for the dataset directory, - ``--ns-file`` and ``--zh-file`` for the two CSV files, - city flags (which must resolve to both Nansha and Zhongshan), - ``--col`` for the categorical column, - ``--year`` for optional filtering, - ``--sample-frac`` or ``--sample-n``, - ``--top-n``, - ``--group-others``, - ``--sharey``, - and ``--out``. Legacy dispatcher: .. code-block:: bash python -m scripts plot-litho-parity \ --src data/final_dataset \ -ns -zh \ --col lithology_class \ --year all \ --top-n 8 \ --group-others true \ --sharey true \ --out figS1_lithology_parity Year-specific comparison: .. code-block:: bash python -m scripts plot-litho-parity \ --src data/final_dataset \ -ns -zh \ --col lithology_class \ --year 2022 \ --sample-n 5000 \ --out figS1_lithology_parity_2022 Modern CLI: .. code-block:: bash geoprior plot litho-parity \ --src data/final_dataset \ -ns -zh \ --col lithology_class \ --out figS1_lithology_parity The gallery page teaches the figure. The command line reproduces it in a workflow. .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 1.846 seconds) .. _sphx_glr_download_auto_examples_figure_generation_plot_litho_parity.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_litho_parity.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_litho_parity.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_litho_parity.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_