.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/figure_generation/plot_litho_parity.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_figure_generation_plot_litho_parity.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_figure_generation_plot_litho_parity.py:


Lithology parity: comparing the geological composition of the two cities
===========================================================================

This example teaches you how to read the GeoPrior lithology
parity figure.

Before comparing forecasts, uncertainty, or physics fields
between two cities, it is useful to ask a simpler question:

**Are the two cities compositionally similar in their lithology
mix, or are they built from meaningfully different geological
proportions?**

That is what the lithology-parity figure is for.

What the figure shows
---------------------
The real plotting backend builds a two-panel figure.

Left panel
~~~~~~~~~~
Normalized composition bars for Nansha and Zhongshan across the
selected lithology classes.

Right panel
~~~~~~~~~~~
Difference bars showing:

.. math::

   \Delta p = p_{Zhongshan} - p_{Nansha}

for each class.

The title line also reports a simple association test:

- :math:`\chi^2`
- degrees of freedom
- approximate p-value
- Cramér's :math:`V`

Why this matters
----------------
A city-to-city model comparison can be misleading if the two
cities differ strongly in their underlying material composition.

This figure helps the reader see:

- which classes dominate each city,
- whether the differences are broad or localized,
- whether the overall composition gap is small or large,
- and whether "other" categories should be grouped for a cleaner
  comparison.

This gallery page creates two compact synthetic city tables so
the example is fully executable during the documentation build.

.. GENERATED FROM PYTHON SOURCE LINES 61-69

Imports
-------
We use the real helper functions from the existing script:

- compute_proportions(...)
- draw_lithology_parity(...)

That keeps the gallery page aligned with the real figure logic.

.. GENERATED FROM PYTHON SOURCE LINES 69-86

.. code-block:: Python


    from __future__ import annotations

    import tempfile
    from pathlib import Path

    import matplotlib.image as mpimg
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    from geoprior.scripts.plot_litho_parity import (
        chisq_cramers_v,
        compute_proportions,
        draw_lithology_parity,
    )


.. GENERATED FROM PYTHON SOURCE LINES 87-95

Step 1 - Build two synthetic city datasets
------------------------------------------
The real script reads one CSV per city and then counts the
selected categorical column, which defaults to
``lithology_class``.

For a teaching page, we create two small synthetic datasets with
overlapping but not identical lithology compositions.

.. GENERATED FROM PYTHON SOURCE LINES 95-158

.. code-block:: Python


    rng = np.random.default_rng(8)

    classes = [
        "Clay",
        "Silt",
        "Fine sand",
        "Medium sand",
        "Coarse sand",
        "Gravel",
        "Weathered rock",
        "Bedrock",
        "Peat",
        "Fill",
    ]

    # Nansha: more soft sediments and fill.
    p_ns = np.array(
        [0.19, 0.16, 0.15, 0.10, 0.07, 0.05, 0.09, 0.06, 0.05, 0.08]
    )

    # Zhongshan: slightly more coarse and rocky material.
    p_zh = np.array(
        [0.12, 0.12, 0.13, 0.12, 0.10, 0.09, 0.12, 0.11, 0.03, 0.06]
    )

    n_ns = 2600
    n_zh = 2400

    ns = pd.DataFrame(
        {
            "year": rng.choice(
                [2021, 2022, 2023],
                size=n_ns,
                p=[0.25, 0.40, 0.35],
            ),
            "lithology_class": rng.choice(
                classes,
                size=n_ns,
                p=p_ns,
            ),
        }
    )

    zh = pd.DataFrame(
        {
            "year": rng.choice(
                [2021, 2022, 2023],
                size=n_zh,
                p=[0.22, 0.43, 0.35],
            ),
            "lithology_class": rng.choice(
                classes,
                size=n_zh,
                p=p_zh,
            ),
        }
    )

    print("Synthetic dataset sizes")
    print(f"  Nansha:    {len(ns)}")
    print(f"  Zhongshan: {len(zh)}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Synthetic dataset sizes
      Nansha:    2600
      Zhongshan: 2400


.. GENERATED FROM PYTHON SOURCE LINES 159-170

Step 2 - Select the comparison setup
------------------------------------
The real script supports:

- a target categorical column,
- optional year filtering,
- top_n class selection,
- optional grouping of the rest into "Others".

Here we keep all years together and ask for the 8 most common
classes, with the remainder grouped as ``Others``.

.. GENERATED FROM PYTHON SOURCE LINES 170-187

.. code-block:: Python


    col = "lithology_class"
    top_n = 8
    group_others = True

    dfp, core_classes, counts_mat = compute_proportions(
        ns,
        zh,
        col=col,
        top_n=top_n,
        group_others=group_others,
    )

    print("")
    print("Proportion table preview")
    print(dfp.head(10).to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Proportion table preview
             class      city  proportion
       Coarse sand    Nansha      0.0627
       Coarse sand Zhongshan      0.0992
            Gravel    Nansha      0.0523
            Gravel Zhongshan      0.1025
           Bedrock    Nansha      0.0600
           Bedrock Zhongshan      0.1050
    Weathered rock    Nansha      0.1004
    Weathered rock Zhongshan      0.1092
       Medium sand    Nansha      0.0954
       Medium sand Zhongshan      0.1313


.. GENERATED FROM PYTHON SOURCE LINES 188-193

Step 3 - Read the association statistics directly
-------------------------------------------------
The real figure title includes the results of the 2×K
composition test. We compute them here before plotting so the
reader sees what those summary numbers mean.

.. GENERATED FROM PYTHON SOURCE LINES 193-203

.. code-block:: Python


    chi2, pval, dof, cv = chisq_cramers_v(counts_mat)

    print("")
    print("Parity statistics")
    print(f"  chi2      = {chi2:.4f}")
    print(f"  dof       = {dof}")
    print(f"  p-value   = {pval:.4f}")
    print(f"  Cramer'sV = {cv:.4f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Parity statistics
      chi2      = 181.9989
      dof       = 8
      p-value   = 0.0000
      Cramer'sV = 0.1908


.. GENERATED FROM PYTHON SOURCE LINES 204-216

Step 4 - Render the real figure
-------------------------------
The real plotting helper only needs:

- the proportion table,
- the 2×K count matrix,
- the column name,
- the output path,
- and whether the two panels should share the y axis.

It does not accept the generic plot text controls used by some
of the other scripts, so we keep this call minimal and exact.

.. GENERATED FROM PYTHON SOURCE LINES 216-230

.. code-block:: Python


    tmp_dir = Path(
        tempfile.mkdtemp(prefix="gp_sg_litho_par_")
    )
    out_base = tmp_dir / "lithology_parity_gallery"

    draw_lithology_parity(
        dfp,
        counts_mat,
        col=col,
        outpath=out_base,
        sharey=True,
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [OK] Wrote /tmp/gp_sg_litho_par_elrak10n/lithology_parity_gallery.png/.svg


.. GENERATED FROM PYTHON SOURCE LINES 231-235

Step 5 - Show the actual PNG produced by the backend
----------------------------------------------------
The real helper writes PNG and SVG. For the gallery page, we
display the PNG result directly.

.. GENERATED FROM PYTHON SOURCE LINES 235-242

.. code-block:: Python


    img = mpimg.imread(str(out_base) + ".png")

    fig, ax = plt.subplots(figsize=(8.2, 4.6))
    ax.imshow(img)
    ax.axis("off")


.. image-sg:: /auto_examples/figure_generation/images/sphx_glr_plot_litho_parity_001.png
   :alt: plot litho parity
   :srcset: /auto_examples/figure_generation/images/sphx_glr_plot_litho_parity_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    (np.float64(-0.5), np.float64(4000.5), np.float64(2454.5), np.float64(-0.5))


.. GENERATED FROM PYTHON SOURCE LINES 243-251

Step 6 - Quantify the largest class differences
-----------------------------------------------
The right-hand panel shows:

  proportion(Zhongshan) - proportion(Nansha)

Let us compute those differences explicitly so the figure and
the numbers tell the same story.

.. GENERATED FROM PYTHON SOURCE LINES 251-286

.. code-block:: Python


    classes_order = list(dfp["class"].cat.categories)

    ns_props = []
    zh_props = []

    for cls in classes_order:
        m_ns = (dfp["class"] == cls) & (dfp["city"] == "Nansha")
        m_zh = (dfp["class"] == cls) & (dfp["city"] == "Zhongshan")
        ns_props.append(
            float(dfp.loc[m_ns, "proportion"].iloc[0])
            if m_ns.any()
            else 0.0
        )
        zh_props.append(
            float(dfp.loc[m_zh, "proportion"].iloc[0])
            if m_zh.any()
            else 0.0
        )

    diff_df = pd.DataFrame(
        {
            "class": classes_order,
            "delta_prop": np.asarray(zh_props) - np.asarray(ns_props),
        }
    ).sort_values("delta_prop", ascending=False)

    print("")
    print("Largest positive differences (Zhongshan - Nansha)")
    print(diff_df.head(5).to_string(index=False))

    print("")
    print("Largest negative differences (Zhongshan - Nansha)")
    print(diff_df.tail(5).to_string(index=False))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Largest positive differences (Zhongshan - Nansha)
             class  delta_prop
            Gravel      0.0502
           Bedrock      0.0450
       Coarse sand      0.0365
       Medium sand      0.0359
    Weathered rock      0.0088

    Largest negative differences (Zhongshan - Nansha)
             class  delta_prop
    Weathered rock      0.0088
         Fine sand     -0.0324
            Others     -0.0344
              Silt     -0.0521
              Clay     -0.0574


.. GENERATED FROM PYTHON SOURCE LINES 287-305

Step 7 - Learn how to read the left panel
-----------------------------------------
The left panel is the composition panel.

It shows the normalized proportions for the two cities, not raw
counts. That matters because the two datasets do not need to
have exactly the same number of rows.

A useful reading order is:

1. identify the largest bars in each city,
2. see which classes are dominant in both cities,
3. check whether the same top classes appear in roughly the
   same order.

If the two composition profiles are broadly similar, then later
differences in model behavior are less likely to be explained
only by a gross mismatch in lithology mix.

.. GENERATED FROM PYTHON SOURCE LINES 307-326

Step 8 - Learn how to read the right panel
------------------------------------------
The right panel is the parity-difference panel.

Positive bars mean:

  Zhongshan has a larger proportion than Nansha

Negative bars mean:

  Nansha has a larger proportion than Zhongshan

This panel is often easier to interpret than the left one when
the reader wants to know:

"Which lithology classes are really driving the city-to-city
difference?"

It turns a two-profile comparison into a direct signed contrast.

.. GENERATED FROM PYTHON SOURCE LINES 328-346

Step 9 - Learn how to read the title statistics
-----------------------------------------------
The figure title includes chi-square style association
information:

- chi2
- dof
- p
- Cramer's V

The most intuitive one here is often Cramer's V.

It is an effect-size style measure for the overall difference
in composition. A larger value means the two cities differ more
strongly in their categorical makeup.

So the title tells you not only *where* the bars differ, but
also whether the overall composition gap is weak or strong.

.. GENERATED FROM PYTHON SOURCE LINES 348-363

Step 10 - Practical takeaway
----------------------------
This figure is useful because it gives a fast geological
context page before deeper model comparisons.

It helps answer:

- are the two cities compositionally similar?
- which classes differ the most?
- is the overall difference mild or substantial?

In practice, this makes the later forecasting and physics
figures easier to interpret, because the reader has already
seen whether the two cities start from similar or different
lithological foundations.

.. GENERATED FROM PYTHON SOURCE LINES 365-419

Command-line version
--------------------
The same figure can be produced from the command line.

The real script supports:

- ``--src`` for the dataset directory,
- ``--ns-file`` and ``--zh-file`` for the two CSV files,
- city flags (which must resolve to both Nansha and Zhongshan),
- ``--col`` for the categorical column,
- ``--year`` for optional filtering,
- ``--sample-frac`` or ``--sample-n``,
- ``--top-n``,
- ``--group-others``,
- ``--sharey``,
- and ``--out``. 

Legacy dispatcher:

.. code-block:: bash

   python -m scripts plot-litho-parity \
     --src data/final_dataset \
     -ns -zh \
     --col lithology_class \
     --year all \
     --top-n 8 \
     --group-others true \
     --sharey true \
     --out figS1_lithology_parity

Year-specific comparison:

.. code-block:: bash

   python -m scripts plot-litho-parity \
     --src data/final_dataset \
     -ns -zh \
     --col lithology_class \
     --year 2022 \
     --sample-n 5000 \
     --out figS1_lithology_parity_2022

Modern CLI:

.. code-block:: bash

   geoprior plot litho-parity \
     --src data/final_dataset \
     -ns -zh \
     --col lithology_class \
     --out figS1_lithology_parity

The gallery page teaches the figure.
The command line reproduces it in a workflow.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.846 seconds)


.. _sphx_glr_download_auto_examples_figure_generation_plot_litho_parity.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_litho_parity.ipynb <plot_litho_parity.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_litho_parity.py <plot_litho_parity.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_litho_parity.zip <plot_litho_parity.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_