geoprior.utils.generic_utils#
Provides common helper functions for validation, comparison, and other generic operations.
Functions
|
Apply either a prefix or suffix (with optional version, date, and custom label) to the string form of value. |
|
Check if all evaluable input values are within specified numeric bounds. |
|
Convert model I/O structures (dict/list/tuple/tensor-like) into a tuple. |
|
Casts a hyperparameter value in the params dictionary to a boolean. |
|
Casts a list of boolean hyperparameters to ensure they are Python booleans. |
|
Validate a grouping column for categorical-style use and optionally bin it. |
|
Resolve the canonical 'results' directory with robust fallbacks. |
|
Detect the datetime format of a pandas Series containing datetime values. |
|
Validate that every requested column is present in a DataFrame. |
|
Ensure that a directory exists at the given path, creating it if needed. |
|
Prevents the user from overriding existing parameters |
|
Identify potential ID column(s) in a pandas DataFrame using multiple heuristic strategies. |
|
Determines the actual target column name in the given DataFrame. |
|
Read an environment variable and strip whitespace robustly. |
|
Smart helper to check or normalize empty/None values. |
|
Insert an affix between the base name and extension of a filename. |
|
Maps a string choice for scales to an actual list of scale values or None if no scales are provided. |
|
|
|
Normalize a time column into a datetime column and an integer year. |
|
Print a boxed message with customizable styling. |
|
Pretty-print configuration or hyperparameters as a key/value table. |
|
Renames keys in the data dictionary based on the provided param_to_rename dictionary. |
|
Reorder columns in a DataFrame by moving specified columns to a chosen position. |
|
Save all currently open Matplotlib figures to disk in specified formats. |
|
Save the given matplotlib figure to disk in one or more formats. |
|
Resolve a user-supplied mode alias to a canonical value. |
|
Split a DataFrame into train/test based on a time cutoff, robust to different time formats. |
|
Converts the feature contributions either to a direct percentage, normalizes them to a custom range, or applies a scaling strategy based on the chosen parameters. |
|
Check if two lists contain identical elements according to the specified mode. |
|
Log or print messages with optional indentation and bracketed tags.
Classes
A utility class for checking and ensuring the existence of files and directories on the filesystem. |
- class geoprior.utils.generic_utils.ExistenceChecker[source]#
Bases: object
A utility class for checking and ensuring the existence of files and directories on the filesystem.
This class provides static methods to verify whether a given path exists and to create directories or files if necessary. It raises informative exceptions when paths are invalid or cannot be created.
- ensure_file(path, create_parent_dirs=False)[source]#
Ensure a file exists at the specified path, optionally creating parent directories.
Examples
>>> from pathlib import Path
>>> from geoprior.utils.generic_utils import ExistenceChecker
>>> # Ensure a directory exists
>>> dir_path = ExistenceChecker.ensure_directory("data/output")
>>> isinstance(dir_path, Path)
True
>>> # Ensure a file exists, creating parent directories
>>> file_path = ExistenceChecker.ensure_file(
...     "data/output/results.txt", create_parent_dirs=True
... )
>>> file_path.exists()
True
Notes
Uses pathlib.Path.mkdir(…, parents=True, exist_ok=True) under the hood to create directories.
Creating a file will produce an empty file if it does not exist.
Raises TypeError if the given path is not a str or pathlib.Path, and appropriate OSError/FileExistsError for filesystem errors.
See also
pathlib.Path.mkdir: Method to create directories.
pathlib.Path.touch: Method to create an empty file.
os.makedirs: Legacy function for creating directories recursively.
os.path.exists: Check if a path exists.
- static ensure_directory(path)[source]#
Ensure that a directory exists at the given path, creating it if needed.
- Parameters:
path (str or pathlib.Path) – The filesystem path for which to ensure directory existence. Can be either a string or a pathlib.Path object.
- Returns:
A Path object pointing to the existing (or newly created) directory.
- Return type:
pathlib.Path
- Raises:
TypeError – If path is not a string or pathlib.Path.
FileExistsError – If a file (not a directory) already exists at path.
OSError – If the directory cannot be created for any other reason (e.g., insufficient permissions).
- static ensure_file(path, create_parent_dirs=False)[source]#
Ensure that a file exists at the given path, creating it if needed.
If create_parent_dirs is True, any missing parent directories will be created automatically.
- Parameters:
path (str or pathlib.Path) – The filesystem path for the file that must exist.
create_parent_dirs (bool, optional) – If True, create any missing parent directories. Default is False.
- Returns:
A Path object pointing to the existing (or newly created) file.
- Return type:
pathlib.Path
- Raises:
TypeError – If path is not a string or pathlib.Path.
FileExistsError – If a directory (not a file) already exists at path.
OSError – If the file or parent directories cannot be created due to filesystem errors.
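The contract documented for ensure_directory and ensure_file can be approximated with pathlib alone. The block below is a minimal sketch under stated assumptions, not the library's source: the names ensure_directory_sketch and ensure_file_sketch are hypothetical, and the error messages are invented for illustration.

```python
from pathlib import Path


def ensure_directory_sketch(path):
    """Hypothetical stand-in mirroring the documented ensure_directory contract."""
    if not isinstance(path, (str, Path)):
        raise TypeError(f"path must be str or pathlib.Path, got {type(path).__name__}")
    p = Path(path)
    if p.exists() and not p.is_dir():
        # A regular file occupies the requested directory path.
        raise FileExistsError(f"A file already exists at {p}")
    # parents=True builds missing ancestors; exist_ok=True makes the call idempotent.
    p.mkdir(parents=True, exist_ok=True)
    return p


def ensure_file_sketch(path, create_parent_dirs=False):
    """Hypothetical stand-in mirroring the documented ensure_file contract."""
    if not isinstance(path, (str, Path)):
        raise TypeError(f"path must be str or pathlib.Path, got {type(path).__name__}")
    p = Path(path)
    if p.is_dir():
        # A directory occupies the requested file path.
        raise FileExistsError(f"A directory already exists at {p}")
    if create_parent_dirs:
        p.parent.mkdir(parents=True, exist_ok=True)
    # touch() creates an empty file and leaves an existing file's content intact.
    p.touch(exist_ok=True)
    return p
```

In this sketch, a missing parent directory with create_parent_dirs=False surfaces as FileNotFoundError, which is a subclass of OSError.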
- geoprior.utils.generic_utils.ensure_directory_exists(path)[source]#
Ensure that a directory exists at the given path, creating it if needed.
This function checks whether the provided path exists and is a directory. If the path does not exist, it attempts to create the directory (including any necessary parent directories). If a file with the same name already exists, or if creation fails, an exception is raised.
- Parameters:
path (str or pathlib.Path) – The filesystem path for which to ensure directory existence. Can be either a string or a pathlib.Path object.
- Returns:
A Path object pointing to the existing (or newly created) directory.
- Return type:
pathlib.Path
- Raises:
TypeError – If path is not a string or pathlib.Path.
FileExistsError – If a file (not a directory) already exists at path.
OSError – If the directory cannot be created for any other reason (e.g., insufficient permissions).
Examples
>>> from pathlib import Path
>>> from geoprior.utils.generic_utils import ensure_directory_exists
>>> output_dir = ensure_directory_exists("data/output")
>>> isinstance(output_dir, Path)
True
>>> # The directory "data/output" now exists on disk.
Notes
Uses pathlib.Path.mkdir(…, parents=True, exist_ok=True) under the hood for cross-platform compatibility.
If path already exists as a directory, this function returns immediately without modifying it.
See also
pathlib.Path.mkdir: Method to create a directory.
os.makedirs: Legacy function for creating directories recursively.
- geoprior.utils.generic_utils.verify_identical_items(list1, list2, mode='unique', ops='check_only', error='raise', objname=None)[source]#
Check if two lists contain identical elements according to the specified mode.
In “unique” mode, the function compares the unique elements in each list. In “ascending” mode, it compares elements pairwise in order.
- Parameters:
list1 (list) – The first list of items.
list2 (list) – The second list of items.
mode ({'unique', 'ascending'}, default "unique") – The mode of comparison:
"unique": Compare unique elements (order-insensitive).
"ascending": Compare each element pairwise in order.
ops ({'check_only', 'validate'}, default "check_only") – If "check_only", returns True/False indicating a match. If "validate", returns the validated list.
error ({'raise', 'warn', 'ignore'}, default "raise") – Specifies how to handle mismatches.
objname (str, optional) – A name to include in error messages.
- Returns:
Depending on ops, returns True/False or the validated list.
- Return type:
bool or list
Examples
>>> from geoprior.utils.generic_utils import verify_identical_items
>>> list1 = [0.1, 0.5, 0.9]
>>> list2 = [0.1, 0.5, 0.9]
>>> verify_identical_items(list1, list2, mode="unique", ops="validate")
[0.1, 0.5, 0.9]
>>> verify_identical_items(list1, list2, mode="ascending", ops="check_only")
True
Notes
In “ascending” mode, both lists must have the same length, and the function compares each corresponding pair of elements. In “unique” mode, the function uses the set of unique values for comparison. If the lists contain mixed types, the function attempts to compare their string representations.
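The difference between the two modes is easiest to see on lists that share values but differ in order. The sketch below is a simplified illustration, not the library's implementation: identical_items_sketch is a hypothetical name, and it omits the ops and error handling as well as the string-representation fallback for mixed types.

```python
def identical_items_sketch(list1, list2, mode="unique"):
    """Hypothetical illustration of the two comparison modes."""
    if mode == "unique":
        # Order and multiplicity are ignored; only the distinct values matter.
        return set(list1) == set(list2)
    if mode == "ascending":
        # Positions matter: lengths must agree and every pair must be equal.
        return len(list1) == len(list2) and all(a == b for a, b in zip(list1, list2))
    raise ValueError(f"Unknown mode: {mode!r}")


print(identical_items_sketch([1, 2, 2, 3], [3, 2, 1], mode="unique"))   # True
print(identical_items_sketch([1, 2, 3], [3, 2, 1], mode="ascending"))   # False
```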
- geoprior.utils.generic_utils.vlog(message, verbose=None, level=3, depth='auto', mode=None, vp=True, logger=None, **kws)[source]#
Log or print messages with optional indentation and bracketed tags.
This function allows conditional logging or printing of messages based on a global or passed-in verbose level. Its behavior depends on whether mode is 'log' or 'naive'. When mode = 'log', the message is printed only if \(\text{verbose} \geq \text{level}\). Otherwise, for mode in [None, 'naive'], the verbosity threshold leads to various bracketed prefixes (e.g. [INFO], [DEBUG], [TRACE]) unless the message already contains such a prefix.
The indentation is computed as
(1) \[\text{indentation} = 2 \times \text{depth}\]
where \(\text{depth}\) is either manually specified or auto-derived from level (1 = ERROR, 2 = WARNING, 3 = INFO, 4/5 = DEBUG, 6/7 = TRACE).
- Parameters:
message (str) – The text to be printed or logged.
verbose (int, optional) – Overall verbosity threshold. If None, it looks for a global variable named verbose. Default is None.
level (int, default 3) – Severity or importance level of the message. Commonly:
1 = ERROR
2 = WARNING
3 = INFO
4,5 = DEBUG
6,7 = TRACE
depth (int or str, default "auto") – Indentation level used for the printed message. If "auto", the depth is computed from level.
mode (str, optional) – Determines logging mode. If set to 'log', prints messages only if \(\text{verbose} \geq \text{level}\). Otherwise (if None or 'naive'), it follows a custom logic driven by verbose.
vp (bool, default True) – If True, the function automatically prepends bracketed tags (e.g. [INFO]) unless the message already contains one of [INFO], [DEBUG], [ERROR], [WARNING], or [TRACE].
logger (logging.Logger or Callable[[str], None], optional) – Custom sink that receives the already-formatted message string.
If you pass a standard logging.Logger instance, the message is routed through logger.info.
If you supply any callable that accepts a single str (e.g. a GUI text-append function), that callable is invoked directly.
Defaults to print, which writes to stdout.
kws (Logging instance, optional) – For future extensions.
- Returns:
This function does not return anything. It either prints the message to stdout or omits it, depending on verbose, level, and mode.
- Return type:
None
Notes
This function is helpful for selectively displaying or logging messages in applications that adapt to the user’s required verbosity. By default, each level has a specific bracketed tag and an auto indentation depth.
Examples
>>> from geoprior.utils.generic_utils import vlog
>>> # Example with mode='log'
>>> # This prints only if global or passed-in
>>> # verbose >= 4.
>>> vlog("Check debugging details.", verbose=3,
...      level=4, mode='log')
>>> # Example with mode='naive'
>>> # If verbose=2, it displays as [INFO] prefixed.
>>> vlog("Loading data...", verbose=2, mode='naive')
See also
globals: Used to retrieve the fallback verbose value if not explicitly passed.
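The tagging rule, the indentation formula, and the mode='log' gate described for vlog can be sketched in a few lines. This is an illustration under assumptions rather than the function's actual code: format_message_sketch and should_emit_log_mode are hypothetical names, depth is passed explicitly because the auto-derivation rule is not spelled out here, and placing the indentation before the tag is a guess.

```python
# Level-to-tag mapping taken from the level parameter description.
LEVEL_TAGS = {
    1: "ERROR", 2: "WARNING", 3: "INFO",
    4: "DEBUG", 5: "DEBUG", 6: "TRACE", 7: "TRACE",
}
KNOWN_TAGS = tuple(f"[{t}]" for t in ("INFO", "DEBUG", "ERROR", "WARNING", "TRACE"))


def format_message_sketch(message, level=3, depth=0, vp=True):
    """Hypothetical formatter: tag the message, then indent by 2 * depth spaces."""
    if vp and not any(tag in message for tag in KNOWN_TAGS):
        message = f"[{LEVEL_TAGS[level]}] {message}"
    return " " * (2 * depth) + message


def should_emit_log_mode(verbose, level):
    """Gate described for mode='log': emit only if verbose >= level."""
    return verbose >= level


print(format_message_sketch("Loading data...", level=3, depth=2))
print(should_emit_log_mode(verbose=3, level=4))
```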
- geoprior.utils.generic_utils.detect_dt_format(series)[source]#
Detect the datetime format of a pandas Series containing datetime values.
This function inspects a non-null sample from the datetime Series and infers the format string based on its components (year, month, day, hour, minute, and second). It returns a format string that can be used with strftime. For example, if the sample indicates only a year is relevant, it returns "%Y"; if full date information is present, it returns "%Y-%m-%d"; and if time details are also present, it extends the format accordingly.
- Parameters:
series (pandas.Series) – A Series containing datetime values (dtype datetime64).
- Returns:
A datetime format string (e.g., "%Y", "%Y-%m-%d", or "%Y-%m-%d %H:%M:%S") that represents the resolution of the data.
- Return type:
str
Examples
>>> from geoprior.utils.generic_utils import detect_dt_format
>>> import pandas as pd
>>> dates = pd.to_datetime(['2023-01-01', '2024-01-01', '2025-01-01'])
>>> fmt = detect_dt_format(pd.Series(dates))
>>> print(fmt)
%Y
Notes
The detection logic checks if month, day, hour, minute, and second are all default values (e.g., month == 1, day == 1, hour == 0, etc.) and infers the most compact format that still represents the data accurately.
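The default-value check described in the Notes can be illustrated with plain datetime objects. This is a rough sketch, not the library's logic: detect_format_sketch is a hypothetical name, it works on a list instead of a pandas Series, it scans every non-null value rather than a sample, and it only distinguishes the three format strings named in this section.

```python
from datetime import datetime


def detect_format_sketch(values):
    """Hypothetical resolution check over a list of datetime objects."""
    sample = [v for v in values if v is not None]
    has_time = any((v.hour, v.minute, v.second) != (0, 0, 0) for v in sample)
    if has_time:
        return "%Y-%m-%d %H:%M:%S"
    has_date = any((v.month, v.day) != (1, 1) for v in sample)
    if has_date:
        return "%Y-%m-%d"
    # Every value sits at January 1st, midnight: only the year carries information.
    return "%Y"


yearly = [datetime(2023, 1, 1), datetime(2024, 1, 1), datetime(2025, 1, 1)]
print(detect_format_sketch(yearly))  # %Y
```

One consequence of this heuristic is that genuinely daily data that happens to contain only January 1st dates is reported as yearly.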
- geoprior.utils.generic_utils.get_actual_column_name(df, tname=None, actual_name=None, error='raise', default_to=None)[source]#
Determines the actual target column name in the given DataFrame.
- Parameters:
df (pandas.DataFrame) – The DataFrame containing the target column.
tname (str, optional) – The base target name (e.g., "subsidence"). If not found in the DataFrame, it will attempt to find a matching column using the "<tname>_actual" format.
actual_name (str, optional) – If provided, this name will be returned as the actual target column name.
error ({'raise', 'warn', 'ignore'}, default 'raise') – Specifies how to handle the case when no valid column is found:
'raise': Raises a ValueError.
'warn': Issues a warning and returns None.
'ignore': Silently returns None.
- Returns:
The determined actual column name, or None if no match is found and error='warn' or error='ignore'.
- Return type:
str or None
- Raises:
ValueError – If no valid target column is found and error=’raise’.
Examples
>>> import pandas as pd
>>> from geoprior.utils.generic_utils import get_actual_column_name
>>> df = pd.DataFrame({'subsidence_actual': [1, 2, 3]})
>>> get_actual_column_name(df, tname="subsidence")
'subsidence_actual'
>>> df = pd.DataFrame({'subsidence': [1, 2, 3]})
>>> get_actual_column_name(df, tname="subsidence")
'subsidence'
>>> df = pd.DataFrame({'actual': [1, 2, 3]})
>>> get_actual_column_name(df)
'actual'
>>> df = pd.DataFrame({'measurement': [1, 2, 3]})
>>> get_actual_column_name(df, tname="subsidence", error="warn")
Warning: Could not determine the actual target column in the DataFrame.
None
- geoprior.utils.generic_utils.transform_contributions(contributions, to_percent=True, normalize=False, norm_range=(0, 1), scale_type=None, zero_division='warn', epsilon=1e-06, log_transform=False)[source]#
Converts the feature contributions either to a direct percentage, normalizes them to a custom range, or applies a scaling strategy based on the chosen parameters.
- Parameters:
contributions (dict) – A dictionary where keys are feature names and values are the feature contributions. Each value is expected to be a numerical value representing the contribution of the respective feature.
to_percent (bool, optional, default=True) – Whether to convert the contributions to percentages. If True, each value in contributions will be multiplied by 100. This is useful when contributions are given in decimal form but are expected as percentages.
normalize (bool, optional, default=False) – Whether to normalize the contributions using min-max scaling. If True, the values will be scaled to the range defined in norm_range.
norm_range (tuple, optional, default=(0, 1)) – A tuple specifying the range (min, max) for normalization. This range is applied when normalize is set to True. The contributions will be rescaled so that the minimum value maps to norm_range[0] and the maximum value maps to norm_range[1].
scale_type (str, optional, default=None) – The scaling strategy. Options include:
'zscore': Performs Z-score normalization.
'log': Applies a logarithmic transformation to the data.
If None, no scaling is applied.
zero_division (str, optional, default='warn') – Defines how to handle zero or missing values in the contributions. Options include:
'skip': Skips zero values (no modification).
'warn': Issues a warning if zero values are found.
'replace': Replaces zeros with a small value defined by epsilon to avoid division by zero or undefined results.
epsilon (float, optional, default=1e-6) – A small value used to replace zeros when zero_division is set to 'replace'. This prevents division by zero errors during transformations like Z-score or log transformation.
log_transform (bool, optional, default=False) – Whether to apply a logarithmic transformation to the contributions. If True, it applies the natural logarithm to each value in the contributions dictionary. Only positive values are valid for log transformation, and zero values are either skipped or replaced based on the zero_division parameter.
- Returns:
A dictionary with feature names as keys and the transformed feature contributions as values. The transformation is applied according to the chosen parameters.
- Return type:
dict
See also
numpy.mean: Compute the arithmetic mean of an array.
numpy.std: Compute the standard deviation of an array.
Notes
When scale_type='zscore', each contribution is standardized as
\[\frac{X - \mu}{\sigma}\]
where \(X\) is the contribution, \(\mu\) is the mean of the contributions, and \(\sigma\) is the standard deviation of the contributions.
If log_transform=True, the function applies the natural logarithm:
(2) \[\text{log}(X) \text{ for } X > 0\]
The zero_division parameter handles zero values by either skipping, warning, or replacing them with a small value (epsilon).
Examples
>>> from geoprior.utils.generic_utils import transform_contributions
>>> contributions = {
...     'GWL': 2.226836617133828,
...     'rainfall_mm': 12.398293851061492,
...     'normalized_seismic_risk_score': 0.9402759347406523,
...     'normalized_density': 4.806074194258057,
...     'density_concentration': 5.666943330566496e-06,
...     'geology': 1.2798872011280326e-05,
...     'density_tier': 1.044039559604414e-05,
...     'rainfall_category': 0.0
... }
>>> transform_contributions(contributions, to_percent=True, normalize=True)
>>> transform_contributions(contributions, to_percent=False, scale_type='zscore')
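To make the min-max and Z-score options concrete, here is a small standard-library sketch of both transformations on a dictionary. It is an assumption-laden illustration, not the function's code: min_max_sketch and zscore_sketch are hypothetical names, the population standard deviation is used because that is numpy.std's default, and mapping a constant input to the lower bound is an invented convention.

```python
from statistics import mean, pstdev


def min_max_sketch(contributions, norm_range=(0, 1)):
    """Rescale values so the minimum maps to norm_range[0] and the maximum to norm_range[1]."""
    lo, hi = norm_range
    vmin, vmax = min(contributions.values()), max(contributions.values())
    span = vmax - vmin
    if span == 0:
        # All values equal: map everything to the lower bound (assumed convention).
        return {k: float(lo) for k in contributions}
    return {k: lo + (v - vmin) / span * (hi - lo) for k, v in contributions.items()}


def zscore_sketch(contributions):
    """Standardize values as (X - mu) / sigma using the population standard deviation."""
    mu = mean(contributions.values())
    sigma = pstdev(contributions.values())
    return {k: (v - mu) / sigma for k, v in contributions.items()}


demo = {"a": 2.0, "b": 4.0, "c": 6.0}
print(min_max_sketch(demo))  # {'a': 0.0, 'b': 0.5, 'c': 1.0}
print(zscore_sketch(demo))
```

The Z-score sketch divides by zero when all contributions are identical, which is the kind of case the zero_division and epsilon parameters are documented to address.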
- geoprior.utils.generic_utils.exclude_duplicate_kwargs(func, existing_kwargs, user_kwargs)[source]#
Prevents the user from overriding existing parameters in a target function. The function exclude_duplicate_kwargs checks both developer-specified and function-level parameter names to exclude them from user_kwargs:
(3) \[\text{final\_kwargs} = \{\,(k, v) \in \text{user\_kwargs} \,\mid\, k \notin \text{protected\_params}\,\}\]
- Parameters:
func (callable) – The target function whose valid parameters are checked. It uses Python's introspection to gather the acceptable parameter names.
existing_kwargs (dict or list) – Developer-defined parameters to protect. Can be:
A dictionary of parameter-value pairs (e.g., {'ax': ax_obj, 'data': df}) whose keys are excluded from user overrides.
A list of parameter names (e.g., ['ax', 'data']) to protect from user overrides.
user_kwargs (dict) – The user-supplied keyword arguments that are candidates for merging with existing_kwargs. This dictionary is filtered to remove collisions with protected parameters.
- Returns:
A filtered dictionary of user-defined arguments that do not overlap with protected parameters.
- Return type:
dict
See also
inspect.signature: Used to introspect function parameters.
filter_valid_kwargs: Another inline function that discards user params not valid for a given function.
Notes
By default, if existing_kwargs is a dictionary, its keys are treated as protected parameter names. If it’s a list, those items are protected. The function signature of func is also used to verify that only recognized parameters are protected. Keyword-filtering patterns like this are covered in Beazley and Jones [26].
Examples
>>> from geoprior.utils.generic_utils import exclude_duplicate_kwargs
>>> import seaborn as sns
>>> # Developer has some base kwargs
>>> base_kwargs = {
...     'x': 'species',
...     'y': 'sepal_length',
...     'palette': 'viridis'
... }
>>> # User tries to override 'x' with new param
>>> user_args = {
...     'x': 'petal_width',
...     'color': 'red'
... }
>>> # Filter out duplicates
>>> safe_args = exclude_duplicate_kwargs(
...     sns.scatterplot,
...     base_kwargs,
...     user_args
... )
>>> safe_args
{'color': 'red'}
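The core of the filter is a set-membership test on parameter names. The sketch below shows that step only, assuming nothing about the real implementation: exclude_protected_sketch is a hypothetical name, and the signature-inspection step mentioned in the Notes is deliberately left out.

```python
def exclude_protected_sketch(existing_kwargs, user_kwargs):
    """Hypothetical filter: keep only user kwargs whose names are not protected."""
    # Iterating a dict yields its keys; iterating a list yields its items.
    protected = set(existing_kwargs)
    return {k: v for k, v in user_kwargs.items() if k not in protected}


base_kwargs = {"x": "species", "y": "sepal_length", "palette": "viridis"}
user_args = {"x": "petal_width", "color": "red"}
print(exclude_protected_sketch(base_kwargs, user_args))        # {'color': 'red'}
print(exclude_protected_sketch(["x", "color"], user_args))     # {}
```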
- geoprior.utils.generic_utils.reorder_columns(df, columns, pos='end')[source]#
Reorder columns in a DataFrame by moving specified columns to a chosen position.
This function locates columns in the original DataFrame df and rearranges them based on the parameter pos. If pos is "end", columns are appended to the end. If "begin" or "start", they are placed at the front. If "center", they are inserted at the midpoint of the remaining columns.
- Parameters:
df (pandas.DataFrame) – The input DataFrame to reorder.
columns (str or iterable of str) – A single column name or multiple column names to reposition. If a single string is given, it is converted to a list with one element.
pos (str, int, or float, default "end") – Determines the target placement:
"end": Append after all other columns.
"begin" or "start": Prepend at the start.
"center": Insert at the midpoint of remaining columns.
integer or float: Insert at zero-based index among the remaining columns. If out of bounds, the original DataFrame is returned unchanged.
- Returns:
A new DataFrame with columns moved as specified by pos. The function rearranges columns without altering values or data order beyond column placement.
- Return type:
pandas.DataFrame
Notes
The function checks if columns exist in df, ignoring columns not present.
A warning is issued if the position is beyond the range of valid indices.
Negative indices for integer pos are converted to positive by adding the total number of remaining columns.
For pos="center", the insertion index is
(4) \[i_{\text{center}} = \left\lfloor \frac{|R|}{2} \right\rfloor,\]
where \(|R|\) is the number of remaining columns after removing the target columns. For integer or float pos, the target columns are inserted at index \(\lfloor pos \rfloor\) among the remaining columns. Column-order management follows common DataFrame practices discussed in McKinney [27].
Examples
>>> from geoprior.utils.generic_utils import reorder_columns
>>> import pandas as pd
>>> data = pd.DataFrame({
...     'id': [1, 2, 3],
...     'latitude': [10.1, 10.2, 10.3],
...     'landslide': [0, 1, 0],
...     'longitude': [20.1, 20.2, 20.3]
... })
>>> # Move 'landslide' to the end (default)
>>> reorder_columns(data, 'landslide', pos="end")
   id  latitude  longitude  landslide
0   1      10.1       20.1          0
1   2      10.2       20.2          1
2   3      10.3       20.3          0
See also
pandas.DataFrame.reindex: Pandas method for reindexing or reordering columns more generally.
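The placement rules can be traced on a plain list of column labels, without pandas. The following is a hedged sketch: reorder_names_sketch is a hypothetical helper, it returns labels rather than a DataFrame, it emits no warning, and treating an index equal to the number of remaining columns as in bounds is an assumption.

```python
import math


def reorder_names_sketch(all_columns, columns, pos="end"):
    """Hypothetical label-only version of the documented placement rules."""
    if isinstance(columns, str):
        columns = [columns]
    # Columns that are not present are ignored, as described in the Notes.
    targets = [c for c in columns if c in all_columns]
    remaining = [c for c in all_columns if c not in targets]
    if pos == "end":
        idx = len(remaining)
    elif pos in ("begin", "start"):
        idx = 0
    elif pos == "center":
        idx = len(remaining) // 2      # floor(|R| / 2)
    else:
        idx = math.floor(pos)          # floor(pos) for int or float positions
        if idx < 0:
            idx += len(remaining)      # negative indices wrap around
        if not 0 <= idx <= len(remaining):
            return list(all_columns)   # out of bounds: original order unchanged
    return remaining[:idx] + targets + remaining[idx:]


cols = ["id", "latitude", "landslide", "longitude"]
print(reorder_names_sketch(cols, "landslide", pos="end"))
print(reorder_names_sketch(cols, "landslide", pos="center"))
```

A result like this could then be applied to a DataFrame by selecting columns in the new order.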
- geoprior.utils.generic_utils.find_id_column(df, strategy='naive', regex_pattern=None, uniqueness_threshold=0.95, errors='raise', empty_as_none=True, as_list=False, case_sensitive=False, as_frame=False)[source]#
Identify potential ID column(s) in a pandas DataFrame using multiple heuristic strategies.
The function examines column names and/or data properties to detect columns likely to serve as unique identifiers. This is particularly useful for large datasets where the ID field is not explicitly labeled, and for quick scanning of possible key columns.
- Parameters:
df (pandas.DataFrame) – The input DataFrame in which to search for potential ID columns.
strategy ({'naive', 'exact', 'dtype', 'regex', 'prefix_suffix'}, default 'naive') – Defines the logic for detecting ID columns:
exact: Checks for a column name that exactly matches id (case sensitivity controlled by case_sensitive).
naive: Searches for columns where id is part of the name (e.g., location_id), subject to case sensitivity.
prefix_suffix: Considers columns prefixed or suffixed with id or _id.
dtype: Examines columns having data types commonly used for IDs (integer, string, or object) and checks if they show high uniqueness via \(\text{uniqueness\_ratio} \geq \text{uniqueness\_threshold}\).
regex: Uses a custom regular expression regex_pattern to find matches in column names.
regex_pattern (str, optional) – Required if strategy is 'regex'. The pattern is compiled via re.compile, with case sensitivity determined by case_sensitive.
uniqueness_threshold (float, default 0.95) – For the dtype strategy, columns are flagged as ID candidates if the ratio
(5) \[r = \frac{\text{unique\_values}}{\text{non\_NA\_rows}}\]
satisfies \(r \geq \text{uniqueness\_threshold}\), or if the number of unique values equals the number of non-null rows.
errors ({'raise', 'warn', 'ignore'}, default 'raise') – How to handle no-match cases:
raise: Raises a ValueError.
warn: Issues a UserWarning and returns based on as_frame or empty_as_none.
ignore: Returns an empty result based on the same parameters without warning.
empty_as_none (bool, default True) – Applies only if as_frame is False. Defines whether to return None (if True) or an empty list (if False) when no ID column is found and errors is 'warn' or 'ignore'.
as_list (bool, default False) – If True, return all matched columns. If False, return only the first match. Affects both name returns and DataFrame returns.
case_sensitive (bool, default False) – If False, comparisons (including regex) are performed in a case-insensitive manner.
as_frame (bool, default False) – If True, return the matched columns as a pandas DataFrame. If as_list is True, it may include multiple columns. If no column is found, returns an empty DataFrame (if errors is 'warn' or 'ignore').
- Returns:
Depends on as_frame, as_list, and the number of matching columns:
as_frame=False, as_list=False: returns the first match as a string, or None/[].
as_frame=False, as_list=True: returns all matching column names as a list of strings.
as_frame=True, as_list=False: returns a DataFrame with the first matched column. If no match is found, an empty DataFrame may be returned.
as_frame=True, as_list=True: returns a DataFrame with all matched columns included.
- Return type:
str or List[str] or pandas.DataFrame or None
Notes
For the dtype strategy, integer, string, and object columns are inspected. The function calculates a uniqueness ratio and compares it against uniqueness_threshold.
Negative or zero thresholds are invalid, as are values above 1.
If the DataFrame has no columns or is empty, the behavior is determined by errors.
The relational-model motivation for schema-oriented column handling goes back to Codd [28].
Examples
>>> from geoprior.utils.generic_utils import find_id_column
>>> import pandas as pd
>>> data = pd.DataFrame({
...     'ID_code': [101, 102, 103],
...     'Name': ['Alice', 'Bob', 'Charlie'],
...     'value': [10, 20, 30]
... })
>>> # Example using the 'naive' strategy
>>> col = find_id_column(data, strategy='naive')
>>> print(col)  # Might return 'ID_code'
>>> # Example with as_list=True
>>> cols = find_id_column(data, strategy='naive',
...                       as_list=True)
>>> print(cols)  # ['ID_code']
See also
re.compile: The regex compilation method used when strategy='regex'.
pandas.api.types.is_integer_dtype: Checks integer type.
pandas.api.types.is_string_dtype: Checks string type.
pandas.api.types.is_object_dtype: Checks object type.
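Two of the heuristics are simple enough to sketch without pandas: the uniqueness ratio of equation (5) and the substring test of the naive strategy. Both helpers below are hypothetical illustrations (uniqueness_ratio and naive_id_columns are not library names), and treating None as the only missing value is an assumption.

```python
def uniqueness_ratio(values):
    """r = unique_values / non_NA_rows, treating None as missing."""
    non_na = [v for v in values if v is not None]
    if not non_na:
        return 0.0
    return len(set(non_na)) / len(non_na)


def naive_id_columns(column_names, case_sensitive=False):
    """Names that contain 'id' anywhere, following the 'naive' strategy description."""
    needle = "id"
    return [
        name for name in column_names
        if needle in (name if case_sensitive else name.lower())
    ]


print(uniqueness_ratio([101, 102, 103, None]))              # 1.0
print(uniqueness_ratio([1, 1, 2, 3]) >= 0.95)               # False
print(naive_id_columns(["ID_code", "Name", "value"]))       # ['ID_code']
```

A plain substring test also matches names such as "width", which is one reason a stricter strategy like exact or prefix_suffix may be preferable on wide tables.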
- geoprior.utils.generic_utils.check_group_column_validity(df, group_col, ops='check_only', max_unique=10, auto_bin=False, bins=4, error='warn', bin_labels=None, verbose=True)[source]#
Validate a grouping column for categorical-style use and optionally bin it.
- Parameters:
df (pandas.DataFrame) – Input DataFrame holding the grouping column.
group_col (str) – Name of the candidate grouping column in df.
ops ({'check_only', 'binning', 'validate'}, optional) – Operation mode. Use "check_only" to return a boolean, "binning" to bin the column when needed and return a modified DataFrame, or "validate" to check validity while honoring error.
max_unique (int, optional) – Maximum number of unique numeric values allowed before the column is treated as too continuous for categorical use.
auto_bin (bool, optional) – Whether to auto-bin a numeric column when ops='binning'.
bins (int, optional) – Number of bins to create when binning is applied.
error ({'warn', 'raise', 'ignore'}, optional) – Policy used when validation fails.
bin_labels (list of str or None, optional) – Custom labels for generated bins.
verbose (bool, optional) – Whether to emit informational messages.
- Returns:
Returns a boolean for ops='check_only'. Otherwise returns a DataFrame, possibly with a transformed group_col.
- Return type:
bool or pandas.DataFrame
Notes
When quantile binning is used, interval boundaries are derived from the numeric distribution of group_col.
- geoprior.utils.generic_utils.save_all_figures(output_dir='figures', prefix='figure', fmts=('png',), close=True, dpi=150, transparent=False, timestamp=True, verbose=True)[source]#
Save all currently open Matplotlib figures to disk in specified formats.
- Parameters:
output_dir (str) – Directory where figures will be saved. Created if it does not exist.
prefix (str) – Filename prefix for each figure.
fmts (list or tuple of str) – File formats/extensions to use (e.g., ('png', 'pdf')).
close (bool) – Whether to close each figure after saving. Default is True.
dpi (int or None) – Resolution in dots per inch. None uses the Matplotlib default.
transparent (bool) – Whether to save figures with a transparent background.
timestamp (bool) – Append the current timestamp (YYYYmmddTHHMMSS) to filenames.
verbose (bool) – Print progress messages.
- Returns:
List of saved file paths.
- Return type:
List[str]
Examples
>>> import matplotlib.pyplot as plt
>>> plt.figure(); plt.plot([1, 2, 3])
>>> from geoprior.utils.generic_utils import save_all_figures
>>> paths = save_all_figures(output_dir="plots", fmts=("png",))
>>> print(paths)
['plots/figure_1_20250521T153045.png']
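The filename in the example output suggests a pattern of prefix, figure number, optional timestamp, and extension. The sketch below reproduces that pattern with the standard library only. It is inferred from the example rather than taken from the source: build_figure_paths and its now argument are hypothetical, and no figure is actually saved.

```python
from datetime import datetime
from pathlib import Path


def build_figure_paths(output_dir, prefix, index, fmts=("png",), timestamp=True, now=None):
    """Hypothetical helper: one output path per requested format for figure `index`."""
    # `now` is an illustrative hook for reproducible names; it defaults to the current time.
    stamp = (now or datetime.now()).strftime("%Y%m%dT%H%M%S")
    stem = f"{prefix}_{index}" + (f"_{stamp}" if timestamp else "")
    return [str(Path(output_dir) / f"{stem}.{fmt}") for fmt in fmts]


fixed = datetime(2025, 5, 21, 15, 30, 45)
print(build_figure_paths("plots", "figure", 1, fmts=("png", "pdf"), now=fixed))
```

Each resulting path could then be handed to Matplotlib's Figure.savefig together with the dpi and transparent settings.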
- geoprior.utils.generic_utils.rename_dict_keys(data, param_to_rename=None, order='forward')[source]#
Renames keys in the data dictionary based on the provided param_to_rename dictionary.
For each mapping in param_to_rename, the function checks whether the old key exists in the data dictionary. If the key is present, it is renamed according to the mapping. If the key is not found in data (for example, because data already uses the new name), that mapping is skipped. If no rename is required, the function returns the original dictionary.
- Parameters:
data (dict) – The dictionary whose keys may be renamed. The function will iterate over the keys of this dictionary and rename them according to the mapping provided in param_to_rename.
param_to_rename (dict, optional) – A dictionary mapping old keys to new keys. Each key in this dictionary represents an old key that may be found in data, and the corresponding value is the new key. If None, no renaming is performed. If a key in data matches an old key in param_to_rename, that key will be renamed.
order ({'forward', 'reverse'}, default 'forward') – Order for renaming keys in a flat dict:
forward (default): param_to_rename = {old_key: new_key}.
reverse: param_to_rename = {canonical_key: alias or (alias1, alias2, ...)}. The first alias found in data is moved under the canonical key. If the canonical key already exists, nothing is changed for that mapping.
- Returns:
The updated dictionary with keys renamed as per the param_to_rename mapping. If no keys need renaming, the original dictionary is returned.
- Return type:
dict
- Raises:
ValueError – If param_to_rename is not a dictionary.
Examples
>>> from geoprior.utils.generic_utils import rename_dict_keys
Example 1: Renaming a key in the dictionary:
>>> data = {"subsidence": 100}
>>> param_to_rename = {"subsidence": "subs_pred"}
>>> rename_dict_keys(data, param_to_rename)
{'subs_pred': 100}
Example 2: When the key is already valid (no change needed):
>>> data = {"subs_pred": 100}
>>> param_to_rename = {"subsidence": "subs_pred"}
>>> rename_dict_keys(data, param_to_rename)
{'subs_pred': 100}
Example 3: When param_to_rename is None, no renaming is performed:
>>> data = {"subsidence": 100}
>>> rename_dict_keys(data)
{'subsidence': 100}
Notes
If param_to_rename is None, no renaming occurs, and the data dictionary is returned as is.
This function raises an error if param_to_rename is not a dictionary. Ensure that the parameter is a valid dictionary of old-to-new key mappings.
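The forward and reverse orders differ in which side of the mapping is looked up in data. A minimal sketch of both, written from the parameter descriptions and not from the source, might look as follows. rename_keys_sketch is a hypothetical name, the keys in the reverse example are invented, and overwriting an existing new key in forward mode is an assumption.

```python
def rename_keys_sketch(data, param_to_rename=None, order="forward"):
    """Hypothetical re-implementation of the documented renaming rules."""
    if param_to_rename is None:
        return data
    if not isinstance(param_to_rename, dict):
        raise ValueError("param_to_rename must be a dictionary")
    out = dict(data)
    if order == "forward":
        # Mapping is {old_key: new_key}; absent old keys are skipped.
        for old, new in param_to_rename.items():
            if old in out:
                out[new] = out.pop(old)
    elif order == "reverse":
        # Mapping is {canonical_key: alias or (alias1, alias2, ...)}.
        for canonical, aliases in param_to_rename.items():
            if canonical in out:
                continue  # canonical key already present: leave this mapping alone
            if isinstance(aliases, str):
                aliases = (aliases,)
            for alias in aliases:
                if alias in out:
                    out[canonical] = out.pop(alias)
                    break  # only the first alias found is moved
    else:
        raise ValueError(f"Unknown order: {order!r}")
    return out


print(rename_keys_sketch({"subsidence": 100}, {"subsidence": "subs_pred"}))
print(rename_keys_sketch({"gwl": 1.5}, {"head": ("h", "gwl")}, order="reverse"))
```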
- geoprior.utils.generic_utils.normalize_time_column(df, time_col, datetime_col='datetime_temp', year_col='year_int', drop_orig=False)[source]#
Normalize a time column into a datetime column and an integer year.
The input column may contain integer years, strings, or existing pandas datetime values. The function creates datetime_col with parsed timestamps and year_col with the extracted integer year. When drop_orig=True, the original time_col is removed and datetime_col is renamed back to time_col.
- Parameters:
df (pandas.DataFrame) – Input DataFrame containing a time column named time_col.
time_col (str) – Name of the column to normalize.
datetime_col (str, default 'datetime_temp') – Name of the parsed datetime column.
year_col (str, default 'year_int') – Name of the extracted integer year column.
drop_orig (bool, default False) – If True, drop the original time_col after parsing and rename datetime_col back to time_col.
- Returns:
A copy of df with the parsed datetime column and integer year column.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If time_col is missing or parsing fails for any entry.
TypeError – If df is not a pandas DataFrame.
- geoprior.utils.generic_utils.select_mode(mode=None, default='pihal_like', canonical=None)[source]#
Resolve a user-supplied mode alias to a canonical value.
- Parameters:
mode (str or None, optional) – Case-insensitive mode alias. Accepted values include 'pihal', 'pihal_like', 'tft', 'tft_like', or None to fall back to default.
default ({'pihal', 'tft'}, optional) – Canonical value returned when mode is None.
canonical (dict or list or None, optional) – Custom alias mapping. A dictionary maps input strings to canonical values. A list is treated as an identity mapping for its items.
- Returns:
Canonical string corresponding to the resolved mode.
- Return type:
str
- Raises:
ValueError – If mode does not match any accepted alias.
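The role of the canonical argument can be illustrated with a small resolver that accepts either a dictionary or a list. This is a sketch under assumptions, not the library's code: resolve_mode_sketch is a hypothetical name, the alias table passed in the example is invented for illustration, and stripping whitespace from the input is a guess.

```python
def resolve_mode_sketch(mode, default, canonical):
    """Hypothetical alias resolver for a user-supplied canonical mapping."""
    if isinstance(canonical, dict):
        mapping = canonical
    else:
        # A list is treated as an identity mapping for its items.
        mapping = {item: item for item in canonical}
    lookup = {str(alias).lower(): target for alias, target in mapping.items()}
    if mode is None:
        return default
    key = str(mode).strip().lower()
    if key not in lookup:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {sorted(lookup)}")
    return lookup[key]


# Illustrative alias table; the library's built-in aliases may map differently.
aliases = {"tft": "tft", "tft_like": "tft", "pihal": "pihal", "pihal_like": "pihal"}
print(resolve_mode_sketch("TFT_like", default="pihal", canonical=aliases))
print(resolve_mode_sketch(None, default="pihal", canonical=["pihal", "tft"]))
```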
- geoprior.utils.generic_utils.print_config_table(sections, title=None, table_width=None, sort_keys=True, key_col_fraction=0.35, max_value_length=200, log_fn=None)[source]#
Pretty-print configuration or hyperparameters as a key/value table.
This helper is intended for CLI scripts (Stage-1, training, tuning) so that the user can quickly inspect which parameters are actually in effect.
- Parameters:
sections (dict or sequence of (str, dict)) – If a single dict is passed, all key/value pairs are printed in one block.
If a sequence is passed, it must contain (name, params) tuples, where name is a section label (e.g. "Physics") and params is a dict mapping parameter names to values.
title (str, optional) – Optional title displayed above the table (centered).
table_width (int, optional) – Total width of the printed table. If None, the function tries to use geoprior.api.util.get_table_size(). If that fails, it falls back to the terminal width (via shutil.get_terminal_size) or 80 characters.
sort_keys (bool, default True) – Whether to sort parameter names alphabetically within each section.
key_col_fraction (float, default 0.35) – Fraction of the table width allocated to the parameter-name column. The remainder is used for the value column.
max_value_length (int, default 200) – Maximum number of characters kept from the stringified value. Longer values are truncated with an ellipsis ("...") before being wrapped onto multiple lines.
log_fn (callable, optional) – Function used to emit lines (defaults to print()). This allows capturing the table in logs if needed.
- Returns:
The full rendered table as a single string. It is always printed via log_fn as a side effect.
- Return type:
str
Notes
Nested containers (lists, tuples, dicts) are rendered in a compact one-line form and then wrapped to fill the value column.
This function is intentionally lightweight and does not depend on external tabulation libraries, so it can be safely used in lightweight Stage-1 / Stage-2 scripts.
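The column split, truncation, and wrapping described above can be approximated with textwrap from the standard library. The following is a rough sketch, not the function's layout: render_rows_sketch is a hypothetical name, the " | " separator is invented, counting the ellipsis inside max_value_length is an assumption, and titles and section headers are omitted.

```python
import textwrap


def render_rows_sketch(params, table_width=40, key_col_fraction=0.35,
                       max_value_length=200, sort_keys=True):
    """Hypothetical key/value renderer for a single section."""
    key_w = max(1, int(table_width * key_col_fraction))
    val_w = max(1, table_width - key_w - 3)  # 3 characters for the assumed " | " separator
    keys = sorted(params) if sort_keys else list(params)
    lines = []
    for key in keys:
        text = str(params[key])
        if len(text) > max_value_length:
            # Truncate long values and mark the cut with an ellipsis.
            text = text[: max_value_length - 3] + "..."
        chunks = textwrap.wrap(text, width=val_w) or [""]
        for i, chunk in enumerate(chunks):
            label = str(key) if i == 0 else ""  # continuation lines leave the key column blank
            lines.append(f"{label:<{key_w}} | {chunk}")
    return "\n".join(lines)


print(render_rows_sketch({"lr": 0.001, "layers": [64, 64, 32], "note": "word " * 12}))
```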