geoprior.utils.io_utils

Input/Output utilities for managing file paths, directories, and loading serialized data within FusionLab. Provides error-checked deserialization, directory management, and archive handling (e.g., .tgz, .zip), streamlining file operations and data recovery.

Adapted for FusionLab from the original geoprior.utils.io_utils.

Functions

`cpath`([savepath, dpath])	Ensures a directory exists for saving files, creating it if necessary.
`deserialize_data`(filename[, verbose])	Deserialize and load data from a serialized file using joblib or pickle.
`dummy_csv_translator`(csv_fn, pf[, ...])	Translate a CSV file using a dictionary created from a markdown-style parser file.
`extract_tar_with_progress`(tar, member, path)	Extracts a single file from a tar archive with a progress bar.
`fetch_joblib_data`(job_file, *keys[, ...])	Dynamically load data from a joblib-saved dictionary with flexible key access.
`fetch_json_data_from_url`(url[, todo])	Retrieve and parse JSON data from a URL.
`fetch_tgz_from_url`(data_url, tgz_filename[, ...])	Downloads a .tgz file from a specified URL, saves it to a directory, and optionally extracts a specific file from the archive.
`fetch_tgz_locally`(tgz_file, filename[, ...])	Extracts a specific file from a local .tgz archive and optionally renames it.
`get_config_fname_from_varname`(data[, ...])	Generate a filename based on a variable name for YAML configuration.
`get_valid_key`(input_key, default_key[, ...])	Validates an input key and substitutes it with a valid key if necessary, based on a mapping of valid keys to their possible substitutes.
`key_checker`(keys, valid_keys[, regex, ...])	Check whether a given key exists in valid_keys and return a list if multiple keys are found.
`key_search`(keys, default_keys[, parse_keys, ...])	Find key in a list of default keys and select the best match.
`load_csv`(data_path[, delimiter])	Loads a CSV file into a pandas DataFrame.
`load_serialized_data`(filename[, verbose])	Load data from a serialized file (e.g., pickle or joblib format).
`move_cfile`(cfile[, savepath])	Moves a file to the specified path.
`parse_csv`([csv_fn, data, todo, fieldnames, ...])	Parses a CSV file or serializes data to a CSV file.
`parse_json`([json_fn, data, todo, savepath, ...])	Parse and manage JSON configuration files, either loading data from or saving data to a JSON file.
`parse_md`(pf[, delimiter])	Parse a markdown-style file with key-value pairs separated by a delimiter.
`parse_yaml`([yml_fn, data, todo, savepath, ...])	Parse and handle YAML configuration files for loading or saving data.
`print_cmsg`(cfile[, todo, config])	Generates output message for configuration file operations.
`rename_files`(src_files, dst_files[, ...])	Rename files from one set of names or paths to another.
`sanitize_unicode_string`(str_)	Removes spaces and replaces accented characters in a string.
`save_job`(job, savefile, *[, protocol, ...])	Quick save your job using 'joblib' or persistent Python pickle module.
`save_path`(nameOfPath)	Creates a directory if it does not exist.
`serialize_data`(data[, filename, savepath, ...])	Serialize and save a Python object to a binary file using either `joblib` or `pickle`.
`serialize_data_in`(data[, filename, force, ...])	Serializes a Python object to a binary file using either joblib or pickle.
`spath`(name_of_path)	Create a directory if it does not already exist.
`store_or_write_hdf5`(df[, key, mode, kind, ...])	Store a DataFrame to HDF5, write it to CSV, or sanitize it in memory.
`to_hdf5`(data, fn[, objname, close])	Store a data object in Hierarchical Data Format 5 (HDF5).
`to_txt`(d[, filename, format, indent, width, ...])	Export data objects to a text or JSON file with optional custom formatting.
`zip_extractor`(zip_file[, samples, ftype, ...])	Extracts files from a ZIP archive based on various filtering criteria and saves them to a specified directory.

Classes

FileManager(root_dir, target_dir[, ...])

A class for managing and organizing files within a directory structure.

class geoprior.utils.io_utils.FileManager(root_dir, target_dir, file_types=None, name_patterns=None, move=False, overwrite=False, create_dirs=False)

Bases: BaseClass

A class for managing and organizing files within a directory structure. This class provides methods to filter, organize, and rename files in bulk based on file extensions and name patterns. All operations are executed via the run method to ensure proper initialization and state management.

Mathematically, if \(\mathcal{F}\) represents the set of files in the root directory and \(\phi(f)\) is a filtering function that selects files based on file type and name pattern, then the FileManager produces a subset

\[\mathcal{F}' = \{ f \in \mathcal{F} \mid \phi(f) \} \tag{1}\]

and performs operations such as moving or copying to reorganize these files into a target directory.

Parameters:

root_dir (str) – The root directory containing the files to be managed. This directory must exist and contain the files subject to filtering.
target_dir (str) – The directory where the organized files will be placed. If necessary, this directory can be created when create_dirs is True.
file_types (list of str, optional) – A list of file extensions (e.g., ['.csv', '.json']) used to filter the files. If None, no file type filtering is applied.
name_patterns (list of str, optional) – A list of substrings (e.g., ['2023', 'report']) to filter file names. If None, all file names are included.
move (bool, optional) – If True, files are moved from the source to the target directory; otherwise, they are copied. Default is False.
overwrite (bool, optional) – If True, existing files in the target directory will be overwritten. If False, existing files are skipped. Default is False.
create_dirs (bool, optional) – If True, missing directories in the target path are created. Default is False.

Variables:

root_dir (str) – The validated root directory from which files are managed.
target_dir (str) – The directory where the processed files are stored.

run(pattern, replacement)

Executes the file organization process. It filters files using the criteria provided at initialization and, if a pattern and corresponding replacement are given, performs bulk renaming.

Parameters:

pattern (str | None)
replacement (str | None)

get_processed_files()

Returns a list of file paths that have been processed and organized into the target directory.

Return type: list[str]

Examples

>>> from geoprior.utils.io_utils import FileManager
>>> manager = FileManager(
...     root_dir='data/raw',
...     target_dir='data/processed',
...     file_types=['.csv', '.json'],
...     name_patterns=['2023', 'report'],
...     move=True,
...     overwrite=True,
...     create_dirs=True
... )
>>> manager.run(pattern='old', replacement='new')
>>> processed = manager.get_processed_files()
>>> print(processed)

Notes

The public method run orchestrates the file management operations by first calling the internal method _organize_files() to filter and move or copy files from the source directory to the target directory. If renaming is needed, _rename_files() is invoked with the specified pattern and replacement. The method get_processed_files() compiles a list of all files that have been organized, based on a walk of the target directory. The directory traversal and file-operation APIs are documented in [29, 30].
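The filter-then-transfer flow described in these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not the FileManager implementation: the helper names `select_files` and `organize` are hypothetical, and the sketch assumes that a file passes the filter when its extension is listed in file_types and at least one entry of name_patterns occurs in its name.

```python
import os
import shutil
import tempfile


def select_files(root_dir, file_types=None, name_patterns=None):
    """Return the paths under root_dir that pass the (assumed) filter phi."""
    selected = []
    for folder, _dirs, names in os.walk(root_dir):
        for name in names:
            # Assumption: the extension must be one of file_types (if given).
            ext_ok = file_types is None or os.path.splitext(name)[1] in file_types
            # Assumption: any one of name_patterns must occur in the name.
            name_ok = name_patterns is None or any(p in name for p in name_patterns)
            if ext_ok and name_ok:
                selected.append(os.path.join(folder, name))
    return sorted(selected)


def organize(root_dir, target_dir, move=False, overwrite=False, **filters):
    """Copy (or move) the selected files into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    transfer = shutil.move if move else shutil.copy2
    done = []
    for src in select_files(root_dir, **filters):
        dst = os.path.join(target_dir, os.path.basename(src))
        if os.path.exists(dst):
            if not overwrite:
                continue  # skip existing files unless overwrite is requested
            os.remove(dst)
        transfer(src, dst)
        done.append(dst)
    return done


# Small demonstration in throwaway directories.
root = tempfile.mkdtemp()
target = os.path.join(tempfile.mkdtemp(), "processed")
for name in ("report_2023.csv", "notes_2023.txt", "old.json"):
    open(os.path.join(root, name), "w").close()

print(organize(root, target, file_types=[".csv", ".json"], name_patterns=["2023"]))
```

With move left at False the source files stay in place, which mirrors the documented default of copying rather than moving.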

See also

shutil.move: To move files between directories.
shutil.copy2: To copy files while preserving file metadata.

__init__(root_dir, target_dir, file_types=None, name_patterns=None, move=False, overwrite=False, create_dirs=False)

Initialize the base class.

Parameters:

verbose (int, optional) – Verbosity level controlling logging (0 to 3). Defaults to 0.
root_dir (str)
target_dir (str)
file_types (list[str] | None)
name_patterns (list[str] | None)
move (bool)
overwrite (bool)
create_dirs (bool)

run(pattern=None, replacement=None)

Executes file organization operations.

This method filters files based on the specified file types and name patterns, then organizes them by moving or copying into the target directory. Additionally, if a pattern is provided, file names containing that pattern are renamed by replacing the pattern with the specified replacement.

Parameters:

pattern (str, optional) – The substring to search for in file names. If provided, file names containing this pattern will be renamed.
replacement (str, optional) – The string to replace pattern with in file names. Required if pattern is specified.

Returns:

self – The instance itself after executing operations.

Return type:

FileManager

Examples

>>> manager = FileManager(...)
>>> manager.run(pattern='old', replacement='new')

get_processed_files()

Retrieves a list of processed files in the target directory.

Returns:

files – A list containing the full paths of the files that have been organized into the target directory.

Return type:

list of str

Examples

>>> manager = FileManager(...)
>>> manager.run()
>>> files = manager.get_processed_files()
>>> print(files)

fit(pattern=None, replacement=None)

Executes file organization operations.

This method filters files based on the specified file types and name patterns, then organizes them by moving or copying into the target directory. Additionally, if a pattern is provided, file names containing that pattern are renamed by replacing the pattern with the specified replacement.

Parameters:

pattern (str, optional) – The substring to search for in file names. If provided, file names containing this pattern will be renamed.
replacement (str, optional) – The string to replace pattern with in file names. Required if pattern is specified.

Returns:

self – The instance itself after executing operations.

Return type:

FileManager

Examples

>>> manager = FileManager(...)
>>> manager.run(pattern='old', replacement='new')

help(**kwargs)

my_params = FileManager( root_dir, target_dir, file_types=None, name_patterns=None, move=False, overwrite=False, create_dirs=False )

geoprior.utils.io_utils.cpath(savepath=None, dpath='_default_path_')

Ensures a directory exists for saving files, creating it if necessary.

Parameters:

savepath (str, optional) – The target directory to validate or create. If None, dpath is used as the directory.
dpath (str, default '_default_path_') – Default directory created in the current working directory if savepath is None.

Returns:

The absolute path to the validated or created directory.

Return type:

str

Examples

>>> from geoprior.utils.io_utils import cpath
>>> default_path = cpath()
>>> print(f"Files will be saved to: {default_path}")

>>> custom_path = cpath('/path/to/save')
>>> print(f"Files will be saved to: {custom_path}")

Notes

cpath validates the directory path and, if necessary, creates the directory tree. If a problem occurs during creation, an error message is printed.
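The behaviour described above can be approximated with pathlib. This is only a sketch: the name `ensure_dir` is hypothetical, and unlike the documented function it lets any OSError propagate instead of printing a message.

```python
import tempfile
from pathlib import Path


def ensure_dir(savepath=None, default="_default_path_"):
    """Create the directory tree if needed and return its absolute path."""
    # Fall back to a default folder in the current working directory.
    target = Path(savepath) if savepath is not None else Path.cwd() / default
    target.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return str(target.resolve())


# Demonstration inside a temporary directory.
base = Path(tempfile.mkdtemp())
print(ensure_dir(base / "results" / "run1"))
```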

See also

pathlib.Path.mkdir: Utility for directory creation.

geoprior.utils.io_utils.deserialize_data(filename, verbose=0)

Deserialize and load data from a serialized file using joblib or pickle.

The function attempts to load the serialized data from the provided file filename using joblib first. If joblib fails, it tries to load the data using pickle. An error is raised if both methods fail.

Parameters:

filename (str) – The name or path of the file containing the serialized data. This file is expected to be in a compatible format with either joblib or pickle.
verbose (int, optional) – Verbosity level. Messages indicating loading progress will be displayed if verbose is greater than 0.

Returns:

The data loaded from the serialized file, or None if loading fails.

Return type:

Any

Raises:

TypeError – If filename is not a string, as file paths must be provided as strings.
FileNotFoundError – If the specified filename does not exist or cannot be located.
IOError – If both joblib and pickle fail to deserialize the data from the file.
ValueError – If the file was successfully read but yielded no data (i.e., None).

Examples

>>> from geoprior.utils.io_utils import deserialize_data
>>> data = deserialize_data('path/to/serialized_data.pkl', verbose=1)
Data loaded successfully from 'path/to/serialized_data.pkl' using joblib.

Notes

The function first attempts deserialization with joblib to leverage efficient file handling for large datasets. If joblib encounters an error, it falls back to pickle, which provides broader compatibility with Python objects but may be less optimized for large datasets. Loader semantics for the two backends are documented in [31, 32].
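The joblib-then-pickle fallback can be sketched as below. This is an illustration under assumptions rather than the library's code: `load_with_fallback` is a hypothetical name, and joblib is treated as optional so that the sketch still runs when it is not installed.

```python
import pickle
import tempfile
from pathlib import Path

try:  # joblib is optional in this sketch
    import joblib
except ImportError:
    joblib = None


def load_with_fallback(filename):
    """Try joblib.load first, then pickle.load; raise IOError if both fail."""
    if not isinstance(filename, str):
        raise TypeError("filename must be a string")
    if not Path(filename).is_file():
        raise FileNotFoundError(filename)
    errors = []
    if joblib is not None:
        try:
            return joblib.load(filename)
        except Exception as exc:  # remember the failure, then try pickle
            errors.append(f"joblib: {exc}")
    try:
        with open(filename, "rb") as fh:
            return pickle.load(fh)
    except Exception as exc:
        errors.append(f"pickle: {exc}")
    raise IOError("could not deserialize: " + "; ".join(errors))


# Round trip through a temporary pickle file.
demo_file = str(Path(tempfile.mkdtemp()) / "data.pkl")
with open(demo_file, "wb") as fh:
    pickle.dump({"a": 1}, fh)
print(load_with_fallback(demo_file))
```

Both backends can execute arbitrary code while unpickling, so files from untrusted sources should not be loaded this way.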

See also

joblib.load: Joblib’s load function for fast I/O operations on large data.
pickle.load: Pickle’s load function for serializing and deserializing Python objects.

geoprior.utils.io_utils.extract_tar_with_progress(tar, member, path)

Extracts a single file from a tar archive with a progress bar.

Parameters:

tar (tarfile.TarFile) – Opened tar file object.
member (tarfile.TarInfo) – Tar member (file) to be extracted.
path (Path) – Directory path where the file will be extracted.

Examples

>>> import tarfile
>>> from pathlib import Path
>>> from geoprior.utils.io_utils import extract_tar_with_progress
>>> with tarfile.open('data.tar.gz', 'r:gz') as tar:
...     member = tar.getmember('file.csv')
...     extract_tar_with_progress(tar, member, Path('output_dir'))

Notes

Uses tqdm for progress tracking of the file extraction process.

geoprior.utils.io_utils.fetch_tgz_from_url(data_url, tgz_filename, data_path=None, file_to_retrieve=None, **kwargs)

Downloads a .tgz file from a specified URL, saves it to a directory, and optionally extracts a specific file from the archive.

This function retrieves a .tgz file from the provided data_url and saves it to the specified data_path directory. If file_to_retrieve is specified, the function will extract only that file from the archive; otherwise, the entire archive will be extracted.

Parameters:

data_url (str) – The URL to download the .tgz file from.
tgz_filename (str) – The name to assign to the downloaded .tgz file.
data_path (Union[str, Path], optional) – Directory where the downloaded file will be saved. Defaults to a ‘tgz_data’ directory in the current working directory if not specified.
file_to_retrieve (str, optional) – Specific filename to extract from the .tgz archive. If not provided, the entire archive is extracted.
**kwargs (dict) – Additional keyword arguments to pass to the extraction method.

Returns:

Path to the extracted file if a specific file was requested; otherwise, returns None.

Return type:

Optional[Path]

Raises:

FileNotFoundError – If the specified file_to_retrieve is not found in the archive.

Examples

>>> from geoprior.utils.io_utils import fetch_tgz_from_url
>>> data_url = 'https://example.com/data.tar.gz'
>>> extracted_file = fetch_tgz_from_url(
...     data_url, 'data.tar.gz', data_path='data_dir', file_to_retrieve='file.csv')
>>> print(extracted_file)

Notes

Uses the tqdm progress bar for tracking download progress.

geoprior.utils.io_utils.fetch_tgz_locally(tgz_file, filename, savefile='tgz', rename_outfile=None)

Extracts a specific file from a local .tgz archive and optionally renames it.

This function fetches a specific file filename from a local tar archive located at tgz_file, and saves it to savefile. If rename_outfile is specified, the file is renamed after extraction.

Parameters:

tgz_file (str) – Full path to the tar file.
filename (str) – Name of the target file to extract from the archive.
savefile (str, optional) – Destination directory for the extracted file, defaulting to ‘tgz’.
rename_outfile (str, optional) – New name for the fetched file. If not provided, retains the original name.

Returns:

Full path to the fetched and possibly renamed file.

Return type:

str

Raises:

FileNotFoundError – If the tgz_file or the specified filename is not found.

Examples

>>> from geoprior.utils.io_utils import fetch_tgz_locally
>>> fetched_file = fetch_tgz_locally(
...     'path/to/archive.tgz', 'file.csv', savefile='extracted', rename_outfile='renamed.csv')
>>> print(fetched_file)
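Extracting one member from a local archive can be sketched with the standard tarfile module. The helper name `extract_one` and its arguments are hypothetical and only mirror the documented parameters; the real function may differ in how it writes and renames the file.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_one(tgz_file, filename, savedir, rename_to=None):
    """Extract a single member of a .tgz archive and optionally rename it."""
    tgz_file = Path(tgz_file)
    if not tgz_file.is_file():
        raise FileNotFoundError(tgz_file)
    savedir = Path(savedir)
    savedir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tgz_file, "r:gz") as tar:
        try:
            member = tar.getmember(filename)
        except KeyError:
            raise FileNotFoundError(f"{filename!r} not in {tgz_file}") from None
        source = tar.extractfile(member)  # None for non-regular members
        if source is None:
            raise FileNotFoundError(f"{filename!r} is not a regular file")
        out = savedir / (rename_to or Path(filename).name)
        out.write_bytes(source.read())
    return str(out)


# Build a tiny archive in a temporary folder, then fetch one file from it.
tmp = Path(tempfile.mkdtemp())
payload = b"a,b\n1,2\n"
info = tarfile.TarInfo("file.csv")
info.size = len(payload)
with tarfile.open(tmp / "archive.tgz", "w:gz") as tar:
    tar.addfile(info, io.BytesIO(payload))

print(extract_one(tmp / "archive.tgz", "file.csv", tmp / "extracted", "renamed.csv"))
```

Reading the member through extractfile and writing the bytes to a chosen path avoids trusting paths stored inside the archive.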

geoprior.utils.io_utils.dummy_csv_translator(csv_fn, pf, delimiter=':', destfile='pme.en.csv')

Translate a CSV file using a dictionary created from a markdown-style parser file.

Parameters:

csv_fn (str) – Path to the source CSV file.
pf (str) – Path to the markdown-style file used to create the translation dictionary.
delimiter (str, default ':') – Delimiter used in the parser file to separate key-value pairs.
destfile (str, default 'pme.en.csv') – Name of the destination file for the translated CSV.

Returns:

DataFrame – Translated CSV data as a DataFrame.
list – List of untranslated terms found in the source CSV.

Notes

This function uses parse_md to read the parser file and apply translations to the CSV content.
Missing translations are collected and returned for review.

Examples

>>> from geoprior.utils.io_utils import dummy_csv_translator
>>> df, missing = dummy_csv_translator(
...     "data.csv", "parser_file.md", delimiter=":", destfile="output.csv")
>>> print(df.head())
>>> print(missing)

geoprior.utils.io_utils.fetch_json_data_from_url(url, todo='load')

Retrieve and parse JSON data from a URL.

Parameters:

url (str) – Universal Resource Locator (URL) from which JSON data is fetched.
todo ({'load', 'dump'}, default 'load') – Action to perform with JSON: ‘load’ loads JSON data from the URL; ‘dump’ parses and prepares data from the URL for saving in a JSON file.

Returns:

A tuple of todo action, filename (or data source), and parsed data.

Return type:

tuple

Raises:

urllib.error.URLError – If there is an issue accessing the URL.

Notes

The function uses json.loads to parse data directly from a URL response, supporting convenient access to web-hosted JSON content.
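Parsing a URL response with json.loads can be sketched as follows. The helper `read_json_from_url` is hypothetical and returns only the parsed data, not the tuple documented above. To keep the example runnable offline, it reads from a `data:` URL, which urllib handles without network access.

```python
import base64
import json
from urllib.request import urlopen


def read_json_from_url(url):
    """Fetch a URL and parse the response body as JSON."""
    with urlopen(url) as response:  # raises urllib.error.URLError on failure
        return json.loads(response.read().decode("utf-8"))


# A self-contained data: URL standing in for a web-hosted JSON document.
body = json.dumps({"name": "demo", "values": [1, 2]}).encode("utf-8")
demo_url = "data:application/json;base64," + base64.b64encode(body).decode("ascii")

print(read_json_from_url(demo_url))
```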

geoprior.utils.io_utils.get_config_fname_from_varname(data, config_fname=None, config='.yml')

Generate a filename based on a variable name for YAML configuration.

Parameters:

data (Any) – The data object from which the variable name will be derived to create a YAML configuration filename.
config_fname (str, optional) – Custom configuration filename. If None, the name of data will be used as the filename.
config (str, default '.yml') – The file extension/type for the configuration file. Can be ‘.yml’, ‘.json’, or ‘.csv’.

Returns:

A suitable filename for saving the configuration data.

Return type:

str

Raises:

ValueError – If config_fname cannot be derived or an invalid file type is provided.

Notes

This function supports dynamic filename generation based on variable names, which aids in maintaining a clear configuration structure for serialized data. Files are saved with appropriate extensions based on the config type.

geoprior.utils.io_utils.get_valid_key(input_key, default_key, substitute_key_dict=None, regex_pattern='[#&*@!,;\\s]\\s*', deep_search=True)

Validates an input key and substitutes it with a valid key if necessary, based on a mapping of valid keys to their possible substitutes. If the input key is not provided or is invalid, a default key is used.

Parameters:

input_key (str) – The key to validate and possibly substitute.
default_key (str) – The default key to use if input_key is None, empty, or not found in the substitute mapping.
substitute_key_dict (dict, optional) – A mapping of valid keys to lists of their possible substitutes. This allows for flexible key substitution and validation.
regex_pattern (str, default = '[#&*@!,;\s]\s*') – The base pattern used to split the text into columns.
deep_search (bool, default True) – If True, the key search is not sensitive to letter case or to whether numeric data is included.

Returns:

A valid key, which is either the original input_key if valid, a substituted key if the original was found in the substitute mappings, or the default_key.

Return type:

str

Notes

This function also leverages an external validation through key_checker for a deep search validation, ensuring the returned key is within the set of valid keys.

Example

>>> from geoprior.utils.io_utils import get_valid_key
>>> substitute_key_dict = {'valid_key1': ['vk1', 'key1'], 'valid_key2': ['vk2', 'key2']}
>>> get_valid_key('vk1', 'default_key', substitute_key_dict)
'valid_key1'
>>> get_valid_key('unknown_key', 'default_key', substitute_key_dict)
'KeyError...'
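The substitution logic from the parameter descriptions can be sketched as a plain dictionary lookup. This is an assumption-laden illustration: `resolve_key` is a hypothetical helper, it performs exact matching only (no regex splitting or deep search), and it falls back to the default key for unknown input as the parameter text states, whereas the real function may behave differently for unknown keys.

```python
def resolve_key(input_key, default_key, substitutes=None):
    """Map an alias to its valid key, or fall back to default_key."""
    substitutes = substitutes or {}
    if not input_key:  # None or empty string
        return default_key
    for valid, aliases in substitutes.items():
        # Accept either the valid key itself or any of its listed aliases.
        if input_key == valid or input_key in aliases:
            return valid
    return default_key  # assumption: unknown keys fall back to the default


mapping = {"valid_key1": ["vk1", "key1"], "valid_key2": ["vk2", "key2"]}
print(resolve_key("vk1", "default_key", mapping))
print(resolve_key(None, "default_key", mapping))
```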

geoprior.utils.io_utils.key_checker(keys, valid_keys, regex=None, pattern=None, deep_search=False)

Check whether a given key exists in valid_keys and return a list if multiple keys are found.

Parameters:

keys (str, list of str) – Key or keys to look for in valid_keys.
valid_keys (list) – List of valid keys.

regex (re object, optional) –

Regular expression object. The default is:

>>> import re
>>> re.compile (r'[_#&*@!_,;\s-]\s*', flags=re.IGNORECASE)

pattern (str, default = '[_#&*@!_,;\s-]\s*') – The base pattern used to split the text into columns.
deep_search (bool, default False) – If True, the key search is not sensitive to letter case or to whether numeric data is included.

Returns:

keys – List of keys that exists in the valid_keys.

Return type:

str or list

Examples

>>> from geoprior.utils.io_utils import key_checker
>>> key_checker('h502', valid_keys= ['h502', 'h253','h2601'])
'h502'
>>> key_checker('h502+h2601', valid_keys= ['h502', 'h253','h2601'])
['h502', 'h2601']
>>> key_checker('h502 h2601', valid_keys= ['h502', 'h253','h2601'])
['h502', 'h2601']
>>> key_checker(['h502',  'h2601'], valid_keys= ['h502', 'h253','h2601'])
['h502', 'h2601']
>>> key_checker(['h502',  'h2602'], valid_keys= ['h502', 'h253','h2601'])
UserWarning: key 'h2602' is missing in ['h502', 'h2602']
'h502'
>>> key_checker(['502',  'H2601'], valid_keys= ['h502', 'h253','h2601'],
...             deep_search=True )
['h502', 'h2601']

geoprior.utils.io_utils.key_search(keys, default_keys, parse_keys=True, regex=None, pattern=None, deep=Ellipsis, raise_exception=Ellipsis)

Find key in a list of default keys and select the best match.

Parameters:

keys (str or list) – A key or a list of keys. When multiple keys are passed as a single string, separate them with spaces.
default_keys (str or list) – The candidate keys to search. Can be literal text. When literal text is passed, it is better to provide a regex so that unwanted characters are skipped and the text is parsed properly.
parse_keys (bool, default True) –
Parse the literal string using the default pattern and regex.

Added in version 0.2.7.
regex (re object, optional) –
Regular expression object. The regex specifies the kind of data to parse. The default is:
```
>>> import re
>>> re.compile (r'[_#&*@!_,;\s-]\s*', flags=re.IGNORECASE)
```
pattern (str, default = '[_#&*@!_,;\s-]\s*') – The base pattern used to split the text into columns. The pattern matters especially when some characters are part of a word rather than separators. For example, for a data column named ‘DH_Azimuth’, if a pattern is not explicitly provided, the default pattern splits the name into two separate words, which is far from the expected result.
deep (bool, default False) – If True, the search is not sensitive to letter case.
raise_exception (bool, default False) – Raise an error when the key is not found.

Returns:

List of valid keys, or None if no key is found (default).

Return type:

list or None

Examples

>>> from geoprior.utils.io_utils import key_search
>>> key_search('h502-hh2601', default_keys= ['h502', 'h253','HH2601'])
['h502']
>>> key_search('h502-hh2601', default_keys= ['h502', 'h253','HH2601'],
...            deep=True)
['h502', 'HH2601']
>>> key_search('253', default_keys= ("I m here to find key among h502, "
...                                  "h253 and HH2601"))
['h253']
>>> key_search ('east', default_keys= ['DH_East', 'DH_North']  , deep =True,)
['East']
>>> key_search ('east', default_keys= ['DH_East', 'DH_North'],
...             deep =True,parse_keys= False)
['DH_East']
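The effect of the splitting pattern on a name such as ‘DH_Azimuth’ can be checked directly with the re module. The helper `split_keys` is hypothetical and only shows the splitting step, not the matching performed by key_search. The custom pattern is an illustrative choice that simply omits the underscore.

```python
import re

# Default pattern as documented above; note that it treats "_" as a separator.
DEFAULT_PATTERN = r"[_#&*@!_,;\s-]\s*"


def split_keys(text, pattern=DEFAULT_PATTERN):
    """Split text on the separator pattern and drop empty pieces."""
    return [part for part in re.split(pattern, text) if part]


print(split_keys("h502 h2601"))   # ['h502', 'h2601']
print(split_keys("DH_Azimuth"))   # ['DH', 'Azimuth']

# A custom pattern without "_" keeps the column name in one piece.
print(split_keys("DH_Azimuth", pattern=r"[#&*@!,;\s-]\s*"))  # ['DH_Azimuth']
```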

geoprior.utils.io_utils.load_serialized_data(filename, verbose=0)

Load data from a serialized file (e.g., pickle or joblib format).

Parameters:

filename (str) – Name of the file to load data from.
verbose (int, default 0) – Verbosity level controlling the amount of output: 0 gives no output; values greater than 2 give detailed loading process messages.

Returns:

Data loaded from the file, or None if deserialization fails.

Return type:

Any

Raises:

TypeError – If filename is not a string.
FileExistsError – If the specified file does not exist.

Examples

>>> from geoprior.utils.io_utils import load_serialized_data
>>> data = load_serialized_data('data/my_data.pkl', verbose=3)

Notes

This function attempts to load serialized data using joblib and falls back to pickle if needed. Verbose output provides feedback on the loading process and success or failure of each step.

See also

joblib.load: High-performance loading utility.
pickle.load: General-purpose Python serialization library.

geoprior.utils.io_utils.load_csv(data_path, delimiter=',', **kwargs)

Loads a CSV file into a pandas DataFrame.

This function reads a comma-separated values (CSV) file into a pandas DataFrame, with the ability to specify a custom delimiter. It provides support for additional options passed to pandas.read_csv for more granular control over the data loading process.

Parameters:

data_path (str) – The file path to the CSV file that is to be loaded. The file path must lead to a .csv file. If the file does not exist at the specified path, a FileNotFoundError is raised.
delimiter (str, optional) – The character used to separate values in the CSV file. The default is , for standard CSVs. If a different delimiter is used in the file (e.g., ;), it can be specified here.
**kwargs (dict) – Additional keyword arguments that will be passed directly to pandas.read_csv. For instance, users can specify header, index_col, dtype, and other options supported by read_csv for more customized data handling.

Returns:

A pandas DataFrame containing the loaded data, with the specified options applied.

Return type:

DataFrame

Raises:

FileNotFoundError – If the specified file does not exist at the provided data_path.
ValueError – If the file specified by data_path is not a CSV file (i.e., does not have a .csv extension), a ValueError is raised to ensure correct file type.

Notes

This function simplifies the process of loading CSV data into a DataFrame, with a straightforward parameter for delimiter customization and full access to pandas.read_csv options. It is ideal for basic CSV loading tasks, as well as more complex ones requiring specific column handling, type casting, and missing value handling, which can be passed via **kwargs. CSV-oriented DataFrame loading patterns are discussed in McKinney [27].

Examples

Suppose you have a CSV file example.csv with the following content:

    name,age,city
    Alice,30,New York
    Bob,25,Los Angeles

To load this file into a DataFrame:

>>> from geoprior.utils.io_utils import load_csv
>>> df = load_csv('example.csv')
>>> print(df)
     name  age         city
0   Alice   30     New York
1     Bob   25  Los Angeles

If the file uses a semicolon (;) as the delimiter:

>>> df = load_csv('example.csv', delimiter=';')

Additionally, you can pass custom read_csv parameters through **kwargs, such as specifying a column as the index:

>>> df = load_csv('example.csv', index_col='name')
>>> print(df)
       age         city
name
Alice    30     New York
Bob      25  Los Angeles

See also

pandas.read_csv: Full documentation for loading CSV files into a DataFrame with detailed parameter options.

geoprior.utils.io_utils.move_cfile(cfile, savepath=None, **ckws)

Moves a file to the specified path. If moving fails, copies and deletes the original.

Parameters:

cfile (str) – Name of the file to move.
savepath (str, optional) – Target directory. If not specified, uses default path via cpath.

Returns:

The new file path and a confirmation message.

Return type:

Tuple[str, str]

Examples

>>> from geoprior.utils.io_utils import move_cfile
>>> new_path, msg = move_cfile('myfile.txt', 'new_directory')
>>> print(new_path, msg)
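The move-then-fallback behaviour can be sketched with the standard library. This is an illustration only: `move_with_fallback` is a hypothetical name, and the use of os.replace for the first attempt is an assumption about how a move might be tried before copying.

```python
import os
import shutil
import tempfile


def move_with_fallback(src, dest_dir):
    """Move src into dest_dir; if renaming fails, copy it and delete the original."""
    os.makedirs(dest_dir, exist_ok=True)
    dst = os.path.join(dest_dir, os.path.basename(src))
    try:
        os.replace(src, dst)  # fast path, works within one filesystem
    except OSError:
        shutil.copy2(src, dst)  # fallback: copy with metadata ...
        os.remove(src)          # ... then delete the original
    return dst, f"{os.path.basename(src)!r} moved to {dest_dir!r}"


# Demonstration with a throwaway file.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "myfile.txt")
with open(source, "w", encoding="utf-8") as fh:
    fh.write("hello")

new_path, msg = move_with_fallback(source, os.path.join(workdir, "new_directory"))
print(new_path, msg)
```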

geoprior.utils.io_utils.parse_csv(csv_fn=None, data=None, todo='reader', fieldnames=None, savepath=None, header=False, verbose=0, **csvkws)

Parses a CSV file or serializes data to a CSV file.

This function allows loading (reading) from or dumping (writing) to a CSV file. It supports standard CSV and dictionary-based CSV formats.

Parameters:

csv_fn (str, optional) – The CSV filename for reading or writing. For writing operations, if data is provided and todo is set to ‘writer’ or ‘dictwriter’, this specifies the output CSV filename.
data (list, optional) – Data to write in the form of a list of lists or dictionaries.
todo (str, default 'reader') – Specifies the operation type: ‘reader’ or ‘dictreader’ reads data from a CSV file; ‘writer’ or ‘dictwriter’ writes data to a CSV file.
fieldnames (list of str, optional) – List of keys for dictionary-based writing to specify the field order.
savepath (str, optional) – Directory to save the CSV file when writing. Defaults to ‘_savecsv_’ if not provided and the path does not exist.
header (bool, default False) – If True, includes headers when writing with DictWriter.
verbose (int, default 0) – Controls the verbosity level for output messages.
csvkws (dict, optional) – Additional arguments passed to csv.writer or csv.DictWriter.

Returns:

Parsed data from the CSV file, as a list of lists or a list of dictionaries, based on the operation. Returns None when writing.

Return type:

Union[List[Dict], List[List[str]], None]

Notes

For writing data, the method uses either csv.writer for regular CSV or csv.DictWriter for dictionary-based CSV depending on the value of todo.

Examples

>>> from geoprior.utils.io_utils import parse_csv
>>> data = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
>>> parse_csv(csv_fn='output.csv', data=data, todo='dictwriter', fieldnames=['name', 'age'])
>>> loaded_data = parse_csv(csv_fn='output.csv', todo='dictreader', fieldnames=['name', 'age'])
>>> print(loaded_data)
[{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
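For reference, the dictionary-based round trip that csv.DictWriter and csv.DictReader provide can be run with the standard library alone. This sketch does not call parse_csv; the temporary file path is illustrative. Note that the csv module itself returns every field as a string, so numeric columns need explicit conversion after reading.

```python
import csv
import os
import tempfile

rows = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
fieldnames = ["name", "age"]
path = os.path.join(tempfile.mkdtemp(), "output.csv")

# Write a header row followed by one line per dictionary.
with open(path, "w", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Read the file back; the header row supplies the dictionary keys.
with open(path, newline="", encoding="utf-8") as fh:
    loaded = list(csv.DictReader(fh))

print(loaded)

# Convert the numeric column back to int where needed.
typed = [{"name": r["name"], "age": int(r["age"])} for r in loaded]
print(typed == rows)
```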

geoprior.utils.io_utils.parse_json(json_fn=None, data=None, todo='load', savepath=None, verbose=0, **jsonkws)

Parse and manage JSON configuration files, either loading data from or saving data to a JSON file.

Parameters:

json_fn (str, optional) – JSON filename or URL. If data is provided and todo is ‘dump’, json_fn will be used as the output filename. If todo is ‘load’, json_fn is the input filename or URL.
data (Any, optional) – Data in Python object format to serialize and save if todo is ‘dump’.
todo ({'load', 'loads', 'dump', 'dumps'}, default 'load') – Action to perform with JSON: ‘load’ loads data from a JSON file; ‘loads’ parses a JSON string; ‘dump’ serializes data to a JSON file; ‘dumps’ serializes data to a JSON string.
savepath (str, optional) – Path where the JSON file will be saved if todo is ‘dump’. If savepath does not exist, it will save to the default path ‘_savejson_’.
verbose (int, default 0) – Controls verbosity of output messages.
**jsonkws (dict) – Additional keyword arguments passed to json.dump or json.dumps when saving data.

Returns:

The data loaded from the JSON file or URL if todo is ‘load’, or data after saving if todo is ‘dump’.

Return type:

Any

Raises:

json.JSONDecodeError – If there is an issue with reading or writing the JSON file.
TypeError – If the JSON file or data cannot be processed.

Notes

This function uses json.load, json.loads, json.dump, and json.dumps for efficient handling of JSON files and strings.

See also

fetch_json_data_from_url: Fetches JSON data from a given URL.
get_config_fname_from_varname: Utility for generating JSON configuration filenames based on variable names.

geoprior.utils.io_utils.parse_md(pf, delimiter=':')

Parse a markdown-style file with key-value pairs separated by a delimiter.

Parameters:

pf (str) – Path to the markdown file containing key-value pairs.
delimiter (str, default ':') – Delimiter used to separate key-value pairs.

Yields:

Tuple[str, str] – A tuple containing the key and processed value.

Raises:

IOError – If the provided path does not lead to a valid file.

Notes

This function yields key-value pairs by reading the file line-by-line.
It applies sanitize_unicode_string to keys to ensure data consistency.

Examples

>>> from geoprior.utils.io_utils import parse_md
>>> list(parse_md('parser_file.md', delimiter=':'))
[('key1', 'Value1'), ('key2', 'Value2')]
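
Line-by-line parsing of delimited key-value pairs can be sketched with a plain generator. This is an illustration under stated assumptions: iter_key_values is a hypothetical name, keys are only stripped of whitespace (the key sanitization mentioned in the Notes is omitted), and lines without the delimiter are skipped.

```python
import os
import tempfile


def iter_key_values(path, delimiter=":"):
    """Hypothetical generator: yield (key, value) for each delimited line."""
    if not os.path.isfile(path):
        raise IOError(f"Not a valid file: {path!r}")
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if delimiter not in line:
                continue  # skip blank lines and lines without a pair
            # Split on the first delimiter only, so values may contain it.
            key, value = line.split(delimiter, 1)
            yield key.strip(), value.strip()


with tempfile.TemporaryDirectory() as tmp:
    md_path = os.path.join(tmp, "parser_file.md")
    with open(md_path, "w", encoding="utf-8") as fh:
        fh.write("key1: Value1\n\nkey2: Value2\n")
    pairs = list(iter_key_values(md_path))
```

Because the function is a generator, the file check runs when iteration starts rather than when the function is called.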

geoprior.utils.io_utils.parse_yaml(yml_fn=None, data=None, todo='load', savepath=None, verbose=0, **ymlkws)[source]#

Parse and handle YAML configuration files for loading or saving data.

Parameters:

yml_fn (str, optional) – The YAML filename. If data is provided and todo is set to ‘dump’, yml_fn will be used as the output filename. If todo is set to ‘load’, yml_fn is the input filename to read from.
data (Any, optional) – Data in a Python object format that will be serialized and saved as a YAML file if todo is ‘dump’.
todo ({'load', 'dump'}, default 'load') – Action to perform with the YAML file:
- ‘load’: Load data from the YAML file specified by yml_fn.
- ‘dump’: Serialize data into a YAML format and save to yml_fn.
savepath (str, optional) – Path where the YAML file will be saved if todo is ‘dump’. If not provided, a default path will be used. The function will ensure that the path exists.
verbose (int, default 0) – Controls verbosity of output messages.
**ymlkws (dict) – Additional keyword arguments passed to yaml.dump when saving data.

Returns:

The data loaded from the YAML file if todo is ‘load’, or data after saving if todo is ‘dump’.

Return type:

Any

Raises:

yaml.YAMLError – If there is an issue with reading or writing the YAML file.

Notes

This function uses safe_load and safe_dump methods from PyYAML for secure handling of YAML files.

See also

get_config_fname_from_varname: Utility for generating YAML configuration filenames based on variable names.

geoprior.utils.io_utils.print_cmsg(cfile, todo='load', config='YAML')[source]#

Generates output message for configuration file operations.

Parameters:

cfile (str) – Name of the configuration file.
todo (str, default 'load') – Operation performed (‘load’ or ‘dump’).
config (str, default 'YAML') – Type of configuration file (e.g., ‘YAML’, ‘CSV’, ‘JSON’).

Returns:

Confirmation message for the configuration operation.

Return type:

str

Examples

>>> from geoprior.utils.io_utils import print_cmsg
>>> msg = print_cmsg('config.yml', 'dump')
>>> print(msg)
--> YAML 'config.yml' data was successfully saved.

geoprior.utils.io_utils.rename_files(src_files, dst_files, basename=None, extension=None, how='py', prefix=True, keep_copy=True, trailer='_', sortby=None, **kws)[source]#

Rename files from one set of names or paths to another.

Parameters:

src_files (str or list of str) – Source files or a directory containing files to rename.
dst_files (str or list of str) – Destination file names or destination directory.
basename (str or None, optional) – Base name used when generating numbered destination files.
extension (str or None, optional) – Optional extension filter when src_files is a directory.
how (str, optional) – Numbering convention used when destination names are generated.
prefix (bool, optional) – Whether generated numbering is appended after the basename.
keep_copy (bool, optional) – Whether to keep copies of the original files.
trailer (str, optional) – Separator inserted between the basename and the generated counter.
sortby (regex, callable, or None, optional) – Optional sort key used when collecting files from a directory.
**kws (dict) – Additional keyword arguments forwarded to os.rename.

Return type:

None

geoprior.utils.io_utils.sanitize_unicode_string(str_)[source]#

Removes spaces and replaces accented characters in a string.

Parameters:

str_ (str) – The string to sanitize.

Returns:

The sanitized string with removed spaces and replaced accents.

Return type:

str

Examples

>>> from geoprior.utils.io_utils import sanitize_unicode_string
>>> sentence = ('Nos clients sont extrêmement satisfaits '
...             'de la qualité du service fourni. En outre Nos clients '
...             'rachètent frequemment nos "services".')
>>> sanitize_unicode_string(sentence)
'nosclientssontextrmementsatisfaitsdelaqualitduservicefournienoutrenosclientsrachtentfrequemmentnosservices'
>>> sanitize_unicode_string("Élève à l'école")
'elevealecole'
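
One common way to replace accented characters is Unicode decomposition followed by an ASCII fold. The sketch below shows that technique only; strip_accents_and_spaces is a hypothetical name, and the lowercasing and removal of all non-alphanumeric characters are assumptions, so the library's actual rules may differ.

```python
import re
import unicodedata


def strip_accents_and_spaces(text):
    """Hypothetical sanitizer: ASCII-fold accents, lowercase, drop non-alphanumerics."""
    # NFKD splits 'é' into 'e' plus a combining accent; the ASCII encode drops the accent.
    folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Remove spaces, punctuation, and anything else outside a-z and 0-9.
    return re.sub(r"[^a-z0-9]", "", folded.lower())


result = strip_accents_and_spaces("Élève à l'école")
print(result)
```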

geoprior.utils.io_utils.save_job(job, savefile, *, protocol=None, append_versions=True, append_date=True, fix_imports=True, buffer_callback=None, **job_kws)[source]#

Quickly save a job using joblib or Python’s persistent pickle module.

Parameters:

job (Any) – Object to save, preferably a dict of models.
savefile (str or path-like object) – Name of the file in which to store the model. The file argument must have a write() method that accepts a single bytes argument. It can thus be a file object opened for binary writing, an io.BytesIO instance, or any other custom object that meets this interface.
append_versions (bool, default True) – Append the joblib or pickle version, followed by the scikit-learn, NumPy, and pandas versions, to the filename. This records which versions were in use when the file was saved, which helps when loading it after the system or modules have been upgraded, particularly when the data has been stored for a long time and the original save date and versions have been forgotten.
append_date (bool, default True) – Append the current date to the filename.
protocol (int, optional) –
The optional protocol argument tells the pickler to use the given protocol; supported protocols are 0, 1, 2, 3, 4 and 5. The default protocol is 4. It was introduced in Python 3.4, and is incompatible with previous versions.

Specifying a negative protocol version selects the highest protocol version supported. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.
fix_imports (bool, default True) – If fix_imports is True and protocol is less than 3, pickle will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2.
buffer_callback (callable, optional) –
If buffer_callback is None (the default), buffer views are serialized into file as part of the pickle stream.

If buffer_callback is not None, then it can be called any number of times with a buffer view. If the callback returns a false value (such as None), the given buffer is out-of-band; otherwise the buffer is serialized in-band, i.e. inside the pickle stream.

It is an error if buffer_callback is not None and protocol is None or smaller than 5.
**job_kws (dict) – Additional keyword arguments passed to joblib.dump().

Returns:

The final filename where the job was saved.

Return type:

str

Notes

This function appends system-specific metadata like versions and date to the filename, which can aid in tracking compatibility over time.

Examples

>>> from geoprior.utils.io_utils import save_job
>>> model = {"key": "value"}  # Replace with actual model object
>>> savefile = save_job(model, "my_model", append_date=True, append_versions=True)
>>> print(savefile)
my_model.20240101.sklearn_v1.0.numpy_v1.21.joblib
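
The protocol and buffer_callback parameters follow the standard pickle semantics. The sketch below uses the pickle module directly rather than save_job, so it only illustrates the out-of-band behavior described above; the variable names are illustrative.

```python
import pickle

payload = bytearray(b"x" * 4096)
out_of_band = []

# list.append returns None (a false value), so each buffer is kept out-of-band.
stream = pickle.dumps(
    pickle.PickleBuffer(payload), protocol=5, buffer_callback=out_of_band.append
)

# The same buffers must be supplied again, in order, when loading.
restored = pickle.loads(stream, buffers=out_of_band)
restored_bytes = memoryview(restored).tobytes()
```

Since the 4096-byte buffer travels outside the pickle stream, the stream itself stays small; passing a protocol below 5 together with a callback raises an error, as noted in the parameter description.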

geoprior.utils.io_utils.save_path(nameOfPath)[source]#

Creates a directory if it does not exist.

Parameters:

nameOfPath (str) – Name or path of the directory to create.

Returns:

The path of the created directory. If it exists, returns the existing path.

Return type:

str

Examples

>>> save_path("test_directory")
'path/to/test_directory'

geoprior.utils.io_utils.serialize_data(data, filename=None, savepath=None, to=None, force=True, compress=None, pickle_protocol=5, verbose=0)[source]#

Serialize and save a Python object to a binary file using either joblib or pickle. This function is designed to be robust and versatile, handling multiple cases including file naming, overwriting behavior, and compression options.

The final file path is computed as:

\[\text{filepath} = \text{savepath} \oplus \text{filename} \tag{2}\]

where \(\oplus\) denotes string concatenation.

Parameters:

data (Any) – The Python object to serialize. The object must be compatible with either joblib.dump or pickle.dump.
filename (str, optional) – The target filename for the serialized data. If None, a filename is generated using the current timestamp, e.g., "__mydumpedfile_20230315_123045.pkl".
savepath (str, optional) – The directory in which to save the file. If not specified, the current working directory (os.getcwd()) is used. The directory is created if it does not exist.
to (str, optional) – The serialization method to use. Acceptable values are 'joblib' and 'pickle'. If None, the default is 'joblib'.
force (bool, default True) – If True, any existing file with the same name is overwritten. If False, a timestamp is appended to the filename to ensure uniqueness.
compress (int or str, optional) – Compression level or method for joblib.dump. If None, no compression is applied.
pickle_protocol (int, default pickle.HIGHEST_PROTOCOL) – The pickle protocol to use when serializing with pickle.dump.
verbose (int, default 0) – Controls the verbosity of output messages. Higher values produce more detailed logging during the serialization process.

Returns:

The full path to the saved serialized file.

Return type:

str

Examples

>>> from geoprior.utils.io_utils import serialize_data
>>> import numpy as np
>>> data = {"a": np.arange(10), "b": np.random.rand(10)}
>>> filepath = serialize_data(
...     data, filename="mydata.pkl", savepath="output",
...     to="pickle", force=False, verbose=1
... )
>>> print(filepath)
/current/working/directory/output/mydata_<timestamp>.pkl

Notes

The function first constructs the file path from savepath and filename. If a file already exists and force is False, a timestamp is appended to ensure uniqueness. Then, depending on the value of to, the function attempts to serialize the data using either joblib.dump (with optional compression via the compress parameter) or pickle.dump (using the specified pickle_protocol). If an error occurs during serialization, an IOError is raised.

See also

joblib.dump: Serialize objects to disk using Joblib.
pickle.dump: Serialize objects to disk using Pickle.
os.getcwd: Retrieve the current working directory.
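
The path construction and the force behavior described in the Notes can be sketched as follows. This is an assumption-laden illustration: dump_unique is a hypothetical name, only the pickle branch is shown, and the timestamp format is chosen for the example rather than taken from the library.

```python
import os
import pickle
import tempfile
from datetime import datetime


def dump_unique(data, filename, savepath, force=True):
    """Hypothetical sketch: join the path, optionally timestamp it, then pickle."""
    os.makedirs(savepath, exist_ok=True)           # create the directory if missing
    filepath = os.path.join(savepath, filename)
    if os.path.exists(filepath) and not force:
        # Keep the existing file and write to a timestamped sibling instead.
        stem, ext = os.path.splitext(filename)
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filepath = os.path.join(savepath, f"{stem}_{stamp}{ext}")
    try:
        with open(filepath, "wb") as fh:
            pickle.dump(data, fh, protocol=pickle.HIGHEST_PROTOCOL)
    except Exception as exc:
        raise IOError(f"Could not serialize to {filepath!r}") from exc
    return filepath


with tempfile.TemporaryDirectory() as tmp:
    first = dump_unique({"a": 1}, "mydata.pkl", tmp)
    second = dump_unique({"a": 2}, "mydata.pkl", tmp, force=False)
    with open(second, "rb") as fh:
        reloaded = pickle.load(fh)
    first_name, second_name = os.path.basename(first), os.path.basename(second)
```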

geoprior.utils.io_utils.serialize_data_in(data, filename=None, force=True, savepath=None, verbose=0)[source]#

Serializes a Python object to a binary file using either joblib or pickle.

This function attempts to serialize the input data using the joblib.dump method. If this attempt fails, it falls back to using pickle.dump. The final file path is constructed by concatenating the directory specified by savepath (or the current working directory if savepath is None) with the given filename. Mathematically, the file path is given by:

\[\text{filepath} = \text{savepath} \oplus \text{filename} \tag{3}\]

where \(\oplus\) denotes string concatenation.

Parameters:

data (Any) – The Python object to serialize. It must be compatible with either joblib or pickle serialization.
filename (str, optional) – The target filename for the serialized data. If None, a filename is generated using the current timestamp formatted as "%Y%m%d%H%M%S" (e.g., "serialized_20230315123045.pkl").
force (bool, default True) – Determines whether to overwrite an existing file with the same filename. If False, a timestamp is appended to the filename to ensure uniqueness.
savepath (str, optional) – The directory in which to save the serialized file. If not specified, the file is saved to the current working directory (os.getcwd()).
verbose (int, default 0) – Controls the verbosity of output messages. Higher values produce more detailed logging during the serialization process.

Returns:

The complete file path to which the data has been serialized.

Return type:

str

Examples

>>> from geoprior.utils.io_utils import serialize_data_in
>>> data = {"a": 1, "b": 2}
>>> filepath = serialize_data_in(data, filename='data.pkl',
...                              force=True, verbose=1)
>>> print(filepath)
/path/to/current/directory/data.pkl

Notes

The function first tries to serialize the input data using joblib.dump. In case of any exception during this attempt, it falls back to using pickle.dump. This dual approach improves robustness in diverse runtime environments where one serialization method might be unsupported or encounter issues with the given data type.

See also

joblib.dump: Serialize objects to disk using Joblib.
pickle.dump: Serialize objects to disk using Pickle.
os.getcwd: Retrieve the current working directory.
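
The joblib-first, pickle-second strategy can be sketched as a single try/except. The helper name dump_with_fallback is hypothetical, and the returned backend label is added here for illustration only; the documented function returns the file path.

```python
import os
import pickle
import tempfile


def dump_with_fallback(data, filepath):
    """Hypothetical sketch: try joblib first, fall back to pickle on any failure."""
    try:
        import joblib  # third-party; a missing install also triggers the fallback
        joblib.dump(data, filepath)
        return "joblib"
    except Exception:
        with open(filepath, "wb") as fh:
            pickle.dump(data, fh)
        return "pickle"


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.pkl")
    backend = dump_with_fallback({"a": 1, "b": 2}, target)
    file_written = os.path.getsize(target) > 0
```

Catching every exception keeps the sketch short; a stricter version might catch only the errors that indicate an unsupported object or a missing dependency.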

geoprior.utils.io_utils.spath(name_of_path)[source]#

Create a directory if it does not already exist.

Parameters:

name_of_path (str) – Path-like object to create if it doesn’t exist.

Returns:

The absolute path to the created or existing directory.

Return type:

str

Examples

>>> from geoprior.utils.io_utils import spath
>>> path = spath('data/saved_models')
>>> print(f"Directory available at: {path}")

Notes

spath is useful for quickly ensuring that a specific directory is available for storing files. It provides feedback if the directory already exists.
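
The create-if-missing behavior maps onto os.makedirs with exist_ok=True. A minimal sketch, assuming a hypothetical helper named ensure_dir and omitting the feedback message:

```python
import os
import tempfile


def ensure_dir(path):
    """Hypothetical helper: create `path` if needed and return its absolute form."""
    os.makedirs(path, exist_ok=True)   # no error if the directory already exists
    return os.path.abspath(path)


with tempfile.TemporaryDirectory() as tmp:
    created = ensure_dir(os.path.join(tmp, "data", "saved_models"))
    again = ensure_dir(created)        # second call leaves the directory untouched
    exists = os.path.isdir(created)
```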

geoprior.utils.io_utils.store_or_write_hdf5(df, key=None, mode='a', kind=None, path_or_buf=None, encoding='utf8', csv_sep=',', index=Ellipsis, columns=None, sanitize_columns=False, func=None, args=(), applyto=None, **func_kwds)[source]#

Store a DataFrame to HDF5, write it to CSV, or sanitize it in memory.

Parameters:

df (pandas.DataFrame or array-like) – Input data to store, export, or sanitize.
key (str or None, optional) – Group key used when storing to HDF5.
mode ({'a', 'w', 'r+'}, optional) – File mode used when opening an HDF5 store.
kind ({'store', 'write', None}, optional) – Operation to perform. Use 'store' for HDF5 output, 'write' for CSV export, or None to return a sanitized DataFrame.
path_or_buf (str, path-like, pandas.HDFStore, file-like, or None, optional) – Destination path, buffer, or open HDF5 store.
encoding (str, optional) – Output encoding used for CSV export.
csv_sep (str, optional) – Field separator used for CSV export.
index (bool, optional) – Whether to write the index when exporting to CSV.
columns (list of str or None, optional) – Column names used when constructing a DataFrame from an array.
sanitize_columns (bool, optional) – Whether to sanitize column names with the built-in regex helper.
func (callable or None, optional) – Optional custom sanitizing function applied to selected columns.
args (tuple, optional) – Positional arguments forwarded to func.
applyto (str or list of str or None, optional) – Column or columns to which func should be applied.
func_kwds (dict) – Keyword arguments forwarded to func.

Returns:

Returns None when kind is 'store' or 'write'. Otherwise returns the resulting DataFrame.

Return type:

None or pandas.DataFrame

geoprior.utils.io_utils.to_hdf5(data, fn, objname=None, close=True, **hdf5_kws)[source]#

Store a data object in Hierarchical Data Format 5 (HDF5).

This function serializes the input data into an HDF5 file. It supports both pandas DataFrames and NumPy arrays. If data is a DataFrame, it uses pd.HDFStore (which requires the pytables package) to store the data. If data is a NumPy array, it uses h5py.File to create a dataset.

The file path is constructed by concatenating the specified savepath (or the current working directory if savepath is not provided) with the provided filename (fn). The function automatically appends the appropriate file extension: .h5 for DataFrames and .hdf5 for arrays.

\[\text{filepath} = \text{savepath} \oplus \text{filename} \oplus \text{extension} \tag{4}\]

where \(\oplus\) denotes string concatenation.

Parameters:

data (Any) – The data object to be stored. Must be either a NumPy array or a pandas DataFrame.
fn (str) – The file path (without extension) where the HDF5 file will be saved.
objname (str, optional) – The name under which to store the data within the HDF5 file. Defaults to 'data' if not provided.
close (bool, default True) – If True, the file is closed after writing. If False, the file remains open for additional modifications.
**hdf5_kws (dict, optional) – Additional keyword arguments to pass to the HDFStore constructor (for DataFrames) or to customize dataset creation (for arrays). Common options include mode for the file mode, complevel for compression level, complib for the compression library, and fletcher32 to enable the Fletcher32 checksum. For mode, use 'r' for read-only access, 'w' to create a new file, 'a' to append or create, and 'r+' to open an existing file for reading and writing.

Returns:

store – An IO interface for the stored data. For DataFrames, this is a pd.HDFStore object; for arrays, an h5py.File object.

Return type:

object

Examples

>>> import os
>>> import pandas as pd
>>> from geoprior.utils.io_utils import to_hdf5
>>> data = pd.DataFrame({
...     'a': [1, 2, 3],
...     'b': [4, 5, 6]
... })
>>> save_path = os.path.join('output', 'datafile')
>>> store = to_hdf5(data, fn=save_path, objname='mydata', close=False)
>>> # Access stored data (the store stays open because close=False):
>>> retrieved = store['mydata']
>>> print(retrieved.head())
>>> store.close()

Notes

Ensure the dependency pytables is installed when serializing a DataFrame. When serializing NumPy arrays, the dataset is created with the name "dataset_01". If close is set to False, the caller is responsible for closing the store. The pandas and NumPy foundations underlying this serialization path are summarized in [33, 34].

See also

joblib.dump, pickle.dump, h5py.File

geoprior.utils.io_utils.zip_extractor(zip_file, samples='*', ftype=None, savepath=None, pwd=None)[source]#

Extracts files from a ZIP archive based on various filtering criteria and saves them to a specified directory.

The extraction process can be controlled by the samples parameter to limit the number of files extracted, or by the ftype parameter to filter by a specific file extension. The resulting file names are returned as a list.

\[\text{Extracted Files} = \{ f \in \mathcal{A} \mid \phi(f) \} \tag{5}\]

where \(\mathcal{A}\) is the set of all files in the archive, and \(\phi(f)\) is a predicate that checks if a file matches the desired extension and is within the specified sample count.

Parameters:

zip_file (str) – Full path to the ZIP archive file.
samples (int or str, optional) – Number of files to extract. If set to '*', all files are extracted. Default is '*'.
ftype (str, optional) – File extension filter (e.g., '.csv'). Only files with this extension are extracted. If no matching files are found, a ValueError is raised.
savepath (str, optional) – Directory where the extracted files will be stored. If not provided, files are extracted to the current working directory.
pwd (str or bytes, optional) – Password for encrypted ZIP files. If provided as a string, it will be used as is (or can be encoded to bytes as needed).

Returns:

A list of extracted file names (with paths).

Return type:

list of str

Examples

>>> from geoprior.utils.io_utils import zip_extractor
>>> extracted_files = zip_extractor(
...     'data/archive.zip',
...     samples='*',
...     ftype='.csv',
...     savepath='data/extracted',
...     pwd='secret'
... )
>>> print(extracted_files)
['folder1/file1.csv', 'folder2/file2.csv', ...]

Notes

The function first validates the input ZIP file using check_files (assumed to be defined in the package). It then determines the sample count and filters files by extension if ftype is provided. Extraction is done via the standard ZipFile.extract or ZipFile.extractall methods.

See also

zipfile.ZipFile.extract: Extract a single file from a ZIP archive.
zipfile.ZipFile.extractall: Extract all files from a ZIP archive.
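
The filtering steps described in the Notes (extension filter, then sample cap, then extraction) can be sketched with the standard zipfile module. The name extract_matching is hypothetical, input validation is omitted, and taking the first samples entries in archive order is an assumption.

```python
import os
import tempfile
import zipfile


def extract_matching(zip_path, samples="*", ftype=None, savepath=None, pwd=None):
    """Hypothetical sketch of extension filtering plus a sample cap."""
    if isinstance(pwd, str):
        pwd = pwd.encode()                      # ZipFile expects bytes passwords
    with zipfile.ZipFile(zip_path) as archive:
        # Ignore directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
        if ftype is not None:
            names = [n for n in names if n.lower().endswith(ftype.lower())]
            if not names:
                raise ValueError(f"No files with extension {ftype!r} in archive.")
        if samples != "*":
            names = names[: int(samples)]       # keep only the first `samples` files
        return [archive.extract(n, path=savepath, pwd=pwd) for n in names]


with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "archive.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("folder1/file1.csv", "a,b\n1,2\n")
        zf.writestr("folder2/file2.csv", "a,b\n3,4\n")
        zf.writestr("notes.txt", "ignored")
    extracted = extract_matching(zip_path, samples=1, ftype=".csv",
                                 savepath=os.path.join(tmp, "out"))
    extracted_names = [os.path.basename(p) for p in extracted]
    extracted_exists = all(os.path.isfile(p) for p in extracted)
```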

geoprior.utils.io_utils.fetch_joblib_data(job_file, *keys, error_mode='raise', verbose=0)[source]#

Dynamically load data from a joblib-saved dictionary with flexible key access.

Parameters:

job_file (str) – Path to the joblib file containing a dictionary
*keys (str) – Variable-length list of dictionary keys to retrieve
error_mode ({'raise', 'warn', 'ignore'}, default 'raise') – Handling of missing keys:
- ‘raise’: Immediately raise KeyError
- ‘warn’: Issue warning and skip missing keys
- ‘ignore’: Silently skip missing keys
verbose (int, default 0) – Verbosity level:
- 0: No output
- 1: Basic loading information
- 2: Detailed debugging output

Returns:

Full dictionary if no keys specified
Tuple of values for requested keys (maintaining order)

Return type:

Union[Dict, Tuple]

Raises:

FileNotFoundError – If specified job_file doesn’t exist
TypeError – If loaded data isn’t a dictionary
KeyError – If requested key not found and error_mode=’raise’

Examples

>>> from geoprior.utils.io_utils import fetch_joblib_data
>>> data = fetch_joblib_data('data.joblib', 'X_train', 'y_train')
>>> X, y = fetch_joblib_data('data.joblib', 'X_val', 'y_val', verbose=1)
>>> full_dict = fetch_joblib_data('data.joblib')

Notes

Maintains original insertion order for Python 3.7+ dictionaries
Missing keys in ‘warn’/’ignore’ modes result in shorter return tuple
Joblib files must contain dictionary objects
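
The three error_mode behaviors can be illustrated on an in-memory dictionary. The sketch below leaves out file loading entirely; select_keys is a hypothetical name, and the warning text is an assumption.

```python
import warnings


def select_keys(store, *keys, error_mode="raise"):
    """Hypothetical sketch of key lookup with 'raise', 'warn' and 'ignore' modes."""
    if not isinstance(store, dict):
        raise TypeError("Loaded data must be a dictionary.")
    if not keys:
        return store                      # no keys requested: return everything
    values = []
    for key in keys:
        if key in store:
            values.append(store[key])
        elif error_mode == "raise":
            raise KeyError(key)
        elif error_mode == "warn":
            warnings.warn(f"Key {key!r} not found; skipping.")
        # 'ignore': skip the missing key silently
    return tuple(values)


store = {"X_train": [1, 2], "y_train": [0, 1]}
both = select_keys(store, "X_train", "y_train")
partial = select_keys(store, "X_train", "missing", error_mode="ignore")
```

Here partial holds a single value, which shows why tuple unpacking such as X, y = ... is only safe with error_mode='raise'.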

geoprior.utils.io_utils.to_txt(d, filename=None, format='txt', indent=2, width=80, depth=None, compat=False, include_header=True, mode='w', encoding='utf-8', overwrite=True, header=None, footer=None, serializer=None, savepath=None, verbose=1, logger=None, **kwargs)[source]#

Export data objects to a text or JSON file with optional custom formatting.

to_txt writes d (a string, dict, list, or general object) to a file named filename. When no filename is given, one is generated from the current date and time. If format is “json” and d is valid for JSON serialization, a JSON export is attempted. Otherwise the function falls back to text mode, using Python’s built-in pformat and an optional serializer for custom transformations.

\[\text{FileName}_{\text{timestamp}} \rightarrow \text{output} \tag{6}\]

where \(\text{FileName}_{\text{timestamp}}\) is an auto-generated name like output_20230101_123456.txt if filename is not provided.

Parameters:

d (object) – Data to write. Can be any Python object supported by pformat, or a dict if format is ‘json’.
filename (str, optional) – Full path (or name) of the output file. If None, a time-stamped name is produced, prefixed with ‘output_’.
format (str, default 'txt') – File format, either "txt" or "json". If the data cannot be serialized as JSON, the process reverts to text.
indent (int, default 2) – Indentation level for pretty-printing text or JSON.
width (int, default 80) – Wrap width for formatted text lines.
depth (int, optional) – Maximum depth to which nested structures are expanded. If None, no limit is applied.
compat (bool, default False) – If True, instructs pformat to produce more compact text. Not used when exporting JSON.
include_header (bool, default True) – Whether to include a decorative header (with timestamp) at the top of the file in text mode.
mode (str, default 'w') – File writing mode. Typically ‘w’ for overwrite, ‘a’ for append.
encoding (str, default 'utf-8') – Text encoding used when opening the file.
overwrite (bool, default True) – If False, raises an error if the file already exists.
header (str, optional) – Custom header text (if include_header is True). Overwrites the default header if given.
footer (str, optional) – Custom footer text appended at the end of the file, if include_header is True.
serializer (callable, optional) – A function that transforms d before printing. If it fails, d remains unchanged.
verbose (int, default 1) – Verbosity level for logging. Higher values yield more console messages (e.g., file stats at verbose >= 3).
**kwargs – Additional parameters passed to the JSON serializer (json.dump) or pformat.

Returns:

The final filename used to store the output (potentially auto-generated).

Return type:

str

Notes

If format is “json”, the function tries json.dump with a few standard parameters. If an exception occurs, it reverts to text export. The serializer argument allows custom transformations, such as flattening nested dicts or converting objects to JSON-serializable representations. The standard-library JSON behavior used here is documented in Python Software Foundation [35].

Examples

>>> from geoprior.utils.io_utils import to_txt
>>> my_data = {"name":"Alice","age":30}
>>> # Basic text export
>>> txt_file = to_txt(my_data, verbose=2)
>>> # Enforce JSON format
>>> json_file = to_txt(my_data, format='json', indent=4)

See also

pformat: Pretty-print complex Python data structures.
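
The JSON-then-text fallback described in the Notes can be sketched as follows. This is not the library's code: export_text is a hypothetical name, headers, footers, and the serializer hook are omitted, and the sketch serializes to a string first so that a failed JSON attempt never leaves a partial file.

```python
import json
import os
import tempfile
from pprint import pformat


def export_text(d, filename, fmt="txt", indent=2, width=80):
    """Hypothetical sketch: try JSON when requested, otherwise write pformat text."""
    if fmt == "json":
        try:
            text = json.dumps(d, indent=indent)
        except (TypeError, ValueError):
            fmt = "txt"                    # not JSON-serializable: revert to text
    if fmt != "json":
        text = pformat(d, indent=indent, width=width)
    with open(filename, "w", encoding="utf-8") as fh:
        fh.write(text)
    return fmt                             # the format that was actually written


with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "a.json")
    used_json = export_text({"name": "Alice", "age": 30}, json_path, fmt="json")
    # A set is not JSON-serializable, so the sketch falls back to text.
    used_fallback = export_text({"tags": {1}}, os.path.join(tmp, "b.txt"), fmt="json")
    with open(json_path, encoding="utf-8") as fh:
        round_trip = json.load(fh)
```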