geoprior.utils.split

Group-holdout split for sequence data.

Exports:

train_windows_T{T}_H{H}.npz
val_windows_T{T}_H{H}.npz
test_windows_T{T}_H{H}.npz
future_inputs_T{T}_H{H}.npz
splits_groups.json

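Leakage fix (Zhongshan, 2 windows per pixel):

Groups are split by group_id first, and windows are then built inside each split, so windows from the same group never end up in different splits.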
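The ordering matters because sliding windows cut from the same group share time steps. The following is a minimal sketch of the idea in plain NumPy, not this module's implementation; the helper `make_windows`, the toy data, and the 7/1/2 group counts are assumptions made for illustration.

```python
import numpy as np

def make_windows(series, time_steps, horizon):
    """Cut sliding (input, target) windows from one group's series (hypothetical helper)."""
    n = len(series) - time_steps - horizon + 1
    xs = [series[i:i + time_steps] for i in range(n)]
    ys = [series[i + time_steps:i + time_steps + horizon] for i in range(n)]
    return np.array(xs), np.array(ys)

# Toy data: 10 groups (e.g. pixels), each with 6 time steps.
# With time_steps=3 and horizon=2 this yields 2 windows per group.
rng = np.random.default_rng(42)
series_by_group = {g: rng.normal(size=6) for g in range(10)}

# 1) Split the group ids first.
group_ids = rng.permutation(sorted(series_by_group))
split_groups = {
    "train": group_ids[:7],
    "val": group_ids[7:8],
    "test": group_ids[8:],
}

# 2) Build windows inside each split only, so no group spans two splits.
windows = {}
for name, ids in split_groups.items():
    parts = [make_windows(series_by_group[int(g)], time_steps=3, horizon=2) for g in ids]
    windows[name] = (
        np.concatenate([x for x, _ in parts]),
        np.concatenate([y for _, y in parts]),
    )
```

Splitting the windows themselves at random would instead allow two overlapping windows of one pixel to fall into train and test, which is the leakage this ordering avoids.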

Functions

build_future_inputs_npz(*, df_scaled, ...[, ...])

build_group_holdout_npzs(*, df_train, ...[, ...])

Build train/val/test windows using group holdout.

pack_xy_npz(x, y)

split_group_keys(keys, *[, cfg])

subset_by_keys(df, *, group_cols, keys[, ...])

write_splits_json(path, *, group_cols, ...)

Classes

SplitCfg([seed, ratios, decimals])

class geoprior.utils.split.SplitCfg(seed: int = 42, ratios: tuple[float, float, float] = (0.7, 0.15, 0.15), decimals: int = 8)

Bases: object

seed: int = 42
ratios: tuple[float, float, float] = (0.7, 0.15, 0.15)
decimals: int = 8
__init__(seed=42, ratios=(0.7, 0.15, 0.15), decimals=8)
Return type: None

geoprior.utils.split.split_group_keys(keys, *, cfg=SplitCfg(seed=42, ratios=(0.7, 0.15, 0.15), decimals=8))
Return type: dict[str, ndarray]
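The page does not document how the keys are partitioned. As a rough sketch only, a seeded ratio split of unique group keys could look like the code below. The function name `split_keys_sketch`, the `"train"`/`"val"`/`"test"` dictionary keys, and the rounding of split sizes are assumptions, not the library's confirmed behaviour.

```python
import numpy as np

def split_keys_sketch(keys, seed=42, ratios=(0.7, 0.15, 0.15)):
    """Shuffle unique keys with a fixed seed and cut them by ratio (hypothetical)."""
    keys = np.unique(np.asarray(keys), axis=0)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(keys))
    n_train = int(round(ratios[0] * len(keys)))
    n_val = int(round(ratios[1] * len(keys)))
    return {
        "train": keys[order[:n_train]],
        "val": keys[order[n_train:n_train + n_val]],
        # The remainder goes to test, so every key is assigned exactly once.
        "test": keys[order[n_train + n_val:]],
    }

# Hypothetical group keys: one (x, y) coordinate pair per group.
keys = np.array([[float(i), float(i % 5)] for i in range(20)])
splits = split_keys_sketch(keys)
```

Fixing the seed makes the partition reproducible across runs, which is presumably why `SplitCfg` carries one.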

geoprior.utils.split.subset_by_keys(df, *, group_cols, keys, decimals=8)
Return type: DataFrame
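The `decimals` argument suggests that floating-point group columns are rounded before being matched against the keys, although the page does not say so. A sketch under that assumption, with a hypothetical function name and toy data:

```python
import numpy as np
import pandas as pd

def subset_by_keys_sketch(df, group_cols, keys, decimals=8):
    """Keep rows whose rounded group columns match one of the rounded keys (hypothetical)."""
    wanted = set(map(tuple, np.round(np.asarray(keys, dtype=float), decimals)))
    rounded = df[group_cols].round(decimals).to_numpy()
    mask = np.array([tuple(row) in wanted for row in rounded], dtype=bool)
    return df.loc[mask]

df = pd.DataFrame(
    {
        "x": [0.1 + 0.2, 0.3, 1.0, 1.0],  # first value is 0.30000000000000004
        "y": [5.0, 5.0, 6.0, 6.0],
        "t": [0, 1, 0, 1],
    }
)
sub = subset_by_keys_sketch(df, ["x", "y"], keys=[(0.3, 5.0)])
```

Without the rounding step, the first row would not match the key `(0.3, 5.0)` because of floating-point noise in its `x` value.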

geoprior.utils.split.write_splits_json(path, *, group_cols, time_steps, horizon, train_end, cfg, splits)
Return type: str
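The layout of `splits_groups.json` is not shown on this page. The sketch below illustrates one plausible way to serialize the same arguments and return the written path. The payload field names, the `CfgSketch` stand-in for `SplitCfg`, and the function name are all assumptions.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass

import numpy as np

@dataclass
class CfgSketch:
    """Stand-in for SplitCfg, using the defaults from the signature above."""
    seed: int = 42
    ratios: tuple = (0.7, 0.15, 0.15)
    decimals: int = 8

def write_splits_json_sketch(path, *, group_cols, time_steps, horizon, train_end, cfg, splits):
    """Write split metadata and keys to JSON and return the path (hypothetical layout)."""
    payload = {
        "group_cols": list(group_cols),
        "time_steps": int(time_steps),
        "horizon": int(horizon),
        "train_end": train_end,
        "cfg": asdict(cfg),
        # ndarray is not JSON serializable, so convert each split to nested lists.
        "splits": {name: np.asarray(k).tolist() for name, k in splits.items()},
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, indent=2)
    return str(path)

out_dir = tempfile.mkdtemp()
out = write_splits_json_sketch(
    os.path.join(out_dir, "splits_groups.json"),
    group_cols=["x", "y"],
    time_steps=3,
    horizon=2,
    train_end=2022,
    cfg=CfgSketch(),
    splits={
        "train": np.array([[0.0, 1.0]]),
        "val": np.empty((0, 2)),
        "test": np.array([[2.0, 3.0]]),
    },
)
```

Storing the configuration next to the keys lets a later run check that it is reusing the same split rather than silently regenerating a different one.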

geoprior.utils.split.pack_xy_npz(x, y)
Return type: dict[str, ndarray]

geoprior.utils.split.build_group_holdout_npzs(*, df_train, artifacts_dir, group_cols, time_col_used, x_col_used, y_col_used, subs_col, gwl_target_col, gwl_dyn_col, h_field_col, static_cols, dynamic_cols, future_cols, time_steps, horizon, mode, model_name, train_end, keys_ok, cfg=SplitCfg(seed=42, ratios=(0.7, 0.15, 0.15), decimals=8), normalize_coords=True)

Build train/val/test windows using group holdout.

Returns a dict containing the output paths and coord_scaler.

Return type: dict[str, Any]

geoprior.utils.split.build_future_inputs_npz(*, df_scaled, artifacts_dir, time_col, time_col_num, lon_col, lat_col, subs_col, gwl_col, h_field_col, static_features, dynamic_features, future_features, group_cols, train_end_time, forecast_start_time, horizon, time_steps, mode, model_name, normalize_coords, coord_scaler=None)
Return type: str