geoprior.utils.holdout_utils

Utility helpers for holdout and split workflows.

Functions

compute_group_masks(df, *, group_cols, ...)

Build group-level validity masks (valid_for_train and valid_for_forecast).

filter_df_by_groups(df, *, group_cols, groups)

Keep only the rows of df whose group_cols values appear in the groups DataFrame.

split_groups_holdout(groups, *[, seed, ...])

Split unique groups into train/val/test (pixel-level holdout).

Classes

GroupMasks(required_train_years, ...)

Group-level validity masks for early filtering.

HoldoutSplit(train_groups, val_groups, ...)

Pixel holdout split (disjoint groups).

class geoprior.utils.holdout_utils.GroupMasks(required_train_years, required_forecast_years, valid_for_train, valid_for_forecast)

Bases: object

Group-level validity masks for early filtering.

Parameters:
required_train_years: list[int]
required_forecast_years: list[int]
valid_for_train: DataFrame
valid_for_forecast: DataFrame
property keep_for_processing: DataFrame

Union of valid_for_train and valid_for_forecast.
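
To make the union concrete, here is a minimal pandas sketch. It assumes both masks are DataFrames holding one row per group with identical key columns; the column names x and y are hypothetical, and the library's own implementation may differ.

```python
import pandas as pd

# Hypothetical group tables: one row per (x, y) pixel.
valid_for_train = pd.DataFrame({"x": [0], "y": [0]})
valid_for_forecast = pd.DataFrame({"x": [0, 1], "y": [0, 0]})

# Row-wise union: stack both tables, then drop repeated groups.
keep = (
    pd.concat([valid_for_train, valid_for_forecast], ignore_index=True)
    .drop_duplicates()
    .reset_index(drop=True)
)
print(keep)
```

A group that is valid for either purpose survives the early filter, so rows needed only for forecasting are not discarded along with rows that fail the stricter training requirement.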

__init__(required_train_years, required_forecast_years, valid_for_train, valid_for_forecast)
Return type:

None

geoprior.utils.holdout_utils.compute_group_masks(df, *, group_cols, time_col, train_end_year, time_steps, horizon)
Build two group-level masks:
  • valid_for_train: groups containing all years for the last (T+H) steps

  • valid_for_forecast: groups containing all years for the last T steps

Here T is time_steps and H is horizon. This assumes annual steps and integer years in time_col.
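
The year-coverage rule above can be sketched with plain pandas. This is an illustrative stand-in, not the library's code: the function name sketch_group_masks is hypothetical, and it assumes both windows end at train_end_year (inclusive), which the description does not state explicitly.

```python
import pandas as pd


def sketch_group_masks(df, *, group_cols, time_col, train_end_year,
                       time_steps, horizon):
    """Illustrative stand-in, not geoprior's implementation."""
    # ASSUMPTION: both windows end at train_end_year (inclusive).
    n_train = time_steps + horizon
    train_years = list(range(train_end_year - n_train + 1, train_end_year + 1))
    forecast_years = list(range(train_end_year - time_steps + 1, train_end_year + 1))

    def groups_covering(years):
        # Count distinct required years per group; keep groups that have all.
        sub = df[df[time_col].isin(years)]
        n_years = sub.groupby(group_cols)[time_col].nunique()
        return n_years[n_years == len(years)].reset_index()[group_cols]

    return {
        "required_train_years": train_years,
        "required_forecast_years": forecast_years,
        "valid_for_train": groups_covering(train_years),
        "valid_for_forecast": groups_covering(forecast_years),
    }


# Pixel (0, 0) has four years of data; pixel (1, 0) has only the last two.
df = pd.DataFrame({
    "x": [0, 0, 0, 0, 1, 1],
    "y": [0, 0, 0, 0, 0, 0],
    "year": [2019, 2020, 2021, 2022, 2021, 2022],
})
masks = sketch_group_masks(df, group_cols=["x", "y"], time_col="year",
                           train_end_year=2022, time_steps=2, horizon=2)
```

With T = 2 and H = 2, only pixel (0, 0) covers the four training years, while both pixels cover the two forecast years.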

Return type:

GroupMasks

geoprior.utils.holdout_utils.filter_df_by_groups(df, *, group_cols, groups)

Keep only the rows of df whose group_cols values appear in the groups DataFrame.

Return type:

DataFrame
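
One common way to express this kind of filter is an inner merge on the key columns. The sketch below assumes that approach; sketch_filter_by_groups and the x, y, year columns are hypothetical, and the library may implement the filter differently.

```python
import pandas as pd


def sketch_filter_by_groups(df, *, group_cols, groups):
    """Illustrative stand-in, not geoprior's implementation."""
    # De-duplicate the keys so the merge cannot multiply rows of df.
    keys = groups[group_cols].drop_duplicates()
    # An inner merge keeps only rows whose key tuple appears in `keys`.
    return df.merge(keys, on=group_cols, how="inner")


df = pd.DataFrame({
    "x": [0, 0, 1, 2],
    "y": [0, 0, 0, 5],
    "year": [2021, 2022, 2022, 2022],
})
groups = pd.DataFrame({"x": [0, 2], "y": [0, 5]})
kept = sketch_filter_by_groups(df, group_cols=["x", "y"], groups=groups)
```

Here the row for pixel (1, 0) is dropped because that pixel is not listed in groups.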

class geoprior.utils.holdout_utils.HoldoutSplit(train_groups, val_groups, test_groups)

Bases: object

Pixel holdout split (disjoint groups).

Parameters:
train_groups: DataFrame
val_groups: DataFrame
test_groups: DataFrame
check_disjoint()
Return type:

None
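
A disjointness check of this kind can be sketched by comparing key tuples pairwise. How check_disjoint reports a failure is not documented here, so raising ValueError is an assumption, as are the function name and the x, y columns.

```python
from itertools import combinations

import pandas as pd


def sketch_check_disjoint(**splits):
    """Illustrative stand-in; assumes all tables share the same key columns."""
    # Turn each group table into a set of key tuples.
    keys = {name: set(map(tuple, g.to_numpy().tolist()))
            for name, g in splits.items()}
    # Compare every pair of splits and fail on the first overlap found.
    for (a, ka), (b, kb) in combinations(keys.items(), 2):
        overlap = ka & kb
        if overlap:
            # ASSUMPTION: the real method may signal failure differently.
            raise ValueError(f"{a} and {b} share {len(overlap)} group(s)")


train = pd.DataFrame({"x": [0, 1], "y": [0, 0]})
val = pd.DataFrame({"x": [2], "y": [0]})
test = pd.DataFrame({"x": [3], "y": [0]})
sketch_check_disjoint(train=train, val=val, test=test)  # passes silently
```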

__init__(train_groups, val_groups, test_groups)
Return type:

None

geoprior.utils.holdout_utils.split_groups_holdout(groups, *, seed=42, val_frac=0.2, test_frac=0.1, strategy='random', x_col=None, y_col=None, block_size=None)

Split unique groups into train/val/test (pixel-level holdout).

strategy:
  • "random": shuffle groups

  • "spatial_block": shuffle spatial blocks (requires x_col, y_col, and block_size)

Return type:

HoldoutSplit
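
The two strategies can be illustrated with a small sketch. It is not the library's implementation: sketch_split is a hypothetical name, blocks are assumed to be square cells obtained by flooring coordinates divided by block_size, and split sizes are assumed to be rounded fractions of the number of shuffled units.

```python
import numpy as np
import pandas as pd


def sketch_split(groups, *, seed=42, val_frac=0.2, test_frac=0.1,
                 strategy="random", x_col=None, y_col=None, block_size=None):
    """Illustrative stand-in, not geoprior's implementation."""
    rng = np.random.default_rng(seed)
    if strategy == "random":
        # Every group is its own shuffling unit.
        unit = np.arange(len(groups))
    elif strategy == "spatial_block":
        # ASSUMPTION: square blocks from floor(coordinate / block_size).
        blocks = pd.DataFrame({
            "bx": np.floor(groups[x_col].to_numpy() / block_size).astype(int),
            "by": np.floor(groups[y_col].to_numpy() / block_size).astype(int),
        })
        # One integer label per (bx, by) block.
        unit = blocks.groupby(["bx", "by"]).ngroup().to_numpy()
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Shuffle the units, then carve off the test and validation shares.
    ids = rng.permutation(np.unique(unit))
    n_test = int(round(test_frac * len(ids)))
    n_val = int(round(val_frac * len(ids)))
    is_test = np.isin(unit, ids[:n_test])
    is_val = np.isin(unit, ids[n_test:n_test + n_val])

    train = groups[~is_test & ~is_val].reset_index(drop=True)
    val = groups[is_val].reset_index(drop=True)
    test = groups[is_test].reset_index(drop=True)
    return train, val, test


# A 10 x 10 grid of pixels as the unique groups.
xs, ys = np.meshgrid(np.arange(10), np.arange(10))
groups = pd.DataFrame({"x": xs.ravel(), "y": ys.ravel()})

train_r, val_r, test_r = sketch_split(groups)
train_b, val_b, test_b = sketch_split(
    groups, strategy="spatial_block", x_col="x", y_col="y", block_size=2)
```

In the blocked variant, whole blocks move together, so neighbouring pixels within a block never end up on opposite sides of the split. That is the usual motivation for spatial blocking when nearby pixels are strongly correlated.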