geoprior.utils.audit_utils

Audit helpers for stage handshakes and scaling artifacts.

Functions

audit_stage1_scaling(*, df_train, ...[, ...])

Stage-1 audit: raw df_train coord stats (t/x/y) + heuristic units; model-fed coords stats from inputs_train["coords"] (flattened); coord scaler min/max + coord_ranges; SI channel sanity for physics cols (if present); target arrays sanity; split of features (scaled ML vs __si vs other). Saves a machine-readable JSON if save_dir is provided.

audit_stage1_stage2_coord_consistency(*, ...)

Cross-check coordinate semantics between Stage-1 scaler and Stage-2 NPZ coords.

audit_stage2_handshake(*, X_train, X_val, ...)

audit_stage3_run(*, manifest_path, manifest, ...)

Stage-3 audit: tuned artifacts + eval sanity.

resolve_audit_stages(audit_stages, *[, ...])

Resolve cfg["AUDIT_STAGES"] into a canonical set like {"stage1","stage2"}.

should_audit(audit_stages, *, stage[, default])

Convenience: should we audit this stage?

geoprior.utils.audit_utils.resolve_audit_stages(audit_stages, *, known=('stage1', 'stage2', 'stage3'), default=None)

Resolve cfg["AUDIT_STAGES"] into a canonical set like {"stage1", "stage2"}.

Return type:

set[str]
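The normalization described above can be sketched as follows. This is a hypothetical stand-in, not the library code: the function name resolve_stages_sketch, the accepted input forms (None, booleans, "all"/"*", comma-separated strings, iterables) and the silent dropping of unknown names are all assumptions made for illustration.

```python
from typing import Any, Iterable, Set

KNOWN_STAGES = ("stage1", "stage2", "stage3")


def resolve_stages_sketch(
    audit_stages: Any,
    *,
    known: Iterable[str] = KNOWN_STAGES,
    default: Any = None,
) -> Set[str]:
    """Hypothetical stand-in for resolve_audit_stages (not the library code)."""
    known_set = {str(k).lower() for k in known}
    value = default if audit_stages is None else audit_stages

    # Assumption: None/False mean "audit nothing".
    if value is None or value is False:
        return set()
    # Assumption: True, "all" and "*" mean "audit every known stage".
    if value is True or (
        isinstance(value, str) and value.strip().lower() in {"all", "*"}
    ):
        return set(known_set)

    # Assumption: a string may hold several comma-separated stage names.
    items = value.split(",") if isinstance(value, str) else list(value)
    cleaned = {str(item).strip().lower() for item in items}
    # Unknown names are dropped silently here; the real helper may warn or raise.
    return cleaned & known_set


resolved = resolve_stages_sketch("Stage1, stage2")
```

Returning a set keeps later membership tests cheap and makes the result independent of how the stages were spelled or ordered in the config.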

geoprior.utils.audit_utils.should_audit(audit_stages, *, stage, default=None)

Convenience: should we audit this stage?

Parameters:
  • audit_stages (Any)

  • stage (str)

  • default (Any)

Return type:

bool
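A gate of this kind is typically used to wrap each audit call in a pipeline. The sketch below illustrates the pattern with hypothetical names (should_audit_sketch, run_stage); it assumes the stages are already given as a collection of names and that default supplies the collection when none is configured, which may differ from the real helper.

```python
from typing import Any, Callable, Collection, List, Optional


def should_audit_sketch(
    stages: Optional[Collection[str]], *, stage: str, default: Any = None
) -> bool:
    """Hypothetical gate: True when `stage` is among the enabled stages."""
    # Assumption: when no stages are configured, `default` supplies them.
    enabled = stages if stages is not None else (default or ())
    return stage.strip().lower() in {str(s).strip().lower() for s in enabled}


def run_stage(
    name: str, enabled: Collection[str], audit: Callable[[], None]
) -> None:
    """Run the audit callback only when the stage is enabled."""
    if should_audit_sketch(enabled, stage=name):
        audit()


calls: List[str] = []
for name in ("stage1", "stage2", "stage3"):
    # Bind `name` as a default argument so each callback keeps its own value.
    run_stage(name, {"stage1", "stage3"}, lambda n=name: calls.append(n))
```

Here only the callbacks for stage1 and stage3 run, so calls ends up holding those two names in order.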

geoprior.utils.audit_utils.audit_stage1_scaling(*, df_train, inputs_train, targets_train, coord_scaler=None, coord_ranges=None, coord_mode='auto', coords_in_degrees=False, coord_epsg_used=None, coord_x_col_used='x', coord_y_col_used='y', x_col_used='x', y_col_used='y', time_col_used='t', normalize_coords=True, keep_coords_raw=False, shift_raw_coords=False, subs_model_col=None, gwl_dyn_col=None, gwl_target_col=None, h_field_col=None, dynamic_features=None, static_features=None, future_features=None, scaled_ml_numeric_cols=None, main_scaler_path=None, scaler_info=None, save_dir=None, table_width=110, title_prefix='COORDINATE + FEATURE SCALING AUDIT (Stage-1)', city='Unknown', model_name='Model', sample_rows=5, log_fn=None)

Stage-1 audit:
  • raw df_train coord stats (t/x/y) + heuristic units

  • model-fed coords stats from inputs_train["coords"] (flattened)

  • coord scaler min/max + coord_ranges

  • SI channel sanity for physics cols (if present)

  • target arrays sanity

  • split of features: scaled ML vs __si vs other

Saves a machine-readable JSON if save_dir is provided.

Parameters:
  • inputs_train (dict[str, Any])

  • targets_train (dict[str, Any])

  • coord_scaler (Any)

  • coord_ranges (dict[str, float] | None)

  • coord_mode (str)

  • coords_in_degrees (bool)

  • coord_epsg_used (Any)

  • coord_x_col_used (str)

  • coord_y_col_used (str)

  • x_col_used (str)

  • y_col_used (str)

  • time_col_used (str)

  • normalize_coords (bool)

  • keep_coords_raw (bool)

  • shift_raw_coords (bool)

  • subs_model_col (str | None)

  • gwl_dyn_col (str | None)

  • gwl_target_col (str | None)

  • h_field_col (str | None)

  • dynamic_features (Iterable[str] | None)

  • static_features (Iterable[str] | None)

  • future_features (Iterable[str] | None)

  • scaled_ml_numeric_cols (Iterable[str] | None)

  • main_scaler_path (str | None)

  • scaler_info (dict | None)

  • save_dir (str | None)

  • table_width (int)

  • title_prefix (str)

  • city (str)

  • model_name (str)

  • sample_rows (int)

Return type:

str | None
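Two of the items in this audit, the feature split and the optional JSON output, can be illustrated with a small standalone sketch. It is not the library implementation: the helper names, the column names, the output file name, and the reading of "__si" as a column-name suffix are all assumptions.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def split_features_sketch(
    feature_names: Iterable[str],
    scaled_ml_numeric_cols: Optional[Iterable[str]] = None,
    si_suffix: str = "__si",  # assumption: SI channels carry this suffix
) -> Dict[str, List[str]]:
    """Group feature names into scaled ML, SI-unit, and other columns."""
    scaled = set(scaled_ml_numeric_cols or ())
    groups: Dict[str, List[str]] = {"scaled_ml": [], "si": [], "other": []}
    for name in feature_names:
        if name.endswith(si_suffix):
            groups["si"].append(name)
        elif name in scaled:
            groups["scaled_ml"].append(name)
        else:
            groups["other"].append(name)
    return groups


def save_report_sketch(report: dict, save_dir: Optional[str]) -> Optional[str]:
    """Write the report as JSON and return its path, or None without save_dir."""
    if save_dir is None:
        return None
    out = Path(save_dir) / "stage1_feature_split.json"  # hypothetical file name
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return str(out)


# Hypothetical column names, chosen only to exercise each group.
groups = split_features_sketch(
    ["rainfall_mm", "head_m__si", "subsidence__si", "lithology_class"],
    scaled_ml_numeric_cols=["rainfall_mm"],
)
with tempfile.TemporaryDirectory() as tmp:
    path = save_report_sketch(groups, tmp)
    reloaded = json.loads(Path(path).read_text(encoding="utf-8"))
```

The str | None return of save_report_sketch mirrors the documented return type, on the assumption that the audit returns the path of the JSON it wrote.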

geoprior.utils.audit_utils.audit_stage2_handshake(*, X_train, X_val, y_train, y_val, time_steps, forecast_horizon, mode, dyn_names, fut_names, sta_names, coord_scaler=None, sk_final, save_dir, table_width=100, title_prefix='STAGE-2 HANDSHAKE AUDIT', city='Unkown', model_name='Model', log_fn=None)

geoprior.utils.audit_utils.audit_stage1_stage2_coord_consistency(*, X_train, coord_scaler, sk_final, time_steps, forecast_horizon, time_units='year', save_dir=None, table_width=110, title_prefix='STAGE-1 <-> STAGE-2 COORD CONSISTENCY', city='Unknown', model_name='Model', log_fn=None)

Cross-check coordinate semantics between Stage-1 scaler and Stage-2 NPZ coords.

Key facts for GeoPrior Stage-2:
  • coords have shape (N, H, 3) and correspond to the target-horizon times, not the full dynamic history, so t has exactly H unique values.

  • x/y typically cover the full normalized [0, 1] range when the data have full spatial coverage (often min=0 and max=1).

This audit:
  • computes normalized min/max for t/x/y in X_train["coords"]

  • derives implied raw min/max using MinMaxScaler data_min_ / data_max_

  • checks raw ranges are within Stage-1 scaler bounds

  • checks t_unique count == H and t_raw_unique spacing (≈1 year)

  • provides a UTM plausibility hint if the EPSG code is UTM-like
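The core of these checks can be sketched in plain Python. This is an illustration under stated assumptions, not the library code: the last axis is assumed to be ordered (t, x, y), the scaler is assumed to map to [0, 1], data_min and data_max stand in for the MinMaxScaler attributes data_min_ and data_max_, and the function name, tolerances and sample values are hypothetical.

```python
from typing import Dict, List, Sequence

Coords = List[List[List[float]]]  # shape (N, H, 3), last axis assumed (t, x, y)


def check_coords_sketch(
    coords: Coords,
    data_min: Sequence[float],
    data_max: Sequence[float],
    horizon: int,
    tol: float = 1e-6,
) -> Dict[str, object]:
    """Recover raw ranges from normalized coords and check them."""
    report: Dict[str, object] = {}
    raw_t = set()
    for axis, name in enumerate(("t", "x", "y")):
        values = [step[axis] for sample in coords for step in sample]
        lo, hi = min(values), max(values)
        span = data_max[axis] - data_min[axis]
        # Invert min-max scaling: raw = norm * (max - min) + min.
        raw_lo = lo * span + data_min[axis]
        raw_hi = hi * span + data_min[axis]
        report[name] = {
            "norm": (lo, hi),
            "raw": (raw_lo, raw_hi),
            "within_bounds": (
                data_min[axis] - tol <= raw_lo and raw_hi <= data_max[axis] + tol
            ),
        }
        if name == "t":
            # Round to absorb floating-point noise before counting uniques.
            raw_t = {round(v * span + data_min[axis], 6) for v in values}
    t_sorted = sorted(raw_t)
    gaps = [b - a for a, b in zip(t_sorted, t_sorted[1:])]
    report["t_unique_ok"] = len(t_sorted) == horizon
    report["t_spacing_ok"] = all(abs(g - 1.0) < 1e-3 for g in gaps)  # ~1 year
    return report


# Hypothetical scaler bounds: years 2015-2025, UTM-like metres for x/y.
data_min = [2015.0, 500000.0, 2500000.0]
data_max = [2025.0, 510000.0, 2520000.0]
# Two samples, H = 3 horizon steps (2023, 2024, 2025), at opposite corners.
coords = [
    [[0.8, 0.0, 0.0], [0.9, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.8, 1.0, 1.0], [0.9, 1.0, 1.0], [1.0, 1.0, 1.0]],
]
report = check_coords_sketch(coords, data_min, data_max, horizon=3)
```

If coords had been built from the full dynamic history instead of the horizon, t would carry more than H unique values and t_unique_ok would come out False.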

Parameters:
  • X_train (dict)

  • sk_final (dict)

  • time_steps (int)

  • forecast_horizon (int)

  • time_units (str)

  • save_dir (str | None)

  • table_width (int)

  • title_prefix (str)

  • city (str)

  • model_name (str)

geoprior.utils.audit_utils.audit_stage3_run(*, manifest_path, manifest, cfg, fixed_params, best_hps, run_dir, best_model_path, best_weights_path, use_tf_savedmodel, quantiles, forecast_horizon, mode, pred_shapes=None, eval_results=None, phys_diag=None, calibrator_factors=None, forecast_csv_eval=None, forecast_csv_future=None, metrics_json_path=None, physics_payload_path=None, save_dir=None, table_width=100, title_prefix='STAGE-3 AUDIT', city='Unknown', model_name='Model', log_fn=None)

Stage-3 audit: tuned artifacts + eval sanity.

Parameters:
  • manifest_path (str | None)

  • manifest (dict[str, Any])

  • cfg (dict[str, Any])

  • fixed_params (dict[str, Any])

  • best_hps (dict[str, Any] | None)

  • run_dir (str)

  • best_model_path (str | None)

  • best_weights_path (str | None)

  • use_tf_savedmodel (bool)

  • quantiles (Any)

  • forecast_horizon (int)

  • mode (str)

  • pred_shapes (dict[str, Any] | None)

  • eval_results (dict[str, Any] | None)

  • phys_diag (dict[str, Any] | None)

  • calibrator_factors (Any)

  • forecast_csv_eval (str | None)

  • forecast_csv_future (str | None)

  • metrics_json_path (str | None)

  • physics_payload_path (str | None)

  • save_dir (str | None)

  • table_width (int)

  • title_prefix (str)

  • city (str)

  • model_name (str)

Return type:

str | None
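One plausible part of a "tuned artifacts" check is confirming that each optional path argument points at something that exists. The sketch below illustrates only that idea; the helper name, the status labels and the file names are hypothetical, and the actual audit may check more or different things.

```python
import os
import tempfile
from typing import Dict, Optional


def check_artifacts_sketch(paths: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Classify each artifact path as 'ok', 'missing', or 'not provided'."""
    status: Dict[str, str] = {}
    for label, path in paths.items():
        if path is None:
            status[label] = "not provided"
        elif os.path.exists(path):  # a file or a directory both count
            status[label] = "ok"
        else:
            status[label] = "missing"
    return status


with tempfile.TemporaryDirectory() as run_dir:
    weights = os.path.join(run_dir, "best.weights.h5")  # hypothetical file name
    open(weights, "w").close()
    status = check_artifacts_sketch(
        {
            "best_weights_path": weights,
            "best_model_path": os.path.join(run_dir, "best_model"),  # never created
            "metrics_json_path": None,
        }
    )
```

os.path.exists is used instead of os.path.isfile because a saved model may be stored as a directory and not as a single file.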