Model Evaluation

Author: Isaac Godfried

Description:

This module contains functions for evaluating models. The basic logic flow is as follows:

  1. evaluate_model is called from trainer.py at the end of training. It calls infer_on_torch_model, which does the actual inference.

  2. infer_on_torch_model calls generate_predictions, which calls generate_decoded_predictions or generate_predictions_non_decoded depending on whether the model uses a decoder.

  3. generate_decoded_predictions calls decoding_functions, which calls greedy_decode or beam_decode depending on the decoder function specified in the config file.

  4. The value returned from generate_decoded_predictions is then used to calculate the evaluation metrics in run_evaluation.

  5. run_evaluation returns the evaluation metrics to evaluate_model, which returns them to trainer.py.
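The dispatch in steps 2 and 3 can be pictured as a name-to-function lookup. The sketch below is hypothetical: greedy_decode_stub, beam_decode_stub, DECODING_FUNCTIONS and generate_predictions_sketch are illustrative stand-ins, the config key "decoder_function" is an assumption, and the stubs do not perform real greedy or beam decoding.

```python
from typing import Callable, Dict, List, Optional

Decoder = Callable[[List[float], int], List[float]]


def greedy_decode_stub(history: List[float], steps: int) -> List[float]:
    # Stub only: repeats the last observed value; NOT the real greedy_decode.
    return [history[-1]] * steps


def beam_decode_stub(history: List[float], steps: int) -> List[float]:
    # Stub only: repeats the history mean; NOT a real beam search.
    return [sum(history) / len(history)] * steps


# Name -> function registry, similar in spirit to decoding_functions.
DECODING_FUNCTIONS: Dict[str, Decoder] = {
    "greedy_decode": greedy_decode_stub,
    "beam_decode": beam_decode_stub,
}


def generate_predictions_sketch(history: List[float], steps: int,
                                decoder_params: Optional[Dict] = None) -> List[float]:
    """Route to a decoded or non-decoded path (steps 2 and 3 of the flow)."""
    if decoder_params:
        # Decoded path: look up the decoder named in the (assumed) config key.
        decode = DECODING_FUNCTIONS[decoder_params["decoder_function"]]
        return decode(history, steps)
    # Non-decoded path: stand-in for generate_predictions_non_decoded.
    return [history[-1]] * steps
```

For example, generate_predictions_sketch([1.0, 2.0, 3.0], 2, {"decoder_function": "beam_decode"}) returns [2.0, 2.0]. Keeping the decoders in a dict means a new decoder can be selected from the config file without changing the calling code.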

flood_forecast.evaluator.stream_baseline(river_flow_df: DataFrame, forecast_column: str, hours_forecast=336) → Tuple[DataFrame, float]

Computes the baseline MSE by using the mean value of the training data as the prediction.
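A minimal sketch of such a mean baseline, assuming (this is not stated above) that the last hours_forecast rows form the test window and everything earlier is training data. stream_baseline_sketch and the predicted_baseline column are hypothetical names.

```python
from typing import Tuple

import numpy as np
import pandas as pd


def stream_baseline_sketch(river_flow_df: pd.DataFrame, forecast_column: str,
                           hours_forecast: int = 336) -> Tuple[pd.DataFrame, float]:
    if hours_forecast >= len(river_flow_df):
        raise ValueError("hours_forecast must be smaller than the number of rows")
    # Assumed split: the last `hours_forecast` rows are the test window.
    train = river_flow_df.iloc[:-hours_forecast]
    test = river_flow_df.iloc[-hours_forecast:].copy()
    mean_value = train[forecast_column].mean()
    # The baseline predicts the training mean for every test row.
    test["predicted_baseline"] = mean_value
    mse = float(np.mean((test[forecast_column] - mean_value) ** 2))
    return test, mse
```

With a column [1.0, 3.0, 2.0, 4.0] and hours_forecast=2, the training mean is 2.0 and the baseline MSE is 2.0. Any model worth deploying should beat this number.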

flood_forecast.evaluator.get_model_r2_score(river_flow_df: DataFrame, model_evaluate_function: Callable, forecast_column: str, hours_forecast=336)

model_evaluate_function should call any necessary preprocessing.

flood_forecast.evaluator.get_r2_value(model_mse, baseline_mse)
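get_r2_value has no description here. Assuming it computes the usual skill score relative to the baseline, the formula is 1 - model_mse / baseline_mse; this matches the textbook R² when the baseline predicts the mean of the evaluated data. get_r2_value_sketch is a hypothetical name.

```python
def get_r2_value_sketch(model_mse: float, baseline_mse: float) -> float:
    # Assumed formula: 1 - MSE_model / MSE_baseline.
    # 1.0 is a perfect model, 0.0 matches the baseline, < 0 is worse than it.
    if baseline_mse == 0:
        raise ValueError("baseline_mse must be non-zero")
    return 1 - model_mse / baseline_mse
```

For example, a model MSE of 0.5 against a baseline MSE of 2.0 gives 0.75.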
flood_forecast.evaluator.get_value(the_path: str) → None
flood_forecast.evaluator.evaluate_model(model: Type[TimeSeriesModel], model_type: str, target_col: List[str], evaluation_metrics: List, inference_params: Dict, eval_log: Dict) → Tuple[Dict, DataFrame, int, DataFrame]

Evaluates a model. It is called automatically at the end of training, and it can also be imported to continue evaluating a model elsewhere.

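from flood_forecast.evaluator import evaluate_model
from flood_forecast.time_model import PyTorchForecast
forecast_model = PyTorchForecast(config_file)  # config_file: placeholder for your model configuration
# Arguments: model, model_type, target_col, evaluation_metrics, inference_params, eval_log
e_log, df_train_test, f_idx, df_preds = evaluate_model(forecast_model, "PyTorch", ["cfs"], ["MSE", "MAPE"], {}, {})
print(e_log)  # e.g. {"MSE": 0.2, "MAPE": 0.1}
print(df_train_test)  # prints a pandas DataFrame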
...


flood_forecast.evaluator.run_evaluation(model, df_train_and_test, forecast_history, target_col, end_tensor, g_loss=False, eval_log={}, end_tensor_0=None) → Dict
flood_forecast.evaluator.infer_on_torch_model(model, test_csv_path: str = None, datetime_start: datetime = datetime.datetime(2018, 9, 22, 0, 0), hours_to_forecast: int = 336, decoder_params=None, dataset_params: Dict = {}, num_prediction_samples: int = None, probabilistic: bool = False, criterion_params: Dict = None) → Tuple[DataFrame, Tensor, int, int, CSVTestLoader, List[DataFrame]]

Function to handle both test evaluation and inference on a test data-frame.

Parameters:
  • model – The time series model present in the model zoo

  • test_csv_path – The path to the test CSV file

Returns:

  • df – DataFrame including training and test data

  • end_tensor – the final tensor after the model has finished predictions

  • history_length – number of rows to use in training

  • forecast_start_idx – row index to start forecasting

  • test_data – CSVTestLoader instance

  • df_prediction_samples – has the same index as df, and a number of columns equal to num_prediction_samples, or no columns if num_prediction_samples is None

Return type:

tuple
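To make the shape of df_prediction_samples concrete, here is a hypothetical helper for a single target, assuming the samples arrive as a NumPy array of shape (num_prediction_samples, len(df)); the real function may pad or align samples differently. build_prediction_samples_df is an illustrative name.

```python
from typing import Optional

import numpy as np
import pandas as pd


def build_prediction_samples_df(df: pd.DataFrame,
                                samples: Optional[np.ndarray]) -> pd.DataFrame:
    if samples is None:
        # num_prediction_samples is None: same index as df, no columns.
        return pd.DataFrame(index=df.index)
    if samples.ndim != 2 or samples.shape[1] != len(df):
        raise ValueError("samples must have shape (num_prediction_samples, len(df))")
    # Transpose so each sample becomes one column aligned with df's index.
    return pd.DataFrame(samples.T, index=df.index)
```

Each column is then one complete sampled trajectory, which keeps it easy to compute row-wise statistics later.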

flood_forecast.evaluator.handle_later_ev(model, df_train_and_test, end_tensor, params, csv_test_loader, multi_params, forecast_start_idx, history, datetime_start)
flood_forecast.evaluator.handle_evaluation_series_loader(csv_series_id_loader: SeriesIDTestLoader, model, device, hours_to_forecast: int, datetime_start) → Tuple[List[DataFrame], List]
flood_forecast.evaluator.handle_ci_multi(prediction_samples: Tensor, csv_test_loader: CSVTestLoader, multi_params: int, df_pred, decoder_param: bool, history_length: int, num_samples: int) → List[DataFrame]

Handles the confidence interval (CI) predictions.

Parameters:
  • prediction_samples (torch.Tensor) – The tensor of generated prediction samples

  • csv_test_loader (CSVTestLoader) – The test loader generated in the previous step

  • multi_params (int) – The number of targets (n_targets)

  • df_pred (pd.DataFrame) – The pandas DataFrame of the returned predictions

  • decoder_param (bool) – Whether the model uses a decoder

  • history_length (int) – The number of historical time-steps

  • num_samples (int) – The number of samples to generate (e.g. more samples for a larger CI)

Raises:
  • ValueError

Returns:

A list of DataFrames with the different CI predictions

Return type:

List[pd.DataFrame]
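One common way to turn prediction samples into a confidence interval is to take per-time-step quantiles. Whether handle_ci_multi does exactly this is not stated above, so treat the following as an illustration: confidence_interval_sketch, the 5%/95% defaults and the column names are assumptions.

```python
import numpy as np
import pandas as pd


def confidence_interval_sketch(prediction_samples: np.ndarray, index,
                               lower_q: float = 0.05,
                               upper_q: float = 0.95) -> pd.DataFrame:
    # prediction_samples: shape (num_samples, horizon), one row per sample.
    if prediction_samples.ndim != 2:
        raise ValueError("expected an array of shape (num_samples, horizon)")
    # Quantiles across samples (axis 0) give one bound per time step.
    lower = np.quantile(prediction_samples, lower_q, axis=0)
    median = np.quantile(prediction_samples, 0.5, axis=0)
    upper = np.quantile(prediction_samples, upper_q, axis=0)
    return pd.DataFrame({"lower": lower, "median": median, "upper": upper},
                        index=index)
```

Raising num_samples makes the estimated quantiles more stable; for several targets, one such DataFrame per target would produce a List[pd.DataFrame] like the return type above.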

flood_forecast.evaluator.generate_predictions(model: Type[TimeSeriesModel], df: DataFrame, test_data: CSVTestLoader, history: Tensor, device: device, forecast_start_idx: int, forecast_length: int, hours_to_forecast: int, decoder_params: Dict, targs=False, multi_params: int = 1) → Tensor

A function to generate the actual model prediction.

Parameters:
  • model (Type[TimeSeriesModel]) – A PyTorchForecast

  • df (pd.DataFrame) – The main dataframe containing data

  • test_data (CSVTestLoader) – The test data loader

  • history (torch.Tensor) – The forecast historical data

  • device (torch.device) – The device, usually cpu or cuda

  • forecast_start_idx (int) – The index at which the forecast begins

  • forecast_length (int) – The length of the forecast the model outputs per forward pass

  • hours_to_forecast (int) – The number of time steps to forecast into the future

  • decoder_params (Dict) – The parameters the decoder function takes.

  • multi_params (int, optional) – n_targets, defaults to 1

Returns:

The forecasted values for the time-series in a tensor

Return type:

torch.Tensor
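Because forecast_length (values per forward pass) can be smaller than hours_to_forecast (total horizon), a forecast is typically built by feeding predictions back in as history. This generic autoregressive rollout is a sketch, not flood_forecast's code; step_fn is a hypothetical stand-in for the model's forward pass.

```python
from typing import Callable

import numpy as np


def rollout_forecast(step_fn: Callable[[np.ndarray], np.ndarray], history: np.ndarray,
                     forecast_length: int, hours_to_forecast: int) -> np.ndarray:
    """Autoregressively extend `history` until `hours_to_forecast` values exist."""
    window_size = len(history)
    window = np.asarray(history, dtype=float).copy()
    outputs = []
    produced = 0
    while produced < hours_to_forecast:
        # One "forward pass": keep at most forecast_length predicted values.
        pred = np.asarray(step_fn(window), dtype=float)[:forecast_length]
        if pred.size == 0:
            raise ValueError("step_fn returned no predictions")
        outputs.append(pred)
        produced += pred.size
        # Slide the window: append predictions, keep the most recent values.
        window = np.concatenate([window, pred])[-window_size:]
    # Trim any overshoot from the final pass.
    return np.concatenate(outputs)[:hours_to_forecast]
```

With step_fn = lambda w: np.full(2, w[-1] + 1.0), history [0.0, 1.0, 2.0], forecast_length=2 and hours_to_forecast=5, three passes run and the result is trimmed to [3.0, 3.0, 4.0, 4.0, 5.0].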

flood_forecast.evaluator.generate_predictions_non_decoded(model: Type[TimeSeriesModel], df: DataFrame, test_data: CSVTestLoader, history_dim: Tensor, forecast_length: int, hours_to_forecast: int) → Tensor

Generates predictions for the models that do not use a decoder.

Parameters:
  • model (Type[TimeSeriesModel]) – A PyTorchForecast

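  • df (pd.DataFrame) – The main dataframe containing data

  • test_data (CSVTestLoader) – The test data loader

  • history_dim (torch.Tensor) – The forecast historical data

  • forecast_length (int) – The length of the forecast the model outputs per forward pass

  • hours_to_forecast (int) – The number of time steps to forecast into the future

Returns:

The forecasted values for the time-series in a tensor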

Return type:

torch.Tensor

flood_forecast.evaluator.generate_decoded_predictions(model: Type[TimeSeriesModel], test_data: CSVTestLoader, forecast_start_idx: int, device: device, history_dim: Tensor, hours_to_forecast: int, decoder_params: Dict, multi_targets: int = 1, targs: bool | Tensor = False) → Tensor
flood_forecast.evaluator.generate_prediction_samples(model: Type[TimeSeriesModel], df: DataFrame, test_data: CSVTestLoader, history: Tensor, device: device, forecast_start_idx: int, forecast_length: int, hours_to_forecast: int, decoder_params: Dict, num_prediction_samples: int, multi_params=1, targs=False) → ndarray

Generates num_prediction_samples prediction samples and returns them as a NumPy array.
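A sketch of sampling, assuming each sample comes from a stochastic forward pass (for example, keeping dropout active at inference time). That assumption and the names generate_prediction_samples_sketch and predict_fn are illustrative, not taken from the library.

```python
from typing import Callable

import numpy as np


def generate_prediction_samples_sketch(
        predict_fn: Callable[[np.random.Generator], np.ndarray],
        num_prediction_samples: int, seed: int = 0) -> np.ndarray:
    # A seeded generator makes the sampled forecasts reproducible.
    rng = np.random.default_rng(seed)
    # Each call stands in for one stochastic forward pass of the model.
    samples = [np.asarray(predict_fn(rng), dtype=float)
               for _ in range(num_prediction_samples)]
    # Stack into shape (num_prediction_samples, horizon).
    return np.stack(samples)
```

For example, generate_prediction_samples_sketch(lambda rng: rng.normal(size=4), 3) returns an array of shape (3, 4), one sampled trajectory per row, ready for quantile-based CI computation.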