Model Evaluation
Author: Isaac Godfried
Description:
This module contains functions for evaluating models. The basic logic flow is as follows:
1. evaluate_model is called from trainer.py at the end of training. It calls infer_on_torch_model, which does the actual inference.
2. infer_on_torch_model calls generate_predictions, which calls generate_decoded_predictions or generate_predictions_non_decoded depending on whether the model uses a decoder.
3. generate_decoded_predictions calls decoding_functions, which calls greedy_decode or beam_decode depending on the decoder function specified in the config file.
4. The value returned from generate_decoded_predictions is then used to calculate the evaluation metrics in run_evaluation.
5. run_evaluation returns the evaluation metrics to evaluate_model, which returns them to trainer.py.
- flood_forecast.evaluator.stream_baseline(river_flow_df: DataFrame, forecast_column: str, hours_forecast=336) → Tuple[DataFrame, float]
Function to compute the baseline MSE by using the mean value from the train data.
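As a rough illustration of the idea, the sketch below holds out the last hours_forecast rows as test data, predicts the training mean for every held-out row, and scores that constant forecast with MSE. The helper name mean_baseline_mse and the tail split are assumptions made for this example; this is not the library's implementation.

```python
import pandas as pd


def mean_baseline_mse(df: pd.DataFrame, forecast_column: str, hours_forecast: int = 336):
    """Hypothetical helper: constant-mean baseline scored with MSE."""
    # Assumption: the final `hours_forecast` rows form the test period.
    train = df.iloc[:-hours_forecast]
    test = df.iloc[-hours_forecast:].copy()
    # The baseline forecast is simply the mean of the training target.
    baseline = train[forecast_column].mean()
    test["predicted_baseline"] = baseline
    # Mean squared error of the constant forecast on the held-out rows.
    mse = float(((test[forecast_column] - baseline) ** 2).mean())
    return test, mse


# Toy data: six hourly flow readings, with the last two held out.
flows = pd.DataFrame({"cfs": [1.0, 2.0, 3.0, 2.0, 4.0, 0.0]})
test_df, mse = mean_baseline_mse(flows, "cfs", hours_forecast=2)
```

Any model worth keeping should beat this number on the same held-out rows.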
- flood_forecast.evaluator.get_model_r2_score(river_flow_df: DataFrame, model_evaluate_function: Callable, forecast_column: str, hours_forecast=336)
model_evaluate_function should call any necessary preprocessing.
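For orientation, an R² score compares the model's squared error on the test rows with that of a constant predictor equal to the test mean: R² = 1 − SS_res / SS_tot. The sketch below assumes a hypothetical callable that receives the training frame and the horizon and returns one prediction per test row; the real model_evaluate_function may have a different contract.

```python
import numpy as np
import pandas as pd


def r2_from_callable(df, predict_fn, forecast_column, hours_forecast=336) -> float:
    """Hypothetical helper: R^2 of predict_fn on the last `hours_forecast` rows."""
    # Assumption: the final `hours_forecast` rows form the test period.
    train = df.iloc[:-hours_forecast]
    actual = df[forecast_column].iloc[-hours_forecast:].to_numpy(dtype=float)
    # predict_fn is responsible for any preprocessing it needs.
    predicted = np.asarray(predict_fn(train, hours_forecast), dtype=float)
    ss_res = float(((actual - predicted) ** 2).sum())      # residual sum of squares
    ss_tot = float(((actual - actual.mean()) ** 2).sum())  # total sum of squares
    return 1.0 - ss_res / ss_tot


def last_value_model(train, horizon):
    # Toy stand-in for a model: repeat the last observed value.
    return [train["cfs"].iloc[-1]] * horizon


flows = pd.DataFrame({"cfs": [1.0, 2.0, 3.0, 2.0, 4.0, 0.0]})
score = r2_from_callable(flows, last_value_model, "cfs", hours_forecast=2)
```

A score of 1 means a perfect forecast, 0 means no better than predicting the test mean, and negative values mean worse than that.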
- flood_forecast.evaluator.evaluate_model(model: Type[TimeSeriesModel], model_type: str, target_col: List[str], evaluation_metrics: List, inference_params: Dict, eval_log: Dict) → Tuple[Dict, DataFrame, int, DataFrame]
Evaluates a model. It is called automatically at the end of training, and can also be imported to continue evaluating a model elsewhere.
    from flood_forecast.evaluator import evaluate_model
    from flood_forecast.time_model import PyTorchForecast

    forecast_model = PyTorchForecast(config_file)
    # The last two arguments are inference_params and eval_log.
    e_log, df_train_test, f_idx, df_preds = evaluate_model(
        forecast_model, "PyTorch", ["cfs"], ["MSE", "MAPE"], {}, {}
    )
    print(e_log)  # {"MSE":0.2, "MAPE":0.1}
    print(df_train_test)  # will print a pandas dataframe
    ...
- flood_forecast.evaluator.run_evaluation(model, df_train_and_test, forecast_history, target_col, end_tensor, g_loss=False, eval_log={}, end_tensor_0=None) → Dict
- flood_forecast.evaluator.infer_on_torch_model(model, test_csv_path: str = None, datetime_start: datetime = datetime.datetime(2018, 9, 22, 0, 0), hours_to_forecast: int = 336, decoder_params=None, dataset_params: Dict = {}, num_prediction_samples: int = None, probabilistic: bool = False, criterion_params: Dict = None) → Tuple[DataFrame, Tensor, int, int, CSVTestLoader, List[DataFrame]]
Function to handle both test evaluation and inference on a test data-frame.
- Parameters:
model – The time series model present in the model zoo
test_csv_path – The path to the test data-frame
- Returns:
df – DataFrame including training and test data
end_tensor – the final tensor after the model has finished predictions
history_length – number of rows to use in training
forecast_start_idx – row index at which to start forecasting
test_data – CSVTestLoader instance
df_prediction_samples – has the same index as df, and a number of columns equal to num_prediction_samples, or no columns if num_prediction_samples is None
- Return type:
tuple()
- flood_forecast.evaluator.handle_later_ev(model, df_train_and_test, end_tensor, params, csv_test_loader, multi_params, forecast_start_idx, history, datetime_start)
- flood_forecast.evaluator.handle_evaluation_series_loader(csv_series_id_loader: SeriesIDTestLoader, model, device, hours_to_forecast: int, datetime_start) → Tuple[List[DataFrame], List]
- flood_forecast.evaluator.handle_ci_multi(prediction_samples: Tensor, csv_test_loader: CSVTestLoader, multi_params: int, df_pred, decoder_param: bool, history_length: int, num_samples: int) → List[DataFrame]
Handles the confidence interval (CI) predictions.
- Parameters:
prediction_samples (torch.Tensor) – The tensor of sampled predictions
csv_test_loader (CSVTestLoader) – The previously generated test loader
multi_params (int) – The number of targets (n_targets)
df_pred (pd.DataFrame) – The pandas dataframe of the returned prediction
decoder_param (bool) – [description]
history_length (int) – The number of historical time-steps
num_samples (int) – The number of samples to generate (i.e. larger ci)
- Raises:
ValueError – [description]
ValueError – [description]
- Returns:
A list of dataframes with the different CI predictions
- Return type:
List[pd.DataFrame]
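One common way to turn a stack of sampled forecasts into a confidence band is to take percentiles across the sample axis. The sketch below does this with NumPy rather than torch, using a hypothetical helper sample_interval and assuming samples are laid out as (time_steps, num_samples). It illustrates the general technique only and is not necessarily how handle_ci_multi builds its output.

```python
import numpy as np
import pandas as pd


def sample_interval(samples: np.ndarray, level: float = 0.9) -> pd.DataFrame:
    """Hypothetical helper: percentile band from (time_steps, num_samples) samples."""
    alpha = (1.0 - level) / 2.0
    # Quantiles are taken across the sample axis, one value per time step.
    lower, median, upper = np.quantile(samples, [alpha, 0.5, 1.0 - alpha], axis=1)
    return pd.DataFrame({"lower": lower, "median": median, "upper": upper})


# Toy data: 24 time steps with 200 sampled forecasts each.
rng = np.random.default_rng(0)
samples = rng.normal(loc=10.0, scale=1.0, size=(24, 200))
band = sample_interval(samples)
```

Drawing more samples makes the estimated band edges more stable, at the cost of more forward passes.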
- flood_forecast.evaluator.generate_predictions(model: Type[TimeSeriesModel], df: DataFrame, test_data: CSVTestLoader, history: Tensor, device: device, forecast_start_idx: int, forecast_length: int, hours_to_forecast: int, decoder_params: Dict, targs=False, multi_params: int = 1) → Tensor
A function to generate the actual model prediction.
- Parameters:
model (Type[TimeSeriesModel]) – A PyTorchForecast
df (pd.DataFrame) – The main dataframe containing data
test_data (CSVTestLoader) – The test data loader
history (torch.Tensor) – The forecast historical data
device (torch.device) – The device, usually cpu or cuda
forecast_start_idx (int) – The index at which the forecast should begin
forecast_length (int) – The length of the forecast the model outputs per forward pass
hours_to_forecast (int) – The number of time steps to forecast into the future
decoder_params (Dict) – The parameters the decoder function takes
multi_params (int, optional) – n_targets, defaults to 1
- Returns:
The forecasted values for the time-series in a tensor
- Return type:
torch.Tensor
- flood_forecast.evaluator.generate_predictions_non_decoded(model: Type[TimeSeriesModel], df: DataFrame, test_data: CSVTestLoader, history_dim: Tensor, forecast_length: int, hours_to_forecast: int) → Tensor
Generates predictions for the models that do not use a decoder.
- Parameters:
model (Type[TimeSeriesModel]) – A PyTorchForecast
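df (pd.DataFrame) – The main dataframe containing data
test_data (CSVTestLoader) – The test data loader
history_dim (torch.Tensor) – [description]
forecast_length (int) – The length of the forecast the model outputs per forward pass
hours_to_forecast (int) – The number of time steps to forecast into the future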
- Returns:
The forecasted values for the time series in a tensor
- Return type:
torch.Tensor
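The relationship between forecast_length (steps produced per forward pass) and hours_to_forecast (total horizon) can be pictured as a rolling loop: predict a chunk, append it to the history, and repeat until the horizon is covered. The sketch below uses NumPy and a toy callable in place of a real model; rolling_forecast and toy_model are hypothetical names, and the library's actual loop may differ, for example in how it draws inputs from the test loader.

```python
import numpy as np


def rolling_forecast(model, history, forecast_length, hours_to_forecast):
    """Hypothetical helper: cover a long horizon with fixed-size forecast chunks."""
    window = len(history)  # the model always sees the most recent `window` steps
    series = np.asarray(history, dtype=float)
    produced = 0
    while produced < hours_to_forecast:
        # One "forward pass" yields at most `forecast_length` new steps.
        chunk = np.asarray(model(series[-window:]), dtype=float)[:forecast_length]
        series = np.concatenate([series, chunk])
        produced += len(chunk)
    # Drop the original history and trim any overshoot past the horizon.
    return series[window:window + hours_to_forecast]


def toy_model(window):
    # Toy stand-in for a forward pass: extend the series by +1 and +2.
    last = window[-1]
    return [last + 1.0, last + 2.0]


preds = rolling_forecast(toy_model, [0.0, 1.0, 2.0], forecast_length=2, hours_to_forecast=5)
```

Here three passes of two steps each produce six values, and the last one is trimmed so that exactly five are returned.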
- flood_forecast.evaluator.generate_decoded_predictions(model: Type[TimeSeriesModel], test_data: CSVTestLoader, forecast_start_idx: int, device: device, history_dim: Tensor, hours_to_forecast: int, decoder_params: Dict, multi_targets: int = 1, targs: bool | Tensor = False) → Tensor
- flood_forecast.evaluator.generate_prediction_samples(model: Type[TimeSeriesModel], df: DataFrame, test_data: CSVTestLoader, history: Tensor, device: device, forecast_start_idx: int, forecast_length: int, hours_to_forecast: int, decoder_params: Dict, num_prediction_samples: int, multi_params=1, targs=False) → ndarray
Generates prediction samples.