PyTorch Loaders

class flood_forecast.preprocessing.pytorch_loaders.CSVDataLoader(file_path: str, forecast_history: int, forecast_length: int, target_col: List, relevant_cols: List, scaling=None, start_stamp: int = 0, end_stamp: int = None, gcp_service_key: str | None = None, interpolate_param: bool = False, sort_column=None, scaled_cols=None, feature_params=None, no_scale=False, preformatted_df=False)[source]

Bases: Dataset

__init__(file_path: str, forecast_history: int, forecast_length: int, target_col: List, relevant_cols: List, scaling=None, start_stamp: int = 0, end_stamp: int = None, gcp_service_key: str | None = None, interpolate_param: bool = False, sort_column=None, scaled_cols=None, feature_params=None, no_scale=False, preformatted_df=False)[source]

A data loader that takes a CSV file and batches it for use in training or evaluating a PyTorch model.

Parameters:
  • file_path – The path to the CSV file you wish to use (GCS compatible) or a Pandas dataframe.

  • forecast_history – This is the length of the historical time series data you wish to utilize for forecasting

  • forecast_length – The number of time steps to forecast ahead (for transformer this must equal history_length)

  • relevant_cols – Supply column names you wish to predict in the forecast (others will not be used)

  • target_col – The target column or columns you want to predict. If you only have one, still use a list, e.g. ['cfs']

  • scaling – (highly recommended) If provided, should be a subclass of sklearn.base.BaseEstimator and sklearn.base.TransformerMixin (e.g. StandardScaler, MaxAbsScaler, MinMaxScaler). Note that without a scaler the loss is likely to explode and become infinite, which will corrupt the weights.

  • start_stamp (int) – Optional. Supply this if you want to use only part of a CSV for training, validation, or testing.

  • end_stamp (int) – Optional. Supply this if you want to use only part of a CSV for training, validation, or testing.

  • sort_column (str) – The column to sort the time series on prior to forecasting.

  • scaled_cols – The columns you want scaling applied to (if left blank will default to all columns)

  • feature_params – These are the datetime features you want to create.

  • no_scale – This means that the end labels will not be scaled when running
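The relationship between forecast_history and forecast_length can be pictured as a sliding window over the rows of the CSV. The following is a minimal plain-Python sketch of that idea, not the library's code: the helper name sliding_windows is hypothetical, and the exact number of windows the real loader yields may differ.

```python
from typing import List, Sequence, Tuple


def sliding_windows(
    rows: Sequence[float], forecast_history: int, forecast_length: int
) -> List[Tuple[List[float], List[float]]]:
    """Pair each history window with the target window that follows it.

    Hypothetical helper for illustration only; assumes one value per row.
    """
    pairs = []
    # Last start index that still leaves room for history plus target.
    last_start = len(rows) - forecast_history - forecast_length
    for start in range(last_start + 1):
        split = start + forecast_history
        history = list(rows[start:split])
        target = list(rows[split:split + forecast_length])
        pairs.append((history, target))
    return pairs


series = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
windows = sliding_windows(series, forecast_history=3, forecast_length=2)
print(windows[0])  # ([10.0, 11.0, 12.0], [13.0, 14.0])
```

With six rows, a history of three, and a forecast length of two, only two complete windows fit, which is why short CSV slices (see start_stamp and end_stamp) can produce very few training examples.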

inverse_scale(result_data: Tensor | Series | ndarray) → Tensor[source]

Undoes the scaling of the data.

Parameters:

result_data (Union[torch.Tensor, pd.Series, np.ndarray]) – The data you want to unscale. Multiple data types are supported.

Returns:

The unscaled data as a PyTorch tensor.

Return type:

torch.Tensor
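As a rough illustration of what undoing the scaling means, the sketch below round-trips values through a tiny standard scaler written in plain Python. MiniStandardScaler is a hypothetical stand-in for a fitted sklearn scaler; the real inverse_scale method also handles conversion between tensors, series, and arrays, which this sketch leaves out.

```python
from statistics import mean, pstdev
from typing import List


class MiniStandardScaler:
    """Hypothetical stand-in for a fitted scaler such as StandardScaler."""

    def fit(self, values: List[float]) -> "MiniStandardScaler":
        self.mean_ = mean(values)
        # Population standard deviation; fall back to 1.0 for constant data.
        self.scale_ = pstdev(values) or 1.0
        return self

    def transform(self, values: List[float]) -> List[float]:
        return [(v - self.mean_) / self.scale_ for v in values]

    def inverse_transform(self, values: List[float]) -> List[float]:
        # Reverse of transform: multiply by the scale, then add the mean back.
        return [v * self.scale_ + self.mean_ for v in values]


flows = [100.0, 150.0, 200.0, 250.0]
scaler = MiniStandardScaler().fit(flows)
scaled = scaler.transform(flows)
restored = scaler.inverse_transform(scaled)
```

Model outputs are in the scaled space, so the same fitted scaler has to be used to map predictions back to the original units before they are compared with raw observations.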

class flood_forecast.preprocessing.pytorch_loaders.CSVSeriesIDLoader(series_id_col: str, main_params: dict, return_method: str, return_all=True)[source]

Bases: CSVDataLoader

__init__(series_id_col: str, main_params: dict, return_method: str, return_all=True)[source]

A data-loader for a CSV file that contains a series ID column.

Parameters:
  • series_id_col (str) – The column that contains the series ID

  • main_params (dict) – The central set of parameters

  • return_method (str) – The method of return

  • return_all (bool, optional) – Whether to return all items, defaults to True
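To make the role of series_id_col concrete, here is a minimal plain-Python sketch that splits rows into one group per series ID. The column name station_id, the sample rows, and the helper split_by_series are all hypothetical; how the loader itself stores and returns the per-series data is not shown here.

```python
from collections import defaultdict
from typing import Dict, List


def split_by_series(rows: List[dict], series_id_col: str) -> Dict[str, List[dict]]:
    """Group rows by the value found in the series ID column."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[series_id_col]].append(row)
    return dict(grouped)


# Hypothetical rows, as they might look after reading a CSV.
rows = [
    {"station_id": "A", "cfs": 1.0},
    {"station_id": "B", "cfs": 5.0},
    {"station_id": "A", "cfs": 2.0},
    {"station_id": "B", "cfs": 6.0},
    {"station_id": "A", "cfs": 3.0},
]

by_series = split_by_series(rows, series_id_col="station_id")
print({key: len(value) for key, value in by_series.items()})
```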

inverse_scale(result_data: Tensor | Series | ndarray) → Tensor

Undoes the scaling of the data.

Parameters:

result_data (Union[torch.Tensor, pd.Series, np.ndarray]) – The data you want to unscale. Multiple data types are supported.

Returns:

The unscaled data as a PyTorch tensor.

Return type:

torch.Tensor

class flood_forecast.preprocessing.pytorch_loaders.CSVTestLoader(df_path: str, forecast_total: int, use_real_precip=True, use_real_temp=True, target_supplied=True, interpolate=False, sort_column_clone=None, **kwargs)[source]

Bases: CSVDataLoader

__init__(df_path: str, forecast_total: int, use_real_precip=True, use_real_temp=True, target_supplied=True, interpolate=False, sort_column_clone=None, **kwargs)[source]
A data loader for the test data.

Parameters:

df_path (str) – The path to the CSV file you want to use (GCS compatible) or a Pandas DataFrame

get_from_start_date(forecast_start: datetime, original_df=None)[source]
convert_real_batches(the_col: str, rows_to_convert)[source]

A helper function to return properly divided precip and temp values to be stacked with the forecasted cfs.

convert_history_batches(the_col: str | List[str], rows_to_convert: DataFrame)[source]

A helper function to return dataframe in batches of size (history_len, num_features)

Parameters:
  • the_col (str | List[str]) – The column name or names

  • rows_to_convert (pd.DataFrame) – Rows in a dataframe to be converted into batches
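The idea of cutting rows into batches of size (history_len, num_features) can be sketched in plain Python as below. This is only an illustration under two assumptions that the documentation does not confirm: that batches do not overlap, and that a trailing incomplete batch is dropped. The helper name to_history_batches is hypothetical.

```python
from typing import List


def to_history_batches(
    rows: List[List[float]], history_len: int
) -> List[List[List[float]]]:
    """Cut rows into consecutive chunks of history_len rows each.

    Assumes non-overlapping chunks and drops any incomplete trailing chunk.
    """
    n_full = len(rows) // history_len
    return [
        rows[i * history_len:(i + 1) * history_len]
        for i in range(n_full)
    ]


# Seven rows with two features each (for example precip and temp).
rows = [[float(i), float(i) * 10.0] for i in range(7)]
batches = to_history_batches(rows, history_len=3)
print(len(batches), len(batches[0]), len(batches[0][0]))  # 2 3 2
```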

inverse_scale(result_data: Tensor | Series | ndarray) → Tensor

Undoes the scaling of the data.

Parameters:

result_data (Union[torch.Tensor, pd.Series, np.ndarray]) – The data you want to unscale. Multiple data types are supported.

Returns:

The unscaled data as a PyTorch tensor.

Return type:

torch.Tensor

class flood_forecast.preprocessing.pytorch_loaders.TestLoaderABC(df_path: str, forecast_total: int, use_real_precip=True, use_real_temp=True, target_supplied=True, interpolate=False, sort_column_clone=None, **kwargs)[source]

Bases: CSVTestLoader

__init__(df_path: str, forecast_total: int, use_real_precip=True, use_real_temp=True, target_supplied=True, interpolate=False, sort_column_clone=None, **kwargs)
A data loader for the test data.

Parameters:

df_path (str) – The path to the CSV file you want to use (GCS compatible) or a Pandas DataFrame

convert_history_batches(the_col: str | List[str], rows_to_convert: DataFrame)

A helper function to return dataframe in batches of size (history_len, num_features)

Parameters:
  • the_col (str | List[str]) – The column name or names

  • rows_to_convert (pd.DataFrame) – Rows in a dataframe to be converted into batches

convert_real_batches(the_col: str, rows_to_convert)

A helper function to return properly divided precip and temp values to be stacked with the forecasted cfs.

get_from_start_date(forecast_start: datetime, original_df=None)
inverse_scale(result_data: Tensor | Series | ndarray) → Tensor

Undoes the scaling of the data.

Parameters:

result_data (Union[torch.Tensor, pd.Series, np.ndarray]) – The data you want to unscale. Multiple data types are supported.

Returns:

The unscaled data as a PyTorch tensor.

Return type:

torch.Tensor

class flood_forecast.preprocessing.pytorch_loaders.AEDataloader(file_path: str, relevant_cols: List, scaling=None, start_stamp: int = 0, target_col: List = None, end_stamp: int = None, unsqueeze_dim: int = 1, interpolate_param=False, forecast_history=1, no_scale=True, sort_column=None)[source]

Bases: CSVDataLoader

__init__(file_path: str, relevant_cols: List, scaling=None, start_stamp: int = 0, target_col: List = None, end_stamp: int = None, unsqueeze_dim: int = 1, interpolate_param=False, forecast_history=1, no_scale=True, sort_column=None)[source]
A data loader class for autoencoders. Overrides __len__ and __getitem__ from generic dataloader.

Also defaults forecast_history and forecast_length to 1, since an autoencoder will likely only use one row. The parameters are the same as for CSVDataLoader.

Parameters:
  • file_path (str) – The path to the file

  • relevant_cols (List) – The relevant columns

  • scaling (optional) – See CSVDataLoader; defaults to None

  • start_stamp (int, optional) – See CSVDataLoader; defaults to 0

  • target_col (List, optional) – See CSVDataLoader; defaults to None

  • end_stamp (int, optional) – See CSVDataLoader; defaults to None

  • unsqueeze_dim (int, optional) – Defaults to 1

  • interpolate_param (bool, optional) – Defaults to False

  • forecast_history (int, optional) – See CSVDataLoader; defaults to 1

  • no_scale (bool, optional) – See CSVDataLoader; defaults to True

  • sort_column (str, optional) – See CSVDataLoader; defaults to None

get_from_start_date(forecast_start: datetime)[source]
inverse_scale(result_data: Tensor | Series | ndarray) → Tensor

Undoes the scaling of the data.

Parameters:

result_data (Union[torch.Tensor, pd.Series, np.ndarray]) – The data you want to unscale. Multiple data types are supported.

Returns:

The unscaled data as a PyTorch tensor.

Return type:

torch.Tensor

class flood_forecast.preprocessing.pytorch_loaders.GeneralClassificationLoader(params: Dict, n_classes: int = 2)[source]

Bases: CSVDataLoader

__init__(params: Dict, n_classes: int = 2)[source]

A generic data loader class for TS classification problems.

Parameters:
  • params (Dict) – The standard dictionary for a dataloader (see CSVDataLoader)

  • n_classes – The number of classes in the problem

inverse_scale(result_data: Tensor | Series | ndarray) → Tensor

Undoes the scaling of the data.

Parameters:

result_data (Union[torch.Tensor, pd.Series, np.ndarray]) – The data you want to unscale. Multiple data types are supported.

Returns:

The unscaled data as a PyTorch tensor.

Return type:

torch.Tensor

class flood_forecast.preprocessing.pytorch_loaders.TemporalLoader(time_feats: List[str], kwargs: Dict, label_len=0)[source]

Bases: CSVDataLoader

__init__(time_feats: List[str], kwargs: Dict, label_len=0)[source]

A data loader class for creating specific temporal features/embeddings.

Parameters:
  • time_feats (List[str]) – A list of strings of the time features (e.g. ['month', 'day', 'hour'])

  • kwargs (Dict[str, Any]) – The set of parameters

  • label_len (int, optional) – For Informer based model the, defaults to 0
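A list such as ['month', 'day', 'hour'] names calendar components to derive from each timestamp. The sketch below shows one simple way to compute such features with the standard library; the helper make_time_features is hypothetical, and the actual loader may encode or scale these values differently before handing them to a model.

```python
from datetime import datetime
from typing import List


def make_time_features(stamps: List[datetime], time_feats: List[str]) -> List[List[int]]:
    """Return one row of integer time features per timestamp.

    Each name in time_feats must be a datetime attribute such as
    'month', 'day', or 'hour'.
    """
    return [[getattr(stamp, feat) for feat in time_feats] for stamp in stamps]


stamps = [datetime(2021, 3, 14, 9), datetime(2021, 12, 1, 23)]
feats = make_time_features(stamps, ["month", "day", "hour"])
print(feats)  # [[3, 14, 9], [12, 1, 23]]
```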

static df_to_numpy(pandas_stuff: DataFrame)[source]
inverse_scale(result_data: Tensor | Series | ndarray) → Tensor

Undoes the scaling of the data.

Parameters:

result_data (Union[torch.Tensor, pd.Series, np.ndarray]) – The data you want to unscale. Multiple data types are supported.

Returns:

The unscaled data as a PyTorch tensor.

Return type:

torch.Tensor

class flood_forecast.preprocessing.pytorch_loaders.TemporalTestLoader(time_feats: List[str], kwargs={}, decoder_step_len=None)[source]

Bases: CSVTestLoader

__init__(time_feats: List[str], kwargs={}, decoder_step_len=None)[source]

A test data-loader class for data in the format of the TemporalLoader.

Parameters:
  • time_feats (List[str]) – The temporal features to use in encoding.

  • kwargs (dict, optional) – The dict used to instantiate CSVTestLoader parent, defaults to {}

  • decoder_step_len (optional) – Defaults to None

static df_to_numpy(pandas_stuff: DataFrame)[source]
convert_history_batches(the_col: str | List[str], rows_to_convert: DataFrame)

A helper function to return dataframe in batches of size (history_len, num_features)

Parameters:
  • the_col (str | List[str]) – The column name or names

  • rows_to_convert (pd.DataFrame) – Rows in a dataframe to be converted into batches

convert_real_batches(the_col: str, rows_to_convert)

A helper function to return properly divided precip and temp values to be stacked with the forecasted cfs.

get_from_start_date(forecast_start: datetime, original_df=None)
inverse_scale(result_data: Tensor | Series | ndarray) → Tensor

Undoes the scaling of the data.

Parameters:

result_data (Union[torch.Tensor, pd.Series, np.ndarray]) – The data you want to unscale. Multiple data types are supported.

Returns:

The unscaled data as a PyTorch tensor.

Return type:

torch.Tensor

class flood_forecast.preprocessing.pytorch_loaders.VariableSequenceLength(series_marker_column: str, csv_loader_params: Dict, pad_length=None, task='classification', n_classes=99)[source]

Bases: CSVDataLoader

__init__(series_marker_column: str, csv_loader_params: Dict, pad_length=None, task='classification', n_classes=99)[source]

Enables easier loading of time series with variable-length data.

Parameters:
  • series_marker_column (str) – The column that delineates when an example begins and ends

  • pad_length (int) – If specified, the length at which sequences are truncated or to which they are padded

  • task (str) – The specific task (e.g. classification, forecasting, auto_encode)

get_item_forecast(idx: int)[source]
get_item_classification(idx: int)[source]
get_item_auto_encoder(idx)[source]
pad_input_data(sequence: int)[source]

Pads a sequence to a specified length.
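Padding to a fixed length lets sequences of different sizes be stacked into one batch. The following plain-Python sketch shows the pad-or-truncate behaviour described for pad_length; the helper name pad_or_truncate and the choice of 0.0 as the padding value are assumptions, and the real method works on the loader's own data rather than Python lists.

```python
from typing import List


def pad_or_truncate(
    sequence: List[float], pad_length: int, pad_value: float = 0.0
) -> List[float]:
    """Return a copy of sequence that is exactly pad_length items long."""
    if len(sequence) >= pad_length:
        # Too long (or already the right size): keep the first pad_length items.
        return list(sequence[:pad_length])
    # Too short: append pad_value until the target length is reached.
    return list(sequence) + [pad_value] * (pad_length - len(sequence))


short = pad_or_truncate([1.0, 2.0], pad_length=4)
long = pad_or_truncate([1.0, 2.0, 3.0, 4.0, 5.0], pad_length=4)
print(short)  # [1.0, 2.0, 0.0, 0.0]
print(long)   # [1.0, 2.0, 3.0, 4.0]
```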

inverse_scale(result_data: Tensor | Series | ndarray) → Tensor

Undoes the scaling of the data.

Parameters:

result_data (Union[torch.Tensor, pd.Series, np.ndarray]) – The data you want to unscale. Multiple data types are supported.

Returns:

The unscaled data as a PyTorch tensor.

Return type:

torch.Tensor

class flood_forecast.preprocessing.pytorch_loaders.SeriesIDTestLoader(series_id_col: str, main_params: dict, return_method: str, forecast_total=336, return_all=True)[source]

Bases: CSVSeriesIDLoader

inverse_scale(result_data: Tensor | Series | ndarray) → Tensor

Undoes the scaling of the data.

Parameters:

result_data (Union[torch.Tensor, pd.Series, np.ndarray]) – The data you want to unscale. Multiple data types are supported.

Returns:

The unscaled data as a PyTorch tensor.

Return type:

torch.Tensor

__init__(series_id_col: str, main_params: dict, return_method: str, forecast_total=336, return_all=True)[source]

Parameters:
  • series_id_col (str) – The column that contains the series_id

  • main_params (dict) – The core params used to instantiate the CSVSeriesIDLoader

  • return_method (str) – The method of return

  • return_all (bool, optional) – Whether to return all items, defaults to True

  • forecast_total (int, optional) – The total length to forecast, defaults to 336

get_from_start_date_all(forecast_start: datetime, series_id: int = None)[source]