zoo.automl.feature package
Submodules
zoo.automl.feature.abstract module
class zoo.automl.feature.abstract.BaseFeatureTransformer
    Bases: abc.ABC
    Abstract base class for feature transformers.
check_optional_config = False
fit_transform(input_df, **config)
    Fit the transformer to the input dataframe. Refits the scalers, if any, to this data.
    :param input_df: input to be fitted
    :param config: the config
restore(**config)
    Restore variables from file.
    :param file_path: file containing the saved parameters, i.e. parameters that are obtained during training and are not in the trial config (e.g. scaler fit params)
    :param config: the trial config
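The interface above can be illustrated with a minimal sketch. It does not import zoo: `FeatureTransformerBase` and `EchoTransformer` are hypothetical stand-ins, the sketch assumes both documented methods are declared abstract, and `past_seq_len` is only an illustrative config key.

```python
from abc import ABC, abstractmethod


class FeatureTransformerBase(ABC):
    """Hypothetical stand-in for BaseFeatureTransformer (not the zoo class)."""

    check_optional_config = False

    @abstractmethod
    def fit_transform(self, input_df, **config):
        """Fit to input_df and return the transformed data."""

    @abstractmethod
    def restore(self, **config):
        """Restore fitted variables from the given config."""


class EchoTransformer(FeatureTransformerBase):
    """Returns its input unchanged and remembers the config it was given."""

    def __init__(self):
        self.config = {}

    def fit_transform(self, input_df, **config):
        self.config = dict(config)
        return input_df

    def restore(self, **config):
        self.config = dict(config)


# Plain list of dicts used in place of a data frame to keep the sketch stdlib-only.
rows = [{"datetime": "2019-01-01", "value": 1.9}]
echo = EchoTransformer()
out = echo.fit_transform(rows, past_seq_len=2)
```

Because the base class derives from `abc.ABC`, instantiating it directly raises `TypeError` in this sketch; only subclasses that implement every abstract method can be created.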
zoo.automl.feature.identity_transformer module
class zoo.automl.feature.identity_transformer.IdentityTransformer(feature_cols=None, target_col=None)
    Bases: zoo.automl.feature.abstract.BaseFeatureTransformer
    Echo transformer.
fit_transform(input_df, **config)
    Fit the transformer to the input dataframe. Refits the scalers, if any, to this data.
    :param input_df: input to be fitted
    :param config: the config
restore(**config)
    Restore variables from file.
    :param file_path: file containing the saved parameters, i.e. parameters that are obtained during training and are not in the trial config (e.g. scaler fit params)
    :param config: the trial config
zoo.automl.feature.time_sequence module
class zoo.automl.feature.time_sequence.TimeSequenceFeatureTransformer(future_seq_len=1, dt_col='datetime', target_col='value', extra_features_col=None, drop_missing=True)
    Bases: zoo.automl.feature.abstract.BaseFeatureTransformer
    TimeSequence feature engineering.
fit_transform(input_df, **config)
    Fit the data and transform the raw data into features. This is used in training for hyperparameter searching. This method refreshes the fitted parameters (e.g. the min and max of the MinMaxScaler), if any.
    :param input_df: The input time series data frame. It can be a list of data frames or a single data frame. Example:

        datetime    value  "extra feature 1"  "extra feature 2"
        2019-01-01  1.9    1                  2
        2019-01-02  2.3    0                  2

    :return: tuple (x, y)
        x: 3-d array in the format (no. of samples, past sequence length, 2 + feature length). In the last dimension, the 1st col is the time index (its data type needs to be a numpy datetime type, e.g. "datetime64") and the 2nd col is the target value (its data type should be numeric).
        y: 2-d numpy array in the format (no. of samples, future sequence length) if future sequence length > 1, or 1-d numpy array in the format (no. of samples,) if future sequence length = 1.
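The (x, y) shapes described above can be illustrated with a small rolling-window sketch. It is not the library's implementation: `roll_windows` is a hypothetical helper, it uses plain lists where the documented method returns numpy arrays, and the exact windowing rule is an assumption. Only the first two rows come from the example above; the remaining rows are made-up filler.

```python
def roll_windows(rows, past_seq_len, future_seq_len):
    """Split rows into (x, y) windows.

    rows: one list per time step, laid out as [time index, target, extra features...].
    x:    nested lists shaped (samples, past_seq_len, 2 + number of extra features).
    y:    the next future_seq_len target values for each sample; flattened to one
          value per sample when future_seq_len == 1.
    """
    x, y = [], []
    n_samples = len(rows) - past_seq_len - future_seq_len + 1
    for start in range(n_samples):
        split = start + past_seq_len
        x.append(rows[start:split])
        future = [row[1] for row in rows[split:split + future_seq_len]]
        y.append(future[0] if future_seq_len == 1 else future)
    return x, y


rows = [
    ["2019-01-01", 1.9, 1, 2],
    ["2019-01-02", 2.3, 0, 2],
    ["2019-01-03", 2.4, 3, 9],  # made-up filler row
    ["2019-01-04", 2.6, 0, 4],  # made-up filler row
    ["2019-01-05", 2.8, 1, 5],  # made-up filler row
]

x, y = roll_windows(rows, past_seq_len=2, future_seq_len=1)
x2, y2 = roll_windows(rows, past_seq_len=2, future_seq_len=2)
```

With five rows, a past length of 2 and a future length of 1, this sketch yields 3 samples, each holding 2 time steps of 4 columns (2 + 2 extra features), and `y` holds one value per sample. With a future length of 2, `y2` holds one list of 2 values per sample.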
post_processing(input_df, y_pred, is_train)
    Used only in pipeline predict, after calling self.transform(input_df, is_train=False). Post-processing includes converting the predicted array into a data frame and applying the scaler's inverse transform.
    :param input_df: a list of data frames or one data frame.
    :param y_pred: Model prediction result (ndarray).
    :param is_train: indicates whether the output is used for evaluation or prediction.
    :return: In validation mode (is_train=True), return the unscaled y_pred and the rolled input_y. In test mode (is_train=False), return the unscaled data frame(s) in the format {datetime_col} | {target_col(s)}.
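The inverse-transform step mentioned above can be sketched with a hand-written min-max scaler. This is an assumption-laden illustration: the helper names are hypothetical, and min-max scaling is chosen only because fit_transform mentions a MinMaxScaler; the library's actual scaling code is not shown here.

```python
def fit_min_max(values):
    """Return the (min, max) pair learned from the training values."""
    return min(values), max(values)


def scale(values, lo, hi):
    """Map values into [0, 1] using the fitted min and max."""
    span = (hi - lo) or 1.0  # avoid dividing by zero for constant series
    return [(v - lo) / span for v in values]


def inverse_scale(scaled, lo, hi):
    """Map scaled values back to the original units."""
    span = (hi - lo) or 1.0
    return [s * span + lo for s in scaled]


train_values = [2.0, 4.0, 6.0]
lo, hi = fit_min_max(train_values)
scaled_train = scale(train_values, lo, hi)

# Pretend the model produced these predictions in scaled space.
y_pred_scaled = [0.25, 0.75]
y_pred = inverse_scale(y_pred_scaled, lo, hi)
```

The key point is that the inverse transform needs the parameters learned during fitting (`lo` and `hi` here), which is why predictions can only be unscaled after the transformer has been fitted or restored.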
save(file_path, replace=False)
    Save the feature tool's internal variables as well as the initialization args. Some of the variables are derived after fit_transform, so saving only the config is not enough.
    :param file_path: the file to save to
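Why both the initialization args and the fitted variables need to be saved can be shown with a small round trip. Everything here is a hypothetical sketch: `TinyScalerTransformer` is not a zoo class, JSON is an assumed file format, and the `replace` flag is left out because its semantics are not described above.

```python
import json
import os
import tempfile


class TinyScalerTransformer:
    """Hypothetical transformer, used only to illustrate the save/restore idea."""

    def __init__(self, target_col="value"):
        self.target_col = target_col  # initialization arg
        self.min_ = None              # derived during fit
        self.max_ = None              # derived during fit

    def fit(self, values):
        self.min_, self.max_ = min(values), max(values)

    def save(self, file_path):
        # Persist the init args together with the fitted variables.
        state = {"target_col": self.target_col, "min_": self.min_, "max_": self.max_}
        with open(file_path, "w") as f:
            json.dump(state, f)

    def restore(self, file_path):
        with open(file_path) as f:
            state = json.load(f)
        self.target_col = state["target_col"]
        self.min_, self.max_ = state["min_"], state["max_"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "transformer.json")

    fitted = TinyScalerTransformer(target_col="value")
    fitted.fit([1.9, 2.3])
    fitted.save(path)

    # A fresh instance has no fitted state until it is restored from the file.
    restored = TinyScalerTransformer(target_col="other")
    restored.restore(path)
```

After `restore`, the new instance carries the same `min_` and `max_` as the fitted one, so it can scale and unscale data without refitting.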
transform(input_df, is_train=True)
    Transform data into features using the configuration preset by fit_transform.
    :param input_df: The input time series data frame. It can be a list of data frames or a single data frame. Example:

        datetime    value  "extra feature 1"  "extra feature 2"
        2019-01-01  1.9    1                  2
        2019-01-02  2.3    0                  2

    :param is_train: Whether input_df is for training.
    :return: tuple (x, y)
        x: 3-d array in the format (no. of samples, past sequence length, 2 + feature length). In the last dimension, the 1st col is the time index (its data type needs to be a numpy datetime type, e.g. "datetime64") and the 2nd col is the target value (its data type should be numeric).
        y: 2-d numpy array in the format (no. of samples, future sequence length) if future sequence length > 1, or 1-d numpy array in the format (no. of samples,) if future sequence length = 1.