zoo.feature package
Submodules
zoo.feature.common module
-
class
zoo.feature.common.ArrayToTensor(size, bigdl_type='float')
Bases: zoo.feature.common.Preprocessing
A Transformer that converts an Array[_] to a Tensor.
:param size: dimensions of target Tensor.
-
class
zoo.feature.common.ChainedPreprocessing(transformers, bigdl_type='float')
Bases: zoo.feature.common.Preprocessing
Chains two Preprocessing together. The output type of the first Preprocessing should be the same as the input type of the second Preprocessing.
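As a rough illustration of the chaining rule, the plain-Python sketch below composes transformers left to right, so each stage receives the previous stage's output. The `chain` helper and the two stages are hypothetical stand-ins and are not part of `zoo.feature.common`.

```python
from functools import reduce


def chain(transformers):
    """Compose callables left to right: chain([f, g])(x) == g(f(x))."""
    def run(value):
        return reduce(lambda acc, step: step(acc), transformers, value)
    return run


def parse(text):
    """Hypothetical first stage: str -> list of float."""
    return [float(token) for token in text.split(",")]


def total(numbers):
    """Hypothetical second stage: list of float -> float."""
    return sum(numbers)


# The output type of `parse` (list of float) matches the input type of `total`.
pipeline = chain([parse, total])
result = pipeline("1.5,2.5,4")
```

Swapping the order to `chain([total, parse])` would fail, because `total` does not produce the string that `parse` expects.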
-
class
zoo.feature.common.FeatureLabelPreprocessing(feature_transformer, label_transformer, bigdl_type='float')
Bases: zoo.feature.common.Preprocessing
Constructs a Transformer that converts a (Feature, Label) tuple to a Sample. The returned Transformer is robust to the case label = null, in which the Sample is derived from the Feature only.
:param feature_transformer: transformer for feature, transforms F to Tensor[T]
:param label_transformer: transformer for label, transforms L to Tensor[T]
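The null-label behaviour can be pictured with a small plain-Python sketch. `SketchSample` and `feature_label_to_sample` are hypothetical names used only for illustration; real Samples hold Tensors rather than lists.

```python
from typing import Any, NamedTuple, Optional


class SketchSample(NamedTuple):
    """Hypothetical stand-in for a Sample: a feature plus an optional label."""
    feature: Any
    label: Optional[Any]


def feature_label_to_sample(feature_fn, label_fn):
    """Build a converter from a (feature, label) tuple to a SketchSample."""
    def convert(pair):
        feature, label = pair
        # A missing label is passed through instead of being transformed.
        converted_label = None if label is None else label_fn(label)
        return SketchSample(feature_fn(feature), converted_label)
    return convert


to_sample = feature_label_to_sample(
    feature_fn=lambda values: [float(v) for v in values],
    label_fn=lambda value: [float(value)],
)
labeled = to_sample(([1, 2], 3))
unlabeled = to_sample(([1, 2], None))
```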
-
class
zoo.feature.common.FeatureSet(jvalue=None, bigdl_type='float')
Bases: bigdl.dataset.dataset.DataSet
A set of data used in the model optimization process. The FeatureSet can be accessed as a random sequence of data samples. During training, the data sequence is an endless, looped sequence; during validation, it is a sequence of limited length. Unlike BigDL's DataSet, a FeatureSet can be cached to Intel Optane DC Persistent Memory if you set memory_type to PMEM when creating it.
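The difference between the looped training sequence and the finite validation sequence can be sketched with `itertools`. This is only an analogy in plain Python with hypothetical function names; it leaves out the random ordering and does not reflect how FeatureSet is implemented.

```python
import itertools


def training_sequence(data):
    """Endless view of the data: one pass after another."""
    return itertools.cycle(data)


def validation_sequence(data):
    """Finite view of the data: exactly one pass."""
    return iter(data)


data = ["a", "b", "c"]
# Taking five items from the endless sequence wraps around after the third.
first_five_train = list(itertools.islice(training_sequence(data), 5))
all_validation = list(validation_sequence(data))
```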
-
classmethod
image_frame(image_frame, memory_type='DRAM', sequential_order=False, shuffle=True, bigdl_type='float')
Create FeatureSet from ImageFrame.
:param image_frame: ImageFrame
:param memory_type: string, DRAM, PMEM or an int number. If it is DRAM, the dataset is cached in dynamic random-access memory. If it is PMEM, the dataset is cached in Intel Optane DC Persistent Memory. If it is an int number n, the dataset is cached on disk and only 1/n of the data is held in memory during training; after going through that 1/n, the current cache is released and another 1/n is loaded into memory.
:param sequential_order: whether to iterate the elements in the feature set in sequential order for training.
:param shuffle: whether to shuffle the elements in each partition before each epoch when training.
:param bigdl_type: numeric type
:return: A feature set
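The integer form of memory_type describes a load, consume, release cycle over 1/n of the data at a time. The sketch below mimics that cycle on an in-memory list; the `slices` helper is hypothetical and the real disk caching is not modelled.

```python
def slices(records, n):
    """Yield consecutive slices, each holding roughly 1/n of the records."""
    size = max(1, -(-len(records) // n))  # ceiling division, at least 1
    for start in range(0, len(records), size):
        # Only this slice is "in memory" until the next one replaces it.
        yield records[start:start + size]


records = list(range(10))
seen = []
peak_cached = 0
for cached in slices(records, n=4):
    peak_cached = max(peak_cached, len(cached))
    seen.extend(cached)
```

With 10 records and n=4, no slice holds more than 3 records, yet every record is visited once per pass.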
-
classmethod
image_set(imageset, memory_type='DRAM', sequential_order=False, shuffle=True, bigdl_type='float')
Create FeatureSet from ImageSet.
:param imageset: ImageSet
:param memory_type: string, DRAM, PMEM or an int number. If it is DRAM, the dataset is cached in dynamic random-access memory. If it is PMEM, the dataset is cached in Intel Optane DC Persistent Memory. If it is an int number n, the dataset is cached on disk and only 1/n of the data is held in memory during training; after going through that 1/n, the current cache is released and another 1/n is loaded into memory.
:param sequential_order: whether to iterate the elements in the feature set in sequential order for training.
:param shuffle: whether to shuffle the elements in each partition before each epoch when training.
:param bigdl_type: numeric type
:return: A feature set
-
classmethod
pytorch_dataloader(dataloader, features='_data[0]', labels='_data[1]', bigdl_type='float')
Create FeatureSet from a PyTorch dataloader.
:param dataloader: a PyTorch dataloader, or a function that returns a PyTorch dataloader.
:param features: features in _data, where _data is obtained from the dataloader.
:param labels: labels in _data, where _data is obtained from the dataloader.
:param bigdl_type: numeric type
:return: A feature set
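The default values `'_data[0]'` and `'_data[1]'` read like Python expressions over a batch named `_data`. The sketch below assumes that interpretation and evaluates such strings against a hand-written batch. How the library actually resolves these strings is not stated in this reference, the `select` helper is hypothetical, and no PyTorch is involved.

```python
def select(expression, batch):
    """Evaluate an expression string with the batch bound to the name _data."""
    # Builtins are disabled so the expression can only index into _data.
    return eval(expression, {"__builtins__": {}}, {"_data": batch})


# A hand-written batch shaped like (features, labels).
batch = ([[0.1, 0.2], [0.3, 0.4]], [0, 1])
features = select("_data[0]", batch)
labels = select("_data[1]", batch)
```

Under this reading, a dataloader that yields dictionaries could be addressed with an expression such as `_data['image']`.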
-
classmethod
rdd(rdd, memory_type='DRAM', sequential_order=False, shuffle=True, bigdl_type='float')
Create FeatureSet from RDD.
:param rdd: An RDD
:param memory_type: string, DRAM, PMEM or an int number. If it is DRAM, the dataset is cached in dynamic random-access memory. If it is PMEM, the dataset is cached in Intel Optane DC Persistent Memory. If it is an int number n, the dataset is cached on disk and only 1/n of the data is held in memory during training; after going through that 1/n, the current cache is released and another 1/n is loaded into memory.
:param sequential_order: whether to iterate the elements in the feature set in sequential order when training.
:param shuffle: whether to shuffle the elements in each partition before each epoch when training.
:param bigdl_type: numeric type
:return: A feature set
-
classmethod
sample_rdd(rdd, memory_type='DRAM', sequential_order=False, shuffle=True, bigdl_type='float')
Create FeatureSet from RDD[Sample].
:param rdd: An RDD[Sample]
:param memory_type: string, DRAM, PMEM or an int number. If it is DRAM, the dataset is cached in dynamic random-access memory. If it is PMEM, the dataset is cached in Intel Optane DC Persistent Memory. If it is an int number n, the dataset is cached on disk and only 1/n of the data is held in memory during training; after going through that 1/n, the current cache is released and another 1/n is loaded into memory.
:param sequential_order: whether to iterate the elements in the feature set in sequential order when training.
:param shuffle: whether to shuffle the elements in each partition before each epoch when training.
:param bigdl_type: numeric type
:return: A feature set
-
class
zoo.feature.common.MLlibVectorToTensor(size, bigdl_type='float')
Bases: zoo.feature.common.Preprocessing
A Transformer that converts an MLlib Vector to a Tensor.
.. note:: Deprecated in 0.4.0. NNEstimator will automatically extract Vectors now.
:param size: dimensions of target Tensor.
-
class
zoo.feature.common.Preprocessing(bigdl_type='float', *args)
Bases: bigdl.util.common.JavaValue
Preprocessing defines the data transform action during feature preprocessing. Python wrapper for the Scala Preprocessing.
-
class
zoo.feature.common.Relation(id1, id2, label, bigdl_type='float')
Bases: object
Represents the relationship between two items.
-
class
zoo.feature.common.Relations
Bases: object
-
static
read(path, sc=None, min_partitions=1, bigdl_type='float')
Read relations from a csv or txt file. Each record is expected to contain the following three fields in order: id1 (string), id2 (string) and label (int).
A csv file should have no header. In a txt file, each line should contain one record with fields separated by commas.
:param path: The path to the relations file, which can be either a local or a distributed file system (such as HDFS) path.
:param sc: An instance of SparkContext. If specified, an RDD of Relation is returned. Default is None, in which case a list of Relation is returned.
:param min_partitions: Int. A suggested minimum number of partitions for the input texts. Only needs to be specified when sc is not None. Default is 1.
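To make the expected record layout concrete, the sketch below parses header-less, comma-separated text into three-field records using only the standard library. `RelationRecord` and `parse_relations` are hypothetical stand-ins; the sketch does not call `Relations.read` and ignores the SparkContext path.

```python
import csv
import io
from typing import NamedTuple


class RelationRecord(NamedTuple):
    """Hypothetical stand-in for Relation: two string ids and an int label."""
    id1: str
    id2: str
    label: int


def parse_relations(text):
    """Parse header-less, comma-separated records, skipping blank lines."""
    reader = csv.reader(io.StringIO(text))
    return [RelationRecord(row[0], row[1], int(row[2])) for row in reader if row]


# Each line holds id1, id2 and label, in that order, with no header row.
sample_text = "Q1,A1,1\nQ1,A2,0\nQ2,A1,0\n"
relations = parse_relations(sample_text)
```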
-
class
zoo.feature.common.SampleToMiniBatch(batch_size, bigdl_type='float')
Bases: zoo.feature.common.Preprocessing
A Transformer that converts Sample to MiniBatch.
-
class
zoo.feature.common.ScalarToTensor(bigdl_type='float')
Bases: zoo.feature.common.Preprocessing
A Preprocessing that converts a number to a Tensor.
-
class
zoo.feature.common.SeqToMultipleTensors(size=[], bigdl_type='float')
Bases: zoo.feature.common.Preprocessing
A Transformer that converts an Array[_], Seq[_] or ML Vector to several Tensors.
:param size: list of int lists, dimensions of the target Tensors, e.g. [[2],[4]]
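One plausible reading of a size such as `[[2],[4]]` is that a flat input of six values is cut into consecutive pieces of two and four values. The sketch below assumes that reading; `split_by_sizes` is a hypothetical helper working on plain lists, and it leaves each piece flat rather than building real Tensors.

```python
from math import prod


def split_by_sizes(values, sizes):
    """Cut a flat sequence into consecutive chunks, one per target shape."""
    expected = sum(prod(shape) for shape in sizes)
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    chunks, offset = [], 0
    for shape in sizes:
        count = prod(shape)  # number of elements the shape holds
        chunks.append(list(values[offset:offset + count]))
        offset += count
    return chunks


first, second = split_by_sizes([1, 2, 3, 4, 5, 6], [[2], [4]])
```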
-
class
zoo.feature.common.SeqToTensor(size=[], bigdl_type='float')
Bases: zoo.feature.common.Preprocessing
A Transformer that converts an Array[_] or Seq[_] to a Tensor.
:param size: dimensions of target Tensor.
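The role of size can be illustrated by arranging a flat sequence into nested lists of the requested dimensions. This is a sketch in plain Python that assumes row-major order; `reshape` is a hypothetical helper, not the library's conversion.

```python
from math import prod


def reshape(values, shape):
    """Arrange a flat list into nested lists with the given shape (row-major)."""
    if len(values) != prod(shape):
        raise ValueError(f"cannot fit {len(values)} values into shape {shape}")
    if len(shape) == 1:
        return list(values)
    step = prod(shape[1:])  # elements in each sub-block along the first axis
    return [
        reshape(values[i * step:(i + 1) * step], shape[1:])
        for i in range(shape[0])
    ]


matrix = reshape([1, 2, 3, 4, 5, 6], [2, 3])
vector = reshape([1, 2, 3], [3])
```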
-
class
zoo.feature.common.TensorToSample(bigdl_type='float')
Bases: zoo.feature.common.Preprocessing
A Transformer that converts a Tensor to a Sample.
-
class
zoo.feature.common.ToTuple(bigdl_type='float')
Bases: zoo.feature.common.Preprocessing
A Transformer that converts Feature to (Feature, None).