zoo.feature package

Subpackages

Submodules

zoo.feature.common module

class zoo.feature.common.ArrayToTensor(size, bigdl_type='float')

Bases: zoo.feature.common.Preprocessing

A Transformer that converts an Array[_] to a Tensor.

Parameters:	size – dimensions of the target Tensor.
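To make the role of size concrete, here is a pure-Python sketch of filling a tensor of the given dimensions from a flat array. It is not the library's implementation: nested lists stand in for a BigDL Tensor, the helper name to_nested is hypothetical, and row-major filling order is an assumption.

```python
from math import prod
from typing import Any, Sequence


def to_nested(flat: Sequence[float], size: Sequence[int]) -> Any:
    """Hypothetical helper: reshape a flat array into nested lists of shape `size`.

    Assumes row-major order, the usual convention for tensors.
    """
    if not size or prod(size) != len(flat):
        raise ValueError(f"cannot fit {len(flat)} elements into shape {list(size)}")
    if len(size) == 1:
        return list(flat)
    step = prod(size[1:])  # number of elements in each sub-tensor
    return [to_nested(flat[i * step:(i + 1) * step], size[1:]) for i in range(size[0])]


print(to_nested([1, 2, 3, 4, 5, 6], [2, 3]))  # [[1, 2, 3], [4, 5, 6]]
```

The element count must equal the product of the dimensions, which is why the sketch rejects mismatched inputs up front.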

class zoo.feature.common.BigDLAdapter(bigdl_transformer, bigdl_type='float')

Bases: zoo.feature.common.Preprocessing

class zoo.feature.common.ChainedPreprocessing(transformers, bigdl_type='float')

Bases: zoo.feature.common.Preprocessing

Chains a list of Preprocessing steps (transformers) together. The output type of each Preprocessing must match the input type of the next one.
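The type-matching rule is easiest to see with plain functions. The following is a minimal pure-Python sketch of left-to-right composition, assuming each step is an ordinary callable; chain and the three example steps are hypothetical and not part of zoo.feature.common.

```python
from functools import reduce
from typing import Any, Callable, Sequence

Step = Callable[[Any], Any]


def chain(steps: Sequence[Step]) -> Step:
    """Hypothetical helper: compose steps left to right.

    The output of steps[i] becomes the input of steps[i + 1], so their types must line up.
    """
    def run(x: Any) -> Any:
        return reduce(lambda acc, step: step(acc), steps, x)
    return run


def to_floats(xs):   # list of ints -> list of floats
    return [float(v) for v in xs]


def scale(xs):       # list of floats -> list of floats in [0, 1]
    return [v / 255.0 for v in xs]


def to_tuple(xs):    # list of floats -> (feature, label) with no label
    return (xs, None)


pipeline = chain([to_floats, scale, to_tuple])
print(pipeline([0, 255]))  # ([0.0, 1.0], None)
```

Swapping to_tuple ahead of scale would break the pipeline, because scale expects a list rather than a tuple; this is the mismatch the rule above guards against.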

class zoo.feature.common.FeatureLabelPreprocessing(feature_transformer, label_transformer, bigdl_type='float')

Bases: zoo.feature.common.Preprocessing

Constructs a Transformer that converts a (Feature, Label) tuple to a Sample. The returned Transformer also handles the case label = null, in which the Sample is derived from the Feature only.

Parameters:	feature_transformer – transformer for the feature; transforms F to Tensor[T].
label_transformer – transformer for the label; transforms L to Tensor[T].
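The null-label behavior can be sketched in pure Python. In this illustration a dict stands in for a Sample and Python None stands in for a null label; feature_label_to_sample is a hypothetical name, not the library's API.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def feature_label_to_sample(
    feature_fn: Callable[[Any], Any],
    label_fn: Callable[[Any], Any],
) -> Callable[[Tuple[Any, Optional[Any]]], Dict[str, Any]]:
    """Hypothetical analogue of FeatureLabelPreprocessing.

    Returns a function mapping a (feature, label) tuple to a sample dict.
    When the label is None, the sample is built from the feature alone.
    """
    def convert(pair: Tuple[Any, Optional[Any]]) -> Dict[str, Any]:
        feature, label = pair
        sample = {"feature": feature_fn(feature)}
        if label is not None:
            sample["label"] = label_fn(label)
        return sample
    return convert


convert = feature_label_to_sample(lambda f: [float(v) for v in f], lambda lab: [float(lab)])
print(convert(([1, 2], 3)))     # {'feature': [1.0, 2.0], 'label': [3.0]}
print(convert(([1, 2], None)))  # {'feature': [1.0, 2.0]}
```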

class zoo.feature.common.FeatureSet(jvalue=None, bigdl_type='float')

Bases: bigdl.dataset.dataset.DataSet

A set of data used in the model optimization process. The FeatureSet can be accessed as a random sequence of data samples. During training, the sequence is endless: it loops over the data indefinitely. During validation, the sequence has a limited length. Unlike BigDL's DataSet, a FeatureSet can be cached in Intel Optane DC Persistent Memory if you set memory_type to PMEM when creating it.
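The difference between the two access patterns can be pictured with plain iterators. This is an illustrative sketch only: training_stream and validation_stream are hypothetical stand-ins, not FeatureSet methods, and the order is kept sequential for readability even though FeatureSet can randomize it.

```python
import itertools
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def training_stream(data: Sequence[T]) -> Iterator[T]:
    """Endless sequence: after the last element, start over from the first."""
    return itertools.cycle(data)


def validation_stream(data: Sequence[T]) -> Iterator[T]:
    """Limited-length sequence: each element is visited exactly once."""
    return iter(data)


data = ["a", "b", "c"]
print(list(validation_stream(data)))                     # ['a', 'b', 'c']
print(list(itertools.islice(training_stream(data), 5)))  # ['a', 'b', 'c', 'a', 'b']
```

Because the training stream never ends, a consumer has to decide for itself when to stop (for example, after a fixed number of iterations or epochs).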

classmethod image_frame(image_frame, memory_type='DRAM', sequential_order=False, shuffle=True, bigdl_type='float')

Create a FeatureSet from an ImageFrame.

Parameters:	image_frame – an ImageFrame.
memory_type – a string (DRAM or PMEM) or an int n. If it is DRAM, the dataset is cached in dynamic random-access memory. If it is PMEM, the dataset is cached in Intel Optane DC Persistent Memory. If it is an int n, the dataset is cached on disk and only 1/n of the data is held in memory during training; after going through that 1/n, the current cache is released and another 1/n is loaded into memory.
sequential_order – whether to iterate over the elements of the feature set in sequential order during training.
shuffle – whether to shuffle the elements in each partition before each epoch during training.
bigdl_type – numeric type.
Returns:	A feature set.
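The int-n mode of memory_type can be pictured with the following pure-Python sketch. It is not how FeatureSet is implemented: a Python list stands in for the on-disk cache, load is a hypothetical callback for reading one slice back into memory, and splitting into contiguous, near-equal slices is an assumption.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def slices(data: Sequence[T], n: int) -> List[Sequence[T]]:
    """Split `data` into n contiguous slices of near-equal size (assumed policy)."""
    k, r = divmod(len(data), n)
    bounds = [i * k + min(i, r) for i in range(n + 1)]
    return [data[bounds[i]:bounds[i + 1]] for i in range(n)]


def one_epoch(
    data: Sequence[T], n: int, load: Callable[[Sequence[T]], List[T]]
) -> Iterator[T]:
    """Iterate over one epoch while holding at most one 1/n slice in memory.

    `load` is a hypothetical stand-in for reading a slice from the disk cache.
    """
    for part in slices(data, n):
        cache = load(part)  # bring this 1/n of the data into memory
        yield from cache
        del cache           # release it before loading the next 1/n


print([len(s) for s in slices(list(range(10)), 3)])  # [4, 3, 3]
```

The trade-off is memory for I/O: a larger n keeps less data resident but reloads from disk more often within each epoch.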

classmethod image_set(imageset, memory_type='DRAM', sequential_order=False, shuffle=True, bigdl_type='float')

Create a FeatureSet from an ImageSet.

Parameters:	imageset – an ImageSet.
memory_type – a string (DRAM or PMEM) or an int n. If it is DRAM, the dataset is cached in dynamic random-access memory. If it is PMEM, the dataset is cached in Intel Optane DC Persistent Memory. If it is an int n, the dataset is cached on disk and only 1/n of the data is held in memory during training; after going through that 1/n, the current cache is released and another 1/n is loaded into memory.
sequential_order – whether to iterate over the elements of the feature set in sequential order during training.
shuffle – whether to shuffle the elements in each partition before each epoch during training.
bigdl_type – numeric type.
Returns:	A feature set.

classmethod pytorch_dataloader(dataloader, features='_data[0]', labels='_data[1]', bigdl_type='float')

Create a FeatureSet from a PyTorch DataLoader.

Parameters:	dataloader – a PyTorch DataLoader, or a function that returns a PyTorch DataLoader.
features – the features in _data, where _data is the data obtained from the dataloader.
labels – the labels in _data, where _data is the data obtained from the dataloader.
bigdl_type – numeric type.
Returns:	A feature set.
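The defaults '_data[0]' and '_data[1]' read like Python expressions over the data _data obtained from the dataloader. Assuming that interpretation, the sketch below shows how such selector strings pick features and labels out of a batch. select is a hypothetical helper, and the use of eval is purely illustrative; it is not a claim about how Analytics Zoo evaluates these strings.

```python
from typing import Any, Dict, Tuple


def select(batch: Any, features: str = "_data[0]", labels: str = "_data[1]") -> Tuple[Any, Any]:
    """Hypothetical helper: evaluate selector expressions with `_data` bound to the batch."""
    scope: Dict[str, Any] = {"_data": batch}
    no_builtins = {"__builtins__": {}}  # expressions can only see `_data`
    return eval(features, no_builtins, scope), eval(labels, no_builtins, scope)


# Default selectors on an (inputs, targets) tuple, the shape many DataLoaders yield.
print(select(([0.1, 0.2], [1])))  # ([0.1, 0.2], [1])

# Custom selectors for a dict-shaped batch.
print(select({"x": [5], "y": [0]}, features="_data['x']", labels="_data['y']"))  # ([5], [0])
```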

classmethod rdd(rdd, memory_type='DRAM', sequential_order=False, shuffle=True, bigdl_type='float')

Create a FeatureSet from an RDD.

Parameters:	rdd – an RDD.
memory_type – a string (DRAM or PMEM) or an int n. If it is DRAM, the dataset is cached in dynamic random-access memory. If it is PMEM, the dataset is cached in Intel Optane DC Persistent Memory. If it is an int n, the dataset is cached on disk and only 1/n of the data is held in memory during training; after going through that 1/n, the current cache is released and another 1/n is loaded into memory.
sequential_order – whether to iterate over the elements of the feature set in sequential order during training.
shuffle – whether to shuffle the elements in each partition before each epoch during training.
bigdl_type – numeric type.
Returns:	A feature set.
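The shuffle parameter reorders elements inside each partition, so elements stay in the partition they started in. Below is a pure-Python sketch of one epoch's per-partition order, with lists of lists standing in for RDD partitions; epoch_order is a hypothetical name, and the sketch does not model sequential_order.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def epoch_order(partitions: Sequence[Sequence[T]], shuffle: bool, seed: int) -> List[List[T]]:
    """Per-partition order for one epoch.

    Each partition is shuffled independently, so no element moves to another partition.
    The input partitions are left unmodified.
    """
    rng = random.Random(seed)
    result = []
    for part in partitions:
        items = list(part)  # copy so the caller's data is untouched
        if shuffle:
            rng.shuffle(items)
        result.append(items)
    return result


parts = [[1, 2, 3], [4, 5, 6]]
print(epoch_order(parts, shuffle=False, seed=0))  # [[1, 2, 3], [4, 5, 6]]
shuffled = epoch_order(parts, shuffle=True, seed=1)
print([sorted(p) for p in shuffled])  # [[1, 2, 3], [4, 5, 6]]: same members per partition
```

Shuffling locally avoids a cross-partition shuffle, which on a cluster would mean moving data over the network every epoch.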

classmethod sample_rdd(rdd, memory_type='DRAM', sequential_order=False, shuffle=True, bigdl_type='float')

Create a FeatureSet from an RDD[Sample].

Parameters:	rdd – an RDD[Sample].
memory_type – a string (DRAM or PMEM) or an int n. If it is DRAM, the dataset is cached in dynamic random-access memory. If it is PMEM, the dataset is cached in Intel Optane DC Persistent Memory. If it is an int n, the dataset is cached on disk and only 1/n of the data is held in memory during training; after going through that 1/n, the current cache is released and another 1/n is loaded into memory.
sequential_order – whether to iterate over the elements of the feature set in sequential order during training.
shuffle – whether to shuffle the elements in each partition before each epoch during training.
bigdl_type – numeric type.
Returns:	A feature set.

classmethod tf_dataset(func, total_size, bigdl_type='float')

Create a FeatureSet from a TensorFlow dataset.

Parameters:	func – a function that returns a TensorFlow dataset.
total_size – total size of this dataset.
bigdl_type – numeric type.
Returns:	A feature set.

to_dataset()

Convert this FeatureSet to a BigDL-compatible DataSet.

transform(transformer)

Helper function that transforms the data type of the elements in this feature set.

Parameters:	transformer – the transformer to apply to this feature set.
Returns:	A feature set.

class zoo.feature.common.FeatureToTupleAdapter(sample_transformer, bigdl_type='float')

Bases: zoo.feature.common.Preprocessing

class zoo.feature.common.MLlibVectorToTensor(size, bigdl_type='float')

Bases: zoo.feature.common.Preprocessing

A Transformer that converts an MLlib Vector to a Tensor.

Note: Deprecated in 0.4.0. NNEstimator now extracts Vectors automatically.

Parameters:	size – dimensions of the target Tensor.

class zoo.feature.common.Preprocessing(bigdl_type='float', *args)

Bases: bigdl.util.common.JavaValue

Preprocessing defines a data transformation step in feature preprocessing. It is the Python wrapper for the Scala Preprocessing.

class zoo.feature.common.Relation(id1, id2, label, bigdl_type='float')

Bases: object

It represents the relationship between two items.

to_tuple()

class zoo.feature.common.Relations

Bases: object

static read(path, sc=None, min_partitions=1, bigdl_type='float')

Read relations from a csv or txt file. Each record is expected to contain the following three fields, in order: id1 (string), id2 (string) and label (int).

A csv file should have no header. In a txt file, each line should contain one record, with fields separated by commas.

Parameters:	path – the path to the relations file, which can be either a local path or a distributed file system (such as HDFS) path.
sc – an instance of SparkContext. If specified, an RDD of Relation is returned. Default is None, in which case a list of Relation is returned.
min_partitions – int. A suggested minimum number of partitions for the input texts. Only needs to be specified when sc is not None. Default is 1.
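To illustrate the expected file layout, the sketch below parses header-less, comma-separated id1,id2,label records into tuples. It is a local, pure-Python stand-in for Relations.read without sc, not its implementation: Rel and parse_relations are hypothetical names, and skipping blank lines is an assumption.

```python
import csv
import io
from typing import List, NamedTuple, TextIO


class Rel(NamedTuple):
    """Hypothetical stand-in for Relation: (id1, id2, label)."""
    id1: str
    id2: str
    label: int


def parse_relations(stream: TextIO) -> List[Rel]:
    """Parse header-less records of the form id1,id2,label."""
    rows = []
    for fields in csv.reader(stream):
        if not fields:
            continue  # skip blank lines (assumed behavior)
        id1, id2, label = fields  # exactly three fields, in this order
        rows.append(Rel(id1, id2, int(label)))
    return rows


text = "Q1,A1,1\nQ1,A2,0\n"
print(parse_relations(io.StringIO(text)))
```

A file with a header row would fail here on int("label"), which is one reason the format requires csv files without a header.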

static read_parquet(path, sc, bigdl_type='float')

Read relations from a parquet file. The schema should be: “id1” (string), “id2” (string) and “label” (int).

Parameters:	path – the path to the parquet file.
sc – an instance of SparkContext.
Returns:	RDD of Relation.

class zoo.feature.common.SampleToMiniBatch(batch_size, bigdl_type='float')

Bases: zoo.feature.common.Preprocessing

A Transformer that converts Samples to MiniBatches.

Parameters:	batch_size – the number of Samples in each MiniBatch.
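Batching itself can be sketched in a few lines of pure Python. In this illustration a list stands in for a MiniBatch, to_mini_batches is a hypothetical name, and keeping a smaller final batch is an assumption; this sketch does not establish how SampleToMiniBatch treats the remainder.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def to_mini_batches(samples: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group consecutive samples into lists of batch_size; the last may be smaller."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")  # raised on first iteration
    batch: List[T] = []
    for s in samples:
        batch.append(s)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # leftover samples (assumed to be kept)


print(list(to_mini_batches(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```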

class zoo.feature.common.ScalarToTensor(bigdl_type='float')

Bases: zoo.feature.common.Preprocessing

A Preprocessing that converts a number to a Tensor.

class zoo.feature.common.SeqToMultipleTensors(size=[], bigdl_type='float')

Bases: zoo.feature.common.Preprocessing

A Transformer that converts an Array[_], a Seq[_] or an ML Vector to several Tensors.

Parameters:	size – a list of int lists giving the dimensions of the target Tensors, e.g. [[2],[4]].
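One way to read the [[2],[4]] example is that a flat input of 6 values is cut into a 2-element Tensor followed by a 4-element Tensor. Assuming consecutive splitting in that order, here is a pure-Python sketch; split_to_tensors is a hypothetical helper, and each chunk is left flat rather than reshaped.

```python
from math import prod
from typing import List, Sequence


def split_to_tensors(flat: Sequence[float], sizes: Sequence[Sequence[int]]) -> List[List[float]]:
    """Hypothetical: cut a flat sequence into consecutive chunks, one per target size.

    Each chunk's length is the product of that size's dimensions; chunks stay flat.
    """
    lengths = [prod(s) for s in sizes]
    if sum(lengths) != len(flat):
        raise ValueError(f"expected {sum(lengths)} elements, got {len(flat)}")
    chunks: List[List[float]] = []
    start = 0
    for n in lengths:
        chunks.append(list(flat[start:start + n]))
        start += n
    return chunks


print(split_to_tensors([1, 2, 3, 4, 5, 6], [[2], [4]]))  # [[1, 2], [3, 4, 5, 6]]
```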

class zoo.feature.common.SeqToTensor(size=[], bigdl_type='float')

Bases: zoo.feature.common.Preprocessing

A Transformer that converts an Array[_] or a Seq[_] to a Tensor.

Parameters:	size – dimensions of the target Tensor.

class zoo.feature.common.TensorToSample(bigdl_type='float')

Bases: zoo.feature.common.Preprocessing

A Transformer that converts a Tensor to a Sample.

class zoo.feature.common.ToTuple(bigdl_type='float')

Bases: zoo.feature.common.Preprocessing

A Transformer that converts a Feature to (Feature, None).