zoo.tfpark.text.estimator package
Submodules
zoo.tfpark.text.estimator.bert_base module
class zoo.tfpark.text.estimator.bert_base.BERTBaseEstimator(model_fn, bert_config_file, init_checkpoint=None, use_one_hot_embeddings=False, model_dir=None, **kwargs)
Bases: zoo.tfpark.estimator.TFEstimator
The base class for BERT-related TFEstimators. Common arguments: bert_config_file, init_checkpoint, use_one_hot_embeddings, optimizer, model_dir.
For its subclasses:
- Additional arguments can be added and accessed via params.
- _bert_model can be used to create model_fn, and bert_input_fn to create input_fn.
zoo.tfpark.text.estimator.bert_base.bert_input_fn(rdd, max_seq_length, batch_size, features={'input_ids', 'input_mask', 'token_type_ids'}, extra_features=None, labels=None, label_size=None)
Takes an RDD and creates the input function for BERT-related TFEstimators.
For training and evaluation, each element in rdd should be a tuple: (dict of features, a single label or dict of labels). Currently only integer or integer-array labels are supported.
For prediction, each element in rdd should be a dict of features.
The features in each RDD element should contain “input_ids”, “input_mask” and “token_type_ids”, each of shape max_seq_length. If the dict of features contains any extra features, specify them explicitly through the extra_features argument, a dict with the feature name as key and a tuple of (dtype, shape) as value.
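The element layout described above can be sketched in plain Python. This is only an illustration of the expected structure: the helpers pad and make_features, the value of MAX_SEQ_LENGTH and the token ids are all made up for the example, and it is an assumption that list-like values are accepted as-is (they may need converting to NumPy arrays before being placed in an RDD).

```python
MAX_SEQ_LENGTH = 8  # illustrative value, not a library default


def pad(values, length, pad_value=0):
    """Right-pad (or truncate) a list of ints to exactly `length` items."""
    return (values + [pad_value] * length)[:length]


def make_features(token_ids, segment_ids, max_seq_length=MAX_SEQ_LENGTH):
    """Hypothetical helper: build the three required features,
    each with max_seq_length entries."""
    token_ids = token_ids[:max_seq_length]
    return {
        "input_ids": pad(token_ids, max_seq_length),
        # 1 marks a real token, 0 marks padding.
        "input_mask": pad([1] * len(token_ids), max_seq_length),
        "token_type_ids": pad(segment_ids, max_seq_length),
    }


# Made-up token ids for a single short sentence.
features = make_features([101, 7592, 2088, 102], [0, 0, 0, 0])

# Training / evaluation element: (dict of features, integer label).
train_element = (features, 1)

# Prediction element: the dict of features only.
predict_element = features
```

An RDD of such elements (for example one built with SparkContext.parallelize) would then be passed as the rdd argument, together with the same max_seq_length.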
zoo.tfpark.text.estimator.bert_classifier module
class zoo.tfpark.text.estimator.bert_classifier.BERTClassifier(num_classes, bert_config_file, init_checkpoint=None, use_one_hot_embeddings=False, optimizer=None, model_dir=None)
Bases: zoo.tfpark.text.estimator.bert_base.BERTBaseEstimator
A pre-built TFEstimator that takes the hidden state of the first token of BERT to do classification.
Parameters:
- num_classes – Positive int. The number of classes to be classified.
- bert_config_file – The path to the JSON file for BERT configurations.
- init_checkpoint – The path to the initial checkpoint of the pre-trained BERT model, if any. Default is None.
- use_one_hot_embeddings – Boolean. Whether to use one-hot for word embeddings. Default is False.
- optimizer – The optimizer used to train the estimator. It should be an instance of tf.train.Optimizer. Default is None if no training is involved.
- model_dir – The output directory for model checkpoints to be written, if any. Default is None.
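As a rough sketch of how these parameters fit together, the snippet below collects the constructor arguments and wraps construction in a function. The function build_classifier, the file paths and the choice of Adam are assumptions made for the example; only the constructor signature comes from the documentation above. The imports are done lazily so that the snippet can be loaded without Analytics Zoo or TensorFlow installed.

```python
# Constructor arguments, named as in the signature above.
# All paths are placeholders, not real files.
classifier_args = {
    "num_classes": 2,
    "bert_config_file": "/path/to/bert_config.json",
    "init_checkpoint": "/path/to/bert_model.ckpt",
    "use_one_hot_embeddings": False,
    "model_dir": "/path/to/output_dir",
}


def build_classifier(args, learning_rate=None):
    """Hypothetical helper: create a BERTClassifier.

    Pass learning_rate only when the estimator will be trained;
    otherwise the optimizer stays None.
    """
    # Imported here so this module loads without Analytics Zoo installed.
    from zoo.tfpark.text.estimator.bert_classifier import BERTClassifier

    optimizer = None
    if learning_rate is not None:
        import tensorflow as tf
        # Any tf.train.Optimizer instance should work; Adam is an assumption.
        optimizer = tf.train.AdamOptimizer(learning_rate)
    return BERTClassifier(optimizer=optimizer, **args)
```

Calling build_classifier(classifier_args) would give an estimator suitable for evaluation or prediction, while build_classifier(classifier_args, learning_rate=1e-5) would attach an optimizer for training.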
zoo.tfpark.text.estimator.bert_ner module
class zoo.tfpark.text.estimator.bert_ner.BERTNER(num_entities, bert_config_file, init_checkpoint=None, use_one_hot_embeddings=False, optimizer=None, model_dir=None)
Bases: zoo.tfpark.text.estimator.bert_base.BERTBaseEstimator
A pre-built TFEstimator that takes the hidden state of the final encoder layer of BERT for named entity recognition based on softmax classification. Note that cased BERT models are recommended for NER.
Parameters:
- num_entities – Positive int. The number of entity labels to be classified.
- bert_config_file – The path to the JSON file for BERT configurations.
- init_checkpoint – The path to the initial checkpoint of the pre-trained BERT model, if any. Default is None.
- use_one_hot_embeddings – Boolean. Whether to use one-hot for word embeddings. Default is False.
- optimizer – The optimizer used to train the estimator. It should be an instance of tf.train.Optimizer. Default is None if no training is involved.
- model_dir – The output directory for model checkpoints to be written, if any. Default is None.
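Because NER classifies every token, a training element for this estimator would presumably carry an integer-array label rather than a single integer. The sketch below shows one possible layout under that assumption: one entity id per token position, padded to max_seq_length. The helper pad, the constants, the token ids and the use of 0 as the padding label are all illustrative and are not taken from the library.

```python
MAX_SEQ_LENGTH = 8  # illustrative value, not a library default
NUM_ENTITIES = 3    # would be passed as num_entities; ids here are 0..2


def pad(values, length, pad_value=0):
    """Right-pad (or truncate) a list of ints to exactly `length` items."""
    return (values + [pad_value] * length)[:length]


token_ids = [101, 1287, 2002, 102]  # made-up token ids
entity_ids = [0, 1, 2, 0]           # assumed: one entity id per token

features = {
    "input_ids": pad(token_ids, MAX_SEQ_LENGTH),
    "input_mask": pad([1] * len(token_ids), MAX_SEQ_LENGTH),
    "token_type_ids": pad([0] * len(token_ids), MAX_SEQ_LENGTH),
}

# Integer-array label aligned with the token positions.
# Padding positions reuse id 0 here, which is an assumption.
labels = pad(entity_ids, MAX_SEQ_LENGTH)

ner_element = (features, labels)
```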
zoo.tfpark.text.estimator.bert_squad module
class zoo.tfpark.text.estimator.bert_squad.BERTSQuAD(bert_config_file, init_checkpoint=None, use_one_hot_embeddings=False, optimizer=None, model_dir=None)
Bases: zoo.tfpark.text.estimator.bert_base.BERTBaseEstimator
A pre-built TFEstimator that takes the hidden state of the final encoder layer of BERT to perform training and prediction on the SQuAD dataset. The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset.
Parameters:
- bert_config_file – The path to the JSON file for BERT configurations.
- init_checkpoint – The path to the initial checkpoint of the pre-trained BERT model, if any. Default is None.
- use_one_hot_embeddings – Boolean. Whether to use one-hot for word embeddings. Default is False.
- optimizer – The optimizer used to train the estimator. It should be an instance of tf.train.Optimizer. Default is None if no training is involved.
- model_dir – The output directory for model checkpoints to be written, if any. Default is None.