zoo.tfpark.text.estimator package

Submodules

zoo.tfpark.text.estimator.bert_base module

class zoo.tfpark.text.estimator.bert_base.BERTBaseEstimator(model_fn, bert_config_file, init_checkpoint=None, use_one_hot_embeddings=False, model_dir=None, **kwargs)[source]

Bases: zoo.tfpark.estimator.TFEstimator

The base class for BERT related TFEstimators. Common arguments: bert_config_file, init_checkpoint, use_one_hot_embeddings, optimizer, model_dir.

For its subclasses:
  • Additional arguments can be added and accessed via params.
  • bert_model can be used to create the model_fn, and bert_input_fn to create the input_fn.

zoo.tfpark.text.estimator.bert_base.bert_input_fn(rdd, max_seq_length, batch_size, features={'input_ids', 'input_mask', 'token_type_ids'}, extra_features=None, labels=None, label_size=None)[source]

Takes an RDD and creates the input function for BERT-related TFEstimators.
  • For training and evaluation, each element in rdd should be a tuple: (dict of features, a single label or dict of labels). Note that currently only integer or integer array labels are supported.
  • For prediction, each element in rdd should be a dict of features.

The features in each RDD element should contain “input_ids”, “input_mask” and “token_type_ids”, each of shape max_seq_length. If the dict of features contains any other features, they must be declared explicitly through the extra_features argument, which should be a dict mapping each feature name to a tuple of (dtype, shape).
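The element layout described above can be illustrated in plain Python, before any RDD is involved. This is a minimal sketch, not part of the package: make_bert_features is a hypothetical helper, the token ids are illustrative values, and padding with 0 is an assumption about the vocabulary in use.

```python
def make_bert_features(token_ids, max_seq_length, segment_ids=None, pad_id=0):
    """Hypothetical helper: pad or truncate to max_seq_length and build
    the three features that bert_input_fn expects in every element."""
    token_ids = list(token_ids)[:max_seq_length]
    if segment_ids is None:
        # Single-segment input: every token belongs to segment 0.
        segment_ids = [0] * len(token_ids)
    segment_ids = list(segment_ids)[:max_seq_length]
    n_pad = max_seq_length - len(token_ids)
    return {
        "input_ids": token_ids + [pad_id] * n_pad,
        # 1 marks a real token, 0 marks padding.
        "input_mask": [1] * len(token_ids) + [0] * n_pad,
        "token_type_ids": segment_ids + [0] * (max_seq_length - len(segment_ids)),
    }


# Illustrative token ids only; real ids come from a BERT tokenizer.
features = make_bert_features([101, 7592, 2088, 102], max_seq_length=8)

# Training / evaluation element: (dict of features, integer label).
train_element = (features, 1)

# Prediction element: the dict of features alone.
predict_element = features
```

Elements shaped like train_element or predict_element would then be placed in an RDD and passed to bert_input_fn together with the same max_seq_length.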

zoo.tfpark.text.estimator.bert_base.bert_model(features, labels, mode, params)[source]

Returns an instance of BertModel, whose different outputs can be used to perform specific tasks.

zoo.tfpark.text.estimator.bert_classifier module

class zoo.tfpark.text.estimator.bert_classifier.BERTClassifier(num_classes, bert_config_file, init_checkpoint=None, use_one_hot_embeddings=False, optimizer=None, model_dir=None)[source]

Bases: zoo.tfpark.text.estimator.bert_base.BERTBaseEstimator

A pre-built TFEstimator that takes the hidden state of the first token of BERT to do classification.

Parameters:
  • num_classes – Positive int. The number of classes to be classified.
  • bert_config_file – The path to the json file for BERT configurations.
  • init_checkpoint – The path to the initial checkpoint of the pre-trained BERT model if any. Default is None.
  • use_one_hot_embeddings – Boolean. Whether to use one-hot for word embeddings. Default is False.
  • optimizer – The optimizer used to train the estimator. It should be an instance of tf.train.Optimizer. Default is None if no training is involved.
  • model_dir – The output directory for model checkpoints to be written if any. Default is None.
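As a rough illustration of how the constructor arguments fit together, the sketch below wraps the construction in two hypothetical helper functions (build_bert_classifier and build_input_fn are not part of the package). It assumes TensorFlow 1.x, where tf.train.Optimizer subclasses such as tf.train.AdamOptimizer are available; the learning rate and default sizes are illustrative values only. Imports are deferred into the function bodies so that the definitions can be loaded without the libraries installed.

```python
def build_bert_classifier(bert_config_file, init_checkpoint, num_classes,
                          model_dir=None):
    """Hypothetical helper: construct a BERTClassifier ready for training."""
    # Deferred imports; assumes TensorFlow 1.x and the zoo package are installed.
    import tensorflow as tf
    from zoo.tfpark.text.estimator.bert_classifier import BERTClassifier

    return BERTClassifier(
        num_classes=num_classes,
        bert_config_file=bert_config_file,
        init_checkpoint=init_checkpoint,
        # Any tf.train.Optimizer instance; the learning rate is illustrative.
        optimizer=tf.train.AdamOptimizer(learning_rate=2e-5),
        model_dir=model_dir,
    )


def build_input_fn(rdd, max_seq_length=128, batch_size=32):
    """Hypothetical helper: create the input function from an RDD of elements."""
    from zoo.tfpark.text.estimator.bert_base import bert_input_fn

    # Positional arguments follow the documented signature:
    # bert_input_fn(rdd, max_seq_length, batch_size, ...)
    return bert_input_fn(rdd, max_seq_length, batch_size)
```

If the estimator is only used for prediction, the optimizer argument can be left at its default of None.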

zoo.tfpark.text.estimator.bert_classifier.make_bert_classifier_model_fn(optimizer)[source]

zoo.tfpark.text.estimator.bert_ner module

class zoo.tfpark.text.estimator.bert_ner.BERTNER(num_entities, bert_config_file, init_checkpoint=None, use_one_hot_embeddings=False, optimizer=None, model_dir=None)[source]

Bases: zoo.tfpark.text.estimator.bert_base.BERTBaseEstimator

A pre-built TFEstimator that takes the hidden state of the final encoder layer of BERT for named entity recognition based on softmax classification. Note that cased BERT models are recommended for NER.

Parameters:
  • num_entities – Positive int. The number of entity labels to be classified.
  • bert_config_file – The path to the json file for BERT configurations.
  • init_checkpoint – The path to the initial checkpoint of the pre-trained BERT model if any. Default is None.
  • use_one_hot_embeddings – Boolean. Whether to use one-hot for word embeddings. Default is False.
  • optimizer – The optimizer used to train the estimator. It should be an instance of tf.train.Optimizer. Default is None if no training is involved.
  • model_dir – The output directory for model checkpoints to be written if any. Default is None.

zoo.tfpark.text.estimator.bert_ner.make_bert_ner_model_fn(optimizer)[source]

zoo.tfpark.text.estimator.bert_squad module

class zoo.tfpark.text.estimator.bert_squad.BERTSQuAD(bert_config_file, init_checkpoint=None, use_one_hot_embeddings=False, optimizer=None, model_dir=None)[source]

Bases: zoo.tfpark.text.estimator.bert_base.BERTBaseEstimator

A pre-built TFEstimator that takes the hidden state of the final encoder layer of BERT to perform training and prediction on the SQuAD dataset. The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset.

Parameters:
  • bert_config_file – The path to the json file for BERT configurations.
  • init_checkpoint – The path to the initial checkpoint of the pre-trained BERT model if any. Default is None.
  • use_one_hot_embeddings – Boolean. Whether to use one-hot for word embeddings. Default is False.
  • optimizer – The optimizer used to train the estimator. It should be an instance of tf.train.Optimizer. Default is None if no training is involved.
  • model_dir – The output directory for model checkpoints to be written if any. Default is None.

zoo.tfpark.text.estimator.bert_squad.make_bert_squad_model_fn(optimizer)[source]

Module contents