zoo.tfpark.text.keras package

Submodules

zoo.tfpark.text.keras.intent_extraction module

class zoo.tfpark.text.keras.intent_extraction.IntentEntity(num_intents, num_entities, word_vocab_size, char_vocab_size, word_length=12, word_emb_dim=100, char_emb_dim=30, char_lstm_dim=30, tagger_lstm_dim=100, dropout=0.2, optimizer=None)

Bases: zoo.tfpark.text.keras.text_model.TextKerasModel

A multi-task model used for joint intent extraction and slot filling.

This model has two inputs:
  • word indices of shape (batch, sequence_length)
  • character indices of shape (batch, sequence_length, word_length)

This model has two outputs:
  • intent labels of shape (batch, num_intents)
  • entity tags of shape (batch, sequence_length, num_entities)

Parameters:
  • num_intents – Positive int. The number of intent classes to be classified.
  • num_entities – Positive int. The number of slot labels to be classified.
  • word_vocab_size – Positive int. The size of the word dictionary.
  • char_vocab_size – Positive int. The size of the character dictionary.
  • word_length – Positive int. The max word length in characters. Default is 12.
  • word_emb_dim – Positive int. The dimension of word embeddings. Default is 100.
  • char_emb_dim – Positive int. The dimension of character embeddings. Default is 30.
  • char_lstm_dim – Positive int. The hidden size of character feature Bi-LSTM layer. Default is 30.
  • tagger_lstm_dim – Positive int. The hidden size of tagger Bi-LSTM layers. Default is 100.
  • dropout – Dropout rate. Default is 0.2.
  • optimizer – Optimizer to train the model. If not specified, it defaults to tf.train.AdamOptimizer().
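The two inputs described above can be prepared from a tokenized sentence roughly as follows. This is a minimal sketch, not part of the library: encode_sentence, the vocabulary dictionaries, and the choice of index 0 for padding and index 1 for unknown tokens are all assumptions made for illustration.

```python
def encode_sentence(tokens, word_to_idx, char_to_idx,
                    sequence_length, word_length=12, pad=0, unk=1):
    """Turn one tokenized sentence into word and character index lists.

    Assumptions (hypothetical, not defined by the library):
    index 0 is padding and index 1 is the unknown token.
    """
    tokens = tokens[:sequence_length]  # truncate long sentences

    # Word indices: one id per token, padded to sequence_length.
    word_ids = [word_to_idx.get(t, unk) for t in tokens]
    word_ids += [pad] * (sequence_length - len(word_ids))

    # Character indices: one row of word_length ids per token.
    char_ids = []
    for token in tokens:
        row = [char_to_idx.get(c, unk) for c in token[:word_length]]
        row += [pad] * (word_length - len(row))
        char_ids.append(row)
    # Pad missing token positions with all-padding rows.
    char_ids += [[pad] * word_length
                 for _ in range(sequence_length - len(char_ids))]
    return word_ids, char_ids


# Hypothetical vocabularies for the demo.
word_vocab = {"book": 2, "a": 3, "flight": 4}
char_vocab = {c: i + 2 for i, c in enumerate("abfghiklot")}

words, chars = encode_sentence(["book", "a", "flight"],
                               word_vocab, char_vocab,
                               sequence_length=5, word_length=4)
```

Stacking such results over a batch gives the (batch, sequence_length) and (batch, sequence_length, word_length) inputs; word_length here must match the value passed to the model constructor.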

static load_model(path)

Load an existing IntentEntity model (with weights) from an HDF5 file.

Parameters: path – String. The path to the saved model file.
Returns: IntentEntity.

zoo.tfpark.text.keras.ner module

class zoo.tfpark.text.keras.ner.NER(num_entities, word_vocab_size, char_vocab_size, word_length=12, word_emb_dim=100, char_emb_dim=30, tagger_lstm_dim=100, dropout=0.5, crf_mode='reg', optimizer=None)

Bases: zoo.tfpark.text.keras.text_model.TextKerasModel

The model used for named entity recognition using Bidirectional LSTM with Conditional Random Field (CRF) sequence classifier.

This model has two inputs:
  • word indices of shape (batch, sequence_length)
  • character indices of shape (batch, sequence_length, word_length)

This model outputs entity tags of shape (batch, sequence_length, num_entities).

Parameters:
  • num_entities – Positive int. The number of entity labels to be classified.
  • word_vocab_size – Positive int. The size of the word dictionary.
  • char_vocab_size – Positive int. The size of the character dictionary.
  • word_length – Positive int. The max word length in characters. Default is 12.
  • word_emb_dim – Positive int. The dimension of word embeddings. Default is 100.
  • char_emb_dim – Positive int. The dimension of character embeddings. Default is 30.
  • tagger_lstm_dim – Positive int. The hidden size of tagger Bi-LSTM layers. Default is 100.
  • dropout – Dropout rate. Default is 0.5.
  • crf_mode – String. CRF operation mode. Either ‘reg’ or ‘pad’. Default is ‘reg’. ‘reg’ is regular full-sequence learning (all sequences have equal length). ‘pad’ uses supplied sequence lengths (useful for padded sequences) and needs a third input, sequence_length, of shape (batch, 1).
  • optimizer – Optimizer to train the model. If not specified, it defaults to tf.keras.optimizers.Adam(0.001, clipnorm=5.).
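For ‘pad’ mode, the extra sequence_length input of shape (batch, 1) can be derived while padding the word indices. The following is only a sketch under stated assumptions: pad_batch is a hypothetical helper, and using 0 as the padding index is an assumption rather than something the library prescribes.

```python
def pad_batch(sequences, sequence_length, pad=0):
    """Pad (or truncate) each sequence and record its true length.

    Returns:
      padded  - list of lists, shape (batch, sequence_length)
      lengths - list of single-element lists, shape (batch, 1)
    The padding index 0 is an assumption for illustration.
    """
    padded, lengths = [], []
    for seq in sequences:
        seq = list(seq[:sequence_length])   # truncate if too long
        lengths.append([len(seq)])          # length before padding
        padded.append(seq + [pad] * (sequence_length - len(seq)))
    return padded, lengths


# Three sentences of word indices with different lengths.
batch = [[5, 6, 7], [8], [1, 2, 3, 4, 5, 6]]
padded, lengths = pad_batch(batch, sequence_length=4)
```

In ‘reg’ mode the lengths are not needed, since all sequences are expected to have equal length.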

static load_model(path)

Load an existing NER model (with weights) from an HDF5 file.

Parameters: path – String. The path to the saved model file.
Returns: NER.

zoo.tfpark.text.keras.pos_tagging module

class zoo.tfpark.text.keras.pos_tagging.SequenceTagger(num_pos_labels, num_chunk_labels, word_vocab_size, char_vocab_size=None, word_length=12, feature_size=100, dropout=0.2, classifier='softmax', optimizer=None)

Bases: zoo.tfpark.text.keras.text_model.TextKerasModel

The model used as POS-tagger and chunker for sentence tagging, which contains three Bidirectional LSTM layers.

This model can have one or two inputs:
  • word indices of shape (batch, sequence_length)
  • character indices of shape (batch, sequence_length, word_length), only if char_vocab_size is not None

This model has two outputs:
  • pos tags of shape (batch, sequence_length, num_pos_labels)
  • chunk tags of shape (batch, sequence_length, num_chunk_labels)

Parameters:
  • num_pos_labels – Positive int. The number of pos labels to be classified.
  • num_chunk_labels – Positive int. The number of chunk labels to be classified.
  • word_vocab_size – Positive int. The size of the word dictionary.
  • char_vocab_size – Positive int. The size of the character dictionary. Default is None, in which case only one input, namely word indices, is expected.
  • word_length – Positive int. The max word length in characters. Default is 12.
  • feature_size – Positive int. The size of Embedding and Bi-LSTM layers. Default is 100.
  • dropout – Dropout rate. Default is 0.2.
  • classifier – String. The classification layer used for tagging chunks. Either ‘softmax’ or ‘crf’ (Conditional Random Field). Default is ‘softmax’.
  • optimizer – Optimizer to train the model. If not specified, it defaults to tf.train.AdamOptimizer().
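The way char_vocab_size switches between one and two inputs can be summarized with a small shape calculation. This is an illustrative sketch only: sequence_tagger_shapes is a hypothetical helper that restates the shapes documented above and does not call the library.

```python
def sequence_tagger_shapes(batch, sequence_length,
                           num_pos_labels, num_chunk_labels,
                           char_vocab_size=None, word_length=12):
    """Return the input and output shapes described for SequenceTagger."""
    # Word indices are always expected.
    inputs = [(batch, sequence_length)]
    # Character indices are expected only when char_vocab_size is set.
    if char_vocab_size is not None:
        inputs.append((batch, sequence_length, word_length))
    outputs = [
        (batch, sequence_length, num_pos_labels),    # pos tags
        (batch, sequence_length, num_chunk_labels),  # chunk tags
    ]
    return inputs, outputs


# Word-only configuration: a single input.
word_only = sequence_tagger_shapes(32, 50, 45, 23)

# With a character vocabulary: two inputs.
with_chars = sequence_tagger_shapes(32, 50, 45, 23, char_vocab_size=80)
```

The label counts and batch dimensions in the demo are arbitrary example values.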

static load_model(path)

Load an existing SequenceTagger model (with weights) from an HDF5 file.

Parameters: path – String. The path to the saved model file.
Returns: SequenceTagger.

zoo.tfpark.text.keras.text_model module

class zoo.tfpark.text.keras.text_model.TextKerasModel(labor, optimizer=None, **kwargs)

Bases: zoo.tfpark.model.KerasModel

The base class for text models in tfpark.

save_model(path)

Save the model to a single HDF5 file.

Parameters: path – String. The path to save the model.
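A save and reload round trip, combining save_model with the static load_model of a subclass, might look like the sketch below. It assumes analytics-zoo and its dependencies are installed and that an untrained model can be saved; save_and_reload is a hypothetical wrapper, and only the constructor arguments, save_model and load_model come from this reference.

```python
def save_and_reload(path, num_entities, word_vocab_size, char_vocab_size):
    """Build an NER model, save it to HDF5, and load it back.

    Hypothetical helper for illustration; requires analytics-zoo.
    """
    # Imported lazily so the sketch can be read without the package installed.
    from zoo.tfpark.text.keras.ner import NER

    model = NER(num_entities=num_entities,
                word_vocab_size=word_vocab_size,
                char_vocab_size=char_vocab_size)
    model.save_model(path)       # writes a single HDF5 file
    return NER.load_model(path)  # static method; returns an NER instance
```

A call such as save_and_reload("/tmp/ner.h5", 9, 20000, 100) uses placeholder values; the path and sizes are arbitrary.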

Module contents