zoo.tfpark.text.keras package
Submodules
zoo.tfpark.text.keras.intent_extraction module
class zoo.tfpark.text.keras.intent_extraction.IntentEntity(num_intents, num_entities, word_vocab_size, char_vocab_size, word_length=12, word_emb_dim=100, char_emb_dim=30, char_lstm_dim=30, tagger_lstm_dim=100, dropout=0.2, optimizer=None)
Bases: zoo.tfpark.text.keras.text_model.TextKerasModel
A multi-task model used for joint intent extraction and slot filling.
This model has two inputs:
- word indices of shape (batch, sequence_length)
- character indices of shape (batch, sequence_length, word_length)

This model has two outputs:
- intent labels of shape (batch, num_intents)
- entity tags of shape (batch, sequence_length, num_entities)

Parameters:
- num_intents – Positive int. The number of intent classes to be classified.
- num_entities – Positive int. The number of slot labels to be classified.
- word_vocab_size – Positive int. The size of the word dictionary.
- char_vocab_size – Positive int. The size of the character dictionary.
- word_length – Positive int. The max word length in characters. Default is 12.
- word_emb_dim – Positive int. The dimension of word embeddings. Default is 100.
- char_emb_dim – Positive int. The dimension of character embeddings. Default is 30.
- char_lstm_dim – Positive int. The hidden size of the character feature Bi-LSTM layer. Default is 30.
- tagger_lstm_dim – Positive int. The hidden size of the tagger Bi-LSTM layers. Default is 100.
- dropout – Dropout rate. Default is 0.2.
- optimizer – Optimizer used to train the model. If not specified, it defaults to tf.train.AdamOptimizer().
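The two inputs described above can be prepared along the following lines. This is a minimal sketch in plain Python, not part of the package: the helper `encode_sentence`, the toy vocabularies, and the convention of reserving index 0 for padding and 1 for unknown items are all assumptions made for illustration.

```python
# Assumed convention (not from the package): 0 = padding, 1 = unknown item.
PAD, UNK = 0, 1


def encode_sentence(tokens, word_vocab, char_vocab, sequence_length, word_length=12):
    """Return (word_ids, char_ids) padded or truncated to fixed sizes."""
    tokens = tokens[:sequence_length]

    # Word indices: one id per token, padded to sequence_length.
    word_ids = [word_vocab.get(t, UNK) for t in tokens]
    word_ids += [PAD] * (sequence_length - len(word_ids))

    # Character indices: one row of word_length ids per token.
    char_ids = []
    for t in tokens:
        ids = [char_vocab.get(c, UNK) for c in t[:word_length]]
        ids += [PAD] * (word_length - len(ids))
        char_ids.append(ids)
    # Pad missing tokens with all-padding rows.
    char_ids += [[PAD] * word_length for _ in range(sequence_length - len(char_ids))]
    return word_ids, char_ids


# Hypothetical toy vocabularies.
word_vocab = {"book": 2, "a": 3, "flight": 4}
char_vocab = {c: i + 2 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}

sentences = [["book", "a", "flight"], ["book", "hotel"]]
encoded = [encode_sentence(s, word_vocab, char_vocab, sequence_length=4) for s in sentences]
word_batch = [w for w, _ in encoded]  # shape (batch, sequence_length)
char_batch = [c for _, c in encoded]  # shape (batch, sequence_length, word_length)
```

In practice the nested lists would typically be converted to arrays before being passed to the model as the word-index and character-index inputs.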
zoo.tfpark.text.keras.ner module
class zoo.tfpark.text.keras.ner.NER(num_entities, word_vocab_size, char_vocab_size, word_length=12, word_emb_dim=100, char_emb_dim=30, tagger_lstm_dim=100, dropout=0.5, crf_mode='reg', optimizer=None)
Bases: zoo.tfpark.text.keras.text_model.TextKerasModel
The model used for named entity recognition, using a Bidirectional LSTM with a Conditional Random Field (CRF) sequence classifier.
This model has two inputs:
- word indices of shape (batch, sequence_length)
- character indices of shape (batch, sequence_length, word_length)

This model outputs entity tags of shape (batch, sequence_length, num_entities).

Parameters:
- num_entities – Positive int. The number of entity labels to be classified.
- word_vocab_size – Positive int. The size of the word dictionary.
- char_vocab_size – Positive int. The size of the character dictionary.
- word_length – Positive int. The max word length in characters. Default is 12.
- word_emb_dim – Positive int. The dimension of word embeddings. Default is 100.
- char_emb_dim – Positive int. The dimension of character embeddings. Default is 30.
- tagger_lstm_dim – Positive int. The hidden size of tagger Bi-LSTM layers. Default is 100.
- dropout – Dropout rate. Default is 0.5.
- crf_mode – String. CRF operation mode. Either ‘reg’ or ‘pad’. Default is ‘reg’. ‘reg’ is for regular full-sequence learning (all sequences have equal length). ‘pad’ is for supplied sequence lengths (useful for padded sequences). In ‘pad’ mode, a third input holding the sequence lengths, of shape (batch, 1), is needed.
- optimizer – Optimizer used to train the model. If not specified, it defaults to tf.keras.optimizers.Adam(0.001, clipnorm=5.).
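For ‘pad’ mode, the extra sequence-length input of shape (batch, 1) can be derived while padding the word indices. The sketch below is illustrative only: `pad_with_lengths` is a hypothetical helper, and using 0 as the padding value is an assumption rather than something the package prescribes.

```python
def pad_with_lengths(sequences, sequence_length, pad_value=0):
    """Pad or truncate each sequence and record its true length.

    Returns (padded, lengths), where padded has shape
    (batch, sequence_length) and lengths has shape (batch, 1).
    """
    padded, lengths = [], []
    for seq in sequences:
        seq = seq[:sequence_length]          # truncate overly long sequences
        lengths.append([len(seq)])           # one-element row per sample
        padded.append(seq + [pad_value] * (sequence_length - len(seq)))
    return padded, lengths


# Hypothetical word-index sequences of varying length.
word_ids = [[5, 8, 2], [7, 4, 9, 3, 6], [1]]
padded, lengths = pad_with_lengths(word_ids, sequence_length=4)
```

Here `lengths` would serve as the third input alongside the word and character indices; with ‘reg’ mode it is not needed because all sequences already share one length.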
zoo.tfpark.text.keras.pos_tagging module
class zoo.tfpark.text.keras.pos_tagging.SequenceTagger(num_pos_labels, num_chunk_labels, word_vocab_size, char_vocab_size=None, word_length=12, feature_size=100, dropout=0.2, classifier='softmax', optimizer=None)
Bases: zoo.tfpark.text.keras.text_model.TextKerasModel
The model used as a POS tagger and chunker for sentence tagging, which contains three Bidirectional LSTM layers.
This model can have one or two inputs:
- word indices of shape (batch, sequence_length)
- character indices of shape (batch, sequence_length, word_length), only if char_vocab_size is not None

This model has two outputs:
- pos tags of shape (batch, sequence_length, num_pos_labels)
- chunk tags of shape (batch, sequence_length, num_chunk_labels)

Parameters:
- num_pos_labels – Positive int. The number of pos labels to be classified.
- num_chunk_labels – Positive int. The number of chunk labels to be classified.
- word_vocab_size – Positive int. The size of the word dictionary.
- char_vocab_size – Positive int. The size of the character dictionary. Default is None, in which case only one input, namely word indices, is expected.
- word_length – Positive int. The max word length in characters. Default is 12.
- feature_size – Positive int. The size of the Embedding and Bi-LSTM layers. Default is 100.
- dropout – Dropout rate. Default is 0.2.
- classifier – String. The classification layer used for tagging chunks. Either ‘softmax’ or ‘crf’ (Conditional Random Field). Default is ‘softmax’.
- optimizer – Optimizer used to train the model. If not specified, it defaults to tf.train.AdamOptimizer().
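Both outputs carry one score vector per token, so turning them into tag names amounts to an argmax over the last axis. The following is a rough sketch under assumptions: `decode_tags`, the label list, and the score values are made up for illustration, and it presumes per-token scores of shape (batch, sequence_length, num_labels) as documented above.

```python
def decode_tags(scores, id_to_label):
    """Pick the highest-scoring label for every token.

    scores: nested lists of shape (batch, sequence_length, num_labels).
    id_to_label: list mapping a label index to its name.
    """
    decoded = []
    for sentence in scores:
        labels = []
        for token_scores in sentence:
            # Index of the largest score along the label axis.
            best = max(range(len(token_scores)), key=lambda i: token_scores[i])
            labels.append(id_to_label[best])
        decoded.append(labels)
    return decoded


# Hypothetical label set and scores for one two-token sentence.
pos_labels = ["DET", "NOUN", "VERB"]
pos_scores = [[[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]]
pos_tags = decode_tags(pos_scores, pos_labels)
```

The same helper would apply to the chunk output with its own label list; positions that correspond to padding would still need to be masked out by the caller.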
zoo.tfpark.text.keras.text_model module
class zoo.tfpark.text.keras.text_model.TextKerasModel(labor, optimizer=None, **kwargs)
Bases: zoo.tfpark.model.KerasModel
The base class for text models in tfpark.