Model¶
main¶
architecture¶
-
class
nlper.model.architecture.BahdanauAttention(hidden_size: int)¶ Bahdanau attention. Initializes a fully connected layer and an internal parameter V with uniformly distributed weights.
- Parameters
hidden_size (int) – Number of features in the fully connected layer of attention
-
forward(hidden: torch.Tensor, encoder_outputs: torch.Tensor) → torch.Tensor¶ Calculates attention weights by applying softmax on attention alignment scores.
- Parameters
hidden (torch.Tensor) – Encoder hidden states
encoder_outputs (torch.Tensor) – Encoder outputs
- Returns
Attention weights
- Return type
torch.Tensor
-
score(hidden: torch.Tensor, encoder_outputs: torch.Tensor) → torch.Tensor¶ Calculates alignment scores of attention.
- Parameters
hidden (torch.Tensor) – Encoder hidden states
encoder_outputs (torch.Tensor) – Encoder outputs
- Returns
Attention alignment scores
- Return type
torch.Tensor
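A minimal sketch of how score and forward could fit together, assuming additive (Bahdanau-style) scoring. The class name AdditiveAttention, the tensor shapes and the concatenation layout are assumptions for illustration, not the library's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdditiveAttention(nn.Module):
    """Hypothetical stand-in for BahdanauAttention; shapes are assumptions."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Fully connected layer applied to [hidden; encoder_output]
        self.fc = nn.Linear(hidden_size * 2, hidden_size)
        # Parameter V, uniformly initialized (torch.rand samples from U[0, 1))
        self.v = nn.Parameter(torch.rand(hidden_size))

    def score(self, hidden, encoder_outputs):
        # hidden: (batch, hidden_size)
        # encoder_outputs: (batch, src_len, hidden_size)
        src_len = encoder_outputs.size(1)
        hidden = hidden.unsqueeze(1).expand(-1, src_len, -1)
        energy = torch.tanh(self.fc(torch.cat((hidden, encoder_outputs), dim=2)))
        # Project each position onto V to get one score per source token
        return energy @ self.v  # (batch, src_len)

    def forward(self, hidden, encoder_outputs):
        # Softmax over source positions turns scores into attention weights
        return F.softmax(self.score(hidden, encoder_outputs), dim=1)


attention = AdditiveAttention(hidden_size=8)
weights = attention(torch.randn(2, 8), torch.randn(2, 5, 8))
```

Each row of weights sums to one over the source positions, so it can be used directly to form a weighted sum of encoder outputs.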
-
class
nlper.model.architecture.DecoderRNN(embedding_size: int, hidden_size: int, output_size: int, n_layers: int = 1, dropout: float = 0.1)¶ Model decoder class. Initializes an embedding layer, a dropout layer, a Bahdanau attention module, a unidirectional GRU and a linear classifier.
- Parameters
embedding_size (int) – Size of embedding layer, number of expected features in GRU
hidden_size (int) – Number of features in the hidden state of GRU and in fully connected layer of attention
output_size (int) – Number of unique words in vocabulary
n_layers (int) – Number of recurrent layers in GRU
dropout (float) – Dropout probability on each GRU layer except the last layer
-
forward(sequence: torch.Tensor, hidden: torch.Tensor, encoder_outputs: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor]¶ Defines decoder structure and flow.
Pushes sequence through embedding layer
Applies dropout
Calls attention layer to obtain attention weights
Calculates context vector of attention
Concatenates context vector with previous decoder output
Feeds GRU with concatenation result
Generates final output by applying softmax
- Parameters
sequence (torch.Tensor) – StartOfSentence token or previous decoder output
hidden (torch.Tensor) – Hidden state
encoder_outputs (torch.Tensor) – Encoder output
- Returns
Decoder output, decoder hidden state and attention weights
- Return type
tuple
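The decoding steps listed for forward can be sketched as a single step over one token per batch element. This is an illustration under assumptions: DecoderStep is a hypothetical name, the shapes are chosen for the example, and a dot-product score stands in for the Bahdanau attention module to keep the block self-contained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecoderStep(nn.Module):
    """Hypothetical single decoding step; layer sizes and shapes are assumptions."""

    def __init__(self, embedding_size, hidden_size, output_size, dropout=0.1):
        super().__init__()
        self.embedding = nn.Embedding(output_size, embedding_size)
        self.dropout = nn.Dropout(dropout)
        self.gru = nn.GRU(embedding_size + hidden_size, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, output_size)

    def forward(self, token, hidden, encoder_outputs):
        # token: (batch,), hidden: (1, batch, hidden)
        # encoder_outputs: (batch, src_len, hidden)
        embedded = self.dropout(self.embedding(token)).unsqueeze(1)  # (batch, 1, emb)
        # Stand-in for the attention module: dot-product scores, then softmax
        scores = torch.bmm(encoder_outputs, hidden[-1].unsqueeze(2)).squeeze(2)
        weights = F.softmax(scores, dim=1)  # (batch, src_len)
        # Context vector: attention-weighted sum of encoder outputs
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs)  # (batch, 1, hidden)
        # Concatenate context with the embedded previous output and feed the GRU
        output, hidden = self.gru(torch.cat((embedded, context), dim=2), hidden)
        probs = F.softmax(self.classifier(output.squeeze(1)), dim=1)
        return probs, hidden, weights


step = DecoderStep(embedding_size=6, hidden_size=8, output_size=20)
step.eval()  # disable dropout for a deterministic example
probs, new_hidden, attn = step(
    torch.tensor([1, 2]), torch.zeros(1, 2, 8), torch.randn(2, 5, 8)
)
```

The three returned tensors correspond to the decoder output, the decoder hidden state and the attention weights named in the return description.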
-
class
nlper.model.architecture.EncoderRNN(input_size: int, embedding_size: int, hidden_size: int, n_layers: int = 1, dropout: float = 0.1)¶ Model encoder class. Initializes an embedding layer and a bidirectional GRU.
- Parameters
input_size (int) – Number of unique words in vocabulary
embedding_size (int) – Size of embedding layer, number of expected features in GRU
hidden_size (int) – Number of features in the hidden state of GRU
n_layers (int) – Number of recurrent layers in GRU
dropout (float) – Dropout probability on each GRU layer except the last layer
-
forward(sequence: torch.Tensor, hidden: Any = None) → Tuple[torch.Tensor, torch.Tensor]¶ Defines encoder structure and flow.
Pushes sequence through embedding layer
Feeds GRU with embedded sequence
Merges the two directions of the bidirectional GRU output into a single tensor
- Parameters
sequence (torch.Tensor) – Tensor of indices representing text
hidden (torch.Tensor, optional) – Initial hidden state of GRU, default None
- Returns
Encoder output and encoder hidden states
- Return type
tuple
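A minimal sketch of the encoder flow above. The text does not say how the two GRU directions are merged; summing the forward and backward halves is an assumption here, as are the class name EncoderSketch and the batch-first layout.

```python
import torch
import torch.nn as nn


class EncoderSketch(nn.Module):
    """Hypothetical encoder; the merge-by-sum strategy is an assumption."""

    def __init__(self, input_size, embedding_size, hidden_size, n_layers=1, dropout=0.1):
        super().__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, embedding_size)
        # nn.GRU applies dropout only between stacked layers,
        # so it is disabled when there is a single layer
        self.gru = nn.GRU(
            embedding_size,
            hidden_size,
            n_layers,
            dropout=dropout if n_layers > 1 else 0.0,
            bidirectional=True,
            batch_first=True,
        )

    def forward(self, sequence, hidden=None):
        embedded = self.embedding(sequence)           # (batch, seq_len, emb)
        outputs, hidden = self.gru(embedded, hidden)  # (batch, seq_len, 2 * hidden)
        # Merge directions: forward half plus backward half
        outputs = outputs[:, :, : self.hidden_size] + outputs[:, :, self.hidden_size :]
        return outputs, hidden


encoder = EncoderSketch(input_size=50, embedding_size=6, hidden_size=8)
enc_outputs, enc_hidden = encoder(torch.randint(0, 50, (2, 7)))
```

With one layer, enc_hidden keeps one slice per direction, so a decoder with a unidirectional GRU would still need to reduce it to a single state.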
-
class
nlper.model.architecture.Seq2Seq(encoder: torch.nn.modules.module.Module, decoder: torch.nn.modules.module.Module)¶ Sequence to Sequence model, built using encoder and decoder.
- Parameters
encoder (nn.Module) – Encoder model
decoder (nn.Module) – Decoder model with Bahdanau attention
-
forward(text: torch.Tensor, summary: torch.Tensor, teacher_forcing_ratio: float = 0.5) → torch.Tensor¶ Defines Seq2Seq structure and flow. Teacher forcing ratio specifies the probability of replacing the decoder output with the target summary token as the input for the next word generation. Used to speed up model training.
Feeds encoder with input indices
Initializes decoder hidden state as encoder hidden state
Initializes decoder output with Start of Sequence <sos> token
Initializes summary output vector
- Until the maximum summary length is reached:
Feeds decoder with decoder output, hidden state and encoder output
Updates decoder output and hidden state
Updates summary output vector with decoder output token
With probability teacher_forcing_ratio, replaces decoder output with the target summary token
- Parameters
text (torch.Tensor) – Indices of input text
summary (torch.Tensor) – Indices of target / reference summary
teacher_forcing_ratio (float) – Probability of feeding the target summary token as the next decoder input, default 0.5
- Returns
Output sequence / summary
- Return type
torch.Tensor
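The decoding loop with teacher forcing can be sketched as follows. The helper decode_with_teacher_forcing, the ToyStep decoder, the SOS_INDEX value and the per-step (rather than per-example) coin flip are all assumptions made for illustration.

```python
import random

import torch
import torch.nn as nn

SOS_INDEX = 1  # assumed index of the <sos> token


def decode_with_teacher_forcing(step, hidden, summary, teacher_forcing_ratio=0.5):
    """summary: (batch, max_len) target indices; step(token, hidden) -> (logits, hidden)."""
    batch_size, max_len = summary.shape
    # Start every sequence with the <sos> token
    token = torch.full((batch_size,), SOS_INDEX, dtype=torch.long)
    outputs = []
    for t in range(1, max_len):
        logits, hidden = step(token, hidden)
        outputs.append(logits)
        predicted = logits.argmax(dim=1)
        # With probability teacher_forcing_ratio, feed the target token instead
        use_target = random.random() < teacher_forcing_ratio
        token = summary[:, t] if use_target else predicted
    return torch.stack(outputs, dim=1)  # (batch, max_len - 1, vocab)


class ToyStep(nn.Module):
    """Hypothetical decoder step used only to exercise the loop."""

    def __init__(self, vocab_size=12, hidden_size=8):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.cell = nn.GRUCell(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, token, hidden):
        hidden = self.cell(self.embedding(token), hidden)
        return self.out(hidden), hidden


target = torch.randint(0, 12, (3, 6))
step_logits = decode_with_teacher_forcing(ToyStep(), torch.zeros(3, 8), target)
```

Setting teacher_forcing_ratio to 0.0 makes the loop fully autoregressive, which is the usual choice at inference time when no target summary is available.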
model¶
-
class
nlper.model.model.Model(config: Dict[str, Any], vocab_config: Any)¶ Utils of Seq2Seq model.
Executes:
model training
text prediction (summarization)
model evaluation
saving and loading of a model
- Parameters
config (dict) – Config dictionary
vocab_config (VocabConfig) – Vocabulary config for model
-
create_model() → None¶ Initializes full Seq2Seq model with encoder and decoder as specified in config file.
-
create_optimizers_and_loss() → None¶ Initializes Adam optimizer for Seq2Seq model and learning rate scheduler as specified in config file. Initializes CrossEntropyLoss that ignores the padding token <pad> in the sequence.
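A minimal sketch of this setup, assuming a padding index of 0 and a StepLR scheduler. The text does not name the scheduler type or the hyperparameters, so PAD_INDEX, the learning rate and the scheduler choice are placeholders.

```python
import torch
import torch.nn as nn

PAD_INDEX = 0  # assumed index of <pad> in the vocabulary

model = nn.Linear(8, 5)  # placeholder for the Seq2Seq model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# StepLR is an assumed scheduler choice; the text does not name one
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)
# Targets equal to PAD_INDEX contribute nothing to the loss or its gradient
criterion = nn.CrossEntropyLoss(ignore_index=PAD_INDEX)

logits = torch.randn(4, 5)
targets = torch.tensor([2, 0, 3, 0])  # two positions are padding
loss = criterion(logits, targets)     # averaged over the two non-pad targets only
```

Because the mean is taken over non-padding positions only, batches with different amounts of padding produce comparable loss values.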
-
evaluate(valid_iterator: Any) → List[torch.Tensor]¶ Evaluates the trained Seq2Seq model performance.
- Parameters
valid_iterator (torchtext.data.BucketIterator) – Valid or test iterator
- Returns
-
get_text_summary_from_batch(batch) → Tuple[torch.Tensor, torch.Tensor]¶ Obtains original text and target summary indices from batch and transfers them to GPU.
- Parameters
batch (torchtext.data.batch.Batch) – Batch to extract indices from
- Returns
Text and summary indices for model
- Return type
tuple
-
load_model(model_path: str, attention_param_path: str = None) → None¶ Loads trained model and transfers it to GPU. Currently the attention V parameter is also saved and loaded, because PyTorch does not support saving nn.Parameter directly.
- Parameters
model_path (str) – Path to trained model
attention_param_path (str) – Path to trained attention parameter
-
predict(text: str, length_of_original_text: float = 0.25) → Tuple[str, torch.Tensor]¶ Predicts model output / summarizes given text. Produces a summary whose maximum length is a given fraction of the original text length. Returns the summary and attention weights to plot attention heatmap.
- Parameters
text (str) – Original text to summarize
length_of_original_text (float) – Maximum ratio of summary length to original text length
- Returns
Summary text and attention weights
- Return type
tuple
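As a worked example of the length cap, the ratio can be turned into a token budget. The helper max_summary_length is hypothetical, and whitespace tokenization, rounding down and the minimum of one token are assumptions, since the text does not specify them.

```python
def max_summary_length(text: str, length_of_original_text: float = 0.25) -> int:
    # Whitespace tokenization and rounding down are assumptions
    n_tokens = len(text.split())
    # Always allow at least one token so short inputs still yield a summary
    return max(1, int(n_tokens * length_of_original_text))


# 12 tokens with the default ratio of 0.25 give a budget of 3 tokens
limit = max_summary_length(
    "one two three four five six seven eight nine ten eleven twelve"
)
```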
-
save_model(model_path: str, model_epoch: int) → None¶ Saves trained model weights after epoch, transferred to CPU. Currently the attention V parameter is also saved, because PyTorch does not support saving nn.Parameter directly.
- Parameters
model_path (str) – Path to save model
model_epoch (int) – Model epoch
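A minimal sketch of the save and load round trip described for save_model and load_model, assuming the weights are stored through state_dict and the attention V parameter as a separate tensor. The in-memory buffers stand in for model_path and attention_param_path, and the nn.GRU is a placeholder for the full model.

```python
import io

import torch
import torch.nn as nn

model = nn.GRU(4, 8)                       # placeholder for the trained model
attention_v = nn.Parameter(torch.rand(8))  # stand-in for the attention V parameter

# Save: move weights to CPU first so the files load on machines without a GPU
model_buffer, v_buffer = io.BytesIO(), io.BytesIO()
torch.save(model.cpu().state_dict(), model_buffer)
torch.save(attention_v.detach().cpu(), v_buffer)

# Load: rebuild the modules, then restore the weights and V separately
model_buffer.seek(0)
v_buffer.seek(0)
restored = nn.GRU(4, 8)
restored.load_state_dict(torch.load(model_buffer, map_location="cpu"))
restored_v = nn.Parameter(torch.load(v_buffer, map_location="cpu"))
```

After loading, the restored modules can be moved to the GPU with .to(device) where one is available.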
-
show_loss(batch_id: int, loss: torch.Tensor, train_iterator: Any) → None¶ Logs loss value for specified batch.
- Parameters
batch_id (int) – Number of batch
loss (torch.Tensor) – Loss value for batch
train_iterator (torchtext.data.BucketIterator) – Train iterator
-
show_rouge_and_attention_matrix(epoch: int, batch_id: int, text: torch.Tensor, summary: torch.Tensor) → None¶ Calls ROUGE metric calculation and attention heatmap drawing.
- Parameters
epoch (int) – Current training epoch.
batch_id (int) – Number of batch
text (torch.Tensor) – Model input / original text indices tensor
summary (torch.Tensor) – Model generated summary text indices tensor
-
train(train_iterator: Any, epoch: int = 0) → List[torch.Tensor]¶ Executes model training.
- Parameters
train_iterator (torchtext.data.BucketIterator) – Iterator over training dataset
epoch (int) – Current training epoch.
- Returns
Training loss values for batches
- Return type
list