Model

main

architecture

class nlper.model.architecture.BahdanauAttention(hidden_size: int)

Bahdanau attention. Initializes a fully connected layer and an internal parameter V with uniformly distributed weights.

Parameters

hidden_size (int) – Number of features in the attention's fully connected layer

forward(hidden: torch.Tensor, encoder_outputs: torch.Tensor) → torch.Tensor

Calculates attention weights by applying softmax on attention alignment scores.

Parameters
  • hidden (torch.Tensor) – Encoder hidden states

  • encoder_outputs (torch.Tensor) – Encoder outputs

Returns

Attention weights

Return type

torch.Tensor

score(hidden: torch.Tensor, encoder_outputs: torch.Tensor) → torch.Tensor

Calculates alignment scores of attention.

Parameters
  • hidden (torch.Tensor) – Encoder hidden states

  • encoder_outputs (torch.Tensor) – Encoder outputs

Returns

Attention alignment scores

Return type

torch.Tensor
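The relationship between score and forward can be illustrated with a minimal sketch of additive (Bahdanau-style) attention. This is not the library's implementation: the class name, the layer names, and the batch-first tensor shapes are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdditiveAttentionSketch(nn.Module):
    """Hypothetical stand-in for BahdanauAttention; shapes are assumptions."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Fully connected layer over the concatenated [hidden; encoder_output] features.
        self.fc = nn.Linear(2 * hidden_size, hidden_size)
        # Learned vector V, initialized from a uniform distribution on [0, 1).
        self.v = nn.Parameter(torch.rand(hidden_size))

    def score(self, hidden: torch.Tensor, encoder_outputs: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, hidden_size); encoder_outputs: (batch, seq_len, hidden_size)
        seq_len = encoder_outputs.size(1)
        repeated = hidden.unsqueeze(1).expand(-1, seq_len, -1)
        energy = torch.tanh(self.fc(torch.cat((repeated, encoder_outputs), dim=2)))
        # Project each position's energy onto V: one alignment score per position.
        return energy @ self.v  # (batch, seq_len)

    def forward(self, hidden: torch.Tensor, encoder_outputs: torch.Tensor) -> torch.Tensor:
        # Softmax over the sequence dimension turns scores into weights.
        return F.softmax(self.score(hidden, encoder_outputs), dim=1)


attention = AdditiveAttentionSketch(hidden_size=8)
hidden = torch.zeros(2, 8)
encoder_outputs = torch.randn(2, 5, 8)
weights = attention(hidden, encoder_outputs)
# Context vector: attention-weighted sum of the encoder outputs.
context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)
```

Each row of weights sums to 1, so the context vector is a convex combination of the encoder outputs for that batch item.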

class nlper.model.architecture.DecoderRNN(embedding_size: int, hidden_size: int, output_size: int, n_layers: int = 1, dropout: float = 0.1)

Model decoder class. Initializes the embedding layer, dropout layer, Bahdanau attention module, unidirectional GRU and linear classifier.

Parameters
  • embedding_size (int) – Size of embedding layer, number of expected features in GRU

  • hidden_size (int) – Number of features in the hidden state of GRU and in fully connected layer of attention

  • output_size (int) – Number of unique words in vocabulary

  • n_layers (int) – Number of recurrent layers in GRU

  • dropout (float) – Dropout probability applied to the output of every GRU layer except the last

forward(sequence: torch.Tensor, hidden: torch.Tensor, encoder_outputs: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Defines decoder structure and flow.

  • Pushes sequence through embedding layer

  • Applies dropout

  • Calls attention layer to obtain attention weights

  • Calculates context vector of attention

  • Concatenates context vector with previous decoder output

  • Feeds GRU with concatenation result

  • Generates final output by applying softmax

Parameters
  • sequence (torch.Tensor) – StartOfSentence token or previous decoder output

  • hidden (torch.Tensor) – Hidden state

  • encoder_outputs (torch.Tensor) – Encoder output

Returns

Decoder output, decoder hidden state and attention weights

Return type

tuple
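A single decoder step along these lines might look like the sketch below. All sizes, the sequence-first tensor layout, and the uniform placeholder attention weights are assumptions; in the real module the weights come from the attention layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, embedding_size, hidden_size = 20, 6, 8
batch_size, src_len = 2, 5

embedding = nn.Embedding(vocab_size, embedding_size)
dropout = nn.Dropout(0.1)
gru = nn.GRU(embedding_size + hidden_size, hidden_size)
classifier = nn.Linear(hidden_size, vocab_size)

sequence = torch.tensor([1, 1])                    # previous token per batch item
hidden = torch.zeros(1, batch_size, hidden_size)   # (n_layers, batch, hidden)
encoder_outputs = torch.randn(src_len, batch_size, hidden_size)

# Placeholder attention weights (uniform); an attention layer would compute
# these from hidden and encoder_outputs.
attn_weights = torch.full((batch_size, 1, src_len), 1.0 / src_len)

# Embed the previous token and apply dropout.
embedded = dropout(embedding(sequence)).unsqueeze(0)                 # (1, batch, emb)
# Context vector: weighted sum of encoder outputs.
context = torch.bmm(attn_weights, encoder_outputs.transpose(0, 1))   # (batch, 1, hidden)
# Concatenate the embedded token with the context and feed the GRU.
rnn_input = torch.cat((embedded, context.transpose(0, 1)), dim=2)    # (1, batch, emb + hidden)
output, hidden = gru(rnn_input, hidden)
# Softmax over the vocabulary gives the distribution for the next token.
probs = F.softmax(classifier(output.squeeze(0)), dim=1)              # (batch, vocab)
```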

class nlper.model.architecture.EncoderRNN(input_size: int, embedding_size: int, hidden_size: int, n_layers: int = 1, dropout: float = 0.1)

Model encoder class. Initializes the embedding layer and bidirectional GRU.

Parameters
  • input_size (int) – Number of unique words in vocabulary

  • embedding_size (int) – Size of embedding layer, number of expected features in GRU

  • hidden_size (int) – Number of features in the hidden state of GRU

  • n_layers (int) – Number of recurrent layers in GRU

  • dropout (float) – Dropout probability applied to the output of every GRU layer except the last

forward(sequence: torch.Tensor, hidden: Any = None) → Tuple[torch.Tensor, torch.Tensor]

Defines encoder structure and flow.

  • Pushes sequence through embedding layer

  • Feeds GRU with embedded sequence

  • Merges the outputs of both GRU directions into a single tensor

Parameters
  • sequence (torch.Tensor) – Tensor of indices representing text

  • hidden (torch.Tensor, optional) – Initial hidden state of GRU, default None

Returns

Encoder output and encoder hidden states

Return type

tuple
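The encoder flow can be sketched as follows. The sizes are arbitrary, and summing the forward and backward halves is only one common way to merge the two directions; it is an assumption here, not a confirmed detail of EncoderRNN.

```python
import torch
import torch.nn as nn

vocab_size, embedding_size, hidden_size = 20, 6, 8

embedding = nn.Embedding(vocab_size, embedding_size)
gru = nn.GRU(embedding_size, hidden_size, bidirectional=True)

sequence = torch.randint(0, vocab_size, (5, 2))   # (seq_len, batch) token indices
outputs, hidden = gru(embedding(sequence))        # outputs: (seq_len, batch, 2 * hidden)

# Assumed merge strategy: sum the forward and backward feature halves.
merged = outputs[:, :, :hidden_size] + outputs[:, :, hidden_size:]
```

After the merge, the encoder output has the same feature size as a unidirectional GRU, so the decoder and attention layers can work with hidden_size features throughout.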

class nlper.model.architecture.Seq2Seq(encoder: torch.nn.modules.module.Module, decoder: torch.nn.modules.module.Module)

Sequence to Sequence model, built using encoder and decoder.

Parameters
  • encoder (nn.Module) – Encoder model

  • decoder (nn.Module) – Decoder model with Bahdanau attention

forward(text: torch.Tensor, summary: torch.Tensor, teacher_forcing_ratio: float = 0.5) → torch.Tensor

Defines Seq2Seq structure and flow. The teacher forcing ratio is the probability of replacing the decoder output with the target summary token as the input for generating the next word. Teacher forcing is used to speed up model training.

  • Feeds encoder with input indices

  • Initializes decoder hidden state as encoder hidden state

  • Initializes decoder output with Start of Sequence <sos> token

  • Initializes summary output vector

  • Until the maximum summary length is reached:
    • Feeds decoder with decoder output, hidden state and encoder output

    • Updates decoder output and hidden state

    • Updates summary output vector with decoder output token

    • With probability teacher_forcing_ratio, replaces the decoder output with the target summary token

Parameters
  • text (torch.Tensor) – Indices of input text

  • summary (torch.Tensor) – Indices of target / reference summary

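  • teacher_forcing_ratio (float) – Probability of feeding the target summary token to the decoder instead of its own output, default 0.5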

Returns

Output sequence / summary

Return type

torch.Tensor
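The control flow of the decoding loop with teacher forcing can be illustrated without any tensors. In this sketch, decode_with_teacher_forcing and toy_step are hypothetical names, and the decoder is reduced to a plain callable that maps (token, state) to (token, state).

```python
import random


def decode_with_teacher_forcing(decoder_step, target, sos_token,
                                teacher_forcing_ratio=0.5, rng=None):
    """Run a decoding loop over `target`, one step per target token."""
    rng = rng or random.Random()
    token, state = sos_token, None
    outputs = []
    for target_token in target:
        predicted, state = decoder_step(token, state)
        outputs.append(predicted)
        # With probability teacher_forcing_ratio, feed the ground-truth token
        # next; otherwise feed the model's own prediction.
        use_teacher = rng.random() < teacher_forcing_ratio
        token = target_token if use_teacher else predicted
    return outputs


def toy_step(token, state):
    # Toy decoder: "predicts" the input token plus one and keeps no state.
    return token + 1, state


always = decode_with_teacher_forcing(toy_step, [10, 20, 30], sos_token=0,
                                     teacher_forcing_ratio=1.0)
never = decode_with_teacher_forcing(toy_step, [10, 20, 30], sos_token=0,
                                    teacher_forcing_ratio=0.0)
print(always)  # [1, 11, 21]
print(never)   # [1, 2, 3]
```

With a ratio of 1.0 every step after the first is conditioned on the reference summary; with 0.0 the decoder only ever sees its own predictions.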

model

class nlper.model.model.Model(config: Dict[str, Any], vocab_config: Any)

Utils of Seq2Seq model.

Executes:
  • model training

  • text prediction (summarization)

  • model evaluation

  • saving and loading of a model

Parameters
  • config (dict) – Config dictionary

  • vocab_config (VocabConfig) – Vocabulary config for model

create_model() → None

Initializes full Seq2Seq model with encoder and decoder as specified in config file.

create_optimizers_and_loss() → None

Initializes the Adam optimizer for the Seq2Seq model and the learning rate scheduler as specified in the config file. Initializes CrossEntropyLoss so that it ignores the padding token <pad> in the sequence.
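The effect of ignoring the padding token can be seen in a small sketch. PAD_IDX, the tensor sizes, and the linear stand-in model are assumptions; the real values come from the vocabulary and the config file.

```python
import torch
import torch.nn as nn

PAD_IDX = 0  # assumed index of the <pad> token in the vocabulary

model = nn.Linear(4, 10)  # stand-in for the Seq2Seq model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed learning rate
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

logits = torch.randn(4, 10)                        # (tokens, vocab_size)
targets = torch.tensor([3, 7, PAD_IDX, PAD_IDX])   # last two positions are padding

loss = criterion(logits, targets)
# Equivalent: average the loss over the non-padding positions only.
loss_no_pad = nn.functional.cross_entropy(logits[:2], targets[:2])
```

Padding positions contribute neither to the loss value nor to the gradients, so batches with many short sequences do not dilute the training signal.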

evaluate(valid_iterator: Any) → List[torch.Tensor]

Evaluates the trained Seq2Seq model performance.

Parameters

valid_iterator (torchtext.data.BucketIterator) – Validation or test iterator

Returns

get_text_summary_from_batch(batch) → Tuple[torch.Tensor, torch.Tensor]

Obtains original text and target summary indices from batch and transfers them to GPU.

Parameters

batch (torchtext.data.batch.Batch) – Batch from the iterator

Returns

Text and summary indices for model

Return type

tuple

load_model(model_path: str, attention_param_path: str = None) → None

Loads trained model and transfers it to GPU. Currently the attention V parameter is also saved and loaded, because PyTorch does not support saving nn.Parameter directly.

Parameters
  • model_path (str) – Path to trained model

  • attention_param_path (str) – Path to trained attention parameter

predict(text: str, length_of_original_text: float = 0.25) → Tuple[str, torch.Tensor]

Predicts model output, i.e. summarizes the given text. The summary length is capped at a given fraction of the original text length. Returns the summary and the attention weights, which can be used to plot an attention heatmap.

Parameters
  • text (str) – Original text to summarize

  • length_of_original_text (float) – Maximum summary length as a fraction of the original text length

Returns

Summary text and attention weights

Return type

tuple
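One way to turn length_of_original_text into a token budget is sketched below. The helper name max_summary_length, the whitespace tokenization, and the lower bound of one token are assumptions; the model's own tokenizer and rounding may differ.

```python
def max_summary_length(text: str, length_of_original_text: float = 0.25) -> int:
    """Return the maximum number of summary tokens for `text`."""
    # Whitespace tokenization is an assumption made for illustration.
    n_tokens = len(text.split())
    # Always allow at least one token so that short inputs still yield output.
    return max(1, int(n_tokens * length_of_original_text))


limit = max_summary_length("one two three four five six seven eight")
print(limit)  # 2
```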

save_model(model_path: str, model_epoch: int) → None

Saves trained model weights after an epoch, transferred to CPU. Currently the attention V parameter is also saved, because PyTorch does not support saving nn.Parameter directly.

Parameters
  • model_path (str) – Path to save model

  • model_epoch (int) – Model epoch
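A generic save and load round trip in PyTorch, with weights moved to CPU before saving, might look like the sketch below. The linear stand-in model and the file name are hypothetical, and the separate handling of the attention V parameter described above is not reproduced.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the trained Seq2Seq model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_epoch_1.pt")  # hypothetical file name
    # Save CPU copies of the weights so the checkpoint also loads without a GPU.
    cpu_state = {name: tensor.cpu() for name, tensor in model.state_dict().items()}
    torch.save(cpu_state, path)

    # Load into a freshly constructed model, then move it to the target device.
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path, map_location="cpu"))
    restored.to(device)
```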

show_loss(batch_id: int, loss: torch.Tensor, train_iterator: Any) → None

Logs loss value for specified batch.

Parameters
  • batch_id (int) – Number of batch

  • loss (torch.Tensor) – Loss value for batch

  • train_iterator (torchtext.data.BucketIterator) – Train iterator

show_rouge_and_attention_matrix(epoch: int, batch_id: int, text: torch.Tensor, summary: torch.Tensor) → None

Calls rouge metric calculation and attention heatmap drawing.

Parameters
  • epoch (int) – Current training epoch.

  • batch_id (int) – Number of batch

  • text (torch.Tensor) – Model input / original text indices tensor

  • summary (torch.Tensor) – Model generated summary text indices tensor

train(train_iterator: Any, epoch: int = 0) → List[torch.Tensor]

Executes model training.

Parameters
  • train_iterator (torchtext.data.BucketIterator) – Iterator over training dataset

  • epoch (int) – Current training epoch.

Returns

Training loss values for batches

Return type

list