Model trainer

main

nlper.trainer.__init__.main(config: str)

Executes the model training pipeline.

Parameters

config (str) – Path to the YAML config file.

application

class nlper.trainer.application.Application(config_path: str)

Model training application. On initialization it loads the vocabulary config and creates writers for saving the updated vocabulary and the loss function values. By default, training starts at epoch 1; the starting epoch can be changed when fine-tuning the model.

Parameters

config_path (str) – Path to the YAML config file; by default, the example config is loaded.

load_trained_model() → None

Loads the model parameters saved after a particular epoch for fine-tuning, and updates the next epoch number so that training continues from that point.
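The epoch bookkeeping described for Application and load_trained_model() can be sketched as below. This is a minimal sketch, not the nlper implementation: the EpochTracker class and its methods are hypothetical, and only the epoch counter is modeled (no model parameters are actually loaded).

```python
class EpochTracker:
    """Hypothetical helper that tracks which epoch training should run next."""

    def __init__(self, start_epoch: int = 1) -> None:
        # Fresh training starts at epoch 1, matching Application's default.
        self.next_epoch = start_epoch

    def resume_from(self, loaded_epoch: int) -> None:
        # After loading weights saved at `loaded_epoch`, continue with the next one.
        if loaded_epoch < 1:
            raise ValueError("epochs are numbered from 1")
        self.next_epoch = loaded_epoch + 1

    def epochs(self, total_epochs: int) -> range:
        # Remaining epochs, up to and including `total_epochs`.
        return range(self.next_epoch, total_epochs + 1)


tracker = EpochTracker()
tracker.resume_from(3)
print(list(tracker.epochs(5)))  # [4, 5]
```

Keeping "next epoch" as explicit state means fresh training and fine-tuning share the same loop; only the starting value differs.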

prepare_and_save_vocab() → None

Sets the vocabulary from the torchtext Field and saves it to a file.
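One simple way to persist a vocabulary is to write one token per line in index order, so the line number doubles as the token index. The file format and the save_vocab/load_vocab helpers below are assumptions for illustration; the format nlper actually writes is not documented here.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def save_vocab(itos: List[str], path: Path) -> None:
    # Hypothetical format: one token per line; the line number is the token's index.
    path.write_text("\n".join(itos) + "\n", encoding="utf-8")


def load_vocab(path: Path) -> Dict[str, int]:
    # Rebuild the token -> index mapping from the saved file.
    tokens = path.read_text(encoding="utf-8").splitlines()
    return {token: index for index, token in enumerate(tokens)}


with tempfile.TemporaryDirectory() as tmp:
    vocab_path = Path(tmp) / "vocab.txt"
    save_vocab(["<unk>", "<pad>", "kot", "pies"], vocab_path)
    stoi = load_vocab(vocab_path)

print(stoi["kot"])  # 2
```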

prepare_model() → None

Initializes the model for training and creates the optimizers and loss functions.

run() → None

Executes the model training process.

save_loss(loss: List[float], name: str, epoch: int = 0) → None

Saves the model losses of a given loss type after a given epoch.

Parameters
  • loss (list) – List with loss function values.

  • name (str) – Loss function type name, for example: train or test.

  • epoch (int) – Number of the training epoch after which the loss is saved.
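A loss writer can be as simple as appending CSV rows that record the loss type, the epoch, and each value. The row layout (name, epoch, step, value) and this stand-alone save_loss function are assumptions made for the sketch; they only mirror the parameters documented above.

```python
import csv
import io
from typing import List, TextIO


def save_loss(stream: TextIO, loss: List[float], name: str, epoch: int = 0) -> None:
    # Hypothetical CSV layout: one row per recorded loss value.
    writer = csv.writer(stream)
    for step, value in enumerate(loss):
        writer.writerow([name, epoch, step, f"{value:.4f}"])


buffer = io.StringIO()
save_loss(buffer, [0.91, 0.75], name="train", epoch=1)
print(buffer.getvalue())
```

Tagging every row with the loss type lets train and test losses share one file and be filtered later.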

save_model(model_epoch: int) → None

Saves the model after a given epoch.

Parameters

model_epoch (int) – Number of the training epoch after which the model is saved.
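Saving one checkpoint per epoch makes it possible to resume from any of them later. The sketch below uses pickle on a plain dict as a stand-in for model state; a PyTorch project would typically call torch.save(model.state_dict(), path) instead. The file naming scheme and the helper names are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict


def checkpoint_path(directory: Path, model_epoch: int) -> Path:
    # Hypothetical naming scheme: one file per epoch, e.g. model_epoch_3.pkl.
    return directory / f"model_epoch_{model_epoch}.pkl"


def save_model(state: Dict[str, Any], directory: Path, model_epoch: int) -> Path:
    path = checkpoint_path(directory, model_epoch)
    with path.open("wb") as handle:
        pickle.dump(state, handle)
    return path


def load_model(directory: Path, model_epoch: int) -> Dict[str, Any]:
    with checkpoint_path(directory, model_epoch).open("rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    saved = save_model({"weights": [0.1, 0.2]}, Path(tmp), model_epoch=3)
    restored = load_model(Path(tmp), model_epoch=3)

print(saved.name, restored)  # model_epoch_3.pkl {'weights': [0.1, 0.2]}
```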

train() → None

Trains and evaluates the model using the train, valid and test iterators. Saves the model and the losses after every epoch.
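The per-epoch flow described for train() can be sketched with the training, evaluation and saving steps passed in as callables. This is a minimal sketch under assumptions: the function signature is hypothetical, and evaluating on the test split every epoch is a simplification, since the source does not say how often the test iterator is used.

```python
from typing import Callable, Dict, List


def train(
    start_epoch: int,
    n_epochs: int,
    train_step: Callable[[], float],
    evaluate: Callable[[str], float],
    save_model: Callable[[int], None],
    save_loss: Callable[[List[float], str, int], None],
) -> None:
    for epoch in range(start_epoch, start_epoch + n_epochs):
        train_loss = train_step()       # one pass over the train iterator
        valid_loss = evaluate("valid")  # evaluation on the valid iterator
        test_loss = evaluate("test")    # evaluation on the test iterator
        # Persist the model and every loss type after each epoch.
        save_model(epoch)
        save_loss([train_loss], "train", epoch)
        save_loss([valid_loss], "valid", epoch)
        save_loss([test_loss], "test", epoch)


saved_epochs: List[int] = []
losses: Dict[str, List[float]] = {}
train(
    start_epoch=1,
    n_epochs=2,
    train_step=lambda: 0.5,
    evaluate=lambda split: 0.7 if split == "valid" else 0.8,
    save_model=saved_epochs.append,
    save_loss=lambda loss, name, epoch: losses.setdefault(name, []).extend(loss),
)
print(saved_epochs, losses["test"])  # [1, 2] [0.8, 0.8]
```

Injecting the steps as callables keeps the loop testable without a real model; in the application they would be bound methods.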

data loader

class nlper.trainer.data_loader.DataLoader(config: Dict[str, Any])

Data loader for the model. On initialization it sets up the language utilities. By default, the spaCy pipeline components ner and parser are disabled, which significantly speeds up model training.

Parameters

config (dict) – Data loader config

build_vocab(dataset: torchtext.data.dataset.TabularDataset) → None

Builds the vocabulary on the dataset with the defined special tokens and minimum word frequency. The minimum frequency is the number of times a word must appear in the dataset to be included in the vocabulary.

Parameters

dataset (torchtext.TabularDataset) – Dataset to build vocabulary on
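The minimum-frequency rule can be illustrated with collections.Counter. This sketch works on plain token lists rather than a TabularDataset, and the special token names and min_freq=2 are assumptions; it follows the same idea as torchtext's min_freq (specials first, then words ordered by frequency) but is not the library's code.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def build_vocab(
    sentences: Iterable[Sequence[str]],
    specials: Sequence[str] = ("<unk>", "<pad>", "<sos>", "<eos>"),
    min_freq: int = 2,
) -> Dict[str, int]:
    counts = Counter(token for sentence in sentences for token in sentence)
    # Special tokens come first and are always kept, regardless of frequency.
    itos: List[str] = list(specials)
    # Keep words seen at least `min_freq` times, most frequent first, ties alphabetical.
    frequent = sorted(
        (tok for tok, count in counts.items() if count >= min_freq and tok not in specials),
        key=lambda tok: (-counts[tok], tok),
    )
    itos.extend(frequent)
    return {token: index for index, token in enumerate(itos)}


corpus = [["ala", "ma", "kota"], ["kot", "ma", "ale"], ["ala", "ma", "psa"]]
vocab = build_vocab(corpus, min_freq=2)
print(vocab)  # {'<unk>': 0, '<pad>': 1, '<sos>': 2, '<eos>': 3, 'ma': 4, 'ala': 5}
```

Words below the threshold (here "kota", "kot", "ale", "psa") are left out and would map to the unknown token at lookup time.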

load() → Tuple[Tuple[torchtext.data.iterator.BucketIterator, torchtext.data.iterator.BucketIterator, torchtext.data.iterator.BucketIterator], torchtext.data.field.Field, torchtext.data.field.Field]

Sets up the language model and creates the data iterators for the data loader.

Returns

Dataset iterators and the fields with their vocabulary

Return type

tuple

load_iterators(splits: Tuple[torchtext.data.dataset.Dataset, torchtext.data.dataset.Dataset, torchtext.data.dataset.Dataset]) → None

Creates iterators for the train, test and valid splits using the torchtext BucketIterator.

Parameters

splits (tuple) – Tuple of tabular datasets, one per split type
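The idea behind bucketing is to group examples of similar length into the same batch so that less padding is needed. The function below is a simplified stdlib sketch of that idea, not torchtext's BucketIterator (which, among other things, sorts within larger shuffled pools); the function name and the seed parameter are hypothetical.

```python
import random
from typing import List, Sequence


def bucket_batches(
    examples: Sequence[Sequence[str]],
    batch_size: int,
    seed: int = 0,
) -> List[List[Sequence[str]]]:
    # Sort by length so each batch holds examples of similar length (less padding).
    ordered = sorted(examples, key=len)
    batches = [list(ordered[i:i + batch_size]) for i in range(0, len(ordered), batch_size)]
    # Shuffle the order of the batches, not their contents, so epochs differ.
    random.Random(seed).shuffle(batches)
    return batches


examples = [["a"] * n for n in (3, 1, 6, 2, 5, 4)]
batches = bucket_batches(examples, batch_size=2)
print([[len(example) for example in batch] for batch in batches])
```

Each printed batch pairs neighbouring lengths (1 with 2, 3 with 4, 5 with 6); only the order of the batches depends on the seed.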

load_splits() → Tuple[torchtext.data.dataset.Dataset, torchtext.data.dataset.Dataset, torchtext.data.dataset.Dataset]

Loads the train, test and valid splits using the torchtext TabularDataset feature with

Returns

load_splits_and_iterators() → None

Loads the train, test and valid splits, initializes the vocabulary, and creates the data iterators.

prepare_fields() → None

Initializes the torchtext Fields with the base token and language.

set_language() → None

Sets the spaCy language model to Polish, with computationally expensive pipeline components disabled.

Default disabled language options:

  • ner

  • parser