Model trainer

main

nlper.trainer.__init__.main(config: str)

Executes the model training pipeline.

Parameters: config (str) – Path to config

application

class nlper.trainer.application.Application(config_path: str)

Model training application. On initialization it sets up the vocabulary config and the writers used to save the updated vocabulary and the loss function results. By default, training starts at epoch 1; the starting epoch can be changed when fine-tuning a model.

Parameters: config_path (str) – Path to yaml config file, by default loads example config

load_trained_model() → None: Calls the method that loads the model parameters saved for a particular epoch, for fine-tuning. Also updates the next epoch number so that training continues from there.

prepare_and_save_vocab() → None: Sets the vocabulary from torchtext.Field and saves it to a file.

prepare_model() → None: Initializes the model for training. Calls the method that creates the optimizers and loss functions.

run() → None: Executes the model training process.

save_loss(loss: List[float], name: str, epoch: int = 0) → None

Calls the method that saves the model losses for a given loss type after a given epoch.

Parameters

loss (list) – List with loss function values.
name (str) – Loss function type name, for example: train or test
epoch (int) – Number of training epoch to save loss after.
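
As an illustration of the parameters above, here is a minimal sketch of a loss writer. It is not the nlper implementation: the JSON format, the file naming scheme and the out_dir argument are assumptions made for this example only.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def save_loss(loss: List[float], name: str, epoch: int = 0, out_dir: str = ".") -> Path:
    """Write loss values to a JSON file named after the loss type and epoch.

    The file layout '<name>_loss_epoch_<epoch>.json' and the out_dir
    argument are hypothetical, chosen only for this sketch.
    """
    path = Path(out_dir) / f"{name}_loss_epoch_{epoch}.json"
    path.write_text(json.dumps({"name": name, "epoch": epoch, "loss": loss}))
    return path


if __name__ == "__main__":
    # Write into a temporary directory so the example leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        written = save_loss([0.91, 0.74, 0.63], name="train", epoch=1, out_dir=tmp)
        print(written.name)
```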

save_model(model_epoch: int) → None

Calls the method that saves the model after a given epoch.

Parameters: model_epoch (int) – Number of training epoch to save model after.

train() → None: Trains and evaluates the model using the train, valid and test iterators. Saves the model and the losses after every epoch.
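
The per-epoch flow described for train() can be sketched roughly as follows. This is an assumption-laden outline, not the library's code: run_epochs, train_step and eval_step are hypothetical names, the callbacks stand in for the real model and writers, and the final evaluation on the test iterator is left out for brevity.

```python
from typing import Callable, Dict, List


def run_epochs(
    start_epoch: int,
    n_epochs: int,
    train_step: Callable[[int], float],
    eval_step: Callable[[int], float],
    save_model: Callable[[int], None],
    save_loss: Callable[[List[float], str, int], None],
) -> Dict[str, List[float]]:
    """Train for n_epochs starting at start_epoch, saving after every epoch.

    start_epoch is 1 for a fresh run and higher when resuming for fine-tuning.
    All callbacks are hypothetical placeholders for the real components.
    """
    history: Dict[str, List[float]] = {"train": [], "valid": []}
    for epoch in range(start_epoch, start_epoch + n_epochs):
        history["train"].append(train_step(epoch))
        history["valid"].append(eval_step(epoch))
        # Persist the model and the accumulated losses once per epoch.
        save_model(epoch)
        save_loss(history["train"], "train", epoch)
        save_loss(history["valid"], "valid", epoch)
    return history


if __name__ == "__main__":
    saved_epochs: List[int] = []
    history = run_epochs(
        start_epoch=1,
        n_epochs=3,
        train_step=lambda epoch: 1.0 / epoch,  # dummy decreasing loss
        eval_step=lambda epoch: 1.5 / epoch,
        save_model=saved_epochs.append,
        save_loss=lambda loss, name, epoch: None,
    )
    print(saved_epochs, history["train"])
```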

data loader

class nlper.trainer.data_loader.DataLoader(config: Dict[str, Any])

Data loader for the model. Starts by initializing the language utilities. By default, the SpaCy language model is loaded with the ner and parser options disabled, which significantly speeds up model training.

Parameters: config (dict) – Data loader config

build_vocab(dataset: torchtext.data.dataset.TabularDataset) → None

Builds the vocabulary on a dataset with defined special tokens and a minimum word frequency. The frequency is the minimum number of times a word must appear in the dataset to be placed into the vocabulary.

Parameters: dataset (torchtext.TabularDataset) – Dataset to build vocabulary on
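
The minimum-frequency rule can be illustrated with a small pure-Python sketch. It does not use torchtext and is not the nlper implementation; the special tokens and the default threshold shown here are assumptions for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence

# Hypothetical special tokens; the real ones come from the data loader config.
SPECIALS = ("<unk>", "<pad>", "<sos>", "<eos>")


def build_vocab(
    sentences: Iterable[Sequence[str]],
    min_freq: int = 2,
    specials: Sequence[str] = SPECIALS,
) -> Dict[str, int]:
    """Map tokens to indices, keeping only tokens seen at least min_freq times."""
    counts = Counter(token for sentence in sentences for token in sentence)
    # Special tokens always get the first indices.
    vocab = {token: index for index, token in enumerate(specials)}
    # Most frequent tokens first; ties are broken alphabetically.
    for token, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and token not in vocab:
            vocab[token] = len(vocab)
    return vocab


if __name__ == "__main__":
    corpus = [["ala", "ma", "kota"], ["ala", "ma", "psa"]]
    print(build_vocab(corpus, min_freq=2))
```

Tokens below the threshold ("kota" and "psa" in the usage example) are left out of the vocabulary, so at lookup time they would typically fall back to the unknown token.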

load() → Tuple[Tuple[torchtext.data.iterator.BucketIterator, torchtext.data.iterator.BucketIterator, torchtext.data.iterator.BucketIterator], torchtext.data.field.Field, torchtext.data.field.Field]

Runs the language setup and the creation of data iterators for the data loader.

Returns: Dataset iterators and fields with vocabulary
Return type: tuple

load_iterators(splits: Tuple[torchtext.data.dataset.Dataset, torchtext.data.dataset.Dataset, torchtext.data.dataset.Dataset]) → None

Obtains iterators for the train, test and valid splits using torchtext BucketIterator.

Parameters: splits (tuple) – Tuple of tabular datasets, one per split type
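
The idea behind bucketing, grouping examples of similar length into the same batch so that little padding is needed, can be sketched without torchtext. This is a simplification rather than the BucketIterator algorithm itself (the real iterator also shuffles), and bucket_batches is a hypothetical helper name.

```python
from typing import List, Sequence


def bucket_batches(
    examples: Sequence[Sequence[str]], batch_size: int
) -> List[List[Sequence[str]]]:
    """Sort examples by length, then slice them into consecutive batches."""
    ordered = sorted(examples, key=len)
    return [
        list(ordered[start:start + batch_size])
        for start in range(0, len(ordered), batch_size)
    ]


if __name__ == "__main__":
    data = [["a"] * 5, ["b"], ["c"] * 4, ["d"] * 2]
    for batch in bucket_batches(data, batch_size=2):
        # Lengths within a batch are close, so padding stays small.
        print([len(example) for example in batch])
```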

load_splits() → Tuple[torchtext.data.dataset.Dataset, torchtext.data.dataset.Dataset, torchtext.data.dataset.Dataset]

Loads train, test, valid splits using torchtext TabularDataset feature with

Returns

load_splits_and_iterators() → None: Loads the train, test and valid splits, initializes the vocabulary and loads the data iterators.

prepare_fields() → None: Initializes torchtext Fields with base token and language.

set_language() → None

Sets the SpaCy language model to Polish and disables computationally expensive language options.

Default disabled language options:
* ner
* parser
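
For reference, disabling pipeline components when loading a spaCy model might look like the sketch below. The model name pl_core_news_sm and the helper load_polish_pipeline are assumptions; this page does not say which Polish model nlper loads. The import sits inside the function so that the snippet can be read and imported without spaCy installed.

```python
from typing import List

# Components named on this page as disabled by default.
DISABLED_COMPONENTS: List[str] = ["ner", "parser"]


def load_polish_pipeline(model_name: str = "pl_core_news_sm"):
    """Load a spaCy pipeline with the expensive components switched off.

    model_name is an assumed default; the model must be installed
    separately, for example with `python -m spacy download pl_core_news_sm`.
    """
    import spacy  # imported lazily so the module loads without spaCy

    return spacy.load(model_name, disable=DISABLED_COMPONENTS)
```

Skipping the named entity recognizer and the dependency parser is a common choice when only tokenization is needed, since those components account for much of the processing time.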
