Model trainer¶
main¶
-
nlper.trainer.__init__.main(config: str)¶ Executes the model training pipeline.
- Parameters
config (str) – Path to config
application¶
-
class
nlper.trainer.application.Application(config_path: str)¶ Model training application. Starts by initializing the vocabulary config and the writers that save the updated vocabulary and loss function results. By default, training starts at epoch 1; this can be changed when fine-tuning the model.
- Parameters
config_path (str) – Path to the YAML config file; by default, the example config is loaded
-
load_trained_model() → None¶ Calls the method that loads the model parameters from a particular epoch for fine-tuning. Also updates the next epoch number so that training continues from there.
-
prepare_and_save_vocab() → None¶ Sets vocabulary from torchtext.Field and saves it to file.
-
prepare_model() → None¶ Initializes the model for training. Calls the method that creates the optimizers and loss functions.
-
run() → None¶ Executes model training process.
-
save_loss(loss: List[float], name: str, epoch: int = 0) → None¶ Calls the method that saves the model losses for a particular loss type after a particular epoch.
- Parameters
loss (list) – List with loss function values.
name (str) – Loss function type name, for example: train or test
epoch (int) – Number of training epoch to save loss after.
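How a loss list might be persisted per loss type and epoch can be sketched as follows. This is only an illustration under stated assumptions, not the writer that nlper actually uses: `save_loss_sketch`, the JSON format, and the file naming scheme are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def save_loss_sketch(directory: Path, loss: List[float], name: str, epoch: int = 0) -> Path:
    """Write loss values to a JSON file keyed by loss type and epoch."""
    # The file naming scheme is an assumption made for this example.
    path = directory / f"{name}_loss_epoch_{epoch}.json"
    path.write_text(json.dumps({"name": name, "epoch": epoch, "loss": loss}))
    return path


# Write into a temporary directory and read the values back.
with tempfile.TemporaryDirectory() as tmp:
    written = save_loss_sketch(Path(tmp), [0.9, 0.7, 0.6], name="train", epoch=2)
    restored = json.loads(written.read_text())
```

Keeping the loss type and the epoch in the file name makes it possible to locate the losses for a given epoch without opening every file.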
-
save_model(model_epoch: int) → None¶ Calls the method that saves the model after a particular epoch.
- Parameters
model_epoch (int) – Number of training epoch to save model after.
-
train() → None¶ Trains and evaluates the model using the train, valid and test iterators. Saves the model and the losses after every epoch.
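The epoch bookkeeping of this class (start at epoch 1 by default, continue from a later epoch when fine-tuning, save after every epoch) can be sketched in plain Python. This is a hypothetical outline, not the actual `Application.train` code: `train_sketch`, `run_epoch`, `n_epochs`, `start_epoch`, and the `"valid"` loss name are assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def train_sketch(
    run_epoch: Callable[[int], Tuple[float, float]],
    save_model: Callable[[int], None],
    save_loss: Callable[[List[float], str, int], None],
    n_epochs: int,
    start_epoch: int = 1,
) -> None:
    """Run epochs start_epoch..n_epochs, saving model and losses after each."""
    for epoch in range(start_epoch, n_epochs + 1):
        train_loss, valid_loss = run_epoch(epoch)
        save_model(epoch)
        save_loss([train_loss], "train", epoch)
        save_loss([valid_loss], "valid", epoch)  # loss name is an assumption


# Stand-in callbacks that only record what was called.
saved_models: List[int] = []
saved_losses: List[Tuple[str, int]] = []

train_sketch(
    run_epoch=lambda epoch: (1.0 / epoch, 1.5 / epoch),
    save_model=saved_models.append,
    save_loss=lambda loss, name, epoch: saved_losses.append((name, epoch)),
    n_epochs=5,
    start_epoch=3,  # e.g. resuming after a model saved at epoch 2
)
```

Passing the save steps in as callbacks keeps the loop itself independent of how models and losses are written to disk.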
data loader¶
-
class
nlper.trainer.data_loader.DataLoader(config: Dict[str, Any])¶ Data loader for the model. Starts by initializing language utils. By default, the disabled SpaCy language model options are ner and parser, which significantly accelerates model training.
- Parameters
config (dict) – Data loader config
-
build_vocab(dataset: torchtext.data.dataset.TabularDataset) → None¶ Builds the vocabulary on a dataset with defined special tokens and word frequency. The frequency is the minimum number of times a word must appear in the dataset to be placed into the vocabulary.
- Parameters
dataset (torchtext.TabularDataset) – Dataset to build vocabulary on
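The effect of the frequency threshold can be illustrated without torchtext. The following is a minimal sketch, not the library's implementation: `build_vocab_sketch`, its parameter names, and the special tokens shown are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def build_vocab_sketch(
    tokenized_texts: Iterable[Sequence[str]],
    specials: Sequence[str] = ("<unk>", "<pad>"),
    min_freq: int = 2,
) -> Dict[str, int]:
    """Map tokens to indices, keeping only words seen at least min_freq times."""
    counts = Counter(token for text in tokenized_texts for token in text)
    # Special tokens always come first, regardless of frequency.
    itos: List[str] = list(specials)
    # Sort by descending count, then alphabetically, for a deterministic order.
    for word, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and word not in specials:
            itos.append(word)
    return {word: index for index, word in enumerate(itos)}


corpus = [["ala", "ma", "kota"], ["ala", "ma", "psa"]]
vocab = build_vocab_sketch(corpus, min_freq=2)
```

Here "kota" and "psa" each appear once, so with `min_freq=2` they are left out of the vocabulary, while the special tokens are kept even though they never occur in the corpus.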
-
load() → Tuple[Tuple[torchtext.data.iterator.BucketIterator, torchtext.data.iterator.BucketIterator, torchtext.data.iterator.BucketIterator], torchtext.data.field.Field, torchtext.data.field.Field]¶ Calls language setup and data iterator creation for the data loader.
- Returns
Dataset iterators and fields with vocabulary
- Return type
tuple
-
load_iterators(splits: Tuple[torchtext.data.dataset.Dataset, torchtext.data.dataset.Dataset, torchtext.data.dataset.Dataset]) → None¶ Obtains iterators for the train, test and valid splits using the torchtext BucketIterator.
- Parameters
splits (tuple) – Tuple of tabular datasets for particular type of split
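torchtext's BucketIterator batches examples of similar length together to reduce the amount of padding. The idea can be illustrated in plain Python. This sketch is not how torchtext implements it, and `bucket_batches` and `padding_needed` are hypothetical helpers.

```python
from typing import List, Sequence


def bucket_batches(examples: Sequence[Sequence[str]], batch_size: int) -> List[List[Sequence[str]]]:
    """Group examples of similar length into batches of at most batch_size."""
    # Sorting by length puts similarly sized examples next to each other.
    ordered = sorted(examples, key=len)
    return [list(ordered[i:i + batch_size]) for i in range(0, len(ordered), batch_size)]


def padding_needed(batch: Sequence[Sequence[str]]) -> int:
    """Count pad tokens required to bring every example to the batch maximum."""
    longest = max(len(example) for example in batch)
    return sum(longest - len(example) for example in batch)


sentences = [["a"], ["a", "b", "c", "d"], ["a", "b"], ["a", "b", "c"]]
bucketed = bucket_batches(sentences, batch_size=2)
# Batching in the original order, for comparison.
naive = [sentences[0:2], sentences[2:4]]
```

For these four sentences, the length-sorted batches need less padding in total than batches taken in the original order, which is the saving that bucketing aims for.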
-
load_splits() → Tuple[torchtext.data.dataset.Dataset, torchtext.data.dataset.Dataset, torchtext.data.dataset.Dataset]¶ Loads train, test, valid splits using torchtext TabularDataset feature with
- Returns
-
load_splits_and_iterators() → None¶ Loads the train, test and valid splits, initializes the vocabulary, and loads the data iterators.
-
prepare_fields() → None¶ Initializes torchtext Fields with base token and language
-
set_language() → None¶ Sets the SpaCy language model to Polish with computationally expensive language options disabled.
Default disabled language options: ner, parser