seed: 4711 network: desc: Configuration of the PyTorch neural network. type: VQVAE args: in_channels: 1 channels: [32, 64, 64] kernel_sizes: [4, 4, 4] strides: [2, 2, 2] num_residual_layers: 2 embedding_dim: 128 num_embeddings: 512 upsampling: null beta: 0.25 activation: leaky_relu dropout_rate: 0.2 model: desc: Configuration of the PyTorch Lightning model. type: LitVQVAEModel args: optimizer: type: MADGRAD args: lr: 1.0e-3 momentum: 0.9 weight_decay: 0 eps: 1.0e-6 lr_scheduler: type: OneCycleLR args: interval: &interval step max_lr: 1.0e-3 three_phase: true epochs: 64 steps_per_epoch: 633 # num_samples / batch_size criterion: type: MSELoss args: reduction: mean monitor: val_loss mapping: sentence_piece data: desc: Configuration of the training/test data. type: IAMExtendedParagraphs args: batch_size: 32 num_workers: 12 train_fraction: 0.8 augment: true callbacks: - type: ModelCheckpoint args: monitor: val_loss mode: min save_last: true - type: StochasticWeightAveraging args: swa_epoch_start: 0.8 swa_lrs: 0.05 annealing_epochs: 10 annealing_strategy: cos device: null - type: LearningRateMonitor args: logging_interval: *interval # - type: EarlyStopping # args: # monitor: val_loss # mode: min # patience: 10 trainer: desc: Configuration of the PyTorch Lightning Trainer. args: stochastic_weight_avg: true auto_scale_batch_size: binsearch gradient_clip_val: 0 fast_dev_run: false gpus: 1 precision: 16 max_epochs: 64 terminate_on_nan: true weights_summary: top load_checkpoint: null