---
license: apache-2.0
base_model: facebook/wav2vec2-xls-r-300m
tags:
- generated_from_trainer
datasets:
- common_voice_13_0
metrics:
- wer
model-index:
- name: wav2vec2-large-xls-r-300m-korean
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: common_voice_13_0
      type: common_voice_13_0
      config: ko
      split: test
      args: ko
    metrics:
    - name: Wer
      type: wer
      value: 0.5931520644511581
---

wav2vec2-large-xls-r-300m-korean

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the Korean (ko) configuration of the common_voice_13_0 dataset. At the final checkpoint (step 19200) it achieves the following results on the evaluation set (the test split):

  • Loss: 1.4687
  • Wer: 0.5932
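
The Wer figure is the word error rate: the word-level edit distance between the model's transcript and the reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that definition, assuming whitespace tokenization and no text normalization. The preprocessing actually used for the numbers above is not stated in this card, and `word_error_rate` is an illustrative helper, not the function that produced them.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Toy example (not taken from the dataset): one substitution in four words.
print(word_error_rate("a b c d", "a b x d"))
```

Under this definition a Wer of 0.5932 corresponds to roughly 59 word errors per 100 reference words. Because insertions count as errors, the value can exceed 1.0.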

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 300
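
The batch and schedule settings above fit together arithmetically. The sketch below recomputes the effective batch size and the learning rate at a given optimizer step. It assumes a single training device, a linear schedule that warms up from 0 to the peak rate over the 500 warmup steps and then decays linearly to 0 at the last step, and a run of 19200 optimizer steps as reported in the training results table. `linear_lr` is an illustrative helper, not a library function.

```python
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 500
TOTAL_STEPS = 19200  # final step in the training results table

# Effective batch size per optimizer step (single device assumed).
total_train_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS


def linear_lr(step: int) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print(total_train_batch_size)
for step in (0, 250, 500, 9850, 19200):
    print(step, linear_lr(step))
```

With these settings the peak rate of 0.0003 is reached at step 500, which is under 8 epochs into a 300-epoch run, so nearly all of training happens on the decaying part of the schedule.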

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer |
|---|---|---|---|---|
| 20.8922 | 6.25 | 400 | 4.6827 | 0.9990 |
| 4.0513 | 12.5 | 800 | 2.3657 | 0.9204 |
| 1.5386 | 18.75 | 1200 | 1.2355 | 0.7392 |
| 0.7429 | 25.0 | 1600 | 1.1179 | 0.6636 |
| 0.3746 | 31.25 | 2000 | 1.0465 | 0.6314 |
| 0.2407 | 37.5 | 2400 | 1.1492 | 0.6596 |
| 0.1966 | 43.75 | 2800 | 1.1291 | 0.6344 |
| 0.1697 | 50.0 | 3200 | 1.1897 | 0.6395 |
| 0.1533 | 56.25 | 3600 | 1.2202 | 0.6193 |
| 0.129 | 62.5 | 4000 | 1.2106 | 0.6516 |
| 0.1097 | 68.75 | 4400 | 1.1662 | 0.6254 |
| 0.102 | 75.0 | 4800 | 1.2086 | 0.6133 |
| 0.0918 | 81.25 | 5200 | 1.2295 | 0.6485 |
| 0.0806 | 87.5 | 5600 | 1.2861 | 0.6123 |
| 0.0738 | 93.75 | 6000 | 1.2436 | 0.6093 |
| 0.0697 | 100.0 | 6400 | 1.3496 | 0.6626 |
| 0.0667 | 106.25 | 6800 | 1.2364 | 0.6133 |
| 0.0591 | 112.5 | 7200 | 1.2689 | 0.6062 |
| 0.054 | 118.75 | 7600 | 1.2886 | 0.6183 |
| 0.0523 | 125.0 | 8000 | 1.3328 | 0.6445 |
| 0.0542 | 131.25 | 8400 | 1.4019 | 0.6133 |
| 0.045 | 137.5 | 8800 | 1.3426 | 0.6042 |
| 0.0425 | 143.75 | 9200 | 1.3042 | 0.6032 |
| 0.0378 | 150.0 | 9600 | 1.3638 | 0.6224 |
| 0.0354 | 156.25 | 10000 | 1.3397 | 0.6294 |
| 0.0282 | 162.5 | 10400 | 1.3939 | 0.6173 |
| 0.0288 | 168.75 | 10800 | 1.3674 | 0.6475 |
| 0.0278 | 175.0 | 11200 | 1.3636 | 0.6324 |
| 0.0239 | 181.25 | 11600 | 1.4101 | 0.6405 |
| 0.0238 | 187.5 | 12000 | 1.4528 | 0.6163 |
| 0.0214 | 193.75 | 12400 | 1.4458 | 0.6093 |
| 0.0194 | 200.0 | 12800 | 1.3920 | 0.6304 |
| 0.0168 | 206.25 | 13200 | 1.4277 | 0.6193 |
| 0.0168 | 212.5 | 13600 | 1.3959 | 0.6203 |
| 0.0154 | 218.75 | 14000 | 1.4043 | 0.6133 |
| 0.0144 | 225.0 | 14400 | 1.4508 | 0.6193 |
| 0.0134 | 231.25 | 14800 | 1.4309 | 0.6224 |
| 0.0109 | 237.5 | 15200 | 1.4301 | 0.6123 |
| 0.0107 | 243.75 | 15600 | 1.4373 | 0.6002 |
| 0.0098 | 250.0 | 16000 | 1.4147 | 0.6113 |
| 0.0095 | 256.25 | 16400 | 1.4585 | 0.6193 |
| 0.009 | 262.5 | 16800 | 1.4424 | 0.6203 |
| 0.0079 | 268.75 | 17200 | 1.5019 | 0.6193 |
| 0.0066 | 275.0 | 17600 | 1.4835 | 0.5932 |
| 0.0059 | 281.25 | 18000 | 1.4749 | 0.5992 |
| 0.0057 | 287.5 | 18400 | 1.4897 | 0.6002 |
| 0.0053 | 293.75 | 18800 | 1.4667 | 0.5901 |
| 0.0048 | 300.0 | 19200 | 1.4687 | 0.5932 |
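
The log above shows training loss falling throughout the run while validation loss bottoms out early and Wer keeps fluctuating, so the final checkpoint is not necessarily the best one under either metric. The sketch below picks the best row by each metric, using a handful of rows copied from the table. The `Row` type and variable names are illustrative.

```python
from typing import NamedTuple


class Row(NamedTuple):
    epoch: float
    step: int
    validation_loss: float
    wer: float


# A subset of the evaluation rows from the training results table.
rows = [
    Row(6.25, 400, 4.6827, 0.9990),
    Row(31.25, 2000, 1.0465, 0.6314),
    Row(100.0, 6400, 1.3496, 0.6626),
    Row(200.0, 12800, 1.3920, 0.6304),
    Row(275.0, 17600, 1.4835, 0.5932),
    Row(293.75, 18800, 1.4667, 0.5901),
    Row(300.0, 19200, 1.4687, 0.5932),
]

# Lower is better for both metrics.
best_by_loss = min(rows, key=lambda r: r.validation_loss)
best_by_wer = min(rows, key=lambda r: r.wer)

print("lowest validation loss:", best_by_loss)
print("lowest Wer:", best_by_wer)
```

The same rows win on the full table: validation loss is lowest at step 2000 (1.0465), and Wer is lowest at step 18800 (0.5901), slightly below the 0.5932 of the final checkpoint.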

Framework versions

  • Transformers 4.35.0.dev0
  • Pytorch 1.12.1
  • Datasets 2.14.5
  • Tokenizers 0.14.0
Description
Model synced from source: kresnik/wav2vec2-large-xls-r-300m-korean