Initialize project; model provided by the ModelHub XC community
Model: wannaphong/wav2vec2-large-xlsr-53-th-cv8-newmm Source: Original Platform
29
.gitattributes
vendored
Normal file
@@ -0,0 +1,29 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
language_model/3gram_correct.arpa filter=lfs diff=lfs merge=lfs -text
model.safetensors filter=lfs diff=lfs merge=lfs -text
76
README.md
Normal file
@@ -0,0 +1,76 @@
---
language:
- th
tags:
- automatic-speech-recognition
license: apache-2.0
datasets:
- common_voice
metrics:
- wer
- cer
---

# Thai Wav2Vec2 with CommonVoice V8 (newmm tokenizer) + language model

This model was trained on the Common Voice V8 dataset, which extends the Common Voice V7 data used in [airesearch/wav2vec2-large-xlsr-53-th](https://huggingface.co/airesearch/wav2vec2-large-xlsr-53-th). It is fine-tuned from [wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53).

## Model description
- Technical report: [Thai Wav2Vec2.0 with CommonVoice V8](https://arxiv.org/abs/2208.04799)

## Datasets

The training data adds the new Common Voice V8 recordings to the Common Voice V7 dataset. All data already present in Common Voice V7 is removed from Common Voice V8 before Common Voice V8 is split, and the Common Voice V7 data is then added back to the resulting dataset.

The [ekapolc/Thai_commonvoice_split](https://github.com/ekapolc/Thai_commonvoice_split) script is used to split the Common Voice dataset.

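The merge described in the Datasets section can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ekapolc/Thai_commonvoice_split script: the clip IDs, the helper names `split_new_clips` and `merge_cv7_into_cv8`, and the 80/10/10 ratio are all hypothetical.

```python
import random


def split_new_clips(clips, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Shuffle clips deterministically and cut them into train/dev/test.

    The ratios are hypothetical; the real split script may use other rules.
    """
    clips = sorted(clips)
    random.Random(seed).shuffle(clips)
    n_train = int(len(clips) * ratios[0])
    n_dev = int(len(clips) * ratios[1])
    return {
        "train": clips[:n_train],
        "dev": clips[n_train:n_train + n_dev],
        "test": clips[n_train + n_dev:],
    }


def merge_cv7_into_cv8(cv7_splits, cv8_clips):
    """Keep the CV7 splits fixed, split only the clips new in CV8, then add CV7 back."""
    seen = {clip for part in cv7_splits.values() for clip in part}
    new_clips = [clip for clip in cv8_clips if clip not in seen]
    new_splits = split_new_clips(new_clips)
    return {
        name: list(cv7_splits[name]) + new_splits[name]
        for name in ("train", "dev", "test")
    }


# Toy data: four clips from CV7 plus ten clips that only exist in CV8.
cv7 = {"train": ["a", "b"], "dev": ["c"], "test": ["d"]}
cv8 = ["a", "b", "c", "d"] + [f"new{i}" for i in range(10)]
merged = merge_cv7_into_cv8(cv7, cv8)
```

Because the CV7 clips never move between splits in this sketch, every CV7 test clip stays in the test split of the merged dataset, so a CV7 test set can still be evaluated without leaking into training.
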
## Models

This model is [wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) fine-tuned on the Thai Common Voice V8 dataset. The transcripts are pre-tokenized with `pythainlp.tokenize.word_tokenize`.

## Training

I used much of the code from [vistec-AI/wav2vec2-large-xlsr-53-th](https://github.com/vistec-AI/wav2vec2-large-xlsr-53-th), and I fixed a bug in the training code in [vistec-AI/wav2vec2-large-xlsr-53-th#2](https://github.com/vistec-AI/wav2vec2-large-xlsr-53-th/pull/2).

## Evaluation

**Results on the Common Voice V8 test set**

| Model                                    | WER by newmm (%) | WER by deepcut (%) | CER      |
|------------------------------------------|------------------|--------------------|----------|
| AIResearch.in.th and PyThaiNLP           | 17.414503        | 11.923089          | 3.854153 |
| wav2vec2 with deepcut                    | 16.354521        | 11.424476          | 3.684060 |
| wav2vec2 with newmm                      | 16.698299        | 11.436941          | 3.737407 |
| wav2vec2 with deepcut + language model   | 12.630260        | 9.613886           | 3.292073 |
| **wav2vec2 with newmm + language model** | 12.583706        | 9.598305           | 3.276610 |

**Results on the Common Voice V7 test set (the same test set as used for CV V7)**

| Model                                    | WER by newmm (%) | WER by deepcut (%) | CER      |
|------------------------------------------|------------------|--------------------|----------|
| AIResearch.in.th and PyThaiNLP           | 13.936698        | 9.347462           | 2.804787 |
| wav2vec2 with deepcut                    | 12.776381        | 8.773006           | 2.628882 |
| wav2vec2 with newmm                      | 12.750596        | 8.672616           | 2.623341 |
| wav2vec2 with deepcut + language model   | 9.940050         | 7.423313           | 2.344940 |
| **wav2vec2 with newmm + language model** | 9.559724         | 7.339654           | 2.277071 |

This is the same test set used by [airesearch/wav2vec2-large-xlsr-53-th](https://huggingface.co/airesearch/wav2vec2-large-xlsr-53-th).

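For readers unfamiliar with the two metrics in the tables, here is a minimal sketch of how WER and CER are commonly computed from an edit distance. It is not the evaluation script behind the reported numbers. The example sentence and its word segmentation are made up, and returning both metrics as percentages is an assumption. Thai has no spaces between words, so the word lists must come from a tokenizer such as newmm or deepcut, which is why the tables report WER once per tokenizer.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]


def wer(ref_words, hyp_words):
    """Word error rate in percent; inputs are already-tokenized word lists."""
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(ref_text, hyp_text):
    """Character error rate in percent, counted over Unicode code points."""
    return 100.0 * edit_distance(ref_text, hyp_text) / len(ref_text)


# Hypothetical segmentation: the hypothesis drops one tone mark in the last word.
ref_words = ["ฉัน", "กิน", "ข้าว"]
hyp_words = ["ฉัน", "กิน", "ขาว"]
word_error = wer(ref_words, hyp_words)                    # one of three words is wrong
char_error = cer("".join(ref_words), "".join(hyp_words))  # one of ten characters is missing
```

A single missing tone mark costs a whole word under WER but only one character under CER, which is consistent with the CER column being much lower than the WER columns.
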
**Links:**
- GitHub Dataset: [https://github.com/wannaphong/thai_commonvoice_dataset](https://github.com/wannaphong/thai_commonvoice_dataset)
- Technical report: [Thai Wav2Vec2.0 with CommonVoice V8](https://arxiv.org/abs/2208.04799)

## BibTeX entry and citation info

```
@misc{phatthiyaphaibun2022thai,
  title={Thai Wav2Vec2.0 with CommonVoice V8},
  author={Wannaphong Phatthiyaphaibun and Chompakorn Chaksangchaichot and Peerat Limkonchotiwat and Ekapol Chuangsuwanich and Sarana Nutanong},
  year={2022},
  eprint={2208.04799},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
1
added_tokens.json
Normal file
@@ -0,0 +1 @@
{"<s>": 70, "</s>": 71}
1
alphabet.json
Normal file
@@ -0,0 +1 @@
{"labels": ["\u0e37", "\u0e0a", "\u0e2e", "\u0e08", "\u0e31", "\u0e2b", "\u0e09", "\u0e04", "\u0e02", "\u0e06", "\u0e49", "\u0e27", "\u0e22", "\u0e2f", "\u0e43", "\u0e20", "\u0e23", "\u0e48", "\u0e14", "\u0e33", "\u0e42", "\u0e4b", "\u0e24", "\u0e0e", "\u0e45", "\u0e40", "\u0e16", "\u0e1c", "\u0e34", "\u0e13", "\u0e1a", "\u0e1d", "\u0e0c", "\u0e17", "\u0e15", "\u0e10", "\u0e11", "\u0e12", "\u0e28", "\u0e4a", "\u0e2c", "\u0e30", "\u0e41", "\u0e01", "\u0e47", "\u0e39", " ", "\u0e32", "\u0e0d", "\u0e18", "\u0e36", "\u0e07", "\u0e29", "\u0e35", "\u0e19", "\u0e25", "\u0e0b", "\u0e0f", "\u0e2a", "\u0e44", "\u0e38", "\u0e2d", "\u0e1e", "\u0e1f", "\u0e03", "\u0e1b", "\u0e21", "\u0e4c", "\u2047", "", "<s>", "</s>"], "is_bpe": false}
115
config.json
Normal file
@@ -0,0 +1,115 @@
{
  "_name_or_path": "facebook/wav2vec2-large-xlsr-53",
  "activation_dropout": 0.0,
  "adapter_kernel_size": 3,
  "adapter_stride": 2,
  "add_adapter": false,
  "apply_spec_augment": true,
  "architectures": [
    "Wav2Vec2ForCTC"
  ],
  "attention_dropout": 0.1,
  "bos_token_id": 1,
  "classifier_proj_size": 256,
  "codevector_dim": 768,
  "contrastive_logits_temperature": 0.1,
  "conv_bias": true,
  "conv_dim": [
    512,
    512,
    512,
    512,
    512,
    512,
    512
  ],
  "conv_kernel": [
    10,
    3,
    3,
    3,
    3,
    2,
    2
  ],
  "conv_stride": [
    5,
    2,
    2,
    2,
    2,
    2,
    2
  ],
  "ctc_loss_reduction": "mean",
  "ctc_zero_infinity": false,
  "diversity_loss_weight": 0.1,
  "do_stable_layer_norm": true,
  "eos_token_id": 2,
  "feat_extract_activation": "gelu",
  "feat_extract_dropout": 0.0,
  "feat_extract_norm": "layer",
  "feat_proj_dropout": 0.0,
  "feat_quantizer_dropout": 0.0,
  "final_dropout": 0.0,
  "hidden_act": "gelu",
  "hidden_dropout": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "layerdrop": 0.1,
  "mask_channel_length": 10,
  "mask_channel_min_space": 1,
  "mask_channel_other": 0.0,
  "mask_channel_prob": 0.0,
  "mask_channel_selection": "static",
  "mask_feature_length": 10,
  "mask_feature_min_masks": 0,
  "mask_feature_prob": 0.0,
  "mask_time_length": 10,
  "mask_time_min_masks": 2,
  "mask_time_min_space": 1,
  "mask_time_other": 0.0,
  "mask_time_prob": 0.05,
  "mask_time_selection": "static",
  "model_type": "wav2vec2",
  "num_adapter_layers": 3,
  "num_attention_heads": 16,
  "num_codevector_groups": 2,
  "num_codevectors_per_group": 320,
  "num_conv_pos_embedding_groups": 16,
  "num_conv_pos_embeddings": 128,
  "num_feat_extract_layers": 7,
  "num_hidden_layers": 24,
  "num_negatives": 100,
  "output_hidden_size": 1024,
  "pad_token_id": 69,
  "proj_codevector_dim": 768,
  "tdnn_dilation": [
    1,
    2,
    3,
    1,
    1
  ],
  "tdnn_dim": [
    512,
    512,
    512,
    512,
    1500
  ],
  "tdnn_kernel": [
    5,
    3,
    3,
    1,
    1
  ],
  "torch_dtype": "float32",
  "transformers_version": "4.17.0",
  "use_weighted_layer_sum": false,
  "vocab_size": 72,
  "xvector_output_dim": 512
}
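As a reading aid for the `conv_kernel` and `conv_stride` lists in this config, the following sketch works out how many output frames the convolutional feature encoder produces for a given number of audio samples. It assumes unpadded 1-D convolutions, which is the usual length formula for this architecture. The 16 kHz rate is taken from `preprocessor_config.json` in this commit, and the helper name `num_frames` is made up for illustration.

```python
from math import prod

# Values copied from config.json and preprocessor_config.json in this commit.
conv_kernel = [10, 3, 3, 3, 3, 2, 2]
conv_stride = [5, 2, 2, 2, 2, 2, 2]
sampling_rate = 16000

# The total stride is the hop between consecutive output frames, in samples.
hop = prod(conv_stride)
frame_ms = 1000 * hop / sampling_rate


def num_frames(num_samples):
    """Output length after the seven conv layers, assuming no padding."""
    length = num_samples
    for kernel, stride in zip(conv_kernel, conv_stride):
        length = (length - kernel) // stride + 1
    return length


frames_per_second = num_frames(sampling_rate)
```

Under these assumptions the hop is 320 samples, that is one frame every 20 ms, and one second of audio yields 49 frames. The CTC head then emits one distribution over the 72-entry vocabulary per frame.
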
3
language_model/3gram.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:608af0f75d801457b16ba96d6068c59aa2318d9374bc14722ab06708de56f324
size 9431499
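The three lines above are a Git LFS pointer, not the binary itself: the real file is stored elsewhere and identified by its SHA-256 digest and byte size. A minimal sketch of reading such a pointer and checking a downloaded file against it might look like this. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Git LFS tooling.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, info, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest named in the pointer."""
    hasher = hashlib.new(info["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large model files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == info["size"] and hasher.hexdigest() == info["digest"]


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:608af0f75d801457b16ba96d6068c59aa2318d9374bc14722ab06708de56f324
size 9431499
"""
info = parse_lfs_pointer(pointer)
```

The same check applies to the other pointer files in this commit, such as `model.safetensors` and `pytorch_model.bin`.
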
1
language_model/attrs.json
Normal file
@@ -0,0 +1 @@
{"alpha": 0.5, "beta": 1.5, "unk_score_offset": -10.0, "score_boundary": true}
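The keys in `attrs.json` look like the language-model settings of a pyctcdecode-style beam-search decoder. That reading is an assumption, since the file itself does not name the library. Under that assumption, each completed word adds a weighted n-gram score plus a constant bonus to the beam score. The sketch below shows that arithmetic only. The function name `word_lm_score` and the example probabilities are made up, and `score_boundary` is not modelled.

```python
import json
import math

attrs = json.loads(
    '{"alpha": 0.5, "beta": 1.5, "unk_score_offset": -10.0, "score_boundary": true}'
)

# N-gram toolkits report log10 probabilities; multiplying by ln(10) converts them
# to natural log so they are comparable with the acoustic log-probabilities.
LOG10_TO_LN = 1.0 / math.log10(math.e)


def word_lm_score(log10_prob, in_vocabulary, attrs=attrs):
    """Language-model contribution for one word (a sketch, not the decoder's code)."""
    if not in_vocabulary:
        # Out-of-vocabulary words are penalised before weighting.
        log10_prob += attrs["unk_score_offset"]
    return attrs["alpha"] * log10_prob * LOG10_TO_LN + attrs["beta"]


known = word_lm_score(-1.0, in_vocabulary=True)
unknown = word_lm_score(-1.0, in_vocabulary=False)
```

In this reading, `alpha` scales how much the 3-gram model is trusted relative to the acoustic model, `beta` is a per-word bonus that offsets the tendency of the language model to favour shorter outputs, and `unk_score_offset` discourages words missing from `unigrams.txt`.
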
21098
language_model/unigrams.txt
Normal file
File diff suppressed because it is too large
3
model.safetensors
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f0135130a25f0f16a59cc376f886f9e54e0c1736f1b85c4c01e93b9f4cc4b090
size 1262102632
10
preprocessor_config.json
Normal file
@@ -0,0 +1,10 @@
{
  "do_normalize": true,
  "feature_extractor_type": "Wav2Vec2FeatureExtractor",
  "feature_size": 1,
  "padding_side": "right",
  "padding_value": 0.0,
  "processor_class": "Wav2Vec2ProcessorWithLM",
  "return_attention_mask": false,
  "sampling_rate": 16000
}
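To make the preprocessing settings above concrete, here is a small pure-Python sketch of what `do_normalize`, `padding_side`, and `padding_value` imply for raw audio: each waveform is scaled to zero mean and unit variance, and shorter waveforms in a batch are padded on the right with 0.0. The function names are hypothetical, the `eps` value is an assumption, and normalising before padding is one plausible order rather than a statement about the library's internals.

```python
import math


def zero_mean_unit_var(samples, eps=1e-7):
    """Scale one waveform to zero mean and (approximately) unit variance."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return [(x - mean) / math.sqrt(var + eps) for x in samples]


def pad_right(batch, padding_value=0.0):
    """Pad every waveform on the right to the length of the longest one."""
    longest = max(len(wave) for wave in batch)
    return [wave + [padding_value] * (longest - len(wave)) for wave in batch]


# Toy batch of two "waveforms" of different lengths.
batch = [[1.0, 2.0, 3.0, 4.0], [0.5, -0.5]]
prepared = pad_right([zero_mean_unit_var(wave) for wave in batch])
```

Input audio is expected at the 16000 Hz `sampling_rate` listed in the config, so recordings at other rates would need resampling first.
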
3
pytorch_model.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cd00de33e508bc6094dc81e59069b3434ef1771ada269a6728f69970439c3ef9
size 1262221489
3
rng_state.pth
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3cb82389f05c149e395fb250fadffa54638bc114b5b165f33377e2c38c7acbc8
size 18647
3
scaler.pt
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:17ed9e914777e8e0e26e4ea887f50f9fbb4dd6b7a33099c61ed027e612089d8d
size 559
3
scheduler.pt
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5e2b7b4d5e5abc7ad90139ee4e4b708feb908bc6a03baa728d2231450a53d462
size 623
1
special_tokens_map.json
Normal file
@@ -0,0 +1 @@
{"bos_token": "<s>", "eos_token": "</s>", "unk_token": "[UNK]", "pad_token": "[PAD]", "additional_special_tokens": [{"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "<s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}, {"content": "</s>", "single_word": false, "lstrip": false, "rstrip": false, "normalized": true}]}
1
tokenizer_config.json
Normal file
@@ -0,0 +1 @@
{"unk_token": "[UNK]", "bos_token": "<s>", "eos_token": "</s>", "pad_token": "[PAD]", "do_lower_case": false, "word_delimiter_token": "|", "replace_word_delimiter_char": " ", "special_tokens_map_file": null, "name_or_path": "./thai-asr-cv8", "processor_class": "Wav2Vec2ProcessorWithLM"}
1486
trainer_state.json
Normal file
File diff suppressed because it is too large
3
training_args.bin
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0cf4ef9abbfd40cb8d9509784468dc10e7742b233f8b0a9653b55deb0f8a6d30
size 3055
1
vocab.json
Normal file
@@ -0,0 +1 @@
{"ื": 0, "ช": 1, "ฮ": 2, "จ": 3, "ั": 4, "ห": 5, "ฉ": 6, "ค": 7, "ข": 8, "ฆ": 9, "้": 10, "ว": 11, "ย": 12, "ฯ": 13, "ใ": 14, "ภ": 15, "ร": 16, "่": 17, "ด": 18, "ำ": 19, "โ": 20, "๋": 21, "ฤ": 22, "ฎ": 23, "ๅ": 24, "เ": 25, "ถ": 26, "ผ": 27, "ิ": 28, "ณ": 29, "บ": 30, "ฝ": 31, "ฌ": 32, "ท": 33, "ต": 34, "ฐ": 35, "ฑ": 36, "ฒ": 37, "ศ": 38, "๊": 39, "ฬ": 40, "ะ": 41, "แ": 42, "ก": 43, "็": 44, "ู": 45, "า": 47, "ญ": 48, "ธ": 49, "ึ": 50, "ง": 51, "ษ": 52, "ี": 53, "น": 54, "ล": 55, "ซ": 56, "ฏ": 57, "ส": 58, "ไ": 59, "ุ": 60, "อ": 61, "พ": 62, "ฟ": 63, "ฃ": 64, "ป": 65, "ม": 66, "์": 67, "|": 46, "[UNK]": 68, "[PAD]": 69}
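The vocabulary above has IDs 0 to 69. Together with `<s>` (70) and `</s>` (71) from `added_tokens.json`, that gives the 72 entries of `vocab_size` in `config.json`. To show how these IDs turn back into text, here is a minimal greedy CTC decoding sketch. It assumes `[PAD]` (ID 69, the `pad_token_id` in the config) acts as the CTC blank and that `|` marks word boundaries, as `tokenizer_config.json` suggests. It uses only a handful of the vocabulary entries, the frame IDs are invented, and it ignores the language model, which the `Wav2Vec2ProcessorWithLM` setup would apply through beam search instead.

```python
# A small subset of vocab.json, enough for the toy example below.
VOCAB_SUBSET = {"ก": 43, "า": 47, "ร": 16, "|": 46, "[UNK]": 68, "[PAD]": 69}
ID_TO_TOKEN = {idx: token for token, idx in VOCAB_SUBSET.items()}
BLANK_ID = 69  # pad_token_id in config.json, assumed to be the CTC blank


def ctc_greedy_decode(frame_ids):
    """Collapse repeated IDs, drop blanks, and map the word delimiter to a space."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame and is not blank.
        if idx != previous and idx != BLANK_ID:
            tokens.append(ID_TO_TOKEN[idx])
        previous = idx
    return "".join(tokens).replace("|", " ").strip()


# Invented per-frame argmax IDs, one per 20 ms frame.
frames = [69, 43, 43, 69, 47, 16, 16, 46, 43, 47, 69]
text = ctc_greedy_decode(frames)
```

Here the repeated 43 and 16 frames collapse to single characters, the blank frames disappear, and the `|` frame becomes the space between the two decoded words.
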