初始化项目,由ModelHub XC社区提供模型

Model: JacobLinCool/whisper-large-v3-turbo-common_voice_19_0-zh-TW
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-08 11:40:38 +08:00
commit 5dbb14cb03
12 changed files with 117112 additions and 0 deletions

35
.gitattributes vendored Normal file
View File

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

102
README.md Normal file
View File

@@ -0,0 +1,102 @@
---
library_name: transformers
language:
- zh
license: mit
base_model: openai/whisper-large-v3-turbo
tags:
- wft
- whisper
- automatic-speech-recognition
- audio
- speech
- generated_from_trainer
datasets:
- JacobLinCool/common_voice_19_0_zh-TW
metrics:
- wer
model-index:
- name: whisper-large-v3-turbo-common_voice_19_0-zh-TW
results:
- task:
type: automatic-speech-recognition
name: Automatic Speech Recognition
dataset:
name: JacobLinCool/common_voice_19_0_zh-TW
type: JacobLinCool/common_voice_19_0_zh-TW
metrics:
- type: wer
value: 32.55535607420706
name: Wer
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# whisper-large-v3-turbo-common_voice_19_0-zh-TW
This model is a fine-tuned version of [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) on the JacobLinCool/common_voice_19_0_zh-TW dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1786
- Wer: 32.5554
- Cer: 8.6009
- Decode Runtime: 90.9833
- Wer Runtime: 0.1257
- Cer Runtime: 0.1534
## Model description
This is an open-source Traditional Chinese (Taiwan) automatic speech recognition (ASR) model.
## Intended uses & limitations
This model is designed to be a prompt-free ASR model for Traditional Chinese. Due to its inherited language identification (LID) system from Whisper, which supports other Chinese language variants under the same language token (`zh`), we expect that performance may degrade when transcribing Simplified Chinese.
The model is free to use under the MIT license.
## Training and evaluation data
This model was trained on the [Common Voice Corpus 19.0 Chinese (Taiwan) Subset](https://huggingface.co/datasets/JacobLinCool/common_voice_19_0_zh-TW), containing about 50k training examples (44 hours) and 5k test examples (5 hours). This dataset is four times larger than the combination of training and validation set (`train+validation`) of [mozilla-foundation/common_voice_16_1](https://huggingface.co/datasets/mozilla-foundation/common_voice_16_1), which includes about 12k examples.
## Training procedure
[Tensorboard](https://huggingface.co/JacobLinCool/whisper-large-v3-turbo-common_voice_19_0-zh-TW-lora/tensorboard)
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- training_steps: 5000
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer | Cer | Decode Runtime | Wer Runtime | Cer Runtime |
|:-------------:|:------:|:----:|:---------------:|:-------:|:-------:|:--------------:|:-----------:|:-----------:|
| No log | 0 | 0 | 2.7208 | 76.5011 | 20.4851 | 89.4916 | 0.1213 | 0.1639 |
| 1.1832 | 0.1 | 500 | 0.1939 | 39.9561 | 10.8721 | 90.0926 | 0.1222 | 0.1555 |
| 1.5179 | 0.2 | 1000 | 0.1774 | 37.6621 | 9.9322 | 89.8657 | 0.1225 | 0.1545 |
| 0.6179 | 0.3 | 1500 | 0.1796 | 36.2657 | 9.8325 | 90.2480 | 0.1198 | 0.1573 |
| 0.3626 | 1.0912 | 2000 | 0.1846 | 36.2258 | 9.7801 | 90.3306 | 0.1196 | 0.1539 |
| 0.1311 | 1.1912 | 2500 | 0.1776 | 34.8095 | 9.3214 | 90.3124 | 0.1286 | 0.1610 |
| 0.1263 | 1.2912 | 3000 | 0.1763 | 36.1261 | 9.3563 | 90.4271 | 0.1330 | 0.1650 |
| 0.2194 | 2.0825 | 3500 | 0.1891 | 34.6898 | 9.3114 | 91.1932 | 0.1320 | 0.1643 |
| 0.1127 | 2.1825 | 4000 | 0.1838 | 34.0714 | 9.1095 | 90.2416 | 0.1196 | 0.1529 |
| 0.3792 | 2.2824 | 4500 | 0.1786 | 33.1339 | 8.7679 | 90.9144 | 0.1310 | 0.1550 |
| 0.0606 | 3.0737 | 5000 | 0.1786 | 32.5554 | 8.6009 | 90.9833 | 0.1257 | 0.1534 |
### Framework versions
- PEFT 0.13.2
- Transformers 4.46.1
- Pytorch 2.4.0
- Datasets 3.0.2
- Tokenizers 0.20.1

1611
added_tokens.json Normal file

File diff suppressed because it is too large Load Diff

50
config.json Normal file
View File

@@ -0,0 +1,50 @@
{
"_name_or_path": "openai/whisper-large-v3-turbo",
"activation_dropout": 0.0,
"activation_function": "gelu",
"apply_spec_augment": false,
"architectures": [
"WhisperForConditionalGeneration"
],
"attention_dropout": 0.0,
"begin_suppress_tokens": [
220,
50256
],
"bos_token_id": 50257,
"classifier_proj_size": 256,
"d_model": 1280,
"decoder_attention_heads": 20,
"decoder_ffn_dim": 5120,
"decoder_layerdrop": 0.0,
"decoder_layers": 4,
"decoder_start_token_id": 50258,
"dropout": 0.0,
"encoder_attention_heads": 20,
"encoder_ffn_dim": 5120,
"encoder_layerdrop": 0.0,
"encoder_layers": 32,
"eos_token_id": 50257,
"forced_decoder_ids": null,
"init_std": 0.02,
"is_encoder_decoder": true,
"mask_feature_length": 10,
"mask_feature_min_masks": 0,
"mask_feature_prob": 0.0,
"mask_time_length": 10,
"mask_time_min_masks": 2,
"mask_time_prob": 0.05,
"max_source_positions": 1500,
"max_target_positions": 448,
"median_filter_width": 7,
"model_type": "whisper",
"num_hidden_layers": 32,
"num_mel_bins": 128,
"pad_token_id": 50257,
"scale_embedding": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.46.1",
"use_cache": false,
"use_weighted_layer_sum": false,
"vocab_size": 51866
}

160
generation_config.json Normal file
View File

@@ -0,0 +1,160 @@
{
"alignment_heads": [
[
2,
4
],
[
2,
11
],
[
3,
3
],
[
3,
6
],
[
3,
11
],
[
3,
14
]
],
"begin_suppress_tokens": [
220,
50257
],
"bos_token_id": 50257,
"decoder_start_token_id": 50258,
"eos_token_id": 50257,
"forced_decoder_ids": [
[
1,
null
],
[
2,
50360
]
],
"is_multilingual": true,
"lang_to_id": {
"<|af|>": 50327,
"<|am|>": 50334,
"<|ar|>": 50272,
"<|as|>": 50350,
"<|az|>": 50304,
"<|ba|>": 50355,
"<|be|>": 50330,
"<|bg|>": 50292,
"<|bn|>": 50302,
"<|bo|>": 50347,
"<|br|>": 50309,
"<|bs|>": 50315,
"<|ca|>": 50270,
"<|cs|>": 50283,
"<|cy|>": 50297,
"<|da|>": 50285,
"<|de|>": 50261,
"<|el|>": 50281,
"<|en|>": 50259,
"<|es|>": 50262,
"<|et|>": 50307,
"<|eu|>": 50310,
"<|fa|>": 50300,
"<|fi|>": 50277,
"<|fo|>": 50338,
"<|fr|>": 50265,
"<|gl|>": 50319,
"<|gu|>": 50333,
"<|haw|>": 50352,
"<|ha|>": 50354,
"<|he|>": 50279,
"<|hi|>": 50276,
"<|hr|>": 50291,
"<|ht|>": 50339,
"<|hu|>": 50286,
"<|hy|>": 50312,
"<|id|>": 50275,
"<|is|>": 50311,
"<|it|>": 50274,
"<|ja|>": 50266,
"<|jw|>": 50356,
"<|ka|>": 50329,
"<|kk|>": 50316,
"<|km|>": 50323,
"<|kn|>": 50306,
"<|ko|>": 50264,
"<|la|>": 50294,
"<|lb|>": 50345,
"<|ln|>": 50353,
"<|lo|>": 50336,
"<|lt|>": 50293,
"<|lv|>": 50301,
"<|mg|>": 50349,
"<|mi|>": 50295,
"<|mk|>": 50308,
"<|ml|>": 50296,
"<|mn|>": 50314,
"<|mr|>": 50320,
"<|ms|>": 50282,
"<|mt|>": 50343,
"<|my|>": 50346,
"<|ne|>": 50313,
"<|nl|>": 50271,
"<|nn|>": 50342,
"<|no|>": 50288,
"<|oc|>": 50328,
"<|pa|>": 50321,
"<|pl|>": 50269,
"<|ps|>": 50340,
"<|pt|>": 50267,
"<|ro|>": 50284,
"<|ru|>": 50263,
"<|sa|>": 50344,
"<|sd|>": 50332,
"<|si|>": 50322,
"<|sk|>": 50298,
"<|sl|>": 50305,
"<|sn|>": 50324,
"<|so|>": 50326,
"<|sq|>": 50317,
"<|sr|>": 50303,
"<|su|>": 50357,
"<|sv|>": 50273,
"<|sw|>": 50318,
"<|ta|>": 50287,
"<|te|>": 50299,
"<|tg|>": 50331,
"<|th|>": 50289,
"<|tk|>": 50341,
"<|tl|>": 50348,
"<|tr|>": 50268,
"<|tt|>": 50351,
"<|uk|>": 50280,
"<|ur|>": 50290,
"<|uz|>": 50337,
"<|vi|>": 50278,
"<|yi|>": 50335,
"<|yo|>": 50325,
"<|yue|>": 50358,
"<|zh|>": 50260
},
"max_initial_timestamp_index": 50,
"max_length": 448,
"no_timestamps_token_id": 50364,
"pad_token_id": 50257,
"prev_sot_token_id": 50362,
"return_timestamps": false,
"suppress_tokens": [],
"task_to_id": {
"transcribe": 50360,
"translate": 50359
},
"transformers_version": "4.46.1"
}

50001
merges.txt Normal file

File diff suppressed because it is too large Load Diff

3
model.safetensors Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:506ea515a0be7ab9a0912f2211233655369dbc47f6ff39258870c29b45ff833d
size 1617825448

1742
normalizer.json Normal file

File diff suppressed because it is too large Load Diff

14
preprocessor_config.json Normal file
View File

@@ -0,0 +1,14 @@
{
"chunk_length": 30,
"feature_extractor_type": "WhisperFeatureExtractor",
"feature_size": 128,
"hop_length": 160,
"n_fft": 400,
"n_samples": 480000,
"nb_max_frames": 3000,
"padding_side": "right",
"padding_value": 0.0,
"processor_class": "WhisperProcessor",
"return_attention_mask": false,
"sampling_rate": 16000
}

139
special_tokens_map.json Normal file
View File

@@ -0,0 +1,139 @@
{
"additional_special_tokens": [
"<|startoftranscript|>",
"<|en|>",
"<|zh|>",
"<|de|>",
"<|es|>",
"<|ru|>",
"<|ko|>",
"<|fr|>",
"<|ja|>",
"<|pt|>",
"<|tr|>",
"<|pl|>",
"<|ca|>",
"<|nl|>",
"<|ar|>",
"<|sv|>",
"<|it|>",
"<|id|>",
"<|hi|>",
"<|fi|>",
"<|vi|>",
"<|he|>",
"<|uk|>",
"<|el|>",
"<|ms|>",
"<|cs|>",
"<|ro|>",
"<|da|>",
"<|hu|>",
"<|ta|>",
"<|no|>",
"<|th|>",
"<|ur|>",
"<|hr|>",
"<|bg|>",
"<|lt|>",
"<|la|>",
"<|mi|>",
"<|ml|>",
"<|cy|>",
"<|sk|>",
"<|te|>",
"<|fa|>",
"<|lv|>",
"<|bn|>",
"<|sr|>",
"<|az|>",
"<|sl|>",
"<|kn|>",
"<|et|>",
"<|mk|>",
"<|br|>",
"<|eu|>",
"<|is|>",
"<|hy|>",
"<|ne|>",
"<|mn|>",
"<|bs|>",
"<|kk|>",
"<|sq|>",
"<|sw|>",
"<|gl|>",
"<|mr|>",
"<|pa|>",
"<|si|>",
"<|km|>",
"<|sn|>",
"<|yo|>",
"<|so|>",
"<|af|>",
"<|oc|>",
"<|ka|>",
"<|be|>",
"<|tg|>",
"<|sd|>",
"<|gu|>",
"<|am|>",
"<|yi|>",
"<|lo|>",
"<|uz|>",
"<|fo|>",
"<|ht|>",
"<|ps|>",
"<|tk|>",
"<|nn|>",
"<|mt|>",
"<|sa|>",
"<|lb|>",
"<|my|>",
"<|bo|>",
"<|tl|>",
"<|mg|>",
"<|as|>",
"<|tt|>",
"<|haw|>",
"<|ln|>",
"<|ha|>",
"<|ba|>",
"<|jw|>",
"<|su|>",
"<|yue|>",
"<|translate|>",
"<|transcribe|>",
"<|startoflm|>",
"<|startofprev|>",
"<|nospeech|>",
"<|notimestamps|>"
],
"bos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

12996
tokenizer_config.json Normal file

File diff suppressed because it is too large Load Diff

50259
vocab.json Normal file

File diff suppressed because it is too large Load Diff