初始化项目,由ModelHub XC社区提供模型

Model: zuazo/whisper-large-v3-es
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-13 20:42:50 +08:00
commit f6a263e81f
22 changed files with 188204 additions and 0 deletions

35
.gitattributes vendored Normal file
View File

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

128
README.md Normal file
View File

@@ -0,0 +1,128 @@
---
language:
- es
license: apache-2.0
base_model: openai/whisper-large-v3
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_13_0
metrics:
- wer
model-index:
- name: Whisper Large-V3 Spanish
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: mozilla-foundation/common_voice_13_0 es
type: mozilla-foundation/common_voice_13_0
config: es
split: test
args: es
metrics:
- name: Wer
type: wer
value: 4.9295277686894154
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# Whisper Large-V3 Spanish
This model is a fine-tuned version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) on the mozilla-foundation/common_voice_13_0 es dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3245
- Wer: 4.9295
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 20000
- mixed_precision_training: Native AMP
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| 0.058 | 2.04 | 1000 | 0.1540 | 4.6851 |
| 0.0124 | 4.07 | 2000 | 0.1829 | 4.6787 |
| 0.0052 | 6.11 | 3000 | 0.2190 | 4.8096 |
| 0.0024 | 8.15 | 4000 | 0.2289 | 4.8776 |
| 0.0024 | 10.18 | 5000 | 0.2341 | 4.8923 |
| 0.0015 | 12.22 | 6000 | 0.2459 | 4.9340 |
| 0.0021 | 14.26 | 7000 | 0.2558 | 4.9276 |
| 0.0011 | 16.29 | 8000 | 0.2540 | 5.1015 |
| 0.0013 | 18.33 | 9000 | 0.2611 | 5.1855 |
| 0.0005 | 20.37 | 10000 | 0.2720 | 4.9379 |
| 0.0028 | 22.4 | 11000 | 0.2614 | 5.0110 |
| 0.0004 | 24.44 | 12000 | 0.2652 | 4.9898 |
| 0.0004 | 26.48 | 13000 | 0.2850 | 4.9776 |
| 0.0006 | 28.51 | 14000 | 0.2736 | 4.9732 |
| 0.0002 | 30.55 | 15000 | 0.2944 | 5.1566 |
| 0.0002 | 32.59 | 16000 | 0.2949 | 5.0007 |
| 0.0001 | 34.62 | 17000 | 0.3094 | 4.9552 |
| 0.0 | 36.66 | 18000 | 0.3185 | 4.9622 |
| 0.0 | 38.7 | 19000 | 0.3229 | 4.9462 |
| 0.0 | 40.73 | 20000 | 0.3245 | 4.9295 |
### Framework versions
- Transformers 4.37.2
- Pytorch 2.2.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1
## Citation
If you use these models in your research, please cite:
```bibtex
@misc{dezuazo2025whisperlmimprovingasrmodels,
title={Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages},
author={Xabier de Zuazo and Eva Navas and Ibon Saratxaga and Inma Hernáez Rioja},
year={2025},
eprint={2503.23542},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.23542},
}
```
Please, check the related paper preprint in
[arXiv:2503.23542](https://arxiv.org/abs/2503.23542)
for more details.
## Licensing
This model is available under the
[Apache-2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
You are free to use, modify, and distribute this model as long as you credit
the original creators.

1611
added_tokens.json Normal file

File diff suppressed because it is too large Load Diff

12
all_results.json Normal file
View File

@@ -0,0 +1,12 @@
{
"epoch": 40.73,
"eval_loss": 0.32449695467948914,
"eval_runtime": 3201.1248,
"eval_samples_per_second": 4.907,
"eval_steps_per_second": 0.307,
"eval_wer": 4.9295277686894154,
"train_loss": 0.009992324647886562,
"train_runtime": 175663.8097,
"train_samples_per_second": 7.287,
"train_steps_per_second": 0.114
}

51
config.json Normal file
View File

@@ -0,0 +1,51 @@
{
"_name_or_path": "openai/whisper-large-v3",
"activation_dropout": 0.0,
"activation_function": "gelu",
"apply_spec_augment": false,
"architectures": [
"WhisperForConditionalGeneration"
],
"attention_dropout": 0.0,
"begin_suppress_tokens": [
220,
50257
],
"bos_token_id": 50257,
"classifier_proj_size": 256,
"d_model": 1280,
"decoder_attention_heads": 20,
"decoder_ffn_dim": 5120,
"decoder_layerdrop": 0.0,
"decoder_layers": 32,
"decoder_start_token_id": 50258,
"dropout": 0.0,
"encoder_attention_heads": 20,
"encoder_ffn_dim": 5120,
"encoder_layerdrop": 0.0,
"encoder_layers": 32,
"eos_token_id": 50257,
"forced_decoder_ids": null,
"init_std": 0.02,
"is_encoder_decoder": true,
"mask_feature_length": 10,
"mask_feature_min_masks": 0,
"mask_feature_prob": 0.0,
"mask_time_length": 10,
"mask_time_min_masks": 2,
"mask_time_prob": 0.05,
"max_length": 448,
"max_source_positions": 1500,
"max_target_positions": 448,
"median_filter_width": 7,
"model_type": "whisper",
"num_hidden_layers": 32,
"num_mel_bins": 128,
"pad_token_id": 50256,
"scale_embedding": false,
"torch_dtype": "float32",
"transformers_version": "4.37.2",
"use_cache": false,
"use_weighted_layer_sum": false,
"vocab_size": 51866
}

8
eval_results.json Normal file
View File

@@ -0,0 +1,8 @@
{
"epoch": 40.73,
"eval_loss": 0.32449695467948914,
"eval_runtime": 3201.1248,
"eval_samples_per_second": 4.907,
"eval_steps_per_second": 0.307,
"eval_wer": 4.9295277686894154
}

265
generation_config.json Normal file
View File

@@ -0,0 +1,265 @@
{
"alignment_heads": [
[
7,
0
],
[
10,
17
],
[
12,
18
],
[
13,
12
],
[
16,
1
],
[
17,
14
],
[
19,
11
],
[
21,
4
],
[
24,
1
],
[
25,
6
]
],
"begin_suppress_tokens": [
220,
50257
],
"bos_token_id": 50257,
"decoder_start_token_id": 50258,
"eos_token_id": 50257,
"forced_decoder_ids": [
[
1,
null
],
[
2,
50360
]
],
"is_multilingual": true,
"lang_to_id": {
"<|af|>": 50327,
"<|am|>": 50334,
"<|ar|>": 50272,
"<|as|>": 50350,
"<|az|>": 50304,
"<|ba|>": 50355,
"<|be|>": 50330,
"<|bg|>": 50292,
"<|bn|>": 50302,
"<|bo|>": 50347,
"<|br|>": 50309,
"<|bs|>": 50315,
"<|ca|>": 50270,
"<|cs|>": 50283,
"<|cy|>": 50297,
"<|da|>": 50285,
"<|de|>": 50261,
"<|el|>": 50281,
"<|en|>": 50259,
"<|es|>": 50262,
"<|et|>": 50307,
"<|eu|>": 50310,
"<|fa|>": 50300,
"<|fi|>": 50277,
"<|fo|>": 50338,
"<|fr|>": 50265,
"<|gl|>": 50319,
"<|gu|>": 50333,
"<|haw|>": 50352,
"<|ha|>": 50354,
"<|he|>": 50279,
"<|hi|>": 50276,
"<|hr|>": 50291,
"<|ht|>": 50339,
"<|hu|>": 50286,
"<|hy|>": 50312,
"<|id|>": 50275,
"<|is|>": 50311,
"<|it|>": 50274,
"<|ja|>": 50266,
"<|jw|>": 50356,
"<|ka|>": 50329,
"<|kk|>": 50316,
"<|km|>": 50323,
"<|kn|>": 50306,
"<|ko|>": 50264,
"<|la|>": 50294,
"<|lb|>": 50345,
"<|ln|>": 50353,
"<|lo|>": 50336,
"<|lt|>": 50293,
"<|lv|>": 50301,
"<|mg|>": 50349,
"<|mi|>": 50295,
"<|mk|>": 50308,
"<|ml|>": 50296,
"<|mn|>": 50314,
"<|mr|>": 50320,
"<|ms|>": 50282,
"<|mt|>": 50343,
"<|my|>": 50346,
"<|ne|>": 50313,
"<|nl|>": 50271,
"<|nn|>": 50342,
"<|no|>": 50288,
"<|oc|>": 50328,
"<|pa|>": 50321,
"<|pl|>": 50269,
"<|ps|>": 50340,
"<|pt|>": 50267,
"<|ro|>": 50284,
"<|ru|>": 50263,
"<|sa|>": 50344,
"<|sd|>": 50332,
"<|si|>": 50322,
"<|sk|>": 50298,
"<|sl|>": 50305,
"<|sn|>": 50324,
"<|so|>": 50326,
"<|sq|>": 50317,
"<|sr|>": 50303,
"<|su|>": 50357,
"<|sv|>": 50273,
"<|sw|>": 50318,
"<|ta|>": 50287,
"<|te|>": 50299,
"<|tg|>": 50331,
"<|th|>": 50289,
"<|tk|>": 50341,
"<|tl|>": 50348,
"<|tr|>": 50268,
"<|tt|>": 50351,
"<|uk|>": 50280,
"<|ur|>": 50290,
"<|uz|>": 50337,
"<|vi|>": 50278,
"<|yi|>": 50335,
"<|yo|>": 50325,
"<|yue|>": 50358,
"<|zh|>": 50260
},
"max_initial_timestamp_index": 50,
"max_length": 448,
"no_timestamps_token_id": 50364,
"pad_token_id": 50257,
"prev_sot_token_id": 50362,
"return_timestamps": false,
"suppress_tokens": [
1,
2,
7,
8,
9,
10,
14,
25,
26,
27,
28,
29,
31,
58,
59,
60,
61,
62,
63,
90,
91,
92,
93,
359,
503,
522,
542,
873,
893,
902,
918,
922,
931,
1350,
1853,
1982,
2460,
2627,
3246,
3253,
3268,
3536,
3846,
3961,
4183,
4667,
6585,
6647,
7273,
9061,
9383,
10428,
10929,
11938,
12033,
12331,
12562,
13793,
14157,
14635,
15265,
15618,
16553,
16604,
18362,
18956,
20075,
21675,
22520,
26130,
26161,
26435,
28279,
29464,
31650,
32302,
32470,
36865,
42863,
47425,
49870,
50254,
50258,
50359,
50360,
50361,
50362,
50363
],
"task_to_id": {
"transcribe": 50360,
"translate": 50359
},
"transformers_version": "4.37.2"
}

50001
merges.txt Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:17b58f02c0bb0da4936024150417d47ea87a5390707618bea1aca2b5560bb7d4
size 4993448880

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0acee9578e8fda996008ceeb93bc82a6e2cbdde91bfcdb11fea1259a6df48dfe
size 1180663192

1266
model.safetensors.index.json Normal file

File diff suppressed because it is too large Load Diff

1742
normalizer.json Normal file

File diff suppressed because it is too large Load Diff

14
preprocessor_config.json Normal file
View File

@@ -0,0 +1,14 @@
{
"chunk_length": 30,
"feature_extractor_type": "WhisperFeatureExtractor",
"feature_size": 128,
"hop_length": 160,
"n_fft": 400,
"n_samples": 480000,
"nb_max_frames": 3000,
"padding_side": "right",
"padding_value": 0.0,
"processor_class": "WhisperProcessor",
"return_attention_mask": false,
"sampling_rate": 16000
}

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1f34f01a4e1e51866d99eddf6ab54b35bf750769c141b4bbaf9e9de36260225b
size 137842

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:223649ecced8306ff5c93755545415f34e7d5ac3bcb4477a68b3891f9342de6c
size 40

139
special_tokens_map.json Normal file
View File

@@ -0,0 +1,139 @@
{
"additional_special_tokens": [
"<|startoftranscript|>",
"<|en|>",
"<|zh|>",
"<|de|>",
"<|es|>",
"<|ru|>",
"<|ko|>",
"<|fr|>",
"<|ja|>",
"<|pt|>",
"<|tr|>",
"<|pl|>",
"<|ca|>",
"<|nl|>",
"<|ar|>",
"<|sv|>",
"<|it|>",
"<|id|>",
"<|hi|>",
"<|fi|>",
"<|vi|>",
"<|he|>",
"<|uk|>",
"<|el|>",
"<|ms|>",
"<|cs|>",
"<|ro|>",
"<|da|>",
"<|hu|>",
"<|ta|>",
"<|no|>",
"<|th|>",
"<|ur|>",
"<|hr|>",
"<|bg|>",
"<|lt|>",
"<|la|>",
"<|mi|>",
"<|ml|>",
"<|cy|>",
"<|sk|>",
"<|te|>",
"<|fa|>",
"<|lv|>",
"<|bn|>",
"<|sr|>",
"<|az|>",
"<|sl|>",
"<|kn|>",
"<|et|>",
"<|mk|>",
"<|br|>",
"<|eu|>",
"<|is|>",
"<|hy|>",
"<|ne|>",
"<|mn|>",
"<|bs|>",
"<|kk|>",
"<|sq|>",
"<|sw|>",
"<|gl|>",
"<|mr|>",
"<|pa|>",
"<|si|>",
"<|km|>",
"<|sn|>",
"<|yo|>",
"<|so|>",
"<|af|>",
"<|oc|>",
"<|ka|>",
"<|be|>",
"<|tg|>",
"<|sd|>",
"<|gu|>",
"<|am|>",
"<|yi|>",
"<|lo|>",
"<|uz|>",
"<|fo|>",
"<|ht|>",
"<|ps|>",
"<|tk|>",
"<|nn|>",
"<|mt|>",
"<|sa|>",
"<|lb|>",
"<|my|>",
"<|bo|>",
"<|tl|>",
"<|mg|>",
"<|as|>",
"<|tt|>",
"<|haw|>",
"<|ln|>",
"<|ha|>",
"<|ba|>",
"<|jw|>",
"<|su|>",
"<|yue|>",
"<|translate|>",
"<|transcribe|>",
"<|startoflm|>",
"<|startofprev|>",
"<|nospeech|>",
"<|notimestamps|>"
],
"bos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

114903
tokenizer.json Normal file

File diff suppressed because it is too large Load Diff

12996
tokenizer_config.json Normal file

File diff suppressed because it is too large Load Diff

7
train_results.json Normal file
View File

@@ -0,0 +1,7 @@
{
"epoch": 40.73,
"train_loss": 0.009992324647886562,
"train_runtime": 175663.8097,
"train_samples_per_second": 7.287,
"train_steps_per_second": 0.114
}

5010
trainer_state.json Normal file

File diff suppressed because it is too large Load Diff

3
training_args.bin Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:08b881033f72766762c16de07729fa6609e280e8ec5d8a9b327aa379a10131c1
size 4792

1
vocab.json Normal file

File diff suppressed because one or more lines are too long