Initialize project; model provided by the ModelHub XC community
Model: suzii/vi-whisper-large-v3-turbo Source: Original Platform
.gitattributes (vendored, Normal file, +36 lines)
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
onnx/encoder_model.onnx_data filter=lfs diff=lfs merge=lfs -text

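Each `.gitattributes` rule above sends matching paths through the Git LFS filter, so large binaries such as `*.safetensors` and `*.onnx` are committed as small pointer files. The sketch below approximates that matching in Python. It assumes gitattributes semantics, where a pattern without a slash matches the file's basename and a pattern with a slash matches the whole repo-relative path. It uses `fnmatchcase` as a stand-in for Git's matcher and handles `**` only loosely, so treat it as an approximation. `LFS_PATTERNS` is a hand-picked subset of the file, and `is_lfs_tracked` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the patterns from the .gitattributes file above.
LFS_PATTERNS = [
    "*.safetensors",
    "*.onnx",
    "*.bin",
    "*tfevents*",
    "saved_model/**/*",
    "onnx/encoder_model.onnx_data",
]


def is_lfs_tracked(path: str, patterns: list[str] = LFS_PATTERNS) -> bool:
    """Approximate whether a repo-relative POSIX path is routed through the LFS filter.

    Patterns without a slash are matched against the basename, as in gitattributes;
    patterns with a slash are matched against the whole path. fnmatch's '*' also
    crosses '/', so '**' is only loosely emulated.
    """
    name = PurePosixPath(path).name
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            return True
    return False


for p in ["model.safetensors", "onnx/decoder_model.onnx",
          "onnx/encoder_model.onnx_data", "config.json"]:
    print(p, "->", "LFS pointer" if is_lfs_tracked(p) else "regular Git blob")
```

The explicit `onnx/encoder_model.onnx_data` rule is needed because `*.onnx` does not match the `.onnx_data` suffix. Without that rule, the 2.5 GB external-data file would be committed as a regular blob.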
README.md (Normal file, +100 lines)
---
datasets:
- capleaf/viVoice
- NhutP/VSV-1100
- doof-ferb/fpt_fosd
- doof-ferb/infore1_25hours
- google/fleurs
- doof-ferb/LSVSC
- quocanh34/viet_vlsp
- linhtran92/viet_youtube_asr_corpus_v2
- doof-ferb/infore2_audiobooks
- linhtran92/viet_bud500
language:
- vi
metrics:
- wer
base_model:
- openai/whisper-large-v3-turbo
library_name: transformers
tags:
- onnx
---

# Fine-tuned Whisper large-v3-turbo for Vietnamese ASR

This project fine-tunes OpenAI's Whisper large-v3-turbo model to improve automatic speech recognition (ASR) for Vietnamese. Training ran for 240 hours on a single NVIDIA A6000 GPU.

## Data Sources

The training data is drawn from the following Vietnamese speech corpora:

1. **capleaf/viVoice**
2. **NhutP/VSV-1100**
3. **doof-ferb/fpt_fosd**
4. **doof-ferb/infore1_25hours**
5. **google/fleurs (vi_vn)**
6. **doof-ferb/LSVSC**
7. **quocanh34/viet_vlsp**
8. **linhtran92/viet_youtube_asr_corpus_v2**
9. **doof-ferb/infore2_audiobooks**
10. **linhtran92/viet_bud500**

## Model

The base model is **Whisper large-v3-turbo** (`openai/whisper-large-v3-turbo`), a multilingual ASR model trained on a large and diverse dataset. The turbo variant pairs the 32-layer large-v3 encoder with a decoder reduced to 4 layers (see `encoder_layers` and `decoder_layers` in `config.json`). This checkpoint has been fine-tuned specifically for Vietnamese.

## Training Configuration

- **GPU**: NVIDIA A6000 (single GPU)
- **Training time**: 240 hours
- **Logs**: [Wandb report](https://api.wandb.ai/links/goiliace/ae0qectc)

## Usage

To transcribe audio with the fine-tuned model through the Transformers `pipeline` API:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Run on the first GPU in float16 when available, otherwise on CPU in float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Model ID as published in the original card. The files in this repository come from
# suzii/vi-whisper-large-v3-turbo; use that ID (or a local path) if this one does not resolve.
model_id = "suzii/vi-whisper-large-v3-turbo-v1"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

# Force Vietnamese transcription instead of relying on automatic language detection.
result = pipe(
    "your-audio.mp3",
    return_timestamps=True,
    generate_kwargs={"language": "vietnamese", "task": "transcribe"},
)
print(result["text"])
```
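With `return_timestamps=True`, the ASR pipeline returns a dict with a `"text"` entry and a `"chunks"` list. Each chunk holds a `"timestamp"` tuple `(start, end)` in seconds and its `"text"`. The sketch below turns that structure into SRT subtitles, assuming the chunk format just described. The final segment's end time can be `None` when the model predicts no closing timestamp; the sketch then reuses the start time. `format_ts` and `to_srt` are hypothetical helpers, not part of Transformers, and the example text is illustrative.

```python
def format_ts(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(result: dict) -> str:
    """Convert a pipeline result with timestamped chunks into an SRT string."""
    blocks = []
    for index, chunk in enumerate(result.get("chunks", []), start=1):
        start, end = chunk["timestamp"]
        if end is None:  # the last segment may lack an end time
            end = start
        blocks.append(
            f"{index}\n{format_ts(start)} --> {format_ts(end)}\n{chunk['text'].strip()}\n"
        )
    return "\n".join(blocks)


# Same shape as the output of pipe(..., return_timestamps=True).
example = {
    "text": " Xin chào. Cảm ơn bạn.",
    "chunks": [
        {"timestamp": (0.0, 1.5), "text": " Xin chào."},
        {"timestamp": (1.5, None), "text": " Cảm ơn bạn."},
    ],
}
print(to_srt(example))
```

In practice, pass the `result` returned by `pipe` to `to_srt` and write the string to a `.srt` file.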

## Acknowledgements

This project would not have been possible without the following datasets:

- [capleaf/viVoice](https://huggingface.co/datasets/capleaf/viVoice)
- [NhutP/VSV-1100](https://huggingface.co/datasets/nhutp/vsv-1100)
- [doof-ferb/fpt_fosd](https://huggingface.co/datasets/doof-ferb/fpt_fosd)
- [doof-ferb/infore1_25hours](https://huggingface.co/datasets/doof-ferb/infore1_25hours)
- [google/fleurs](https://huggingface.co/datasets/google/fleurs)
- [doof-ferb/LSVSC](https://huggingface.co/datasets/doof-ferb/LSVSC)
- [quocanh34/viet_vlsp](https://huggingface.co/datasets/quocanh34/viet-vlsp)
- [linhtran92/viet_youtube_asr_corpus_v2](https://huggingface.co/datasets/linhtran92/viet_youtube_asr_corpus_v2)
- [doof-ferb/infore2_audiobooks](https://huggingface.co/datasets/doof-ferb/infore2_audiobooks/)
- [linhtran92/viet_bud500](https://huggingface.co/datasets/linhtran92/viet_bud500)

added_tokens.json (Normal file, +1611 lines)
File diff suppressed because it is too large.

config.json (Normal file, +50 lines)
{
  "_name_or_path": "whisper-v3-vi-ddp-30-12-2024-1",
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "apply_spec_augment": false,
  "architectures": [
    "WhisperForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "begin_suppress_tokens": [
    220,
    50256
  ],
  "bos_token_id": 50257,
  "classifier_proj_size": 256,
  "d_model": 1280,
  "decoder_attention_heads": 20,
  "decoder_ffn_dim": 5120,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 4,
  "decoder_start_token_id": 50258,
  "dropout": 0.0,
  "encoder_attention_heads": 20,
  "encoder_ffn_dim": 5120,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 32,
  "eos_token_id": 50257,
  "forced_decoder_ids": null,
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "mask_feature_length": 10,
  "mask_feature_min_masks": 0,
  "mask_feature_prob": 0.0,
  "mask_time_length": 10,
  "mask_time_min_masks": 2,
  "mask_time_prob": 0.05,
  "max_source_positions": 1500,
  "max_target_positions": 448,
  "median_filter_width": 7,
  "model_type": "whisper",
  "num_hidden_layers": 32,
  "num_mel_bins": 128,
  "pad_token_id": 50257,
  "scale_embedding": false,
  "torch_dtype": "float32",
  "transformers_version": "4.37.2",
  "use_cache": false,
  "use_weighted_layer_sum": false,
  "vocab_size": 51866
}

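Several `config.json` fields depend on each other. The attention heads split `d_model` evenly, and this turbo checkpoint pairs a 32-layer encoder with a 4-layer decoder. The sketch below runs a small consistency check over those values. The dict copies only a few fields from the file, and the checks are illustrative; Transformers does not run them in this form. `head_dim` is a hypothetical helper.

```python
import json

# A few fields copied from config.json above.
config = json.loads("""
{
  "d_model": 1280,
  "encoder_attention_heads": 20,
  "decoder_attention_heads": 20,
  "encoder_layers": 32,
  "decoder_layers": 4
}
""")


def head_dim(cfg: dict, side: str) -> int:
    """Per-head width for the 'encoder' or 'decoder'; d_model must divide evenly."""
    heads = cfg[f"{side}_attention_heads"]
    if cfg["d_model"] % heads:
        raise ValueError(f"d_model {cfg['d_model']} is not divisible by {heads} heads")
    return cfg["d_model"] // heads


print("encoder head dim:", head_dim(config, "encoder"))  # 1280 / 20
print("decoder head dim:", head_dim(config, "decoder"))
print("encoder/decoder layer ratio:", config["encoder_layers"] // config["decoder_layers"])
```

Most of the compute in this checkpoint therefore sits in the encoder. Because the decoder runs once per generated token, a 4-layer decoder is what makes the turbo variant fast.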
generation_config.json (Normal file, +161 lines)
{
  "alignment_heads": [
    [
      2,
      4
    ],
    [
      2,
      11
    ],
    [
      3,
      3
    ],
    [
      3,
      6
    ],
    [
      3,
      11
    ],
    [
      3,
      14
    ]
  ],
  "begin_suppress_tokens": [
    220,
    50257
  ],
  "bos_token_id": 50257,
  "decoder_start_token_id": 50258,
  "eos_token_id": 50257,
  "forced_decoder_ids": [
    [
      1,
      null
    ],
    [
      2,
      50360
    ]
  ],
  "is_multilingual": true,
  "lang_to_id": {
    "<|af|>": 50327,
    "<|am|>": 50334,
    "<|ar|>": 50272,
    "<|as|>": 50350,
    "<|az|>": 50304,
    "<|ba|>": 50355,
    "<|be|>": 50330,
    "<|bg|>": 50292,
    "<|bn|>": 50302,
    "<|bo|>": 50347,
    "<|br|>": 50309,
    "<|bs|>": 50315,
    "<|ca|>": 50270,
    "<|cs|>": 50283,
    "<|cy|>": 50297,
    "<|da|>": 50285,
    "<|de|>": 50261,
    "<|el|>": 50281,
    "<|en|>": 50259,
    "<|es|>": 50262,
    "<|et|>": 50307,
    "<|eu|>": 50310,
    "<|fa|>": 50300,
    "<|fi|>": 50277,
    "<|fo|>": 50338,
    "<|fr|>": 50265,
    "<|gl|>": 50319,
    "<|gu|>": 50333,
    "<|haw|>": 50352,
    "<|ha|>": 50354,
    "<|he|>": 50279,
    "<|hi|>": 50276,
    "<|hr|>": 50291,
    "<|ht|>": 50339,
    "<|hu|>": 50286,
    "<|hy|>": 50312,
    "<|id|>": 50275,
    "<|is|>": 50311,
    "<|it|>": 50274,
    "<|ja|>": 50266,
    "<|jw|>": 50356,
    "<|ka|>": 50329,
    "<|kk|>": 50316,
    "<|km|>": 50323,
    "<|kn|>": 50306,
    "<|ko|>": 50264,
    "<|la|>": 50294,
    "<|lb|>": 50345,
    "<|ln|>": 50353,
    "<|lo|>": 50336,
    "<|lt|>": 50293,
    "<|lv|>": 50301,
    "<|mg|>": 50349,
    "<|mi|>": 50295,
    "<|mk|>": 50308,
    "<|ml|>": 50296,
    "<|mn|>": 50314,
    "<|mr|>": 50320,
    "<|ms|>": 50282,
    "<|mt|>": 50343,
    "<|my|>": 50346,
    "<|ne|>": 50313,
    "<|nl|>": 50271,
    "<|nn|>": 50342,
    "<|no|>": 50288,
    "<|oc|>": 50328,
    "<|pa|>": 50321,
    "<|pl|>": 50269,
    "<|ps|>": 50340,
    "<|pt|>": 50267,
    "<|ro|>": 50284,
    "<|ru|>": 50263,
    "<|sa|>": 50344,
    "<|sd|>": 50332,
    "<|si|>": 50322,
    "<|sk|>": 50298,
    "<|sl|>": 50305,
    "<|sn|>": 50324,
    "<|so|>": 50326,
    "<|sq|>": 50317,
    "<|sr|>": 50303,
    "<|su|>": 50357,
    "<|sv|>": 50273,
    "<|sw|>": 50318,
    "<|ta|>": 50287,
    "<|te|>": 50299,
    "<|tg|>": 50331,
    "<|th|>": 50289,
    "<|tk|>": 50341,
    "<|tl|>": 50348,
    "<|tr|>": 50268,
    "<|tt|>": 50351,
    "<|uk|>": 50280,
    "<|ur|>": 50290,
    "<|uz|>": 50337,
    "<|vi|>": 50278,
    "<|yi|>": 50335,
    "<|yo|>": 50325,
    "<|yue|>": 50358,
    "<|zh|>": 50260
  },
  "max_initial_timestamp_index": 50,
  "max_length": 448,
  "no_timestamps_token_id": 50364,
  "pad_token_id": 50257,
  "prev_sot_token_id": 50362,
  "return_timestamps": false,
  "suppress_tokens": [],
  "task_to_id": {
    "transcribe": 50360,
    "translate": 50359
  },
  "transformers_version": "4.37.2",
  "use_cache": false
}

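`generation_config.json` defines the decoder prompt. `decoder_start_token_id` (50258) is `<|startoftranscript|>`. `forced_decoder_ids` pins position 2 to the `transcribe` task (50360) and leaves position 1, the language slot, as `null`, so the model detects the language unless one is supplied. The sketch below assembles the prompt for Vietnamese transcription without timestamps, using only IDs from this file. It mirrors Whisper's prompt layout but is not the Transformers implementation. `build_prompt` is a hypothetical helper.

```python
# IDs taken from generation_config.json above (only the entries needed here).
GEN = {
    "decoder_start_token_id": 50258,  # <|startoftranscript|>
    "lang_to_id": {"<|en|>": 50259, "<|vi|>": 50278},
    "task_to_id": {"transcribe": 50360, "translate": 50359},
    "no_timestamps_token_id": 50364,  # <|notimestamps|>
}


def build_prompt(language: str, task: str = "transcribe", timestamps: bool = False) -> list[int]:
    """Assemble Whisper's decoder prompt: start-of-transcript, language, task, optional no-timestamps."""
    prompt = [GEN["decoder_start_token_id"]]
    prompt.append(GEN["lang_to_id"][f"<|{language}|>"])
    prompt.append(GEN["task_to_id"][task])
    if not timestamps:
        prompt.append(GEN["no_timestamps_token_id"])
    return prompt


print(build_prompt("vi"))  # [50258, 50278, 50360, 50364]
```

When timestamps are requested, the `<|notimestamps|>` token is left out and the model emits timestamp tokens alongside the text instead.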
merges.txt (Normal file, +50001 lines)
File diff suppressed because it is too large.

model.safetensors (Normal file, +3 lines)
version https://git-lfs.github.com/spec/v1
oid sha256:2cbfaba39707b6746499c4a4fb49e86f731e9a796249bc16a5cc11014f6cf91e
size 3235581408

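The entry above is a Git LFS pointer, not the weights themselves. It has three `key value` lines: the spec `version`, the SHA-256 `oid` of the real file, and its `size` in bytes. The sketch below parses such a pointer and estimates the parameter count as size / 4 bytes. That estimate assumes every tensor is stored in float32, as `"torch_dtype": "float32"` in `config.json` suggests, and it ignores the small safetensors header. `parse_lfs_pointer` is a hypothetical helper.

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2cbfaba39707b6746499c4a4fb49e86f731e9a796249bc16a5cc11014f6cf91e
size 3235581408
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash algorithm, oid and size."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo, "oid": digest, "size": int(fields["size"])}


info = parse_lfs_pointer(POINTER)
size = info["size"]
# Rough estimate: float32 uses 4 bytes per parameter; the safetensors header is ignored.
approx_params = size // 4
print(f"{size / 2**30:.2f} GiB, ~{approx_params / 1e6:.0f}M parameters")
```

The estimate is roughly 809M parameters, which is consistent with a float32 Whisper large-v3-turbo checkpoint. Loading with `torch_dtype=torch.float16` roughly halves the memory footprint.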
normalizer.json (Normal file, +1742 lines)
File diff suppressed because it is too large.

onnx/added_tokens.json (Normal file, +1611 lines)
File diff suppressed because it is too large.

onnx/config.json (Normal file, +50 lines)
{
  "_attn_implementation_autoset": true,
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "apply_spec_augment": false,
  "architectures": [
    "WhisperForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "begin_suppress_tokens": [
    220,
    50256
  ],
  "bos_token_id": 50257,
  "classifier_proj_size": 256,
  "d_model": 1280,
  "decoder_attention_heads": 20,
  "decoder_ffn_dim": 5120,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 4,
  "decoder_start_token_id": 50258,
  "dropout": 0.0,
  "encoder_attention_heads": 20,
  "encoder_ffn_dim": 5120,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 32,
  "eos_token_id": 50257,
  "forced_decoder_ids": null,
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "mask_feature_length": 10,
  "mask_feature_min_masks": 0,
  "mask_feature_prob": 0.0,
  "mask_time_length": 10,
  "mask_time_min_masks": 2,
  "mask_time_prob": 0.05,
  "max_source_positions": 1500,
  "max_target_positions": 448,
  "median_filter_width": 7,
  "model_type": "whisper",
  "num_hidden_layers": 32,
  "num_mel_bins": 128,
  "pad_token_id": 50257,
  "scale_embedding": false,
  "torch_dtype": "float32",
  "transformers_version": "4.51.3",
  "use_cache": true,
  "use_weighted_layer_sum": false,
  "vocab_size": 51866
}

onnx/decoder_model.onnx (Normal file, +3 lines)
version https://git-lfs.github.com/spec/v1
oid sha256:25b458592bcd5f87baab2dde07134dad884a3905012bdbcd7e79fb7949480a22
size 953330447

onnx/decoder_model_merged.onnx (Normal file, +3 lines)
version https://git-lfs.github.com/spec/v1
oid sha256:30a922c1e7dc4588b48664afd23aeefec5ad003e5f8132819803bd99993b78eb
size 953459940

onnx/decoder_with_past_model.onnx (Normal file, +3 lines)
version https://git-lfs.github.com/spec/v1
oid sha256:d9dc442c6029851b9071205e77e0e1512f181d3a7629a701fe8332ce1c2e4cdc
size 900869067

onnx/encoder_model.onnx (Normal file, +3 lines)
version https://git-lfs.github.com/spec/v1
oid sha256:0ff151d21d145b56f575fd96712d4eed33f04d38adb11cdf010d118c827e39e1
size 496106

onnx/encoder_model.onnx_data (Normal file, +3 lines)
version https://git-lfs.github.com/spec/v1
oid sha256:ffa938cd7bbe2ef95afc7d03685113bcf3379063cd24ebad0a06a9629a669852
size 2547875840

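The ONNX encoder is split into a small graph file, `encoder_model.onnx` (496,106 bytes), and an external-data file, `encoder_model.onnx_data` (about 2.5 GB), that holds its weights. ONNX keeps large initializers outside the model file because a single protobuf message cannot exceed 2 GB. The graph refers to the data file by a relative path, so both files need to stay together in `onnx/`. After `git lfs pull` or a manual download, you can check a file against its pointer by comparing its size and SHA-256 digest. The sketch below streams the hash so multi-gigabyte files are never read into memory at once. `verify_lfs_object` is a hypothetical helper.

```python
import hashlib
import os


def verify_lfs_object(path: str, expected_oid: str, expected_size: int,
                      chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches an LFS pointer's size and sha256 oid."""
    if os.path.getsize(path) != expected_size:
        # Cheap check first; it also catches an un-smudged pointer file left in place.
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in 1 MiB chunks to keep memory use flat.
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest() == expected_oid


# Values from the onnx/encoder_model.onnx_data pointer above.
path = "onnx/encoder_model.onnx_data"
if os.path.exists(path):
    ok = verify_lfs_object(
        path,
        "ffa938cd7bbe2ef95afc7d03685113bcf3379063cd24ebad0a06a9629a669852",
        2547875840,
    )
    print(path, "OK" if ok else "MISMATCH")
```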
onnx/generation_config.json (Normal file, +161 lines)
{
  "alignment_heads": [
    [
      2,
      4
    ],
    [
      2,
      11
    ],
    [
      3,
      3
    ],
    [
      3,
      6
    ],
    [
      3,
      11
    ],
    [
      3,
      14
    ]
  ],
  "begin_suppress_tokens": [
    220,
    50257
  ],
  "bos_token_id": 50257,
  "decoder_start_token_id": 50258,
  "eos_token_id": 50257,
  "forced_decoder_ids": [
    [
      1,
      null
    ],
    [
      2,
      50360
    ]
  ],
  "is_multilingual": true,
  "lang_to_id": {
    "<|af|>": 50327,
    "<|am|>": 50334,
    "<|ar|>": 50272,
    "<|as|>": 50350,
    "<|az|>": 50304,
    "<|ba|>": 50355,
    "<|be|>": 50330,
    "<|bg|>": 50292,
    "<|bn|>": 50302,
    "<|bo|>": 50347,
    "<|br|>": 50309,
    "<|bs|>": 50315,
    "<|ca|>": 50270,
    "<|cs|>": 50283,
    "<|cy|>": 50297,
    "<|da|>": 50285,
    "<|de|>": 50261,
    "<|el|>": 50281,
    "<|en|>": 50259,
    "<|es|>": 50262,
    "<|et|>": 50307,
    "<|eu|>": 50310,
    "<|fa|>": 50300,
    "<|fi|>": 50277,
    "<|fo|>": 50338,
    "<|fr|>": 50265,
    "<|gl|>": 50319,
    "<|gu|>": 50333,
    "<|haw|>": 50352,
    "<|ha|>": 50354,
    "<|he|>": 50279,
    "<|hi|>": 50276,
    "<|hr|>": 50291,
    "<|ht|>": 50339,
    "<|hu|>": 50286,
    "<|hy|>": 50312,
    "<|id|>": 50275,
    "<|is|>": 50311,
    "<|it|>": 50274,
    "<|ja|>": 50266,
    "<|jw|>": 50356,
    "<|ka|>": 50329,
    "<|kk|>": 50316,
    "<|km|>": 50323,
    "<|kn|>": 50306,
    "<|ko|>": 50264,
    "<|la|>": 50294,
    "<|lb|>": 50345,
    "<|ln|>": 50353,
    "<|lo|>": 50336,
    "<|lt|>": 50293,
    "<|lv|>": 50301,
    "<|mg|>": 50349,
    "<|mi|>": 50295,
    "<|mk|>": 50308,
    "<|ml|>": 50296,
    "<|mn|>": 50314,
    "<|mr|>": 50320,
    "<|ms|>": 50282,
    "<|mt|>": 50343,
    "<|my|>": 50346,
    "<|ne|>": 50313,
    "<|nl|>": 50271,
    "<|nn|>": 50342,
    "<|no|>": 50288,
    "<|oc|>": 50328,
    "<|pa|>": 50321,
    "<|pl|>": 50269,
    "<|ps|>": 50340,
    "<|pt|>": 50267,
    "<|ro|>": 50284,
    "<|ru|>": 50263,
    "<|sa|>": 50344,
    "<|sd|>": 50332,
    "<|si|>": 50322,
    "<|sk|>": 50298,
    "<|sl|>": 50305,
    "<|sn|>": 50324,
    "<|so|>": 50326,
    "<|sq|>": 50317,
    "<|sr|>": 50303,
    "<|su|>": 50357,
    "<|sv|>": 50273,
    "<|sw|>": 50318,
    "<|ta|>": 50287,
    "<|te|>": 50299,
    "<|tg|>": 50331,
    "<|th|>": 50289,
    "<|tk|>": 50341,
    "<|tl|>": 50348,
    "<|tr|>": 50268,
    "<|tt|>": 50351,
    "<|uk|>": 50280,
    "<|ur|>": 50290,
    "<|uz|>": 50337,
    "<|vi|>": 50278,
    "<|yi|>": 50335,
    "<|yo|>": 50325,
    "<|yue|>": 50358,
    "<|zh|>": 50260
  },
  "max_initial_timestamp_index": 50,
  "max_length": 448,
  "no_timestamps_token_id": 50364,
  "pad_token_id": 50257,
  "prev_sot_token_id": 50362,
  "return_timestamps": false,
  "suppress_tokens": [],
  "task_to_id": {
    "transcribe": 50360,
    "translate": 50359
  },
  "transformers_version": "4.51.3",
  "use_cache": false
}

onnx/merges.txt (Normal file, +50001 lines)
File diff suppressed because it is too large.

onnx/normalizer.json (Normal file, +1742 lines)
File diff suppressed because it is too large.

onnx/preprocessor_config.json (Normal file, +15 lines)
{
  "chunk_length": 30,
  "dither": 0.0,
  "feature_extractor_type": "WhisperFeatureExtractor",
  "feature_size": 128,
  "hop_length": 160,
  "n_fft": 400,
  "n_samples": 480000,
  "nb_max_frames": 3000,
  "padding_side": "right",
  "padding_value": 0.0,
  "processor_class": "WhisperProcessor",
  "return_attention_mask": false,
  "sampling_rate": 16000
}

onnx/special_tokens_map.json (Normal file, +139 lines)
{
  "additional_special_tokens": [
    "<|startoftranscript|>",
    "<|en|>",
    "<|zh|>",
    "<|de|>",
    "<|es|>",
    "<|ru|>",
    "<|ko|>",
    "<|fr|>",
    "<|ja|>",
    "<|pt|>",
    "<|tr|>",
    "<|pl|>",
    "<|ca|>",
    "<|nl|>",
    "<|ar|>",
    "<|sv|>",
    "<|it|>",
    "<|id|>",
    "<|hi|>",
    "<|fi|>",
    "<|vi|>",
    "<|he|>",
    "<|uk|>",
    "<|el|>",
    "<|ms|>",
    "<|cs|>",
    "<|ro|>",
    "<|da|>",
    "<|hu|>",
    "<|ta|>",
    "<|no|>",
    "<|th|>",
    "<|ur|>",
    "<|hr|>",
    "<|bg|>",
    "<|lt|>",
    "<|la|>",
    "<|mi|>",
    "<|ml|>",
    "<|cy|>",
    "<|sk|>",
    "<|te|>",
    "<|fa|>",
    "<|lv|>",
    "<|bn|>",
    "<|sr|>",
    "<|az|>",
    "<|sl|>",
    "<|kn|>",
    "<|et|>",
    "<|mk|>",
    "<|br|>",
    "<|eu|>",
    "<|is|>",
    "<|hy|>",
    "<|ne|>",
    "<|mn|>",
    "<|bs|>",
    "<|kk|>",
    "<|sq|>",
    "<|sw|>",
    "<|gl|>",
    "<|mr|>",
    "<|pa|>",
    "<|si|>",
    "<|km|>",
    "<|sn|>",
    "<|yo|>",
    "<|so|>",
    "<|af|>",
    "<|oc|>",
    "<|ka|>",
    "<|be|>",
    "<|tg|>",
    "<|sd|>",
    "<|gu|>",
    "<|am|>",
    "<|yi|>",
    "<|lo|>",
    "<|uz|>",
    "<|fo|>",
    "<|ht|>",
    "<|ps|>",
    "<|tk|>",
    "<|nn|>",
    "<|mt|>",
    "<|sa|>",
    "<|lb|>",
    "<|my|>",
    "<|bo|>",
    "<|tl|>",
    "<|mg|>",
    "<|as|>",
    "<|tt|>",
    "<|haw|>",
    "<|ln|>",
    "<|ha|>",
    "<|ba|>",
    "<|jw|>",
    "<|su|>",
    "<|yue|>",
    "<|translate|>",
    "<|transcribe|>",
    "<|startoflm|>",
    "<|startofprev|>",
    "<|nospeech|>",
    "<|notimestamps|>"
  ],
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}

onnx/tokenizer.json (Normal file, +264862 lines)
File diff suppressed because it is too large.

onnx/tokenizer_config.json (Normal file, +12997 lines)
File diff suppressed because it is too large.

onnx/vocab.json (Normal file, +50259 lines)
File diff suppressed because it is too large.

preprocessor_config.json (Normal file, +14 lines)
{
  "chunk_length": 30,
  "feature_extractor_type": "WhisperFeatureExtractor",
  "feature_size": 128,
  "hop_length": 160,
  "n_fft": 400,
  "n_samples": 480000,
  "nb_max_frames": 3000,
  "padding_side": "right",
  "padding_value": 0.0,
  "processor_class": "WhisperProcessor",
  "return_attention_mask": false,
  "sampling_rate": 16000
}

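The preprocessor values are linked. A 30 s `chunk_length` at 16 kHz gives `n_samples` = 480,000. A `hop_length` of 160 samples (10 ms) turns that into `nb_max_frames` = 3,000 log-mel frames of 128 bins (`feature_size`). The encoder's stride-2 convolution then halves 3,000 frames to the 1,500 positions given by `max_source_positions` in `config.json`. The sketch below checks that arithmetic and adds the right-pad-or-trim step. It assumes the input is already mono float samples at 16 kHz, and it takes the stride of 2 from Whisper's architecture rather than from these files. `pad_or_trim` is a hypothetical helper, not the `WhisperFeatureExtractor` API.

```python
SAMPLING_RATE = 16_000  # "sampling_rate"
CHUNK_LENGTH_S = 30     # "chunk_length"
HOP_LENGTH = 160        # "hop_length"
CONV_STRIDE = 2         # assumed stride of the encoder's second convolution

n_samples = CHUNK_LENGTH_S * SAMPLING_RATE          # matches "n_samples"
nb_max_frames = n_samples // HOP_LENGTH             # matches "nb_max_frames"
encoder_positions = nb_max_frames // CONV_STRIDE    # matches "max_source_positions"


def pad_or_trim(samples: list[float], length: int = n_samples,
                padding_value: float = 0.0) -> list[float]:
    """Right-pad with `padding_value` (padding_side = "right") or truncate to `length` samples."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [padding_value] * (length - len(samples))


print(n_samples, nb_max_frames, encoder_positions)
print(len(pad_or_trim([0.1] * 5 * SAMPLING_RATE)))  # a 5 s clip padded to the 30 s window
```

Audio longer than 30 s does not fit in one window. That is why the pipeline splits long recordings into segments before feature extraction.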
special_tokens_map.json (Normal file, +139 lines)
{
  "additional_special_tokens": [
    "<|startoftranscript|>",
    "<|en|>",
    "<|zh|>",
    "<|de|>",
    "<|es|>",
    "<|ru|>",
    "<|ko|>",
    "<|fr|>",
    "<|ja|>",
    "<|pt|>",
    "<|tr|>",
    "<|pl|>",
    "<|ca|>",
    "<|nl|>",
    "<|ar|>",
    "<|sv|>",
    "<|it|>",
    "<|id|>",
    "<|hi|>",
    "<|fi|>",
    "<|vi|>",
    "<|he|>",
    "<|uk|>",
    "<|el|>",
    "<|ms|>",
    "<|cs|>",
    "<|ro|>",
    "<|da|>",
    "<|hu|>",
    "<|ta|>",
    "<|no|>",
    "<|th|>",
    "<|ur|>",
    "<|hr|>",
    "<|bg|>",
    "<|lt|>",
    "<|la|>",
    "<|mi|>",
    "<|ml|>",
    "<|cy|>",
    "<|sk|>",
    "<|te|>",
    "<|fa|>",
    "<|lv|>",
    "<|bn|>",
    "<|sr|>",
    "<|az|>",
    "<|sl|>",
    "<|kn|>",
    "<|et|>",
    "<|mk|>",
    "<|br|>",
    "<|eu|>",
    "<|is|>",
    "<|hy|>",
    "<|ne|>",
    "<|mn|>",
    "<|bs|>",
    "<|kk|>",
    "<|sq|>",
    "<|sw|>",
    "<|gl|>",
    "<|mr|>",
    "<|pa|>",
    "<|si|>",
    "<|km|>",
    "<|sn|>",
    "<|yo|>",
    "<|so|>",
    "<|af|>",
    "<|oc|>",
    "<|ka|>",
    "<|be|>",
    "<|tg|>",
    "<|sd|>",
    "<|gu|>",
    "<|am|>",
    "<|yi|>",
    "<|lo|>",
    "<|uz|>",
    "<|fo|>",
    "<|ht|>",
    "<|ps|>",
    "<|tk|>",
    "<|nn|>",
    "<|mt|>",
    "<|sa|>",
    "<|lb|>",
    "<|my|>",
    "<|bo|>",
    "<|tl|>",
    "<|mg|>",
    "<|as|>",
    "<|tt|>",
    "<|haw|>",
    "<|ln|>",
    "<|ha|>",
    "<|ba|>",
    "<|jw|>",
    "<|su|>",
    "<|yue|>",
    "<|translate|>",
    "<|transcribe|>",
    "<|startoflm|>",
    "<|startofprev|>",
    "<|nospeech|>",
    "<|notimestamps|>"
  ],
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}

tokenizer_config.json (Normal file, +12996 lines)
File diff suppressed because it is too large.

vocab.json (Normal file, +50259 lines)
File diff suppressed because it is too large.