Initialize project; model provided by the ModelHub XC community
Model: UBC-NLP/Simba-W  Source: Original Platform
.gitattributes (vendored, new file, 35 lines)

*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
.ipynb_checkpoints/README-checkpoint.md (new file, 211 lines) — editor checkpoint, duplicate of README.md; diff suppressed
.ipynb_checkpoints/config-checkpoint.json (new file, 50 lines) — editor checkpoint, duplicate of config.json; diff suppressed
README.md (new file, 212 lines)
---
language:
- am # Amharic
- ar # Arabic
- tw # Asante Twi
- bm # Bambara
- fr # French
- lg # Ganda
- ha # Hausa
- ig # Igbo
- rw # Kinyarwanda
- kg # Kongo
- ln # Lingala
- lu # Luba-Katanga
- mg # Malagasy
- nso # Northern Sotho
- ny # Nyanja
- om # Oromo
- pt # Portuguese
- sn # Shona
- so # Somali
- st # Southern Sotho
- sw # Swahili
- ss # Swati
- ti # Tigrinya
- ts # Tsonga
- tn # Tswana
- ak # Twi
- ve # Venda
- wo # Wolof
- xh # Xhosa
- yo # Yoruba
- zu # Zulu
- tzm # Tamazight
- sg # Sango
- din # Dinka
- ee # Ewe
- fo # Fon
- luo # Luo
- mos # Mossi
- umb # Umbundu
license: cc-by-4.0
tags:
- automatic-speech-recognition
- audio
- speech
- african-languages
- multilingual
- simba
- low-resource
- speech-recognition
- asr
datasets:
- UBC-NLP/SimbaBench
metrics:
- wer
- cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
<div align="center">

<img src="https://africa.dlnlp.ai/simba/images/VoC_simba" alt="VoC Simba Models Logo">

[Paper](https://aclanthology.org/2025.emnlp-main.559/)
[Website](https://africa.dlnlp.ai/simba/)
[Demo](https://huggingface.co/spaces/UBC-NLP/SimbaBench)
[GitHub](https://github.com/UBC-NLP/simba)
[Models](https://huggingface.co/collections/UBC-NLP/simba-speech-series)
[Dataset](https://huggingface.co/datasets/UBC-NLP/SimbaBench_dataset)

</div>
## *Bridging the Digital Divide for African AI*

**Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we ensure that the future of speech technology is inclusive, representative, and accessible to over a billion people.

## Best-in-Class Multilingual Models

Introduced in our EMNLP 2025 paper *[Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/)*, the **Simba Series** represents the current state of the art for African speech AI.

- **Unified Suite:** Models optimized for African languages.
- **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
- **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
- **Inclusion-First:** Specifically built to mitigate the "digital divide" by empowering speakers of underrepresented languages.

The **Simba** family consists of state-of-the-art models fine-tuned on SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language-family relationships.
### 🗣️✍️ Simba-ASR

> **The New Standard for African Speech-to-Text**

**🎯 Task** `Automatic Speech Recognition` — Powering high-accuracy transcription across the continent.

**🌍 Language Coverage (43 African languages)**

> **Amharic** (`amh`), **Arabic** (`ara`), **Asante Twi** (`asanti`), **Bambara** (`bam`), **Baoulé** (`bau`), **Bemba** (`bem`), **Ewe** (`ewe`), **Fanti** (`fat`), **Fon** (`fon`), **French** (`fra`), **Ganda** (`lug`), **Hausa** (`hau`), **Igbo** (`ibo`), **Kabiye** (`kab`), **Kinyarwanda** (`kin`), **Kongo** (`kon`), **Lingala** (`lin`), **Luba-Katanga** (`lub`), **Luo** (`luo`), **Malagasy** (`mlg`), **Mossi** (`mos`), **Northern Sotho** (`nso`), **Nyanja** (`nya`), **Oromo** (`orm`), **Portuguese** (`por`), **Shona** (`sna`), **Somali** (`som`), **Southern Sotho** (`sot`), **Swahili** (`swa`), **Swati** (`ssw`), **Tigrinya** (`tir`), **Tsonga** (`tso`), **Tswana** (`tsn`), **Twi** (`twi`), **Umbundu** (`umb`), **Venda** (`ven`), **Wolof** (`wol`), **Xhosa** (`xho`), **Yoruba** (`yor`), **Zulu** (`zul`), **Tamazight** (`tzm`), **Sango** (`sag`), **Dinka** (`din`).

**🏗️ Base Architectures**

- **Simba-S** (SeamlessM4T-v2-MT) — *Top Performer*
- **Simba-W** (Whisper-v3-large)
- **Simba-X** (Wav2Vec2-XLS-R-2b)
- **Simba-M** (MMS-1b-all)
- **Simba-H** (AfriHuBERT)
### 🌐 Explore the Frontier

| **ASR Models** | **Architecture** | **#Parameters** | **🤗 Hugging Face Model Card** | **Status** |
|---------|:------------------:|:------------------:|:------------------:|:------------------:|
| 🔥**Simba-S**🔥 | SeamlessM4T-v2 | 2.3B | 🤗 [https://huggingface.co/UBC-NLP/Simba-S](https://huggingface.co/UBC-NLP/Simba-S) | ✅ Released |
| 🔥**Simba-W**🔥 | Whisper | 1.5B | 🤗 [https://huggingface.co/UBC-NLP/Simba-W](https://huggingface.co/UBC-NLP/Simba-W) | ✅ Released |
| 🔥**Simba-X**🔥 | Wav2Vec2 | 1B | 🤗 [https://huggingface.co/UBC-NLP/Simba-X](https://huggingface.co/UBC-NLP/Simba-X) | ✅ Released |
| 🔥**Simba-M**🔥 | MMS | 1B | 🤗 [https://huggingface.co/UBC-NLP/Simba-M](https://huggingface.co/UBC-NLP/Simba-M) | ✅ Released |
| 🔥**Simba-H**🔥 | HuBERT | 94M | 🤗 [https://huggingface.co/UBC-NLP/Simba-H](https://huggingface.co/UBC-NLP/Simba-H) | ✅ Released |

* **Simba-S** emerged as the best-performing ASR model overall.
**🧩 Usage Example**

You can run inference with the Hugging Face `transformers` library:

```python
from transformers import pipeline

# Load a Simba model for ASR. Available checkpoints:
# `UBC-NLP/Simba-S`, `UBC-NLP/Simba-W`, `UBC-NLP/Simba-X`, `UBC-NLP/Simba-H`, `UBC-NLP/Simba-M`
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model="UBC-NLP/Simba-S",
)

# Load the multilingual African adapter (only for `UBC-NLP/Simba-M`)
asr_pipeline.model.load_adapter("multilingual_african")

# Transcribe audio from a file or URL
result = asr_pipeline("https://africa.dlnlp.ai/simba/audio/afr_Lwazi_afr_test_idx3889.wav")
print(result["text"])

# Transcribe audio from a raw waveform array
result = asr_pipeline({
    "array": audio_array,
    "sampling_rate": 16_000,
})
print(result["text"])
```
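The `audio_array` form above expects mono samples at 16 kHz. If your recording uses another rate, it needs resampling first; below is a minimal stdlib sketch of linear-interpolation resampling (`resample_linear` is an illustrative helper of ours, not part of the Simba release — in practice `librosa` or `torchaudio` is the usual choice):

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono waveform (sequence of floats) by linear interpolation."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        # Position of output sample i on the source time axis
        pos = i * src_rate / dst_rate
        j = int(pos)
        frac = pos - j
        right = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] * (1 - frac) + frac * right)
    return out

# Upsample an 8 kHz ramp to 16 kHz: twice as many samples
audio_array = resample_linear([0.0, 1.0, 2.0, 3.0], 8_000, 16_000)
print(len(audio_array))  # → 8
```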
#### Example Outputs

Using the same audio file with different Simba models:

```python
# Simba-S
{'text': 'watter verontwaardiging sou daar, in ons binneste gewees het.'}
```

```python
# Simba-W
{'text': 'watter veronwaardigingsel daar, in ons binneste gewees het.'}
```

```python
# Simba-X
{'text': 'fator fr on ar taamsodr is'}
```

```python
# Simba-M
{'text': 'watter veronwaardiging sodaar in ons binniste gewees het'}
```

```python
# Simba-H
{'text': 'watter vironwaardiging so daar in ons binneste geweeshet'}
```
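The model card lists WER and CER as its metrics, and differences like those between the outputs above are exactly what they measure. A minimal pure-Python sketch of word error rate via Levenshtein distance (real evaluations typically normalize the text first and use a library such as `jiwer`; the helper names here are ours):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference length."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

ref = "watter verontwaardiging sou daar in ons binneste gewees het"
hyp = "watter veronwaardiging sodaar in ons binniste gewees het"
print(round(wer(ref, hyp), 3))  # → 0.444 (4 errors over 9 reference words)
```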
Get started with Simba models in minutes using our interactive Colab notebook: [simba_models.ipynb](https://github.com/UBC-NLP/simba/blob/main/simba_models.ipynb)
## Citation

If you use the Simba models or the SimbaBench benchmark in a scientific publication, or if you find these resources useful, please cite our paper:

```bibtex
@inproceedings{elmadany-etal-2025-voice,
    title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier",
    author = "Elmadany, AbdelRahim A. and
      Kwon, Sang Yun and
      Toyin, Hawau Olamide and
      Alcoba Inciarte, Alcides and
      Aldarmaki, Hanan and
      Abdul-Mageed, Muhammad",
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.559/",
    doi = "10.18653/v1/2025.emnlp-main.559",
    pages = "11039--11061",
    ISBN = "979-8-89176-332-6",
}
```
added_tokens.json (new file, 1611 lines) — file diff suppressed because it is too large
config.json (new file, 50 lines)
{
  "_name_or_path": "openai/whisper-large-v3",
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "apply_spec_augment": false,
  "architectures": [
    "WhisperForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "begin_suppress_tokens": [
    220,
    50257
  ],
  "bos_token_id": 50257,
  "classifier_proj_size": 256,
  "d_model": 1280,
  "decoder_attention_heads": 20,
  "decoder_ffn_dim": 5120,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 32,
  "decoder_start_token_id": 50258,
  "dropout": 0.0,
  "encoder_attention_heads": 20,
  "encoder_ffn_dim": 5120,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 32,
  "eos_token_id": 50257,
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "mask_feature_length": 10,
  "mask_feature_min_masks": 0,
  "mask_feature_prob": 0.0,
  "mask_time_length": 10,
  "mask_time_min_masks": 2,
  "mask_time_prob": 0.05,
  "max_length": 448,
  "max_source_positions": 1500,
  "max_target_positions": 448,
  "median_filter_width": 7,
  "model_type": "whisper",
  "num_hidden_layers": 32,
  "num_mel_bins": 128,
  "pad_token_id": 50256,
  "scale_embedding": false,
  "torch_dtype": "float16",
  "transformers_version": "4.48.1",
  "use_cache": true,
  "use_weighted_layer_sum": false,
  "vocab_size": 51866
}
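A couple of quick consistency checks one can derive from the config above (values copied by hand from `config.json`; an illustrative sketch, not part of the repository):

```python
# Subset of config.json, hard-coded here for illustration
config = {
    "d_model": 1280,
    "decoder_attention_heads": 20,
    "decoder_ffn_dim": 5120,
}

# Each attention head works on d_model / n_heads dimensions
head_dim = config["d_model"] // config["decoder_attention_heads"]

# The feed-forward width is the conventional 4x expansion of d_model
ffn_ratio = config["decoder_ffn_dim"] // config["d_model"]

print(head_dim)   # → 64
print(ffn_ratio)  # → 4
```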
generation_config.json (new file, 257 lines)
{
  "alignment_heads": [
    [7, 0], [10, 17], [12, 18], [13, 12], [16, 1],
    [17, 14], [19, 11], [21, 4], [24, 1], [25, 6]
  ],
  "begin_suppress_tokens": [220, 50257],
  "bos_token_id": 50257,
  "decoder_start_token_id": 50258,
  "eos_token_id": 50257,
  "is_multilingual": true,
  "lang_to_id": {
    "<|af|>": 50327, "<|am|>": 50334, "<|ar|>": 50272, "<|as|>": 50350,
    "<|az|>": 50304, "<|ba|>": 50355, "<|be|>": 50330, "<|bg|>": 50292,
    "<|bn|>": 50302, "<|bo|>": 50347, "<|br|>": 50309, "<|bs|>": 50315,
    "<|ca|>": 50270, "<|cs|>": 50283, "<|cy|>": 50297, "<|da|>": 50285,
    "<|de|>": 50261, "<|el|>": 50281, "<|en|>": 50259, "<|es|>": 50262,
    "<|et|>": 50307, "<|eu|>": 50310, "<|fa|>": 50300, "<|fi|>": 50277,
    "<|fo|>": 50338, "<|fr|>": 50265, "<|gl|>": 50319, "<|gu|>": 50333,
    "<|haw|>": 50352, "<|ha|>": 50354, "<|he|>": 50279, "<|hi|>": 50276,
    "<|hr|>": 50291, "<|ht|>": 50339, "<|hu|>": 50286, "<|hy|>": 50312,
    "<|id|>": 50275, "<|is|>": 50311, "<|it|>": 50274, "<|ja|>": 50266,
    "<|jw|>": 50356, "<|ka|>": 50329, "<|kk|>": 50316, "<|km|>": 50323,
    "<|kn|>": 50306, "<|ko|>": 50264, "<|la|>": 50294, "<|lb|>": 50345,
    "<|ln|>": 50353, "<|lo|>": 50336, "<|lt|>": 50293, "<|lv|>": 50301,
    "<|mg|>": 50349, "<|mi|>": 50295, "<|mk|>": 50308, "<|ml|>": 50296,
    "<|mn|>": 50314, "<|mr|>": 50320, "<|ms|>": 50282, "<|mt|>": 50343,
    "<|my|>": 50346, "<|ne|>": 50313, "<|nl|>": 50271, "<|nn|>": 50342,
    "<|no|>": 50288, "<|oc|>": 50328, "<|pa|>": 50321, "<|pl|>": 50269,
    "<|ps|>": 50340, "<|pt|>": 50267, "<|ro|>": 50284, "<|ru|>": 50263,
    "<|sa|>": 50344, "<|sd|>": 50332, "<|si|>": 50322, "<|sk|>": 50298,
    "<|sl|>": 50305, "<|sn|>": 50324, "<|so|>": 50326, "<|sq|>": 50317,
    "<|sr|>": 50303, "<|su|>": 50357, "<|sv|>": 50273, "<|sw|>": 50318,
    "<|ta|>": 50287, "<|te|>": 50299, "<|tg|>": 50331, "<|th|>": 50289,
    "<|tk|>": 50341, "<|tl|>": 50348, "<|tr|>": 50268, "<|tt|>": 50351,
    "<|uk|>": 50280, "<|ur|>": 50290, "<|uz|>": 50337, "<|vi|>": 50278,
    "<|yi|>": 50335, "<|yo|>": 50325, "<|yue|>": 50358, "<|zh|>": 50260
  },
  "language": null,
  "max_initial_timestamp_index": 50,
  "max_length": 448,
  "no_timestamps_token_id": 50364,
  "pad_token_id": 50257,
  "prev_sot_token_id": 50362,
  "return_timestamps": false,
  "suppress_tokens": [
    1, 2, 7, 8, 9, 10, 14, 25, 26, 27, 28, 29, 31, 58, 59, 60, 61, 62, 63,
    90, 91, 92, 93, 359, 503, 522, 542, 873, 893, 902, 918, 922, 931,
    1350, 1853, 1982, 2460, 2627, 3246, 3253, 3268, 3536, 3846, 3961,
    4183, 4667, 6585, 6647, 7273, 9061, 9383, 10428, 10929, 11938, 12033,
    12331, 12562, 13793, 14157, 14635, 15265, 15618, 16553, 16604, 18362,
    18956, 20075, 21675, 22520, 26130, 26161, 26435, 28279, 29464, 31650,
    32302, 32470, 36865, 42863, 47425, 49870, 50254, 50258, 50359, 50360,
    50361, 50362, 50363
  ],
  "task": "transcribe",
  "task_to_id": {
    "transcribe": 50360,
    "translate": 50359
  },
  "transformers_version": "4.48.1"
}
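Whisper-style decoding starts from a fixed prefix `<|startoftranscript|><|lang|><|task|>` (plus `<|notimestamps|>` when timestamps are off), which is what `lang_to_id`, `task_to_id`, and `no_timestamps_token_id` above feed into. A sketch of building the `forced_decoder_ids` layout used by `transformers` (the helper and the three-language subset are ours):

```python
# Subset of the generation config above
lang_to_id = {"<|sw|>": 50318, "<|yo|>": 50325, "<|ha|>": 50354}
task_to_id = {"transcribe": 50360, "translate": 50359}
no_timestamps_token_id = 50364

def forced_decoder_ids(lang, task, timestamps=False):
    """(position, token_id) pairs forced after <|startoftranscript|> at position 0."""
    ids = [(1, lang_to_id[f"<|{lang}|>"]), (2, task_to_id[task])]
    if not timestamps:
        ids.append((3, no_timestamps_token_id))
    return ids

print(forced_decoder_ids("sw", "transcribe"))
# → [(1, 50318), (2, 50360), (3, 50364)]
```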
merges.txt (new file, 50001 lines) — file diff suppressed because it is too large
model-00001-of-00002.safetensors (new file, 3 lines; Git LFS pointer)

version https://git-lfs.github.com/spec/v1
oid sha256:b0bc430a9ed5ec1a8104464aefb91490d7d8406b3ea764e1e8c40754b359c2e3
size 4993448880
model-00002-of-00002.safetensors (new file, 3 lines; Git LFS pointer)

version https://git-lfs.github.com/spec/v1
oid sha256:2f4c644bc86b826132dda1f1f1eaa6904362ecbd0760c8df1744422bf2ba51af
size 1180663192
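The two weight shards above are stored as Git LFS pointer files rather than raw binaries: three `key value` lines giving the spec version, the content hash, and the byte size. A small stdlib sketch that parses the pointer format (the helper name is ours):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file ("key value" lines) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:b0bc430a9ed5ec1a8104464aefb91490d7d8406b3ea764e1e8c40754b359c2e3
size 4993448880"""

info = parse_lfs_pointer(pointer)
print(int(info["size"]) / 1e9)  # roughly a 5 GB shard
```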
model.safetensors.index.json (new file, 1266 lines) — file diff suppressed because it is too large
normalizer.json (new file, 1742 lines) — file diff suppressed because it is too large
preprocessor_config.json (new file, 14 lines)
|
|||||||
|
{
|
||||||
|
"chunk_length": 30,
|
||||||
|
"feature_extractor_type": "WhisperFeatureExtractor",
|
||||||
|
"feature_size": 128,
|
||||||
|
"hop_length": 160,
|
||||||
|
"n_fft": 400,
|
||||||
|
"n_samples": 480000,
|
||||||
|
"nb_max_frames": 3000,
|
||||||
|
"padding_side": "right",
|
||||||
|
"padding_value": 0.0,
|
||||||
|
"processor_class": "WhisperProcessor",
|
||||||
|
"return_attention_mask": false,
|
||||||
|
"sampling_rate": 16000
|
||||||
|
}
|
||||||
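The preprocessor values are internally consistent: `n_samples` is `chunk_length` seconds of audio at `sampling_rate`, and `nb_max_frames` is one STFT frame per `hop_length` samples. A quick stdlib-only check of that arithmetic:

```python
cfg = {
    "chunk_length": 30,      # seconds per audio chunk
    "sampling_rate": 16000,  # Hz
    "hop_length": 160,       # samples between STFT frames
    "n_samples": 480000,
    "nb_max_frames": 3000,
}

# 30 s at 16 kHz gives 480000 samples per chunk.
assert cfg["chunk_length"] * cfg["sampling_rate"] == cfg["n_samples"]
# One mel frame per hop: 480000 / 160 = 3000 frames.
assert cfg["n_samples"] // cfg["hop_length"] == cfg["nb_max_frames"]
print("preprocessor config is self-consistent")
```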
139
special_tokens_map.json
Normal file
@@ -0,0 +1,139 @@
{
  "additional_special_tokens": [
    "<|startoftranscript|>",
    "<|en|>",
    "<|zh|>",
    "<|de|>",
    "<|es|>",
    "<|ru|>",
    "<|ko|>",
    "<|fr|>",
    "<|ja|>",
    "<|pt|>",
    "<|tr|>",
    "<|pl|>",
    "<|ca|>",
    "<|nl|>",
    "<|ar|>",
    "<|sv|>",
    "<|it|>",
    "<|id|>",
    "<|hi|>",
    "<|fi|>",
    "<|vi|>",
    "<|he|>",
    "<|uk|>",
    "<|el|>",
    "<|ms|>",
    "<|cs|>",
    "<|ro|>",
    "<|da|>",
    "<|hu|>",
    "<|ta|>",
    "<|no|>",
    "<|th|>",
    "<|ur|>",
    "<|hr|>",
    "<|bg|>",
    "<|lt|>",
    "<|la|>",
    "<|mi|>",
    "<|ml|>",
    "<|cy|>",
    "<|sk|>",
    "<|te|>",
    "<|fa|>",
    "<|lv|>",
    "<|bn|>",
    "<|sr|>",
    "<|az|>",
    "<|sl|>",
    "<|kn|>",
    "<|et|>",
    "<|mk|>",
    "<|br|>",
    "<|eu|>",
    "<|is|>",
    "<|hy|>",
    "<|ne|>",
    "<|mn|>",
    "<|bs|>",
    "<|kk|>",
    "<|sq|>",
    "<|sw|>",
    "<|gl|>",
    "<|mr|>",
    "<|pa|>",
    "<|si|>",
    "<|km|>",
    "<|sn|>",
    "<|yo|>",
    "<|so|>",
    "<|af|>",
    "<|oc|>",
    "<|ka|>",
    "<|be|>",
    "<|tg|>",
    "<|sd|>",
    "<|gu|>",
    "<|am|>",
    "<|yi|>",
    "<|lo|>",
    "<|uz|>",
    "<|fo|>",
    "<|ht|>",
    "<|ps|>",
    "<|tk|>",
    "<|nn|>",
    "<|mt|>",
    "<|sa|>",
    "<|lb|>",
    "<|my|>",
    "<|bo|>",
    "<|tl|>",
    "<|mg|>",
    "<|as|>",
    "<|tt|>",
    "<|haw|>",
    "<|ln|>",
    "<|ha|>",
    "<|ba|>",
    "<|jw|>",
    "<|su|>",
    "<|yue|>",
    "<|translate|>",
    "<|transcribe|>",
    "<|startoflm|>",
    "<|startofprev|>",
    "<|nospeech|>",
    "<|notimestamps|>"
  ],
  "bos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
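Every language and control entry in the special-tokens list follows the same `<|xx|>` pattern. A minimal sketch (the `lang_token` helper is illustrative, not a tokenizer API) that builds tokens in that style from ISO-style codes:

```python
def lang_token(code):
    """Format a language code in Whisper's <|xx|> special-token style."""
    return f"<|{code}|>"

sample_langs = ["en", "zh", "yue", "haw"]
tokens = [lang_token(c) for c in sample_langs]
print(tokens)  # ['<|en|>', '<|zh|>', '<|yue|>', '<|haw|>']
```

Note that the codes are not all two letters: `yue` (Cantonese) and `haw` (Hawaiian) appear in the list above, so any parser should match the delimiters rather than assume a fixed code length.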
264883
tokenizer.json
Normal file
File diff suppressed because it is too large
12997
tokenizer_config.json
Normal file
File diff suppressed because it is too large
1
vocab.json
Normal file
File diff suppressed because one or more lines are too long