Initialize project; model provided by the ModelHub XC community
Model: UBC-NLP/Simba-W Source: Original Platform
211
.ipynb_checkpoints/README-checkpoint.md
Normal file
@@ -0,0 +1,211 @@
---
language:
- am # Amharic
- ar # Arabic
- tw # Asante Twi
- bm # Bambara
- fr # French
- lg # Ganda
- ha # Hausa
- ig # Igbo
- rw # Kinyarwanda
- kg # Kongo
- ln # Lingala
- lu # Luba-Katanga
- mg # Malagasy
- nso # Northern Sotho
- ny # Nyanja
- om # Oromo
- pt # Portuguese
- sn # Shona
- so # Somali
- st # Southern Sotho
- sw # Swahili
- ss # Swati
- ti # Tigrinya
- ts # Tsonga
- tn # Tswana
- ak # Twi
- ve # Venda
- wo # Wolof
- xh # Xhosa
- yo # Yoruba
- zu # Zulu
- tzm # Tamazight
- sg # Sango
- din # Dinka
- ee # Ewe
- fo # Fon
- luo # Luo
- mos # Mossi
- umb # Umbundu
license: cc-by-4.0
tags:
- automatic-speech-recognition
- audio
- speech
- african-languages
- multilingual
- simba
- low-resource
- speech-recognition
- asr
datasets:
- UBC-NLP/SimbaBench
metrics:
- wer
- cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
<div align="center">

<img src="https://africa.dlnlp.ai/simba/images/VoC_simba" alt="VoC Simba Models Logo">

[](https://aclanthology.org/2025.emnlp-main.559/)
[](https://africa.dlnlp.ai/simba/)
[](#simbabench)
[](https://huggingface.co/collections/UBC-NLP/simba-speech-series)
[](#demo)

</div>

## *Bridging the Digital Divide for African AI*

**Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we aim to make the future of speech technology inclusive, representative, and accessible to over a billion people.

## Best-in-Class Multilingual Models

Introduced in our EMNLP 2025 paper *[Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/)*, the **Simba Series** represents the current state of the art for African speech AI.

- **Unified Suite:** Models optimized for African languages.
- **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
- **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
- **Inclusion-First:** Specifically built to mitigate the "digital divide" by empowering speakers of underrepresented languages.

The **Simba** family consists of state-of-the-art models fine-tuned using SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language family relationships.

### 🗣️✍️ Simba-ASR
> **The New Standard for African Speech-to-Text**

**🎯 Task** `Automatic Speech Recognition`: powering high-accuracy transcription across the continent.

**🌍 Language Coverage (43 African languages)**
> **Amharic** (`amh`), **Arabic** (`ara`), **Asante Twi** (`asanti`), **Bambara** (`bam`), **Baoulé** (`bau`), **Bemba** (`bem`), **Ewe** (`ewe`), **Fanti** (`fat`), **Fon** (`fon`), **French** (`fra`), **Ganda** (`lug`), **Hausa** (`hau`), **Igbo** (`ibo`), **Kabiye** (`kab`), **Kinyarwanda** (`kin`), **Kongo** (`kon`), **Lingala** (`lin`), **Luba-Katanga** (`lub`), **Luo** (`luo`), **Malagasy** (`mlg`), **Mossi** (`mos`), **Northern Sotho** (`nso`), **Nyanja** (`nya`), **Oromo** (`orm`), **Portuguese** (`por`), **Shona** (`sna`), **Somali** (`som`), **Southern Sotho** (`sot`), **Swahili** (`swa`), **Swati** (`ssw`), **Tigrinya** (`tir`), **Tsonga** (`tso`), **Tswana** (`tsn`), **Twi** (`twi`), **Umbundu** (`umb`), **Venda** (`ven`), **Wolof** (`wol`), **Xhosa** (`xho`), **Yoruba** (`yor`), **Zulu** (`zul`), **Tamazight** (`tzm`), **Sango** (`sag`), **Dinka** (`din`).

**🏗️ Base Architectures**

- **Simba-S** (SeamlessM4T-v2-MT), *Top Performer*
- **Simba-W** (Whisper-large-v3)
- **Simba-X** (Wav2Vec2-XLS-R-2b)
- **Simba-M** (MMS-1b-all)
- **Simba-H** (AfriHuBERT)

🌐 Explore the Frontier

| **ASR Models** | **Architecture** | **#Parameters** | **🤗 Hugging Face Model Card** | **Status** |
|---------|:------------------:| :------------------:| :------------------:|:------------------:|
| 🔥**Simba-S**🔥| SeamlessM4T-v2 | 2.3B | 🤗 [https://huggingface.co/UBC-NLP/Simba-S](https://huggingface.co/UBC-NLP/Simba-S) | ✅ Released |
| 🔥**Simba-W**🔥| Whisper | 1.5B | 🤗 [https://huggingface.co/UBC-NLP/Simba-W](https://huggingface.co/UBC-NLP/Simba-W) | ✅ Released |
| 🔥**Simba-X**🔥| Wav2Vec2 | 1B | 🤗 [https://huggingface.co/UBC-NLP/Simba-X](https://huggingface.co/UBC-NLP/Simba-X) | ✅ Released |
| 🔥**Simba-M**🔥| MMS | 1B | 🤗 [https://huggingface.co/UBC-NLP/Simba-M](https://huggingface.co/UBC-NLP/Simba-M) | ✅ Released |
| 🔥**Simba-H**🔥| HuBERT | 94M | 🤗 [https://huggingface.co/UBC-NLP/Simba-H](https://huggingface.co/UBC-NLP/Simba-H) | ✅ Released |

* **Simba-S** emerged as the best-performing ASR model overall.
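The parameter counts in the table also give a rough idea of how much memory is needed just to hold each model's weights. The sketch below is a back-of-the-envelope estimate, assuming 2 bytes per parameter (16-bit weights). The helper name `weight_memory_gb` and the 16-bit assumption are illustrative and do not come from the model cards; activations, decoding caches, and framework overhead are not counted.

```python
# Parameter counts as listed in the table above.
PARAMETER_COUNTS = {
    "UBC-NLP/Simba-S": 2.3e9,
    "UBC-NLP/Simba-W": 1.5e9,
    "UBC-NLP/Simba-X": 1e9,
    "UBC-NLP/Simba-M": 1e9,
    "UBC-NLP/Simba-H": 94e6,
}


def weight_memory_gb(num_parameters: float, bytes_per_parameter: int = 2) -> float:
    """Approximate memory in GB (1 GB = 1e9 bytes) needed to hold the weights alone.

    `bytes_per_parameter=2` is an assumption (16-bit weights); use 4 for 32-bit.
    """
    return num_parameters * bytes_per_parameter / 1e9


if __name__ == "__main__":
    for model_id, count in PARAMETER_COUNTS.items():
        print(f"{model_id}: ~{weight_memory_gb(count):.2f} GB of weights at 16-bit precision")
```

Treat the result as a lower bound when choosing a model for a given device.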

**🧩 Usage Example**

Run inference with the Hugging Face `transformers` library.

```python
import numpy as np
from transformers import pipeline

# Available Simba models: `UBC-NLP/Simba-S`, `UBC-NLP/Simba-W`,
# `UBC-NLP/Simba-X`, `UBC-NLP/Simba-H`, `UBC-NLP/Simba-M`
model_id = "UBC-NLP/Simba-S"

# Load the selected Simba model for ASR
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model=model_id
)

# Load the multilingual African adapter (only for `UBC-NLP/Simba-M`)
if model_id == "UBC-NLP/Simba-M":
    asr_pipeline.model.load_adapter("multilingual_african")

# Transcribe audio from a file path or URL
result = asr_pipeline("https://africa.dlnlp.ai/simba/audio/afr_Lwazi_afr_test_idx3889.wav")
print(result["text"])

# Transcribe audio from an audio array
# Placeholder: one second of silence; replace with your own mono samples at 16 kHz
audio_array = np.zeros(16_000, dtype=np.float32)
result = asr_pipeline({
    "array": audio_array,
    "sampling_rate": 16_000
})
print(result["text"])
```
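The array-based call above expects mono samples at 16 kHz. As a minimal sketch of how such samples could be prepared with only the Python standard library, the helper below reads a 16-bit PCM WAV file and scales it to floats. The function name `read_wav_mono_16k` is hypothetical, and the checks for mono, 16-bit, 16 kHz input are assumptions made for this sketch; files in other formats need converting or resampling first.

```python
import struct
import wave


def read_wav_mono_16k(path: str) -> list[float]:
    """Read a 16-bit PCM mono WAV file sampled at 16 kHz.

    Returns the samples as floats scaled to the range [-1.0, 1.0).
    """
    with wave.open(path, "rb") as wav_file:
        if wav_file.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav_file.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        if wav_file.getframerate() != 16_000:
            raise ValueError("expected a 16 kHz sampling rate; resample first")
        frames = wav_file.readframes(wav_file.getnframes())
    # Little-endian signed 16-bit integers, one per sample.
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    return [sample / 32768.0 for sample in samples]
```

The returned list can be wrapped with `numpy.asarray(samples, dtype=numpy.float32)` and passed as the `"array"` value together with `"sampling_rate": 16_000`.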

#### Example Outputs

Using the same audio file with different Simba models:

```python
# Simba-S
{'text': 'watter verontwaardiging sou daar, in ons binneste gewees het.'}
```

```python
# Simba-W
{'text': 'watter veronwaardigingsel daar, in ons binneste gewees het.'}
```

```python
# Simba-X
{'text': 'fator fr on ar taamsodr is'}
```

```python
# Simba-M
{'text': 'watter veronwaardiging sodaar in ons binniste gewees het'}
```

```python
# Simba-H
{'text': 'watter vironwaardiging so daar in ons binneste geweeshet'}
```
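The card lists `wer` and `cer` as its metrics, so one way to compare these transcripts is by word and character error rate. The sketch below implements both with a plain Levenshtein distance. No reference transcript is given here, so, purely for illustration, it treats the Simba-S output as a stand-in reference; that choice, the helper names, and the lack of any text normalization (punctuation is kept) are all assumptions, and the numbers it prints are not benchmark results.

```python
def edit_distance(reference, hypothesis) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Transcripts copied from the example outputs above.
OUTPUTS = {
    "Simba-S": "watter verontwaardiging sou daar, in ons binneste gewees het.",
    "Simba-W": "watter veronwaardigingsel daar, in ons binneste gewees het.",
    "Simba-X": "fator fr on ar taamsodr is",
    "Simba-M": "watter veronwaardiging sodaar in ons binniste gewees het",
    "Simba-H": "watter vironwaardiging so daar in ons binneste geweeshet",
}

if __name__ == "__main__":
    stand_in_reference = OUTPUTS["Simba-S"]  # assumption: not a ground-truth transcript
    for name, text in OUTPUTS.items():
        print(f"{name}: WER={wer(stand_in_reference, text):.2f} CER={cer(stand_in_reference, text):.2f}")
```

For real evaluation, replace the stand-in with the ground-truth transcript and apply the same text normalization to both sides before scoring.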

Get started with Simba models in minutes using our interactive Colab notebook: [simba_models.ipynb](https://github.com/UBC-NLP/simba/edit/main/simba_models.ipynb)

## Citation

If you use the Simba models or the SimbaBench benchmark in a scientific publication, or if you find the resources on this website useful, please cite our paper.

```bibtex
@inproceedings{elmadany-etal-2025-voice,
    title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier",
    author = "Elmadany, AbdelRahim A. and
      Kwon, Sang Yun and
      Toyin, Hawau Olamide and
      Alcoba Inciarte, Alcides and
      Aldarmaki, Hanan and
      Abdul-Mageed, Muhammad",
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.559/",
    doi = "10.18653/v1/2025.emnlp-main.559",
    pages = "11039--11061",
    ISBN = "979-8-89176-332-6",
}
```
50
.ipynb_checkpoints/config-checkpoint.json
Normal file
@@ -0,0 +1,50 @@
{
  "_name_or_path": "openai/whisper-large-v3",
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "apply_spec_augment": false,
  "architectures": [
    "WhisperForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "begin_suppress_tokens": [
    220,
    50257
  ],
  "bos_token_id": 50257,
  "classifier_proj_size": 256,
  "d_model": 1280,
  "decoder_attention_heads": 20,
  "decoder_ffn_dim": 5120,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 32,
  "decoder_start_token_id": 50258,
  "dropout": 0.0,
  "encoder_attention_heads": 20,
  "encoder_ffn_dim": 5120,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 32,
  "eos_token_id": 50257,
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "mask_feature_length": 10,
  "mask_feature_min_masks": 0,
  "mask_feature_prob": 0.0,
  "mask_time_length": 10,
  "mask_time_min_masks": 2,
  "mask_time_prob": 0.05,
  "max_length": 448,
  "max_source_positions": 1500,
  "max_target_positions": 448,
  "median_filter_width": 7,
  "model_type": "whisper",
  "num_hidden_layers": 32,
  "num_mel_bins": 128,
  "pad_token_id": 50256,
  "scale_embedding": false,
  "torch_dtype": "float16",
  "transformers_version": "4.48.1",
  "use_cache": true,
  "use_weighted_layer_sum": false,
  "vocab_size": 51866
}
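The dimensions in this config are enough for a rough parameter count. The sketch below is an estimate under stated assumptions, not an exact tally: it counts only the large weight matrices (attention projections, feed-forward layers, the convolutional front end, and embeddings), ignores biases and layer norms, assumes two kernel-size-3 convolutions in front of the encoder, and assumes the output projection shares weights with the token embedding. Those architectural details come from the Whisper design rather than from this file, and `estimate_parameters` is a hypothetical helper name.

```python
# Values copied from the config above.
CONFIG = {
    "d_model": 1280,
    "encoder_layers": 32,
    "decoder_layers": 32,
    "encoder_ffn_dim": 5120,
    "decoder_ffn_dim": 5120,
    "vocab_size": 51866,
    "num_mel_bins": 128,
    "max_source_positions": 1500,
    "max_target_positions": 448,
}


def estimate_parameters(cfg: dict) -> int:
    """Rough weight count for a Whisper-style encoder-decoder (biases and norms ignored)."""
    d = cfg["d_model"]
    attention = 4 * d * d  # query, key, value, and output projections
    encoder_layer = attention + 2 * d * cfg["encoder_ffn_dim"]
    # Each decoder layer has self-attention plus cross-attention.
    decoder_layer = 2 * attention + 2 * d * cfg["decoder_ffn_dim"]
    # Assumption: two 1-D convolutions with kernel size 3 (mel bins -> d, then d -> d).
    conv_front_end = 3 * cfg["num_mel_bins"] * d + 3 * d * d
    # Token embedding plus encoder and decoder position tables.
    embeddings = (
        cfg["vocab_size"] + cfg["max_source_positions"] + cfg["max_target_positions"]
    ) * d
    return (
        cfg["encoder_layers"] * encoder_layer
        + cfg["decoder_layers"] * decoder_layer
        + conv_front_end
        + embeddings
    )


if __name__ == "__main__":
    total = estimate_parameters(CONFIG)
    print(f"Estimated parameters: {total / 1e9:.2f}B")
```

Under these assumptions the estimate comes to roughly 1.5 billion parameters, in line with the 1.5B figure the README lists for Simba-W.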