Initialize project; model provided by the ModelHub XC community

Model: FaisaI/tadabur-Whisper-Small
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-08 16:19:43 +08:00
commit 1dd146187c
19 changed files with 239385 additions and 0 deletions

35
.gitattributes vendored Normal file

@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text

118
README.md Normal file

@@ -0,0 +1,118 @@
---
base_model:
- openai/whisper-small
datasets:
- FaisaI/tadabur
language:
- ar
license: cc-by-nc-4.0
metrics:
- wer
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- quran
- asr
- arabic
- speech-recognition
---
<div align="center">
<img src="https://huggingface.co/datasets/FaisaI/tadabur/resolve/main/tadabur_logo.png" width="100"><br><br>
<h1>Tadabur-Whisper-Small</h1>
A Whisper Small model fine-tuned on [Tadabur](https://huggingface.co/datasets/FaisaI/tadabur) for Qur'anic speech recognition.
[![Paper](https://img.shields.io/badge/Paper-Read-a27b5c?style=flat-square)](https://huggingface.co/papers/2604.18932)
[![Dataset](https://img.shields.io/badge/🤗_Dataset-FaisaI%2Ftadabur-c8a97a?style=flat-square)](https://huggingface.co/datasets/FaisaI/tadabur)
[![Base Model](https://img.shields.io/badge/Base-Whisper_Small-1c1f1e?style=flat-square)](https://huggingface.co/openai/whisper-small)
[![License](https://img.shields.io/badge/License-CC_BY--NC_4.0-e6ddd0?style=flat-square)](https://creativecommons.org/licenses/by-nc/4.0/)
[![Page](https://img.shields.io/badge/🌐_Project_Page-tadabur-a27b5c?style=flat-square)](https://fherran.github.io/tadabur)
</div>
---
## Overview
**Tadabur-Whisper-Small** is a fine-tuned version of [Whisper Small](https://huggingface.co/openai/whisper-small) on the [Tadabur dataset](https://huggingface.co/datasets/FaisaI/tadabur), as presented in the paper [Tadabur: A Large-Scale Quran Audio Dataset](https://huggingface.co/papers/2604.18932).
- **GitHub Repository:** [fherran/tadabur](https://github.com/fherran/tadabur)
- **Project Page:** [fherran.github.io/tadabur](https://fherran.github.io/tadabur)
---
## Training Progress
| Step | Epoch | WER ↓ |
|:---:|:---:|:---:|
| 2,500 | 0.15 | 13.78% |
| 5,000 | 0.30 | 11.20% |
| 7,500 | 0.44 | 11.15% |
| 25,000 | 1.48 | **7.89%** ⭐ |
| 32,500 | 1.93 | 14.75% |
---
## Usage
```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="FaisaI/tadabur-whisper-small",
    generate_kwargs={"language": "arabic"},
)

result = asr("path/to/audiofile")
print(result["text"])
```
Or with the full Whisper API:
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

processor = WhisperProcessor.from_pretrained("FaisaI/tadabur-whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("FaisaI/tadabur-whisper-small")

# Audio must be 16 kHz mono
audio_array, sampling_rate = librosa.load("path/to/audiofile", sr=16000, mono=True)

inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(**inputs, language="arabic")
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```
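Since the model expects 16 kHz mono input, it can be useful to check a WAV file's format before transcribing. A stdlib-only sketch (the `check_wav` helper and its defaults are illustrative, not part of the model API):

```python
import wave

def check_wav(path: str, expected_rate: int = 16000) -> bool:
    """Return True if the WAV file at `path` is mono at the expected sample rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels() == 1 and wf.getframerate() == expected_rate
```

If the check fails, resampling with `librosa.load(..., sr=16000, mono=True)` as shown above converts both the rate and the channel count in one step.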
---
## Limitations
- Not suitable for speaker identification or diarization.
- May underperform on noisy or low-quality recordings.
- Does not fully generalize beyond Qur'anic recitation; transcription errors are expected.
---
## Ethical Considerations
This model is trained exclusively on Qur'anic recitation data. Users must engage with outputs respectfully and must not use this model for mockery, distortion, or any disrespectful application involving Qur'anic content.
**For research and educational use only.**
---
## Citation
```bibtex
@misc{alherran2026tadabur,
author = {Alherran, Faisal},
title = {Tadabur: A Large-Scale Quran Audio Dataset},
year = {2026},
eprint = {2604.18932},
archivePrefix = {arXiv},
primaryClass = {cs.SD},
doi = {10.48550/arXiv.2604.18932},
url = {https://arxiv.org/abs/2604.18932}
}
```

1609
added_tokens.json Normal file

File diff suppressed because it is too large

60
config.json Normal file

@@ -0,0 +1,60 @@
{
"activation_dropout": 0.0,
"activation_function": "gelu",
"apply_spec_augment": false,
"architectures": [
"WhisperForConditionalGeneration"
],
"attention_dropout": 0.0,
"begin_suppress_tokens": null,
"bos_token_id": 50257,
"classifier_proj_size": 256,
"d_model": 768,
"decoder_attention_heads": 12,
"decoder_ffn_dim": 3072,
"decoder_layerdrop": 0.0,
"decoder_layers": 12,
"decoder_start_token_id": 50258,
"dropout": 0.0,
"dtype": "float32",
"encoder_attention_heads": 12,
"encoder_ffn_dim": 3072,
"encoder_layerdrop": 0.0,
"encoder_layers": 12,
"eos_token_id": 50257,
"forced_decoder_ids": [
[
1,
50259
],
[
2,
50359
],
[
3,
50363
]
],
"init_std": 0.02,
"is_encoder_decoder": true,
"mask_feature_length": 10,
"mask_feature_min_masks": 0,
"mask_feature_prob": 0.0,
"mask_time_length": 10,
"mask_time_min_masks": 2,
"mask_time_prob": 0.05,
"max_length": null,
"max_source_positions": 1500,
"max_target_positions": 448,
"median_filter_width": 7,
"model_type": "whisper",
"num_hidden_layers": 12,
"num_mel_bins": 80,
"pad_token_id": 50257,
"scale_embedding": false,
"transformers_version": "4.57.6",
"use_cache": true,
"use_weighted_layer_sum": false,
"vocab_size": 51865
}

255
generation_config.json Normal file

@@ -0,0 +1,255 @@
{
"alignment_heads": [
[
5,
3
],
[
5,
9
],
[
8,
0
],
[
8,
4
],
[
8,
7
],
[
8,
8
],
[
9,
0
],
[
9,
7
],
[
9,
9
],
[
10,
5
]
],
"begin_suppress_tokens": [
220,
50257
],
"bos_token_id": 50257,
"decoder_start_token_id": 50258,
"eos_token_id": 50257,
"forced_decoder_ids": null,
"is_multilingual": true,
"lang_to_id": {
"<|af|>": 50327,
"<|am|>": 50334,
"<|ar|>": 50272,
"<|as|>": 50350,
"<|az|>": 50304,
"<|ba|>": 50355,
"<|be|>": 50330,
"<|bg|>": 50292,
"<|bn|>": 50302,
"<|bo|>": 50347,
"<|br|>": 50309,
"<|bs|>": 50315,
"<|ca|>": 50270,
"<|cs|>": 50283,
"<|cy|>": 50297,
"<|da|>": 50285,
"<|de|>": 50261,
"<|el|>": 50281,
"<|en|>": 50259,
"<|es|>": 50262,
"<|et|>": 50307,
"<|eu|>": 50310,
"<|fa|>": 50300,
"<|fi|>": 50277,
"<|fo|>": 50338,
"<|fr|>": 50265,
"<|gl|>": 50319,
"<|gu|>": 50333,
"<|haw|>": 50352,
"<|ha|>": 50354,
"<|he|>": 50279,
"<|hi|>": 50276,
"<|hr|>": 50291,
"<|ht|>": 50339,
"<|hu|>": 50286,
"<|hy|>": 50312,
"<|id|>": 50275,
"<|is|>": 50311,
"<|it|>": 50274,
"<|ja|>": 50266,
"<|jw|>": 50356,
"<|ka|>": 50329,
"<|kk|>": 50316,
"<|km|>": 50323,
"<|kn|>": 50306,
"<|ko|>": 50264,
"<|la|>": 50294,
"<|lb|>": 50345,
"<|ln|>": 50353,
"<|lo|>": 50336,
"<|lt|>": 50293,
"<|lv|>": 50301,
"<|mg|>": 50349,
"<|mi|>": 50295,
"<|mk|>": 50308,
"<|ml|>": 50296,
"<|mn|>": 50314,
"<|mr|>": 50320,
"<|ms|>": 50282,
"<|mt|>": 50343,
"<|my|>": 50346,
"<|ne|>": 50313,
"<|nl|>": 50271,
"<|nn|>": 50342,
"<|no|>": 50288,
"<|oc|>": 50328,
"<|pa|>": 50321,
"<|pl|>": 50269,
"<|ps|>": 50340,
"<|pt|>": 50267,
"<|ro|>": 50284,
"<|ru|>": 50263,
"<|sa|>": 50344,
"<|sd|>": 50332,
"<|si|>": 50322,
"<|sk|>": 50298,
"<|sl|>": 50305,
"<|sn|>": 50324,
"<|so|>": 50326,
"<|sq|>": 50317,
"<|sr|>": 50303,
"<|su|>": 50357,
"<|sv|>": 50273,
"<|sw|>": 50318,
"<|ta|>": 50287,
"<|te|>": 50299,
"<|tg|>": 50331,
"<|th|>": 50289,
"<|tk|>": 50341,
"<|tl|>": 50348,
"<|tr|>": 50268,
"<|tt|>": 50351,
"<|uk|>": 50280,
"<|ur|>": 50290,
"<|uz|>": 50337,
"<|vi|>": 50278,
"<|yi|>": 50335,
"<|yo|>": 50325,
"<|zh|>": 50260
},
"language": "ar",
"max_initial_timestamp_index": 50,
"max_length": 448,
"no_timestamps_token_id": 50363,
"pad_token_id": 50257,
"prev_sot_token_id": 50361,
"return_timestamps": false,
"suppress_tokens": [
1,
2,
7,
8,
9,
10,
14,
25,
26,
27,
28,
29,
31,
58,
59,
60,
61,
62,
63,
90,
91,
92,
93,
359,
503,
522,
542,
873,
893,
902,
918,
922,
931,
1350,
1853,
1982,
2460,
2627,
3246,
3253,
3268,
3536,
3846,
3961,
4183,
4667,
6585,
6647,
7273,
9061,
9383,
10428,
10929,
11938,
12033,
12331,
12562,
13793,
14157,
14635,
15265,
15618,
16553,
16604,
18362,
18956,
20075,
21675,
22520,
26130,
26161,
26435,
28279,
29464,
31650,
32302,
32470,
36865,
42863,
47425,
49870,
50254,
50258,
50360,
50361,
50362
],
"task": "transcribe",
"task_to_id": {
"transcribe": 50359,
"translate": 50358
},
"transformers_version": "4.57.6"
}

50001
merges.txt Normal file

File diff suppressed because it is too large

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bce0dc1d2bc315af2a0abd51c59dac270ef3408df9434bc5df0f8d56f0808369
size 966995080

1742
normalizer.json Normal file

File diff suppressed because it is too large

3
optimizer.pt Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a0e05912a26086e73fb5e6ea9fc0e8bad54176ec6755471d1587c49a06a9c00b
size 737545552

15
preprocessor_config.json Normal file

@@ -0,0 +1,15 @@
{
"chunk_length": 30,
"dither": 0.0,
"feature_extractor_type": "WhisperFeatureExtractor",
"feature_size": 80,
"hop_length": 160,
"n_fft": 400,
"n_samples": 480000,
"nb_max_frames": 3000,
"padding_side": "right",
"padding_value": 0.0,
"processor_class": "WhisperProcessor",
"return_attention_mask": false,
"sampling_rate": 16000
}
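The extractor values above are internally consistent: a 30-second chunk at 16 kHz is 480,000 samples, and with a hop length of 160 samples that yields 3,000 mel frames. A quick arithmetic check (assuming the standard `n_samples // hop_length` frame count used by Whisper-style feature extractors):

```python
# Values from preprocessor_config.json above
sampling_rate = 16000
chunk_length = 30   # seconds per chunk
hop_length = 160    # samples between successive frames

n_samples = sampling_rate * chunk_length   # matches "n_samples": 480000
nb_max_frames = n_samples // hop_length    # matches "nb_max_frames": 3000
print(n_samples, nb_max_frames)
```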

3
rng_state.pth Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0db7c158d9d59b9e4ee29fd00b507ea48b077ec1175610df61ee115918c9d443
size 14244

3
scaler.pt Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:016eb7cac1a334a0918c2d5da18ae02546365f5ffdb8ea01d14ad7c36b6b3ec3
size 988

3
scheduler.pt Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:264ca169b8a57638c0ccc455ee554a3ddebf781108e9b3865aaa2243be553cfc
size 1064

139
special_tokens_map.json Normal file

@@ -0,0 +1,139 @@
{
"additional_special_tokens": [
"<|endoftext|>",
"<|startoftranscript|>",
"<|en|>",
"<|zh|>",
"<|de|>",
"<|es|>",
"<|ru|>",
"<|ko|>",
"<|fr|>",
"<|ja|>",
"<|pt|>",
"<|tr|>",
"<|pl|>",
"<|ca|>",
"<|nl|>",
"<|ar|>",
"<|sv|>",
"<|it|>",
"<|id|>",
"<|hi|>",
"<|fi|>",
"<|vi|>",
"<|he|>",
"<|uk|>",
"<|el|>",
"<|ms|>",
"<|cs|>",
"<|ro|>",
"<|da|>",
"<|hu|>",
"<|ta|>",
"<|no|>",
"<|th|>",
"<|ur|>",
"<|hr|>",
"<|bg|>",
"<|lt|>",
"<|la|>",
"<|mi|>",
"<|ml|>",
"<|cy|>",
"<|sk|>",
"<|te|>",
"<|fa|>",
"<|lv|>",
"<|bn|>",
"<|sr|>",
"<|az|>",
"<|sl|>",
"<|kn|>",
"<|et|>",
"<|mk|>",
"<|br|>",
"<|eu|>",
"<|is|>",
"<|hy|>",
"<|ne|>",
"<|mn|>",
"<|bs|>",
"<|kk|>",
"<|sq|>",
"<|sw|>",
"<|gl|>",
"<|mr|>",
"<|pa|>",
"<|si|>",
"<|km|>",
"<|sn|>",
"<|yo|>",
"<|so|>",
"<|af|>",
"<|oc|>",
"<|ka|>",
"<|be|>",
"<|tg|>",
"<|sd|>",
"<|gu|>",
"<|am|>",
"<|yi|>",
"<|lo|>",
"<|uz|>",
"<|fo|>",
"<|ht|>",
"<|ps|>",
"<|tk|>",
"<|nn|>",
"<|mt|>",
"<|sa|>",
"<|lb|>",
"<|my|>",
"<|bo|>",
"<|tl|>",
"<|mg|>",
"<|as|>",
"<|tt|>",
"<|haw|>",
"<|ln|>",
"<|ha|>",
"<|ba|>",
"<|jw|>",
"<|su|>",
"<|translate|>",
"<|transcribe|>",
"<|startoflm|>",
"<|startofprev|>",
"<|nocaptions|>",
"<|notimestamps|>"
],
"bos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"eos_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"unk_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

114853
tokenizer.json Normal file

File diff suppressed because it is too large

12989
tokenizer_config.json Normal file

File diff suppressed because it is too large

7291
trainer_state.json Normal file

File diff suppressed because it is too large

3
training_args.bin Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e4ff8d14dab0bf68638771ad20ed208924e0e6a6290e198f6f2e0e2dbae605f3
size 5624

50260
vocab.json Normal file

File diff suppressed because it is too large