Initialize project; model provided by the ModelHub XC community

Model: TaloCreations/whisper-darija-finetuned Source: Original Platform
2026-05-08 11:34:43 +08:00
commit 1b676f7889
12 changed files with 117285 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,177 @@
+---
+library_name: transformers
+tags:
+- automatic-speech-recognition
+- audio
+- darija
+- moroccan-arabic
+- whisper
+- fine-tuned
+---
+
+# Model Card for Whisper Darija (Fine-Tuned)
+
+This is an [OpenAI Whisper small model](https://huggingface.co/openai/whisper-small) fine-tuned for Moroccan Darija speech transcription. It is trained to transcribe Moroccan dialectal Arabic from audio.
+
+## Model Details
+
+### Model Description
+
+This model is a fine-tuned version of `giannitto/whisper-morocco-model`, trained on a dataset of Moroccan Darija audio and transcriptions. The fine-tuning aimed to reduce the model's Word Error Rate (WER) on spoken Darija, which is underrepresented in many multilingual speech models.
+
+- **Developed by:** Bentaleb Ali
+- **Model type:** Automatic Speech Recognition (ASR)
+- **Language(s):** Moroccan Darija (Arabic dialect)
+- **License:** Apache 2.0
+- **Finetuned from model:** giannitto/whisper-morocco-model
+
+### Model Sources
+
+- **Repository:** https://huggingface.co/TaloCreations/whisper-darija-finetuned
+
+## Uses
+
+### Direct Use
+
+This model is intended for transcription of Moroccan Darija audio into text. It can be used in:
+- Voice assistants
+- Media subtitling
+- Dialectal speech processing
+- Linguistic research
+
+### Out-of-Scope Use
+
+- Translation tasks (this model is for transcription, not translation)
+- Arabic dialects other than Moroccan Darija
+
+## Bias, Risks, and Limitations
+
+- The model may perform poorly on noisy or low-quality recordings.
+- The model may not generalize well to other dialects of Arabic.
+- Biases in the training data (e.g., gender, age, region) may affect transcription accuracy.
+
+### Recommendations
+
+Carefully evaluate outputs when using the model in sensitive applications. Avoid using it in high-risk domains without human verification.
+
+## How to Get Started with the Model
+
+```python
+from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
+import torch, torchaudio
+
+# Load model and processor
+processor = AutoProcessor.from_pretrained("TaloCreations/whisper-darija-finetuned")
+model = AutoModelForSpeechSeq2Seq.from_pretrained("TaloCreations/whisper-darija-finetuned")
+model.eval()
+
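+# Load the recording; torchaudio returns a (channels, samples) tensor and the sample rate
+speech, sr = torchaudio.load("path_to_record.wav")
+
+# Whisper expects 16 kHz input, so resample if the file uses another rate
+if sr != 16000:
+    resampler = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16000)
+    speech = resampler(speech)
+
+# Preprocess the first channel and generate
+inputs = processor(speech[0].numpy(), sampling_rate=16000, return_tensors="pt")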
+with torch.no_grad():
+    generated_ids = model.generate(**inputs)
+    transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+
+print("📢 Transcription:", transcription)
+
+```
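Whisper works on windows of at most 30 seconds, so a longer recording has to be split before it is passed to `processor`. The following is a minimal sketch of one way to do that, assuming fixed, non-overlapping windows at 16 kHz. `chunk_bounds` is a hypothetical helper written for this illustration and is not part of `transformers`; cutting at fixed offsets may also split a word across two chunks.

```python
SAMPLE_RATE = 16000      # Hz, the rate the example above resamples to
CHUNK_SECONDS = 30       # Whisper's input window length


def chunk_bounds(num_samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Return (start, end) sample indices of consecutive windows covering the audio."""
    chunk_len = sample_rate * chunk_seconds
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


# A 70 s recording is covered by two full windows and one 10 s remainder.
bounds = chunk_bounds(70 * SAMPLE_RATE)
```

Each `(start, end)` pair can then be used to slice `speech[0]` before the slice is handed to `processor`.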
+
+## Training Details
+
+### Training Data
+
+The model was trained on:
+- [atlasia/DODa-audio-dataset](https://huggingface.co/datasets/atlasia/DODa-audio-dataset)
+- [adiren7/darija_speech_to_text](https://huggingface.co/datasets/adiren7/darija_speech_to_text)
+
+These datasets contain manually transcribed audio samples of Moroccan Darija.
+
+### Training Procedure
+
+#### Preprocessing
+- All audio was resampled to 16 kHz
+- Mel spectrograms were padded to 3000 frames (30 s max)
+- Transcripts were tokenized and clipped to at most 448 tokens
+- Decoder prompts were injected to ensure language/task alignment
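The padding and clipping steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the training code: the helper names, the zero pad value, and the frame-major list layout are all hypothetical, and in practice the Whisper feature extractor in `transformers` normally handles the padding itself.

```python
N_FRAMES = 3000          # 30 s of audio at 100 mel frames per second
MAX_LABEL_TOKENS = 448   # label length limit stated in the list above


def pad_or_trim_frames(frames, n_frames=N_FRAMES, pad_value=0.0):
    """Pad a list of per-frame feature vectors to n_frames, or truncate it."""
    if len(frames) >= n_frames:
        return frames[:n_frames]
    width = len(frames[0]) if frames else 0
    padding = [[pad_value] * width for _ in range(n_frames - len(frames))]
    return frames + padding


def clip_labels(token_ids, max_len=MAX_LABEL_TOKENS):
    """Keep at most max_len label tokens."""
    return list(token_ids)[:max_len]


# Toy input: 1200 frames (12 s) of 80-dimensional features and 500 label tokens.
features = pad_or_trim_frames([[1.0] * 80 for _ in range(1200)])
labels = clip_labels(range(500))
```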
+
+#### Training Hyperparameters
+- Batch size: 8 (gradient accumulation = 2)
+- Epochs: 10
+- Learning rate: 2e-6
+- Mixed precision: fp16
+- Weight decay: 0.01
+- Warmup steps: 500
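To make two of these settings concrete, the sketch below computes the effective batch size and a warmup learning rate. It assumes the batch size of 8 is per optimizer micro-step on a single GPU, and that warmup is a linear ramp from zero; the behaviour after warmup (held constant here) is an assumption, since the scheduler is not stated.

```python
BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
PEAK_LR = 2e-6
WARMUP_STEPS = 500


def effective_batch_size(batch_size=BATCH_SIZE, accum_steps=GRAD_ACCUM_STEPS):
    """Number of examples contributing to each optimizer update."""
    return batch_size * accum_steps


def warmup_lr(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Linear ramp from 0 to peak_lr over warmup_steps, then constant (assumed)."""
    if step >= warmup_steps:
        return peak_lr
    return peak_lr * step / warmup_steps


print("effective batch size:", effective_batch_size())
print("learning rate halfway through warmup:", warmup_lr(250))
```

With gradient accumulation of 2, each optimizer update sees 16 examples even though only 8 are held in memory at a time.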
+
+## Evaluation
+
+### Testing Data, Factors & Metrics
+
+#### Testing Data
+A held-out subset (10%) of the training datasets.
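A 10% hold-out can be produced with a seeded shuffle of example indices. The sketch below is one possible way to do it; the actual split procedure and seed used for this model are not documented, so `train_test_split_indices` and its defaults are hypothetical.

```python
import random


def train_test_split_indices(n_examples, test_fraction=0.1, seed=0):
    """Shuffle indices reproducibly and split off test_fraction of them."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_examples * test_fraction)
    return indices[n_test:], indices[:n_test]


# Toy example with 1000 items: 900 for training, 100 held out.
train_idx, test_idx = train_test_split_indices(1000)
```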
+
+#### Metrics
+- Word Error Rate (WER)
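WER is the word-level edit distance between the reference and the predicted transcript (substitutions, deletions, and insertions), divided by the number of reference words. Below is a minimal pure-Python sketch of that definition. It assumes whitespace tokenization and no text normalization; the evaluation code used for this model is not shown here and may differ.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
score = word_error_rate("a b c d", "a x c")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.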
+
+### Results
+
+#### 📊 Training Progress
+
+| Epoch | Training Loss | Validation Loss | Word Error Rate (WER) |
+|-------|----------------|------------------|------------------------|
+| 1     | 0.905000       | 0.831409         | 0.825147               |
+| 2     | 0.773200       | 0.712022         | 0.732625               |
+| 3     | 0.658900       | 0.652096         | 0.631158               |
+| 4     | 0.609100       | 0.608619         | 0.578152               |
+| 5     | 0.548400       | 0.579711         | 0.546444               |
+| 6     | 0.509700       | 0.561768         | 0.524927               |
+| 7     | 0.482000       | 0.551717         | 0.522067               |
+| 8     | 0.459400       | 0.545695         | 0.526979               |
+| 9     | 0.446500       | 0.543017         | 0.497141               |
+| 10    | 0.443200       | 0.542152         | 0.504545               |
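The table can be summarized with a few lines of Python. The sketch below copies the WER column as is and reports the epoch with the lowest WER and the relative reduction between the first and last epoch; the variable names are illustrative only.

```python
# WER per epoch, copied from the table above.
wer_by_epoch = {
    1: 0.825147, 2: 0.732625, 3: 0.631158, 4: 0.578152, 5: 0.546444,
    6: 0.524927, 7: 0.522067, 8: 0.526979, 9: 0.497141, 10: 0.504545,
}

# Epoch with the lowest WER.
best_epoch = min(wer_by_epoch, key=wer_by_epoch.get)

# Relative reduction between the first and the last epoch.
first, last = wer_by_epoch[1], wer_by_epoch[10]
relative_reduction = (first - last) / first

print(f"best epoch: {best_epoch} (WER {wer_by_epoch[best_epoch]:.4f})")
print(f"relative WER reduction, epoch 1 to 10: {relative_reduction:.1%}")
```

The lowest WER occurs at epoch 9 rather than at the final epoch, and WER fell by roughly 39% in relative terms over the run.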
+
+
+
+#### Summary
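+After 10 epochs, the model reached a WER of about 50% on the held-out set (0.505 at epoch 10; the lowest value, 0.497, came at epoch 9), down from 0.825 after the first epoch. This is a significant improvement over baseline multilingual Whisper models on Moroccan Darija.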
+
+## Environmental Impact
+
+Estimated based on training on a single A100 GPU for ~6.5 hours.
+
+- **Hardware Type:** A100
+- **Hours used:** ~6.5
+- **Cloud Provider:** Google Cloud (Colab)
+- **Compute Region:** Morocco
+
+## Technical Specifications
+
+### Model Architecture and Objective
+- Whisper (small) encoder-decoder architecture
+- Objective: sequence-to-sequence transcription
+
+### Compute Infrastructure
+- Google Colab Pro
+- 1x A100 GPU
+- PyTorch + Transformers 4.39
+
+## Citation
+
+```bibtex
+@misc{bentaleb2025whisperdarija,
+  title={Whisper Darija: Fine-tuned Whisper Model for Moroccan Arabic Speech},
+  author={Bentaleb, Ali},
+  year={2025},
+}
+```
+
+## Model Card Authors
+- Ali Bentaleb [@TaloCreations](https://huggingface.co/TaloCreations)
+
+
+## Model Card Contact
+- 📧 alitennis131800@gmail.com
+