---
library_name: transformers
tags:
- automatic-speech-recognition
- audio
- darija
- moroccan-arabic
- whisper
- fine-tuned
---

# Model Card for Whisper Darija (Fine-Tuned)

This model is a fine-tuned [OpenAI Whisper small](https://huggingface.co/openai/whisper-small) model for Moroccan Darija speech transcription. It is trained to transcribe Moroccan dialectal Arabic audio into text.

## Model Details

### Model Description

This model is a fine-tuned version of `giannitto/whisper-morocco-model`, trained on a dataset of Moroccan Darija audio and transcriptions. The fine-tuning process aimed to lower the model's Word Error Rate (WER) on spoken Darija, which is underrepresented in many multilingual speech models.

- **Developed by:** Bentaleb Ali
- **Model type:** Automatic Speech Recognition (ASR)
- **Language(s):** Moroccan Darija (Arabic dialect)
- **License:** Apache 2.0
- **Finetuned from model:** giannitto/whisper-morocco-model

### Model Sources

- **Repository:** https://huggingface.co/TaloCreations/whisper-darija-finetuned

## Uses

### Direct Use

This model is intended for transcription of Moroccan Darija audio into text. It can be used in:
- Voice assistants
- Media subtitling
- Dialectal speech processing
- Linguistic research

### Out-of-Scope Use

- Translation tasks (this model is for transcription, not translation)
- Other Arabic dialects outside Moroccan Darija

## Bias, Risks, and Limitations

- The model may perform poorly on noisy or low-quality recordings.
- The model may not generalize well to other dialects of Arabic.
- Biases in the training data (e.g., gender, age, region) may affect transcription accuracy.

### Recommendations

Carefully evaluate outputs when using the model in sensitive applications. Avoid using it in high-risk domains without human verification.

## How to Get Started with the Model

```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import torch
import torchaudio

# Load model and processor
processor = AutoProcessor.from_pretrained("TaloCreations/whisper-darija-finetuned")
model = AutoModelForSpeechSeq2Seq.from_pretrained("TaloCreations/whisper-darija-finetuned")
model.eval()

# Load the audio file; torchaudio returns a (channels, samples) tensor
speech, sr = torchaudio.load("path_to_record.wav")

# Downmix to mono so multi-channel recordings are handled too
speech = speech.mean(dim=0)

# Resample to the 16 kHz rate the model was trained on
if sr != 16000:
    resampler = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16000)
    speech = resampler(speech)

# Preprocess and generate
inputs = processor(speech.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    generated_ids = model.generate(**inputs)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print("📢 Transcription:", transcription)
```

## Training Details

### Training Data

The model was trained on:
- [atlasia/DODa-audio-dataset](https://huggingface.co/datasets/atlasia/DODa-audio-dataset)
- [adiren7/darija_speech_to_text](https://huggingface.co/datasets/adiren7/darija_speech_to_text)

These datasets contain manually transcribed audio samples of Moroccan Darija.

### Training Procedure

#### Preprocessing
- All audio was resampled to 16 kHz
- Mel spectrograms were padded to 3000 frames (30 s max)
- Transcripts were tokenized and clipped to <=448 tokens
- Decoder prompts were injected to ensure language/task alignment
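
The length limits above follow from simple arithmetic, sketched here in plain Python. This is an illustration only, not the training code: the helper names `pad_or_trim` and `clip_labels` are hypothetical, the hop length of 160 samples per mel frame is the value used by Whisper's feature extractor, and reading "clipped" as truncation (rather than filtering out long transcripts) is an assumption.

```python
SAMPLE_RATE = 16_000     # Hz, as stated in the preprocessing list
MAX_SECONDS = 30         # Whisper's fixed input window
HOP_LENGTH = 160         # samples per mel frame in Whisper's feature extractor
MAX_LABEL_TOKENS = 448   # label length cap from the preprocessing list

MAX_SAMPLES = SAMPLE_RATE * MAX_SECONDS   # 480000 samples in a 30 s window
MAX_FRAMES = MAX_SAMPLES // HOP_LENGTH    # 3000 mel frames


def pad_or_trim(samples, target_len=MAX_SAMPLES):
    """Zero-pad or truncate a list of audio samples to target_len (hypothetical helper)."""
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [0.0] * (target_len - len(samples))


def clip_labels(token_ids, max_len=MAX_LABEL_TOKENS):
    """Truncate tokenized transcripts to max_len tokens (assumed meaning of 'clipped')."""
    return token_ids[:max_len]


print(MAX_SAMPLES, MAX_FRAMES)
```

In practice the Whisper feature extractor in `transformers` performs the padding itself; the sketch only makes the 30 s / 3000-frame relationship explicit.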

#### Training Hyperparameters
- Batch size: 8 (gradient accumulation = 2)
- Epochs: 10
- Learning rate: 2e-6
- Mixed precision: fp16
- Weight decay: 0.01
- Warmup steps: 500
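
As a rough sketch, the hyperparameters above can be written as keyword arguments in the naming used by `Seq2SeqTrainingArguments` in `transformers`. The exact training script is not part of this card, so the mapping is an assumption, in particular that the batch size of 8 is the per-device value on the single GPU.

```python
# Hyperparameters from the list above, keyed by transformers argument names.
# Assumption: "Batch size: 8" is the per-device batch size on one GPU.
training_kwargs = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 10,
    "learning_rate": 2e-6,
    "fp16": True,
    "weight_decay": 0.01,
    "warmup_steps": 500,
}

# Gradients are accumulated over 2 steps before each optimizer update,
# so each update sees 8 * 2 examples on a single GPU.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
print("Effective batch size:", effective_batch_size)
```

With `transformers` installed, these could be passed as `Seq2SeqTrainingArguments(output_dir="whisper-darija-out", **training_kwargs)`, where the output directory name is a placeholder.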

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data
A held-out subset (10%) of the training datasets.

#### Metrics
- Word Error Rate (WER)
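
For reference, WER is the word-level edit distance (substitutions, deletions, and insertions) between the hypothesis and the reference, divided by the number of reference words. The following is a minimal pure-Python sketch of that definition; the function name `word_error_rate` is illustrative, and the evaluation reported here may have used a library implementation and text normalization that this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference
print(word_error_rate("a b c d", "a x c"))  # 0.5
```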

### Results

#### 📊 Training Progress

| Epoch | Training Loss | Validation Loss | Word Error Rate (WER) |
|-------|----------------|------------------|------------------------|
| 1     | 0.905000       | 0.831409         | 0.825147               |
| 2     | 0.773200       | 0.712022         | 0.732625               |
| 3     | 0.658900       | 0.652096         | 0.631158               |
| 4     | 0.609100       | 0.608619         | 0.578152               |
| 5     | 0.548400       | 0.579711         | 0.546444               |
| 6     | 0.509700       | 0.561768         | 0.524927               |
| 7     | 0.482000       | 0.551717         | 0.522067               |
| 8     | 0.459400       | 0.545695         | 0.526979               |
| 9     | 0.446500       | 0.543017         | 0.497141               |
| 10    | 0.443200       | 0.542152         | 0.504545               |


#### Summary
After 10 epochs, the model reached a WER of 0.505 (about 50%) on the held-out set, with the lowest WER, 0.497, at epoch 9. This is reported as a significant improvement over baseline multilingual Whisper models on Moroccan Darija.
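
The per-epoch figures can be checked directly from the training progress table. The short sketch below copies the WER column and computes the best epoch and the relative WER reduction between epoch 1 and epoch 10; the variable names are illustrative.

```python
# WER per epoch, copied from the training progress table
wer_by_epoch = {
    1: 0.825147, 2: 0.732625, 3: 0.631158, 4: 0.578152, 5: 0.546444,
    6: 0.524927, 7: 0.522067, 8: 0.526979, 9: 0.497141, 10: 0.504545,
}

# Epoch with the lowest validation WER
best_epoch = min(wer_by_epoch, key=wer_by_epoch.get)

# Relative reduction in WER from the first to the last epoch
relative_reduction = (wer_by_epoch[1] - wer_by_epoch[10]) / wer_by_epoch[1]

print(f"Best epoch: {best_epoch} (WER {wer_by_epoch[best_epoch]:.3f})")
print(f"Relative WER reduction, epoch 1 to 10: {relative_reduction:.1%}")
```

This compares epochs within the fine-tuning run only; it does not measure the gap to any baseline model.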

## Environmental Impact

Estimated based on training on a single A100 GPU for ~6.5 hours.

- **Hardware Type:** A100
- **Hours used:** ~6.5
- **Cloud Provider:** Google Cloud (Colab)
- **Compute Region:** Morocco

## Technical Specifications

### Model Architecture and Objective
- Whisper (small) encoder-decoder architecture
- Objective: sequence-to-sequence transcription

### Compute Infrastructure
- Google Colab Pro
- 1x A100 GPU
- PyTorch + Transformers 4.39

## Citation

```bibtex
@misc{bentaleb2025whisperdarija,
  title={Whisper Darija: Fine-tuned Whisper Model for Moroccan Arabic Speech},
  author={Bentaleb, Ali},
  year={2025},
}
```

## Model Card Authors
- Ali Bentaleb [@TaloCreations](https://huggingface.co/TaloCreations)


## Model Card Contact
- 📧 alitennis131800@gmail.com