library_name: transformers
tags: automatic-speech-recognition, audio, darija, moroccan-arabic, whisper, fine-tuned

Model Card for Whisper Darija (Fine-Tuned)

This is an OpenAI Whisper small model fine-tuned for Moroccan Darija speech transcription. It transcribes Moroccan dialectal Arabic audio into text.

Model Details

Model Description

This model is a fine-tuned version of giannitto/whisper-morocco-model, trained on a dataset of Moroccan Darija audio paired with transcriptions. Fine-tuning aimed to reduce the model's Word Error Rate (WER) on spoken Darija, a dialect that is underrepresented in many multilingual speech models.

  • Developed by: Bentaleb Ali
  • Model type: Automatic Speech Recognition (ASR)
  • Language(s): Moroccan Darija (Arabic dialect)
  • License: Apache 2.0
  • Finetuned from model: giannitto/whisper-morocco-model

Model Sources

Uses

Direct Use

This model is intended for transcription of Moroccan Darija audio into text. It can be used in:

  • Voice assistants
  • Media subtitling
  • Dialectal speech processing
  • Linguistic research

Out-of-Scope Use

  • Translation tasks (this model is for transcription, not translation)
  • Transcription of Arabic dialects other than Moroccan Darija

Bias, Risks, and Limitations

  • The model may perform poorly on noisy or low-quality recordings.
  • The model may not generalize well to other dialects of Arabic.
  • Biases in the training data (e.g., gender, age, region) may affect transcription accuracy.

Recommendations

Carefully evaluate outputs when using the model in sensitive applications. Avoid using it in high-risk domains without human verification.

How to Get Started with the Model

from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import torch
import torchaudio

# Load the fine-tuned model and its processor (feature extractor + tokenizer)
processor = AutoProcessor.from_pretrained("TaloCreations/whisper-darija-finetuned")
model = AutoModelForSpeechSeq2Seq.from_pretrained("TaloCreations/whisper-darija-finetuned")
model.eval()

# torchaudio returns a (channels, samples) tensor and the file's sample rate
speech, sr = torchaudio.load("path_to_record.wav")

# Downmix multi-channel recordings to mono
speech = speech.mean(dim=0)

# Whisper expects 16 kHz input
if sr != 16000:
    resampler = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16000)
    speech = resampler(speech)

# Compute log-Mel input features and generate token IDs
inputs = processor(speech.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    generated_ids = model.generate(**inputs)

# Convert token IDs back to text, dropping special tokens
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print("📢 Transcription:", transcription)

Training Details

Training Data

The model was trained on:

These datasets contain manually transcribed audio samples of Moroccan Darija.

Training Procedure

Preprocessing

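  • All audio was resampled to 16 kHz
  • Log-Mel spectrograms were padded to 3000 frames (30 s maximum)
  • Transcripts were tokenized and truncated to at most 448 tokens, the Whisper decoder's maximum target length
  • Decoder prompts specifying the language and task were injected so that outputs stay aligned with the intended language and transcription task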
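The preprocessing steps above can be sketched in plain NumPy. This is a minimal illustration, not the training script: it assumes Whisper's standard front end (16 kHz audio with a 160-sample hop, so a 30 s window is 480,000 samples and 3,000 frames) and uses the Arabic token `<|ar|>` as the language prompt. That language token is an assumption, since the card does not say which one was used for Darija. The helper names `pad_or_trim`, `num_frames`, `clip_labels` and `decoder_prefix` are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16_000                      # Whisper's expected input rate (Hz)
HOP_LENGTH = 160                          # samples between frames (10 ms at 16 kHz)
CHUNK_SECONDS = 30                        # Whisper's fixed input window
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples
MAX_LABEL_TOKENS = 448                    # Whisper decoder's maximum target length


def pad_or_trim(audio: np.ndarray, length: int = N_SAMPLES) -> np.ndarray:
    """Zero-pad or truncate a 1-D waveform to exactly `length` samples."""
    if audio.shape[0] >= length:
        return audio[:length]
    return np.pad(audio, (0, length - audio.shape[0]))


def num_frames(n_samples: int, hop_length: int = HOP_LENGTH) -> int:
    """Number of log-Mel frames for a padded clip (one frame per hop)."""
    return n_samples // hop_length


def clip_labels(token_ids: list[int], max_len: int = MAX_LABEL_TOKENS) -> list[int]:
    """Truncate a tokenized transcript to at most `max_len` tokens."""
    return token_ids[:max_len]


def decoder_prefix(language: str = "ar", task: str = "transcribe") -> list[str]:
    """Special tokens that fix the language and task before the transcript.
    Using "ar" for Darija is an assumption; Whisper has no Darija-specific token."""
    return ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>", "<|notimestamps|>"]


# Example: a 12-second clip is zero-padded to the full 30 s window.
clip = np.zeros(12 * SAMPLE_RATE, dtype=np.float32)
padded = pad_or_trim(clip)
print(padded.shape[0], num_frames(padded.shape[0]))   # 480000 3000
print(decoder_prefix())
```

In the Transformers library, the feature extractor handles the padding and spectrogram step, so code like this is mainly useful for checking the numbers in the list above.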

Training Hyperparameters

  • Batch size: 8 (gradient accumulation = 2, for an effective batch size of 16)
  • Epochs: 10
  • Learning rate: 2e-6
  • Mixed precision: fp16
  • Weight decay: 0.01
  • Warmup steps: 500
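As a rough sketch of how these settings interact: 8 examples per step with 2 accumulation steps gives an effective batch of 16, and the learning rate ramps up over the first 500 optimizer steps. The card does not state the schedule after warmup, so the sketch assumes a linear decay to zero. That is the Hugging Face `Trainer` default (`lr_scheduler_type="linear"`), and the formula matches `get_linear_schedule_with_warmup`. The class and function names are hypothetical, and `total_steps` depends on the dataset size, which is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters from the model card (field names are illustrative)."""
    batch_size: int = 8
    gradient_accumulation_steps: int = 2
    num_epochs: int = 10
    learning_rate: float = 2e-6
    weight_decay: float = 0.01
    warmup_steps: int = 500
    fp16: bool = True

    @property
    def effective_batch_size(self) -> int:
        # Gradients from 2 micro-batches of 8 are accumulated before each optimizer step.
        return self.batch_size * self.gradient_accumulation_steps


def learning_rate_at(step: int, total_steps: int, cfg: FineTuneConfig) -> float:
    """Linear warmup to cfg.learning_rate, then linear decay to 0 at total_steps
    (assumed schedule; same shape as get_linear_schedule_with_warmup)."""
    if step < cfg.warmup_steps:
        scale = step / max(1, cfg.warmup_steps)
    else:
        scale = max(0.0, (total_steps - step) / max(1, total_steps - cfg.warmup_steps))
    return cfg.learning_rate * scale


cfg = FineTuneConfig()
print(cfg.effective_batch_size)              # 16
for step in (0, 250, 500, 2_750, 5_000):     # 5,000 total steps is a made-up example
    print(step, learning_rate_at(step, 5_000, cfg))
```

With a learning rate as small as 2e-6, the warmup mainly protects the already fine-tuned starting checkpoint from large early updates.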

Evaluation

Testing Data, Factors & Metrics

Testing Data

A held-out subset (10%) of the training datasets.
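A reproducible 90/10 split of this kind can be sketched with a seeded shuffle. The card does not say how the 10% subset was drawn (random or stratified, or with which seed), so the function below is a hypothetical illustration using only the standard library. With Hugging Face `datasets`, `Dataset.train_test_split(test_size=0.1, seed=...)` does the same job.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def holdout_split(items: Sequence[T], test_fraction: float = 0.1,
                  seed: int = 42) -> tuple[list[T], list[T]]:
    """Shuffle a copy of `items` with a fixed seed and hold out `test_fraction` of it."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical clip IDs; in practice these would be (audio, transcript) pairs.
clips = [f"clip_{i:04d}" for i in range(1000)]
train, test = holdout_split(clips)
print(len(train), len(test))   # 900 100
```

Fixing the seed keeps the held-out set identical across runs, so WER values from different epochs or experiments are measured on the same clips.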

Metrics

  • Word Error Rate (WER)
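WER counts the word substitutions (S), deletions (D) and insertions (I) needed to turn the model's output into the reference, divided by the number of reference words N: WER = (S + D + I) / N. The sketch below computes it with a word-level edit distance in plain Python. It is an illustration, not the evaluation code used for this card; libraries such as `jiwer` or the Hugging Face `evaluate` "wer" metric are the usual choice, and their text normalization may differ. Because insertions count, WER can exceed 1.0.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed so far
    # and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)          # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,                 # deletion: reference word missing
                curr[j - 1] + 1,             # insertion: extra hypothesis word
                prev[j - 1] + cost,          # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Transliterated example: one substitution (nmshi -> mshi) and one deletion (l) over 5 words.
print(word_error_rate("ana bghit nmshi l dar", "ana bghit mshi dar"))   # 0.4
```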

Results

📊 Training Progress

Epoch | Training loss | Validation loss | WER
1 | 0.905000 | 0.831409 | 0.825147
2 | 0.773200 | 0.712022 | 0.732625
3 | 0.658900 | 0.652096 | 0.631158
4 | 0.609100 | 0.608619 | 0.578152
5 | 0.548400 | 0.579711 | 0.546444
6 | 0.509700 | 0.561768 | 0.524927
7 | 0.482000 | 0.551717 | 0.522067
8 | 0.459400 | 0.545695 | 0.526979
9 | 0.446500 | 0.543017 | 0.497141
10 | 0.443200 | 0.542152 | 0.504545

WER is given as a fraction (0.504545 = 50.5%).

Summary

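After 10 epochs, evaluation WER reached about 50% (0.505 at epoch 10; the lowest value, 0.497, was at epoch 9), down from 82.5% after the first epoch. This is a significant improvement over baseline multilingual Whisper models on Moroccan Darija.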
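The per-epoch numbers in the table can be checked with a few lines of Python. The values below are copied from the table; the snippet only finds the epoch with the lowest WER and the relative reduction from epoch 1.

```python
# (epoch, training loss, validation loss, WER) copied from the results table
history = [
    (1, 0.905000, 0.831409, 0.825147),
    (2, 0.773200, 0.712022, 0.732625),
    (3, 0.658900, 0.652096, 0.631158),
    (4, 0.609100, 0.608619, 0.578152),
    (5, 0.548400, 0.579711, 0.546444),
    (6, 0.509700, 0.561768, 0.524927),
    (7, 0.482000, 0.551717, 0.522067),
    (8, 0.459400, 0.545695, 0.526979),
    (9, 0.446500, 0.543017, 0.497141),
    (10, 0.443200, 0.542152, 0.504545),
]

# Row with the smallest WER (4th column)
best_epoch, *_, best_wer = min(history, key=lambda row: row[3])
first_wer = history[0][3]
final_wer = history[-1][3]
relative_reduction = (first_wer - best_wer) / first_wer

print(f"best epoch: {best_epoch}, WER {best_wer:.1%}")
print(f"final WER: {final_wer:.1%}")
print(f"relative WER reduction, epoch 1 to best: {relative_reduction:.1%}")
```

Because the lowest WER is at epoch 9 rather than epoch 10, keeping the best checkpoint can matter. With `Seq2SeqTrainingArguments`, one option is `load_best_model_at_end=True` with `metric_for_best_model="wer"` and `greater_is_better=False`, assuming a `compute_metrics` function that reports a `wer` key.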

Environmental Impact

Estimates are based on roughly 6.5 hours of training on a single A100 GPU.

  • Hardware Type: A100
  • Hours used: ~6.5
  • Cloud Provider: Google Cloud (Colab)
  • Compute Region: Morocco
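The usual back-of-the-envelope estimate, the approach behind the ML CO2 Impact calculator, multiplies average power draw by training time to get energy, then multiplies by the grid's carbon intensity. The card reports only the hardware and hours. The power draw and carbon-intensity values in the example are placeholders to replace with real figures, not measurements for this run.

```python
def training_energy_kwh(avg_power_watts: float, hours: float, num_gpus: int = 1) -> float:
    """Accelerator energy: watts x hours / 1000 (ignores CPU, cooling and data-center overhead)."""
    return avg_power_watts * hours * num_gpus / 1000


def emissions_kg_co2e(energy_kwh: float, grid_kg_per_kwh: float) -> float:
    """Emissions = energy x carbon intensity of the electricity used."""
    return energy_kwh * grid_kg_per_kwh


# ~6.5 h on 1x A100 comes from the card; 400 W and 0.5 kg CO2e/kWh are placeholder values.
energy = training_energy_kwh(avg_power_watts=400, hours=6.5)
print(f"{energy:.2f} kWh, {emissions_kg_co2e(energy, grid_kg_per_kwh=0.5):.2f} kg CO2e")
```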

Technical Specifications

Model Architecture and Objective

  • Whisper (small) encoder-decoder architecture
  • Objective: sequence-to-sequence transcription
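Concretely, the sequence-to-sequence objective is next-token cross-entropy under teacher forcing. The decoder sees the target tokens shifted right by one position, starting from a decoder-start token, and is trained to predict each next token. Padding positions marked with -100 are ignored. This mirrors how Hugging Face Whisper models compute the loss when `labels` are passed, but the NumPy code below is a simplified sketch with toy numbers, not the library implementation. The function names are hypothetical.

```python
import numpy as np

IGNORE_INDEX = -100  # label value for padding positions excluded from the loss


def shift_right(labels: np.ndarray, decoder_start_id: int, pad_id: int) -> np.ndarray:
    """Teacher forcing: the decoder input at position t is the label at t-1."""
    shifted = np.empty_like(labels)
    shifted[..., 0] = decoder_start_id
    shifted[..., 1:] = labels[..., :-1]
    shifted[shifted == IGNORE_INDEX] = pad_id   # -100 is not a valid token id
    return shifted


def masked_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean next-token cross-entropy over positions whose label is not IGNORE_INDEX."""
    # Numerically stable log-softmax over the vocabulary axis
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    mask = labels != IGNORE_INDEX
    safe_labels = np.where(mask, labels, 0)     # placeholder index for ignored positions
    token_log_probs = np.take_along_axis(log_probs, safe_labels[..., None], axis=-1)[..., 0]
    return float(-(token_log_probs * mask).sum() / mask.sum())


# Toy example: vocabulary of 5 tokens, one sequence of 4 label positions (last is padding).
labels = np.array([[3, 1, 4, IGNORE_INDEX]])
decoder_inputs = shift_right(labels, decoder_start_id=0, pad_id=2)
print(decoder_inputs)                                # [[0 3 1 4]]
uniform_logits = np.zeros((1, 4, 5))
print(masked_cross_entropy(uniform_logits, labels))  # equals log(5): every token equally likely
```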

Compute Infrastructure

  • Google Colab Pro
  • 1x A100 GPU
  • PyTorch + Transformers 4.39

Citation

  title={Whisper Darija: Fine-tuned Whisper Model for Moroccan Arabic Speech},
  author={Bentaleb, Ali},
  year={2025},
}

Model Card Authors

Model Card Contact

Description
Model synced from source: TaloCreations/whisper-darija-finetuned