ModelHub XC 1b676f7889 Initialize project; model provided by the ModelHub XC community

Model: TaloCreations/whisper-darija-finetuned
Source: Original Platform

2026-05-08 11:34:43 +08:00

Model Card for Whisper Darija (Fine-Tuned)

This is an OpenAI Whisper small model fine-tuned for Moroccan Darija speech transcription. It is trained to transcribe Moroccan dialectal Arabic from audio.

Model Details

Model Description

This model is a fine-tuned version of giannitto/whisper-morocco-model, trained on a dataset of Moroccan Darija audio and transcriptions. The fine-tuning process aimed to reduce the model's Word Error Rate (WER) on spoken Darija, which is underrepresented in many multilingual speech models.

Developed by: Bentaleb Ali
Model type: Automatic Speech Recognition (ASR)
Language(s): Moroccan Darija (Arabic dialect)
License: Apache 2.0
Finetuned from model: giannitto/whisper-morocco-model

Model Sources

Repository: https://huggingface.co/TaloCreations/whisper-darija-finetuned

Uses

Direct Use

This model is intended for transcription of Moroccan Darija audio into text. It can be used in:

Voice assistants
Media subtitling
Dialectal speech processing
Linguistic research

Out-of-Scope Use

Translation tasks (this model is for transcription, not translation)
Other Arabic dialects outside Moroccan Darija

Bias, Risks, and Limitations

The model may perform poorly on noisy or low-quality recordings.
The model may not generalize well to other dialects of Arabic.
Biases in the training data (e.g., gender, age, region) may affect transcription accuracy.

Recommendations

Carefully evaluate outputs when using the model in sensitive applications. Avoid using it in high-risk domains without human verification.

How to Get Started with the Model

from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import torch
import torchaudio

# Load model and processor
processor = AutoProcessor.from_pretrained("TaloCreations/whisper-darija-finetuned")
model = AutoModelForSpeechSeq2Seq.from_pretrained("TaloCreations/whisper-darija-finetuned")
model.eval()

# Load the recording; torchaudio returns a (channels, samples) tensor and the sampling rate
speech, sr = torchaudio.load("path_to_record.wav")

# Whisper expects 16 kHz audio, so resample if needed
if sr != 16000:
    resampler = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16000)
    speech = resampler(speech)

# Preprocess (first channel only) and generate
inputs = processor(speech[0].numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    generated_ids = model.generate(**inputs)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print("📢 Transcription:", transcription)
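Whisper works on windows of at most 30 seconds, so a longer recording has to be split before it is passed to the processor. The helper below is a minimal sketch of one way to do that, assuming simple consecutive chunks with no overlap; `chunk_bounds` is a hypothetical name and is not part of this repository or of Transformers.

```python
def chunk_bounds(num_samples, sampling_rate=16000, chunk_seconds=30):
    """Return (start, end) sample offsets for consecutive, non-overlapping chunks.

    Assumption: plain fixed-size chunking; the last chunk may be shorter.
    """
    if num_samples < 0 or sampling_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("num_samples must be >= 0; rate and chunk length must be > 0")
    chunk_len = sampling_rate * chunk_seconds
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


# A 70-second recording at 16 kHz yields two full chunks and one 10-second remainder.
bounds = chunk_bounds(16000 * 70)
```

Each `(start, end)` pair can be used to slice the mono waveform, and each slice can then be transcribed on its own. Words that straddle a chunk boundary may be cut, which is the cost of skipping overlap in this sketch.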

Training Details

Training Data

The model was trained on:

These datasets contain manually transcribed audio samples of Moroccan Darija.

Training Procedure

Preprocessing

All audio was resampled to 16 kHz
Mel spectrograms were padded to 3000 frames (30 s maximum)
Transcripts were tokenized and truncated to at most 448 tokens
Decoder prompts were injected to ensure language/task alignment
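The numbers in these steps fit together: 30 seconds at 16 kHz is 480,000 samples, and with Whisper's hop length of 160 samples per mel frame that gives 3000 frames. The sketch below illustrates the padding and truncation on plain Python lists. It is not the author's preprocessing code; the helper names `pad_or_trim` and `clip_labels` are hypothetical, and padding the waveform with zeros before feature extraction is an assumption about how the 3000-frame length is reached.

```python
SAMPLING_RATE = 16_000      # Hz, as stated in the preprocessing steps
HOP_LENGTH = 160            # samples per mel frame (Whisper feature extractor default)
MAX_SECONDS = 30
MAX_SAMPLES = SAMPLING_RATE * MAX_SECONDS   # 480_000 samples
MAX_FRAMES = MAX_SAMPLES // HOP_LENGTH      # 3000 mel frames
MAX_LABEL_TOKENS = 448


def pad_or_trim(samples, length=MAX_SAMPLES):
    """Zero-pad or cut a mono waveform so it has exactly `length` samples."""
    samples = list(samples[:length])
    return samples + [0.0] * (length - len(samples))


def clip_labels(token_ids, max_len=MAX_LABEL_TOKENS):
    """Keep at most `max_len` label tokens."""
    return list(token_ids[:max_len])


padded = pad_or_trim([0.1] * 1000)        # short clip, padded with silence
labels = clip_labels(list(range(500)))    # over-long transcript, truncated
```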

Training Hyperparameters

Batch size: 8 (gradient accumulation = 2)
Epochs: 10
Learning rate: 2e-6
Mixed precision: fp16
Weight decay: 0.01
Warmup steps: 500
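Two of these settings interact. A batch size of 8 with 2 gradient accumulation steps on a single GPU gives an effective batch of 16 samples per optimizer update, and the 500 warmup steps ramp the learning rate up to 2e-6. The sketch below works through both; it assumes linear warmup (the card does not state the scheduler type) and does not model any decay after warmup. The function names are hypothetical.

```python
PER_DEVICE_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
LEARNING_RATE = 2e-6
WARMUP_STEPS = 500


def effective_batch_size(per_device=PER_DEVICE_BATCH_SIZE,
                         accum_steps=GRAD_ACCUM_STEPS,
                         num_devices=1):
    """Samples contributing to each optimizer update."""
    return per_device * accum_steps * num_devices


def warmup_lr(step, peak_lr=LEARNING_RATE, warmup_steps=WARMUP_STEPS):
    """Learning rate under linear warmup (an assumption), held at peak_lr afterwards."""
    if step < 0:
        raise ValueError("step must be non-negative")
    return peak_lr * min(1.0, step / warmup_steps)


print(effective_batch_size())   # 16
print(warmup_lr(250))           # halfway through warmup
```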

Evaluation

Testing Data, Factors & Metrics

Testing Data

A held-out subset (10%) of the training datasets.

Metrics

Word Error Rate (WER)
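WER is the word-level edit distance between the reference transcript and the model output (substitutions, deletions, and insertions), divided by the number of words in the reference. The function below is a minimal sketch of that definition. It assumes whitespace tokenization and applies no text normalization, so it may not match the exact numbers produced by the evaluation library used for this model.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four gives a WER of 0.25.
print(word_error_rate("a b c d", "a x c d"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.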

Results

📊 Training Progress

Epoch	Training Loss	Validation Loss	Word Error Rate (WER)
1	0.905000	0.831409	0.825147
2	0.773200	0.712022	0.732625
3	0.658900	0.652096	0.631158
4	0.609100	0.608619	0.578152
5	0.548400	0.579711	0.546444
6	0.509700	0.561768	0.524927
7	0.482000	0.551717	0.522067
8	0.459400	0.545695	0.526979
9	0.446500	0.543017	0.497141
10	0.443200	0.542152	0.504545
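The WER column does not fall monotonically: it rises slightly at epochs 8 and 10. The short script below, using only the values from the table above, finds the epoch with the lowest WER and the relative reduction from epoch 1. It is an illustration of how to read the table, not part of the original training code.

```python
# WER per epoch, copied from the training progress table
wer_by_epoch = {
    1: 0.825147, 2: 0.732625, 3: 0.631158, 4: 0.578152, 5: 0.546444,
    6: 0.524927, 7: 0.522067, 8: 0.526979, 9: 0.497141, 10: 0.504545,
}

# Epoch with the lowest validation WER
best_epoch = min(wer_by_epoch, key=wer_by_epoch.get)

# Relative reduction in WER from the first epoch to the best epoch
relative_reduction = (wer_by_epoch[1] - wer_by_epoch[best_epoch]) / wer_by_epoch[1]

print(f"best epoch: {best_epoch}, WER: {wer_by_epoch[best_epoch]:.3f}")
print(f"relative WER reduction vs. epoch 1: {relative_reduction:.1%}")
```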

Summary

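After 10 epochs, the model reached a WER of about 50% (0.505 at epoch 10, with a low of 0.497 at epoch 9), down from 0.825 after the first epoch. This is a significant improvement over baseline multilingual Whisper models on Moroccan Darija.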

Environmental Impact

Estimated based on training on a single A100 GPU for ~6.5 hours.

Hardware Type: A100
Hours used: ~6.5
Cloud Provider: Google Cloud (Colab)
Compute Region: Morocco

Technical Specifications

Model Architecture and Objective

Whisper (small) encoder-decoder architecture
Objective: sequence-to-sequence transcription

Compute Infrastructure

Google Colab Pro
1x A100 GPU
PyTorch + Transformers 4.39

Citation

  title={Whisper Darija: Fine-tuned Whisper Model for Moroccan Arabic Speech},
  author={Bentaleb, Ali},
  year={2025},
}

Model Card Authors

Ali Bentaleb @TaloCreations

Model Card Contact

📧 alitennis131800@gmail.com