---
library_name: transformers
license: gpl-3.0
datasets:
- MohamedRashad/arabic-english-code-switching
language:
- ar
- en
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---

# 👳 Arabic-Whisper-CodeSwitching-Edition

This model is a fine-tuned version of [Whisper Large v2 by OpenAI](https://huggingface.co/openai/whisper-large-v2), trained on an [Arabic-English-code-switching](https://huggingface.co/datasets/MohamedRashad/arabic-english-code-switching) dataset.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6116d0584ef9fdfbf45dc4d9/w5AXicC8X3kK1AC30OVmH.png)

## 📝 Model Details

### Model Description

Arabic-Whisper-CodeSwitching-Edition is designed to transcribe Arabic audio that contains embedded English words. It builds on the original Whisper Large v2 and improves its performance on Arabic-English code-switched speech.

- **Developed by:** العبد لله
- **Model type:** Speech Recognition
- **Language(s) (NLP):** Arabic, English (in the context of Arabic audio)
- **License:** GPL-3.0

### Model Sources

- **Repository for data collection:** https://github.com/MohamedAliRashad/youtube-audio-collector
- **Demo:** https://huggingface.co/spaces/MohamedRashad/Arabic-Whisper-CodeSwitching-Edition

## 👷 Uses

### Direct Use

The model can be used directly for transcribing Arabic speech that includes English words. It is particularly useful in multilingual environments where code-switching is common.

### Out-of-Scope Use

The model is not expected to perform well on speech in languages other than Arabic and English, including code-switching between other language pairs.

## 😨 Bias, Risks, and Limitations

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information needed for further recommendations.
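
One way to act on this recommendation is to measure word error rate (WER, the metric listed in this card's metadata) on a small sample of your own audio before relying on the model. The sketch below is a minimal, dependency-free WER computation. The function name `word_error_rate` and the example sentences are illustrative assumptions, not part of this model's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein recurrence).
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,  # deletion
                    curr[j - 1] + 1,  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref_words)


# Hypothetical code-switched example: the hypothesis drops the last word.
reference = "أنا عملت deploy للموديل امبارح"
hypothesis = "أنا عملت deploy للموديل"
print(word_error_rate(reference, hypothesis))
```

This sketch compares whitespace-separated tokens exactly, so any text normalization (diacritics, punctuation, casing of English words) would need to be applied to both strings beforehand.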

## 🔍 How to Get Started with the Model

Use the code below to get started with the model.

```python
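# librosa is one option for loading audio (an assumption; any loader that
# returns a 16 kHz mono float array works).
import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "MohamedRashad/Arabic-Whisper-CodeSwitching-Edition"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# The processor expects a waveform, not a file path, sampled at 16 kHz.
audio, sampling_rate = librosa.load("path_to_audio_file.wav", sr=16000)

# Convert the waveform to log-mel input features, then generate and decode.
inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
generated_ids = model.generate(inputs["input_features"])
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(transcription)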
```
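
For recordings longer than Whisper's 30-second input window, the `transformers` ASR pipeline can split the audio into chunks. The following is a minimal sketch under stated assumptions: `build_transcriber` is a hypothetical helper name, and `chunk_length_s=30` is an assumed setting rather than a value recommended by the model author.

```python
MODEL_ID = "MohamedRashad/Arabic-Whisper-CodeSwitching-Edition"


def build_transcriber(model_id: str = MODEL_ID, chunk_length_s: float = 30.0):
    """Return an ASR pipeline that splits long audio into fixed-size chunks.

    `build_transcriber` is a hypothetical helper; `chunk_length_s=30` is an
    assumed setting chosen to match Whisper's 30-second input window.
    """
    # Imported here so that defining the helper does not load the library.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=chunk_length_s,
    )
```

Calling `build_transcriber()("path_to_audio_file.wav")["text"]` should return the transcription as a single string. Passing a file path this way relies on `ffmpeg` being installed for audio decoding.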

## 👨‍🎓 Citation

### BibTeX:
```bibtex
@misc{rashad2024arabicwhisper,
  title={Arabic-Whisper-CodeSwitching-Edition},
  author={Mohamed Rashad},
  year={2024},
  url={https://huggingface.co/spaces/MohamedRashad/Arabic-Whisper-CodeSwitching-Edition},
}
```

### APA:
Rashad, M. (2024). Arabic-Whisper-CodeSwitching-Edition. Retrieved from https://huggingface.co/spaces/MohamedRashad/Arabic-Whisper-CodeSwitching-Edition