---
library_name: transformers
tags:
- SmolLM-3B
- Arabic
language:
- ar
metrics:
- chrf
base_model:
- HuggingFaceTB/SmolLM3-3B
pipeline_tag: text-generation
---

# Model Card for unige-fti/Aladdin-3B

Multidialectal Arabic generation and translation model fine-tuned for dialect fidelity and diglossia.

## Model Details

### Model Description

- **Base model:** SmolLM3-3B
- **Architecture:** Decoder-only causal transformer (SmolLM architecture)
- **Parameters:** ~3B
- **Language coverage:** Arabic dialects, Modern Standard Arabic (MSA), English

Primary tasks:
- Dialectal Arabic (DA) generation
- Bidirectional translation (DA ↔ MSA ↔ English)
- Controlled generation conditioned on dialect instructions

This model was fine-tuned by the Aladdin-FTI team for the AMIYA shared task to jointly optimize the two objectives below, using instruction-formatted prompts:

- Machine translation (semantic adequacy & diglossia)
  ```
  Translate from English into Egyptian Arabic:
  <SOURCE>
  ```
- Instruction-conditioned generation (dialect fidelity)
  ```
  Complete the sentence in Moroccan Arabic:
  <PREFIX>
  ```

The objective balances meaning preservation and dialect naturalness in Arabic diglossia settings.
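
The two prompt formats above can be assembled programmatically. The following is a minimal sketch, not the authors' own preprocessing code: the helper names `translation_prompt` and `completion_prompt` are hypothetical, and only the labels "English", "Egyptian Arabic", and "Moroccan Arabic" appear in this card. Other language or dialect labels, the single newline between instruction and text, and the whitespace stripping are assumptions.

```python
def translation_prompt(source: str, source_lang: str, target_lang: str) -> str:
    """Build a translation instruction followed by the source text.

    Hypothetical helper: mirrors the 'Translate from X into Y:' template
    shown in this card, with the source text on the next line.
    """
    return f"Translate from {source_lang} into {target_lang}:\n{source.strip()}"


def completion_prompt(prefix: str, dialect: str) -> str:
    """Build a sentence-completion instruction followed by the prefix.

    Hypothetical helper: mirrors the 'Complete the sentence in X:' template
    shown in this card, with the prefix on the next line.
    """
    return f"Complete the sentence in {dialect}:\n{prefix.strip()}"


# The two examples from the card, with illustrative inputs.
mt_prompt = translation_prompt("Where is the train station?", "English", "Egyptian Arabic")
gen_prompt = completion_prompt("أنا غادي", "Moroccan Arabic")
```

Keeping prompt construction in one place makes it easier to stay consistent with the wording used during fine-tuning, since instruction-tuned models can be sensitive to small template changes.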


### Model Sources

- **Repository:** [GitHub repository](https://github.com/drvenabili/mtfinetune_amiya/tree/main)
- **Paper:** [https://arxiv.org/abs/2602.16290](https://arxiv.org/abs/2602.16290)


## How to Get Started with the Model

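Official usage instructions are still to be added. The card metadata declares `transformers` as the library and `text-generation` as the pipeline tag.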
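
A minimal sketch of loading the model with `transformers`, based only on the metadata of this card (`library_name: transformers`, `pipeline_tag: text-generation`). It is not an official recipe. It assumes the checkpoint loads with `AutoModelForCausalLM`, that the installed `transformers` release supports the SmolLM3 architecture, and that the instruction is passed as plain text. If the model was fine-tuned with the base model's chat template, the prompt may need to go through `tokenizer.apply_chat_template` instead. The helper names `load_model` and `generate` and the decoding settings are illustrative.

```python
MODEL_ID = "unige-fti/Aladdin-3B"


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model once; downloads the weights on first use."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 128) -> str:
    """Greedy-decode a continuation for a plain-text instruction prompt (assumed format)."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0, prompt_length:], skip_special_tokens=True)


# Example usage (commented out because it downloads the model weights):
# tokenizer, model = load_model()
# print(generate(tokenizer, model,
#                "Translate from English into Egyptian Arabic:\nWhere is the train station?"))
```

Greedy decoding (`do_sample=False`) is used here only to keep the output deterministic. The decoding settings used for the shared-task submission are not stated in this card.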

## Training Details


### Training Data

Closed-track training data only. The datasets span multiple dialect regions and domains.

Parallel corpora:
- SauDial
- Casablanca corpus
- JODA
- UFAL Levantine
- DODA
- Atlas

Monolingual dialect corpora:
- MADAR
- Shami
- Saudi Tweets
- EDGAD / EDC
- HABIBI lyrics

## Citation

If you use this model in your research, please cite the following paper:

```
@inproceedings{mutal2026aladdinfti,
  title     = {Aladdin-FTI @ AMIYA: Three Wishes for Arabic NLP: Fidelity, Diglossia, and Multidialectal Generation},
  author    = {Mutal, Jonathan and Al Almaoui, Perla and Hengchen, Simon and Bouillon, Pierrette},
  booktitle = {Proceedings of the AMIYA Shared Task, co-located with VarDial at EACL 2026},
  year      = {2026},
  address   = {Rabat, Morocco},
  publisher = {Association for Computational Linguistics},
}
```

## Compute Infrastructure

The computations were performed at the University of Geneva using the Baobab HPC service.