---
library_name: transformers
tags:
- SmolLM-3B
- Arabic
language:
- ar
metrics:
- chrf
base_model:
- HuggingFaceTB/SmolLM3-3B
pipeline_tag: text-generation
---

# Model Card for unige-fti/Aladdin-3B

Multidialectal Arabic generation and translation model fine-tuned for dialect fidelity and diglossia.

## Model Details

### Model Description

- **Base model:** SmolLM3-3B
- **Architecture:** Decoder-only causal transformer (SmolLM architecture)
- **Parameters:** ~3B
- **Language coverage:** Arabic dialects, Modern Standard Arabic (MSA), English

Primary tasks:
- Dialectal Arabic generation
- Bidirectional translation (DA ↔ MSA ↔ English)
- Controlled generation conditioned on dialect instructions

This model was fine-tuned by the Aladdin-FTI team for the AMIYA shared task to jointly optimize:
- Machine translation (semantic adequacy & diglossia)
```
Instruction-formatted prompts:
Translate from English into Egyptian Arabic:
```
- Instruction-conditioned generation (dialect fidelity)
```
Complete the sentence in Moroccan Arabic:
```

The objective balances meaning preservation and dialect naturalness in Arabic diglossia settings.

### Model Sources

- **Repository:** [GitHub repository](https://github.com/drvenabili/mtfinetune_amiya/tree/main)
- **Paper:** [https://arxiv.org/abs/2602.16290](https://arxiv.org/abs/2602.16290)

## How to Get Started with the Model

TODO

## Training Details

### Training Data

Only closed-track training data was used.
Datasets span multiple dialect regions and domains.

Parallel corpora:
- SauDial
- Casablanca corpus
- JODA
- UFAL Levantine
- DODA
- Atlas

Monolingual dialect corpora:
- MADAR
- Shami
- Saudi Tweets
- EDGAD / EDC
- HABIBI lyrics

## Citation

If you use this model in your research, please cite the following paper:

```
@inproceedings{mutal2026aladdinfti,
  title = {Aladdin-FTI @ AMIYA: Three Wishes for Arabic NLP: Fidelity, Diglossia, and Multidialectal Generation},
  author = {Mutal, Jonathan and Al Almaoui, Perla and Hengchen, Simon and Bouillon, Pierrette},
  booktitle = {Proceedings of the AMIYA Shared Task, co-located with VarDial at EACL 2026},
  year = {2026},
  address = {Rabat, Morocco},
  publisher = {Association for Computational Linguistics},
}
```

## Compute infrastructure

The computations were performed at the University of Geneva using the Baobab HPC service.
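
## Example: building instruction prompts

The Model Description lists two instruction formats, one for translation and one for dialect-conditioned completion. The sketch below shows one way such prompts might be built and passed to the model with `transformers`. It rests on several assumptions that this card does not confirm: that the checkpoint is published under the id in the card title (`unige-fti/Aladdin-3B`), that the input text follows the instruction after a single space, and that a plain-text prompt (no chat template) is appropriate. The helper names `build_translation_prompt`, `build_completion_prompt`, and `generate` are hypothetical and exist only for illustration.

```python
def build_translation_prompt(text: str, source: str, target: str) -> str:
    """Format a translation instruction in the style shown in the Model Description.

    Assumption: the text to translate follows the colon after one space.
    """
    return f"Translate from {source} into {target}: {text}"


def build_completion_prompt(text: str, dialect: str) -> str:
    """Format a dialect-conditioned completion instruction.

    Assumption: the sentence prefix follows the colon after one space.
    """
    return f"Complete the sentence in {dialect}: {text}"


def generate(
    prompt: str,
    model_id: str = "unige-fti/Aladdin-3B",  # assumed from the card title
    max_new_tokens: int = 64,
) -> str:
    """Generate a continuation for `prompt` with greedy/default decoding settings."""
    # Imported lazily so the prompt helpers work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated continuation.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate(build_translation_prompt("Where is the train station?", "English", "Egyptian Arabic"))` would download the weights and return the model's continuation. If the model was trained with a chat template, `tokenizer.apply_chat_template` may be the better entry point; the sketch keeps to plain prompts because the card shows the instructions as plain text.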