---
license: apache-2.0
language:
- ca
- es
- en
base_model: gplsi/Aitana-2B-SI-Instruct
tags:
- valencian
- spanish
- english
- text-generation
- instruct
- dpo
- alignment
- alia
- gplsi
datasets:
- nvidia/HelpSteer3
- OpenAssistant/oasst1
- OpenAssistant/oasst2
- Open-Orca/OpenOrca
library_name: transformers
pipeline_tag: text-generation
---

# Aitana-2B-SI-Instruct-Aligned

**Aitana-2B-SI-Instruct-Aligned** is a DPO-aligned, instruction-tuned generative language model from the **Aitana family**, developed by the [GPLSI (Language and Information Systems Group)](https://gplsi.dlsi.ua.es/) at the University of Alicante. Built on [gplsi/Aitana-2B-SI-Instruct](https://huggingface.co/gplsi/Aitana-2B-SI-Instruct), it has been further aligned with Direct Preference Optimization (DPO) to improve response quality and agreement with human preferences in Valencian, Spanish, and English.

## Table of Contents

- [Model Description](#model-description)
- [Alignment Details](#alignment-details)
- [Training Data](#training-data)
- [Intended Uses](#intended-uses)
- [How to Use](#how-to-use)
- [Evaluation](#evaluation)
- [Additional Information](#additional-information)

## Model Description

| Property | Value |
|----------|-------|
| **Base Model** | [gplsi/Aitana-2B-SI-Instruct](https://huggingface.co/gplsi/Aitana-2B-SI-Instruct) |
| **Architecture** | Transformer decoder-only |
| **Parameters** | ~2.25B |
| **Languages** | Valencian, Spanish, English |
| **License** | Apache 2.0 |

Aitana-2B-SI-Instruct-Aligned extends the instruction-tuned Aitana-2B-SI-Instruct model with a Direct Preference Optimization (DPO) alignment stage. This additional training stage is intended to make responses more helpful and better aligned with human preferences while preserving the base model's multilingual capabilities.

## Alignment Details

The model was aligned using Direct Preference Optimization (DPO) with the following configuration:

| Hyperparameter | Value |
|----------------|-------|
| **Method** | DPO (Direct Preference Optimization) |
| **Learning rate** | 5e-6 |
| **Epochs** | 1 |
| **Beta** | 0.1 |
| **LR Scheduler** | Linear |
| **Total Samples** | 146,180 |
| **English Samples** | 80,308 |
| **Spanish Samples** | 30,072 |
| **Valencian Samples** | 35,800 |
| **Languages** | Spanish, Valencian, English |

The DPO alignment was performed using curated preference pairs that teach the model to prefer more helpful, accurate, and well-structured responses.

## Training Data

The base instruction model was trained on the ALIA Instruction/v12 dataset. This DPO-aligned variant was further aligned using the Alignment/v8 dataset, composed of the following preference data:

| Dataset ID | Name | Languages | Source |
|------------|------|-----------|--------|
| al1 | HelpSteer3 | EN, ES | [nvidia/HelpSteer3](https://huggingface.co/datasets/nvidia/HelpSteer3) |
| al2 | OpenAssistant1 (OASST1) | EN, ES, RU (+32 more) | [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) |
| al3 | OpenAssistant2 (OASST2) | EN, ES, RU (+32 more) | [OpenAssistant/oasst2](https://huggingface.co/datasets/OpenAssistant/oasst2) |
| al4 | OpenOrca | EN | [Open-Orca/OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) |
| al5 | OASST2 Valenciano | VA | N/A |

The alignment data focused on English, Spanish, and Valencian preference pairs, with the distribution: 80,308 English, 30,072 Spanish, and 35,800 Valencian samples.
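
For readers unfamiliar with the role of the **Beta** value (0.1) listed under Alignment Details, the snippet below is a minimal sketch of the standard per-pair DPO loss. It is not the training code used for this model, which this card does not include. The function name `dpo_loss`, its argument names, and the example log-probabilities are illustrative assumptions; the inputs are assumed to be summed sequence log-probabilities of the chosen and rejected responses under the policy being trained and under the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard per-pair DPO loss: -log(sigmoid(beta * margin)).

    All names here are hypothetical; inputs are sequence log-probabilities.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch for numerical stability.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up log-probabilities, for illustration only.
# Policy identical to the reference: the margin is 0 and the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy favours the chosen response more than the reference does: lower loss.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

A smaller beta scales the margin down, so a given preference gap moves the loss less and deviations from the reference model are penalised more gently.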

## Intended Uses

This model can be used for:

- **Instruction following** in Valencian, Spanish, and English with improved alignment to human preferences
- **Chat and conversational applications** requiring high-quality multilingual responses
- **Text generation** with task-specific prompting and improved output quality
- **Domain-specific applications** in administrative, legal, or tourism contexts

> **Note**: As an aligned instruction-tuned model, it is designed to follow user prompts and generate helpful, safe responses. It is not intended for use as a factual knowledge base. The DPO alignment improves response quality and preference alignment.

## How to Use

### Transformers

```python
import torch
from transformers import pipeline, AutoTokenizer

model_id = "gplsi/Aitana-2B-SI-Instruct-Aligned"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build a text-generation pipeline in bfloat16.
# device_map="auto" requires the `accelerate` package.
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Valencian example
text = "Explica què són les Corts Valencianes i quina funció tenen."
result = generator(text, do_sample=True, top_k=10, max_new_tokens=100)
print(result[0]["generated_text"])

# Spanish example
text = "Describe las principales funciones del gobierno autonómico valenciano."
result = generator(text, do_sample=True, top_k=10, max_new_tokens=100)
print(result[0]["generated_text"])

# English example
text = "Explain the role of tourism in the Valencian Community economy."
result = generator(text, do_sample=True, top_k=10, max_new_tokens=100)
print(result[0]["generated_text"])
```

## Evaluation

The following tables report results on benchmarks from [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness), compared against [Salamandra-2B-Instruct](https://huggingface.co/BSC-LT/Salamandra-2B-Instruct) and [Aitana-2B-S-Instruct-Aligned](https://huggingface.co/gplsi/Aitana-2B-S-Instruct-Aligned).
The results reflect the DPO-aligned instruction-tuned performance.

### Valencian

#### Classification Benchmarks

| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|---------|-------|------|--------|------------------------|-------------------------------------|--------------------------------------|
| XNLI | va | Natural Language Inference | acc | **0.520** | 0.514 | 0.485 |

#### Generation Benchmarks

| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|---------|-------|------|--------|------------------------|-------------------------------------|--------------------------------------|
| Cocoteros | va | Reading Comprehension | bleu | 2.796 | 3.612 | **4.223** |
| Phrases ca-va | va-ca | Translation - Adaptation | bleu | 58.425 | **74.538** | 68.305 |
| Phrases va-ca | va-ca | Translation - Adaptation | bleu | 70.660 | **71.691** | 69.551 |
| Phrases va-es | va-es | Translation | bleu | 65.427 | **72.097** | 70.061 |
| Phrases es-va | es-va | Translation | bleu | 45.688 | **56.012** | 54.053 |
| Truthfulqa_va | va | Truthfulness | bleu_acc | **0.409** | 0.394 | 0.383 |

### Catalan

#### Classification Benchmarks

| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|---------|-------|------|--------|------------------------|-------------------------------------|--------------------------------------|
| Belebele Cat_latn | ca | Reading Comprehension | acc | 0.287 | 0.248 | **0.319** |
| COPA | ca | Commonsense Reasoning | acc | 0.708 | **0.726** | 0.694 |
| XStoryCloze | ca | Commonsense Reasoning | acc | 0.616 | **0.629** | 0.623 |
| OpenBookQA | ca | Question Answering | acc | 0.296 | 0.296 | **0.326** |
| PAWS | ca | Paraphrasing | acc | **0.602** | 0.598 | 0.531 |
| PiQA | ca | Question Answering | acc | 0.638 | **0.655** | 0.629 |
| ARC Easy | ca | Question Answering | acc | 0.516 | 0.524 | **0.526** |
| ARC Challenge | ca | Question Answering | acc | 0.298 | **0.314** | 0.310 |
| XNLI | ca | Natural Language Inference | acc | 0.513 | **0.515** | 0.497 |
| Teca | ca | Natural Language Inference | acc | 0.486 | **0.500** | 0.468 |
| WNLI | ca | Natural Language Inference | acc | **0.563** | 0.437 | 0.436 |
| Catcola | ca | Linguistic Acceptability | acc | 0.492 | **0.713** | 0.680 |
| Catcola | ca | Linguistic Acceptability | mcc | **0.097** | -0.040 | 0.013 |
| Catalanqa | ca | Question Answering | F1 | **0.516** | 0.384 | 0.396 |
| Mgsm direct | ca | Math | exact match | 0.000 | **0.012** | 0.004 |
| Catalanqa | ca | Question Answering | exact match | **0.182** | 0.011 | 0.031 |
| Xquad | ca | Question Answering | exact match | **0.103** | 0.014 | 0.037 |
| Xquad | ca | Question Answering | F1 | **0.394** | 0.287 | 0.317 |

#### Generation Benchmarks

| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|---------|-------|------|--------|------------------------|-------------------------------------|--------------------------------------|
| Cabreu abstractive | ca | Summarization | bleu | 7.610 | 7.703 | **8.837** |
| Cabreu extractive | ca | Summarization | bleu | **38.002** | 19.876 | 28.168 |
| Cabreu extreme | ca | Summarization | bleu | 2.733 | 3.245 | **3.386** |

### Spanish

#### Classification Benchmarks

| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|---------|-------|------|--------|------------------------|-------------------------------------|--------------------------------------|
| Belebele | es | Reading Comprehension | acc | 0.268 | 0.244 | **0.285** |
| PAWS | es | Paraphrasing | acc | 0.566 | **0.618** | 0.546 |
| XNLI | es | Natural Language Inference | acc | **0.463** | 0.439 | 0.443 |
| WNLI | es | Natural Language Inference | acc | 0.479 | **0.535** | **0.535** |
| XStoryCloze | es | Commonsense Reasoning | acc | 0.617 | 0.628 | **0.632** |
| Escola | es | Linguistic Acceptability | acc | 0.293 | **0.708** | 0.654 |
| Escola | es | Linguistic Acceptability | mcc | 0.020 | 0.000 | **0.046** |
| OpenbookQA | es | Question Answering | acc | 0.286 | **0.338** | 0.332 |
| MGSM Direct | es | Math | exact match | 0.020 | 0.024 | **0.1** |
| XQUAD | es | Question Answering | exact match | **0.066** | 0.026 | 0.019 |
| XQUAD | es | Question Answering | F1 | **0.355** | 0.293 | 0.293 |

#### Generation Benchmarks

| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|---------|-------|------|--------|------------------------|-------------------------------------|--------------------------------------|
| Cocoteros | es | Reading Comprehension | bleu | 3.308 | 3.141 | **3.670** |
| XLSum | es | Summarization | bleu | 1.695 | 1.737 | **1.971** |

### English

#### Classification Benchmarks

| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|---------|-------|------|--------|------------------------|-------------------------------------|--------------------------------------|
| Arc Challenge | en | Question Answering | acc | 0.354 | 0.363 | **0.372** |
| Arc Easy | en | Question Answering | acc | 0.681 | **0.709** | 0.682 |
| Belebele | en | Reading Comprehension | acc | 0.260 | 0.293 | **0.349** |
| PAWS | en | Paraphrasing | acc | **0.597** | 0.594 | 0.555 |
| XNLI | en | Natural Language Inference | acc | 0.512 | **0.553** | 0.480 |
| XStoryCloze | en | Commonsense Reasoning | acc | 0.662 | 0.680 | **0.693** |
| OpenBookQA | en | Question Answering | acc | 0.298 | **0.338** | 0.316 |
| PiQA | en | Question Answering | acc | 0.715 | **0.717** | 0.704 |
| Social iqa | en | Question Answering | acc | 0.453 | 0.451 | **0.468** |
| WNLI | en | Natural Language Inference | acc | **0.535** | 0.465 | 0.451 |
| MGSM Direct | en | Math | exact match | 0.008 | 0.052 | **0.116** |
| TriviaQA | en | Question Answering | exact match | 0.076 | 0.147 | **0.156** |

### Judge Evaluation

The model was also evaluated using an LLM-as-judge approach across different task categories. The scores below represent the average rating (1-5 scale, 5 being best) and standard deviation for each task category, comparing Aitana-2B-SI-Instruct-Aligned against Salamandra-2B-Instruct and Aitana-2B-S-Instruct-Aligned.

| Task Category | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|---------------|------------------------|-------------------------------------|--------------------------------------|
| CommonSense reasoning | 2.277 / 1.151 | 2.737 / 1.140 | **2.969 / 1.086** |
| Maths | 1.060 / 0.124 | 1.123 / 0.249 | **1.191 / 0.349** |
| Paraphrasing | **3.518 / 1.308** | 3.460 / 1.088 | 3.472 / 0.959 |
| Reading comprehension | 2.966 / 1.111 | 2.894 / 1.311 | **3.112 / 1.146** |
| Summarization | 2.217 / 1.068 | 2.261 / 0.820 | **2.591 / 1.115** |
| Translation | **3.557 / 0.760** | 3.418 / 0.999 | 3.390 / 0.730 |
| **Overall Avg** | 2.599 / 0.920 | 2.649 / 0.935 | **2.787 / 0.897** |

Aitana-2B-SI-Instruct-Aligned obtains the highest overall average score (2.787), ahead of Aitana-2B-S-Instruct-Aligned (v0.1) (2.649) and Salamandra-2B-Instruct (2.599), with the largest gains in commonsense reasoning, reading comprehension, and summarization. Salamandra-2B-Instruct remains ahead on paraphrasing and translation. Aitana-2B-SI-Instruct-Aligned also has the lowest standard deviation in commonsense reasoning, paraphrasing, and translation, as well as overall, which indicates more consistent response quality in those categories.

## Additional Information

### Author

The model has been developed by the **[Language and Information Systems Group (GPLSI)](https://gplsi.dlsi.ua.es/)** and the **[Centro de Inteligencia Digital (CENID)](https://cenid.es)**, both part of the **[University of Alicante (UA)](https://www.ua.es/es/)**, as part of their ongoing research in **Natural Language Processing (NLP)**.

### Funding

This work is funded by the **Ministerio para la Transformación Digital y de la Función Pública**, co-financed by the **EU – NextGenerationEU**, within the framework of the project *Desarrollo de Modelos ALIA*. This work has also been partially supported by Project HEART-NLP (PID2024-156263OB-C22).

### Acknowledgments

We would like to express our gratitude to all individuals and institutions that have contributed to the development of this work.
Special thanks to:

- [Language Technologies Laboratory at Barcelona Supercomputing Center](https://www.bsc.es/es/discover-bsc/organisation/research-structure/language-technologies-laboratory)
- [Centro Vasco de Tecnología de la Lengua (HiTZ)](https://www.hitz.eus/es)
- [Centro Singular de Investigación en Tecnologías Inteligentes (CiTIUS)](https://citius.gal/)
- [Sistemas Inteligentes de Acceso a la Información (SINAI)](https://www.ujaen.es/investigacion-y-transferencia/grupos-de-investigacion/sistemas-inteligentes-de-acceso-la-informacion-sinai)
- [Instituto Universitario de Investigación Informática (IUII)](https://web.ua.es/es/iuii/)
- [Leonardo HPC System](https://leonardo-supercomputer.cineca.eu/)
- [European supercomputing ecosystem (EUROHPC)](https://www.eurohpc-ju.europa.eu/)

We also acknowledge the financial, technical, and scientific support of the **Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the project Desarrollo de Modelos ALIA**, whose contribution has been essential to the completion of this research.

### License

[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)

### Disclaimer

This model is intended for general purposes and is available under a permissive Apache License 2.0. Be aware that the model may have biases and/or undesirable outputs. Users deploying systems based on this model are responsible for mitigating risks and complying with applicable AI regulations.

### Reference

```bibtex
@misc{gplsi-aitana-2B-SI-Instruct-Aligned,
  author = {Galiano, Santiago and Sepúlveda-Torres, Robiert and Martínez-Murillo, Iván and Estevanell-Valladares, Ernesto L. and Grande, Eduardo and Consuegra-Ayala, Juan Pablo and Miró Maestre, María and Canal-Esteve, Miquel and Bonora, Mar and Gutierrez, Yoan and Abreu Salas, José Ignacio and Lloret, Elena and Montoyo, Andrés and Muñoz-Guillena, Rafael and Palomar, Manuel},
  title = {Aitana-2B-SI-Instruct-Aligned: DPO-aligned instruction-tuned model for Valencian, Spanish and English},
  year = {2026},
  institution = {Language and Information Systems Group (GPLSI) and Centro de Inteligencia Digital (CENID), University of Alicante (UA)},
  howpublished = {\url{https://huggingface.co/gplsi/Aitana-2B-SI-Instruct-Aligned}},
  note = {Accessed: 2026-05-11}
}
```

---

**Copyright © 2026 Language and Information Systems Group (GPLSI) and Centro de Inteligencia Digital (CENID), University of Alicante (UA). Distributed under the Apache License 2.0.**