---
license: apache-2.0
language:
- ca
- es
- en
base_model: gplsi/Aitana-2B-SI-Instruct
tags:
- valencian
- spanish
- english
- text-generation
- instruct
- dpo
- alignment
- alia
- gplsi
datasets:
- nvidia/HelpSteer3
- OpenAssistant/oasst1
- OpenAssistant/oasst2
- Open-Orca/OpenOrca
library_name: transformers
pipeline_tag: text-generation
---
# Aitana-2B-SI-Instruct-Aligned
**Aitana-2B-SI-Instruct-Aligned** is a DPO-aligned instruction-tuned generative language model from the **Aitana family**, developed by the [GPLSI (Language and Information Systems Group)](https://gplsi.dlsi.ua.es/) at the University of Alicante. Built on [gplsi/Aitana-2B-SI-Instruct](https://huggingface.co/gplsi/Aitana-2B-SI-Instruct), this model has been further aligned using Direct Preference Optimization (DPO) to improve response quality and alignment with human preferences across Valencian, Spanish, and English.
## Table of Contents
- [Model Description](#model-description)
- [Alignment Details](#alignment-details)
- [Training Data](#training-data)
- [Intended Uses](#intended-uses)
- [How to Use](#how-to-use)
- [Evaluation](#evaluation)
- [Additional Information](#additional-information)
## Model Description
| Property | Value |
|----------|-------|
| **Base Model** | [gplsi/Aitana-2B-SI-Instruct](https://huggingface.co/gplsi/Aitana-2B-SI-Instruct) |
| **Architecture** | Transformer decoder-only |
| **Parameters** | ~2.25B |
| **Languages** | Valencian, Spanish, English |
| **License** | Apache 2.0 |

Aitana-2B-SI-Instruct-Aligned adds a Direct Preference Optimization (DPO) stage on top of the instruction-tuned Aitana-2B-SI-Instruct model. This stage trains the model to favor responses that human preference data rates as more helpful and better formed, while preserving its multilingual capabilities in Valencian, Spanish, and English.
## Alignment Details
The model was aligned using Direct Preference Optimization (DPO) with the following configuration:
| Hyperparameter | Value |
|----------------|-------|
| **Method** | DPO (Direct Preference Optimization) |
| **Learning rate** | 5e-6 |
| **Epochs** | 1 |
| **Beta** | 0.1 |
| **LR Scheduler** | Linear |
| **Total Samples** | 146,180 |
| **English Samples** | 80,308 |
| **Spanish Samples** | 30,072 |
| **Valencian Samples** | 35,800 |
| **Languages** | Spanish, Valencian, English |

DPO was run on curated preference pairs, each consisting of a prompt with a preferred and a rejected response, so that the model learns to favor more helpful, accurate, and well-structured answers.
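
For readers unfamiliar with DPO, the sketch below computes the standard per-pair DPO loss, -log σ(β · margin), where the margin measures how much more the policy, relative to a frozen reference model, prefers the chosen response over the rejected one. The default β = 0.1 matches the **Beta** value in the table above. This is a minimal illustration with hypothetical inputs (summed log-probabilities of each full response), not the authors' training code; it assumes, as is usual for DPO, that the reference model is the starting checkpoint.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard per-pair DPO loss, -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the trainable policy or the frozen
    reference model. The inputs are hypothetical placeholders.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = chosen_logratio - rejected_logratio
    # -log(sigmoid(z)) == log(1 + exp(-z)) == softplus(-z)
    return softplus(-beta * margin)


# Policy identical to the reference: margin is 0, so the loss is log 2.
print(dpo_loss(-11.0, -11.0, -11.0, -11.0))
# Policy prefers the chosen response more than the reference does: lower loss.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

With identical policy and reference log-probabilities the loss is log 2 ≈ 0.693; it decreases as the policy's margin in favor of the chosen response grows, and a smaller β makes the loss less sensitive to that margin.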
## Training Data
The base instruction model was fine-tuned on the ALIA Instruction/v12 dataset. This variant was then aligned with DPO on the Alignment/v8 dataset, which combines the following preference datasets:
| Dataset ID | Name | Languages | Source |
|------------|------|-----------|--------|
| al1 | HelpSteer3 | EN, ES | [nvidia/HelpSteer3](https://huggingface.co/datasets/nvidia/HelpSteer3) |
| al2 | OpenAssistant1 (OASST1) | EN, ES, RU (+32 more) | [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) |
| al3 | OpenAssistant2 (OASST2) | EN, ES, RU (+32 more) | [OpenAssistant/oasst2](https://huggingface.co/datasets/OpenAssistant/oasst2) |
| al4 | OpenOrca | EN | [Open-Orca/OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) |
| al5 | OASST2 Valenciano | VA | — |

The alignment data focused on English, Spanish, and Valencian preference pairs, distributed as 80,308 English, 30,072 Spanish, and 35,800 Valencian samples.
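
To put the language mix in proportion, the short calculation below derives each language's share of the 146,180 preference pairs from the per-language counts reported in the Alignment Details table: English contributes just over half, while Valencian and Spanish account for roughly a quarter and a fifth, respectively.

```python
# Preference-pair counts per language, copied from the Alignment Details table.
samples = {"English": 80_308, "Spanish": 30_072, "Valencian": 35_800}

total = sum(samples.values())
assert total == 146_180  # matches the reported Total Samples

# Fraction of the alignment data contributed by each language.
shares = {language: count / total for language, count in samples.items()}
for language, share in shares.items():
    print(f"{language}: {share:.2%}")
# English: 54.94%
# Spanish: 20.57%
# Valencian: 24.49%
```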
## Intended Uses
This model can be used for:
- **Instruction following** in Valencian, Spanish, and English with improved alignment to human preferences
- **Chat and conversational applications** requiring high-quality multilingual responses
- **Text generation** with task-specific prompting and improved output quality
- **Domain-specific applications** in administrative, legal, or tourism contexts
> **Note**: As an aligned instruction-tuned model, it is designed to follow user prompts and generate helpful, safe responses. DPO alignment improves response quality, but the model is not intended to be used as a factual knowledge base.
## How to Use
### Transformers
```python
import torch
from transformers import AutoTokenizer, pipeline

model_id = "gplsi/Aitana-2B-SI-Instruct-Aligned"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Text-generation pipeline in bfloat16; device_map="auto" places the model on
# the available GPU(s) or CPU and requires the `accelerate` package.
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# One example prompt per supported language.
prompts = {
    "Valencian": "Explica què són les Corts Valencianes i quina funció tenen.",
    "Spanish": "Describe las principales funciones del gobierno autonómico valenciano.",
    "English": "Explain the role of tourism in the Valencian Community economy.",
}

for language, text in prompts.items():
    # Top-k sampling (k=10), generating at most 100 new tokens.
    result = generator(text, do_sample=True, top_k=10, max_new_tokens=100)
    print(f"[{language}] {result[0]['generated_text']}")
```
## Evaluation
In the following tables, we compare Aitana-2B-SI-Instruct-Aligned with [Salamandra-2B-Instruct](https://huggingface.co/BSC-LT/Salamandra-2B-Instruct) and [Aitana-2B-S-Instruct-Aligned](https://huggingface.co/gplsi/Aitana-2B-S-Instruct-Aligned) on benchmarks from [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). Both Aitana models are the DPO-aligned (v0.1) releases, and the best score in each row is shown in bold.
### Valencian
#### Classification Benchmarks
| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|------------------------------|--------|----------------------------|-------------|---------------|-----------------------|-----------------------|
| XNLI | va |Natural Language Inference | acc | **0.520** | 0.514 | 0.485 |
#### Generation Benchmarks
| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|------------------------------|--------|----------------------------|-------------|---------------|-----------------------|-----------------------|
| Cocoteros | va |Reading Comprehension | bleu | 2.796 | 3.612 | **4.223** |
| Phrases ca-va | ca-va |Translation - Adaptation | bleu | 58.425 | **74.538** | 68.305 |
| Phrases va-ca | va-ca |Translation - Adaptation | bleu | 70.660 | **71.691** | 69.551 |
| Phrases va-es | va-es |Translation | bleu | 65.427 | **72.097** | 70.061 |
| Phrases es-va | es-va |Translation | bleu | 45.688 | **56.012** | 54.053 |
| Truthfulqa_va | va | Truthfulness | bleu_acc | **0.409** | 0.394 | 0.383 |
### Catalan
#### Classification Benchmarks
| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|------------------------------|--------|---------------------------|-------------|---------------|-----------------------|-----------------------|
| Belebele Cat_latn | ca | Reading Comprehension | acc | 0.287 | 0.248 | **0.319** |
| COPA | ca | Commonsense Reasoning | acc | 0.708 | **0.726** | 0.694 |
| XStoryCloze | ca | Commonsense Reasoning | acc | 0.616 | **0.629** | 0.623 |
| OpenBookQA | ca | Question Answering | acc | 0.296 | 0.296 | **0.326** |
| PAWS | ca | Paraphrasing | acc | **0.602** | 0.598 | 0.531 |
| PiQA | ca | Question Answering | acc | 0.638 | **0.655** | 0.629 |
| ARC Easy | ca | Question Answering | acc | 0.516 | 0.524 | **0.526** |
| ARC Challenge | ca | Question Answering | acc | 0.298 | **0.314** | 0.310 |
| XNLI | ca | Natural Language Inference| acc | 0.513 | **0.515** | 0.497 |
| Teca | ca | Natural Language Inference| acc | 0.486 | **0.500** | 0.468 |
| WNLI | ca | Natural Language Inference| acc | **0.563** | 0.437 | 0.436 |
| Catcola | ca | Linguistic Acceptability | acc | 0.492 | **0.713** | 0.680 |
| Catcola | ca | Linguistic Acceptability | mcc | **0.097** | -0.040 | 0.013 |
| Catalanqa | ca | Question Answering | F1 | **0.516** | 0.384 | 0.396 |
| Mgsm direct | ca | Math | exact match | 0.000 | **0.012** | 0.004 |
| Catalanqa | ca | Question Answering | exact match | **0.182** | 0.011 | 0.031 |
| Xquad | ca | Question Answering | exact match | **0.103** | 0.014 | 0.037 |
| Xquad | ca | Question Answering | F1 | **0.394** | 0.287 | 0.317 |
#### Generation Benchmarks
| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|------------------------------|--------|--------------------------|--------|----------------|-----------------------|-----------------------|
| Cabreu abstractive | ca | Summarization | bleu | 7.610 | 7.703 | **8.837** |
| Cabreu extractive | ca | Summarization | bleu | **38.002** | 19.876 | 28.168 |
| Cabreu extreme | ca | Summarization | bleu | 2.733 | 3.245 | **3.386** |
### Spanish
#### Classification Benchmarks
| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|------------------------------|--------|---------------------------|-------------|---------------|-----------------------|-----------------------|
| Belebele | es | Reading Comprehension | acc | 0.268 | 0.244 | **0.285** |
| PAWS | es | Paraphrasing | acc | 0.566 | **0.618** | 0.546 |
| XNLI | es | Natural Language Inference| acc | **0.463** | 0.439 | 0.443 |
| WNLI | es | Natural Language Inference| acc | 0.479 | **0.535** | **0.535** |
| XStoryCloze | es | Commonsense Reasoning | acc | 0.617 | 0.628 | **0.632** |
| Escola | es | Linguistic Acceptability | acc | 0.293 | **0.708** | 0.654 |
| Escola | es | Linguistic Acceptability | mcc | 0.020 | 0.000 | **0.046** |
| OpenbookQA | es | Question Answering | acc | 0.286 | **0.338** | 0.332 |
| MGSM Direct | es | Math | exact match | 0.020 | 0.024 | **0.100** |
| XQUAD | es | Question Answering | exact match | **0.066** | 0.026 | 0.019 |
| XQUAD | es | Question Answering | F1 | **0.355** | 0.293 | 0.293 |
#### Generation Benchmarks
| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|------------------------------|--------|---------------------|---------|----------------|-----------------------|-----------------------|
| Cocoteros | es |Reading Comprehension| bleu | 3.308 | 3.141 | **3.670** |
| XLSum | es | Summarization | bleu | 1.695 | 1.737 | **1.971** |
### English
#### Classification Benchmarks
| Dataset | Lang. | Task | Metric | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|------------------------------|--------|----------------------------|-------------|---------------|-----------------------|-----------------------|
| Arc Challenge | en | Question Answering | acc | 0.354 | 0.363 | **0.372** |
| Arc Easy | en | Question Answering | acc | 0.681 | **0.709** | 0.682 |
| Belebele | en | Reading Comprehension | acc | 0.260 | 0.293 | **0.349** |
| PAWS | en | Paraphrasing | acc | **0.597** | 0.594 | 0.555 |
| XNLI | en | Natural Language Inference | acc | 0.512 | **0.553** | 0.480 |
| XStoryCloze | en | Commonsense Reasoning | acc | 0.662 | 0.680 | **0.693** |
| OpenBookQA | en | Question Answering | acc | 0.298 | **0.338** | 0.316 |
| PiQA | en | Question Answering | acc | 0.715 | **0.717** | 0.704 |
| Social iqa | en | Question Answering | acc | 0.453 | 0.451 | **0.468** |
| WNLI | en | Natural Language Inference | acc | **0.535** | 0.465 | 0.451 |
| MGSM Direct | en | Math | exact match | 0.008 | 0.052 | **0.116** |
| TriviaQA | en | Question Answering | exact match | 0.076 | 0.147 | **0.156** |
### Judge Evaluation
The model was also evaluated with an LLM-as-judge approach across six task categories, comparing Aitana-2B-SI-Instruct-Aligned against Salamandra-2B-Instruct and Aitana-2B-S-Instruct-Aligned. Each cell reports the average rating on a 1-5 scale (5 is best) followed by its standard deviation, written as *mean / standard deviation*.
| Task Category | Salamandra-2B-Instruct | Aitana-2B-S-Instruct-Aligned (v0.1) | Aitana-2B-SI-Instruct-Aligned (v0.1) |
|---------------|------------------------|-----------------------------------|-----------------------|
| CommonSense reasoning | 2.277 / 1.151 | 2.737 / 1.140 | **2.969 / 1.086** |
| Maths | 1.060 / 0.124 | 1.123 / 0.249 | **1.191 / 0.349** |
| Paraphrasing | **3.518 / 1.308** | 3.460 / 1.088 | 3.472 / 0.959 |
| Reading comprehension | 2.966 / 1.111 | 2.894 / 1.311 | **3.112 / 1.146** |
| Summarization | 2.217 / 1.068 | 2.261 / 0.820 | **2.591 / 1.115** |
| Translation | **3.557 / 0.760** | 3.418 / 0.999 | 3.390 / 0.730 |
| **Overall Avg** | 2.599 / 0.920 | 2.649 / 0.935 | **2.787 / 0.897** |

The DPO-aligned model achieves the highest overall average score (2.787, compared with 2.649 for Aitana-2B-S-Instruct-Aligned (v0.1) and 2.599 for Salamandra-2B-Instruct), with the clearest gains in commonsense reasoning, reading comprehension, and summarization; Salamandra-2B-Instruct remains ahead in paraphrasing and translation. The aligned model also has the lowest standard deviation in commonsense reasoning, paraphrasing, translation, and overall, which suggests more consistent response quality in those categories.
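
The **Overall Avg** row appears to be an unweighted (macro) average over the six task categories, taken separately for the means and for the standard deviations. The sketch below reproduces the Aitana-2B-SI-Instruct-Aligned row from the per-category values under that assumption; the result matches the reported 2.787 / 0.897 to within the rounding of the per-category scores. Averaging standard deviations this way is a summary convenience, not the standard deviation of the pooled ratings.

```python
from statistics import fmean

# (mean, standard deviation) per task category for Aitana-2B-SI-Instruct-Aligned,
# transcribed from the judge-evaluation table.
scores = {
    "CommonSense reasoning": (2.969, 1.086),
    "Maths": (1.191, 0.349),
    "Paraphrasing": (3.472, 0.959),
    "Reading comprehension": (3.112, 1.146),
    "Summarization": (2.591, 1.115),
    "Translation": (3.390, 0.730),
}

# Macro average: every category counts equally, regardless of how many
# prompts it contains.
overall_mean = fmean(mean for mean, _ in scores.values())
overall_sd = fmean(sd for _, sd in scores.values())
print(f"{overall_mean:.4f} / {overall_sd:.4f}")  # 2.7875 / 0.8975
```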
## Additional Information
### Author
The model has been developed by the **[Language and Information Systems Group (GPLSI)](https://gplsi.dlsi.ua.es/)** and the **[Centro de Inteligencia Digital (CENID)](https://cenid.es)**, both part of the **[University of Alicante (UA)](https://www.ua.es/es/)**, as part of their ongoing research in **Natural Language Processing (NLP)**.
### Funding
This work is funded by the **Ministerio para la Transformación Digital y de la Función Pública**, co-financed by the **EU NextGenerationEU**, within the framework of the project *Desarrollo de Modelos ALIA*. This work has also been partially supported by Project HEART-NLP (PID2024-156263OB-C22).
### Acknowledgments
We would like to express our gratitude to all individuals and institutions that have contributed to the development of this work.
Special thanks to:
- [Language Technologies Laboratory at Barcelona Supercomputing Center](https://www.bsc.es/es/discover-bsc/organisation/research-structure/language-technologies-laboratory)
- [Centro Vasco de Tecnología de la Lengua (HiTZ)](https://www.hitz.eus/es)
- [Centro Singular de Investigación en Tecnologías Inteligentes (CiTIUS)](https://citius.gal/)
- [Sistemas Inteligentes de Acceso a la Información (SINAI)](https://www.ujaen.es/investigacion-y-transferencia/grupos-de-investigacion/sistemas-inteligentes-de-acceso-la-informacion-sinai)
- [Instituto Universitario de Investigación Informática (IUII)](https://web.ua.es/es/iuii/)
- [Leonardo HPC System](https://leonardo-supercomputer.cineca.eu/)
- [European supercomputing ecosystem (EUROHPC)](https://www.eurohpc-ju.europa.eu/)
We also acknowledge the financial, technical, and scientific support of the **Ministerio para la Transformación Digital y de la Función Pública - Funded by EU NextGenerationEU within the framework of the project Desarrollo de Modelos ALIA**, whose contribution has been essential to the completion of this research.
### License
[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
### Disclaimer
This model is intended for general purposes and is available under a permissive Apache License 2.0. Be aware that the model may have biases and/or undesirable outputs. Users deploying systems based on this model are responsible for mitigating risks and complying with applicable AI regulations.
### Reference
```bibtex
@misc{gplsi-aitana-2B-SI-Instruct-Aligned,
author = {Galiano, Santiago and Sepúlveda-Torres, Robiert and Martínez-Murillo, Iván and Estevanell-Valladares, Ernesto L. and Grande, Eduardo and Consuegra-Ayala, Juan Pablo and Miró Maestre, María and Canal-Esteve, Miquel and Bonora, Mar and Gutierrez, Yoan and Abreu Salas, José Ignacio and Lloret, Elena and Montoyo, Andrés and Muñoz-Guillena, Rafael and Palomar, Manuel},
title = {Aitana-2B-SI-Instruct-Aligned: DPO-aligned instruction-tuned model for Valencian, Spanish and English},
year = {2026},
institution = {Language and Information Systems Group (GPLSI) and Centro de Inteligencia Digital (CENID), University of Alicante (UA)},
howpublished = {\url{https://huggingface.co/gplsi/Aitana-2B-SI-Instruct-Aligned}},
note = {Accessed: 2026-05-11}
}
```
---
**Copyright © 2026 Language and Information Systems Group (GPLSI) and Centro de Inteligencia Digital (CENID), University of Alicante (UA). Distributed under the Apache License 2.0.**