---
language:
- en
- it
- es
- fr
- de
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- llama
- multilingual
- instruct
- chat
- villanova
base_model:
- VillanovaAI/Villanova-2B-Base-2603
datasets:
- VillanovaAI/villanova-sft-2603
---

# Model Card for Villanova-2B-2603

<img src="https://huggingface.co/spaces/VillanovaAI/README/resolve/main/Logo_VILLANOVA_colore.svg" alt="Villanova.AI logo" height="96"/>

**Villanova-2B-2603** is a fully open, multilingual instruction-tuned Large Language Model developed by [Villanova.AI](https://huggingface.co/VillanovaAI). Part of the Villanova project, it is designed to advance open European language technology with native support for five European languages. All model weights, training data sources, and training details are publicly released.

Built on top of [Villanova-2B-Base-2603](https://huggingface.co/VillanovaAI/Villanova-2B-Base-2603), a **2.4B-parameter model pretrained from scratch**, this instruction-tuned model offers strong multilingual instruction following and safety alignment under a fully open Apache 2.0 license.

---

## Model Family

**[Villanova-2B-Base-2603](https://huggingface.co/VillanovaAI/Villanova-2B-Base-2603)** — Base model (4.4T)<br>
 ↳ **[Villanova-2B-2603](https://huggingface.co/VillanovaAI/Villanova-2B-2603)** — SFT / Instruct — 📍 *This model*<br>
  ↳ [Villanova-2B-2603-GGUF](https://huggingface.co/VillanovaAI/Villanova-2B-2603-GGUF) — Quantized<br>
 ↳ **[Villanova-2B-VL-2603](https://huggingface.co/VillanovaAI/Villanova-2B-VL-2603)** — Vision-Language Instruct<br>
  ↳ [Villanova-2B-VL-2603-GGUF](https://huggingface.co/VillanovaAI/Villanova-2B-VL-2603-GGUF) — Quantized<br>
<br>
**[Villanova-2B-Base-2512-Preview](https://huggingface.co/VillanovaAI/Villanova-2B-Base-2512-Preview)** — Base model (2.2T) (previous version, not recommended)<br>
 ↳ **[Villanova-2B-2512-Preview](https://huggingface.co/VillanovaAI/Villanova-2B-2512-Preview)** — SFT / Instruct (previous version, not recommended)<br>

---

## Highlights

- **European-focused, fully open model** released under Apache 2.0
- **Native multilingual support** for 5 European languages: English, French, German, Italian, and Spanish
- **Strong instruction following**, competitive with larger open-weight models such as Llama-3.2-3B-Instruct and Qwen2.5-3B-Instruct
- **Multilingual safety alignment** across all supported languages, ranking #1 among fully open models on M-ALERT
- **+58% overall improvement** over our previous release ([Villanova-2B-2512-Preview](https://huggingface.co/VillanovaAI/Villanova-2B-2512-Preview))
- Only **2.4B parameters**, efficient enough for edge and on-device deployment

## Model Summary

| | |
|---|---|
| **Architecture** | Decoder-only Transformer (LLaMA-based) |
| **Parameters** | 2.4B |
| **Base Model** | [VillanovaAI/Villanova-2B-Base-2603](https://huggingface.co/VillanovaAI/Villanova-2B-Base-2603) (pretrained from scratch) |
| **Pre-training Data** | 4.4T tokens (multilingual, two-stage) |
| **Fine-tuning Data** | [VillanovaAI/villanova-sft-2603](https://huggingface.co/datasets/VillanovaAI/villanova-sft-2603) |
| **Languages** | English, French, German, Italian, Spanish |
| **Context Length** | 32,768 tokens |
| **Precision** | bfloat16 |
| **License** | Apache 2.0 |
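
As a rough illustration of what 2.4B parameters in bfloat16 mean for deployment, the sketch below estimates the memory needed just to hold the weights (parameters × bytes per parameter). It assumes exactly 2.4 billion parameters (the table rounds) and ignores the KV cache, activations, and framework overhead, all of which add to the real footprint. The function name `weight_memory_bytes` is hypothetical.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: exactly 2.4e9 parameters (the model card rounds to 2.4B).
# Excludes KV cache, activations and runtime overhead.

BYTES_PER_PARAM = {
    "float32": 4,
    "bfloat16": 2,
}


def weight_memory_bytes(num_params: int, dtype: str) -> int:
    """Return the bytes needed to store num_params weights in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    params = 2_400_000_000
    for dtype in ("bfloat16", "float32"):
        nbytes = weight_memory_bytes(params, dtype)
        print(f"{dtype}: {nbytes / 1e9:.1f} GB ({nbytes / 2**30:.2f} GiB)")
```

This prints about 4.8 GB (4.47 GiB) for bfloat16 versus 9.6 GB (8.94 GiB) for float32, which is one reason to keep the weights in their released bfloat16 precision rather than upcasting them.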

## How to Use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "VillanovaAI/Villanova-2B-2603"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load the weights in bfloat16, the precision listed in the model summary
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)

messages = [
    {"role": "user", "content": "Explain quantum entanglement in simple terms."}
]

# Render the conversation with the model's chat template and open the assistant turn
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([input_text], return_tensors="pt").to(device)

generated_ids = model.generate(**model_inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Drop the prompt tokens and decode only the newly generated reply
output_ids = generated_ids[0][model_inputs["input_ids"].shape[1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```

## Evaluation

Villanova-2B-2603 was extensively evaluated across **25 benchmarks** covering Reasoning, Question Answering, Safety, and Instruction Following in both English and multilingual settings. All evaluations were performed using identical settings and prompts for fair comparison.

Tables are sorted by the main metric (descending). Models are grouped into *Fully Open* and *Open Weight* categories.

### Overall Performance

Villanova-2B-2603 is the **#1 fully open model** in overall average across all benchmarks.

| Model | Size | Reasoning | QA | Safety | Instr. Follow | **Overall** |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| **Fully Open** | | | | | | |
| **Villanova-2B-2603** | 2.4B | 31.0 | 33.1 | **39.5** | **45.1** | **36.9** |
| OLMo-2-0425-1B-Instruct | 1.2B | **38.7** | 35.6 | 19.4 | 39.3 | 33.9 |
| Minerva-7B-instruct-v1.0 | 7.4B | 27.1 | **36.2** | 30.1 | 16.9 | 28.5 |
| EuroLLM-1.7B-Instruct | 1.7B | 26.0 | 24.7 | 3.8 | 19.5 | 19.5 |
| salamandra-2b-instruct | 2.3B | 23.6 | 26.6 | 9.6 | 15.7 | 20.0 |
| **Open Weight** | | | | | | |
| Llama-3.2-3B-Instruct | 3.2B | **51.2** | **48.1** | **56.8** | **48.1** | **50.4** |
| Qwen2.5-3B-Instruct | 3.1B | 39.4 | 35.8 | 54.7 | 46.8 | 42.9 |
| Llama-3.2-1B-Instruct | 1.2B | 37.5 | 38.1 | 56.6 | 35.5 | 41.1 |
| gemma-3-1b-it | 1.0B | 28.5 | 27.0 | 53.6 | 39.9 | 35.7 |
| Qwen3-1.7B | 1.7B | 37.4 | 37.5 | 2.6 | 19.5 | 26.2 |
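
The "#1 fully open model" claim can be checked directly against the table: the sketch below copies the *Overall* column for each group and ranks the models in descending order. The scores are transcribed from the table; the dictionary layout and the helper `rank_by_overall` are illustrative only.

```python
# Overall scores transcribed from the Overall Performance table.
OVERALL_FULLY_OPEN = {
    "Villanova-2B-2603": 36.9,
    "OLMo-2-0425-1B-Instruct": 33.9,
    "Minerva-7B-instruct-v1.0": 28.5,
    "EuroLLM-1.7B-Instruct": 19.5,
    "salamandra-2b-instruct": 20.0,
}
OVERALL_OPEN_WEIGHT = {
    "Llama-3.2-3B-Instruct": 50.4,
    "Qwen2.5-3B-Instruct": 42.9,
    "Llama-3.2-1B-Instruct": 41.1,
    "gemma-3-1b-it": 35.7,
    "Qwen3-1.7B": 26.2,
}


def rank_by_overall(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Return (model, score) pairs sorted by score, highest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    groups = (("Fully Open", OVERALL_FULLY_OPEN), ("Open Weight", OVERALL_OPEN_WEIGHT))
    for group, scores in groups:
        print(group)
        for position, (model, score) in enumerate(rank_by_overall(scores), start=1):
            print(f"  {position}. {model}: {score}")
```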

### Instruction Following

Villanova-2B-2603 is the **#1 fully open model** for instruction following, and is competitive with larger open weight models. The MARCO benchmark evaluates structured instruction following across all five languages.

| Model | Size | IFEval | MARCO-EN | MARCO-DE | MARCO-ES | MARCO-FR | MARCO-IT | **Avg** |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **Fully Open** | | | | | | | | |
| **Villanova-2B-2603** | 2.4B | 62.0 | 39.4 | **40.5** | **44.2** | **42.5** | **42.1** | **45.1** |
| OLMo-2-0425-1B-Instruct | 1.2B | **77.9** | **52.9** | 23.1 | 29.0 | 27.9 | 24.9 | 39.3 |
| EuroLLM-1.7B-Instruct | 1.7B | 34.5 | 18.3 | 15.9 | 15.9 | 17.4 | 15.2 | 19.5 |
| Minerva-7B-instruct-v1.0 | 7.4B | 29.6 | 17.0 | 12.2 | 13.9 | 13.9 | 15.0 | 16.9 |
| salamandra-2b-instruct | 2.3B | 26.4 | 17.7 | 12.2 | 12.0 | 12.9 | 12.9 | 15.7 |
| **Open Weight** | | | | | | | | |
| Llama-3.2-3B-Instruct | 3.2B | **82.2** | **54.0** | **39.9** | 38.8 | 37.5 | 35.9 | **48.1** |
| Qwen2.5-3B-Instruct | 3.1B | 71.5 | 47.3 | 37.5 | **42.5** | **41.0** | **40.7** | 46.8 |
| gemma-3-1b-it | 1.0B | 74.5 | 42.7 | 27.5 | 33.3 | 27.9 | 33.3 | 39.9 |
| Llama-3.2-1B-Instruct | 1.2B | 64.8 | 43.2 | 25.3 | 29.0 | 24.2 | 26.6 | 35.5 |
| Qwen3-1.7B | 1.7B | 48.4 | 27.4 | 8.9 | 10.3 | 13.1 | 9.1 | 19.5 |

> **Key insight:** While some models score higher on English-only IFEval, Villanova-2B-2603 delivers the most balanced multilingual instruction following, with MARCO scores of 40-44 across DE, ES, FR, and IT. This is far ahead of OLMo (23-29) and Gemma (27-33) on non-English languages.
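
The *Avg* column appears to be the unweighted mean of the six instruction-following scores (IFEval plus the five MARCO languages): recomputing it from the row values reproduces the reported figures. The sketch below does that for three models and also extracts each model's range of non-English MARCO scores (DE, ES, FR, IT). Scores are transcribed from the table; the helper names `average` and `non_english_marco_range` are hypothetical.

```python
from statistics import mean

# Columns: IFEval, MARCO-EN, MARCO-DE, MARCO-ES, MARCO-FR, MARCO-IT
IF_SCORES = {
    "Villanova-2B-2603": [62.0, 39.4, 40.5, 44.2, 42.5, 42.1],
    "OLMo-2-0425-1B-Instruct": [77.9, 52.9, 23.1, 29.0, 27.9, 24.9],
    "gemma-3-1b-it": [74.5, 42.7, 27.5, 33.3, 27.9, 33.3],
}


def average(row: list[float]) -> float:
    """Unweighted mean of all six scores, rounded to one decimal like the table."""
    return round(mean(row), 1)


def non_english_marco_range(row: list[float]) -> tuple[float, float]:
    """Min and max over MARCO-DE, -ES, -FR and -IT (the last four columns)."""
    non_english = row[2:]
    return min(non_english), max(non_english)


if __name__ == "__main__":
    for model, row in IF_SCORES.items():
        low, high = non_english_marco_range(row)
        print(f"{model}: avg={average(row)}, non-English MARCO {low}-{high}")
```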

### Safety (M-ALERT)

Villanova-2B-2603 is the **#1 fully open model** for safety. Safety was evaluated using the M-ALERT benchmark across all five languages.

| Model | Size | EN | DE | ES | FR | IT | **Avg** |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **Fully Open** | | | | | | | |
| **Villanova-2B-2603** | 2.4B | 31.0 | 4.1 | **56.0** | **62.2** | 44.2 | **39.5** |
| Minerva-7B-instruct-v1.0 | 7.4B | 31.6 | 4.3 | 26.9 | 24.8 | **62.9** | 30.1 |
| OLMo-2-0425-1B-Instruct | 1.2B | **58.0** | **5.7** | 13.4 | 10.7 | 9.1 | 19.4 |
| salamandra-2b-instruct | 2.3B | 4.9 | 3.0 | 15.6 | 15.4 | 9.2 | 9.6 |
| EuroLLM-1.7B-Instruct | 1.7B | 5.4 | 0.8 | 2.6 | 8.4 | 1.7 | 3.8 |
| **Open Weight** | | | | | | | |
| Llama-3.2-3B-Instruct | 3.2B | 54.5 | 26.4 | 70.3 | 63.3 | **69.4** | **56.8** |
| Llama-3.2-1B-Instruct | 1.2B | 47.1 | **32.9** | 67.4 | **68.6** | 67.2 | 56.6 |
| Qwen2.5-3B-Instruct | 3.1B | **60.2** | 23.2 | **71.7** | 64.0 | 54.4 | 54.7 |
| gemma-3-1b-it | 1.0B | 58.6 | 28.7 | 58.8 | 68.4 | 53.3 | 53.6 |
| Qwen3-1.7B | 1.7B | 10.2 | 0.0 | 0.5 | 0.8 | 1.3 | 2.6 |
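
Bold entries appear to mark the best score within each group. As an illustration, the sketch below recomputes the per-language leader among the fully open models and the unweighted five-language average, which matches the reported *Avg* column. Values are transcribed from the table; `leaders` and `five_language_average` are hypothetical helper names.

```python
# M-ALERT scores for the fully open models, transcribed from the table.
LANGS = ["EN", "DE", "ES", "FR", "IT"]
SAFETY_FULLY_OPEN = {
    "Villanova-2B-2603": [31.0, 4.1, 56.0, 62.2, 44.2],
    "Minerva-7B-instruct-v1.0": [31.6, 4.3, 26.9, 24.8, 62.9],
    "OLMo-2-0425-1B-Instruct": [58.0, 5.7, 13.4, 10.7, 9.1],
    "salamandra-2b-instruct": [4.9, 3.0, 15.6, 15.4, 9.2],
    "EuroLLM-1.7B-Instruct": [5.4, 0.8, 2.6, 8.4, 1.7],
}


def leaders(scores: dict[str, list[float]]) -> dict[str, str]:
    """Map each language to the model with the highest score in that column."""
    return {
        lang: max(scores, key=lambda model: scores[model][i])
        for i, lang in enumerate(LANGS)
    }


def five_language_average(row: list[float]) -> float:
    """Unweighted mean over EN, DE, ES, FR, IT, rounded to one decimal."""
    return round(sum(row) / len(row), 1)


if __name__ == "__main__":
    for lang, model in leaders(SAFETY_FULLY_OPEN).items():
        print(f"{lang}: {model}")
    for model, row in SAFETY_FULLY_OPEN.items():
        print(f"{model}: avg={five_language_average(row)}")
```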

### Reasoning & Question Answering

| Model | Size | BBH | LB-BBH | GSM8K | DROP | TruthfulQA | **Avg Reasoning** | **Avg QA** |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **Fully Open** | | | | | | | | |
| Minerva-7B-instruct-v1.0 | 7.4B | 29.0 | 30.0 | 10.6 | 29.2 | 29.6 | 27.1 | **36.2** |
| OLMo-2-0425-1B-Instruct | 1.2B | 27.6 | **33.8** | **67.4** | 30.2 | **33.8** | **38.7** | 35.6 |
| **Villanova-2B-2603** | 2.4B | **29.3** | 33.2 | 23.4 | **34.8** | 28.5 | 31.0 | 33.1 |
| salamandra-2b-instruct | 2.3B | 22.5 | 29.2 | 2.3 | 20.6 | 27.8 | 23.6 | 26.6 |
| EuroLLM-1.7B-Instruct | 1.7B | 28.5 | 29.8 | 12.7 | 22.2 | 29.2 | 26.0 | 24.7 |
| **Open Weight** | | | | | | | | |
| Llama-3.2-3B-Instruct | 3.2B | **59.3** | 44.6 | **77.2** | **48.3** | 36.1 | **51.2** | **48.1** |
| Qwen2.5-3B-Instruct | 3.1B | 12.2 | **46.9** | 76.0 | 12.5 | **41.4** | 39.4 | 35.8 |
| Qwen3-1.7B | 1.7B | 9.8 | 43.5 | 74.2 | 34.4 | 29.6 | 37.4 | 37.5 |
| Llama-3.2-1B-Instruct | 1.2B | 39.3 | 35.7 | 45.6 | 31.8 | 28.9 | 37.5 | 38.1 |
| gemma-3-1b-it | 1.0B | 25.0 | 35.1 | 34.0 | 21.1 | 26.6 | 28.5 | 27.0 |

## Improvement over Previous Release

Villanova-2B-2603 represents a **major leap** over our previous model ([Villanova-2B-2512-Preview](https://huggingface.co/VillanovaAI/Villanova-2B-2512-Preview)):

| Category | 2512-Preview | **2603** | **Improvement** |
|:---|:---:|:---:|:---:|
| Overall | 23.3 | **36.9** | **+58%** |
| Instruction Following | 28.9 | **45.1** | **+56%** |
| Safety | 2.4 | **39.5** | **+1546%** |
| Reasoning | 27.5 | **31.0** | **+13%** |
| QA | 29.0 | **33.1** | **+14%** |
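
The *Improvement* column is consistent with the relative change from the 2512-Preview score to the 2603 score, (new - old) / old × 100, rounded to a whole percent. The sketch below recomputes it from the table values; the helper name `relative_improvement` is hypothetical.

```python
# Category scores transcribed from the table: (2512-Preview, 2603).
CATEGORY_SCORES = {
    "Overall": (23.3, 36.9),
    "Instruction Following": (28.9, 45.1),
    "Safety": (2.4, 39.5),
    "Reasoning": (27.5, 31.0),
    "QA": (29.0, 33.1),
}


def relative_improvement(old: float, new: float) -> int:
    """Percentage change from old to new, rounded to the nearest whole percent."""
    return round((new - old) / old * 100)


if __name__ == "__main__":
    for category, (old, new) in CATEGORY_SCORES.items():
        print(f"{category}: +{relative_improvement(old, new)}%")
```

Relative change grows quickly when the baseline is small: the Safety row's +1546% corresponds to an absolute gain of 37.1 points (2.4 to 39.5).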

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).