Initialize project; model provided by the ModelHub XC community

Model: VillanovaAI/Villanova-2B-2603
Source: Original Platform
ModelHub XC
2026-04-13 06:12:07 +08:00
commit fc50d46de6
10 changed files with 1426 additions and 0 deletions

36
.gitattributes vendored Normal file

@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

188
README.md Normal file

@@ -0,0 +1,188 @@
---
language:
- en
- it
- es
- fr
- de
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- llama
- multilingual
- instruct
- chat
- villanova
base_model:
- VillanovaAI/Villanova-2B-Base-2603
datasets:
- VillanovaAI/villanova-sft-2603
---
# Model Card for Villanova-2B-2603
<img src="https://huggingface.co/spaces/VillanovaAI/README/resolve/main/Logo_VILLANOVA_colore.svg" alt="Villanova.AI logo" height="96"/>
**Villanova-2B-2603** is a fully open, multilingual instruction-tuned Large Language Model developed by [Villanova.AI](https://huggingface.co/VillanovaAI). Part of the Villanova project, it is designed to advance open European language technology with native support for five European languages. All model weights, training data sources, and training details are publicly released.
Built on top of [Villanova-2B-Base-2603](https://huggingface.co/VillanovaAI/Villanova-2B-Base-2603) — a **2.4B-parameter model pretrained from scratch** — this instruction-tuned model offers strong multilingual instruction following and safety alignment under a fully open Apache 2.0 license.
---
## Model Family
**[Villanova-2B-Base-2603](https://huggingface.co/VillanovaAI/Villanova-2B-Base-2603)** — Base model (4.4T)<br>
&emsp;**[Villanova-2B-2603](https://huggingface.co/VillanovaAI/Villanova-2B-2603)** — SFT / Instruct — 📍 *This model*<br>
&emsp;&emsp;↳ [Villanova-2B-2603-GGUF](https://huggingface.co/VillanovaAI/Villanova-2B-2603-GGUF) — Quantized<br>
&emsp;**[Villanova-2B-VL-2603](https://huggingface.co/VillanovaAI/Villanova-2B-VL-2603)** — Vision-Language Instruct<br>
&emsp;&emsp;↳ [Villanova-2B-VL-2603-GGUF](https://huggingface.co/VillanovaAI/Villanova-2B-VL-2603-GGUF) — Quantized<br>
<br>
**[Villanova-2B-Base-2512-Preview](https://huggingface.co/VillanovaAI/Villanova-2B-Base-2512-Preview)** — Base model (2.2T) (previous version, not recommended)<br>
&emsp;**[Villanova-2B-2512-Preview](https://huggingface.co/VillanovaAI/Villanova-2B-2512-Preview)** — SFT / Instruct (previous version, not recommended)<br>
---
## Highlights
- **European-focused, fully open model** released under Apache 2.0
- **Native multilingual support** for 5 European languages: English, French, German, Italian, and Spanish
- **Strong instruction following**, competitive with larger open-weight models (45.1 average vs. 48.1 for Llama-3.2-3B-Instruct and 46.8 for Qwen2.5-3B-Instruct)
- **Multilingual safety alignment**: highest M-ALERT average among fully open models (39.5), with the strongest results in French (62.2) and Spanish (56.0)
- **+58% overall improvement** over our previous release ([Villanova-2B-2512-Preview](https://huggingface.co/VillanovaAI/Villanova-2B-2512-Preview))
- Only **2.4B parameters**, efficient enough for edge and on-device deployment
## Model Summary
| | |
|---|---|
| **Architecture** | Decoder-only Transformer (LLaMA-based) |
| **Parameters** | 2.4B |
| **Base Model** | [VillanovaAI/Villanova-2B-Base-2603](https://huggingface.co/VillanovaAI/Villanova-2B-Base-2603) (pretrained from scratch) |
| **Pre-training Data** | 4.4T tokens (multilingual, two-stage) |
| **Fine-tuning Data** | [VillanovaAI/villanova-sft-2603](https://huggingface.co/datasets/VillanovaAI/villanova-sft-2603) |
| **Languages** | English, French, German, Italian, Spanish |
| **Context Length** | 32,768 tokens |
| **Precision** | bfloat16 |
| **License** | Apache 2.0 |
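
To put the 32,768-token context length in perspective, the sketch below estimates the key/value cache size at bfloat16 precision. It is a back-of-the-envelope calculation rather than a measurement: the layer count, KV-head count, and head dimension are taken from the `config.json` shipped in this commit, the function name is illustrative, and weights and activations are not included.

```python
# Values copied from this commit's config.json; BYTES_PER_VALUE assumes bfloat16.
NUM_LAYERS = 18        # num_hidden_layers
NUM_KV_HEADS = 4       # num_key_value_heads (grouped-query attention)
HEAD_DIM = 128         # head_dim
MAX_CONTEXT = 32_768   # max_position_embeddings
BYTES_PER_VALUE = 2    # bfloat16


def kv_cache_bytes(num_tokens: int, batch_size: int = 1) -> int:
    """Bytes needed to cache keys and values for `num_tokens` positions."""
    # The leading 2 accounts for storing both a key and a value per position.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens * batch_size


if __name__ == "__main__":
    print(f"per token:    {kv_cache_bytes(1)} bytes")
    print(f"full context: {kv_cache_bytes(MAX_CONTEXT) / 2**30:.3f} GiB")
```

Under these assumptions the cache costs 36,864 bytes per token, or about 1.125 GiB for a single sequence at the full context length.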
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "VillanovaAI/Villanova-2B-2603"
device = "cuda"  # or "cpu"

# Load the tokenizer and model weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

messages = [
    {"role": "user", "content": "Explain quantum entanglement in simple terms."}
]

# Render the conversation with the model's chat template and open an assistant turn.
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([input_text], return_tensors="pt").to(device)

# Sample up to 256 new tokens.
generated_ids = model.generate(**model_inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Drop the prompt tokens so that only the newly generated reply is decoded.
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```
## Evaluation
Villanova-2B-2603 was extensively evaluated across **25 benchmarks** covering Reasoning, Question Answering, Safety, and Instruction Following in both English and multilingual settings. All evaluations were performed using identical settings and prompts for fair comparison.
Tables are sorted by the main metric (descending). Models are grouped into *Fully Open* and *Open Weight* categories.
### Overall Performance
Villanova-2B-2603 is the **#1 fully open model** in overall average across all benchmarks.
| Model | Size | Reasoning | QA | Safety | Instr. Follow | **Overall** |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| **Fully Open** | | | | | | |
| **Villanova-2B-2603** | 2.4B | 31.0 | 33.1 | **39.5** | **45.1** | **36.9** |
| OLMo-2-0425-1B-Instruct | 1.2B | **38.7** | 35.6 | 19.4 | 39.3 | 33.9 |
| Minerva-7B-instruct-v1.0 | 7.4B | 27.1 | **36.2** | 30.1 | 16.9 | 28.5 |
| salamandra-2b-instruct | 2.3B | 23.6 | 26.6 | 9.6 | 15.7 | 20.0 |
| EuroLLM-1.7B-Instruct | 1.7B | 26.0 | 24.7 | 3.8 | 19.5 | 19.5 |
| **Open Weight** | | | | | | |
| Llama-3.2-3B-Instruct | 3.2B | **51.2** | **48.1** | **56.8** | **48.1** | **50.4** |
| Qwen2.5-3B-Instruct | 3.1B | 39.4 | 35.8 | 54.7 | 46.8 | 42.9 |
| Llama-3.2-1B-Instruct | 1.2B | 37.5 | 38.1 | 56.6 | 35.5 | 41.1 |
| gemma-3-1b-it | 1.0B | 28.5 | 27.0 | 53.6 | 39.9 | 35.7 |
| Qwen3-1.7B | 1.7B | 37.4 | 37.5 | 2.6 | 19.5 | 26.2 |
### Instruction Following
Villanova-2B-2603 is the **#1 fully open model** for instruction following, and is competitive with larger open weight models. The MARCO benchmark evaluates structured instruction following across all five languages.
| Model | Size | IFEval | MARCO-EN | MARCO-DE | MARCO-ES | MARCO-FR | MARCO-IT | **Avg** |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **Fully Open** | | | | | | | | |
| **Villanova-2B-2603** | 2.4B | 62.0 | 39.4 | **40.5** | **44.2** | **42.5** | **42.1** | **45.1** |
| OLMo-2-0425-1B-Instruct | 1.2B | **77.9** | **52.9** | 23.1 | 29.0 | 27.9 | 24.9 | 39.3 |
| EuroLLM-1.7B-Instruct | 1.7B | 34.5 | 18.3 | 15.9 | 15.9 | 17.4 | 15.2 | 19.5 |
| Minerva-7B-instruct-v1.0 | 7.4B | 29.6 | 17.0 | 12.2 | 13.9 | 13.9 | 15.0 | 16.9 |
| salamandra-2b-instruct | 2.3B | 26.4 | 17.7 | 12.2 | 12.0 | 12.9 | 12.9 | 15.7 |
| **Open Weight** | | | | | | | | |
| Llama-3.2-3B-Instruct | 3.2B | **82.2** | **54.0** | **39.9** | 38.8 | 37.5 | 35.9 | **48.1** |
| Qwen2.5-3B-Instruct | 3.1B | 71.5 | 47.3 | 37.5 | **42.5** | **41.0** | **40.7** | 46.8 |
| gemma-3-1b-it | 1.0B | 74.5 | 42.7 | 27.5 | 33.3 | 27.9 | 33.3 | 39.9 |
| Llama-3.2-1B-Instruct | 1.2B | 64.8 | 43.2 | 25.3 | 29.0 | 24.2 | 26.6 | 35.5 |
| Qwen3-1.7B | 1.7B | 48.4 | 27.4 | 8.9 | 10.3 | 13.1 | 9.1 | 19.5 |
> **Key insight:** While some models score higher on English-only IFEval, Villanova-2B-2603 delivers the most balanced multilingual instruction following, with MARCO scores of 40-44 across DE, ES, FR, and IT. This is well ahead of OLMo (23-29) and Gemma (27-33) on the non-English languages.
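
The **Avg** column can be re-derived from the per-benchmark scores. The sketch below assumes it is the unweighted mean of IFEval and the five MARCO columns, which matches the reported values for the rows shown. The non-English MARCO mean is a derived figure that does not appear in the model card, and the function names are illustrative.

```python
from statistics import mean

# Scores copied from the table above, in column order:
# IFEval, MARCO-EN, MARCO-DE, MARCO-ES, MARCO-FR, MARCO-IT.
SCORES = {
    "Villanova-2B-2603": [62.0, 39.4, 40.5, 44.2, 42.5, 42.1],
    "OLMo-2-0425-1B-Instruct": [77.9, 52.9, 23.1, 29.0, 27.9, 24.9],
    "gemma-3-1b-it": [74.5, 42.7, 27.5, 33.3, 27.9, 33.3],
}


def overall_avg(scores: list[float]) -> float:
    """Unweighted mean of all six columns (assumed definition of Avg)."""
    return round(mean(scores), 1)


def non_english_avg(scores: list[float]) -> float:
    """Mean of MARCO-DE/ES/FR/IT only (derived here, not in the model card)."""
    return round(mean(scores[2:]), 1)


if __name__ == "__main__":
    for model, scores in SCORES.items():
        print(f"{model}: avg={overall_avg(scores)}, non-English MARCO={non_english_avg(scores)}")
```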
### Safety (M-ALERT)
Villanova-2B-2603 is the **#1 fully open model** for safety. Safety was evaluated using the M-ALERT benchmark across all five languages.
| Model | Size | EN | DE | ES | FR | IT | **Avg** |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **Fully Open** | | | | | | | |
| **Villanova-2B-2603** | 2.4B | 31.0 | 4.1 | **56.0** | **62.2** | 44.2 | **39.5** |
| Minerva-7B-instruct-v1.0 | 7.4B | 31.6 | 4.3 | 26.9 | 24.8 | **62.9** | 30.1 |
| OLMo-2-0425-1B-Instruct | 1.2B | **58.0** | **5.7** | 13.4 | 10.7 | 9.1 | 19.4 |
| salamandra-2b-instruct | 2.3B | 4.9 | 3.0 | 15.6 | 15.4 | 9.2 | 9.6 |
| EuroLLM-1.7B-Instruct | 1.7B | 5.4 | 0.8 | 2.6 | 8.4 | 1.7 | 3.8 |
| **Open Weight** | | | | | | | |
| Llama-3.2-3B-Instruct | 3.2B | 54.5 | 26.4 | 70.3 | 63.3 | **69.4** | **56.8** |
| Llama-3.2-1B-Instruct | 1.2B | 47.1 | **32.9** | 67.4 | **68.6** | 67.2 | 56.6 |
| Qwen2.5-3B-Instruct | 3.1B | **60.2** | 23.2 | **71.7** | 64.0 | 54.4 | 54.7 |
| gemma-3-1b-it | 1.0B | 58.6 | 28.7 | 58.8 | 68.4 | 53.3 | 53.6 |
| Qwen3-1.7B | 1.7B | 10.2 | 0.0 | 0.5 | 0.8 | 1.3 | 2.6 |
### Reasoning & Question Answering
| Model | Size | BBH | LB-BBH | GSM8K | DROP | TruthfulQA | **Avg Reasoning** | **Avg QA** |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **Fully Open** | | | | | | | | |
| Minerva-7B-instruct-v1.0 | 7.4B | 29.0 | 30.0 | 10.6 | 29.2 | 29.6 | 27.1 | **36.2** |
| OLMo-2-0425-1B-Instruct | 1.2B | 27.6 | **33.8** | **67.4** | 30.2 | **33.8** | **38.7** | 35.6 |
| **Villanova-2B-2603** | 2.4B | **29.3** | 33.2 | 23.4 | **34.8** | 28.5 | 31.0 | 33.1 |
| salamandra-2b-instruct | 2.3B | 22.5 | 29.2 | 2.3 | 20.6 | 27.8 | 23.6 | 26.6 |
| EuroLLM-1.7B-Instruct | 1.7B | 28.5 | 29.8 | 12.7 | 22.2 | 29.2 | 26.0 | 24.7 |
| **Open Weight** |||||||||
| Llama-3.2-3B-Instruct | 3.2B | **59.3** | 44.6 | **77.2** | **48.3** | 36.1 | **51.2** | **48.1** |
| Qwen2.5-3B-Instruct | 3.1B | 12.2 | **46.9** | 76.0 | 12.5 | **41.4** | 39.4 | 35.8 |
| Qwen3-1.7B | 1.7B | 9.8 | 43.5 | 74.2 | 34.4 | 29.6 | 37.4 | 37.5 |
| Llama-3.2-1B-Instruct | 1.2B | 39.3 | 35.7 | 45.6 | 31.8 | 28.9 | 37.5 | 38.1 |
| gemma-3-1b-it | 1.0B | 25.0 | 35.1 | 34.0 | 21.1 | 26.6 | 28.5 | 27.0 |
## Improvement over Previous Release
Villanova-2B-2603 represents a **major leap** over our previous model ([Villanova-2B-2512-Preview](https://huggingface.co/VillanovaAI/Villanova-2B-2512-Preview)):
| Category | 2512-Preview | **2603** | **Improvement** |
|:---|:---:|:---:|:---:|
| Overall | 23.3 | **36.9** | **+58%** |
| Instruction Following | 28.9 | **45.1** | **+56%** |
| Safety | 2.4 | **39.5** | **+1546%** |
| Reasoning | 27.5 | **31.0** | **+13%** |
| QA | 29.0 | **33.1** | **+14%** |
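
The **Improvement** column reads as a relative change with respect to the 2512-Preview score, not a difference in points. A minimal sketch of that calculation, using the scores from the table above (the function name is illustrative):

```python
def relative_improvement(old: float, new: float) -> float:
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


# (2512-Preview score, 2603 score) pairs copied from the table above.
RESULTS = {
    "Overall": (23.3, 36.9),
    "Instruction Following": (28.9, 45.1),
    "Safety": (2.4, 39.5),
    "Reasoning": (27.5, 31.0),
    "QA": (29.0, 33.1),
}

if __name__ == "__main__":
    for category, (old, new) in RESULTS.items():
        print(f"{category}: +{relative_improvement(old, new):.0f}%")
```

Because the metric is relative, the very large Safety figure largely reflects the small baseline of 2.4; in absolute terms the gain is 37.1 points.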
## License
This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

12
chat_template.jinja Normal file

@@ -0,0 +1,12 @@
{% set has_system = messages|selectattr('role', 'equalto', 'system')|list|length > 0 %}{% if not has_system %}{{ '<|im_start|>system
You are Villanova, a helpful AI assistant built by Villanova.AI.<|im_end|>
' }}{% endif %}{% for message in messages %}{% if message['role'] == 'system' %}{{ '<|im_start|>system
' + message['content'] + '<|im_end|>
' }}{% elif message['role'] == 'user' %}{{ '<|im_start|>user
' + message['content'] + '<|im_end|>
' }}{% elif message['role'] == 'assistant' %}{{ '<|im_start|>assistant
' + message['content'] }}{% if not loop.last %}{{ '<|im_end|>
' }}{% else %}{{ eos_token }}{% endif %}{% elif message['role'] == 'tool' %}{{ '<|im_start|>tool
' + message['content'] + '<|im_end|>
' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
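
To make the template's behaviour easier to follow, here is a plain-Python mirror of its logic. It is an illustrative sketch, not the code that ships with the model: `render_chat` and `DEFAULT_SYSTEM` are hypothetical names, and the default `eos_token` of `</s>` is taken from `special_tokens_map.json` in this commit.

```python
DEFAULT_SYSTEM = "You are Villanova, a helpful AI assistant built by Villanova.AI."


def render_chat(messages, add_generation_prompt=False, eos_token="</s>"):
    """Plain-Python mirror of chat_template.jinja (illustrative only)."""
    parts = []
    # A default system prompt is injected only when no system message is given.
    if not any(m["role"] == "system" for m in messages):
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    last = len(messages) - 1
    for i, message in enumerate(messages):
        role, content = message["role"], message["content"]
        if role in ("system", "user", "tool"):
            parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
        elif role == "assistant":
            parts.append(f"<|im_start|>assistant\n{content}")
            # The final assistant turn is closed with EOS instead of <|im_end|>.
            parts.append(eos_token if i == last else "<|im_end|>\n")
        # Any other role produces no output, as in the template.
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_chat([{"role": "user", "content": "Ciao!"}], add_generation_prompt=True))
```

Two details stand out: the default system prompt appears only when the conversation has no system message, and a conversation ending in an assistant turn is terminated with the EOS token rather than `<|im_end|>`.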

29
config.json Normal file

@@ -0,0 +1,29 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "dtype": "bfloat16",
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 2560,
  "initializer_range": 0.014,
  "intermediate_size": 10240,
  "max_position_embeddings": 32768,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 20,
  "num_hidden_layers": 18,
  "num_key_value_heads": 4,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000,
  "tie_word_embeddings": true,
  "transformers_version": "4.57.1",
  "use_cache": true,
  "vocab_size": 256000
}
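
As a sanity check on the "2.4B" figure, the parameter count can be estimated from the values in this config. The sketch below assumes the standard `LlamaForCausalLM` layout (bias-free q/k/v/o projections, a gate/up/down MLP, two RMSNorm weight vectors per layer plus a final norm) and tied input/output embeddings, as indicated by `attention_bias`, `mlp_bias`, and `tie_word_embeddings`. The function name is illustrative.

```python
# Subset of this commit's config.json that determines the parameter count.
CONFIG = {
    "hidden_size": 2560,
    "intermediate_size": 10240,
    "num_hidden_layers": 18,
    "num_attention_heads": 20,
    "num_key_value_heads": 4,
    "head_dim": 128,
    "vocab_size": 256000,
    "tie_word_embeddings": True,
}


def count_parameters(cfg: dict) -> int:
    """Estimate the parameter count of a bias-free Llama-style decoder."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]

    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]              # gate, up, down projections
    norms = 2 * h                                       # two RMSNorm weights per layer
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embeddings  # tied head is shared
    final_norm = h
    return embeddings + lm_head + cfg["num_hidden_layers"] * per_layer + final_norm


if __name__ == "__main__":
    total = count_parameters(CONFIG)
    print(f"{total:,} parameters, {2 * total:,} bytes in bfloat16")
```

Under these assumptions the estimate is 2,354,147,840 parameters (about 2.35B), in line with the 2.4B quoted in the model card. At 2 bytes per bfloat16 weight this is 4,708,295,680 bytes, which is 19,024 bytes less than the `model.safetensors` size recorded in this commit; a gap of that size would be consistent with the file's metadata header, though that is not verified here.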

6
generation_config.json Normal file

@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.57.1"
}

3
model.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:51d826843cdf9d01363a7d76e789680738037ec11775f3c3db1d81ff8ca70d88
size 4708314704
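
The three lines above are a Git LFS pointer, not the weights themselves: they record the SHA-256 digest and byte size of the real file. A downloaded copy can be checked against the pointer roughly as follows. This is a minimal sketch; the helper names are hypothetical and the local file path in the usage comment is an assumption.

```python
import hashlib
from pathlib import Path

# Pointer text as committed for model.safetensors.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:51d826843cdf9d01363a7d76e789680738037ec11775f3c3db1d81ff8ca70d88
size 4708314704
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a pointer file into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and digest in `pointer`."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first, before hashing several gigabytes
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Usage, assuming the weights were downloaded to ./model.safetensors:
#   matches_pointer("model.safetensors", parse_lfs_pointer(POINTER_TEXT))
```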

44
special_tokens_map.json Normal file

@@ -0,0 +1,44 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}

3
tokenizer.json Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2e90b85b3e3b3ebfc6b9bafeb954b37f2435eed595738337e53f2a746d23d5a2
size 37007416

3
tokenizer.model Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ab94ddf46d14f0279254858d53770c5319c5129d47291ee2bada530271cb1292
size 4813276

1102
tokenizer_config.json Normal file

File diff suppressed because it is too large