Initialize project; model provided by the ModelHub XC community

Model: VillanovaAI/Villanova-2B-2603 Source: Original Platform
2026-04-13 06:12:07 +08:00
commit fc50d46de6
10 changed files with 1426 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,36 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,188 @@
 ---
 language:
 - en
 - it
 - es
 - fr
 - de
 license: apache-2.0
 library_name: transformers
 pipeline_tag: text-generation
 tags:
 - llama
 - multilingual
 - instruct
 - chat
 - villanova
 base_model:
 - VillanovaAI/Villanova-2B-Base-2603
 datasets:
 - VillanovaAI/villanova-sft-2603
 ---
 # Model Card for Villanova-2B-2603
 <img src="https://huggingface.co/spaces/VillanovaAI/README/resolve/main/Logo_VILLANOVA_colore.svg" alt="Villanova.AI logo" height="96"/>
 **Villanova-2B-2603** is a fully open, multilingual instruction-tuned Large Language Model developed by [Villanova.AI](https://huggingface.co/VillanovaAI). Part of the Villanova project, it is designed to advance open European language technology with native support for five European languages. All model weights, training data sources, and training details are publicly released. 
 Built on top of [Villanova-2B-Base-2603](https://huggingface.co/VillanovaAI/Villanova-2B-Base-2603) — a **2.4B-parameter model pretrained from scratch** — this instruction-tuned model offers strong multilingual instruction following and safety alignment under a fully open Apache 2.0 license.
 ---
 ## Model Family
 **[Villanova-2B-Base-2603](https://huggingface.co/VillanovaAI/Villanova-2B-Base-2603)** — Base model (4.4T)<br>
 &emsp;↳ **[Villanova-2B-2603](https://huggingface.co/VillanovaAI/Villanova-2B-2603)** — SFT / Instruct — 📍 *This model*<br>
 &emsp;&emsp;↳ [Villanova-2B-2603-GGUF](https://huggingface.co/VillanovaAI/Villanova-2B-2603-GGUF) — Quantized<br>
 &emsp;↳ **[Villanova-2B-VL-2603](https://huggingface.co/VillanovaAI/Villanova-2B-VL-2603)** — Vision-Language Instruct<br>
 &emsp;&emsp;↳ [Villanova-2B-VL-2603-GGUF](https://huggingface.co/VillanovaAI/Villanova-2B-VL-2603-GGUF) — Quantized<br>
 <br>
 **[Villanova-2B-Base-2512-Preview](https://huggingface.co/VillanovaAI/Villanova-2B-Base-2512-Preview)** — Base model (2.2T) (previous version, not recommended)<br>
 &emsp;↳ **[Villanova-2B-2512-Preview](https://huggingface.co/VillanovaAI/Villanova-2B-2512-Preview)** — SFT / Instruct (previous version, not recommended)<br>
 ---
 ## Highlights
 - **European-focused, fully open model** released under Apache 2.0
 - **Native multilingual support** for 5 European languages: English, French, German, Italian, and Spanish
 - **Strong instruction following**, competitive with larger open-weight models
 - **Robust multilingual safety alignment** across all supported languages
 - **+58% overall improvement** over our previous release ([Villanova-2B-2512-Preview](https://huggingface.co/VillanovaAI/Villanova-2B-2512-Preview))
 - Only **2.4B parameters**, efficient enough for edge and on-device deployment
 ## Model Summary
 | | |
 |---|---|
 | **Architecture** | Decoder-only Transformer (LLaMA-based) |
 | **Parameters** | 2.4B |
 | **Base Model** | [VillanovaAI/Villanova-2B-Base-2603](https://huggingface.co/VillanovaAI/Villanova-2B-Base-2603) (pretrained from scratch) |
 | **Pre-training Data** | 4.4T tokens (multilingual, two-stage) |
 | **Fine-tuning Data** | [VillanovaAI/villanova-sft-2603](https://huggingface.co/datasets/VillanovaAI/villanova-sft-2603) |
 | **Languages** | English, French, German, Italian, Spanish |
 | **Context Length** | 32,768 tokens |
 | **Precision** | bfloat16 |
 | **License** | Apache 2.0 |
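
The 2.4B figure can be cross-checked against the hyperparameters in this repository's `config.json`. The sketch below is a rough estimate rather than an official count: it assumes a standard LLaMA-style layout (no attention or MLP biases, two RMSNorm vectors per layer, tied input and output embeddings), and `count_llama_params` is a hypothetical helper name.

```python
# Hyperparameters copied from this repository's config.json.
cfg = {
    "hidden_size": 2560,
    "intermediate_size": 10240,
    "num_hidden_layers": 18,
    "num_attention_heads": 20,
    "num_key_value_heads": 4,
    "head_dim": 128,
    "vocab_size": 256000,
    "tie_word_embeddings": True,
}


def count_llama_params(cfg: dict) -> int:
    """Count weights of a LLaMA-style decoder with no attention or MLP biases."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]

    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]  # gate, up, down projections
    norms = 2 * h  # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * h
    # With tied embeddings the output head reuses the input embedding matrix.
    lm_head = 0 if cfg["tie_word_embeddings"] else embeddings
    final_norm = h
    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


total = count_llama_params(cfg)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
print(f"~{total * 2 / 1e9:.2f} GB in bfloat16")
```

At two bytes per bfloat16 weight, this estimate lands within a few kilobytes of the 4,708,314,704-byte `model.safetensors` pointer in this commit; the remainder is presumably file header metadata.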
 ## How to Use
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_name = "VillanovaAI/Villanova-2B-2603"
 device = "cuda"  # or "cpu"
 # Load the tokenizer and model weights from the Hub.
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
 messages = [
    {"role": "user", "content": "Explain quantum entanglement in simple terms."}
 ]
 # Render the conversation with the model's chat template, ending on an open assistant turn.
 input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 model_inputs = tokenizer([input_text], return_tensors="pt").to(device)
 # Sample up to 256 new tokens.
 generated_ids = model.generate(**model_inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
 # Drop the prompt tokens so only the newly generated reply is decoded.
 output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
 print(tokenizer.decode(output_ids, skip_special_tokens=True))
 ```
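
The prompt string built by `apply_chat_template` follows the ChatML-style layout defined in this repository's `chat_template.jinja`. As a rough illustration of that layout, here is a pure-Python approximation. `render_chat` and `DEFAULT_SYSTEM` are hypothetical names introduced for this sketch, and the `</s>` default is taken from `special_tokens_map.json` in this commit. For real inference, rely on the tokenizer's own template rather than this helper.

```python
DEFAULT_SYSTEM = "You are Villanova, a helpful AI assistant built by Villanova.AI."


def render_chat(messages, add_generation_prompt=False, eos_token="</s>"):
    """Pure-Python approximation of chat_template.jinja (hypothetical helper)."""
    parts = []
    # The template injects a default system turn when the caller supplies none.
    if not any(m["role"] == "system" for m in messages):
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    for i, m in enumerate(messages):
        role, content = m["role"], m["content"]
        if role in ("system", "user", "tool"):
            parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
        elif role == "assistant":
            parts.append(f"<|im_start|>assistant\n{content}")
            # A final assistant turn is closed with the EOS token instead.
            is_last = i == len(messages) - 1
            parts.append(eos_token if is_last else "<|im_end|>\n")
        # Any other role is skipped, as in the template.
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chat(
    [{"role": "user", "content": "Explain quantum entanglement in simple terms."}],
    add_generation_prompt=True,
)
print(prompt)
```

Passing your own `system` message replaces the default system turn; the template never emits both.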
 ## Evaluation
 Villanova-2B-2603 was extensively evaluated across **25 benchmarks** covering Reasoning, Question Answering, Safety, and Instruction Following in both English and multilingual settings. All evaluations were performed using identical settings and prompts for fair comparison.
 Tables are sorted by the main metric (descending). Models are grouped into *Fully Open* and *Open Weight* categories.
 ### Overall Performance
 Villanova-2B-2603 is the **#1 fully open model** in overall average across all benchmarks.
 | Model | Size | Reasoning | QA | Safety | Instr. Follow | **Overall** |
 |:---|:---:|:---:|:---:|:---:|:---:|:---:|
 | **Fully Open** | | | | | | |
 | **Villanova-2B-2603** | 2.4B | 31.0 | 33.1 | **39.5** | **45.1** | **36.9** |
 | OLMo-2-0425-1B-Instruct | 1.2B | **38.7** | 35.6 | 19.4 | 39.3 | 33.9 |
 | Minerva-7B-instruct-v1.0 | 7.4B | 27.1 | **36.2** | 30.1 | 16.9 | 28.5 |
 | salamandra-2b-instruct | 2.3B | 23.6 | 26.6 | 9.6 | 15.7 | 20.0 |
 | EuroLLM-1.7B-Instruct | 1.7B | 26.0 | 24.7 | 3.8 | 19.5 | 19.5 |
 | **Open Weight** | | | | | | |
 | Llama-3.2-3B-Instruct | 3.2B | **51.2** | **48.1** | **56.8** | **48.1** | **50.4** |
 | Qwen2.5-3B-Instruct | 3.1B | 39.4 | 35.8 | 54.7 | 46.8 | 42.9 |
 | Llama-3.2-1B-Instruct | 1.2B | 37.5 | 38.1 | 56.6 | 35.5 | 41.1 |
 | gemma-3-1b-it | 1.0B | 28.5 | 27.0 | 53.6 | 39.9 | 35.7 |
 | Qwen3-1.7B | 1.7B | 37.4 | 37.5 | 2.6 | 19.5 | 26.2 |
 ### Instruction Following
 Villanova-2B-2603 is the **#1 fully open model** for instruction following, and is competitive with larger open weight models. The MARCO benchmark evaluates structured instruction following across all five languages.
 | Model | Size | IFEval | MARCO-EN | MARCO-DE | MARCO-ES | MARCO-FR | MARCO-IT | **Avg** |
 |:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
 | **Fully Open** | | | | | | | | |
 | **Villanova-2B-2603** | 2.4B | 62.0 | 39.4 | **40.5** | **44.2** | **42.5** | **42.1** | **45.1** |
 | OLMo-2-0425-1B-Instruct | 1.2B | **77.9** | **52.9** | 23.1 | 29.0 | 27.9 | 24.9 | 39.3 |
 | EuroLLM-1.7B-Instruct | 1.7B | 34.5 | 18.3 | 15.9 | 15.9 | 17.4 | 15.2 | 19.5 |
 | Minerva-7B-instruct-v1.0 | 7.4B | 29.6 | 17.0 | 12.2 | 13.9 | 13.9 | 15.0 | 16.9 |
 | salamandra-2b-instruct | 2.3B | 26.4 | 17.7 | 12.2 | 12.0 | 12.9 | 12.9 | 15.7 |
 | **Open Weight** | | | | | | | | |
 | Llama-3.2-3B-Instruct | 3.2B | **82.2** | **54.0** | **39.9** | 38.8 | 37.5 | 35.9 | **48.1** |
 | Qwen2.5-3B-Instruct | 3.1B | 71.5 | 47.3 | 37.5 | **42.5** | **41.0** | **40.7** | 46.8 |
 | gemma-3-1b-it | 1.0B | 74.5 | 42.7 | 27.5 | 33.3 | 27.9 | 33.3 | 39.9 |
 | Llama-3.2-1B-Instruct | 1.2B | 64.8 | 43.2 | 25.3 | 29.0 | 24.2 | 26.6 | 35.5 |
 | Qwen3-1.7B | 1.7B | 48.4 | 27.4 | 8.9 | 10.3 | 13.1 | 9.1 | 19.5 |
 > **Key insight:** While some models score higher on English-only IFEval, Villanova-2B-2603 delivers the most balanced multilingual instruction following, with MARCO scores of 40-44 across DE, ES, FR, and IT. This is well ahead of OLMo (23-29) and Gemma (27-33) on non-English languages.
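
The non-English ranges can be recomputed directly from the rows of the instruction-following table. A minimal sketch, with the scores copied by hand from that table and `marco_non_english` as an illustrative variable name:

```python
from statistics import mean

# MARCO scores (DE, ES, FR, IT) copied from the instruction-following table.
marco_non_english = {
    "Villanova-2B-2603": [40.5, 44.2, 42.5, 42.1],
    "OLMo-2-0425-1B-Instruct": [23.1, 29.0, 27.9, 24.9],
    "gemma-3-1b-it": [27.5, 33.3, 27.9, 33.3],
}

for model, scores in marco_non_english.items():
    print(f"{model}: min={min(scores)} max={max(scores)} mean={mean(scores):.1f}")
```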
 ### Safety (M-ALERT)
 Villanova-2B-2603 is the **#1 fully open model** for safety. Safety was evaluated using the M-ALERT benchmark across all five languages.
 | Model | Size | EN | DE | ES | FR | IT | **Avg** |
 |:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
 | **Fully Open** | | | | | | | |
 | **Villanova-2B-2603** | 2.4B | 31.0 | 4.1 | **56.0** | **62.2** | 44.2 | **39.5** |
 | Minerva-7B-instruct-v1.0 | 7.4B | 31.6 | 4.3 | 26.9 | 24.8 | **62.9** | 30.1 |
 | OLMo-2-0425-1B-Instruct | 1.2B | **58.0** | **5.7** | 13.4 | 10.7 | 9.1 | 19.4 |
 | salamandra-2b-instruct | 2.3B | 4.9 | 3.0 | 15.6 | 15.4 | 9.2 | 9.6 |
 | EuroLLM-1.7B-Instruct | 1.7B | 5.4 | 0.8 | 2.6 | 8.4 | 1.7 | 3.8 |
 | **Open Weight** | | | | | | | |
 | Llama-3.2-3B-Instruct | 3.2B | 54.5 | 26.4 | 70.3 | 63.3 | **69.4** | **56.8** |
 | Llama-3.2-1B-Instruct | 1.2B | 47.1 | **32.9** | 67.4 | **68.6** | 67.2 | 56.6 |
 | Qwen2.5-3B-Instruct | 3.1B | **60.2** | 23.2 | **71.7** | 64.0 | 54.4 | 54.7 |
 | gemma-3-1b-it | 1.0B | 58.6 | 28.7 | 58.8 | 68.4 | 53.3 | 53.6 |
 | Qwen3-1.7B | 1.7B | 10.2 | 0.0 | 0.5 | 0.8 | 1.3 | 2.6 |
 ### Reasoning & Question Answering
 | Model | Size | BBH | LB-BBH | GSM8K | DROP | TruthfulQA | **Avg Reasoning** | **Avg QA** |
 |:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
 | **Fully Open** | | | | | | | | |
 | Minerva-7B-instruct-v1.0 | 7.4B | 29.0 | 30.0 | 10.6 | 29.2 | 29.6 | 27.1 | **36.2** |
 | OLMo-2-0425-1B-Instruct | 1.2B | 27.6 | **33.8** | **67.4** | 30.2 | **33.8** | **38.7** | 35.6 |
 | **Villanova-2B-2603** | 2.4B | **29.3** | 33.2 | 23.4 | **34.8** | 28.5 | 31.0 | 33.1 |
 | salamandra-2b-instruct | 2.3B | 22.5 | 29.2 | 2.3 | 20.6 | 27.8 | 23.6 | 26.6 |
 | EuroLLM-1.7B-Instruct | 1.7B | 28.5 | 29.8 | 12.7 | 22.2 | 29.2 | 26.0 | 24.7 |
 | **Open Weight** |||||||||
 | Llama-3.2-3B-Instruct | 3.2B | **59.3** | 44.6 | **77.2** | **48.3** | 36.1 | **51.2** | **48.1** |
 | Qwen2.5-3B-Instruct | 3.1B | 12.2 | **46.9** | 76.0 | 12.5 | **41.4** | 39.4 | 35.8 |
 | Qwen3-1.7B | 1.7B | 9.8 | 43.5 | 74.2 | 34.4 | 29.6 | 37.4 | 37.5 |
 | Llama-3.2-1B-Instruct | 1.2B | 39.3 | 35.7 | 45.6 | 31.8 | 28.9 | 37.5 | 38.1 |
 | gemma-3-1b-it | 1.0B | 25.0 | 35.1 | 34.0 | 21.1 | 26.6 | 28.5 | 27.0 |
 ## Improvement over Previous Release
 Villanova-2B-2603 represents a **major leap** over our previous model ([Villanova-2B-2512-Preview](https://huggingface.co/VillanovaAI/Villanova-2B-2512-Preview)):
 | Category | 2512-Preview | **2603** | **Improvement** |
 |:---|:---:|:---:|:---:|
 | Overall | 23.3 | **36.9** | **+58%** |
 | Instruction Following | 28.9 | **45.1** | **+56%** |
 | Safety | 2.4 | **39.5** | **+1546%** |
 | Reasoning | 27.5 | **31.0** | **+13%** |
 | QA | 29.0 | **33.1** | **+14%** |
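
The improvement column is consistent with a plain relative change, `(new - old) / old`, rounded to the nearest percent. A short sketch that recomputes it from the two score columns (`relative_improvement` is a hypothetical helper name):

```python
# (previous, current) category scores copied from the table above.
scores = {
    "Overall": (23.3, 36.9),
    "Instruction Following": (28.9, 45.1),
    "Safety": (2.4, 39.5),
    "Reasoning": (27.5, 31.0),
    "QA": (29.0, 33.1),
}


def relative_improvement(old: float, new: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100


for category, (old, new) in scores.items():
    print(f"{category}: {relative_improvement(old, new):+.0f}%")
```

The very large Safety figure comes from the small 2.4 baseline: the absolute gain is 37.1 points.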
 ## License
 This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1,12 @@
 {% set has_system = messages|selectattr('role', 'equalto', 'system')|list|length > 0 %}{% if not has_system %}{{ '<|im_start|>system
 You are Villanova, a helpful AI assistant built by Villanova.AI.<|im_end|>
 ' }}{% endif %}{% for message in messages %}{% if message['role'] == 'system' %}{{ '<|im_start|>system
 ' + message['content'] + '<|im_end|>
 ' }}{% elif message['role'] == 'user' %}{{ '<|im_start|>user
 ' + message['content'] + '<|im_end|>
 ' }}{% elif message['role'] == 'assistant' %}{{ '<|im_start|>assistant
 ' + message['content'] }}{% if not loop.last %}{{ '<|im_end|>
 ' }}{% else %}{{ eos_token }}{% endif %}{% elif message['role'] == 'tool' %}{{ '<|im_start|>tool
 ' + message['content'] + '<|im_end|>
 ' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
 ' }}{% endif %}
--- a/config.json
+++ b/config.json
@@ -0,0 +1,29 @@
 {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "dtype": "bfloat16",
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 2560,
  "initializer_range": 0.014,
  "intermediate_size": 10240,
  "max_position_embeddings": 32768,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 20,
  "num_hidden_layers": 18,
  "num_key_value_heads": 4,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000,
  "tie_word_embeddings": true,
  "transformers_version": "4.57.1",
  "use_cache": true,
  "vocab_size": 256000
 }
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,6 @@
 {
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.57.1"
 }
--- a/model.safetensors
+++ b/model.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:51d826843cdf9d01363a7d76e789680738037ec11775f3c3db1d81ff8ca70d88
 size 4708314704
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,44 @@
 {
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
 }
--- a/tokenizer.json
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:2e90b85b3e3b3ebfc6b9bafeb954b37f2435eed595738337e53f2a746d23d5a2
 size 37007416
--- a/tokenizer.model
+++ b/tokenizer.model
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:ab94ddf46d14f0279254858d53770c5319c5129d47291ee2bada530271cb1292
 size 4813276
--- a/tokenizer_config.json
+++ b/tokenizer_config.json