初始化项目,由ModelHub XC社区提供模型

Model: PAI/DistilQwen2-1.5B-Instruct
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-05-23 07:00:15 +08:00
commit c636c8ca9b
12 changed files with 151621 additions and 0 deletions

40
.gitattributes vendored Normal file
View File

@@ -0,0 +1,40 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
tokenizer.json filter=lfs diff=lfs merge=lfs -text

74
README.md Normal file
View File

@@ -0,0 +1,74 @@
## 📖 Introduction
**DistilQwen2-1.5B** is a distilled version of **Qwen2-1.5B-Instruct**, designed to distill the capabilities of stronger LLMs into smaller ones. To achieve this, we utilized a diverse range of datasets for the distillation process, including well-known open-source collections such as Magpie, Openhermes, and Mammoth 2, as well as proprietary synthetic datasets.
The training data primarily consists of instructions in Chinese and English. To enhance the quality and diversity of the instruction data, we implemented a difficulty scoring system and task-related resampling techniques.
For difficulty scoring, we employed the LLM-as-a-Judge paradigm, using the teacher model to evaluate responses based on accuracy, relevance, helpfulness, and level of detail. We then calculated the Model Fitting Difficulty (MFD) Score by subtracting the teacher model's score from the student model's score. A higher MFD Score indicates that the instruction is more valuable for distillation training. This approach allowed us to remove low-difficulty instructions from the training set, focusing on more challenging and informative examples.
This careful curation and scoring process ensures that **DistilQwen2-1.5B** achieves high performance after the distillation process.
## 🚀 Quick Start
Here provides a code snippet with `apply_chat_template` to show you how to load the tokenizer and model and how to generate contents.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto
model = AutoModelForCausalLM.from_pretrained(
"alibaba-pai/DistilQwen2-1.5B-Instruct",
torch_dtype="auto",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("alibaba-pai/DistilQwen2-1.5B-Instruct")
prompt = "Give me a short introduction to large language model."
messages = [
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
generated_ids = model.generate(
model_inputs.input_ids,
max_new_tokens=2048
)
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
## 🔍 Evaluation
We evaluated our model on instruction-following leaderboards such as AlpacaEval, MT-Bench and IFEval.
| Model | AlpacaEval 2.0 (length-controlled) | MT-Bench | MT-Bench (single) | IFEval (instruction-loose) | IFEval (strict-prompt) |
|------|-----------------------------------|----------|-------------------|---------------------------|------------------------|
| Qwen2-1.5B-Instruct | 5.22 | 5.85 | 6.45 | 41.37 | 28.10 |
| DistilQwen2-1.5B-Instruct | 8.28 | 6.42 | 7.12 | 49.76 | 36.04 |
| Qwen2-7B-Instruct | 24.33 | 8.27 | 8.68 | 66.67 | 52.31 |
| DistilQwen2-7B-Instruct | 25.35 | 8.40 | 9.03 | 71.46 | 60.26 |
## 📜 Citation
If you find our work helpful, please cite it!
```
@misc{TAPIR,
title={Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning},
author={Yuanhao Yue and Chengyu Wang and Jun Huang and Peng Wang},
year={2024},
eprint={2405.13448},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2405.13448},
}
```

5
added_tokens.json Normal file
View File

@@ -0,0 +1,5 @@
{
"<|endoftext|>": 151643,
"<|im_end|>": 151645,
"<|im_start|>": 151644
}

28
config.json Normal file
View File

@@ -0,0 +1,28 @@
{
"_name_or_path": "/mnt/workspace/yueyuanhao/202405/models/Qwen2-1.5B-Instruct/",
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 1536,
"initializer_range": 0.02,
"intermediate_size": 8960,
"max_position_embeddings": 32768,
"max_window_layers": 28,
"model_type": "qwen2",
"num_attention_heads": 12,
"num_hidden_layers": 28,
"num_key_value_heads": 2,
"rms_norm_eps": 1e-06,
"rope_theta": 1000000.0,
"sliding_window": 32768,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.41.2",
"use_cache": false,
"use_sliding_window": false,
"vocab_size": 151936
}

1
configuration.json Normal file
View File

@@ -0,0 +1 @@
{"framework":"Pytorch","task":"text-generation"}

14
generation_config.json Normal file
View File

@@ -0,0 +1,14 @@
{
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"repetition_penalty": 1.1,
"temperature": 0.7,
"top_k": 20,
"top_p": 0.8,
"transformers_version": "4.41.2"
}

151388
merges.txt Normal file

File diff suppressed because it is too large Load Diff

3
model.safetensors Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8f54291e3b8924de1f0c02f97ecd71d9f28c4b3cd7e8067d4b9dc627db1bc30e
size 3087467144

20
special_tokens_map.json Normal file
View File

@@ -0,0 +1,20 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}

3
tokenizer.json Normal file
View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a12e3ba5d5e0ad173cf7b408ab8534c6be8cbc6a146714e9c7dc8cf2346603b1
size 7028043

44
tokenizer_config.json Normal file
View File

@@ -0,0 +1,44 @@
{
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>"
],
"bos_token": null,
"chat_template": "{% set system_message = 'You are a helpful assistant.' %}{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% if system_message is defined %}{{ '<|im_start|>system\n' + system_message + '<|im_end|>\n' }}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|im_start|>user\n' + content + '<|im_end|>\n<|im_start|>assistant\n' }}{% elif message['role'] == 'assistant' %}{{ content + '<|im_end|>' + '\n' }}{% endif %}{% endfor %}",
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"model_max_length": 32768,
"pad_token": "<|endoftext|>",
"padding_side": "right",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}

1
vocab.json Normal file

File diff suppressed because one or more lines are too long