
## 📖 Introduction

**DistilQwen2-7B** is a distilled version of **Qwen2-7B-Instruct**, built by transferring the capabilities of stronger LLMs into a smaller model. The distillation data comes from a diverse range of sources, including well-known open-source collections such as Magpie, OpenHermes, and MAmmoTH2, as well as proprietary synthetic datasets.

The training data primarily consists of instructions in Chinese and English. To enhance the quality and diversity of the instruction data, we implemented a difficulty scoring system and task-related resampling techniques.

For difficulty scoring, we employed the LLM-as-a-Judge paradigm: the teacher model acts as the judge and rates both the teacher's and the student's responses on accuracy, relevance, helpfulness, and level of detail. We then computed the Model Fitting Difficulty (MFD) score by subtracting the student model's score from the teacher model's score. A higher MFD score means the student lags further behind the teacher on that instruction, which makes the instruction more valuable for distillation training. This allowed us to remove low-difficulty instructions from the training set and focus on more challenging and informative examples.
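The MFD filtering step can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used to train the model. It assumes the judge's four per-criterion ratings are averaged into one score, computes MFD so that a larger value means the student falls further behind the teacher (matching "higher MFD is more valuable"), and uses a hypothetical `threshold` to drop low-difficulty items. The judge call itself is omitted, and `JudgedInstruction`, `overall_score`, `select_for_distillation`, and `uniform` are illustrative names.

```python
from dataclasses import dataclass

# Criteria named in the text; how they are aggregated is an assumption.
CRITERIA = ("accuracy", "relevance", "helpfulness", "detail")


def overall_score(ratings: dict) -> float:
    """Average the judge's per-criterion ratings (assumed scheme)."""
    return sum(ratings[c] for c in CRITERIA) / len(CRITERIA)


@dataclass
class JudgedInstruction:
    instruction: str
    teacher_ratings: dict  # judge ratings for the teacher's response
    student_ratings: dict  # judge ratings for the student's response

    @property
    def mfd(self) -> float:
        # Teacher minus student: larger when the student lags further behind
        return overall_score(self.teacher_ratings) - overall_score(self.student_ratings)


def select_for_distillation(items, threshold):
    """Drop low-difficulty items; `threshold` is a hypothetical cut-off."""
    kept = [it for it in items if it.mfd > threshold]
    # Put the hardest (highest-MFD) instructions first
    return sorted(kept, key=lambda it: it.mfd, reverse=True)


def uniform(x):
    # Helper for the toy example: same rating on every criterion
    return {c: x for c in CRITERIA}


items = [
    JudgedInstruction("easy", uniform(9), uniform(9)),    # MFD 0
    JudgedInstruction("hard", uniform(8), uniform(4)),    # MFD 4
    JudgedInstruction("medium", uniform(7), uniform(5)),  # MFD 2
]
selected = select_for_distillation(items, threshold=1.0)
print([it.instruction for it in selected])  # ['hard', 'medium']
```

With a cut-off of 1.0, the instruction the student already answers as well as the teacher is discarded, while the two where it falls behind are kept for training.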

This curation and scoring process is designed to help **DistilQwen2-7B** reach strong instruction-following performance after distillation.

## 🚀 Quick Start

The following snippet shows how to load the tokenizer and model and generate a response using `apply_chat_template`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda"  # device for the input tensors; device_map="auto" places the model weights

model = AutoModelForCausalLM.from_pretrained(
    "alibaba-pai/DistilQwen2-7B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("alibaba-pai/DistilQwen2-7B-Instruct")

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Pass input_ids and attention_mask together
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
)
# Strip the prompt tokens so that only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
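To see roughly what `apply_chat_template(..., tokenize=False, add_generation_prompt=True)` produces, here is a hand-written approximation. It assumes DistilQwen2-7B-Instruct inherits the ChatML-style template of Qwen2-7B-Instruct, including its default system prompt; `render_chatml` is a hypothetical helper, and the tokenizer's bundled template remains the authoritative source, so use it in practice.

```python
# Hypothetical helper approximating a Qwen2-style ChatML prompt.
# Assumption: the model inherits Qwen2's chat template; the tokenizer's own
# template is authoritative and may differ in details.
def render_chatml(messages, add_generation_prompt=True,
                  default_system="You are a helpful assistant."):
    # Insert a default system turn when the conversation does not start with one
    if not messages or messages[0]["role"] != "system":
        messages = [{"role": "system", "content": default_system}] + list(messages)
    # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([{"role": "user", "content": "Hi"}]))
```

The trailing `<|im_start|>assistant\n` is what `add_generation_prompt=True` contributes: without it, the model would not be cued to start an assistant reply.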

## 🔍 Evaluation

We evaluated our models on instruction-following benchmarks: AlpacaEval 2.0, MT-Bench, and IFEval.

| Model | AlpacaEval 2.0 (length-controlled) | MT-Bench | MT-Bench (single) | IFEval (instruction-loose) | IFEval (strict-prompt) |
|---|---|---|---|---|---|
| Qwen2-1.5B-Instruct | 5.22 | 5.85 | 6.45 | 41.37 | 28.10 |
| DistilQwen2-1.5B-Instruct | 8.28 | 6.42 | 7.12 | 49.76 | 36.04 |
| Qwen2-7B-Instruct | 24.33 | 8.27 | 8.68 | 66.67 | 52.31 |
| DistilQwen2-7B-Instruct | 25.35 | 8.40 | 9.03 | 71.46 | 60.26 |
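
The effect of distillation is easiest to read as the per-benchmark difference between each DistilQwen2 model and its Qwen2 baseline. The short script below only does arithmetic on the scores reported in the table; `RESULTS` and `gains` are illustrative names, and the column labels are abbreviated.

```python
# Scores copied from the evaluation table: (Qwen2 baseline, DistilQwen2)
COLUMNS = ["AlpacaEval 2.0 (LC)", "MT-Bench", "MT-Bench (single)",
           "IFEval (instruction-loose)", "IFEval (strict-prompt)"]
RESULTS = {
    "1.5B": ([5.22, 5.85, 6.45, 41.37, 28.10], [8.28, 6.42, 7.12, 49.76, 36.04]),
    "7B": ([24.33, 8.27, 8.68, 66.67, 52.31], [25.35, 8.40, 9.03, 71.46, 60.26]),
}


def gains(size):
    base, distil = RESULTS[size]
    # Absolute improvement per benchmark, rounded to the table's precision
    return {col: round(d - b, 2) for col, b, d in zip(COLUMNS, base, distil)}


for size in RESULTS:
    print(size, gains(size))
```

Every difference is positive, and the largest absolute gains for both sizes are on IFEval (for example, +7.95 strict-prompt for the 7B model).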


## 📜 Citation

If you find our work helpful, please cite it!

```bibtex
@misc{TAPIR,
      title={Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning},
      author={Yuanhao Yue and Chengyu Wang and Jun Huang and Peng Wang},
      year={2024},
      eprint={2405.13448},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2405.13448},
}
```