---
license: apache-2.0
---
# DistilQwen2.5-DS3-0324 Series: Fast-Thinking Reasoning Models

## 📖 Introduction
To address the industry-wide challenge of balancing reasoning efficiency with reasoning quality, the DistilQwen2.5-DS3-0324 series transfers the fast-thinking capabilities of DeepSeek-V3-0324 to lightweight models. Built on a two-stage distillation framework, the series maintains high performance while delivering:
- **Enhanced Reasoning Speed**: 60-80% fewer output tokens than comparable slow-thinking models
- **Reduced Resource Consumption**: suitable for edge computing deployment
- **Cognitive Alignment**: proprietary trajectory-alignment technology eliminates the mismatch between teacher reasoning chains and the student model's comprehension
## Core Innovations
### 1. Fast-Thinking Distillation Framework
- **Stage 1: Fast-Thinking CoT Data Collection**
  - **Long-to-Short Rewriting**: extracts the key reasoning steps from DeepSeek-R1 outputs
  - **Teacher Model Distillation**: captures the rapid reasoning trajectories of DeepSeek-V3-0324
- **Stage 2: CoT Trajectory Cognitive Alignment** (a sketch of this loop follows the list)
  - **Dynamic Difficulty Grading** (Easy/Medium/Hard)
    - An LLM-as-a-Judge rates how comprehensible each reasoning chain is to the small model
    - Easy chains are expanded with the necessary intermediate steps
    - Hard chains are simplified to remove high-level logical leaps
  - **Validation Mechanism**: iterative optimization until all data reaches a "Medium" rating
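
As a concrete illustration of the Stage 2 loop, here is a minimal sketch in Python. The `judge` and `rewriter` objects and their methods (`grade_difficulty`, `expand_chain`, `simplify_chain`) are hypothetical stand-ins for the LLM-as-a-Judge and rewriting prompts; the actual data pipeline is not released with this model card.

```python
# Minimal sketch of the Stage 2 cognitive-alignment loop (hypothetical helpers).
def align_cot(example, judge, rewriter, max_rounds=3):
    """Iteratively rewrite a chain-of-thought until the judge rates it Medium."""
    for _ in range(max_rounds):
        grade = judge.grade_difficulty(example["cot"])  # "Easy" / "Medium" / "Hard"
        if grade == "Medium":
            return example  # comprehensible to the student model as-is
        if grade == "Easy":
            # Expand terse chains with the intermediate steps the student needs.
            example["cot"] = rewriter.expand_chain(example["cot"])
        else:
            # Simplify hard chains by removing high-level logical leaps.
            example["cot"] = rewriter.simplify_chain(example["cot"])
    return example  # best effort after max_rounds rewrites
```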
### 2. Performance Breakthroughs
- **32B Model**: approaches the performance of closed-source models with roughly 10x its parameter count on the GPQA Diamond benchmark
- **Significantly improved reasoning efficiency** (see the comparison table below):
| Model | Output Tokens (MMLU-Pro) | Output Tokens (AIME 2024) | Speed Gain |
|--------------------------------------------|--------------------------|---------------------------|------------|
| DistilQwen2.5-R1-32B (slow-thinking)       | 4198                     | 12178                     | 1x         |
| DistilQwen2.5-DS3-0324-32B (fast-thinking) | 690                      | 4177                      | 5-8x       |
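
For reference, the token counts above can be reproduced in spirit by counting newly generated tokens per benchmark prompt. The helper below is a minimal sketch, not part of the released code; it assumes a model and tokenizer loaded as in the Quick Start section.

```python
# Sketch: measure average output tokens per prompt for a loaded model/tokenizer.
def average_output_tokens(model, tokenizer, prompts, max_new_tokens=16384):
    total_new_tokens = 0
    for prompt in prompts:
        messages = [{"role": "user", "content": prompt}]
        text = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        inputs = tokenizer([text], return_tensors="pt").to(model.device)
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Count only newly generated tokens, excluding the prompt.
        total_new_tokens += output_ids.shape[1] - inputs.input_ids.shape[1]
    return total_new_tokens / len(prompts)
```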
## Technical Advantages
- **Two-Stage Distillation**: First compresses reasoning length, then aligns cognitive trajectories
- **Dynamic Data Optimization**: Adaptive difficulty adjustment ensures knowledge transferability
- **Open-Source Compatibility**: fine-tuned from the Qwen2.5 base model
## 🚀 Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer; device_map="auto" places the weights on the
# available GPU(s) automatically.
model = AutoModelForCausalLM.from_pretrained(
    "alibaba-pai/DistilQwen2.5-DS3-0324-7B",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("alibaba-pai/DistilQwen2.5-DS3-0324-7B")

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant. You should think step-by-step."},
    {"role": "user", "content": prompt},
]

# Render the chat template and move the inputs to the model's device.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=2048,
)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
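
The model follows the standard Qwen2.5 architecture, so it should also work with common serving stacks. Below is a minimal sketch using vLLM (assuming vLLM is installed; the sampling settings are illustrative, not recommended defaults):

```python
from vllm import LLM, SamplingParams

# Load the checkpoint into vLLM for batched, high-throughput inference.
llm = LLM(model="alibaba-pai/DistilQwen2.5-DS3-0324-7B")
sampling_params = SamplingParams(temperature=0.7, max_tokens=2048)

outputs = llm.generate(
    ["Give me a short introduction to large language models."],
    sampling_params,
)
print(outputs[0].outputs[0].text)
```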