---
license: apache-2.0
---
## 📖 Introduction
# DistilQwen2.5-DS3-0324: Fast-Thinking Reasoning Models
## Overview
To address the industry-wide challenge of balancing efficient inference with strong reasoning, the DistilQwen2.5-DS3-0324 series transfers the fast-thinking capability of DeepSeek-V3-0324 into lightweight models. Through a two-stage distillation framework, the series maintains high performance while achieving:
- **Faster inference**: 60-80% fewer output tokens than slow-thinking models
- **Lower resource consumption**: suitable for edge deployment
- **Cognitive-gap elimination**: a novel CoT trajectory-alignment technique
## Core Innovations
### 1. Fast-Thinking Distillation Framework
- **Stage 1: Fast-thinking CoT data collection**
  - **Long-to-short rewriting**: distills the key reasoning steps from DeepSeek-R1
  - **Teacher-model distillation**: extracts fast reasoning trajectories from DeepSeek-V3-0324
- **Stage 2: CoT trajectory cognitive alignment** (sketched below)
  - **Dynamic difficulty grading** (easy/medium/hard), with LLM-as-a-Judge rating how comprehensible each chain is to the small model
    - easy chains are expanded with the necessary missing steps
    - hard chains are simplified by removing high-order logical jumps
  - **Verification mechanism**: iterate until every sample is rated "medium"
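
The Stage-2 alignment loop can be pictured as follows. This is a minimal illustrative sketch, not the released pipeline; `judge`, `rewriter`, and all function names are hypothetical stand-ins for LLM-backed components.

```python
# Hypothetical sketch of the Stage-2 cognitive-alignment loop.
# `judge` and `rewriter` stand in for LLM-backed components; none of
# these names come from the released code.
def align_cot(dataset, judge, rewriter, max_rounds=3):
    """Rewrite each CoT trace until the judge rates it 'medium' for the student."""
    aligned = []
    for sample in dataset:
        trace = sample["cot"]
        for _ in range(max_rounds):
            rating = judge.rate(sample["question"], trace)  # 'easy' | 'medium' | 'hard'
            if rating == "medium":
                break
            if rating == "easy":
                # Expand overly terse chains with the missing intermediate steps.
                trace = rewriter.expand(sample["question"], trace)
            else:
                # Simplify hard chains by removing high-order logical jumps.
                trace = rewriter.simplify(sample["question"], trace)
        aligned.append({**sample, "cot": trace})
    return aligned
```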
### 2. Performance Breakthroughs
- **32B model**: approaches closed-source models with roughly 10x the parameters on the GPQA Diamond benchmark
- **Inference efficiency**: significantly improved (see the comparison below)
| Model                                | MMLU_PRO Tokens | AIME2024 Tokens | Speed Gain |
|--------------------------------------|-----------------|-----------------|------------|
| DistilQwen2.5-R1-32B (slow-thinking) | 4198            | 12178           | 1x         |
| DistilQwen2.5-DS3-0324-32B           | 690             | 4177            | 5-8x       |
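
For intuition, the per-benchmark token reduction implied by the table works out as follows (a quick arithmetic check, not an official evaluation script):

```python
# Back-of-the-envelope token reduction implied by the table above.
slow = {"MMLU_PRO": 4198, "AIME2024": 12178}  # DistilQwen2.5-R1-32B
fast = {"MMLU_PRO": 690, "AIME2024": 4177}    # DistilQwen2.5-DS3-0324-32B
for bench in slow:
    print(f"{bench}: {1 - fast[bench] / slow[bench]:.0%} fewer output tokens")
# MMLU_PRO: 84% fewer output tokens
# AIME2024: 66% fewer output tokens
```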
## Technical Advantages
- **Two-stage distillation**: first compresses reasoning length, then aligns cognitive trajectories
- **Dynamic data optimization**: adaptive difficulty adjustment ensures the knowledge is transferable
- **Open-source compatible**: fine-tuned from the Qwen2.5 base models
## 🚀 Quick Start
```python
from modelscope import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (device_map="auto" places weights on available GPUs)
model = AutoModelForCausalLM.from_pretrained(
    "PAI/DistilQwen2.5-DS3-0324-32B",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("PAI/DistilQwen2.5-DS3-0324-32B")

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant. You should think step-by-step."},
    {"role": "user", "content": prompt},
]

# Build the chat-formatted prompt and move the inputs to the model's device
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=2048
)
# Strip the prompt tokens, keeping only the newly generated completion
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```