ModelHub XC 407ec292b5 Initialize project; model provided by the ModelHub XC community
Model: STEVENZHANG904/Qwen3-0.6B-planner-sft
Source: Original Platform
2026-05-20 16:04:48 +08:00

---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B
datasets:
- Divij/qwen3-32b-mas-traces
language:
- en
library_name: transformers
tags:
- sft
- qwen3
- multi-agent
- distillation
- planner
---
# STEVENZHANG904/Qwen3-0.6B-planner-sft
SFT-finetuned **Qwen/Qwen3-0.6B** on the **planner** subset of [Divij/qwen3-32b-mas-traces](https://huggingface.co/datasets/Divij/qwen3-32b-mas-traces),
which contains traces of **Qwen3-32B** acting as a `planner` agent in a multi-agent system. This model is the
distilled student that learns to play the same role as Qwen3-32B in that pipeline.
## Branches
| Branch | Epochs trained | Notes |
|---|---|---|
| `epoch2` | 2 | intermediate |
| `epoch5` | 5 | intermediate |
| `main` | 10 | final |
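
Each row above names a git branch of the repository, so an intermediate checkpoint can presumably be selected through the `revision` argument of `from_pretrained`. The following is a minimal sketch under that assumption; `BRANCH_BY_EPOCHS`, `branch_for`, and `load_checkpoint` are hypothetical helper names introduced for illustration, not part of this repository.

```python
REPO = "STEVENZHANG904/Qwen3-0.6B-planner-sft"

# Branch names copied from the table above, keyed by epochs trained.
BRANCH_BY_EPOCHS = {2: "epoch2", 5: "epoch5", 10: "main"}


def branch_for(epochs: int) -> str:
    """Return the branch holding the checkpoint trained for `epochs` epochs."""
    if epochs not in BRANCH_BY_EPOCHS:
        raise ValueError(f"no published checkpoint for {epochs} epochs")
    return BRANCH_BY_EPOCHS[epochs]


def load_checkpoint(epochs: int = 10):
    """Download the tokenizer and model from the matching branch."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    revision = branch_for(epochs)
    tok = AutoTokenizer.from_pretrained(REPO, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(REPO, revision=revision)
    return tok, model
```

Calling `load_checkpoint(2)` would fetch the `epoch2` branch, which can be useful for comparing an intermediate checkpoint against the final `main` weights.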
## Training configuration
- **Base model:** `Qwen/Qwen3-0.6B`
- **Dataset:** `Divij/qwen3-32b-mas-traces` (config `planner`)
- **Loss:** assistant-only (system + user tokens masked)
- **Optimizer:** AdamW (β=(0.9, 0.95), wd=0.01, eps=1e-8)
- **Learning rate:** 1e-5, constant with 3% warmup
- **Sequence length:** 8192 (sequence packing on)
- **Precision:** bf16
- **Hardware:** 8× H100 80GB, DDP
- **Liger-Kernel:** on (chunked CE + fused RMSNorm)
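
Two of the settings above can be illustrated in plain Python. This is a sketch under stated assumptions, not the training code used for this model: the warmup is assumed to be linear (the card only says "3% warmup"), and the per-token `roles` list is a hypothetical input standing in for whatever the real data pipeline uses to locate assistant spans. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def assistant_only_labels(token_ids, roles):
    """Copy token ids into labels, masking every non-assistant position."""
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [t if r == "assistant" else IGNORE_INDEX for t, r in zip(token_ids, roles)]


def lr_at(step, total_steps, base_lr=1e-5, warmup_frac=0.03):
    """Assumed linear warmup over the first 3% of steps, then a constant rate."""
    warmup_steps = max(1, round(total_steps * warmup_frac))
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

With labels built this way, system and user tokens contribute nothing to the loss, so the student is trained only to reproduce the planner's responses.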
## Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
repo = "STEVENZHANG904/Qwen3-0.6B-planner-sft"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, dtype=torch.bfloat16, device_map="cuda")
# Planner role expects a task-spec prompt — see the dataset card for the exact format.
messages = [
    {"role": "system", "content": "You are a helpful, creative, and smart assistant."},
    {"role": "user", "content": "<your planner task spec here>"},
]
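# return_dict=True yields input_ids plus attention_mask, which generate() accepts as keyword arguments.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to("cuda")
out = model.generate(
    **inputs, max_new_tokens=4096,
    do_sample=True, temperature=0.6, top_p=0.95,  # Qwen3 thinking-mode defaults
)
# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))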
```
The model emits `<think>...</think>` reasoning blocks, a behavior inherited from the Qwen3-32B traces it was trained on.
**Use sampling**, not greedy decoding: small distilled models can get stuck in repetition loops inside `<think>` under greedy decoding.
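
Because the decoded output mixes reasoning and the final plan, downstream code will often want to separate the two. The helper below is a minimal sketch, assuming at most one `<think>...</think>` block appears at the start of the decoded text; `split_thinking` is a hypothetical name, not something shipped with this model.

```python
def split_thinking(text: str):
    """Split decoded output into (reasoning, answer).

    Assumes at most one <think>...</think> block at the start of the output.
    If the closing tag is missing (for example, generation hit max_new_tokens
    while still reasoning), the text after <think> is returned as reasoning
    and the answer is empty.
    """
    open_tag, close_tag = "<think>", "</think>"
    head, sep, tail = text.partition(close_tag)
    if not sep:
        if open_tag in text:
            return text.split(open_tag, 1)[1].strip(), ""
        return "", text.strip()
    reasoning = head.split(open_tag, 1)[-1].strip()
    return reasoning, tail.strip()
```

An empty answer paired with non-empty reasoning is a cheap signal that the model never left its `<think>` block, which is the looping failure described above.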