Initialize project; model provided by the ModelHub XC community

Model: STEVENZHANG904/Qwen3-0.6B-planner-sft Source: Original Platform
2026-05-20 16:04:48 +08:00
commit 407ec292b5
9 changed files with 151810 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,67 @@
+---
+license: apache-2.0
+base_model: Qwen/Qwen3-0.6B
+datasets:
+  - Divij/qwen3-32b-mas-traces
+language:
+  - en
+library_name: transformers
+tags:
+  - sft
+  - qwen3
+  - multi-agent
+  - distillation
+  - planner
+---
+
+# STEVENZHANG904/Qwen3-0.6B-planner-sft
+
+**Qwen/Qwen3-0.6B** fine-tuned with SFT on the **planner** subset of [Divij/qwen3-32b-mas-traces](https://huggingface.co/datasets/Divij/qwen3-32b-mas-traces),
+which contains traces of **Qwen3-32B** acting as the `planner` agent in a multi-agent system. This model is the
+distilled student, trained to play the same planner role that Qwen3-32B plays in that pipeline.
+
+## Branches
+
+| Branch | Epochs trained | Notes |
+|---|---|---|
+| `epoch2` | 2 | intermediate |
+| `epoch5` | 5 | intermediate |
+| `main` | 10 | final |
+
+## Training configuration
+
+- **Base model:** `Qwen/Qwen3-0.6B`
+- **Dataset:** `Divij/qwen3-32b-mas-traces` (config `planner`)
+- **Loss:** assistant-only (system + user tokens masked)
+- **Optimizer:** AdamW (β₁=0.9, β₂=0.95, weight decay 0.01, ε=1e-8)
+- **Learning rate:** 1e-5, constant with 3% warmup
+- **Sequence length:** 8192 (sequence packing on)
+- **Precision:** bf16
+- **Hardware:** 8× H100 80GB, DDP
+- **Liger-Kernel:** on (chunked CE + fused RMSNorm)
+
+## Inference
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+
+repo = "STEVENZHANG904/Qwen3-0.6B-planner-sft"
+tok = AutoTokenizer.from_pretrained(repo)
+model = AutoModelForCausalLM.from_pretrained(repo, dtype=torch.bfloat16, device_map="cuda")
+
+# Planner role expects a task-spec prompt — see the dataset card for the exact format.
+messages = [
+    {"role": "system", "content": "You are a helpful, creative, and smart assistant."},
+    {"role": "user", "content": "<your planner task spec here>"},
+]
+inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
+out = model.generate(
+    **inputs, max_new_tokens=4096,
+    do_sample=True, temperature=0.6, top_p=0.95,  # Qwen3 thinking-mode defaults
+)
+print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
+```
+
+The model emits `<think>...</think>` reasoning blocks, a behavior inherited from the Qwen3-32B traces.
+**Use sampling**, not greedy decoding: small distilled models can get stuck looping inside `<think>` under greedy decoding.
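
Because the README above notes that the model emits `<think>...</think>` blocks, downstream code that consumes the plan may want to separate the reasoning from the final answer. The following is a minimal sketch, not part of the original repository: the helper name `split_reasoning` is hypothetical, the sample string is hand-written rather than real model output, and it assumes the tags survive decoding as literal text.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    Assumption: reasoning is delimited by literal <think> and </think> tags.
    Text that precedes an opening <think> tag is discarded.
    """
    head, closing_tag, tail = text.partition("</think>")
    if not closing_tag:
        # No closing tag: either generation was cut off mid-reasoning,
        # or the output contains no reasoning block at all.
        if "<think>" in text:
            return text.split("<think>", 1)[1].strip(), ""
        return "", text.strip()
    # Keep what follows the opening tag; if the opening tag is absent
    # (for example, it was part of the prompt), keep the whole head.
    reasoning = head.split("<think>", 1)[-1].strip()
    return reasoning, tail.strip()


# Hand-written illustrative string, not actual output from the model.
sample = "<think>\nNeed two sub-tasks.\n</think>\n\n1. Search\n2. Summarize"
reasoning, plan = split_reasoning(sample)
print(plan)
```

An empty answer with non-empty reasoning suggests the generation hit `max_new_tokens` before the reasoning block closed, which may be a cue to raise the limit or resample.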