
ModelHub XC 119198d6e6 Initialize project; model provided by the ModelHub XC community

Model: NhatCuong22/Qwen3-8B-OpusReasoning
Source: Original Platform

2026-05-12 10:45:21 +08:00




Qwen3-8B-OpusReasoning

Model Overview

A reasoning-enhanced version of Qwen3-8B, fine-tuned via supervised knowledge distillation from Claude Opus 4.6 reasoning traces.

The goal is not token-level imitation of Opus output, but transfer of its reasoning structure and problem-solving style into a compact 8B model that can run locally. The model outputs structured chain-of-thought inside <think>...</think> tags before generating the final answer, following the Qwen3 thinking-mode convention.
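Because the reasoning trace and the final answer come back in one completion, client code usually needs to separate them. The sketch below is a minimal illustration, assuming the decoded text looks like the `response` string in the Transformers example (where the opening `<think>` was prefilled in the prompt, so only `</think>` may appear). The helper name `split_thinking` is hypothetical and not part of any library.

```python
from typing import Optional, Tuple


def split_thinking(completion: str) -> Tuple[str, Optional[str]]:
    """Split a Qwen3-style completion into (reasoning, answer).

    Works whether or not the opening <think> tag is present in the text.
    Returns answer=None when </think> never appears, which usually means
    generation stopped mid-trace (e.g. max_new_tokens was reached).
    """
    head, sep, tail = completion.partition("</think>")
    # Drop the opening tag if the model (rather than the prompt) emitted it.
    reasoning = head.replace("<think>", "", 1).strip()
    if not sep:
        return reasoning, None
    # Strip the end-of-turn marker that survives skip_special_tokens=False.
    answer = tail.replace("<|im_end|>", "").strip()
    return reasoning, answer
```

For example, `split_thinking("<think>\nplan\n</think>\n\n52.5 km/h<|im_end|>")` gives `("plan", "52.5 km/h")`, while a truncated completion gives an answer of `None`.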

Base model: unsloth/Qwen3-8B
Teacher model: Claude Opus 4.6 (reasoning traces, distilled)
Training type: Supervised Fine-Tuning (SFT) + LoRA → merged bf16
Framework: Unsloth 2026.4.5 + TRL SFTTrainer
Precision: bfloat16
Hardware: 1x NVIDIA A100-SXM4-80GB

Training Data

Dataset	Samples	Role
Crownelius/Opus-4.6-Reasoning-3300x	3,300	Main distillation — Claude Opus 4.6 reasoning traces
Jackrong/Qwen3.5-reasoning-700x	700	Auxiliary — supporting reasoning diversity
Total	4,000

Data Characteristics

Long-form chain-of-thought supervision (<think>...</think>)
Diverse reasoning domains: math, logic, code, analytical QA
High-quality Opus 4.6 teacher traces — carefully curated, no noisy labels
Conversation format compatible with Qwen3 chat template
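To make "compatible with the Qwen3 chat template" concrete, the sketch below renders one single-turn record in the ChatML layout that Qwen3 uses, with the teacher trace placed inside `<think>...</think>`. It is an approximation for illustration only: the record fields (`user`, `reasoning`, `answer`) and the function name are hypothetical, and the authoritative rendering is whatever `tokenizer.apply_chat_template` produces for this model.

```python
def to_qwen3_chatml(user: str, reasoning: str, answer: str) -> str:
    """Render a hypothetical (user, reasoning, answer) record as Qwen3-style ChatML."""
    parts = [
        f"<|im_start|>user\n{user}<|im_end|>\n",
        "<|im_start|>assistant\n",
        # The distilled reasoning trace is the supervised target inside <think>.
        f"<think>\n{reasoning.strip()}\n</think>\n\n",
        # The clean, user-facing answer follows the closed reasoning block.
        f"{answer.strip()}<|im_end|>\n",
    ]
    return "".join(parts)
```

A sample rendered this way keeps scratch work and final answer in separate regions, which is the structure the model is trained to reproduce.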

Training Pipeline

LoRA Configuration

Parameter	Value
Rank (r)	64
Alpha	128
Dropout	0.05
Target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
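The configuration above determines how many parameters LoRA actually trains. The sketch below counts them from the layer shapes in the Model Architecture table, assuming a per-head dimension of 128 (32 query heads × 128 = 4,096; 8 KV heads × 128 = 1,024), which the table does not list, and standard (non-rsLoRA) scaling of `alpha / r`.

```python
# Assumed Qwen3-8B shapes; HEAD_DIM = 128 is an assumption, not from the table.
HIDDEN, INTERMEDIATE, LAYERS, HEAD_DIM = 4096, 12288, 36, 128
Q_OUT = 32 * HEAD_DIM   # query projection width
KV_OUT = 8 * HEAD_DIM   # key/value projection width under GQA

# (in_features, out_features) for every targeted Linear layer in one block.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(rank: int) -> int:
    """Each adapted Linear(in, out) gains A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in TARGET_SHAPES.values())
    return LAYERS * per_layer


def lora_scaling(alpha: float, rank: int) -> float:
    """Standard LoRA update: delta_W = (alpha / r) * B @ A."""
    return alpha / rank


print(f"{lora_trainable_params(64):,} trainable params, scaling {lora_scaling(128, 64)}")
```

Under these assumptions, r = 64 yields 174,587,904 trainable parameters (about 2% of the full model), and alpha = 128 gives a scaling factor of 2.0.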

Hyperparameters

Parameter	Value
Effective batch size	16 (per-device batch 1 × gradient accumulation 16)
Learning rate	5e-5
LR scheduler	Cosine
Epochs	3
Max sequence length	16,384
Optimizer	AdamW 8-bit
Warmup ratio	0.03
Weight decay	0.01
Packing	Enabled
Gradient checkpointing	Unsloth
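The learning-rate settings combine a linear warmup with a cosine decay. The sketch below reproduces that shape in the style of Hugging Face's cosine-with-warmup scheduler, assuming warmup steps are `ceil(total_steps × warmup_ratio)`. With packing enabled, the real number of optimizer steps depends on how many packed sequences were produced, which the card does not report. The 4,000-sample figure below is therefore only the unpacked upper bound.

```python
import math


def optimizer_steps(num_sequences: int, batch: int = 1, grad_accum: int = 16, epochs: int = 3) -> int:
    """Optimizer updates for a given number of (possibly packed) training sequences."""
    return math.ceil(num_sequences / (batch * grad_accum)) * epochs


def cosine_lr_with_warmup(step: int, total_steps: int, peak_lr: float = 5e-5,
                          warmup_ratio: float = 0.03) -> float:
    """Linear warmup from 0 to peak_lr, then a half-cosine decay back to 0."""
    warmup = math.ceil(total_steps * warmup_ratio)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total = optimizer_steps(4000)  # unpacked upper bound: 250 steps/epoch x 3 epochs
for s in (0, 23, total // 2, total):
    print(s, f"{cosine_lr_with_warmup(s, total):.2e}")
```

Without packing this gives 750 steps and 23 warmup steps; packing merges samples, so the actual run likely took fewer steps with the same schedule shape.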

Training Results

Final train loss: 0.6953
Runtime: ~153 minutes on A100-80GB

Distillation Philosophy

We distill reasoning structure, not surface tokens. Specifically, the model is encouraged to acquire:

Explicit problem decomposition — break complex questions into sub-goals
Assumption checking — state what's given, what's unknown, and verify constraints
Step-by-step derivation — one logical step per line, no skipped algebra
Reflection & backtracking — recognize dead-ends and revise rather than plow forward
Clean answer construction — separate <think> scratch work from the final user-facing answer

This follows the "Claude Opus style" of reasoning — deliberative, self-critical, and structurally transparent.

Reasoning Scaffold (Learned Pattern)

After fine-tuning, the model tends to produce reasoning traces with this shape:

Restate and parse the task — identify exactly what is being asked
Plan — list the approach or sub-problems
Work through each step — show algebra, logic, or code reasoning explicitly
Verify — sanity-check the intermediate results before committing
Construct the final answer — separate, clean, user-facing summary

Expected Improvements

In practice, the gain is not a dramatic capability jump over the base Qwen3-8B, but rather:

Improved stability in multi-step reasoning
Structured, readable traces instead of rambling CoT
Better instruction adherence when a problem has constraints
Fewer hallucinated intermediate steps thanks to Opus-style self-verification

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NhatCuong22/Qwen3-8B-OpusReasoning"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "If a train travels 120km in 2 hours, stops for 30 minutes, then travels 90km in 1.5 hours, what is the average speed for the entire journey?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
text += "<think>\n"  # Prefill the opening tag so generation starts inside the reasoning trace

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.6,
    top_p=0.95,
    do_sample=True,
)

# Decode only the newly generated tokens; special tokens are kept, so <|im_end|> stays visible
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=False)
print(response)
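To check the generated response, here is the expected answer to the sample prompt. It counts the 30-minute stop as part of the journey, which is the usual reading of "entire journey":

```latex
\bar{v} = \frac{d_1 + d_2}{t_1 + t_{\text{stop}} + t_2}
        = \frac{120 + 90}{2 + 0.5 + 1.5}\ \text{km/h}
        = \frac{210}{4}\ \text{km/h}
        = 52.5\ \text{km/h}
```

A response of 60 km/h (that is, 210 / 3.5) indicates that the stop time was left out of the total time.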

Unsloth (faster inference)

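import torch
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "NhatCuong22/Qwen3-8B-OpusReasoning",
    max_seq_length=16384,
    dtype=torch.bfloat16,
    load_in_4bit=False,  # Unsloth loads in 4-bit by default; keep the bf16 weights
)
FastLanguageModel.for_inference(model)  # switch Unsloth to its faster inference path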

Recommended Generation Parameters

Parameter	Reasoning tasks	Creative tasks
temperature	0.6	0.8
top_p	0.95	0.95
max_new_tokens	2048-4096	1024-2048
repetition_penalty	1.0	1.05
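The table above can be kept in code as a pair of presets passed straight to `model.generate`. This is a minimal sketch: the names `GENERATION_PRESETS` and `generation_kwargs` are hypothetical. Where the table gives a range for `max_new_tokens`, the upper bound is used, which is an assumption rather than a recommendation from the card.

```python
from typing import Any, Dict

# Values from the recommended-parameters table; max_new_tokens uses the
# upper end of each range (an assumption, adjust to your latency budget).
GENERATION_PRESETS: Dict[str, Dict[str, Any]] = {
    "reasoning": {"temperature": 0.6, "top_p": 0.95, "max_new_tokens": 4096, "repetition_penalty": 1.0},
    "creative": {"temperature": 0.8, "top_p": 0.95, "max_new_tokens": 2048, "repetition_penalty": 1.05},
}


def generation_kwargs(task: str) -> Dict[str, Any]:
    """Return keyword arguments for model.generate(); sampling must be on for temperature/top_p."""
    if task not in GENERATION_PRESETS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(GENERATION_PRESETS)}")
    return {**GENERATION_PRESETS[task], "do_sample": True}
```

Usage with the Transformers example: `outputs = model.generate(**inputs, **generation_kwargs("reasoning"))`.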

Model Architecture

Parameter	Value
Parameters	~8.2B (~6.95B non-embedding)
Hidden size	4,096
Layers	36
Attention heads	32 (8 KV heads, GQA)
Intermediate size	12,288
Max position embeddings	40,960
Vocabulary size	151,936
Precision	bfloat16
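The table's dimensions are enough to sanity-check the parameter count, the memory needed for bf16 weights, and the KV-cache cost of GQA. The sketch below assumes a per-head dimension of 128 and untied input embedding / LM-head matrices, neither of which appears in the table. It also ignores RMSNorm weights, which add only a few hundred thousand parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qwen3Dims:
    hidden: int = 4096
    layers: int = 36
    q_heads: int = 32
    kv_heads: int = 8
    head_dim: int = 128            # assumption: not listed in the table
    intermediate: int = 12288
    vocab: int = 151936
    tied_embeddings: bool = False  # assumption: separate embedding and LM head


def param_estimate(d: Qwen3Dims) -> int:
    """Approximate parameter count from projection shapes (norm weights ignored)."""
    attn = (d.hidden * d.q_heads * d.head_dim           # q_proj
            + 2 * d.hidden * d.kv_heads * d.head_dim    # k_proj + v_proj
            + d.q_heads * d.head_dim * d.hidden)        # o_proj
    mlp = 3 * d.hidden * d.intermediate                 # gate_proj, up_proj, down_proj
    embed = d.vocab * d.hidden * (1 if d.tied_embeddings else 2)
    return d.layers * (attn + mlp) + embed


def kv_cache_bytes_per_token(d: Qwen3Dims, bytes_per_value: int = 2) -> int:
    """Keys + values across all layers; GQA caches kv_heads, not q_heads."""
    return 2 * d.layers * d.kv_heads * d.head_dim * bytes_per_value


dims = Qwen3Dims()
params = param_estimate(dims)
print(f"{params / 1e9:.2f}B params, {params * 2 / 1e9:.1f} GB of bf16 weights")
print(kv_cache_bytes_per_token(dims) // 1024, "KiB of bf16 KV cache per token")
```

Under these assumptions it prints about 8.19B parameters (16.4 GB of bf16 weights) and 144 KiB of KV cache per token. Caching 8 KV heads instead of 32 cuts the cache to a quarter of what full multi-head attention would need.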

Evaluation

Evaluated with lm-evaluation-harness (5-shot, bf16, A100-80GB).

Results vs base Qwen3-8B

Benchmark	Metric	Qwen3-8B-OpusReasoning	Base Qwen3-8B	Δ (pp)
MMLU	accuracy	75.40%	72.93%	+2.47 ✅
ARC-Challenge	acc_norm	65.87%	56.74%	+9.13 ✅✅
ARC-Challenge	accuracy	64.42%	—	—
HellaSwag	acc_norm	76.98%	74.91%	+2.07 ✅
HellaSwag	accuracy	58.32%	—	—
GSM8K	exact_match (strict)	86.20%	88.60%	-2.40
GSM8K	exact_match (flexible)	86.66%	—	—

Analysis

MMLU +2.47, ARC-Challenge +9.13, HellaSwag +2.07 — the model retains and slightly improves general knowledge and commonsense reasoning after reasoning-focused fine-tuning, without catastrophic forgetting.
ARC-Challenge +9.13 is a strong signal that Opus-style structured reasoning transfers well to scientific reasoning tasks.
GSM8K -2.40 is a minor regression, likely because longer <think> traces are occasionally truncated by the default max_gen_toks; the model still scores about 86-87% (strict/flexible) on grade-school math.
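The truncation explanation can be illustrated with a simplified scorer: if the generation budget runs out before `</think>`, no final answer exists to extract, and the item counts as wrong no matter how good the partial reasoning was. The sketch below is a hypothetical "last number after the reasoning block" extractor. It is not lm-evaluation-harness's own filter, which uses its own regexes, and it assumes thinking-mode completions.

```python
import re
from typing import Optional

# Simplified number pattern: optional sign, digits with thousands commas, optional decimals.
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(completion: str) -> Optional[str]:
    """Return the last number after </think>, or None if the trace was cut off."""
    if "</think>" not in completion:
        return None  # budget exhausted mid-reasoning: nothing to score
    answer_part = completion.split("</think>", 1)[1]
    numbers = NUMBER_RE.findall(answer_part)
    return numbers[-1].replace(",", "") if numbers else None
```

Raising the generation budget in the evaluation run is one way to test whether the GSM8K gap closes.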

More rigorous reasoning benchmarks (MMLU-Pro, MATH-Hard, AIME, IFEval, MuSR) are being evaluated and will be added here.

Best Suited For

Mathematical problem solving (arithmetic, algebra, word problems)
Logical reasoning and deduction
Code generation with explanation
Multi-step analytical question answering
Instruction-following tasks with constraints
Offline / on-prem reasoning assistants (the bf16 weights alone are about 16.4 GB, so plan for a 24 GB GPU at bf16)

Limitations & Intended Use

Scale of supervision: fine-tuned on only ~4K samples — gains are stylistic and structural, not broad knowledge expansion
Hallucination risk: reasoning traces may confidently cite non-existent facts; verify external claims
Opus-style bias: inherits tendencies of the teacher (e.g., verbosity, occasional over-hedging)
Language: primarily English training data
Not verified for safety-critical use — research and learning only
Base model license constraints: follow Qwen3 upstream license in commercial settings

Acknowledgments

Qwen Team — Qwen3-8B base model
Unsloth — efficient fine-tuning kernels
Anthropic — Claude Opus reasoning (teacher)
Dataset authors: Crownelius, Jackrong

Citation

@misc{qwen3-8b-opusreasoning,
  title        = {Qwen3-8B-OpusReasoning},
  author       = {Vo Nhat Cuong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/NhatCuong22/Qwen3-8B-OpusReasoning}}
}

License

Apache 2.0 (inherits from Qwen3-8B upstream).
