A reasoning-enhanced version of Qwen3-8B, fine-tuned via supervised knowledge distillation from Claude Opus 4.6 reasoning traces.
The goal is not token-level imitation of Opus output, but transfer of its reasoning structure and problem-solving style into a compact 8B model that can run locally. The model outputs structured chain-of-thought inside <think>...</think> tags before generating the final answer, following the Qwen3 thinking-mode convention.
We distill reasoning structure, not surface tokens. Specifically, the model is encouraged to acquire:
Explicit problem decomposition — break complex questions into sub-goals
Assumption checking — state what's given, what's unknown, and verify constraints
Step-by-step derivation — one logical step per line, no skipped algebra
Reflection & backtracking — recognize dead-ends and revise rather than plow forward
Clean answer construction — separate <think> scratch work from the final user-facing answer
This follows the "Claude Opus style" of reasoning — deliberative, self-critical, and structurally transparent.
Reasoning Scaffold (Learned Pattern)
After fine-tuning, the model tends to produce reasoning traces with this shape:
Restate and parse the task — identify exactly what is being asked
Plan — list the approach or sub-problems
Work through each step — show algebra, logic, or code reasoning explicitly
Verify — sanity-check the intermediate results before committing
Construct the final answer — separate, clean, user-facing summary
Expected Improvements
In practice, the gain is not a dramatic capability jump over the base Qwen3-8B, but rather:
Improved stability in multi-step reasoning
Structured, readable traces instead of rambling CoT
Better instruction adherence when a problem has constraints
Fewer hallucinated intermediate steps thanks to Opus-style self-verification
Usage
Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NhatCuong22/Qwen3-8B-OpusReasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "If a train travels 120km in 2 hours, stops for 30 minutes, then travels 90km in 1.5 hours, what is the average speed for the entire journey?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
text += "<think>\n"  # Activate thinking mode

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.6,
    top_p=0.95,
    do_sample=True,
)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=False)
print(response)
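Because the prompt already ends with `<think>\n`, the decoded response usually contains the reasoning, a closing `</think>` tag, and then the final answer. Below is a minimal sketch for separating the two, assuming that layout and assuming `<|im_end|>` is the end-of-turn token. `split_reasoning` is a hypothetical helper, not part of Transformers or this repository, and the example string is illustrative rather than real model output.

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Split a decoded response into (reasoning, final_answer).

    Assumes the scratch work ends at the first "</think>" tag. The opening
    "<think>" tag may be absent because it was part of the prompt.
    """
    open_tag, close_tag = "<think>", "</think>"
    reasoning, sep, answer = response.partition(close_tag)
    if not sep:
        # No closing tag: the trace was probably cut off by max_new_tokens,
        # so there is no final answer to return.
        return response.replace(open_tag, "", 1).strip(), ""
    reasoning = reasoning.replace(open_tag, "", 1).strip()
    # Drop a trailing end-of-turn token if present (assumed to be "<|im_end|>").
    answer = answer.replace("<|im_end|>", "").strip()
    return reasoning, answer


# Illustrative string, not real model output.
example = (
    "Total distance is 210 km, total time is 4 h.\n"
    "</think>\n\n"
    "The average speed is 52.5 km/h.<|im_end|>"
)
reasoning, answer = split_reasoning(example)
print(answer)
```

An empty second element is a cheap signal that the trace ran out of tokens before reaching an answer, in which case raising `max_new_tokens` may help.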
MMLU +2.47, ARC-Challenge +9.13, HellaSwag +2.07 — the model retains and slightly improves general knowledge and commonsense reasoning after reasoning-focused fine-tuning, without catastrophic forgetting.
ARC-Challenge +9.13 suggests that Opus-style structured reasoning transfers well to scientific reasoning tasks.
GSM8K -2.4 is a minor regression, likely because longer <think> traces are occasionally truncated by the default max_gen_toks generation limit. The model still scores ~87% on grade-school math.
More rigorous reasoning benchmarks (MMLU-Pro, MATH-Hard, AIME, IFEval, MuSR) are being evaluated and will be added here.
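The truncation hypothesis for GSM8K can be checked directly on saved generations, since a trace that hits the token limit never emits its closing `</think>` tag. Below is a minimal sketch, assuming the raw generations are available as a list of strings. `is_truncated` and `truncation_rate` are hypothetical helpers, and the sample strings are illustrative, not real evaluation outputs.

```python
def is_truncated(generation: str, close_tag: str = "</think>") -> bool:
    """Return True if the reasoning trace never closed, so no final answer followed."""
    return close_tag not in generation


def truncation_rate(generations: list[str]) -> float:
    """Fraction of generations whose <think> block was cut off."""
    if not generations:
        return 0.0
    return sum(is_truncated(g) for g in generations) / len(generations)


# Illustrative samples, not real evaluation outputs.
samples = [
    "<think>\n3 * 4 = 12\n</think>\n\nThe answer is 12.",
    "<think>\nFirst, let me restate the problem",  # cut off mid-trace
]
print(f"{truncation_rate(samples):.0%} of traces were truncated")
```

If a noticeable share of traces turns out to be truncated, re-running the evaluation with a larger generation limit would help separate truncation effects from a genuine regression.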
Best Suited For
Mathematical problem solving (arithmetic, algebra, word problems)
Logical reasoning and deduction
Code generation with explanation
Multi-step analytical question answering
Instruction-following tasks with constraints
Offline / on-prem reasoning assistants (fits in 16GB VRAM at bf16)
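As a rough check on the 16GB figure, the weight footprint can be estimated from the parameter count. This sketch assumes about 8.2 billion parameters (the figure published for the base Qwen3-8B) and counts weights only. The KV cache and activations need additional memory, so a 16GB card may leave little headroom for long `<think>` traces at bf16.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights only, in GiB."""
    return num_params * bytes_per_param / 1024**3


NUM_PARAMS = 8.2e9  # assumption: approximate parameter count of Qwen3-8B

# Compare bf16 against two common quantized widths.
for name, nbytes in [("bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{name}: ~{weight_memory_gib(NUM_PARAMS, nbytes):.1f} GiB")
```

The lower-precision rows are only there to show how much headroom quantization would free up; this card does not state whether quantized weights are provided.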
Limitations & Intended Use
Scale of supervision: fine-tuned on only ~4K samples — gains are stylistic and structural, not broad knowledge expansion