ModelHub XC 119198d6e6 Initialize project; model provided by the ModelHub XC community

Model: NhatCuong22/Qwen3-8B-OpusReasoning
Source: Original Platform

2026-05-12 10:45:21 +08:00

Qwen3-8B-OpusReasoning

Model Overview

A reasoning-enhanced version of Qwen3-8B, fine-tuned via supervised knowledge distillation from Claude Opus 4.6 reasoning traces.

The goal is not token-level imitation of Opus output, but transfer of its reasoning structure and problem-solving style into a compact 8B model that can run locally. The model outputs structured chain-of-thought inside <think>...</think> tags before generating the final answer, following the Qwen3 thinking-mode convention.

Base model: unsloth/Qwen3-8B
Teacher model: Claude Opus 4.6 (reasoning traces, distilled)
Training type: Supervised Fine-Tuning (SFT) + LoRA → merged bf16
Framework: Unsloth 2026.4.5 + TRL SFTTrainer
Precision: bfloat16
Hardware: 1x NVIDIA A100-SXM4-80GB

Training Data

Dataset	Samples	Role
Crownelius/Opus-4.6-Reasoning-3300x	3,300	Main distillation — Claude Opus 4.6 reasoning traces
Jackrong/Qwen3.5-reasoning-700x	700	Auxiliary — supporting reasoning diversity
Total	4,000

Data Characteristics

Long-form chain-of-thought supervision (<think>...</think>)
Diverse reasoning domains: math, logic, code, analytical QA
High-quality Opus 4.6 teacher traces — carefully curated, no noisy labels
Conversation format compatible with Qwen3 chat template

Training Pipeline

LoRA Configuration

Parameter	Value
Rank (r)	64
Alpha	128
Dropout	0.05
Target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
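To give a sense of how large this adapter is, the sketch below counts the trainable LoRA parameters implied by the configuration above. It uses the layer sizes from the Model Architecture table and assumes a head dimension of 128, which this card does not state. It also assumes plain LoRA, where each adapted projection adds two low-rank matrices and the update is scaled by alpha / r.

```python
# Back-of-envelope count of trainable LoRA parameters for the config above.
# Assumption: HEAD_DIM = 128 (not stated in this card), so q_proj/o_proj are
# 32 * 128 = 4096 wide and k_proj/v_proj are 8 * 128 = 1024 wide.

RANK = 64
ALPHA = 128
HIDDEN = 4096
INTERMEDIATE = 12288
NUM_LAYERS = 36
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128  # assumed

# (in_features, out_features) of each targeted projection in one decoder layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, NUM_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "o_proj": (NUM_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params_per_layer(rank, shapes):
    """Each adapter adds A (rank x in) and B (out x rank)."""
    return sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())


per_layer = lora_params_per_layer(RANK, TARGET_SHAPES)
total = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # multiplier applied to the low-rank update

print(f"LoRA params per layer: {per_layer:,}")
print(f"LoRA params total:     {total:,}")
print(f"Scaling (alpha / r):   {scaling}")
```

Under these assumptions the adapters add about 175M trainable parameters, roughly 2% of the ~8B base weights.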

Hyperparameters

Parameter	Value
Effective batch size	1 × 16 = 16
Learning rate	5e-5
LR scheduler	Cosine
Epochs	3
Max sequence length	16,384
Optimizer	AdamW 8-bit
Warmup ratio	0.03
Weight decay	0.01
Packing	Enabled
Gradient checkpointing	Unsloth
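The learning-rate settings above (peak 5e-5, cosine decay, warmup ratio 0.03) can be illustrated with a small stand-alone function. This is a minimal sketch of the usual linear-warmup plus cosine-decay shape, not the trainer's own code; the total step count is hypothetical, since the card does not report it and packing changes the number of steps.

```python
import math

PEAK_LR = 5e-5       # from the hyperparameter table
WARMUP_RATIO = 0.03  # from the hyperparameter table


def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 500  # hypothetical; not reported in this card
for step in (0, 15, 250, 500):
    print(step, lr_at(step, TOTAL_STEPS))
```

The rate climbs from zero during the first 3% of steps, reaches the peak once, and then falls smoothly to zero by the last step.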

Training Results

Final train loss: 0.6953
Runtime: ~153 minutes on A100-80GB

Distillation Philosophy

We distill reasoning structure, not surface tokens. Specifically, the model is encouraged to acquire:

Explicit problem decomposition — break complex questions into sub-goals
Assumption checking — state what's given, what's unknown, and verify constraints
Step-by-step derivation — one logical step per line, no skipped algebra
Reflection & backtracking — recognize dead-ends and revise rather than plow forward
Clean answer construction — separate <think> scratch work from the final user-facing answer

This follows the "Claude Opus style" of reasoning — deliberative, self-critical, and structurally transparent.

Reasoning Scaffold (Learned Pattern)

After fine-tuning, the model tends to produce reasoning traces with this shape:

Restate and parse the task — identify exactly what is being asked
Plan — list the approach or sub-problems
Work through each step — show algebra, logic, or code reasoning explicitly
Verify — sanity-check the intermediate results before committing
Construct the final answer — separate, clean, user-facing summary

Expected Improvements

In practice, the gain is not a dramatic capability jump over the base Qwen3-8B, but rather:

Improved stability in multi-step reasoning
Structured, readable traces instead of rambling CoT
Better instruction adherence when a problem has constraints
Fewer hallucinated intermediate steps thanks to Opus-style self-verification

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NhatCuong22/Qwen3-8B-OpusReasoning"

# Load the tokenizer and the merged bf16 weights; device_map="auto" places
# the layers on the available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "If a train travels 120km in 2 hours, stops for 30 minutes, then travels 90km in 1.5 hours, what is the average speed for the entire journey?"}
]

# Render the chat template as text, ending with the assistant generation prompt.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Pre-fill the opening tag to activate thinking mode. The tag is part of the
# prompt, so the decoded response below starts inside the reasoning trace.
text += "<think>\n"

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.6,
    top_p=0.95,
    do_sample=True,
)

# Decode only the newly generated tokens (everything after the prompt).
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=False)
print(response)
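Because the reasoning and the final answer arrive in one string, downstream code usually wants them separated. The helper below is a minimal sketch of that step. `split_reasoning` is a hypothetical name, and the sample string is hand-written for illustration rather than real model output. It handles the case above, where the opening tag was part of the prompt and is therefore missing from the response.

```python
import re

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_reasoning(response: str):
    """Return (reasoning, answer) from a decoded generation.

    Works whether or not the opening <think> tag is present in the output.
    If </think> never appears, the trace was probably cut off by
    max_new_tokens, so everything is treated as reasoning and the answer is "".
    """
    # Drop chat-control tokens such as <|im_end|> kept by skip_special_tokens=False.
    text = re.sub(r"<\|[^|>]+\|>", "", response)
    if THINK_CLOSE not in text:
        return text.replace(THINK_OPEN, "").strip(), ""
    reasoning, answer = text.split(THINK_CLOSE, 1)
    return reasoning.replace(THINK_OPEN, "").strip(), answer.strip()


# Hand-written sample in the expected shape (not real model output).
sample = (
    "Total distance is 210 km over 4 hours.\n"
    "</think>\n\n"
    "The average speed is 52.5 km/h.<|im_end|>"
)
reasoning, answer = split_reasoning(sample)
print(answer)
```

An empty answer is a useful signal that `max_new_tokens` was too small for the trace.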

Unsloth (faster inference)

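import torch
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "NhatCuong22/Qwen3-8B-OpusReasoning",
    max_seq_length=16384,
    dtype=torch.bfloat16,   # pass a torch dtype rather than a string
    load_in_4bit=False,     # Unsloth loads in 4-bit by default; keep the merged bf16 weights
)
FastLanguageModel.for_inference(model)  # switch to Unsloth's inference mode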

Recommended Generation Parameters

Parameter	Reasoning tasks	Creative tasks
temperature	0.6	0.8
top_p	0.95	0.95
max_new_tokens	2048-4096	1024-2048
repetition_penalty	1.0	1.05
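The table above can be kept in code as presets so that every call uses the same sampling settings. This is a minimal sketch; `GENERATION_PRESETS` and `generation_kwargs` are hypothetical names, and choosing the lower bound of each `max_new_tokens` range by default is a design choice made here, not a recommendation from the table.

```python
# Sampling presets copied from the table above.
GENERATION_PRESETS = {
    "reasoning": {"temperature": 0.6, "top_p": 0.95, "repetition_penalty": 1.0},
    "creative": {"temperature": 0.8, "top_p": 0.95, "repetition_penalty": 1.05},
}

# (lower, upper) bounds of the recommended max_new_tokens range per task.
MAX_NEW_TOKENS_RANGE = {
    "reasoning": (2048, 4096),
    "creative": (1024, 2048),
}


def generation_kwargs(task: str, long_output: bool = False) -> dict:
    """Build keyword arguments for model.generate() for the given task type."""
    if task not in GENERATION_PRESETS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(GENERATION_PRESETS)}")
    low, high = MAX_NEW_TOKENS_RANGE[task]
    kwargs = dict(GENERATION_PRESETS[task])
    kwargs["max_new_tokens"] = high if long_output else low
    kwargs["do_sample"] = True  # temperature and top_p only apply when sampling
    return kwargs


print(generation_kwargs("reasoning"))
print(generation_kwargs("creative", long_output=True))
```

With the objects from the Transformers example, a call would look like `model.generate(**inputs, **generation_kwargs("reasoning", long_output=True))`.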

Model Architecture

Parameter	Value
Parameters	~8B
Hidden size	4,096
Layers	36
Attention heads	32 (8 KV heads, GQA)
Intermediate size	12,288
Max position embeddings	40,960
Vocabulary size	151,936
Precision	bfloat16
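These figures allow a rough memory estimate for local inference. The sketch below is a back-of-envelope calculation only: it takes the parameter count as exactly 8e9 (the table says "~8B"), assumes a head dimension of 128 (not stated in this card), and ignores activations and framework overhead.

```python
BYTES_PER_BF16 = 2
GIB = 1024 ** 3

NUM_PARAMS = 8e9   # "~8B" from the table; the exact count is not given here
NUM_LAYERS = 36
NUM_KV_HEADS = 8
HEAD_DIM = 128     # assumed; not stated in this card


def weight_bytes(num_params, bytes_per_param=BYTES_PER_BF16):
    """Memory needed to hold the weights alone."""
    return num_params * bytes_per_param


def kv_cache_bytes(num_tokens, layers=NUM_LAYERS, kv_heads=NUM_KV_HEADS,
                   head_dim=HEAD_DIM, bytes_per_value=BYTES_PER_BF16):
    """One key and one value vector per KV head, per layer, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * num_tokens


print(f"Weights (bf16):        {weight_bytes(NUM_PARAMS) / GIB:.2f} GiB")
print(f"KV cache per token:    {kv_cache_bytes(1) / 1024:.0f} KiB")
print(f"KV cache, 16,384 ctx:  {kv_cache_bytes(16384) / GIB:.2f} GiB")
```

Grouped-query attention with 8 KV heads instead of 32 keeps the cache at a quarter of what full multi-head attention would need at the same head dimension.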

Evaluation

Evaluated with lm-evaluation-harness (5-shot, bf16, A100-80GB).

Results vs base Qwen3-8B

Benchmark	Metric	Qwen3-8B-OpusReasoning	Base Qwen3-8B	Δ
MMLU	accuracy	75.40%	72.93%	+2.47 ✅
ARC-Challenge	acc_norm	65.87%	56.74%	+9.13 ✅✅
ARC-Challenge	accuracy	64.42%	—	—
HellaSwag	acc_norm	76.98%	74.91%	+2.07 ✅
HellaSwag	accuracy	58.32%	—	—
GSM8K	exact_match (strict)	86.20%	88.60%	-2.40
GSM8K	exact_match (flexible)	86.66%	—	—
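The Δ column can be recomputed from the two score columns as a quick consistency check. The sketch below uses only the rows of the table above that report a base-model score; `RESULTS` and `deltas` are hypothetical names.

```python
# (fine-tuned score, base score) in percent, copied from the table above.
RESULTS = {
    ("MMLU", "accuracy"): (75.40, 72.93),
    ("ARC-Challenge", "acc_norm"): (65.87, 56.74),
    ("HellaSwag", "acc_norm"): (76.98, 74.91),
    ("GSM8K", "exact_match (strict)"): (86.20, 88.60),
}


def deltas(results):
    """Difference in percentage points, fine-tuned minus base."""
    return {key: round(tuned - base, 2) for key, (tuned, base) in results.items()}


for (benchmark, metric), delta in deltas(RESULTS).items():
    print(f"{benchmark:<14} {metric:<22} {delta:+.2f}")
```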

Analysis

MMLU (+2.47), ARC-Challenge (+9.13), and HellaSwag (+2.07) all improve over the base model, so on these benchmarks reasoning-focused fine-tuning shows no sign of catastrophic forgetting of general knowledge or commonsense reasoning.
The ARC-Challenge gain (+9.13) is the largest of the three and suggests that Opus-style structured reasoning transfers well to scientific reasoning tasks.
GSM8K drops by 2.40 points on strict exact match. This is a minor regression, likely caused by longer <think> traces occasionally being truncated by the default max_gen_toks; the model still scores 86.20% (strict) and 86.66% (flexible) on grade-school math.

More rigorous reasoning benchmarks (MMLU-Pro, MATH-Hard, AIME, IFEval, MuSR) are being evaluated and will be added here.

Best Suited For

Mathematical problem solving (arithmetic, algebra, word problems)
Logical reasoning and deduction
Code generation with explanation
Multi-step analytical question answering
Instruction-following tasks with constraints
Offline / on-prem reasoning assistants (the bf16 weights alone take about 16 GB, so a GPU with more than 16 GB of VRAM, or a quantized variant, is needed)

Limitations & Intended Use

Scale of supervision: fine-tuned on only ~4K samples — gains are stylistic and structural, not broad knowledge expansion
Hallucination risk: reasoning traces may confidently cite non-existent facts; verify external claims
Opus-style bias: inherits tendencies of the teacher (e.g., verbosity, occasional over-hedging)
Language: primarily English training data
Not verified for safety-critical use — research and learning only
Base model license constraints: follow Qwen3 upstream license in commercial settings

Acknowledgments

Qwen Team — Qwen3-8B base model
Unsloth — efficient fine-tuning kernels
Anthropic — Claude Opus reasoning (teacher)
Dataset authors: Crownelius, Jackrong

Citation

@misc{qwen3-8b-opusreasoning,
  title        = {Qwen3-8B-OpusReasoning},
  author       = {Vo Nhat Cuong},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/NhatCuong22/Qwen3-8B-OpusReasoning}}
}

License

Apache 2.0 (inherited from the upstream Qwen3-8B license).
