ModelHub XC 0db8bf3543 Initialize project; model provided by the ModelHub XC community
Model: nhe-ai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled
Source: Original Platform
2026-05-30 04:47:17 +08:00

  • base_model: unsloth/Qwen3-4B-Thinking-2507
  • tags: text-generation-inference, transformers, unsloth, qwen3, distillation, reasoning
  • license: apache-2.0
  • language: en
  • datasets: khazarai/qwen3.6-plus-high-reasoning-500x
  • pipeline_tag: text-generation

Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled

[Figure: General Benchmark Comparison Chart]

  • Benchmark: khazarai/Multi-Domain-Reasoning-Benchmark
  • Total Questions: 100

| Model | Score |
|---|---|
| khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled | 75.64 |
| Qwen/Qwen3-4B-Thinking-2507 | 73.73 |
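
To put the two scores in perspective, the gap can be computed directly. This is a small arithmetic sketch using only the numbers reported above; the variable names are illustrative, and it says nothing about how the benchmark itself is scored.

```python
# Scores as reported for khazarai/Multi-Domain-Reasoning-Benchmark.
scores = {
    "khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled": 75.64,
    "Qwen/Qwen3-4B-Thinking-2507": 73.73,
}

distilled = scores["khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled"]
base = scores["Qwen/Qwen3-4B-Thinking-2507"]

# Absolute gain in score points and gain relative to the base model.
absolute_gain = round(distilled - base, 2)
relative_gain_pct = round(100 * (distilled - base) / base, 2)

print(f"absolute gain: {absolute_gain} points")
print(f"relative gain: {relative_gain_pct}%")
```

The difference is 1.91 points, or roughly 2.6% relative to the base model.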

This is a reasoning-distilled variant of Qwen3-4B-Thinking-2507, fine-tuned with QLoRA via Unsloth to replicate the reasoning behavior of the larger Qwen3.6-plus teacher model. The distillation focuses on reducing the "rambling" and "uncertainty" that smaller models often show on complex tasks, replacing them with concise, structured, and actionable solution paths.

Reasoning Comparison: Base vs. Distilled

The primary improvement in this model is the qualitative leap in reasoning structure. Below is a summary of the differences observed when solving complex graph problems (e.g., Shortest Path with Edge Reversals):

Base Model (Qwen3-4B-Thinking):

  • Style: Stream-of-consciousness, exploratory, and verbose.
  • Behavior: The model often talks to itself ("Hmm, interesting", "Wait, no"), struggles to interpret problem constraints correctly on the first try, and enters loops of self-correction. It mimics a student trying to figure out the problem as they speak.
  • Output: Has a high noise-to-signal ratio; solution paths are often buried under paragraphs of hesitation.

Distilled Model (Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled):

  • Style: Structured, professional, and report-oriented.
  • Behavior: The model analyzes the problem immediately, separates concerns (Input, Output, Constraints), and formulates a concrete algorithm plan (e.g., State-Space Dijkstra). It proceeds with confidence, avoiding logical dead-ends.
  • Output: Provides a clean breakdown: Problem Analysis -> Intuition -> Algorithm -> Complexity Analysis -> Pseudocode.

Verdict: On problems of this kind, the distilled model turns the base model's raw capability into structured, directly usable solutions.
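
For readers unfamiliar with the "State-Space Dijkstra" plan named above, here is a minimal Python sketch of the idea. It is not the model's output and is not taken from the benchmark. It assumes non-negative weights, nodes numbered 1..N, and one particular reading of the problem statement: walking an edge against its direction uses one of the K reversals and costs twice the edge weight (the weight once for the reversal, once for the traversal). The function name `shortest_path_with_reversals` and the `(u, v, w)` edge format are illustrative choices.

```python
import heapq


def shortest_path_with_reversals(n, edges, k):
    """Dijkstra over states (node, reversals_used).

    n     -- number of nodes, labeled 1..n
    edges -- list of (u, v, w) directed edges with non-negative weight w
    k     -- maximum number of edges that may be reversed
    Returns the minimum cost from node 1 to node n, or None if unreachable.
    """
    forward = [[] for _ in range(n + 1)]
    backward = [[] for _ in range(n + 1)]
    for u, v, w in edges:
        forward[u].append((v, w))
        backward[v].append((u, w))  # edge u->v walked as v->u when reversed

    inf = float("inf")
    # dist[node][r] = best known cost to reach node using exactly r reversals
    dist = [[inf] * (k + 1) for _ in range(n + 1)]
    dist[1][0] = 0
    heap = [(0, 1, 0)]  # (cost, node, reversals_used)

    while heap:
        cost, node, used = heapq.heappop(heap)
        if cost > dist[node][used]:
            continue  # stale heap entry
        if node == n:
            return cost  # first time n is popped, its cost is minimal
        # Follow edges in their original direction.
        for nxt, w in forward[node]:
            if cost + w < dist[nxt][used]:
                dist[nxt][used] = cost + w
                heapq.heappush(heap, (cost + w, nxt, used))
        # Assumption: a reversed edge costs w (reversal) + w (traversal).
        if used < k:
            for nxt, w in backward[node]:
                new_cost = cost + 2 * w
                if new_cost < dist[nxt][used + 1]:
                    dist[nxt][used + 1] = new_cost
                    heapq.heappush(heap, (new_cost, nxt, used + 1))
    return None


example_edges = [(1, 2, 1), (3, 2, 1), (3, 4, 1), (1, 4, 10)]
print(shortest_path_with_reversals(4, example_edges, 0))
print(shortest_path_with_reversals(4, example_edges, 1))
```

The state space has N·(K+1) states and at most 2·M·(K+1) transitions, so with a binary heap the running time is O((K+1)(N+M) log((K+1)N)) and the memory use is O((K+1)N + M). A different cost model for reversals only changes the `2 * w` term.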

Model Specifications

  • Base Model: Qwen/Qwen3-4B-Thinking-2507
  • Model Type: Reasoning Distillation (QLoRA)
  • Framework: Unsloth
  • Fine-tuning Method: QLoRA (PEFT)
  • Teacher Model: Qwen3.6-plus
  • Distillation Dataset: khazarai/qwen3.6-plus-high-reasoning-500x
    • Total Tokens: 1,739,249
    • Max Sequence Length: 6,500 tokens

How to Get Started with the Model

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled"

# Load the tokenizer and place the whole model on GPU 0.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map={"": 0})

question = """
You are given a directed graph with N nodes and M edges, where each edge has a weight. You need to find the shortest path from node 1 to node N, but with a twist: you are allowed to reverse at most K edges (changing their direction) during your journey. The cost of reversing an edge is equal to its original weight. Design an efficient algorithm to solve this problem and analyze its time and space complexity. Consider both the case where K is small (K <= 5) and where K is large (K >= N/2).
"""

messages = [{"role": "user", "content": question}]

# Render the conversation to a prompt string using the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

# Tokenize the prompt and move the tensors to the model's device.
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Stream the generated tokens to stdout as they are produced, skipping the prompt.
_ = model.generate(
    **inputs,
    max_new_tokens=4048,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)
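
If you capture the generated text instead of streaming it, you may want to separate the reasoning trace from the final answer. The helper below is a minimal sketch that assumes the distilled model keeps the base model's convention of ending its reasoning with a `</think>` tag; this card does not confirm that, and the function name `split_reasoning` is hypothetical.

```python
def split_reasoning(decoded: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    Assumes the reasoning trace ends at the last occurrence of close_tag.
    If the tag is absent, the whole text is treated as the answer.
    """
    head, sep, tail = decoded.rpartition(close_tag)
    if not sep:
        return "", decoded.strip()
    # The opening tag may or may not be present in the decoded text.
    reasoning = head.replace("<think>", "").strip()
    return reasoning, tail.strip()


sample = "<think>\nUse Dijkstra over (node, reversals) states.\n</think>\n\nFinal algorithm: ..."
reasoning, answer = split_reasoning(sample)
print(reasoning)
print(answer)
```

To use it with the snippet above, drop the `streamer` argument, keep the return value of `model.generate`, and decode only the newly generated tokens with `tokenizer.decode` before passing the string to the helper.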