---
tags:
- gguf
- llama.cpp
- unsloth
license: apache-2.0
datasets:
- khazarai/qwen3.6-plus-high-reasoning-500x
language:
- en
base_model:
- khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled
pipeline_tag: text-generation
metrics:
- accuracy
---

# Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled-GGUF

## Model: khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled

![General Benchmark Comparison Chart](benchmark/evaluatedbyLLM.png)

- **Success Rate**: 75.64%

## Model: Qwen/Qwen3-4B-Thinking-2507

![General Benchmark Comparison Chart](benchmark/BaseModel.png)

- **Success Rate**: 73.73%
- **Benchmark**: khazarai/Multi-Domain-Reasoning-Benchmark
- **Total Questions**: 100

This is a reasoning-distilled variant of Qwen3-4B-Thinking, fine-tuned with QLoRA via Unsloth to replicate the advanced reasoning capabilities of the larger Qwen3.6-plus teacher model. The distillation process focuses on reducing the "rambling" and "uncertainty" often found in smaller models during complex tasks, replacing them with concise, structured, and actionable solution paths.

## Reasoning Comparison: Base vs. Distilled

The primary improvement in this model is the qualitative leap in reasoning structure. Below is a summary of the differences observed when solving complex graph problems (e.g., Shortest Path with Edge Reversals):

**Base Model (Qwen3-4B-Thinking)**:
- Style: Stream-of-consciousness, exploratory, and verbose.
- Behavior: The model often talks to itself ("Hmm, interesting", "Wait, no"), struggles to interpret problem constraints correctly on the first try, and enters loops of self-correction. It mimics a student trying to figure out the problem as they speak.
- Output: High noise-to-signal ratio; solution paths are often buried under paragraphs of hesitation.

**Distilled Model (Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled)**:
- Style: Structured, professional, and report-oriented.
- Behavior: The model analyzes the problem immediately, separates concerns (Input, Output, Constraints), and formulates a concrete algorithm plan (e.g., State-Space Dijkstra; an illustrative sketch of that technique appears at the end of this card). It proceeds with confidence, avoiding logical dead ends.
- Output: Provides a clean breakdown: Problem Analysis -> Intuition -> Algorithm -> Complexity Analysis -> Pseudocode.

**Verdict**: The distilled model transforms the raw potential of the base model into an engineering-grade tool.

## Model Specifications

- **Base Model**: Qwen/Qwen3-4B-Thinking-2507
- **Model Type**: Reasoning Distillation (QLoRA)
- **Framework**: Unsloth
- **Fine-tuning Method**: QLoRA (PEFT)
- **Teacher Model**: Qwen3.6-plus
- **Distillation Dataset**: khazarai/qwen3.6-plus-high-reasoning-500x
  - Total Tokens: 1,739,249
  - Max Sequence Length: 6,500 tokens

An illustrative training setup matching these specifications is sketched at the end of this card.

## Provided Quants

(sorted by size, not necessarily quality; IQ-quants are often preferable to similarly sized non-IQ quants)

| Type | Size/GB | Notes |
|:-----|--------:|:------|
| Q4_K_1 | 2.3 | |
| Q6_K | 3.3 | very good quality |
| Q8_0 | 4.2 | fast, best quality |
| bf16 | 8.0 | 16 bpw, overkill |

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

![quant perplexity graph](https://www.nethype.de/huggingface_embed/quantpplgraph.png)

And here are Artefact2's thoughts on the matter: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9
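
## Usage

A minimal sketch of running one of the provided quants with the `llama-cpp-python` bindings for llama.cpp. The glob pattern below assumes a Q6_K file exists in this repo under a conventional name; check the repo's file list and adjust if needed.

```python
# Sketch: load a GGUF quant directly from the Hub and run a chat completion.
# Assumes `pip install llama-cpp-python` and that a *Q6_K.gguf file exists
# in this repo (see the quant table above).
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="khazarai/Qwen3-4B-Qwen3.6-plus-Reasoning-Distilled-GGUF",
    filename="*Q6_K.gguf",  # glob pattern; picks the Q6_K quant
    n_ctx=8192,             # leave room for long reasoning traces
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Plan an algorithm for shortest "
               "paths in a directed graph where edges may be reversed."}],
    max_tokens=2048,
)
print(out["choices"][0]["message"]["content"])
```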
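
## Illustration: State-Space Dijkstra

The "State-Space Dijkstra" plan mentioned in the reasoning comparison means running Dijkstra over an expanded state space rather than the raw graph. Below is a minimal sketch, assuming one common formulation of the edge-reversal problem: a directed edge may be traversed backwards, but each backward traversal consumes one of a limited budget of reversals. The function and parameter names (`shortest_path_with_reversals`, `max_rev`) are illustrative, not taken from the benchmark.

```python
# Sketch: Dijkstra over states (node, reversals_used) instead of plain nodes.
# Assumed problem: non-negative edge weights; traversing an edge against its
# direction is allowed but consumes one of `max_rev` permitted reversals.
import heapq
from collections import defaultdict

def shortest_path_with_reversals(edges, src, dst, max_rev):
    """edges: list of (u, v, w) directed edges; returns min cost or None."""
    fwd = defaultdict(list)  # normal adjacency
    bwd = defaultdict(list)  # reversed adjacency (using it costs a reversal)
    for u, v, w in edges:
        fwd[u].append((v, w))
        bwd[v].append((u, w))

    dist = {(src, 0): 0}          # best cost per (node, reversals_used) state
    pq = [(0, src, 0)]
    while pq:
        d, u, k = heapq.heappop(pq)
        if d > dist.get((u, k), float("inf")):
            continue              # stale heap entry
        if u == dst:
            return d              # first pop of dst is optimal (Dijkstra)
        nexts = [(v, w, k) for v, w in fwd[u]]
        if k < max_rev:           # optionally spend one reversal
            nexts += [(v, w, k + 1) for v, w in bwd[u]]
        for v, w, k2 in nexts:
            nd = d + w
            if nd < dist.get((v, k2), float("inf")):
                dist[(v, k2)] = nd
                heapq.heappush(pq, (nd, v, k2))
    return None
```

The expanded state space is what lets a single Dijkstra pass respect the reversal budget: each layer k is a copy of the graph, and reversed edges move between layers.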
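
## Illustration: QLoRA Fine-tuning with Unsloth

A minimal sketch of the kind of Unsloth QLoRA setup described in the specifications above (4-bit base model, LoRA adapters, 6,500-token sequences, the distillation dataset). All hyperparameters (rank, alpha, learning rate, batch sizes, epochs) are assumptions for illustration; the card does not publish the actual training configuration.

```python
# Sketch: QLoRA distillation fine-tune with Unsloth + TRL's SFTTrainer.
# Hyperparameters are illustrative assumptions, not the real training config.
# (Argument names follow the older TRL releases used in Unsloth notebooks;
# newer TRL versions move these into SFTConfig.)
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Thinking-2507",
    max_seq_length=6500,     # matches the card's max sequence length
    load_in_4bit=True,       # the "Q" in QLoRA: 4-bit quantized base weights
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                    # LoRA rank (assumed)
    lora_alpha=16,           # (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("khazarai/qwen3.6-plus-high-reasoning-500x", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumed column name; adjust to the dataset
    max_seq_length=6500,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # (assumed)
        gradient_accumulation_steps=4,   # (assumed)
        num_train_epochs=3,              # (assumed)
        learning_rate=2e-4,              # (assumed)
        output_dir="outputs",
    ),
)
trainer.train()
```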