--- library_name: llama.cpp license: apache-2.0 language: - en base_model: reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT tags: - gguf - quantized - distillation - sft - reasoning - mathematics - physics - legal - stem - chain-of-thought - edge - mobile - convergentintel - knowledge-distillation pipeline_tag: text-generation --- # Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT — GGUF GGUF quantizations of [reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT) for local, mobile, and edge deployment via [llama.cpp](https://github.com/ggerganov/llama.cpp) and compatible runtimes. A 30B Thinking teacher compressed 50x into a model that fits on a smartwatch. ## Available Quantizations | File | Quant | Size | Use Case | |---|---|---|---| | `qwen3-0.6b-distilled-30b-thinking-sft-f16.gguf` | F16 | ~1.3 GB | Full precision reference | | `qwen3-0.6b-distilled-30b-thinking-sft-Q8_0.gguf` | Q8_0 | ~700 MB | Near-lossless, desktop/laptop | | `qwen3-0.6b-distilled-30b-thinking-sft-Q5_K_M.gguf` | Q5_K_M | ~500 MB | Balanced, mobile | | `qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf` | Q4_K_M | ~400 MB | Smallest, IoT/edge/smartwatch | **Recommended:** Q5_K_M for mobile, Q4_K_M for maximum compression. ## About the Model Two-stage build: **Stage 1 — Thinking Teacher Distillation:** Qwen3-0.6B distilled from Qwen3-30B-A3B-Thinking on 6,122 STEM chain-of-thought samples. The Thinking variant teacher produces extended reasoning traces with higher-entropy distributions, transferring richer deliberation structure into the student. Proof-weighted cross-entropy (2.5x → 1.5x on derivation tokens) + KL divergence at T=2.0. **Stage 2 — Legal SFT:** Supervised fine-tuning on [Alignment-Lab-AI/Lawyer-Instruct](https://huggingface.co/datasets/Alignment-Lab-AI/Lawyer-Instruct) at conservative learning rate (5e-6) to layer legal reasoning on top of the STEM backbone without overwriting it. | Attribute | Value | |---|---| | **Base model** | [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) | | **Teacher model** | [Qwen/Qwen3-30B-A3B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507) | | **Compression** | 50x parameters, ~75x with Q4_K_M | | **Developer** | Reaperdoesntrun / [Convergent Intelligence LLC](https://convergentintel.com): Research Division | ## Usage ### llama.cpp CLI ```bash ./llama-cli -m qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf \ -p "### Instruction:\nWhat is promissory estoppel?\n\n### Response:\n" \ -n 512 --temp 0.0 ``` ### llama.cpp Python ```python from llama_cpp import Llama llm = Llama(model_path="qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf", n_ctx=1024) output = llm( "### Instruction:\nProve that the square root of 2 is irrational.\n\n### Response:\n", max_tokens=512, temperature=0.0, ) print(output["choices"][0]["text"]) ``` ### Ollama ```bash echo 'FROM ./qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf' > Modelfile ollama create stem-legal-tiny -f Modelfile ollama run stem-legal-tiny "Explain the difference between a felony and a misdemeanor." ``` ### LM Studio Download any GGUF file from this repo and load directly in [LM Studio](https://lmstudio.ai/). ## Prompt Formats **STEM derivation (Stage 1):** ``` Solve the following problem carefully and show a rigorous derivation. Problem: [Your problem] Proof: ``` **Instruction-following (Stage 2):** ``` ### Instruction: [Your question] ### Response: ``` ## Limitations 0.6B is a hard capacity constraint. The model trades depth for deployability — it will make errors that larger models avoid. Multi-step proofs beyond ~8 steps degrade. Legal reasoning covers general concepts but lacks nuance. Always verify critical outputs. This is not a substitute for formal proof verification, licensed legal counsel, or professional analysis. ## Source Model Full training methodology, hyperparameters, and the two-stage pipeline are documented in: **[reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT)** ## Mathematical Foundations This is a GGUF-quantized variant. The mathematical foundations (Discrepancy Calculus, Topological Knowledge Distillation) are documented in the source model's card. The discrepancy operator $Df(x)$ and BV decomposition that inform the training pipeline are preserved through quantization — the structural boundaries detected by DISC during training are baked into the weights, not dependent on precision. ## Related Models | Model | Description | |---|---| | [Qwen3-0.6B-STEM-Proof-Distilled-Thinking](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-STEM-Proof-Distilled-Thinking) | Stage 1 only — pure STEM backbone | | [Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT) | Full precision source model | | [Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF) | Larger 1.7B variant GGUF | ## Citation ```bibtex @misc{colca2026thinking06bgguf, title={Qwen3-0.6B Distilled Thinking SFT: 50x Compression GGUF for Edge Deployment}, year={2026}, publisher={HuggingFace}, url={https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF}, note={Convergent Intelligence LLC: Research Division} } ``` --- *Convergent Intelligence LLC: Research Division* *"Where classical analysis fails to see, we begin."* --- ## Convergent Intelligence Portfolio *Part of the [Qwen3 0.6B Distillation Series](https://huggingface.co/reaperdoesntknow) by [Convergent Intelligence LLC: Research Division](https://huggingface.co/reaperdoesntknow)* # ## Mathematical Foundations This is a GGUF-quantized variant. The mathematical foundations (Discrepancy Calculus, Topological Knowledge Distillation) are documented in the source model's card. The discrepancy operator $Df(x)$ and BV decomposition that inform the training pipeline are preserved through quantization — the structural boundaries detected by DISC during training are baked into the weights, not dependent on precision. ## Related Models | Model | Downloads | Format | |-------|-----------|--------| | [Qwen3-0.6B-Distilled-30B-A3B](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B) | 36 | HF | | [Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT) | 33 | HF | ### Top Models from Our Lab | Model | Downloads | |-------|-----------| | [Qwen3-1.7B-Thinking-Distil](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Thinking-Distil) | 501 | | [LFM2.5-1.2B-Distilled-SFT](https://huggingface.co/reaperdoesntknow/LFM2.5-1.2B-Distilled-SFT) | 342 | | [Qwen3-1.7B-Coder-Distilled-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT) | 302 | | [Qwen3-1.7B-Coder-Distilled-SFT-GGUF](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT-GGUF) | 194 | | [Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF) | 175 | **Total Portfolio: 41 models | 2,781 total downloads** *Last updated: 2026-03-28 12:49 UTC* ## DistilQwen Collection This model is part of the **[DistilQwen](https://huggingface.co/collections/reaperdoesntknow/distilqwen-69bf40ec669117e3f069ef1c)** proof-weighted distillation series. Collection: **9 models** | **2,788 downloads** ### Teacher Variant Comparison | Teacher | Student Size | Strength | Models | |---------|-------------|----------|--------| | Qwen3-30B-A3B (Instruct) | 1.7B | Instruction following, structured output, legal reasoning | 3 (833 DL) | | Qwen3-30B-A3B (Thinking) | 0.6B | Extended deliberation, higher-entropy distributions, proof derivation | 3 (779 DL) **← this model** | | Qwen3-30B-A3B (Coder) | 1.7B | Structured decomposition, STEM derivation, logical inference | 2 (825 DL) | ### Methodology **The only BF16 collection in the portfolio.** While the broader Convergent Intelligence catalog (43 models, 12,000+ downloads) was trained on CPU at FP32 for $24 total compute, the DistilQwen series was trained on H100 at BF16 with a 30B-parameter teacher. Same methodology, premium hardware. This is what happens when you give the pipeline real compute. All models use proof-weighted knowledge distillation: 55% cross-entropy with decaying proof weights (2.5× → 1.5×), 45% KL divergence at T=2.0. The proof weight amplifies loss on reasoning-critical tokens, forcing the student to allocate capacity to structural understanding rather than surface-level pattern matching. Full methodology: [Structure Over Scale (DOI: 10.57967/hf/8165)](https://doi.org/10.57967/hf/8165) ### Related in this series - [Qwen3-0.6B-Distilled-30B-A3B](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B) (236 downloads) - [Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT) (227 downloads) --- Part of the [reaperdoesntknow research portfolio](https://huggingface.co/reaperdoesntknow) — 49 models, 22,598 total downloads | Last refreshed: 2026-03-30 12:05 UTC