Initialize project; model provided by the ModelHub XC community
Model: reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF Source: Original Platform
39
.gitattributes
vendored
Normal file
@@ -0,0 +1,39 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
qwen3-0.6b-distilled-30b-thinking-sft-f16.gguf filter=lfs diff=lfs merge=lfs -text
qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
qwen3-0.6b-distilled-30b-thinking-sft-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
qwen3-0.6b-distilled-30b-thinking-sft-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
223
README.md
Normal file
@@ -0,0 +1,223 @@
---
library_name: llama.cpp
license: apache-2.0
language:
- en
base_model: reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT
tags:
- gguf
- quantized
- distillation
- sft
- reasoning
- mathematics
- physics
- legal
- stem
- chain-of-thought
- edge
- mobile
- convergentintel
- knowledge-distillation
pipeline_tag: text-generation
---

# Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT — GGUF

GGUF quantizations of [reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT) for local, mobile, and edge deployment via [llama.cpp](https://github.com/ggerganov/llama.cpp) and compatible runtimes.

A 30B Thinking teacher compressed 50x into a model that fits on a smartwatch.

## Available Quantizations

| File | Quant | Size | Use Case |
|---|---|---|---|
| `qwen3-0.6b-distilled-30b-thinking-sft-f16.gguf` | F16 | ~1.5 GB | Full-precision reference |
| `qwen3-0.6b-distilled-30b-thinking-sft-Q8_0.gguf` | Q8_0 | ~805 MB | Near-lossless, desktop/laptop |
| `qwen3-0.6b-distilled-30b-thinking-sft-Q5_K_M.gguf` | Q5_K_M | ~551 MB | Balanced, mobile |
| `qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf` | Q4_K_M | ~484 MB | Smallest, IoT/edge/smartwatch |

(Sizes taken from the LFS pointers in this commit.)

**Recommended:** Q5_K_M for mobile, Q4_K_M for maximum compression.
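The rough scale of these files can be sanity-checked from bits-per-weight arithmetic. A minimal sketch, assuming a nominal 0.6e9 parameters and typical llama.cpp average bits-per-weight figures (both are approximations I am supplying, not numbers from this card; real GGUF files also store embeddings and metadata at higher precision, so actual sizes run larger):

```python
# Estimate quantized file size from average bits per weight.
# PARAMS is a nominal round figure, not the exact parameter count.
PARAMS = 0.6e9

def estimated_size_mb(bits_per_weight: float) -> float:
    """Approximate file size in MB for a given average bits/weight."""
    return PARAMS * bits_per_weight / 8 / 1e6

# bpw values below are commonly cited llama.cpp averages (assumption).
for name, bpw in [("F16", 16.0), ("Q8_0", 8.5), ("Q5_K_M", 5.69), ("Q4_K_M", 4.85)]:
    print(f"{name}: ~{estimated_size_mb(bpw):.0f} MB")
```

The estimates land below the actual sizes in the table, which is expected: token embeddings and the output head are kept at higher precision in K-quant files.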
## About the Model

Two-stage build:

**Stage 1 — Thinking Teacher Distillation:** Qwen3-0.6B distilled from Qwen3-30B-A3B-Thinking on 6,122 STEM chain-of-thought samples. The Thinking-variant teacher produces extended reasoning traces with higher-entropy distributions, transferring richer deliberation structure into the student. The distillation loss combines proof-weighted cross-entropy (weights decaying from 2.5x to 1.5x on derivation tokens) with KL divergence at temperature T=2.0.

**Stage 2 — Legal SFT:** Supervised fine-tuning on [Alignment-Lab-AI/Lawyer-Instruct](https://huggingface.co/datasets/Alignment-Lab-AI/Lawyer-Instruct) at a conservative learning rate (5e-6) to layer legal reasoning on top of the STEM backbone without overwriting it.

| Attribute | Value |
|---|---|
| **Base model** | [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) |
| **Teacher model** | [Qwen/Qwen3-30B-A3B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507) |
| **Compression** | 50x in parameters, ~75x with Q4_K_M |
| **Developer** | Reaperdoesntrun / [Convergent Intelligence LLC](https://convergentintel.com): Research Division |

## Usage

### llama.cpp CLI

```bash
./llama-cli -m qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf \
  -p "### Instruction:\nWhat is promissory estoppel?\n\n### Response:\n" \
  -n 512 --temp 0.0
```

### llama.cpp Python

```python
from llama_cpp import Llama

llm = Llama(model_path="qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf", n_ctx=1024)

output = llm(
    "### Instruction:\nProve that the square root of 2 is irrational.\n\n### Response:\n",
    max_tokens=512,
    temperature=0.0,
)
print(output["choices"][0]["text"])
```

### Ollama

```bash
echo 'FROM ./qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf' > Modelfile
ollama create stem-legal-tiny -f Modelfile
ollama run stem-legal-tiny "Explain the difference between a felony and a misdemeanor."
```

### LM Studio

Download any GGUF file from this repo and load it directly in [LM Studio](https://lmstudio.ai/).

## Prompt Formats

**STEM derivation (Stage 1):**

```
Solve the following problem carefully and show a rigorous derivation.

Problem:
[Your problem]

Proof:
```

**Instruction-following (Stage 2):**

```
### Instruction:
[Your question]

### Response:
```
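The two templates above can be built programmatically so they stay byte-identical across calls. A minimal sketch; the helper names are illustrative, not part of the source repo:

```python
def stem_prompt(problem: str) -> str:
    """Build the Stage 1 STEM derivation prompt (template from the card)."""
    return (
        "Solve the following problem carefully and show a rigorous derivation.\n\n"
        f"Problem:\n{problem}\n\n"
        "Proof:\n"
    )

def instruct_prompt(question: str) -> str:
    """Build the Stage 2 instruction-following prompt (template from the card)."""
    return f"### Instruction:\n{question}\n\n### Response:\n"

print(instruct_prompt("What is promissory estoppel?"))
```

Either string can be passed directly as the prompt in the llama.cpp CLI or Python examples above.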
## Limitations

0.6B is a hard capacity constraint. The model trades depth for deployability — it will make errors that larger models avoid. Multi-step proofs beyond ~8 steps degrade. Legal reasoning covers general concepts but lacks nuance. Always verify critical outputs. This is not a substitute for formal proof verification, licensed legal counsel, or professional analysis.
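For numeric answers, "always verify critical outputs" can be partly automated by cross-checking the model's claimed value against an independent computation. A minimal sketch; the extraction regex and tolerance are my illustrative assumptions, not part of the model card:

```python
import re

def check_numeric_claim(model_text: str, expected: float, tol: float = 1e-6) -> bool:
    """Extract the last number in a model response and compare it
    against an independently computed value."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", model_text)
    if not matches:
        return False  # nothing numeric to verify
    return abs(float(matches[-1]) - expected) <= tol

# Cross-check a claimed derivative value against direct computation.
claim = "The derivative of x squared at x=3 is 6."
assert check_numeric_claim(claim, 2 * 3.0)
```

This catches only arithmetic slips, not flawed reasoning; proofs and legal analysis still need human or formal review, as the paragraph above says.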
## Source Model

Full training methodology, hyperparameters, and the two-stage pipeline are documented in:

**[reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT)**

## Mathematical Foundations

This is a GGUF-quantized variant. The mathematical foundations (Discrepancy Calculus, Topological Knowledge Distillation) are documented in the source model's card. The discrepancy operator $Df(x)$ and BV decomposition that inform the training pipeline are preserved through quantization — the structural boundaries detected by DISC during training are baked into the weights, not dependent on precision.

## Related Models

| Model | Description |
|---|---|
| [Qwen3-0.6B-STEM-Proof-Distilled-Thinking](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-STEM-Proof-Distilled-Thinking) | Stage 1 only — pure STEM backbone |
| [Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT) | Full precision source model |
| [Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF) | Larger 1.7B variant GGUF |

## Citation

```bibtex
@misc{colca2026thinking06bgguf,
  title={Qwen3-0.6B Distilled Thinking SFT: 50x Compression GGUF for Edge Deployment},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF},
  note={Convergent Intelligence LLC: Research Division}
}
```

---

*Convergent Intelligence LLC: Research Division*
*"Where classical analysis fails to see, we begin."*

---

## Convergent Intelligence Portfolio

*Part of the [Qwen3 0.6B Distillation Series](https://huggingface.co/reaperdoesntknow) by [Convergent Intelligence LLC: Research Division](https://huggingface.co/reaperdoesntknow)*
### Series Models

| Model | Downloads | Format |
|-------|-----------|--------|
| [Qwen3-0.6B-Distilled-30B-A3B](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B) | 36 | HF |
| [Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT) | 33 | HF |

### Top Models from Our Lab

| Model | Downloads |
|-------|-----------|
| [Qwen3-1.7B-Thinking-Distil](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Thinking-Distil) | 501 |
| [LFM2.5-1.2B-Distilled-SFT](https://huggingface.co/reaperdoesntknow/LFM2.5-1.2B-Distilled-SFT) | 342 |
| [Qwen3-1.7B-Coder-Distilled-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT) | 302 |
| [Qwen3-1.7B-Coder-Distilled-SFT-GGUF](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT-GGUF) | 194 |
| [Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF](https://huggingface.co/reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF) | 175 |

**Total Portfolio: 41 models | 2,781 total downloads**

*Last updated: 2026-03-28 12:49 UTC*

<!-- DISTILQWEN-SPOTLIGHT-START -->

## DistilQwen Collection

This model is part of the **[DistilQwen](https://huggingface.co/collections/reaperdoesntknow/distilqwen-69bf40ec669117e3f069ef1c)** proof-weighted distillation series.
Collection: **9 models** | **2,788 downloads**

### Teacher Variant Comparison

| Teacher | Student Size | Strength | Models |
|---------|-------------|----------|--------|
| Qwen3-30B-A3B (Instruct) | 1.7B | Instruction following, structured output, legal reasoning | 3 (833 DL) |
| Qwen3-30B-A3B (Thinking) | 0.6B | Extended deliberation, higher-entropy distributions, proof derivation | 3 (779 DL) **← this model** |
| Qwen3-30B-A3B (Coder) | 1.7B | Structured decomposition, STEM derivation, logical inference | 2 (825 DL) |

### Methodology

**The only BF16 collection in the portfolio.** While the broader Convergent Intelligence catalog (43 models, 12,000+ downloads) was trained on CPU at FP32 for $24 of total compute, the DistilQwen series was trained on an H100 at BF16 with a 30B-parameter teacher. Same methodology, premium hardware. This is what happens when you give the pipeline real compute.

All models use proof-weighted knowledge distillation: 55% cross-entropy with decaying proof weights (2.5× → 1.5×) and 45% KL divergence at T=2.0. The proof weight amplifies loss on reasoning-critical tokens, forcing the student to allocate capacity to structural understanding rather than surface-level pattern matching.
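The per-token loss described above can be sketched in pure Python for a toy vocabulary. This is my illustrative reconstruction of the stated recipe (55% proof-weighted CE, 45% KL at T=2.0), not the lab's training code; the T² rescaling and the KL(teacher‖student) direction are standard distillation conventions I am assuming, and the linear weight decay schedule is likewise an assumption:

```python
import math

T = 2.0  # distillation temperature, per the card

def softmax(logits, temp=1.0):
    exps = [math.exp(x / temp) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def proof_weight(step_frac):
    """Proof weight decaying 2.5x -> 1.5x over training (linear schedule assumed)."""
    return 2.5 - (2.5 - 1.5) * step_frac

def distill_loss(student_logits, teacher_logits, target_idx, weight):
    """0.55 * proof-weighted CE on the gold token
       + 0.45 * T^2 * KL(teacher || student) on temperature-softened distributions."""
    p_student = softmax(student_logits)
    ce = -weight * math.log(p_student[target_idx])
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return 0.55 * ce + 0.45 * (T ** 2) * kl  # T^2 restores soft-target gradient scale

loss = distill_loss([2.0, 0.5, -1.0, 0.1], [1.8, 0.7, -0.5, 0.0], 0, proof_weight(0.0))
print(f"{loss:.4f}")
```

On derivation tokens the CE term is amplified early in training (weight 2.5) and relaxes toward 1.5, which is what "decaying proof weights" refers to.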
Full methodology: [Structure Over Scale (DOI: 10.57967/hf/8165)](https://doi.org/10.57967/hf/8165)

### Related in this series

- [Qwen3-0.6B-Distilled-30B-A3B](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B) (236 downloads)
- [Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT](https://huggingface.co/reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT) (227 downloads)

<!-- DISTILQWEN-SPOTLIGHT-END -->

---
<sub>Part of the [reaperdoesntknow research portfolio](https://huggingface.co/reaperdoesntknow) — 49 models, 22,598 total downloads | Last refreshed: 2026-03-30 12:05 UTC</sub>
<!-- cix-keeper-ts:2026-04-11T16:09:09Z -->
<!-- card-refresh: 2026-03-30 -->
3
qwen3-0.6b-distilled-30b-thinking-sft-Q4_K_M.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:608e6f5781bfcb34e3d9f7fca9b513f5e5bb058ca8843241f1abe52a1839cf32
size 484219808
3
qwen3-0.6b-distilled-30b-thinking-sft-Q5_K_M.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cb3cc87de939b9ad08a406ce7434cdcbc4bec01b4b8b31c9e625ed518d92233d
size 551377824
3
qwen3-0.6b-distilled-30b-thinking-sft-Q8_0.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:24c4a94ac8ebc1f8003f1d1f35c1de4d4f6e12727ef30d7f82dd89e3c9e398f6
size 804753312
3
qwen3-0.6b-distilled-30b-thinking-sft-f16.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:530f4d26cda5f399d54a65903d122b832b74d7127a78468bed213d94385336de
size 1509347232