From aa0cda188783932b70651b8a584509175e8623eb Mon Sep 17 00:00:00 2001
From: ModelHub XC
Date: Sat, 18 Apr 2026 08:59:41 +0800
Subject: [PATCH] Initialize project; model provided by the ModelHub XC community
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Model: g023/qwen3-tiny-v2
Source: Original Platform
---
 .gitattributes                 |  40 ++++++
 Qwen3-g023-tiny-v2-Q2_K.gguf   |   3 +
 Qwen3-g023-tiny-v2-Q3_K_M.gguf |   3 +
 Qwen3-g023-tiny-v2-Q4_K_M.gguf |   3 +
 Qwen3-g023-tiny-v2-Q6_K.gguf   |   3 +
 Qwen3-g023-tiny-v2-Q8_0.gguf   |   3 +
 README.md                      | 220 +++++++++++++++++++++++++++++++++
 7 files changed, 275 insertions(+)
 create mode 100644 .gitattributes
 create mode 100644 Qwen3-g023-tiny-v2-Q2_K.gguf
 create mode 100644 Qwen3-g023-tiny-v2-Q3_K_M.gguf
 create mode 100644 Qwen3-g023-tiny-v2-Q4_K_M.gguf
 create mode 100644 Qwen3-g023-tiny-v2-Q6_K.gguf
 create mode 100644 Qwen3-g023-tiny-v2-Q8_0.gguf
 create mode 100644 README.md

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..5f5d003
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,40 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+Qwen3-g023-tiny-v2-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-g023-tiny-v2-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-g023-tiny-v2-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-g023-tiny-v2-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-g023-tiny-v2-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
diff --git a/Qwen3-g023-tiny-v2-Q2_K.gguf b/Qwen3-g023-tiny-v2-Q2_K.gguf
new file mode 100644
index 0000000..a2ed603
--- /dev/null
+++ b/Qwen3-g023-tiny-v2-Q2_K.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:93c5f34612e203562c15ed55059ef9b91d0e4ebe9747d79ea2478cf479876c5c
+size 814695424
diff --git a/Qwen3-g023-tiny-v2-Q3_K_M.gguf b/Qwen3-g023-tiny-v2-Q3_K_M.gguf
new file mode 100644
index 0000000..5f8ba78
--- /dev/null
+++ b/Qwen3-g023-tiny-v2-Q3_K_M.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8e18787fb7ed95c5201f1c42709d1d1c1a6bbc540c3a709750cf44bed305d75e
+size 987841536
diff --git a/Qwen3-g023-tiny-v2-Q4_K_M.gguf b/Qwen3-g023-tiny-v2-Q4_K_M.gguf
new file mode 100644
index 0000000..a60b4a2
--- /dev/null
+++ b/Qwen3-g023-tiny-v2-Q4_K_M.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b225c0fd5eae542b24e60c742f106c0ee7353df8f85ef0d06d8b22282db5ccda
+size 1164067840
diff --git a/Qwen3-g023-tiny-v2-Q6_K.gguf b/Qwen3-g023-tiny-v2-Q6_K.gguf
new file mode 100644
index 0000000..30151fe
--- /dev/null
+++ b/Qwen3-g023-tiny-v2-Q6_K.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e1d73d2e9396ce9a7a85adddcf63861be3df4aa382a7aae2d46064c834259d24
+size 1500365824
diff --git a/Qwen3-g023-tiny-v2-Q8_0.gguf b/Qwen3-g023-tiny-v2-Q8_0.gguf
new file mode 100644
index 0000000..c8c6cac
--- /dev/null
+++ b/Qwen3-g023-tiny-v2-Q8_0.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2fb5fb8b5b6a3d9308dc522c4b8fa8e80fa861212f9d543cf929ba0fbb6feb18
+size 1941416960
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..3af560e
--- /dev/null
+++ b/README.md
@@ -0,0 +1,220 @@
+---
+license: apache-2.0
+language:
+  - en
+base_model: Qwen/Qwen3-1.7B
+tags:
+  - qwen3
+  - gguf
+  - layer-surgery
+  - small-language-model
+  - optimized
+  - thinking
+  - text-generation
+  - skip-connections
+  - interpolation
+model_name: Qwen3-g023-tiny-v2
+pipeline_tag: text-generation
+library_name: llama.cpp
+quantized_by: g023
+---
+
+# Qwen3-g023-tiny-v2 — GGUF
+
+**An advanced 30-layer Qwen3 variant using swap, interpolation, and skip-bridge surgery.**
+
+Created through layer surgery that combines multi-swap, interpolation, and bridge (skip connection) techniques. Scores **94.3/100**, a 6.5-point improvement over the original Qwen3-1.7B baseline (87.8/100) and the highest score achieved in two phases of experimentation across ~250 configurations. (These scores come from my own benchmark suite, so results may vary if you run your own tests.)
+
+## Available Quantizations
+
+| Quantization | Bits/Weight | Description | Download |
+|:---:|:---:|:---|:---:|
+| **Q8_0** | 8.00 | Highest quality, virtually lossless (recommended) | [Qwen3-g023-tiny-v2-Q8_0.gguf](./Qwen3-g023-tiny-v2-Q8_0.gguf) |
+| **Q6_K** | 6.57 | Excellent quality, good compression | [Qwen3-g023-tiny-v2-Q6_K.gguf](./Qwen3-g023-tiny-v2-Q6_K.gguf) |
+| **Q4_K_M** | 4.85 | Good balance of quality and size | [Qwen3-g023-tiny-v2-Q4_K_M.gguf](./Qwen3-g023-tiny-v2-Q4_K_M.gguf) |
+| **Q3_K_M** | 3.91 | High compression, moderate quality loss | [Qwen3-g023-tiny-v2-Q3_K_M.gguf](./Qwen3-g023-tiny-v2-Q3_K_M.gguf) |
+| **Q2_K** | 3.35 | Maximum compression, significant quality loss | [Qwen3-g023-tiny-v2-Q2_K.gguf](./Qwen3-g023-tiny-v2-Q2_K.gguf) |
+
+## Model Details
+
+| Parameter | Value |
+|:---|:---|
+| Architecture | Qwen3ForCausalLM |
+| Layers | **30** (28 original + 2 from surgery) |
+| Hidden Size | 2,048 |
+| Intermediate Size | 6,144 |
+| Attention Heads | 16 query / 8 key-value (GQA) |
+| Head Dimension | 128 |
+| Vocabulary | 151,936 tokens |
+| Max Context | 40,960 tokens |
+| RoPE θ | 1,000,000 |
+| Tied Embeddings | Yes |
+| Total Parameters | **~1.82B** |
+| Precision (source) | bfloat16 |
+
+## Surgery Operations
+
+This model was created by applying three surgical operations to [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B):
+
+1. **Multi-swap: layers 12↔13 and 16↔17.** Reorders layers at two critical points in the network for improved representational flow through the mid-layers.
+2. **Interpolation: layers 20 & 22 (α=0.5).** Creates a new layer by blending the weights of layers 20 and 22 in equal proportions, producing a smoother transition in the upper layers.
+3. **Bridge (skip connection): layer 5 → after layer 20.** Copies early-layer representations (layer 5) and inserts them after layer 20, creating a skip connection that helps preserve low-level features deep in the network.
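
The three operations above can be pictured as edits to an ordered list of layers. The following is only a toy sketch, not the author's tooling: it assumes 0-based layer indices, represents each layer as a short list of floats instead of real tensors, and assumes the bridge copy and the interpolated layer are placed directly after layer 20 in that order. The card does not state these details, and `make_layer`, `blend`, and `apply_surgery` are hypothetical names.

```python
import copy


def make_layer(idx):
    """Toy stand-in for one decoder layer (a real layer holds many tensors)."""
    return {"label": f"L{idx}", "weights": [float(idx), float(idx) * 2.0]}


def blend(a, b, alpha=0.5):
    """Interpolate two layers element-wise: alpha * a + (1 - alpha) * b."""
    weights = [alpha * x + (1.0 - alpha) * y
               for x, y in zip(a["weights"], b["weights"])]
    return {"label": f"interp({a['label']},{b['label']})", "weights": weights}


def apply_surgery(layers):
    """Apply swap, interpolation, and bridge to a 28-layer list.

    Placement and ordering of the two new layers are assumptions.
    """
    out = list(layers)

    # 1. Multi-swap: exchange positions 12<->13 and 16<->17.
    for i, j in [(12, 13), (16, 17)]:
        out[i], out[j] = out[j], out[i]

    # 2. Interpolation: build a new layer from layers 20 and 22 (alpha = 0.5).
    interp = blend(out[20], out[22], alpha=0.5)

    # 3. Bridge: duplicate layer 5 so it can be re-inserted deeper in the stack.
    bridge = copy.deepcopy(out[5])
    bridge["label"] = "bridge(L5)"

    # Assumed final order around the insertion point:
    # ..., L20, bridge(L5), interp(L20,L22), L21, L22, ...
    out.insert(21, interp)
    out.insert(21, bridge)
    return out


if __name__ == "__main__":
    result = apply_surgery([make_layer(i) for i in range(28)])
    print(len(result))
    print([layer["label"] for layer in result[19:25]])
```

Starting from 28 layers, the two inserted layers give a stack of 30, which matches the layer count listed under Model Details. Real surgery would apply the same list edits to the per-layer tensors of the checkpoint and update the configured layer count accordingly.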
+
+### Why These Operations Work
+
+- **Multi-swap** corrects suboptimal layer ordering that emerged from pre-training, allowing better gradient flow through the network's critical middle section.
+- **Interpolation** creates a synthetic transition layer that smooths the representation gap between layers 20 and 22, reducing the information bottleneck.
+- **Bridge/skip connections** address the "forgetting problem" in deep networks by reintroducing early feature representations at later stages, a technique inspired by ResNet's residual connections but applied at the transformer layer level.
+
+## Benchmark Results
+
+| Metric | Original (28L) | [v1 (27L)](https://huggingface.co/g023/Qwen3-g023-tiny-v1-GGUF) | **v2 (30L)** | Δ vs Original |
+|:---|:---:|:---:|:---:|:---:|
+| **Overall Score** | 87.8 / 100 | 92.9 / 100 | **94.3 / 100** | **+6.5** |
+| **Factual Accuracy** | 15/17 (88%) | 17/17 (100%) | **16/17 (94%)** | **+6 pp** |
+| Avg Perplexity | n/a | 15.70 | **15.17** | n/a |
+| Thinking Mode | ✅ | ✅ | ✅ | n/a |
+| Non-Thinking Mode | ✅ | ✅ | ✅ | n/a |
+
+Evaluated using a test suite with 17 factual questions, 2 completion coherence tests, perplexity measurements, repetition analysis, and thinking/non-thinking mode verification.
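
The card does not publish its scoring code. As a rough sketch of two of the measurements listed above, the snippet below assumes that perplexity is the exponential of the mean negative log-likelihood per token and that repetition is measured as the share of duplicated n-grams. Both definitions and the function names are assumptions, and the author's suite may compute them differently.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from natural-log token probabilities: exp(-mean(logprob))."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def repetition_ratio(tokens, n=3):
    """Share of n-grams that duplicate an earlier n-gram in the sequence."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        # Sequence shorter than n: nothing can repeat.
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


if __name__ == "__main__":
    # Four tokens, each predicted with probability 0.5.
    print(perplexity([math.log(0.5)] * 4))
    # A short sequence that repeats its first trigram once.
    print(repetition_ratio("a b c a b c".split(), n=3))
```

Lower is better for both numbers. How they are weighted into the 0-100 overall score is not described in the card, so no weighting is shown here.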
+
+## Features
+
+- **Thinking mode**: Full `<think>` / `</think>` reasoning support; toggle via the `enable_thinking` parameter
+- **Non-thinking mode**: Direct responses without chain-of-thought overhead
+- **Tool calling**: Full function/tool calling support
+- **System prompts**: Standard system message support
+- **Chat template**: Qwen3 ChatML template embedded in the GGUF
+
+## Usage
+
+### With Ollama
+
+```bash
+# Download the GGUF and create from Modelfile
+cat > Modelfile << 'EOF'
+FROM ./Qwen3-g023-tiny-v2-Q8_0.gguf
+
+PARAMETER temperature 1.0
+PARAMETER top_p 0.95
+PARAMETER top_k 45
+PARAMETER min_p 0.1
+PARAMETER num_ctx 40000
+PARAMETER mirostat 2
+PARAMETER mirostat_tau 5.0
+PARAMETER mirostat_eta 0.1
+PARAMETER repeat_last_n 16384
+PARAMETER repeat_penalty 1.1
+PARAMETER presence_penalty 0.5
+PARAMETER frequency_penalty 1.0
+
+TEMPLATE """{{- if .System }}
+<|im_start|>system
+{{ .System }}<|im_end|>
+{{ end }}
+{{- range .Messages }}
+{{- if eq .Role "user" }}
+<|im_start|>user
+{{ .Content }}<|im_end|>
+{{- else if eq .Role "assistant" }}
+<|im_start|>assistant
+{{ .Content }}<|im_end|>
+{{- end }}
+{{- end }}
+<|im_start|>assistant
+"""
+SYSTEM "You are a helpful assistant."
+EOF
+
+ollama create qwen3-tiny-v2 -f Modelfile
+ollama run qwen3-tiny-v2
+```
+
+### With llama.cpp
+
+```bash
+# Interactive chat
+llama-cli -m Qwen3-g023-tiny-v2-Q8_0.gguf \
+  --chat-template chatml -cnv
+
+# Thinking mode
+llama-cli -m Qwen3-g023-tiny-v2-Q8_0.gguf \
+  -p "<|im_start|>user\nExplain quantum computing<|im_end|>\n<|im_start|>assistant\n<think>\n" \
+  -n 512
+
+# Non-thinking mode
+llama-cli -m Qwen3-g023-tiny-v2-Q8_0.gguf \
+  -p "<|im_start|>user\n/no_think What is 2+2?<|im_end|>\n<|im_start|>assistant\n" \
+  -n 128
+```
+
+### With Python (llama-cpp-python)
+
+```python
+from llama_cpp import Llama
+
+# Load the quantized model with a 4,096-token context window
+model = Llama(model_path="Qwen3-g023-tiny-v2-Q8_0.gguf", n_ctx=4096)
+response = model.create_chat_completion(
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "What is the capital of France?"},
+    ],
+    temperature=0.6,
+)
+print(response["choices"][0]["message"]["content"])
+```
+
+## System Requirements
+
+| Quantization | RAM (CPU) | VRAM (GPU) |
+|:---:|:---:|:---:|
+| Q8_0 | ~2.2 GB | ~2.2 GB |
+| Q6_K | ~1.8 GB | ~1.8 GB |
+| Q4_K_M | ~1.4 GB | ~1.4 GB |
+| Q3_K_M | ~1.2 GB | ~1.2 GB |
+| Q2_K | ~1.0 GB | ~1.0 GB |
+
+## v1 vs v2
+
+This model (v2) is the **Phase 2 champion**, using advanced multi-operation surgery for the highest overall score.
+
+| | [v1](https://huggingface.co/g023/Qwen3-g023-tiny-v1-GGUF) | v2 (this model) |
+|:---|:---:|:---:|
+| Layers | 27 | 30 |
+| Parameters | ~1.67B | ~1.82B |
+| Operations | del + swap | swap + interpolate + bridge |
+| Score | 92.9 / 100 | 94.3 / 100 |
+| Factual | 100% (17/17) | 94% (16/17) |
+| Perplexity | 15.70 | 15.17 |
+| Use Case | Max factual accuracy | Max overall score |
+
+**v1** is recommended when factual accuracy is paramount (100% vs 94%).
+**v2** is recommended when overall quality matters more (94.3 vs 92.9).
+
+## Methodology
+
+Layer surgery was performed through a systematic, test-driven process across two phases:
+
+1. **Phase 1** (~150 configs): Exhaustive search across deletion, duplication, swapping, interpolation, and combined operations → champion: del_10 + swap_11↔12 (v1)
+2. **Phase 2** (~95 configs): Advanced techniques including tripling, multi-swap, layer reversal, cycling, weight scaling, layer merging, skip bridges, and synthesis → champion: this model (v2)
+3. **Evaluation**: Each configuration scored on factual accuracy (17 questions), completion coherence, perplexity, repetition ratio, and thinking mode functionality
+
+### Phase 2 Leaderboard (Top 5)
+
+| Rank | Configuration | Score | Factual | PPL |
+|:---:|:---|:---:|:---:|:---:|
+| 🥇 | swap(12↔13,16↔17) + interp(20↔22) + bridge(5→20) | **94.3** | 94% | 15.17 |
+| 🥈 | swap(12↔13,16↔17) + interp(20↔22) | 93.9 | 94% | 14.74 |
+| 🥉 | swap(12↔13) + interp(20↔22) + bridge(5→20) | 93.4 | 94% | 15.66 |
+| 4 | multi-swap(12↔13,16↔17) | 93.1 | 100% | 14.90 |
+| 5 | Phase 1 champion (del_10 + swap_11↔12) | 92.9 | 100% | 15.70 |
+
+## Credits
+
+- **Base model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) by the Qwen team at Alibaba
+- **Quantization**: llama.cpp
+- **Surgery**: g023
+
+## License
+
+Apache 2.0, the same license as the original Qwen3-1.7B model.