Initialize project; model provided by the ModelHub XC community

Model: g023/qwen3-tiny-v1 Source: Original Platform
2026-04-22 16:03:32 +08:00
commit 3e81c881df
7 changed files with 265 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,40 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+Qwen3-g023-tiny-v1-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-g023-tiny-v1-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-g023-tiny-v1-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-g023-tiny-v1-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-g023-tiny-v1-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
--- a/Qwen3-g023-tiny-v1-Q2_K.gguf
+++ b/Qwen3-g023-tiny-v1-Q2_K.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:203d50e94354d786d169ff8cf05cd18d8e8f3d2f335278f782eca51b47035d4f
+size 759345248
--- a/Qwen3-g023-tiny-v1-Q3_K_M.gguf
+++ b/Qwen3-g023-tiny-v1-Q3_K_M.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a98e1bf52b7ed8a5fee139586c91789d78456bb3baf0ce882a9eec51eb180063
+size 915386464
--- a/Qwen3-g023-tiny-v1-Q4_K_M.gguf
+++ b/Qwen3-g023-tiny-v1-Q4_K_M.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6b43839b47fb45ca0ee455009bded9d4574deeb64abd733df315f1e797da8005
+size 1075294304
--- a/Qwen3-g023-tiny-v1-Q6_K.gguf
+++ b/Qwen3-g023-tiny-v1-Q6_K.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f2ab3f70eae940ae96a9f72e71876ff59a7a5f95f733e0263c5499b235514c75
+size 1376448608
--- a/Qwen3-g023-tiny-v1-Q8_0.gguf
+++ b/Qwen3-g023-tiny-v1-Q8_0.gguf
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3aa85e70ecb8986b126239bc9c056054e6e612b2bf4fb60df8ef3744351db5ff
+size 1780930656
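The pointer files above record each GGUF's SHA-256 digest and byte size. A downloaded file can be checked against them with a short script like the following. It is a minimal sketch that uses only the standard library, and the local file path is an assumption about where the download was saved.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> tuple[str, int]:
    """Extract (sha256 hex digest, size in bytes) from a Git LFS pointer file."""
    # Each pointer line is "<key> <value>", e.g. "size 1780930656".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, oid = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return oid, int(fields["size"])


def verify_lfs_object(path: str, expected_oid: str, expected_size: int,
                      chunk_size: int = 1 << 20) -> bool:
    """Return True if the file matches the pointer's sha256 digest and size."""
    # Cheap check first: a size mismatch usually means an incomplete download.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so multi-GB GGUF files are never fully loaded.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Values copied from the Q8_0 pointer above.
Q8_0_OID = "3aa85e70ecb8986b126239bc9c056054e6e612b2bf4fb60df8ef3744351db5ff"
Q8_0_SIZE = 1780930656

if __name__ == "__main__":
    path = "Qwen3-g023-tiny-v1-Q8_0.gguf"  # assumed download location
    if os.path.exists(path):
        print("OK" if verify_lfs_object(path, Q8_0_OID, Q8_0_SIZE) else "MISMATCH")
```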
--- a/README.md
+++ b/README.md
@@ -0,0 +1,210 @@
+---
+license: apache-2.0
+language:
+  - en
+base_model: Qwen/Qwen3-1.7B
+tags:
+  - qwen3
+  - gguf
+  - layer-surgery
+  - small-language-model
+  - pruned
+  - optimized
+  - thinking
+  - text-generation
+model_name: Qwen3-g023-tiny-v1
+pipeline_tag: text-generation
+library_name: llama.cpp
+quantized_by: g023
+---
+
+# Qwen3-g023-tiny-v1 — GGUF
+
+**A surgically optimized 27-layer Qwen3 variant that outperforms the original 28-layer Qwen3-1.7B on the author's evaluation suite.**
+
+Created by deleting one layer that was found to hurt quality and swapping two adjacent layers to improve information flow. On the author's test suite it scores **92.9/100** with **100% factual accuracy** (17/17), a 5.1-point improvement over the original Qwen3-1.7B baseline (87.8/100).
+
+## Available Quantizations
+
+| Quantization | Bits/Weight (approx.) | Description | Download |
+|:---:|:---:|:---|:---:|
+| **Q8_0** | 8.50 | Highest quality, virtually lossless (recommended) | [Qwen3-g023-tiny-v1-Q8_0.gguf](./Qwen3-g023-tiny-v1-Q8_0.gguf) |
+| **Q6_K** | 6.57 | Excellent quality, good compression | [Qwen3-g023-tiny-v1-Q6_K.gguf](./Qwen3-g023-tiny-v1-Q6_K.gguf) |
+| **Q4_K_M** | 4.85 | Good balance of quality and size | [Qwen3-g023-tiny-v1-Q4_K_M.gguf](./Qwen3-g023-tiny-v1-Q4_K_M.gguf) |
+| **Q3_K_M** | 3.91 | High compression, moderate quality loss | [Qwen3-g023-tiny-v1-Q3_K_M.gguf](./Qwen3-g023-tiny-v1-Q3_K_M.gguf) |
+| **Q2_K** | 3.35 | Maximum compression, significant quality loss | [Qwen3-g023-tiny-v1-Q2_K.gguf](./Qwen3-g023-tiny-v1-Q2_K.gguf) |
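The Bits/Weight column gives typical figures for each llama.cpp quantization type. A rough per-file rate for this model can be estimated by dividing each file size from the LFS pointers by the ~1.67B parameter count. This sketch treats the whole file as weights, although a GGUF also holds metadata (tokenizer, chat template) and may store some tensors at other precisions, so the results are only approximate.

```python
# File sizes in bytes, copied from the Git LFS pointers in this commit.
GGUF_SIZES = {
    "Q2_K": 759_345_248,
    "Q3_K_M": 915_386_464,
    "Q4_K_M": 1_075_294_304,
    "Q6_K": 1_376_448_608,
    "Q8_0": 1_780_930_656,
}

# Approximate parameter count from the Model Details table (~1.67B).
# Assumption: an estimate, not an exact tensor count read from the GGUF.
N_PARAMS = 1.67e9


def effective_bpw(file_bytes: int, n_params: float = N_PARAMS) -> float:
    """Average bits per weight if the entire file were weight data."""
    return file_bytes * 8 / n_params


for name, size in GGUF_SIZES.items():
    print(f"{name:7s} {effective_bpw(size):.2f} bits/weight")
```

For Q8_0 this works out to roughly 8.5 bits per weight, in line with Q8_0's block layout of 32 int8 values plus one fp16 scale (34 bytes per 32 weights).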
+
+## Model Details
+
+| Parameter | Value |
+|:---|:---|
+| Architecture | Qwen3ForCausalLM |
+| Layers | **27** (28 original − 1 deleted) |
+| Hidden Size | 2,048 |
+| Intermediate Size | 6,144 |
+| Attention Heads | 16 query / 8 key-value (GQA) |
+| Head Dimension | 128 |
+| Vocabulary | 151,936 tokens |
+| Max Context | 40,960 tokens |
+| RoPE θ | 1,000,000 |
+| Tied Embeddings | Yes |
+| Total Parameters | **~1.67B** |
+| Precision (source) | bfloat16 |
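The ~1.67B total can be reproduced from the table. The sketch below assumes the standard Qwen3 decoder layout (bias-free q/k/v/o projections, a three-matrix SwiGLU MLP, RMSNorm weights including per-head q/k norms); those structural details are assumptions, not values read from the GGUF.

```python
from dataclasses import dataclass


@dataclass
class Qwen3Dims:
    # Defaults taken from the Model Details table.
    num_layers: int = 27
    hidden: int = 2048
    intermediate: int = 6144
    n_q_heads: int = 16
    n_kv_heads: int = 8
    head_dim: int = 128
    vocab: int = 151_936
    tied_embeddings: bool = True


def count_params(d: Qwen3Dims) -> int:
    q_out = d.n_q_heads * d.head_dim    # 2048
    kv_out = d.n_kv_heads * d.head_dim  # 1024: GQA uses fewer K/V heads
    # q, k, v, o projections (assumed bias-free)
    attn = d.hidden * q_out + 2 * d.hidden * kv_out + q_out * d.hidden
    mlp = 3 * d.hidden * d.intermediate  # gate, up, down (SwiGLU)
    norms = 2 * d.hidden + 2 * d.head_dim  # two layer norms + q_norm/k_norm
    per_layer = attn + mlp + norms
    embed = d.vocab * d.hidden
    lm_head = 0 if d.tied_embeddings else embed  # tied: output reuses embeddings
    final_norm = d.hidden
    return d.num_layers * per_layer + embed + lm_head + final_norm


v1 = count_params(Qwen3Dims())
base = count_params(Qwen3Dims(num_layers=28))
print(f"v1 (27L): {v1:,}  original (28L): {base:,}")
```

Under these assumptions the 27-layer model has 1,670,238,976 parameters, and each decoder layer accounts for about 50.3M of them.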
+
+## Surgery Operations
+
+This model was created by applying two surgical operations to [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B):
+
+1. **Delete layer 10**: layer 10 was identified as harmful to model quality. Removing it raised the overall score from the 87.8 baseline to 91.4.
+2. **Swap layers 11 ↔ 12** (post-deletion indices): swapping these two adjacent decoder layers, intended to improve information flow through the middle of the network, raised the score further to 92.9.
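The index bookkeeping behind these two operations can be sketched in plain Python. This assumes 0-based layer indices and is not the author's surgery framework. In Hugging Face transformers, Qwen3's decoder layers sit in `model.model.layers` (an `nn.ModuleList`); applying such an order there would presumably also require updating `config.num_hidden_layers` and each layer's `self_attn.layer_idx` (used by the KV cache), which is an assumption about library internals.

```python
def surgery_order(num_layers: int = 28, delete: int = 10,
                  swap: tuple[int, int] = (11, 12)) -> list[int]:
    """For each layer of the edited model, return the original layer index.

    Assumes 0-based indices; `swap` uses post-deletion indices, as in the text.
    """
    order = list(range(num_layers))
    del order[delete]                         # drop the layer judged harmful
    i, j = swap
    order[i], order[j] = order[j], order[i]   # exchange two adjacent layers
    return order


def apply_order(layers, order):
    """Rebuild a layer stack (e.g. a list of decoder-layer modules) in order."""
    return [layers[k] for k in order]


order = surgery_order()
print(len(order), order[8:15])  # 27 [8, 9, 11, 13, 12, 14, 15]
```

Note that post-deletion layers 11 and 12 correspond to original layers 12 and 13.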
+
+### Key Findings
+
+- **Smaller can be better**: In the Phase 1 search, the 27-layer model outperformed both the 28-layer original and the 29–30-layer expanded variants tested.
+- **Layer 10 is actively harmful**: Removing it alone yields a +3.6 point improvement.
+- **Operations compound selectively**: Deletion + swap works, but deletion + duplication degrades quality.
+
+## Benchmark Results
+
+| Metric | Original (28L) | **v1 (27L)** | Δ |
+|:---|:---:|:---:|:---:|
+| **Overall Score** | 87.8 / 100 | **92.9 / 100** | **+5.1** |
+| **Factual Accuracy** | 15 / 17 (88%) | **17 / 17 (100%)** | **+12 pts** |
+| Avg Perplexity | — | 15.70 | — |
+| Thinking Mode | ✅ | ✅ | — |
+| Non-Thinking Mode | ✅ | ✅ | — |
+
+Evaluated with the author's test suite: 17 factual questions, 2 completion-coherence tests, perplexity measurements, repetition analysis, and thinking/non-thinking mode checks.
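Perplexity is conventionally the exponential of the mean negative log-likelihood the model assigns to the actual next tokens. The suite's exact setup (texts, tokenization, how per-text values become "Avg Perplexity") is not described, so the averaging function below is an assumption.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp(mean negative log-likelihood) over natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one scored token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def average_perplexity(per_text_logprobs: list[list[float]]) -> float:
    """Unweighted mean of per-text perplexities (one possible definition)."""
    return sum(perplexity(lp) for lp in per_text_logprobs) / len(per_text_logprobs)


# A model that gives every correct token probability 1/4 has perplexity 4.
print(perplexity([math.log(0.25)] * 10))
```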
+
+## Features
+
+- **Thinking mode**: Full `<think>` / `</think>` reasoning support; toggle it with the chat template's `enable_thinking` parameter or with the `/think` and `/no_think` soft switches in a user message
+- **Non-thinking mode**: Direct responses without chain-of-thought overhead
+- **Tool calling**: Full function/tool calling support
+- **System prompts**: Standard system message support
+- **Chat template**: Qwen3 ChatML template embedded in the GGUF
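For raw-prompt tools (such as the llama-cli examples in the Usage section), the chat template can be approximated by hand. This simplified sketch renders ChatML turns and, when thinking is disabled, pre-fills an empty think block, which is what the official Qwen3 template emits for `enable_thinking=False`. It ignores tool calls and other template features, and the function name is hypothetical.

```python
def build_qwen3_prompt(messages: list[dict], enable_thinking: bool = True) -> str:
    """Render messages as a Qwen3-style ChatML prompt ending at the assistant turn."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")
    if not enable_thinking:
        # An empty think block tells the model to answer directly.
        parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)


print(build_qwen3_prompt(
    [{"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": "What is 2+2?"}],
    enable_thinking=False,
))
```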
+
+## Usage
+
+### With Ollama
+
+```bash
+# Download the GGUF, then write a Modelfile that points to it
+cat > Modelfile << 'EOF'
+FROM ./Qwen3-g023-tiny-v1-Q8_0.gguf
+
+PARAMETER temperature 1.0
+PARAMETER top_p 0.95
+PARAMETER top_k 45
+PARAMETER min_p 0.1
+PARAMETER num_ctx 40000
+PARAMETER mirostat 2
+PARAMETER mirostat_tau 5.0
+PARAMETER mirostat_eta 0.1
+PARAMETER repeat_last_n 16384
+PARAMETER repeat_penalty 1.1
+PARAMETER presence_penalty 0.5
+PARAMETER frequency_penalty 1.0
+
+TEMPLATE """{{- if .System }}
+<|im_start|>system
+{{ .System }}<|im_end|>
+{{ end }}
+{{- range .Messages }}
+{{- if eq .Role "user" }}
+<|im_start|>user
+{{ .Content }}<|im_end|>
+{{- else if eq .Role "assistant" }}
+<|im_start|>assistant
+{{ .Content }}<|im_end|>
+{{- end }}
+{{- end }}
+<|im_start|>assistant
+"""
+SYSTEM "You are a helpful assistant."
+EOF
+
+ollama create qwen3-tiny-v1 -f Modelfile
+ollama run qwen3-tiny-v1
+```
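Once `ollama create` has run, the model can also be queried from code through Ollama's local REST API. This standard-library sketch assumes the default server address (`http://localhost:11434`) and the model name `qwen3-tiny-v1` used above; it sends one non-streaming `/api/chat` request.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint


def make_chat_request(prompt: str, model: str = "qwen3-tiny-v1",
                      url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming /api/chat request for a single user message."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request and return the reply text (needs `ollama serve` running)."""
    with urllib.request.urlopen(make_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` returns the reply text, which in thinking mode may include the reasoning block.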
+
+### With llama.cpp
+
+```bash
+# Interactive chat
+llama-cli -m Qwen3-g023-tiny-v1-Q8_0.gguf \
+  --chat-template chatml -cnv
+
+# Thinking mode (raw prompt; -no-cnv runs it as a plain completion instead of a chat)
+llama-cli -m Qwen3-g023-tiny-v1-Q8_0.gguf -no-cnv \
+  -p "<|im_start|>user\nExplain quantum computing<|im_end|>\n<|im_start|>assistant\n<think>\n" \
+  -n 512
+
+# Non-thinking mode (raw prompt with the /no_think soft switch)
+llama-cli -m Qwen3-g023-tiny-v1-Q8_0.gguf -no-cnv \
+  -p "<|im_start|>user\n/no_think What is 2+2?<|im_end|>\n<|im_start|>assistant\n" \
+  -n 128
+```
+
+### With Python (llama-cpp-python)
+
+```python
+from llama_cpp import Llama
+
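+# Load the GGUF; n_ctx sets the context window (the model supports up to 40,960 tokens)
+model = Llama(model_path="Qwen3-g023-tiny-v1-Q8_0.gguf", n_ctx=4096)
+response = model.create_chat_completion(
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "What is the capital of France?"},
+    ],
+    temperature=0.6,
+)
+# In thinking mode the reply may start with a <think>...</think> reasoning block
+print(response["choices"][0]["message"]["content"])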
+```
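If the returned `content` includes the reasoning block, it can be separated from the final answer. The helper below is hypothetical (not part of llama-cpp-python) and assumes at most one `<think>...</think>` block per reply.

```python
import re

# One <think>...</think> block (non-greedy, spanning newlines) plus trailing whitespace.
THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Split a Qwen3 reply into (reasoning, answer); reasoning is "" if absent."""
    m = THINK_RE.search(text)
    if m is None:
        return "", text.strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return m.group(1).strip(), answer


# e.g. reasoning, answer = split_thinking(response["choices"][0]["message"]["content"])
```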
+
+## System Requirements
+
+| Quantization | RAM (CPU) | VRAM (GPU) |
+|:---:|:---:|:---:|
+| Q8_0 | ~2.0 GB | ~2.0 GB |
+| Q6_K | ~1.7 GB | ~1.7 GB |
+| Q4_K_M | ~1.3 GB | ~1.3 GB |
+| Q3_K_M | ~1.1 GB | ~1.1 GB |
+| Q2_K | ~0.9 GB | ~0.9 GB |
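The KV cache needs memory in addition to the weights and grows linearly with context length, which matters for settings such as `num_ctx 40000` in the Modelfile above. A back-of-the-envelope estimate from the Model Details table (27 layers, 8 KV heads, head dimension 128), assuming an f16 cache and ignoring compute buffers:

```python
def kv_cache_bytes(n_ctx: int, n_layers: int = 27, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """Bytes for the K and V caches of all layers at a given context length.

    bytes_per_elem=2 assumes an f16 cache; quantized caches would use less.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # 2 = K and V
    return per_token * n_ctx


for ctx in (4096, 40960):
    print(f"n_ctx={ctx}: {kv_cache_bytes(ctx) / 2**30:.2f} GiB")
```

This gives 108 KiB per token: about 0.42 GiB at 4,096 tokens and about 4.2 GiB at the full 40,960-token context.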
+
+## v1 vs v2
+
+This model (v1) is the **Phase 1 champion**: the top-scoring configuration from the first search phase, built with only two operations.
+
+| | v1 (this model) | [v2](https://huggingface.co/g023/Qwen3-g023-tiny-v2-GGUF) |
+|:---|:---:|:---:|
+| Layers | 27 | 30 |
+| Parameters | ~1.67B | ~1.82B |
+| Operations | del + swap | swap + interpolate + bridge |
+| Score | 92.9 / 100 | 94.3 / 100 |
+| Factual | 100% (17/17) | 94% (16/17) |
+| Perplexity | 15.70 | 15.17 |
+| Use Case | Max factual accuracy | Max overall score |
+
+**v1** is recommended when factual accuracy is paramount (100% vs 94%).
+**v2** is recommended when overall quality matters more (94.3 vs 92.9).
+
+## Methodology
+
+Layer surgery was guided by a systematic, evaluation-driven search:
+
+1. **Phase 1**: Broad search across 150+ configurations testing deletion, duplication, swapping, interpolation, and combined operations
+2. **Evaluation**: Each configuration was scored on factual accuracy (17 questions), completion coherence, perplexity, repetition ratio, and thinking mode functionality
+3. **Selection**: The champion was selected based on overall score, with factual accuracy as a tiebreaker
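The selection rule in step 3 amounts to ordering configurations by a (score, factual) key. A minimal sketch: the `Result` type is hypothetical, and the two tied entries are placeholder values rather than results from the search. Exact ties only occur if scores are compared at the same rounding.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    overall: float  # overall score out of 100
    factual: int    # factual questions answered correctly (out of 17)


def pick_champion(results: list[Result]) -> Result:
    """Highest overall score wins; factual accuracy breaks ties."""
    return max(results, key=lambda r: (r.overall, r.factual))


candidates = [Result("original (28L)", 87.8, 15), Result("v1 (27L)", 92.9, 17)]
print(pick_champion(candidates).name)  # v1 (27L)

# Placeholder values showing the tie-break, not real search results:
tied = [Result("cfg_a", 90.0, 15), Result("cfg_b", 90.0, 16)]
print(pick_champion(tied).name)  # cfg_b
```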
+
+The surgery framework is available in the [source repository](https://huggingface.co/g023/Qwen3-g023-tiny-v1-GGUF).
+
+## Credits
+
+- **Base model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) by the Qwen team at Alibaba
+- **Quantization**: llama.cpp
+- **Surgery**: g023
+
+## License
+
+Apache 2.0 — same as the original Qwen3-1.7B model.