Initialize project; model provided by the ModelHub XC community
Model: g023/qwen3-tiny-v2 Source: Original Platform
40
.gitattributes
vendored
Normal file
@@ -0,0 +1,40 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Qwen3-g023-tiny-v2-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-g023-tiny-v2-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-g023-tiny-v2-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-g023-tiny-v2-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-g023-tiny-v2-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
3
Qwen3-g023-tiny-v2-Q2_K.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:93c5f34612e203562c15ed55059ef9b91d0e4ebe9747d79ea2478cf479876c5c
size 814695424
3
Qwen3-g023-tiny-v2-Q3_K_M.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8e18787fb7ed95c5201f1c42709d1d1c1a6bbc540c3a709750cf44bed305d75e
size 987841536
3
Qwen3-g023-tiny-v2-Q4_K_M.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b225c0fd5eae542b24e60c742f106c0ee7353df8f85ef0d06d8b22282db5ccda
size 1164067840
3
Qwen3-g023-tiny-v2-Q6_K.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e1d73d2e9396ce9a7a85adddcf63861be3df4aa382a7aae2d46064c834259d24
size 1500365824
3
Qwen3-g023-tiny-v2-Q8_0.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2fb5fb8b5b6a3d9308dc522c4b8fa8e80fa861212f9d543cf929ba0fbb6feb18
size 1941416960
220
README.md
Normal file
@@ -0,0 +1,220 @@
---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen3-1.7B
tags:
- qwen3
- gguf
- layer-surgery
- small-language-model
- optimized
- thinking
- text-generation
- skip-connections
- interpolation
model_name: Qwen3-g023-tiny-v2
pipeline_tag: text-generation
library_name: llama.cpp
quantized_by: g023
---

# Qwen3-g023-tiny-v2 (GGUF)

**An advanced 30-layer Qwen3 variant built with swap, interpolation, and skip-bridge surgery.**

Created through layer surgery on Qwen3-1.7B that combines multi-swap, interpolation, and bridge (skip connection) techniques. It scores **94.3/100**, a 6.5-point improvement over the original Qwen3-1.7B baseline (87.8/100) and the highest score achieved in two phases of experimentation across ~250 configurations. These scores come from my own benchmark suite, so results may vary if you run your own tests.

## Available Quantizations

| Quantization | Bits/Weight | Description | Download |
|:---:|:---:|:---|:---:|
| **Q8_0** | 8.50 | Highest quality, virtually lossless (USE THIS ONE) | [Qwen3-g023-tiny-v2-Q8_0.gguf](./Qwen3-g023-tiny-v2-Q8_0.gguf) |
| **Q6_K** | 6.57 | Excellent quality, good compression | [Qwen3-g023-tiny-v2-Q6_K.gguf](./Qwen3-g023-tiny-v2-Q6_K.gguf) |
| **Q4_K_M** | 4.85 | Good balance of quality and size | [Qwen3-g023-tiny-v2-Q4_K_M.gguf](./Qwen3-g023-tiny-v2-Q4_K_M.gguf) |
| **Q3_K_M** | 3.91 | High compression, moderate quality loss | [Qwen3-g023-tiny-v2-Q3_K_M.gguf](./Qwen3-g023-tiny-v2-Q3_K_M.gguf) |
| **Q2_K** | 3.35 | Maximum compression, significant quality loss | [Qwen3-g023-tiny-v2-Q2_K.gguf](./Qwen3-g023-tiny-v2-Q2_K.gguf) |

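As a rough cross-check on the table, the effective bits per weight can be estimated from the file sizes recorded in this commit's Git LFS pointers, divided by the ~1.82B parameter count listed under Model Details. This is only a sketch: each file also carries metadata and some tensors stored at a different precision, so the results are expected to sit somewhat above the nominal figures. The helper name `effective_bpw` is made up for illustration.

```python
# File sizes in bytes, copied from the Git LFS pointer files in this commit.
SIZES = {
    "Q8_0": 1941416960,
    "Q6_K": 1500365824,
    "Q4_K_M": 1164067840,
    "Q3_K_M": 987841536,
    "Q2_K": 814695424,
}
N_PARAMS = 1.82e9  # approximate total from the Model Details table


def effective_bpw(size_bytes, n_params=N_PARAMS):
    """Whole-file bits divided by parameter count (includes metadata overhead)."""
    return size_bytes * 8 / n_params


for name, size in SIZES.items():
    print(f"{name:7s} {size / 2**30:5.2f} GiB  ~{effective_bpw(size):.2f} bits/weight")
```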
## Model Details

| Parameter | Value |
|:---|:---|
| Architecture | Qwen3ForCausalLM |
| Layers | **30** (28 original + 2 from surgery) |
| Hidden Size | 2,048 |
| Intermediate Size | 6,144 |
| Attention Heads | 16 query / 8 key-value (GQA) |
| Head Dimension | 128 |
| Vocabulary | 151,936 tokens |
| Max Context | 40,960 tokens |
| RoPE θ | 1,000,000 |
| Tied Embeddings | Yes |
| Total Parameters | **~1.82B** |
| Precision (source) | bfloat16 |

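The parameter total can be reproduced from the other rows of the table. The sketch below assumes the usual Qwen3 decoder layout (no bias terms, a gated MLP with three projections, two RMSNorms per layer plus per-head query/key norms, and a final norm); the function name `qwen3_param_count` is hypothetical.

```python
def qwen3_param_count(n_layers, hidden=2048, intermediate=6144,
                      n_heads=16, n_kv_heads=8, head_dim=128, vocab=151_936):
    """Estimate parameters for a Qwen3-style decoder with tied embeddings.

    Assumes no bias terms and per-head RMSNorm on queries and keys.
    """
    attn = (hidden * n_heads * head_dim            # q_proj
            + 2 * hidden * n_kv_heads * head_dim   # k_proj and v_proj
            + n_heads * head_dim * hidden)         # o_proj
    mlp = 3 * hidden * intermediate                # gate, up, and down projections
    norms = 2 * hidden + 2 * head_dim              # two layer norms + q/k norms
    embed = vocab * hidden                         # shared with the output head
    return n_layers * (attn + mlp + norms) + embed + hidden  # + final norm


for n in (27, 28, 30):
    print(n, f"{qwen3_param_count(n) / 1e9:.2f}B")
```

Under these assumptions the 27-, 28-, and 30-layer stacks come out at roughly 1.67B, 1.72B, and 1.82B parameters, which matches the figures quoted for v1, the base model, and v2.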
## Surgery Operations

This model was created by applying three surgical operations to [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B):

1. **Multi-swap: layers 12↔13 and 16↔17.** Reorders blocks at two critical points in the network for improved representational flow through the mid-layers.
2. **Interpolation: layers 20 & 22 (α=0.5).** Creates a new layer by blending the weights of layers 20 and 22 in equal proportions, producing a smoother transition in the upper layers.
3. **Bridge (skip connection): layer 5 → after layer 20.** Copies an early layer (layer 5) and inserts the copy after layer 20, creating a skip-connection-like bridge that helps preserve low-level features deep in the network.

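The three operations can be sketched on a toy layer stack. This is an illustration only, not the script used to build the model: it assumes 0-based layer indices, assumes both new layers are placed directly after layer 20 (the README does not say where the interpolated layer goes), and represents each layer as a small dictionary of plain Python lists rather than real tensors. The names `interpolate` and `apply_surgery` are hypothetical.

```python
from copy import deepcopy


def interpolate(wa, wb, alpha=0.5):
    """Blend two layers' weights element-wise: alpha * a + (1 - alpha) * b."""
    return {name: [alpha * a + (1 - alpha) * b for a, b in zip(wa[name], wb[name])]
            for name in wa}


def apply_surgery(layers):
    """layers: list of (label, weights) tuples, indexed from 0 (an assumption)."""
    orig = layers                      # read-only reference to the base stack
    out = deepcopy(layers)
    for i, j in [(12, 13), (16, 17)]:  # multi-swap
        out[i], out[j] = out[j], out[i]
    blend = ("interp(20,22)", interpolate(orig[20][1], orig[22][1], alpha=0.5))
    bridge = ("copy(5)", deepcopy(orig[5][1]))
    # Placement is assumed: both new layers go directly after layer 20.
    out[21:21] = [bridge, blend]
    return out


# Toy 28-layer stack: each "layer" holds one tiny weight vector.
base = [(str(i), {"w": [float(i), 2.0 * i]}) for i in range(28)]
result = apply_surgery(base)
print(len(result))
print([label for label, _ in result][10:25])
```

The stack grows from 28 to 30 entries, consistent with the "28 original + 2 from surgery" layer count in Model Details.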
### Why These Operations Work

- **Multi-swap** corrects suboptimal layer ordering that emerged from pre-training, allowing better gradient flow through the network's critical middle section.
- **Interpolation** creates a synthetic transition layer that smooths the representation gap between layers 20 and 22, reducing the information bottleneck.
- **Bridge/skip connections** address the "forgetting problem" in deep networks by reintroducing early feature representations at later stages. The technique is inspired by ResNet's residual connections but applied at the transformer layer level.

## Benchmark Results

| Metric | Original (28L) | [v1 (27L)](https://huggingface.co/g023/Qwen3-g023-tiny-v1-GGUF) | **v2 (30L)** | Δ vs Original |
|:---|:---:|:---:|:---:|:---:|
| **Overall Score** | 87.8 / 100 | 92.9 / 100 | **94.3 / 100** | **+6.5** |
| **Factual Accuracy** | 15/17 (88%) | 17/17 (100%) | **16/17 (94%)** | **+6 pp** |
| Avg Perplexity | n/a | 15.70 | **15.17** | n/a |
| Thinking Mode | ✅ | ✅ | ✅ | n/a |
| Non-Thinking Mode | ✅ | ✅ | ✅ | n/a |

Evaluated using a test suite with 17 factual questions, 2 completion coherence tests, perplexity measurements, repetition analysis, and thinking/non-thinking mode verification.

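The README does not say how the perplexity figures were computed. For orientation, the conventional definition is the exponential of the mean negative log-likelihood per token, which can be sketched as below. Both function names are hypothetical, and averaging per-text perplexities is only one possible reading of "Avg Perplexity".

```python
import math


def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def average_perplexity(per_text_logprobs):
    """Unweighted mean of per-text perplexities (an assumed aggregation)."""
    ppls = [perplexity(lp) for lp in per_text_logprobs]
    return sum(ppls) / len(ppls)
```

A model that assigns probability 0.25 to every token has a perplexity of 4 under this definition, so lower values mean the model finds the text less surprising.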
## Features

- **Thinking mode**: Full `<think>` / `</think>` reasoning support, toggled via the `enable_thinking` parameter
- **Non-thinking mode**: Direct responses without chain-of-thought overhead
- **Tool calling**: Full function/tool calling support
- **System prompts**: Standard system message support
- **Chat template**: Qwen3 ChatML template embedded in the GGUF

## Usage

### With Ollama

```bash
# Download the GGUF and create from Modelfile
cat > Modelfile << 'EOF'
FROM ./Qwen3-g023-tiny-v2-Q8_0.gguf

PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER top_k 45
PARAMETER min_p 0.1
PARAMETER num_ctx 40000
PARAMETER mirostat 2
PARAMETER mirostat_tau 5.0
PARAMETER mirostat_eta 0.1
PARAMETER repeat_last_n 16384
PARAMETER repeat_penalty 1.1
PARAMETER presence_penalty 0.5
PARAMETER frequency_penalty 1.0

TEMPLATE """{{- if .System }}
<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}
{{- range .Messages }}
{{- if eq .Role "user" }}
<|im_start|>user
{{ .Content }}<|im_end|>
{{- else if eq .Role "assistant" }}
<|im_start|>assistant
{{ .Content }}<|im_end|>
{{- end }}
{{- end }}
<|im_start|>assistant
"""
SYSTEM "You are a helpful assistant."
EOF

ollama create qwen3-tiny-v2 -f Modelfile
ollama run qwen3-tiny-v2
```

### With llama.cpp

```bash
# Interactive chat
llama-cli -m Qwen3-g023-tiny-v2-Q8_0.gguf \
  --chat-template chatml -cnv

# Thinking mode
llama-cli -m Qwen3-g023-tiny-v2-Q8_0.gguf \
  -p "<|im_start|>user\nExplain quantum computing<|im_end|>\n<|im_start|>assistant\n<think>\n" \
  -n 512

# Non-thinking mode
llama-cli -m Qwen3-g023-tiny-v2-Q8_0.gguf \
  -p "<|im_start|>user\n/no_think What is 2+2?<|im_end|>\n<|im_start|>assistant\n" \
  -n 128
```

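The two raw prompts above differ only in the `/no_think` prefix and the pre-opened `<think>` tag. A small helper can assemble them instead of hand-writing the ChatML markers. This is a minimal sketch; the function name `build_chatml_prompt` and its parameters are made up for illustration and are not part of llama.cpp.

```python
def build_chatml_prompt(user_text, thinking=True, system=None):
    """Assemble a raw ChatML prompt like the ones passed to llama-cli above."""
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    if not thinking:
        user_text = f"/no_think {user_text}"   # soft switch used in the example
    parts.append(f"<|im_start|>user\n{user_text}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    if thinking:
        parts.append("<think>\n")              # pre-open the reasoning block
    return "".join(parts)


print(build_chatml_prompt("What is 2+2?", thinking=False))
```

The returned string contains real newlines, so it can be passed to a completion API directly without relying on escape processing.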
### With Python (llama-cpp-python)

```python
from llama_cpp import Llama

model = Llama("Qwen3-g023-tiny-v2-Q8_0.gguf", n_ctx=4096)
response = model.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.6,
)
print(response["choices"][0]["message"]["content"])
```

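When thinking mode is active, the returned text may contain a `<think>` ... `</think>` block ahead of the final answer. The helper below is one way to separate the two. It is a sketch that assumes at most one well-formed block per response, and `split_thinking` is a hypothetical name rather than part of llama-cpp-python.

```python
import re

_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (reasoning, answer); reasoning is '' when no <think> block exists."""
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Drop the whole block and keep whatever surrounds it as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_thinking("<think>\n2 + 2 = 4\n</think>\n\nThe answer is 4.")
print(answer)
```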
## System Requirements

| Quantization | RAM (CPU) | VRAM (GPU) |
|:---:|:---:|:---:|
| Q8_0 | ~2.2 GB | ~2.2 GB |
| Q6_K | ~1.8 GB | ~1.8 GB |
| Q4_K_M | ~1.4 GB | ~1.4 GB |
| Q3_K_M | ~1.2 GB | ~1.2 GB |
| Q2_K | ~1.0 GB | ~1.0 GB |

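The figures above are close to the GGUF file sizes, so they presumably cover the weights plus a small overhead. Memory for the KV cache comes on top and grows linearly with the context length. A rough estimate, assuming an unquantized f16 cache (2 bytes per element) and the layer and head counts from Model Details, can be sketched as follows; `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(n_ctx, n_layers=30, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Keys plus values for every layer and position; 2 bytes assumes an f16 cache."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_ctx


for n_ctx in (4096, 40000):
    print(n_ctx, f"{kv_cache_bytes(n_ctx) / 2**30:.2f} GiB")
```

Under these assumptions the cache needs about 0.47 GiB at `n_ctx=4096` and about 4.58 GiB at the `num_ctx 40000` setting used in the Ollama Modelfile, so long contexts can require considerably more memory than the table suggests.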
## v1 vs v2

This model (v2) is the **Phase 2 champion**, using advanced multi-operation surgery for the highest overall score.

| | [v1](https://huggingface.co/g023/Qwen3-g023-tiny-v1-GGUF) | v2 (this model) |
|:---|:---:|:---:|
| Layers | 27 | 30 |
| Parameters | ~1.67B | ~1.82B |
| Operations | del + swap | swap + interpolate + bridge |
| Score | 92.9 / 100 | 94.3 / 100 |
| Factual | 100% (17/17) | 94% (16/17) |
| Perplexity | 15.70 | 15.17 |
| Use Case | Max factual accuracy | Max overall score |

**v1** is recommended when factual accuracy is paramount (100% vs 94%).
**v2** is recommended when overall quality matters more (94.3 vs 92.9).

## Methodology

Layer surgery was performed through a systematic, test-driven process across two phases:

1. **Phase 1** (~150 configs): Exhaustive search across deletion, duplication, swapping, interpolation, and combined operations → champion: del_10 + swap_11↔12 (v1)
2. **Phase 2** (~95 configs): Advanced techniques including tripling, multi-swap, layer reversal, cycling, weight scaling, layer merging, skip bridges, and synthesis → champion: this model (v2)
3. **Evaluation**: Each configuration scored on factual accuracy (17 questions), completion coherence, perplexity, repetition ratio, and thinking mode functionality

### Phase 2 Leaderboard (Top 5)

| Rank | Configuration | Score | Factual | PPL |
|:---:|:---|:---:|:---:|:---:|
| 🥇 | swap(12↔13,16↔17) + interp(20↔22) + bridge(5→20) | **94.3** | 94% | 15.17 |
| 🥈 | swap(12↔13,16↔17) + interp(20↔22) | 93.9 | 94% | 14.74 |
| 🥉 | swap(12↔13) + interp(20↔22) + bridge(5→20) | 93.4 | 94% | 15.66 |
| 4 | multi-swap(12↔13,16↔17) | 93.1 | 100% | 14.90 |
| 5 | Phase 1 champion (del_10 + swap_11↔12) | 92.9 | 100% | 15.70 |

## Credits

- **Base model**: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) by the Qwen team at Alibaba
- **Quantization**: llama.cpp
- **Surgery**: g023

## License

Apache 2.0, same as the original Qwen3-1.7B model.