初始化项目,由ModelHub XC社区提供模型

Model: rkumar70900/qwen2.5-1.5b-gguf-experiments
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-04-12 16:28:55 +08:00
commit 816607ef6d
15 changed files with 283 additions and 0 deletions

48
.gitattributes vendored Normal file
View File

@@ -0,0 +1,48 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
gguf/qwen2.5-1.5b-IQ2_S.gguf filter=lfs diff=lfs merge=lfs -text
gguf/qwen2.5-1.5b-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
gguf/qwen2.5-1.5b-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
gguf/qwen2.5-1.5b-f16.gguf filter=lfs diff=lfs merge=lfs -text
gguf/qwen2.5-1.5b-IQ1_M.gguf filter=lfs diff=lfs merge=lfs -text
gguf/qwen2.5-1.5b-IQ1_S.gguf filter=lfs diff=lfs merge=lfs -text
gguf/qwen2.5-1.5b-IQ2_XS.gguf filter=lfs diff=lfs merge=lfs -text
gguf/qwen2.5-1.5b-IQ2_XXS.gguf filter=lfs diff=lfs merge=lfs -text
gguf/qwen2.5-1.5b-IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
gguf/qwen2.5-1.5b-Q2_K_S.gguf filter=lfs diff=lfs merge=lfs -text
gguf/qwen2.5-1.5b-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
gguf/qwen2.5-1.5b-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
gguf/qwen2.5-1.5b-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text

196
README.md Normal file
View File

@@ -0,0 +1,196 @@
---
base_model: Qwen/Qwen2.5-1.5B-Instruct
language:
- en
license: apache-2.0
tags:
- llama.cpp
- gguf
- quantized
- qwen2.5
- text-generation
pipeline_tag: text-generation
---
# Qwen2.5-1.5B-Instruct — GGUF Quantization Experiments
This repo contains **Qwen2.5-1.5B-Instruct** quantized into multiple GGUF formats using [llama.cpp](https://github.com/ggerganov/llama.cpp). It was created as part of a hands-on quantization experiment documenting the full process from raw HuggingFace weights → multiple GGUF formats → quality evaluation.
---
## What's in This Repo
```
gguf/
├── qwen2.5-1.5b-f16.gguf ~2.9 GB source of truth — full precision
├── qwen2.5-1.5b-Q8_0.gguf ~1.6 GB near-lossless
├── qwen2.5-1.5b-Q5_K_M.gguf ~1.0 GB great quality/size tradeoff
├── qwen2.5-1.5b-Q4_K_M.gguf ~935 MB recommended — sweet spot ★
├── qwen2.5-1.5b-Q4_K_S.gguf ~865 MB leaner 4-bit variant
├── qwen2.5-1.5b-Q2_K.gguf ~530 MB aggressive K-quant baseline
├── qwen2.5-1.5b-Q2_K_S.gguf ~530 MB aggressive K-quant — needs imatrix
├── qwen2.5-1.5b-IQ3_M.gguf ~680 MB importance-weighted 3-bit
├── qwen2.5-1.5b-IQ2_XS.gguf ~480 MB importance-weighted 2-bit — needs imatrix
├── qwen2.5-1.5b-IQ2_XXS.gguf ~420 MB most aggressive — needs imatrix
├── qwen2.5-1.5b-IQ2_S.gguf ~450 MB importance-weighted 2.5-bit — needs imatrix
├── qwen2.5-1.5b-IQ1_M.gguf ~300 MB extreme 1.75-bit — needs imatrix
└── qwen2.5-1.5b-IQ1_S.gguf ~280 MB extreme 1.56-bit — needs imatrix
```
> **Note on f16:** The F16 file is included as the reference baseline for perplexity comparisons. It is not intended for general inference use — at 2.9 GB it offers no practical advantage over Q8_0 for local deployment.
---
## Which File Should I Use?
| Use Case | Recommended Format |
|---|---|
| Best quality, VRAM not a concern | `Q8_0` |
| Daily driver — best quality/size tradeoff | `Q4_K_M` ← start here |
| Tight on memory, want decent quality | `Q4_K_S` or `Q2_K` |
| Edge deployment / very limited RAM | `IQ2_XS` or `IQ2_XXS` |
| Research / extreme compression testing | `IQ1_M` or `IQ1_S` |
| Partial GPU offload (CPU + GPU split) | `Q4_K_M` or `IQ3_M` |
If you're not sure, **start with Q4_K_M**. It's the most tested format in the community and gives you ~68% size reduction with minimal quality loss.
> ⚠️ **IQ1 and IQ2 formats** (`IQ1_S`, `IQ1_M`, `IQ2_S`, `IQ2_XS`, `IQ2_XXS`, `Q2_K_S`) were all generated **with an importance matrix**. Without one, these formats produce significantly degraded output. See the Imatrix Calibration section below for details.
---
## Format Guide
### K-Quant Family (Q\*\_K\_\*)
Standard llama.cpp quantization using superblocks of 256 weights. The suffix means:
- `_S` (Small) — more aggressive, smaller file
- `_M` (Medium) — mixed-precision, smarter assignment of bits to sensitive layers
Despite the "4" in Q4_K_M, it is **not** uniform 4-bit. Critical tensors like the embedding table and output projection are bumped to 6-bit internally. The "4" is the average bits-per-weight.
### IQ Family (IQ\*\_\*)
Importance-weighted quantization. These formats use an **importance matrix** — calibration data was run through the base model to identify which weights matter most, and precision was distributed accordingly. This is why IQ formats punch above their weight class at the same file size compared to K-quants.
The IQ2 files in this repo were generated with a WikiText-2 calibration dataset (see below). Without an importance matrix, these formats produce near-incoherent output — the imatrix is what makes them viable.
---
## Quantization Details
**Base model:** `Qwen/Qwen2.5-1.5B-Instruct`
**Quantization tool:** llama.cpp build `7074` (commit `22e1ce2f8`)
**Source precision:** F16 GGUF (converted from original SafeTensors)
**Platform:** Apple Silicon (arm64)
### Imatrix Calibration
The IQ2 formats were quantized using an importance matrix generated from WikiText-2:
```python
from datasets import load_dataset
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
with open("calibration.txt", "w") as f:
for row in dataset:
text = row["text"].strip()
if len(text) > 100:
f.write(text + "\n")
```
```bash
./build/bin/llama-imatrix \
-m qwen2.5-1.5b-f16.gguf \
-f calibration.txt \
-o imatrix.dat \
--ctx-size 512 \
-ngl -1 \
--chunks 100
```
---
## How to Run
### llama.cpp CLI
```bash
./build/bin/llama-cli \
-m qwen2.5-1.5b-Q4_K_M.gguf \
-n 512 \
-ngl 99 \
--prompt "Explain the difference between supervised and unsupervised learning."
```
### llama.cpp Server (OpenAI-compatible)
```bash
./build/bin/llama-server \
-m qwen2.5-1.5b-Q4_K_M.gguf \
-ngl 99 \
--port 8080
```
Then hit `http://localhost:8080/v1/chat/completions` like any OpenAI endpoint.
### Python (llama-cpp-python)
```python
from llama_cpp import Llama
llm = Llama(
model_path="qwen2.5-1.5b-Q4_K_M.gguf",
n_gpu_layers=-1, # full GPU offload
n_ctx=4096,
)
output = llm(
"Explain quantization in simple terms:",
max_tokens=256,
temperature=0.7,
)
print(output["choices"][0]["text"])
```
### Ollama
```bash
ollama run hf.co/your-username/qwen2.5-1.5b-gguf-experiments:Q4_K_M
```
---
## Model Architecture (from metadata)
| Parameter | Value |
|---|---|
| Architecture | Qwen2 |
| Parameters | 1.5B |
| Layers | 28 |
| Hidden dimension | 1536 |
| FFN intermediate | 8960 |
| Attention heads (Q) | 12 |
| Attention heads (KV) | 2 |
| Attention type | Grouped Query Attention (GQA) |
| Context length | 32768 |
| Vocabulary size | 151,936 |
| Tokenizer | GPT-2 BPE (Qwen2 variant) |
---
## License
The quantized weights in this repo are derived from `Qwen/Qwen2.5-1.5B-Instruct` and inherit its [Apache 2.0 license](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct/blob/main/LICENSE).
---
## Citation
If you use these files in your work, please also cite the original Qwen2.5 model:
```bibtex
@misc{qwen2.5,
title = {Qwen2.5: A Party of Foundation Models},
author = {Qwen Team},
year = {2024},
url = {https://qwenlm.github.io/blog/qwen2.5/}
}
```

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:91a6211c48cb4f71da8504a7e84cbfab6b243f896262176b10e7a1b3c8098f4f
size 464461376

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6aed508b950625563d3c60fec27be69d6682674f078cc8f662b0bcd7d23e06d4
size 436527680

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:76349b394231f060cc57f22c22ab4f4f79ec14a127c0187d430e93f7d99ab3e8
size 563809856

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:05e1ea8dda8618abd3cf729daaeae85b5145fc5832f6c5563349f0d49874ef5f
size 550326848

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fc1d90304128306d5b3532258b1789559b9e43bd77baaeaee607cf600de4281b
size 511017536

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dd1415588a665b7560f6bf9478b395120cd256d8ffe9e2a276193fddad5ac06a
size 776663904

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:07cf94d5da6097fdb28433d243b27ce44071eb5a801d7aefc3141507f194529f
size 676304736

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c327796ecd1f116d03cfba869b865caefa1aeaceb3df4eedf649b5535cb7b00
size 640135232

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:70f0037f9efe3359e90dd6e7ad1baeb661930cb67c440085c7d676f41a8518dc
size 986048352

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5b7032d46c5cf4515cabf1f0782c44f9006e9db9358ffa70ed1c87a3e8526386
size 940312416

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:da73299845cf2deed746a7c20796fc27a43798277692f5813b73f4306b987e13
size 1125050208

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0205113d09062f92f208d13b6ac221bf7af70d5fd4fc09c5ebd1127eeadd4989
size 1646572896

View File

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:81dd18066baff3d98cf5933a3c81ec7d015ecde16e3bdff5e1f004e474ccc05a
size 3093669216