Initialize project; model provided by the ModelHub XC community

Model: bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF
Source: Original Platform
ModelHub XC
2026-06-18 01:32:12 +08:00
commit f79728d4e5
28 changed files with 306 additions and 0 deletions

.gitattributes (new file, 60 lines)

@@ -0,0 +1,60 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q6_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q5_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q4_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q4_0_8_8.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q4_0_4_8.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q4_0_4_4.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q3_K_XL.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-IQ3_XS.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q2_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-IQ2_M.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT-f16.gguf filter=lfs diff=lfs merge=lfs -text
Llama-3.1-Tulu-3-8B-SFT.imatrix filter=lfs diff=lfs merge=lfs -text
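To check which files these rules route through Git LFS, here is a rough Python sketch. It only approximates Git's matching. It assumes a pattern without a slash matches the file name at any depth, and it uses `fnmatch` in place of Git's own wildmatch, so `**` is handled loosely. The helpers `lfs_patterns` and `is_lfs_tracked` are hypothetical names and are not part of any tool. Running `git check-attr filter -- <path>` gives Git's own answer. Note that `*.gguf` is not among the generic patterns, so a GGUF file is tracked only because its exact name is listed.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def lfs_patterns(gitattributes_text: str) -> list[str]:
    """Return the patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


def is_lfs_tracked(path: str, patterns: list[str]) -> bool:
    """Approximate gitattributes matching (simplified, not Git's wildmatch)."""
    name = PurePosixPath(path).name
    for pattern in patterns:
        if "/" in pattern:
            # Patterns containing a slash are matched against the whole path;
            # fnmatch's "*" also crosses "/", so "**" is only approximated.
            if fnmatchcase(path, pattern):
                return True
        elif fnmatchcase(name, pattern):
            # Patterns without a slash match the file name at any depth.
            return True
    return False
```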

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:771c6f9fdce545499bcd6d67c6b6ef51d6636180d73539a805f860e4e8cf359e
size 2948318976

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2dd101b00f1d1fae64495daea5d45209cebd746465615342c15042255dbd200e
size 3784865792

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:140ffc0a652b1b6557320cf1af2bd077be3bc88fd95d9653038badc72e35929e
size 3518789632

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ecb534baa4b7f1e903bb2dbb066b4acc362e806ed8016acb5607bb5eec94bb7e
size 4447708416

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e851ef7a4b93e55c9006ad6efbf9d034907065b75267b5eae75af312462188d7
size 3179170560

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:519cfe3ac487ca534cec77d31fd0fde919ddb7ab1ae8d0831c79a09521268a5b
size 3692226560

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:62532a2a77e831f600f384a7cf7d28951e6f97054739b63a11f3804dfbfbc257
size 4321998848

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:162a01f945a5f6e5cae734ec34ec87ddcdc0d28bca8d1ec4ced314a17d429e14
size 4018960384

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a1d115efa4e8d2ad1fe73cbbd97fce7c9e0a8403b4860d245169392eb4d599f7
size 3664541696

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:93647acb35fe15f391e158a4d64ebe0b46becbb6154bd8687dd4f1f48570f4c8
size 4781697024

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:23546267a3e092d4eb0e004aa02a4f0bcbc6a684638c0ead7ca098c57f300b0a
size 4675938560

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:148f0ef367529a06c43f699bb229dea23e63ef283946e787a52a26e658aa01af
size 4661258496

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0f650874e3fe9d6b18cc5d3c75512dca6e90e946f091c62c1239068923d5f95d
size 4661258496

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6dbf4a7863f46fe88090f58ad3708777081936f3138f0edddfcdde26b7fb4ee3
size 4661258496

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c056efd8eeb4a0f6e85e53c6ec87ecd5cc136061cc68c16fcb1d8bd513bcb854
size 5310703616

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3fad2c96aa9b9de19c2cda0f88a381c47ac768ca03a95059d9f6c439791f8592
size 4920781056

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cffcebdf2ce4bc8d4910b952a2ee6762fdc4b4fe45058888ffb9d357884da26a
size 4692715776

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:abbb4e30e006b140b08d3decc036086fb63b90bea89de46b8d0b248a4c56ad07
size 6057289728

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ae7df7243fada1533e6d6211c1bff19b55e38715f520b293d56cd470252c8730
size 5733038336

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4dcc7f0788b1ca0093426190b597567c3d4e9974032347e9d836ceab8cae6680
size 5599344896

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:19d51f86954fd1f5a5962f1da83109f83671797c865020b9d0e61175c6796526
size 6596061696

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:db391b18b036dee4661c7a83711b98912956f7eb60b299a562bf0d0586dfe4ac
size 6850537472

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9de7a9954d8a787a1e0287e1db66aae80b5a6d32dab6f207e9f8eba25c306a58
size 8540841984

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f512f867a9af8caf9996ee2d3d7bc08802110ac3c3f60df0020fd56b2e26e94d
size 16069023456

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6a28807a4912ffe5ff16c8085d1091a49211b441a18b1c337f0d79a7e6e7279c
size 4988170
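Each of the 25 blobs above is a Git LFS pointer, not the weights themselves. A pointer holds a `version` line, an `oid` (a SHA-256 digest), and the object's `size` in bytes. The sizes match the README's File Size column in decimal gigabytes. For example, 16069023456 bytes is the 16.07GB f16 file. Below is a minimal Python sketch for reading a pointer and checking a downloaded file against it. The function names are hypothetical, and the sketch assumes SHA-256 object IDs, as in every pointer here.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line has the form "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def size_gb(size_bytes: int) -> float:
    """Decimal gigabytes (10**9 bytes), rounded to two places like the README table."""
    return round(size_bytes / 10**9, 2)


def verify_download(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Compare a downloaded file's size and SHA-256 digest with its pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first: a truncated download fails here
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 1 MiB chunks so multi-GB GGUF files are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["digest"]
```

The size comparison runs before hashing because hashing a multi-gigabyte file takes a while, and an incomplete download is the most common failure.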

README.md (new file, 170 lines)

@@ -0,0 +1,170 @@
---
quantized_by: bartowski
pipeline_tag: text-generation
datasets:
- allenai/tulu-3-sft-mixture
base_model: allenai/Llama-3.1-Tulu-3-8B-SFT
license: llama3.1
language:
- en
---
## Llamacpp imatrix Quantizations of Llama-3.1-Tulu-3-8B-SFT
Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b4132">b4132</a> for quantization.
Original model: https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-SFT
All quants were made using the imatrix option with the dataset from [here](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8).
Run them in [LM Studio](https://lmstudio.ai/)
## Prompt format
```
<|system|>
{system_prompt}
<|user|>
{prompt}
<|assistant|>
```
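Here is a minimal Python sketch that fills this template for a single turn. The helper name `build_tulu_prompt` is hypothetical. Two details are assumptions, since the template does not spell them out: the system block is treated as optional, and the prompt ends with a newline after `<|assistant|>` so that generation starts on a fresh line.

```python
from typing import Optional


def build_tulu_prompt(prompt: str, system_prompt: Optional[str] = None) -> str:
    """Fill the single-turn prompt template shown above."""
    parts = []
    if system_prompt is not None:
        # Assumption: omit the system block entirely when no system prompt is given.
        parts += ["<|system|>", system_prompt]
    parts += ["<|user|>", prompt, "<|assistant|>"]
    # Each tag and each message sits on its own line, as in the template.
    return "\n".join(parts) + "\n"
```

For example, `build_tulu_prompt("Hi", "Be brief.")` produces the template with `{system_prompt}` and `{prompt}` substituted, ready to pass as a raw prompt to a GGUF runner.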
## Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| [Llama-3.1-Tulu-3-8B-SFT-f16.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-f16.gguf) | f16 | 16.07GB | false | Full F16 weights. |
| [Llama-3.1-Tulu-3-8B-SFT-Q8_0.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q8_0.gguf) | Q8_0 | 8.54GB | false | Extremely high quality, generally unneeded but max available quant. |
| [Llama-3.1-Tulu-3-8B-SFT-Q6_K_L.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q6_K_L.gguf) | Q6_K_L | 6.85GB | false | Uses Q8_0 for embed and output weights. Very high quality, near perfect, *recommended*. |
| [Llama-3.1-Tulu-3-8B-SFT-Q6_K.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q6_K.gguf) | Q6_K | 6.60GB | false | Very high quality, near perfect, *recommended*. |
| [Llama-3.1-Tulu-3-8B-SFT-Q5_K_L.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q5_K_L.gguf) | Q5_K_L | 6.06GB | false | Uses Q8_0 for embed and output weights. High quality, *recommended*. |
| [Llama-3.1-Tulu-3-8B-SFT-Q5_K_M.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q5_K_M.gguf) | Q5_K_M | 5.73GB | false | High quality, *recommended*. |
| [Llama-3.1-Tulu-3-8B-SFT-Q5_K_S.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q5_K_S.gguf) | Q5_K_S | 5.60GB | false | High quality, *recommended*. |
| [Llama-3.1-Tulu-3-8B-SFT-Q4_K_L.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q4_K_L.gguf) | Q4_K_L | 5.31GB | false | Uses Q8_0 for embed and output weights. Good quality, *recommended*. |
| [Llama-3.1-Tulu-3-8B-SFT-Q4_K_M.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q4_K_M.gguf) | Q4_K_M | 4.92GB | false | Good quality, default size for most use cases, *recommended*. |
| [Llama-3.1-Tulu-3-8B-SFT-Q3_K_XL.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q3_K_XL.gguf) | Q3_K_XL | 4.78GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
| [Llama-3.1-Tulu-3-8B-SFT-Q4_K_S.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q4_K_S.gguf) | Q4_K_S | 4.69GB | false | Slightly lower quality with more space savings, *recommended*. |
| [Llama-3.1-Tulu-3-8B-SFT-Q4_0.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q4_0.gguf) | Q4_0 | 4.68GB | false | Legacy format, generally not worth using over similarly sized formats |
| [Llama-3.1-Tulu-3-8B-SFT-Q4_0_8_8.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q4_0_8_8.gguf) | Q4_0_8_8 | 4.66GB | false | Optimized for ARM and AVX inference. Requires 'sve' support for ARM (see details below). *Don't use on Mac*. |
| [Llama-3.1-Tulu-3-8B-SFT-Q4_0_4_8.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q4_0_4_8.gguf) | Q4_0_4_8 | 4.66GB | false | Optimized for ARM inference. Requires 'i8mm' support (see details below). *Don't use on Mac*. |
| [Llama-3.1-Tulu-3-8B-SFT-Q4_0_4_4.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q4_0_4_4.gguf) | Q4_0_4_4 | 4.66GB | false | Optimized for ARM inference. Should work well on all ARM chips, not for use with GPUs. *Don't use on Mac*. |
| [Llama-3.1-Tulu-3-8B-SFT-IQ4_XS.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-IQ4_XS.gguf) | IQ4_XS | 4.45GB | false | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
| [Llama-3.1-Tulu-3-8B-SFT-Q3_K_L.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q3_K_L.gguf) | Q3_K_L | 4.32GB | false | Lower quality but usable, good for low RAM availability. |
| [Llama-3.1-Tulu-3-8B-SFT-Q3_K_M.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q3_K_M.gguf) | Q3_K_M | 4.02GB | false | Low quality. |
| [Llama-3.1-Tulu-3-8B-SFT-IQ3_M.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-IQ3_M.gguf) | IQ3_M | 3.78GB | false | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| [Llama-3.1-Tulu-3-8B-SFT-Q2_K_L.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q2_K_L.gguf) | Q2_K_L | 3.69GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
| [Llama-3.1-Tulu-3-8B-SFT-Q3_K_S.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q3_K_S.gguf) | Q3_K_S | 3.66GB | false | Low quality, not recommended. |
| [Llama-3.1-Tulu-3-8B-SFT-IQ3_XS.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-IQ3_XS.gguf) | IQ3_XS | 3.52GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| [Llama-3.1-Tulu-3-8B-SFT-Q2_K.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-Q2_K.gguf) | Q2_K | 3.18GB | false | Very low quality but surprisingly usable. |
| [Llama-3.1-Tulu-3-8B-SFT-IQ2_M.gguf](https://huggingface.co/bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF/blob/main/Llama-3.1-Tulu-3-8B-SFT-IQ2_M.gguf) | IQ2_M | 2.95GB | false | Relatively low quality, uses SOTA techniques to be surprisingly usable. |
## Embed/output weights
Some of these quants (Q3_K_XL, Q4_K_L, etc.) use the standard quantization method, but with the embedding and output weights quantized to Q8_0 instead of their usual default type.
## Downloading using huggingface-cli
<details>
<summary>Click to view download instructions</summary>
First, make sure you have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
Then, you can target the specific file you want:
```
huggingface-cli download bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF --include "Llama-3.1-Tulu-3-8B-SFT-Q4_K_M.gguf" --local-dir ./
```
If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
```
huggingface-cli download bartowski/Llama-3.1-Tulu-3-8B-SFT-GGUF --include "Llama-3.1-Tulu-3-8B-SFT-Q8_0/*" --local-dir ./
```
You can either specify a new local-dir (Llama-3.1-Tulu-3-8B-SFT-Q8_0) or download them all in place (./)
</details>
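If you script your downloads, you can choose between the single-file and folder `--include` patterns based on file size. This sketch rests on two assumptions: `download_command` is a hypothetical helper, and the folder naming for split quants is taken from the Q8_0 example above. Every quant in this repo is well under 50GB (the Split column is false throughout), so only the single-file branch applies here.

```python
MODEL = "Llama-3.1-Tulu-3-8B-SFT"
REPO_ID = f"bartowski/{MODEL}-GGUF"
SPLIT_THRESHOLD_GB = 50  # the README says larger models are split into multiple files


def download_command(quant: str, size_gb: float, local_dir: str = "./") -> list[str]:
    """Build a huggingface-cli argument list for a single quant."""
    if size_gb > SPLIT_THRESHOLD_GB:
        # Assumption: split quants sit in a folder named after the quant.
        include = f"{MODEL}-{quant}/*"
    else:
        include = f"{MODEL}-{quant}.gguf"
    return ["huggingface-cli", "download", REPO_ID,
            "--include", include, "--local-dir", local_dir]
```

To run it, pass the list to `subprocess.run(download_command("Q4_K_M", 4.92), check=True)`. Building an argument list instead of a shell string avoids quoting problems with the glob in `--include`.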
## Q4_0_X_X information
<details>
<summary>Click to view Q4_0_X_X information</summary>
These are *NOT* for Metal (Apple) or GPU (NVIDIA/AMD/Intel) offloading, only for ARM chips (and certain AVX2/AVX512 CPUs).
If you're using an ARM chip, the Q4_0_X_X quants will give a substantial speedup. Check out Q4_0_4_4 speed comparisons [on the original pull request](https://github.com/ggerganov/llama.cpp/pull/5780#pullrequestreview-21657544660).
To check which one would work best for your ARM chip, you can check [AArch64 SoC features](https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html) (thanks EloyOn!).
If you're using a CPU that supports AVX2 or AVX512 (typically server CPUs and AMD's latest Zen5 CPUs) and are not offloading to a GPU, Q4_0_8_8 may offer a nice speedup as well:
<details>
<summary>Click to view benchmarks on an AVX2 system (EPYC7702)</summary>
| model | size | params | backend | threads | test | t/s | % (vs Q4_0) |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |-------------: |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp512 | 204.03 ± 1.03 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp1024 | 282.92 ± 0.19 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp2048 | 259.49 ± 0.44 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg128 | 39.12 ± 0.27 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg256 | 39.31 ± 0.69 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg512 | 40.52 ± 0.03 | 100% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp512 | 301.02 ± 1.74 | 147% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp1024 | 287.23 ± 0.20 | 101% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp2048 | 262.77 ± 1.81 | 101% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg128 | 18.80 ± 0.99 | 48% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg256 | 24.46 ± 3.04 | 62% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg512 | 36.32 ± 3.59 | 90% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp512 | 271.71 ± 3.53 | 133% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp1024 | 279.86 ± 45.63 | 100% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp2048 | 320.77 ± 5.00 | 124% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg128 | 43.51 ± 0.05 | 111% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg256 | 43.35 ± 0.09 | 110% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg512 | 42.60 ± 0.31 | 105% |
In these runs, Q4_0_8_8 gives a solid boost to prompt processing (up to 133% of Q4_0) and a small boost to text generation (105-111% of Q4_0).
</details>
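The last column divides each row's throughput by the Q4_0 throughput for the same test. As a worked check, here is a short Python sketch that recomputes three Q4_0_8_8 entries from the mean t/s values. It ignores the standard deviations, and `relative_percent` is a hypothetical helper.

```python
# Mean t/s values copied from the benchmark table above (std. deviations dropped).
Q4_0_TPS = {"pp512": 204.03, "pp2048": 259.49, "tg128": 39.12}
Q4_0_8_8_TPS = {"pp512": 271.71, "pp2048": 320.77, "tg128": 43.51}


def relative_percent(tps: float, baseline_tps: float) -> int:
    """Throughput as a whole-number percentage of the baseline."""
    return round(100 * tps / baseline_tps)


for name in Q4_0_TPS:
    pct = relative_percent(Q4_0_8_8_TPS[name], Q4_0_TPS[name])
    print(f"Q4_0_8_8 {name}: {pct}%")
```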
</details>
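On ARM Linux, you can read the relevant flags from `/proc/cpuinfo`. The sketch below uses hypothetical helper names and only checks the requirements stated in the download table: 'i8mm' for Q4_0_4_8 and 'sve' for Q4_0_8_8. It does not rank the options, because the fastest one depends on the chip. The SoC table linked above remains the better guide for that.

```python
def arm_features(cpuinfo_text: str) -> set[str]:
    """Collect flags from the 'Features' lines of Linux's /proc/cpuinfo on ARM."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            flags.update(value.split())
    return flags


def compatible_q4_0_variants(flags: set[str]) -> list[str]:
    """List the Q4_0_X_X files whose stated requirements the flags meet."""
    variants = ["Q4_0_4_4"]  # table: should work well on all ARM chips
    if "i8mm" in flags:
        variants.append("Q4_0_4_8")  # table: requires 'i8mm'
    if "sve" in flags:
        variants.append("Q4_0_8_8")  # table: requires 'sve' on ARM
    return variants


if __name__ == "__main__":
    # Linux-only: macOS has no /proc/cpuinfo, and these quants are not for Mac anyway.
    with open("/proc/cpuinfo") as f:
        print(compatible_q4_0_variants(arm_features(f.read())))
```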
## Which file should I choose?
<details>
<summary>Click here for details</summary>
A great write-up with charts comparing the performance of various quant types is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9).
The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.
If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.
If you want the absolute maximum quality, add your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.
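The two sizing rules above reduce to a simple budget check. Here is a rough Python sketch using the file sizes from the download table. The helper name, the 2GB default headroom (the cautious end of the 1-2GB range), and the "largest file that fits" heuristic are all assumptions. File size is only a rough proxy for quality. For instance, Q3_K_XL is larger than Q4_K_S.

```python
from typing import Optional

# File sizes in GB, copied from the download table above. The Q4_0 and
# Q4_0_X_X files are left out because they target specific hardware.
QUANT_SIZES_GB = {
    "f16": 16.07, "Q8_0": 8.54, "Q6_K_L": 6.85, "Q6_K": 6.60, "Q5_K_L": 6.06,
    "Q5_K_M": 5.73, "Q5_K_S": 5.60, "Q4_K_L": 5.31, "Q4_K_M": 4.92,
    "Q3_K_XL": 4.78, "Q4_K_S": 4.69, "IQ4_XS": 4.45, "Q3_K_L": 4.32,
    "Q3_K_M": 4.02, "IQ3_M": 3.78, "Q2_K_L": 3.69, "Q3_K_S": 3.66,
    "IQ3_XS": 3.52, "Q2_K": 3.18, "IQ2_M": 2.95,
}


def largest_fitting_quant(
    vram_gb: float, ram_gb: float = 0.0, headroom_gb: float = 2.0
) -> Optional[str]:
    """Return the largest quant whose file fits the memory budget, or None.

    Pass only vram_gb for a full GPU fit; add ram_gb for the
    maximum-quality rule that counts system RAM plus VRAM.
    """
    budget = vram_gb + ram_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None
    # Heuristic: pick the biggest file that fits (not always the best quality).
    return max(fitting, key=fitting.get)
```

For example, an 8GB GPU with the default 2GB headroom leaves a 6GB budget, which selects Q5_K_M (5.73GB), since Q5_K_L at 6.06GB no longer fits.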
Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.
If you don't want to think too much, grab one of the K-quants. These are in format 'QX_K_X', like Q5_K_M.
If you want to get more into the weeds, you can check out this extremely useful feature chart:
[llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix)
But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in format IQX_X, like IQ3_M. These are newer and offer better performance for their size.
These I-quants can also be used on CPU and Apple Metal, but they will be slower than their K-quant equivalents, so you'll have to weigh speed against quality.
The I-quants are *not* compatible with Vulkan, which can also run on AMD cards, so if you have an AMD card, double-check whether you're using the rocBLAS build or the Vulkan build. At the time of writing, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.
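The rules of thumb above can be written as a small decision function. This is only a sketch. The backend labels are hypothetical strings, and "below Q4" is encoded as fewer than 4 bits. On CPU or Metal, the function settles the speed-versus-quality tradeoff in favor of speed by returning a K-quant.

```python
def suggest_quant_family(target_bits: int, backend: str) -> str:
    """Return 'I-quant' or 'K-quant' following the README's rules of thumb."""
    backend = backend.lower()
    if backend == "vulkan":
        # Per the README at the time of writing, I-quants do not work with Vulkan.
        return "K-quant"
    if target_bits < 4 and backend in {"cublas", "rocblas"}:
        # Below Q4 on cuBLAS/rocBLAS, I-quants give better quality for their size.
        return "I-quant"
    # CPU and Metal can run I-quants, but more slowly; this sketch favors speed.
    return "K-quant"
```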
</details>
## Credits
Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset.
Thank you ZeroWw for the inspiration to experiment with embed/output.
Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski

configuration.json (new file, 1 line)

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}