Update metadata with huggingface_hub

ai-modelscope
2024-12-17 14:29:01 +08:00
parent b4ba5e1294
commit 6894f22f66
26 changed files with 276 additions and 63 deletions

.gitattributes
@@ -1,47 +1,58 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q6_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q5_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q4_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-IQ4_NL.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q3_K_XL.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-IQ3_XS.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q2_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-IQ2_M.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B-f16.gguf filter=lfs diff=lfs merge=lfs -text
Llama-OpenReviewer-8B.imatrix filter=lfs diff=lfs merge=lfs -text

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d69cb7b38274c6c3aca53b4f3d3667c701aed154869531b1c7de3a5133d67cdb
size 2948282176

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dd37a084123715ea697a2cd19f75a83401aee496f3c334a2df230174ee13f595
size 3784824640

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ce563df6650492e29dccb3f13141aa845adcccffa0446986c239a9d36d42048f
size 3518748480

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2d0ea2155877c2827e10394de70bc84b51be8870f599d3029d76ceff125d3209
size 4677990208

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d63d7b1df96295e2bad18e0e68b63cd44be47fa04c9638c4ddf2cf0d28abe750
size 4447663936

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ebc8e6bf554a4928311db8831c562dbfda565210677d0f005e11654ff35cd564
size 3179132736

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cecafeba0175849e0b092407b1ad9e52a4471600024727e7f9fa06f16d38ec3f
size 3692156736

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a7803e52df5fcb9280a434f9c4e69f78b278bd2af502b000a14f9860e8b7b814
size 4321957696

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3c453a9f4625f1dc19f25a65e4cf5e64aa45b0dddcee33a7c0d26023f2136a80
size 4018919232

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:36a4577af61a9b649da70313576ea20a786942e6c5c03811ca26ea6ef1bdd028
size 3664500544

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f3d9ecae40225fde9989efc07352c2c7647a9b6fde7750838dd5f24f71de50ce
size 4781627200

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5126c7cb47b6cd7daa941df7eac363c179f2957c82744624e0757ef533bd70e5
size 4675893056

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6829bcb0869e0f73a3521c7fb0bb602af2fb18b59c993136626623f44a8ae607
size 5310633792

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b48fd7eee01738de4adcb271fc3c7c5b306f8c75b9804794706dbfdf7a6835f0
size 4920735552

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a4a8de471770d976372b2080a8c334567c5ec049a53fe1830467f0400ad187dc
size 4692670272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6a26be51d79448072889d8a4b622d34e04b0e621e68301c653f80d240e910c42
size 6057219904

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e10a8c9eb63d4c0ac04e8f8f61cd76757fd681de5d62c0c971d71cb4d42d72e
size 5732988736

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:def9174943416df34857454f2365522c3bcab733aa076c415b827aee2e18f463
size 5599295296

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5aac17b5687b2789e2e0119e7b93d9b85759e4b78bf55aca71cb9cd7faaffe02
size 6596007744

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5dcb490b0dd0fea817d807e022c4fc41075804972b2864c3223ae69f0e490de8
size 6850467648

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fd43b7c5797edb04a4e59b6cd55cc051b234a23c899a7384c09a33e98e04d17c
size 8540772160

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1f340c46d4a569ff4b0c1ca95792a9235c4546edb832452bdb590aab2805a980
size 16068892192

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:00ea7ce97aebcef4a1138848654631d3acde1a22dc603a04b4d90f4fbc2f6cbb
size 4988170

README.md
@@ -1,47 +1,179 @@
---
license: Apache License 2.0
#model-type:
## e.g. gpt, phi, llama, chatglm, baichuan, etc.
#- gpt
#domain:
## e.g. nlp, cv, audio, multi-modal
#- nlp
#language:
## list of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
#- cn
#metrics:
## e.g. CIDEr, BLEU, ROUGE, etc.
#- CIDEr
#tags:
## custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
#- pretrained
#tools:
## e.g. vllm, fastchat, llamacpp, AdaSeq, etc.
#- vllm
quantized_by: bartowski
pipeline_tag: text-generation
tags:
- review
- peer
- paper
- generation
- automatic
- feedback
- conference
- article
- manuscript
- openreview
license: llama3.1
base_model: maxidl/Llama-OpenReviewer-8B
language:
- en
---
### The contributors to this model have not provided a more detailed model description. The model files and weights can be browsed on the "Model files" page.
#### You can download the model with the git clone command below, or via the ModelScope SDK
SDK download
```bash
# Install ModelScope
pip install modelscope
```
```python
# Download the model via the SDK
from modelscope import snapshot_download
model_dir = snapshot_download('bartowski/Llama-OpenReviewer-8B-GGUF')
```
Git download
```
# Download the model via git
git clone https://www.modelscope.cn/bartowski/Llama-OpenReviewer-8B-GGUF.git
```
<p style="color: lightgrey;">If you are a contributor to this model, we invite you to promptly complete the model card following the <a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">model contribution documentation</a>.</p>
## Llamacpp imatrix Quantizations of Llama-OpenReviewer-8B
Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b4327">b4327</a> for quantization.
Original model: https://huggingface.co/maxidl/Llama-OpenReviewer-8B
All quants were made using the imatrix option with the dataset from [here](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8)
Run them in [LM Studio](https://lmstudio.ai/)
## Prompt format
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
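To use the model programmatically, the template above can be filled in with plain string formatting before being handed to an inference engine. A minimal sketch; the system and user strings are illustrative placeholders, not part of the model card:
```python
# Fill in the Llama 3.1 chat template shown above.
# system_prompt and prompt are illustrative placeholders.
system_prompt = "You are an expert reviewer for an AI conference."
prompt = "Please review the attached paper: ..."

formatted = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    f"{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)
print(formatted)
```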
## Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| [Llama-OpenReviewer-8B-f16.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-f16.gguf) | f16 | 16.07GB | false | Full F16 weights. |
| [Llama-OpenReviewer-8B-Q8_0.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q8_0.gguf) | Q8_0 | 8.54GB | false | Extremely high quality, generally unneeded but max available quant. |
| [Llama-OpenReviewer-8B-Q6_K_L.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q6_K_L.gguf) | Q6_K_L | 6.85GB | false | Uses Q8_0 for embed and output weights. Very high quality, near perfect, *recommended*. |
| [Llama-OpenReviewer-8B-Q6_K.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q6_K.gguf) | Q6_K | 6.60GB | false | Very high quality, near perfect, *recommended*. |
| [Llama-OpenReviewer-8B-Q5_K_L.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q5_K_L.gguf) | Q5_K_L | 6.06GB | false | Uses Q8_0 for embed and output weights. High quality, *recommended*. |
| [Llama-OpenReviewer-8B-Q5_K_M.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q5_K_M.gguf) | Q5_K_M | 5.73GB | false | High quality, *recommended*. |
| [Llama-OpenReviewer-8B-Q5_K_S.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q5_K_S.gguf) | Q5_K_S | 5.60GB | false | High quality, *recommended*. |
| [Llama-OpenReviewer-8B-Q4_K_L.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q4_K_L.gguf) | Q4_K_L | 5.31GB | false | Uses Q8_0 for embed and output weights. Good quality, *recommended*. |
| [Llama-OpenReviewer-8B-Q4_K_M.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q4_K_M.gguf) | Q4_K_M | 4.92GB | false | Good quality, default size for most use cases, *recommended*. |
| [Llama-OpenReviewer-8B-Q3_K_XL.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q3_K_XL.gguf) | Q3_K_XL | 4.78GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
| [Llama-OpenReviewer-8B-Q4_K_S.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q4_K_S.gguf) | Q4_K_S | 4.69GB | false | Slightly lower quality with more space savings, *recommended*. |
| [Llama-OpenReviewer-8B-Q4_0.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q4_0.gguf) | Q4_0 | 4.68GB | false | Legacy format, offers online repacking for ARM and AVX CPU inference. |
| [Llama-OpenReviewer-8B-IQ4_NL.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-IQ4_NL.gguf) | IQ4_NL | 4.68GB | false | Similar to IQ4_XS, but slightly larger. Offers online repacking for ARM CPU inference. |
| [Llama-OpenReviewer-8B-IQ4_XS.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-IQ4_XS.gguf) | IQ4_XS | 4.45GB | false | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
| [Llama-OpenReviewer-8B-Q3_K_L.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q3_K_L.gguf) | Q3_K_L | 4.32GB | false | Lower quality but usable, good for low RAM availability. |
| [Llama-OpenReviewer-8B-Q3_K_M.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q3_K_M.gguf) | Q3_K_M | 4.02GB | false | Low quality. |
| [Llama-OpenReviewer-8B-IQ3_M.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-IQ3_M.gguf) | IQ3_M | 3.78GB | false | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| [Llama-OpenReviewer-8B-Q2_K_L.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q2_K_L.gguf) | Q2_K_L | 3.69GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
| [Llama-OpenReviewer-8B-Q3_K_S.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q3_K_S.gguf) | Q3_K_S | 3.66GB | false | Low quality, not recommended. |
| [Llama-OpenReviewer-8B-IQ3_XS.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-IQ3_XS.gguf) | IQ3_XS | 3.52GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| [Llama-OpenReviewer-8B-Q2_K.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-Q2_K.gguf) | Q2_K | 3.18GB | false | Very low quality but surprisingly usable. |
| [Llama-OpenReviewer-8B-IQ2_M.gguf](https://huggingface.co/bartowski/Llama-OpenReviewer-8B-GGUF/blob/main/Llama-OpenReviewer-8B-IQ2_M.gguf) | IQ2_M | 2.95GB | false | Relatively low quality, uses SOTA techniques to be surprisingly usable. |
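Once downloaded, a quant can be run locally with llama.cpp's CLI (LM Studio, mentioned above, works too). A minimal sketch, assuming a local llama.cpp build and the Q4_K_M file from the table; the system prompt is a placeholder:
```
# -cnv starts an interactive chat; -p seeds the system prompt in that mode
./llama-cli -m ./Llama-OpenReviewer-8B-Q4_K_M.gguf -cnv -p "You are an expert reviewer for an AI conference."
```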
## Embed/output weights
Some of these quants (Q3_K_XL, Q4_K_L, etc.) use the standard quantization method, but with the embeddings and output weights quantized to Q8_0 instead of their usual default.
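For reference, llama.cpp's llama-quantize tool exposes this through per-tensor type overrides; a hedged sketch of how such a variant could be produced (file names are placeholders, and the exact invocation used for these quants is not documented here):
```
./llama-quantize --imatrix Llama-OpenReviewer-8B.imatrix \
  --token-embedding-type q8_0 --output-tensor-type q8_0 \
  Llama-OpenReviewer-8B-f16.gguf Llama-OpenReviewer-8B-Q4_K_L.gguf Q4_K_M
```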
## Downloading using huggingface-cli
<details>
<summary>Click to view download instructions</summary>
First, make sure you have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
Then, you can target the specific file you want:
```
huggingface-cli download bartowski/Llama-OpenReviewer-8B-GGUF --include "Llama-OpenReviewer-8B-Q4_K_M.gguf" --local-dir ./
```
If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
```
huggingface-cli download bartowski/Llama-OpenReviewer-8B-GGUF --include "Llama-OpenReviewer-8B-Q8_0/*" --local-dir ./
```
You can either specify a new local-dir (Llama-OpenReviewer-8B-Q8_0) or download them all in place (./).
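Equivalently, a single file can be fetched from Python with the huggingface_hub library (a short sketch targeting the same repo and file as above):
```python
# Download one quant from the repo via the huggingface_hub Python API.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bartowski/Llama-OpenReviewer-8B-GGUF",
    filename="Llama-OpenReviewer-8B-Q4_K_M.gguf",
    local_dir="./",
)
print(path)
```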
</details>
## ARM/AVX information
Previously, you would download Q4_0_4_4/4_8/8_8, and these would have their weights interleaved in memory in order to improve performance on ARM and AVX machines by loading up more data in one pass.
Now, however, there is something called "online repacking" for weights; details are in [this PR](https://github.com/ggerganov/llama.cpp/pull/9921). If you use Q4_0 and your hardware would benefit from repacking weights, it will be done automatically on the fly.
As of llama.cpp build [b4282](https://github.com/ggerganov/llama.cpp/releases/tag/b4282) you will not be able to run the Q4_0_X_X files and will instead need to use Q4_0.
Additionally, if you want to get slightly better quality for ARM and AVX machines, you can use IQ4_NL thanks to [this PR](https://github.com/ggerganov/llama.cpp/pull/10541), which will also repack the weights for ARM, though only the 4_4 variant for now. The loading time may be slower, but it will result in an overall speed increase.
<details>
<summary>Click to view Q4_0_X_X information (deprecated)</summary>
I'm keeping this section to show the potential theoretical uplift in performance from using Q4_0 with online repacking.
<details>
<summary>Click to view benchmarks on an AVX2 system (EPYC7702)</summary>
| model | size | params | backend | threads | test | t/s | % (vs Q4_0) |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |-------------: |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp512 | 204.03 ± 1.03 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp1024 | 282.92 ± 0.19 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp2048 | 259.49 ± 0.44 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg128 | 39.12 ± 0.27 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg256 | 39.31 ± 0.69 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg512 | 40.52 ± 0.03 | 100% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp512 | 301.02 ± 1.74 | 147% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp1024 | 287.23 ± 0.20 | 101% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp2048 | 262.77 ± 1.81 | 101% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg128 | 18.80 ± 0.99 | 48% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg256 | 24.46 ± 3.04 | 83% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg512 | 36.32 ± 3.59 | 90% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp512 | 271.71 ± 3.53 | 133% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp1024 | 279.86 ± 45.63 | 100% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp2048 | 320.77 ± 5.00 | 124% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg128 | 43.51 ± 0.05 | 111% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg256 | 43.35 ± 0.09 | 110% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg512 | 42.60 ± 0.31 | 105% |
Q4_0_8_8 offers a nice bump to prompt processing and a small bump to text generation.
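For context, numbers like these come from llama.cpp's bundled llama-bench tool; a rough sketch of an equivalent invocation (model path and thread count are assumptions matching the table above):
```
./llama-bench -m qwen2-3b-q4_0.gguf -t 64 -p 512,1024,2048 -n 128,256,512
```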
</details>
</details>
## Which file should I choose?
<details>
<summary>Click here for details</summary>
A great write up with charts showing various performances is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9)
The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.
If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.
If you want the absolute maximum quality, add both your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.
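To make the rule of thumb concrete, here is a small illustrative Python helper (not part of any tooling) that picks the largest quant from the table above while keeping the suggested ~2GB of headroom:
```python
# Illustrative only: a subset of file sizes (GB) copied from the table above.
QUANT_SIZES_GB = {
    "Q8_0": 8.54, "Q6_K_L": 6.85, "Q6_K": 6.60, "Q5_K_L": 6.06,
    "Q5_K_M": 5.73, "Q5_K_S": 5.60, "Q4_K_L": 5.31, "Q4_K_M": 4.92,
    "Q4_K_S": 4.69, "IQ4_XS": 4.45, "Q3_K_L": 4.32, "IQ3_M": 3.78,
    "Q2_K": 3.18, "IQ2_M": 2.95,
}

def pick_quant(memory_gb, headroom_gb=2.0):
    """Return the largest quant that fits in memory_gb minus headroom_gb."""
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= memory_gb - headroom_gb}
    return max(fitting, key=fitting.get) if fitting else None

print(pick_quant(8.0))  # an 8GB GPU -> Q5_K_M (5.73GB)
```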
Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.
If you don't want to think too much, grab one of the K-quants. These are in format 'QX_K_X', like Q5_K_M.
If you want to get more into the weeds, you can check out this extremely useful feature chart:
[llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix)
But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in format IQX_X, like IQ3_M. These are newer and offer better performance for their size.
These I-quants can also be used on CPU and Apple Metal, but will be slower than their K-quant equivalent, so speed vs performance is a tradeoff you'll have to decide.
The I-quants are *not* compatible with Vulkan, which also supports AMD, so if you have an AMD card double check whether you're using the rocBLAS build or the Vulkan build. At the time of writing this, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.
</details>
## Credits
Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset.
Thank you ZeroWw for the inspiration to experiment with embed/output.
Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski

configuration.json
@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}