Update metadata with huggingface_hub

This commit is contained in:
ai-modelscope
2024-11-24 20:25:41 +08:00
parent 1fce9087d6
commit 3307f1c217
28 changed files with 233 additions and 55 deletions

.gitattributes vendored

@@ -1,38 +1,60 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q6_K_L.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q5_K_L.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q4_K_L.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q4_0_8_8.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q4_0_4_8.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q4_0_4_4.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q3_K_XL.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-IQ3_XS.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q2_K_L.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-IQ2_M.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B-f16.gguf filter=lfs diff=lfs merge=lfs -text
L3-Rhaenys-8B.imatrix filter=lfs diff=lfs merge=lfs -text
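The patterns above route matching paths through Git LFS. As a rough illustration of how such patterns select files, here is a sketch using Python's `fnmatch` (note this is an approximation: Git's attribute globbing differs from `fnmatch` in edge cases such as `**` and `/` handling, and the helper name is mine):

```python
# Approximate sketch of LFS pattern matching from .gitattributes.
# Patterns are taken from the file above; matching semantics are simplified.
from fnmatch import fnmatch

LFS_PATTERNS = ["*.gguf*", "*.safetensors", "*.bin", "*.ckpt"]

def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    # A path is treated as LFS-tracked if any pattern matches
    # its basename or the full path.
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch(name, p) or fnmatch(path, p) for p in patterns)
```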

L3-Rhaenys-8B-IQ2_M.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:704b993a37b477fa35001dbf9bf8c7f1a52927b4433a2f090f7c2288430a3f35
size 2948282144
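Each "Normal file" added in this commit is a Git LFS pointer file (version, oid, size), not the binary itself. A minimal parser for that three-line format, using the pointer above as input (the helper name is mine):

```python
# Sketch: parse a Git LFS pointer file into its fields.
def parse_lfs_pointer(text: str) -> dict:
    """Return a dict with 'version', 'oid', and integer 'size' (bytes)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])  # size is the blob size in bytes
    return fields

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:704b993a37b477fa35001dbf9bf8c7f1a52927b4433a2f090f7c2288430a3f35
size 2948282144
"""
info = parse_lfs_pointer(pointer)
```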

L3-Rhaenys-8B-IQ3_M.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c87859d63c66393b1129437714f65688cad1f4c1042ab2e728511836d92fe56c
size 3784824608


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d88eb4fbca205a52ae20e95f663d3a45876cd501a1bca848bc2718cc41573e02
size 3518748448


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:083198b79bc40f7c64fa5797a59e233556b71d86f1f307c76b410cba80af3d25
size 4447663904

L3-Rhaenys-8B-Q2_K.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ac44edb055b07856d85c5787c190821e3cc5b3a33a85ff54f1e68b4948dda57a
size 3179132704


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0ab72c95650b7b6303c1e70127da1b09303441e228f8a8b85054746e7ed21049
size 3692156704


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:89c7d9adfe0e659a5c5aa18d7ae27142ee4dd430b4d5bf5d852fed05cb4449e6
size 4321957664


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0dcbfc17d4ae6668ec9ba412fe968c2cbff47a41375d6b2b238d802f2cb23288
size 4018919200


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:97fba2594917824ebeefaac89e9bd77f201d5d59692508e4c1b6e107015aa774
size 3664500512


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:af10090737608436670e9ca02eb134093745f83bec9ea6d2531171804411136d
size 4781627168

L3-Rhaenys-8B-Q4_0.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ae566649f0ff5801ad441727d10d2e02a4b48af30f5ce235ea9e950f5995b21d
size 4675893024


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:313cc2822d63943778bc4a431aa4273e5354d73b20675eed2beb9156144c3f3c
size 4661212960


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:301b3ce059e75748f2814bb9fdb0eca77b0929389f3a57ad756ebf5ad6196f6e
size 4661212960


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:be0147c17d387ab43616190305ac73167e1f0d3a657f772a468e7497ec0efbef
size 4661212960


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f9d4c9ea8ef307e64414b887428d842f1a866a119fc77b41dfe28572e2738a3f
size 5310633760


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:38af33483003e0e114ebeebf11730ef522c6ab9a9247d6a23371e2faa38cc063
size 4920735520


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bf2f6150af3ec1238c2ee80dbeee651e6d969461e3bdb5e5b6147202c72fc6ad
size 4692670240


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4d861187b2408196def4fe3dd9c9a4630ad44eed7c35fae06191f9deb1fdefcc
size 6057219872


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4e6ca22fe32c2cf8a12320af8ed305364bdeee333934ccd5740ecc9fad12afbf
size 5732988704


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d6d557cce6f2b97af9c275ff7c63161c812cd21322b7dd295b05ef560861ce3a
size 5599295264

L3-Rhaenys-8B-Q6_K.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:25efc7a555a0075b906e8f3ba58de5fdf8c8e7861117b1b1e3ab7ba248b856ce
size 6596007712


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7faed0d71f863fcd9654d029d7415e360cbfd6d2c558893c960d8922cb1654fd
size 6850467616

L3-Rhaenys-8B-Q8_0.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ac6e3d3c383e149c25d309dbbbd0d72693c8b873c7ffd2a66422a6c6c118a121
size 8540772128

L3-Rhaenys-8B-f16.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2411c052f7bfb6b16f9381780f8218bfa607cd1b93ae6a1027eb45b97abb2d8c
size 16068892192

L3-Rhaenys-8B.imatrix Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:29785aae65a320de084733f8ce8d44ae487c6a45c5ef0f7626d00cce3054f5dc
size 4988170

README.md

@@ -1,47 +1,127 @@
---
license: Apache License 2.0
#model-type:
##e.g. gpt, phi, llama, chatglm, baichuan, etc.
#- gpt
#domain:
##e.g. nlp, cv, audio, multi-modal
#- nlp
#language:
##language code list: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
#- cn
#metrics:
##e.g. CIDEr, BLEU, ROUGE
#- CIDEr
#tags:
##any custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
#- pretrained
#tools:
##e.g. vllm, fastchat, llamacpp, AdaSeq
#- vllm
base_model: tannedbum/L3-Rhaenys-8B
language:
- en
license: cc-by-nc-4.0
pipeline_tag: text-generation
tags:
- mergekit
- merge
- roleplay
- sillytavern
- llama3
- not-for-all-audiences
quantized_by: bartowski
---
### The contributors of this model have not provided a more detailed model introduction. Model files and weights can be found on the "Model Files" page.
#### You can download the model with the git clone command below, or with the ModelScope SDK
SDK download
```bash
# Install ModelScope
pip install modelscope
```
```python
# Download the model with the SDK
from modelscope import snapshot_download
model_dir = snapshot_download('bartowski/L3-Rhaenys-8B-GGUF')
```
Git download
```bash
# Download the model with Git
git clone https://www.modelscope.cn/bartowski/L3-Rhaenys-8B-GGUF.git
```
<p style="color: lightgrey;">If you are a contributor to this model, we invite you to promptly complete the model card in accordance with the <a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">model contribution documentation</a>.</p>
## Llamacpp imatrix Quantizations of L3-Rhaenys-8B
Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b3634">b3634</a> for quantization.
Original model: https://huggingface.co/tannedbum/L3-Rhaenys-8B
All quants made using imatrix option with dataset from [here](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8)
Run them in [LM Studio](https://lmstudio.ai/)
## Prompt format
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
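The Llama 3 prompt format shown in this card can be assembled with a small helper. A sketch (the function name is mine; the card renders the template with single newlines, while the canonical Llama 3 template places a blank line after each header, which is what this uses):

```python
# Sketch: build a Llama 3 style prompt from a system prompt and user prompt.
def build_prompt(system_prompt: str, prompt: str) -> str:
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```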
## Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| [L3-Rhaenys-8B-f16.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-f16.gguf) | f16 | 16.07GB | false | Full F16 weights. |
| [L3-Rhaenys-8B-Q8_0.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q8_0.gguf) | Q8_0 | 8.54GB | false | Extremely high quality, generally unneeded but max available quant. |
| [L3-Rhaenys-8B-Q6_K_L.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q6_K_L.gguf) | Q6_K_L | 6.85GB | false | Uses Q8_0 for embed and output weights. Very high quality, near perfect, *recommended*. |
| [L3-Rhaenys-8B-Q6_K.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q6_K.gguf) | Q6_K | 6.60GB | false | Very high quality, near perfect, *recommended*. |
| [L3-Rhaenys-8B-Q5_K_L.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q5_K_L.gguf) | Q5_K_L | 6.06GB | false | Uses Q8_0 for embed and output weights. High quality, *recommended*. |
| [L3-Rhaenys-8B-Q5_K_M.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q5_K_M.gguf) | Q5_K_M | 5.73GB | false | High quality, *recommended*. |
| [L3-Rhaenys-8B-Q5_K_S.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q5_K_S.gguf) | Q5_K_S | 5.60GB | false | High quality, *recommended*. |
| [L3-Rhaenys-8B-Q4_K_L.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q4_K_L.gguf) | Q4_K_L | 5.31GB | false | Uses Q8_0 for embed and output weights. Good quality, *recommended*. |
| [L3-Rhaenys-8B-Q4_K_M.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q4_K_M.gguf) | Q4_K_M | 4.92GB | false | Good quality, default size for most use cases, *recommended*. |
| [L3-Rhaenys-8B-Q3_K_XL.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q3_K_XL.gguf) | Q3_K_XL | 4.78GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
| [L3-Rhaenys-8B-Q4_K_S.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q4_K_S.gguf) | Q4_K_S | 4.69GB | false | Slightly lower quality with more space savings, *recommended*. |
| [L3-Rhaenys-8B-Q4_0.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q4_0.gguf) | Q4_0 | 4.68GB | false | Legacy format, generally not worth using over similarly sized formats. |
| [L3-Rhaenys-8B-Q4_0_8_8.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q4_0_8_8.gguf) | Q4_0_8_8 | 4.66GB | false | Optimized for ARM and CPU inference, much faster than Q4_0 at similar quality. |
| [L3-Rhaenys-8B-Q4_0_4_8.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q4_0_4_8.gguf) | Q4_0_4_8 | 4.66GB | false | Optimized for ARM and CPU inference, much faster than Q4_0 at similar quality. |
| [L3-Rhaenys-8B-Q4_0_4_4.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q4_0_4_4.gguf) | Q4_0_4_4 | 4.66GB | false | Optimized for ARM and CPU inference, much faster than Q4_0 at similar quality. |
| [L3-Rhaenys-8B-IQ4_XS.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-IQ4_XS.gguf) | IQ4_XS | 4.45GB | false | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
| [L3-Rhaenys-8B-Q3_K_L.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q3_K_L.gguf) | Q3_K_L | 4.32GB | false | Lower quality but usable, good for low RAM availability. |
| [L3-Rhaenys-8B-Q3_K_M.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q3_K_M.gguf) | Q3_K_M | 4.02GB | false | Low quality. |
| [L3-Rhaenys-8B-IQ3_M.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-IQ3_M.gguf) | IQ3_M | 3.78GB | false | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| [L3-Rhaenys-8B-Q2_K_L.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q2_K_L.gguf) | Q2_K_L | 3.69GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
| [L3-Rhaenys-8B-Q3_K_S.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q3_K_S.gguf) | Q3_K_S | 3.66GB | false | Low quality, not recommended. |
| [L3-Rhaenys-8B-IQ3_XS.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-IQ3_XS.gguf) | IQ3_XS | 3.52GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| [L3-Rhaenys-8B-Q2_K.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-Q2_K.gguf) | Q2_K | 3.18GB | false | Very low quality but surprisingly usable. |
| [L3-Rhaenys-8B-IQ2_M.gguf](https://huggingface.co/bartowski/L3-Rhaenys-8B-GGUF/blob/main/L3-Rhaenys-8B-IQ2_M.gguf) | IQ2_M | 2.95GB | false | Relatively low quality, uses SOTA techniques to be surprisingly usable. |
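The File Size column corresponds to the byte counts in this commit's LFS pointers, converted to decimal gigabytes. A quick check against three pointers from this commit:

```python
# Sketch: convert LFS pointer byte counts to the decimal-GB figures
# used in the table above.
def to_gb(size_bytes: int) -> float:
    return round(size_bytes / 1e9, 2)

# Byte counts from the pointer files in this commit:
sizes = {
    "Q4_K_M": 4920735520,
    "Q8_0": 8540772128,
    "f16": 16068892192,
}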
## Embed/output weights
Some of these quants (Q3_K_XL, Q4_K_L etc) are the standard quantization method with the embeddings and output weights quantized to Q8_0 instead of what they would normally default to.
Some say that this improves the quality, others don't notice any difference. If you use these models PLEASE COMMENT with your findings. I would like feedback that these are actually used and useful so I don't keep uploading quants no one is using.
Thanks!
## Credits
Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset
Thank you ZeroWw for the inspiration to experiment with embed/output
## Downloading using huggingface-cli
First, make sure you have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
Then, you can target the specific file you want:
```
huggingface-cli download bartowski/L3-Rhaenys-8B-GGUF --include "L3-Rhaenys-8B-Q4_K_M.gguf" --local-dir ./
```
If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
```
huggingface-cli download bartowski/L3-Rhaenys-8B-GGUF --include "L3-Rhaenys-8B-Q8_0/*" --local-dir ./
```
You can either specify a new local-dir (L3-Rhaenys-8B-Q8_0) or download them all in place (./)
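The same targeted download can be done from Python with `huggingface_hub`'s `snapshot_download`, whose `allow_patterns` argument plays the role of `--include` above (a sketch; the helper and pattern construction are mine):

```python
# Sketch: Python equivalent of the huggingface-cli commands above.
REPO_ID = "bartowski/L3-Rhaenys-8B-GGUF"

def quant_pattern(quant: str) -> str:
    # Mirrors the --include argument used with huggingface-cli.
    return f"L3-Rhaenys-8B-{quant}*"

def download_quant(quant: str, local_dir: str = "./") -> str:
    # Download only the files for one quant; returns the local path.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[quant_pattern(quant)],
        local_dir=local_dir,
    )
```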
## Which file should I choose?
A great write up with charts showing various performances is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9)
The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.
If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.
If you want the absolute maximum quality, add both your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.
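The sizing rule above can be sketched as a small helper that picks the largest quant fitting your memory budget minus headroom (quant sizes are from the table in this card; the helper itself is mine):

```python
# Sketch: pick the largest quant that fits in (budget - headroom) GB,
# per the 1-2GB-of-headroom guidance above.
QUANT_SIZES_GB = [  # (name, decimal GB), sorted largest first
    ("Q8_0", 8.54), ("Q6_K_L", 6.85), ("Q6_K", 6.60),
    ("Q5_K_L", 6.06), ("Q5_K_M", 5.73), ("Q4_K_M", 4.92),
    ("IQ4_XS", 4.45), ("IQ3_M", 3.78), ("Q2_K", 3.18),
]

def pick_quant(budget_gb: float, headroom_gb: float = 1.5):
    limit = budget_gb - headroom_gb
    for name, size in QUANT_SIZES_GB:
        if size <= limit:
            return name
    return None  # nothing fits
```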
Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.
If you don't want to think too much, grab one of the K-quants. These are in format 'QX_K_X', like Q5_K_M.
If you want to get more into the weeds, you can check out this extremely useful feature chart:
[llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix)
But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in format IQX_X, like IQ3_M. These are newer and offer better performance for their size.
These I-quants can also be used on CPU and Apple Metal, but will be slower than their K-quant equivalent, so speed vs performance is a tradeoff you'll have to decide.
The I-quants are *not* compatible with Vulkan, which is also AMD, so if you have an AMD card double check if you're using the rocBLAS build or the Vulkan build. At the time of writing this, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.
Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski

configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}