Update metadata with huggingface_hub

ai-modelscope
2024-11-27 03:52:10 +08:00
parent 99146a1e73
commit 233a0ed779
28 changed files with 237 additions and 63 deletions

.gitattributes

@@ -1,47 +1,60 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q6_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q5_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q4_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q4_0_8_8.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q4_0_4_8.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q4_0_4_4.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q3_K_XL.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-IQ3_XS.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q2_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-IQ2_M.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B-f16.gguf filter=lfs diff=lfs merge=lfs -text
Apollo-2.0-Llama-3.1-8B.imatrix filter=lfs diff=lfs merge=lfs -text

Apollo-2.0-Llama-3.1-8B-IQ2_M.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b82471f8feeba6cbbb370cd835af0de649415f0cf697e2e10958a1f008131958
size 2948281920

Apollo-2.0-Llama-3.1-8B-IQ3_M.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fc374911c39d9cc891d648a0da274144636e6a35d931ab967c0a9c5c115a15b4
size 3784824384

Apollo-2.0-Llama-3.1-8B-IQ3_XS.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d2994d390ed4c56a0f45f4c13f20beb39dc3e76da786071a2a6528676535b35f
size 3518748224

Apollo-2.0-Llama-3.1-8B-IQ4_XS.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:332a03fa3e69ab73cabbf7281ce41cfaa7700feaf8afc4fd213b55db4e10f6cd
size 4447663680

Apollo-2.0-Llama-3.1-8B-Q2_K.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:87bd8f4563261c7dc1dcecc0f4aa4586c3a0a81905ddd3fb8a09db58f96bad24
size 3179132480

Apollo-2.0-Llama-3.1-8B-Q2_K_L.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1b2816f8049856df7ed45646df2ef3c57434f58636eb6dfe255658a75dc50d4f
size 3692156480

Apollo-2.0-Llama-3.1-8B-Q3_K_L.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:045f123d2c7d4f5379b80003b290319fcba965749d7f5ce8968ea2157a19cf85
size 4321957440

Apollo-2.0-Llama-3.1-8B-Q3_K_M.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7acabdd309918c013b02751e721adc76ef85a5aa8aedfe1d53560d46b9330b02
size 4018918976

Apollo-2.0-Llama-3.1-8B-Q3_K_S.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1e4ef821d8bfcd583daa8c6782e52e9bc20e089f0e8f2a26b889a9d50c41dbd7
size 3664500288

Apollo-2.0-Llama-3.1-8B-Q3_K_XL.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:804f34d152fc9c9c41d8c3f7414582755188ad1596a11e9540f67abc1e0ae2d8
size 4781626944

Apollo-2.0-Llama-3.1-8B-Q4_0.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dcd679e808876c988350ca19eeda09f431f359ae3ad1759a46d8fb2beee85e3e
size 4675892800

Apollo-2.0-Llama-3.1-8B-Q4_0_4_4.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9d9dcd99e69adc210d6087a1a4e8ddafa87cce3353d6bac2eef7d85e1277d333
size 4661212736

Apollo-2.0-Llama-3.1-8B-Q4_0_4_8.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f63d8d7061894aa97c4e87bb8a23f8151e5632a7d8f33fa72249e278db706986
size 4661212736

Apollo-2.0-Llama-3.1-8B-Q4_0_8_8.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5014209d3a57f4599fde39b087fc8cd0dd1cdd028c22d74d5adcaf473b1b36de
size 4661212736

Apollo-2.0-Llama-3.1-8B-Q4_K_L.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f50179a5511fdca830adeeca014a7ce8fec2d04220b248dff089f0671e594502
size 5310633536

Apollo-2.0-Llama-3.1-8B-Q4_K_M.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:df9334b66cc07f9599ef1b2d6c6e13202d853095b4bb2a6a1e8d2cec85d49477
size 4920735296

Apollo-2.0-Llama-3.1-8B-Q4_K_S.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:35715261899b6b3470196d172ad0ff86e8ae1f6b4a41168bfccb093da82fef16
size 4692670016

Apollo-2.0-Llama-3.1-8B-Q5_K_L.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:08d60903583f8d5a2a5005e01d15f2c6d09d730478cb877d117afa11c553c1af
size 6057219648

Apollo-2.0-Llama-3.1-8B-Q5_K_M.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9333b32b0119512a0496c5fddd16e167e265eba843d08cebe3f486bcb2ca623b
size 5732988480

Apollo-2.0-Llama-3.1-8B-Q5_K_S.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7803d81b2d62a2624987575e54d9caffc558e7bd231ca1321d089c145a9d4782
size 5599295040

Apollo-2.0-Llama-3.1-8B-Q6_K.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:53e9969f07e1d4f053670a7405c554bb085dbea4aa398df76a97e6def590279e
size 6596007488

Apollo-2.0-Llama-3.1-8B-Q6_K_L.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d6f4185dc241473a62b2a40faeabc42850c57ca22d41d91c2b21e2ccf80b6f6e
size 6850467392

Apollo-2.0-Llama-3.1-8B-Q8_0.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0dd214467af833c3f8e47eee50f862867bd85946f102b6cac89e28b2b453a78b
size 8540771904

Apollo-2.0-Llama-3.1-8B-f16.gguf

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aad3ad4e0f109b6d5bd6c212030953cfe87a6ecd140313a727ceb52a828438e0
size 16068891936

Apollo-2.0-Llama-3.1-8B.imatrix

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:48d6216e5a66ea38240229762f007d2c4f19ef4f3b3654a8c987b24386a6e233
size 4988170

README.md

@@ -1,47 +1,132 @@
---
license: Apache License 2.0
#model-type:
## e.g. gpt, phi, llama, chatglm, baichuan, etc.
#- gpt
#domain:
## e.g. nlp, cv, audio, multi-modal
#- nlp
#language:
## list of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
#- cn
#metrics:
## e.g. CIDEr, BLEU, ROUGE, etc.
#- CIDEr
#tags:
## custom tags of all kinds, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
#- pretrained
#tools:
## e.g. vllm, fastchat, llamacpp, AdaSeq, etc.
#- vllm
base_model: Locutusque/Apollo-2.0-Llama-3.1-8B
datasets:
- Locutusque/ApolloRP-2.0-SFT
language:
- en
library_name: transformers
license: llama3.1
pipeline_tag: text-generation
tags:
- not-for-all-audiences
quantized_by: bartowski
---
### The contributors of this model have not provided a more detailed model description. The model files and weights can be browsed on the "Model Files" page.
#### You can download the model with the git clone command below, or with the ModelScope SDK
SDK download
```bash
# Install ModelScope
pip install modelscope
```
```python
# Download the model via the SDK
from modelscope import snapshot_download
model_dir = snapshot_download('bartowski/Apollo-2.0-Llama-3.1-8B-GGUF')
```
Git download
```
# Download the model via git
git clone https://www.modelscope.cn/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF.git
```
<p style="color: lightgrey;">If you are a contributor to this model, we invite you to complete the model card promptly, following the <a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">model contribution documentation</a>.</p>
## Llamacpp imatrix Quantizations of Apollo-2.0-Llama-3.1-8B
Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b3772">b3772</a> for quantization.
Original model: https://huggingface.co/Locutusque/Apollo-2.0-Llama-3.1-8B
All quants made using imatrix option with dataset from [here](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8)
Run them in [LM Studio](https://lmstudio.ai/)
## Prompt format
```
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
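If you want to drive this format from code, here is a minimal sketch using the `llama-cpp-python` bindings (an assumption on my part; any GGUF-capable runtime works). It presumes `pip install llama-cpp-python` and a locally downloaded quant, with an illustrative model path and settings:
```python
# Minimal sketch: load a quant and prompt it using the format above.
from llama_cpp import Llama

llm = Llama(model_path="Apollo-2.0-Llama-3.1-8B-Q4_K_M.gguf", n_ctx=8192)

# Fill the {system_prompt} and {prompt} slots of the template shown above.
prompt = (
    "<|im_start|>system\n{system_prompt}<|im_end|>\n"
    "<|im_start|>user\n{prompt}<|im_end|>\n"
    "<|im_start|>assistant\n"
).format(system_prompt="You are a helpful assistant.", prompt="Hello!")

out = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```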
## Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| [Apollo-2.0-Llama-3.1-8B-f16.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-f16.gguf) | f16 | 16.07GB | false | Full F16 weights. |
| [Apollo-2.0-Llama-3.1-8B-Q8_0.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q8_0.gguf) | Q8_0 | 8.54GB | false | Extremely high quality, generally unneeded but max available quant. |
| [Apollo-2.0-Llama-3.1-8B-Q6_K_L.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q6_K_L.gguf) | Q6_K_L | 6.85GB | false | Uses Q8_0 for embed and output weights. Very high quality, near perfect, *recommended*. |
| [Apollo-2.0-Llama-3.1-8B-Q6_K.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q6_K.gguf) | Q6_K | 6.60GB | false | Very high quality, near perfect, *recommended*. |
| [Apollo-2.0-Llama-3.1-8B-Q5_K_L.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q5_K_L.gguf) | Q5_K_L | 6.06GB | false | Uses Q8_0 for embed and output weights. High quality, *recommended*. |
| [Apollo-2.0-Llama-3.1-8B-Q5_K_M.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q5_K_M.gguf) | Q5_K_M | 5.73GB | false | High quality, *recommended*. |
| [Apollo-2.0-Llama-3.1-8B-Q5_K_S.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q5_K_S.gguf) | Q5_K_S | 5.60GB | false | High quality, *recommended*. |
| [Apollo-2.0-Llama-3.1-8B-Q4_K_L.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q4_K_L.gguf) | Q4_K_L | 5.31GB | false | Uses Q8_0 for embed and output weights. Good quality, *recommended*. |
| [Apollo-2.0-Llama-3.1-8B-Q4_K_M.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q4_K_M.gguf) | Q4_K_M | 4.92GB | false | Good quality, default size for most use cases, *recommended*. |
| [Apollo-2.0-Llama-3.1-8B-Q3_K_XL.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q3_K_XL.gguf) | Q3_K_XL | 4.78GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
| [Apollo-2.0-Llama-3.1-8B-Q4_K_S.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q4_K_S.gguf) | Q4_K_S | 4.69GB | false | Slightly lower quality with more space savings, *recommended*. |
| [Apollo-2.0-Llama-3.1-8B-Q4_0.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q4_0.gguf) | Q4_0 | 4.68GB | false | Legacy format, generally not worth using over similarly sized formats |
| [Apollo-2.0-Llama-3.1-8B-Q4_0_8_8.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q4_0_8_8.gguf) | Q4_0_8_8 | 4.66GB | false | Optimized for ARM inference. Requires 'sve' support (see link below). |
| [Apollo-2.0-Llama-3.1-8B-Q4_0_4_8.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q4_0_4_8.gguf) | Q4_0_4_8 | 4.66GB | false | Optimized for ARM inference. Requires 'i8mm' support (see link below). |
| [Apollo-2.0-Llama-3.1-8B-Q4_0_4_4.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q4_0_4_4.gguf) | Q4_0_4_4 | 4.66GB | false | Optimized for ARM inference. Should work well on all ARM chips, pick this if you're unsure. |
| [Apollo-2.0-Llama-3.1-8B-IQ4_XS.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-IQ4_XS.gguf) | IQ4_XS | 4.45GB | false | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
| [Apollo-2.0-Llama-3.1-8B-Q3_K_L.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q3_K_L.gguf) | Q3_K_L | 4.32GB | false | Lower quality but usable, good for low RAM availability. |
| [Apollo-2.0-Llama-3.1-8B-Q3_K_M.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q3_K_M.gguf) | Q3_K_M | 4.02GB | false | Low quality. |
| [Apollo-2.0-Llama-3.1-8B-IQ3_M.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-IQ3_M.gguf) | IQ3_M | 3.78GB | false | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| [Apollo-2.0-Llama-3.1-8B-Q2_K_L.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q2_K_L.gguf) | Q2_K_L | 3.69GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
| [Apollo-2.0-Llama-3.1-8B-Q3_K_S.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q3_K_S.gguf) | Q3_K_S | 3.66GB | false | Low quality, not recommended. |
| [Apollo-2.0-Llama-3.1-8B-IQ3_XS.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-IQ3_XS.gguf) | IQ3_XS | 3.52GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| [Apollo-2.0-Llama-3.1-8B-Q2_K.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-Q2_K.gguf) | Q2_K | 3.18GB | false | Very low quality but surprisingly usable. |
| [Apollo-2.0-Llama-3.1-8B-IQ2_M.gguf](https://huggingface.co/bartowski/Apollo-2.0-Llama-3.1-8B-GGUF/blob/main/Apollo-2.0-Llama-3.1-8B-IQ2_M.gguf) | IQ2_M | 2.95GB | false | Relatively low quality, uses SOTA techniques to be surprisingly usable. |
## Embed/output weights
Some of these quants (Q3_K_XL, Q4_K_L, etc.) are the standard quantization method with the embeddings and output weights quantized to Q8_0 instead of what they would normally default to.
Some say that this improves the quality, others don't notice any difference. If you use these models PLEASE COMMENT with your findings. I would like feedback that these are actually used and useful so I don't keep uploading quants no one is using.
Thanks!
## Downloading using huggingface-cli
First, make sure you have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
Then, you can target the specific file you want:
```
huggingface-cli download bartowski/Apollo-2.0-Llama-3.1-8B-GGUF --include "Apollo-2.0-Llama-3.1-8B-Q4_K_M.gguf" --local-dir ./
```
If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
```
huggingface-cli download bartowski/Apollo-2.0-Llama-3.1-8B-GGUF --include "Apollo-2.0-Llama-3.1-8B-Q8_0/*" --local-dir ./
```
You can either specify a new local-dir (Apollo-2.0-Llama-3.1-8B-Q8_0) or download them all in place (./)
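The same downloads can also be scripted with the `huggingface_hub` Python API. A minimal sketch, assuming `huggingface_hub` is installed and using the Q4_K_M file as an illustrative target:
```python
# Minimal sketch: fetch one quant file, mirroring the CLI commands above.
from huggingface_hub import hf_hub_download, snapshot_download

path = hf_hub_download(
    repo_id="bartowski/Apollo-2.0-Llama-3.1-8B-GGUF",
    filename="Apollo-2.0-Llama-3.1-8B-Q4_K_M.gguf",
    local_dir=".",
)
print(path)  # local path to the downloaded file

# For a quant split into multiple files, mirror the --include pattern instead:
snapshot_download(
    repo_id="bartowski/Apollo-2.0-Llama-3.1-8B-GGUF",
    allow_patterns=["Apollo-2.0-Llama-3.1-8B-Q8_0/*"],
    local_dir=".",
)
```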
## Q4_0_X_X
These are *NOT* for Metal (Apple) offloading, only ARM chips.
If you're using an ARM chip, the Q4_0_X_X quants will have a substantial speedup. Check out Q4_0_4_4 speed comparisons [on the original pull request](https://github.com/ggerganov/llama.cpp/pull/5780#pullrequestreview-21657544660)
To check which one would work best for your ARM chip, you can check [AArch64 SoC features](https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html) (thanks EloyOn!).
## Which file should I choose?
A great write-up with charts comparing the performance of various quants is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9)
The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.
If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.
If you want the absolute maximum quality, add both your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.
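To make the arithmetic concrete, here is a hypothetical helper (not from the original card): the sizes come from the table above, and the 2GB headroom follows the rule of thumb in this section.
```python
# Minimal sketch of the sizing rule: pick the largest quant whose file size
# fits under your memory budget minus some headroom.
QUANT_SIZES_GB = {  # subset of the table above; add rows as needed
    "Q8_0": 8.54, "Q6_K_L": 6.85, "Q6_K": 6.60, "Q5_K_M": 5.73,
    "Q4_K_M": 4.92, "IQ4_XS": 4.45, "Q3_K_M": 4.02, "IQ3_M": 3.78,
    "Q2_K": 3.18, "IQ2_M": 2.95,
}

def largest_fitting_quant(budget_gb: float, headroom_gb: float = 2.0) -> str | None:
    """Largest quant that fits in budget_gb (VRAM, or RAM + VRAM) minus headroom."""
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= budget_gb - headroom_gb}
    return max(fitting, key=fitting.get) if fitting else None

print(largest_fitting_quant(8.0))   # 8 GB VRAM  -> Q5_K_M
print(largest_fitting_quant(24.0))  # 24 GB VRAM -> Q8_0
```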
Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.
If you don't want to think too much, grab one of the K-quants. These are in format 'QX_K_X', like Q5_K_M.
If you want to get more into the weeds, you can check out this extremely useful feature chart:
[llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix)
But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in format IQX_X, like IQ3_M. These are newer and offer better performance for their size.
These I-quants can also be used on CPU and Apple Metal, but will be slower than their K-quant equivalent, so speed vs performance is a tradeoff you'll have to decide.
The I-quants are *not* compatible with Vulkan, which is also AMD, so if you have an AMD card double check if you're using the rocBLAS build or the Vulkan build. At the time of writing this, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.
## Credits
Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset
Thank you ZeroWw for the inspiration to experiment with embed/output
Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski

configuration.json

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}