Update metadata with huggingface_hub

This commit is contained in:
ai-modelscope
2024-11-26 06:13:02 +08:00
parent cca56ce3d3
commit 731cecdf1d
28 changed files with 228 additions and 63 deletions
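A metadata change like this one is the kind of edit that can be scripted with `huggingface_hub`'s model-card utilities. A minimal sketch of how such an update might be done (illustrative only; not necessarily the exact tool or code used for this commit):

```python
from huggingface_hub import ModelCard

# Load the existing model card, update its YAML metadata, and push it back.
card = ModelCard.load("bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF")
card.data.library_name = "transformers"     # fields added in this commit's README diff
card.data.pipeline_tag = "text-generation"
card.push_to_hub("bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF")
```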

.gitattributes

@@ -1,47 +1,60 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q6_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q5_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q4_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q4_0_8_8.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q4_0_4_8.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q4_0_4_4.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q3_K_XL.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-IQ3_XS.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q2_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-IQ2_M.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B-f16.gguf filter=lfs diff=lfs merge=lfs -text
Gemma2-Gutenberg-Doppel-9B.imatrix filter=lfs diff=lfs merge=lfs -text

Gemma2-Gutenberg-Doppel-9B-IQ2_M.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9df2e91d096b88da0889be78830d420319c358daf65657d9cdde27093666fff9
size 3434669728

Gemma2-Gutenberg-Doppel-9B-IQ3_M.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cebf0535f8a487e5656da6461f55d7a005a2b520f5ecd4bd3ef9097ec9d8f0b3
size 4494616224

Gemma2-Gutenberg-Doppel-9B-IQ3_XS.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c82a2a08d1673d0a83eb7793b7431779376bde2e10cfec3b7d41f3d1c8cbabaf
size 4144989856

Gemma2-Gutenberg-Doppel-9B-IQ4_XS.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:89106820540de62d360064fb2b8b11b136a4ae22d545d58fd309f018c929f1fc
size 5183030944

Gemma2-Gutenberg-Doppel-9B-Q2_K.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6fe86ab0d339f637e1fdedb77ff5f932f30a4a1e764d4f1bd066ff08d5eb55fe
size 3805398688

Gemma2-Gutenberg-Doppel-9B-Q2_K_L.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:073c8e4f180c5495a95ac7c9ecd7975ed1711bff6c735d6a131520dccab97545
size 4027606688

Gemma2-Gutenberg-Doppel-9B-Q3_K_L.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:727698bd78232010c68600554c1ffad02c48df9ebb4650a3234f2413d01b1d7d
size 5132453536

Gemma2-Gutenberg-Doppel-9B-Q3_K_M.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a9d32f1bdf7be299bc99d8d03d490054da85b04af1f25c09e6d29ebe971b88a5
size 4761781920

Gemma2-Gutenberg-Doppel-9B-Q3_K_S.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d474f205dc16b7cab7feaaceaee7c83853a295acadea7a099c17dacd96707fca
size 4337665696

Gemma2-Gutenberg-Doppel-9B-Q3_K_XL.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dc611a06ba05ad5eb2eb9240eaea673dc94026d70d47c110d07712164be0368e
size 5354661536

Gemma2-Gutenberg-Doppel-9B-Q4_0.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:59f7df904c440306cedde2e06851b12c1ec7ec6846ea79fec74b6ba938e21ed0
size 5459199648

Gemma2-Gutenberg-Doppel-9B-Q4_0_4_4.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7aeb03bc52dfd1257f58974fb6251a4c0bf917bad3ac7fff060e4552dd7988ba
size 5443143328

Gemma2-Gutenberg-Doppel-9B-Q4_0_4_8.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:faaf2bedfd90dc6ce221def53c5a01f73907afe23cc237d8a519244045a07ec9
size 5443143328

Gemma2-Gutenberg-Doppel-9B-Q4_0_8_8.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a4d4cf2c57fdf710311bb15197dc9b7a96bc65e7cc91879752b14e95f09c38ea
size 5443143328

Gemma2-Gutenberg-Doppel-9B-Q4_K_L.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8a550abee0371c312e6c12bc5f57b4aa9573eb76d9cd184ef979f7dc0cd4af31
size 5983266464

Gemma2-Gutenberg-Doppel-9B-Q4_K_M.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:77608393efbf71eaf24cda1b663a4e64bb076085effb9d34e8a654e3a93f726f
size 5761058464

Gemma2-Gutenberg-Doppel-9B-Q4_K_S.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ce3b9b1df20fcf3d9ca8b967ed8a289d311d060a9491d7de886bc31b86f1be96
size 5478925984

Gemma2-Gutenberg-Doppel-9B-Q5_K_L.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e0de0fad968b4dc3a359db00825198d9df07ca82cd1f97b380a5cf0a6ff9f62e
size 6869575328

Gemma2-Gutenberg-Doppel-9B-Q5_K_M.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9d40ae8a544552ae99b6321fbba7faf994ad0e0c5cbc02f3708b5e90f4f0a497
size 6647367328

Gemma2-Gutenberg-Doppel-9B-Q5_K_S.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7f3f81560be544e141a8b3873339ce392f2d9663d394edc732d773c38cef3cc1
size 6483592864

Gemma2-Gutenberg-Doppel-9B-Q6_K.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:46dd5146d54a60853894f5b7f7a10a7af58b85ce4373da86182aff32136379fa
size 7589070496

Gemma2-Gutenberg-Doppel-9B-Q6_K_L.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3e375e814422acc3c7904cc2e586539599a54d0ed79e05ff2c91d2a142b17683
size 7811278496

Gemma2-Gutenberg-Doppel-9B-Q8_0.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d8b1286be0117f2bb59016fd48c5bbe19320b8e451b68f3c013a37768e9bf04d
size 9827149472

Gemma2-Gutenberg-Doppel-9B-f16.gguf Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b5433e3a9c02a35b9522afeac92dfc69857766a4b375060955f91c02e1aac0c9
size 18490680704

Gemma2-Gutenberg-Doppel-9B.imatrix Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1e43696d5896010989c36091afa0e8f31958fa56bcfa9fb57b40157419b2fc65
size 6116900

README.md

@@ -1,47 +1,123 @@
---
#model-type:
## e.g. gpt, phi, llama, chatglm, baichuan, etc.
#- gpt
#domain:
## e.g. nlp, cv, audio, multi-modal
#- nlp
#language:
## list of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
#- cn
#metrics:
## e.g. CIDEr, BLEU, ROUGE, etc.
#- CIDEr
#tags:
## anything custom, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, etc.
#- pretrained
#tools:
## e.g. vllm, fastchat, llamacpp, AdaSeq, etc.
#- vllm
base_model: nbeerbower/Gemma2-Gutenberg-Doppel-9B
datasets:
- jondurbin/gutenberg-dpo-v0.1
- nbeerbower/gutenberg2-dpo
library_name: transformers
license: gemma
pipeline_tag: text-generation
quantized_by: bartowski
---
### The contributors of this model have not provided a more detailed model description. The model files and weights can be browsed on the "Model Files" page.
#### You can download the model with the git clone command below or via the ModelScope SDK
SDK download
```bash
# Install ModelScope
pip install modelscope
```
```python
# Download the model via the ModelScope SDK
from modelscope import snapshot_download
model_dir = snapshot_download('bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF')
```
Git download
```
# Download the model via git
git clone https://www.modelscope.cn/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF.git
```
<p style="color: lightgrey;">If you are a contributor to this model, we invite you to complete the model card promptly in accordance with the <a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">model contribution documentation</a>.</p>
## Llamacpp imatrix Quantizations of Gemma2-Gutenberg-Doppel-9B
Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b3841">b3841</a> for quantization.
Original model: https://huggingface.co/nbeerbower/Gemma2-Gutenberg-Doppel-9B
All quants made using imatrix option with dataset from [here](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8)
Run them in [LM Studio](https://lmstudio.ai/)
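If you prefer scripting inference over a GUI, these GGUFs can also be loaded with the third-party `llama-cpp-python` bindings. A minimal sketch, assuming you have installed `llama-cpp-python` and already downloaded one of the quants below:
```python
from llama_cpp import Llama

# Load a local GGUF quant; n_gpu_layers=-1 offloads all layers to the GPU if one is available.
llm = Llama(model_path="Gemma2-Gutenberg-Doppel-9B-Q4_K_M.gguf", n_gpu_layers=-1)

out = llm("Write an opening line in the style of a 19th-century novel.", max_tokens=64)
print(out["choices"][0]["text"])
```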
## Prompt format
No prompt format found; check the original model page.
## Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| [Gemma2-Gutenberg-Doppel-9B-f16.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-f16.gguf) | f16 | 18.49GB | false | Full F16 weights. |
| [Gemma2-Gutenberg-Doppel-9B-Q8_0.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q8_0.gguf) | Q8_0 | 9.83GB | false | Extremely high quality, generally unneeded but max available quant. |
| [Gemma2-Gutenberg-Doppel-9B-Q6_K_L.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q6_K_L.gguf) | Q6_K_L | 7.81GB | false | Uses Q8_0 for embed and output weights. Very high quality, near perfect, *recommended*. |
| [Gemma2-Gutenberg-Doppel-9B-Q6_K.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q6_K.gguf) | Q6_K | 7.59GB | false | Very high quality, near perfect, *recommended*. |
| [Gemma2-Gutenberg-Doppel-9B-Q5_K_L.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q5_K_L.gguf) | Q5_K_L | 6.87GB | false | Uses Q8_0 for embed and output weights. High quality, *recommended*. |
| [Gemma2-Gutenberg-Doppel-9B-Q5_K_M.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q5_K_M.gguf) | Q5_K_M | 6.65GB | false | High quality, *recommended*. |
| [Gemma2-Gutenberg-Doppel-9B-Q5_K_S.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q5_K_S.gguf) | Q5_K_S | 6.48GB | false | High quality, *recommended*. |
| [Gemma2-Gutenberg-Doppel-9B-Q4_K_L.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q4_K_L.gguf) | Q4_K_L | 5.98GB | false | Uses Q8_0 for embed and output weights. Good quality, *recommended*. |
| [Gemma2-Gutenberg-Doppel-9B-Q4_K_M.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q4_K_M.gguf) | Q4_K_M | 5.76GB | false | Good quality, default size for most use cases, *recommended*. |
| [Gemma2-Gutenberg-Doppel-9B-Q4_K_S.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q4_K_S.gguf) | Q4_K_S | 5.48GB | false | Slightly lower quality with more space savings, *recommended*. |
| [Gemma2-Gutenberg-Doppel-9B-Q4_0.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q4_0.gguf) | Q4_0 | 5.46GB | false | Legacy format, generally not worth using over similarly sized formats. |
| [Gemma2-Gutenberg-Doppel-9B-Q4_0_8_8.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q4_0_8_8.gguf) | Q4_0_8_8 | 5.44GB | false | Optimized for ARM inference. Requires 'sve' support (see link below). |
| [Gemma2-Gutenberg-Doppel-9B-Q4_0_4_8.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q4_0_4_8.gguf) | Q4_0_4_8 | 5.44GB | false | Optimized for ARM inference. Requires 'i8mm' support (see link below). |
| [Gemma2-Gutenberg-Doppel-9B-Q4_0_4_4.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q4_0_4_4.gguf) | Q4_0_4_4 | 5.44GB | false | Optimized for ARM inference. Should work well on all ARM chips, pick this if you're unsure. |
| [Gemma2-Gutenberg-Doppel-9B-Q3_K_XL.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q3_K_XL.gguf) | Q3_K_XL | 5.35GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
| [Gemma2-Gutenberg-Doppel-9B-IQ4_XS.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-IQ4_XS.gguf) | IQ4_XS | 5.18GB | false | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
| [Gemma2-Gutenberg-Doppel-9B-Q3_K_L.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q3_K_L.gguf) | Q3_K_L | 5.13GB | false | Lower quality but usable, good for low RAM availability. |
| [Gemma2-Gutenberg-Doppel-9B-Q3_K_M.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q3_K_M.gguf) | Q3_K_M | 4.76GB | false | Low quality. |
| [Gemma2-Gutenberg-Doppel-9B-IQ3_M.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-IQ3_M.gguf) | IQ3_M | 4.49GB | false | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| [Gemma2-Gutenberg-Doppel-9B-Q3_K_S.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q3_K_S.gguf) | Q3_K_S | 4.34GB | false | Low quality, not recommended. |
| [Gemma2-Gutenberg-Doppel-9B-IQ3_XS.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-IQ3_XS.gguf) | IQ3_XS | 4.14GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| [Gemma2-Gutenberg-Doppel-9B-Q2_K_L.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q2_K_L.gguf) | Q2_K_L | 4.03GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
| [Gemma2-Gutenberg-Doppel-9B-Q2_K.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-Q2_K.gguf) | Q2_K | 3.81GB | false | Very low quality but surprisingly usable. |
| [Gemma2-Gutenberg-Doppel-9B-IQ2_M.gguf](https://huggingface.co/bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF/blob/main/Gemma2-Gutenberg-Doppel-9B-IQ2_M.gguf) | IQ2_M | 3.43GB | false | Relatively low quality, uses SOTA techniques to be surprisingly usable. |
## Embed/output weights
Some of these quants (Q3_K_XL, Q4_K_L, etc.) use the standard quantization method with the embeddings and output weights quantized to Q8_0 instead of their usual default.
Some say this improves the quality, others don't notice any difference. If you use these models, PLEASE COMMENT with your findings. I would like feedback that these are actually used and useful, so I don't keep uploading quants no one is using.
Thanks!
## Downloading using huggingface-cli
First, make sure you have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
Then, you can target the specific file you want:
```
huggingface-cli download bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF --include "Gemma2-Gutenberg-Doppel-9B-Q4_K_M.gguf" --local-dir ./
```
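The same single-file download can be done from Python with `huggingface_hub` directly (a sketch; swap in whichever quant you chose from the table):
```python
from huggingface_hub import hf_hub_download

# Fetch one specific quant into the current directory.
path = hf_hub_download(
    repo_id="bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF",
    filename="Gemma2-Gutenberg-Doppel-9B-Q4_K_M.gguf",
    local_dir="./",
)
print(path)  # local path of the downloaded GGUF
```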
If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
```
huggingface-cli download bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF --include "Gemma2-Gutenberg-Doppel-9B-Q8_0/*" --local-dir ./
```
You can either specify a new local-dir (Gemma2-Gutenberg-Doppel-9B-Q8_0) or download them all in place (./).
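From Python, the split-folder case maps to `snapshot_download` with an `allow_patterns` filter (a sketch under the same assumptions as the CLI command above):
```python
from huggingface_hub import snapshot_download

# Download every shard matching the prefix into a local folder.
snapshot_download(
    repo_id="bartowski/Gemma2-Gutenberg-Doppel-9B-GGUF",
    allow_patterns=["Gemma2-Gutenberg-Doppel-9B-Q8_0/*"],
    local_dir="./",
)
```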
## Q4_0_X_X
These are *NOT* for Metal (Apple) offloading, only ARM chips.
If you're using an ARM chip, the Q4_0_X_X quants will have a substantial speedup. Check out Q4_0_4_4 speed comparisons [on the original pull request](https://github.com/ggerganov/llama.cpp/pull/5780#pullrequestreview-21657544660)
To check which one would work best for your ARM chip, you can check [AArch64 SoC features](https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html) (thanks EloyOn!).
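On ARM Linux you can usually read the relevant CPU flags straight from `/proc/cpuinfo` (a rough sketch; most ARM kernels expose these flags on the `Features` line, though names can vary by kernel version):
```python
# Rough check for the CPU features the ARM-optimized quants rely on.
def arm_features() -> set[str]:
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("Features"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

feats = arm_features()
print("Q4_0_8_8 wants 'sve': ", "sve" in feats)
print("Q4_0_4_8 wants 'i8mm':", "i8mm" in feats)
```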
## Which file should I choose?
A great write-up with charts showing various performances is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9).
The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.
If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.
If you want the absolute maximum quality, add both your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.
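To make the sizing rule concrete, here is a small illustrative helper; the sizes are a subset copied from the table above, and the 1.5 GB headroom is just the midpoint of the 1-2GB rule of thumb:
```python
# Illustrative only: pick the largest quant that leaves ~1.5 GB of headroom.
QUANT_SIZES_GB = {
    "f16": 18.49, "Q8_0": 9.83, "Q6_K_L": 7.81, "Q6_K": 7.59,
    "Q5_K_L": 6.87, "Q5_K_M": 6.65, "Q5_K_S": 6.48, "Q4_K_L": 5.98,
    "Q4_K_M": 5.76, "Q4_K_S": 5.48, "IQ4_XS": 5.18, "Q3_K_L": 5.13,
    "IQ3_M": 4.49, "Q2_K": 3.81, "IQ2_M": 3.43,
}

def pick_quant(budget_gb: float, headroom_gb: float = 1.5) -> str | None:
    """Return the largest quant whose file fits within budget_gb minus headroom."""
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= budget_gb - headroom_gb}
    return max(fitting, key=fitting.get) if fitting else None

print(pick_quant(8.0))   # 8 GB of VRAM  -> 'Q5_K_S' (6.48 GB)
print(pick_quant(24.0))  # 24 GB of VRAM -> 'f16' (18.49 GB)
```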
Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.
If you don't want to think too much, grab one of the K-quants. These are in format 'QX_K_X', like Q5_K_M.
If you want to get more into the weeds, you can check out this extremely useful feature chart:
[llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix)
But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in format IQX_X, like IQ3_M. These are newer and offer better performance for their size.
These I-quants can also be used on CPU and Apple Metal, but will be slower than their K-quant equivalent, so speed vs performance is a tradeoff you'll have to decide.
The I-quants are *not* compatible with Vulkan, which also supports AMD, so if you have an AMD card, double check whether you're using the rocBLAS build or the Vulkan build. At the time of writing this, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.
## Credits
Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset.
Thank you ZeroWw for the inspiration to experiment with embed/output weights.
Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski

configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}