Llamacpp quants

ai-modelscope
2024-11-29 08:34:09 +08:00
parent c67b9cce2e
commit 520e0f0ac5
26 changed files with 356 additions and 63 deletions

.gitattributes (vendored): 55 changed lines

@@ -1,47 +1,58 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-IQ1_M.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-IQ1_S.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-IQ2_M.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-IQ2_S.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-IQ2_XS.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-IQ2_XXS.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-IQ3_S.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-IQ3_XS.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-IQ3_XXS.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-IQ4_NL.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
Einstein-v6.1-Llama3-8B.imatrix filter=lfs diff=lfs merge=lfs -text

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:21f86f83100d4faff8d856dc5b70ac99faf57ea64189a42dd8308b67f8d8f97c
size 2161988480

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2531e4e4ebdc74a686ad87d5d15f4904ada163c15334dbfb50765f99068bac50
size 2019644288

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:23b32985e942521894f2110395f7f2ddbdcb1579d12ab37ee73805e993293e4b
size 2948299264

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f391fd4acb522dbeee5747b60ed209a6efa310fe1324f18c205488f8d745f1a3
size 2758507008

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9393bbba20984793115ab632ece5e6f66a7b79b287abbdb8a3369a390e3c5439
size 2605798272

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5f0d41fe152c6ccf26d1a9d33b08af3fcc8e5e48f77da2b225f8fa10eb9838a5
size 2399228800

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f8d0d1b8c316add6a983df354c811155a87731b9f727d41f3122cc953f50aeea
size 3784843904

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:82fd93002aa3e62fbc407476cf942c12945f34ead033ba44c71e9ac1e91afeb7
size 3682345600

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d285f7591dbbf02a35196bbd551dfcbc64ca4bb96d05e1a844981ad965a076f0
size 3518767744

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8aa53ced184d36bd7f438d4fb80ebe24e7b27a09b81676316063b49098429f3b
size 3274930688

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ff8d91e769c701ae25f66565810d81b868a6f7249223852d79da65032b124cdb
size 4678011648

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7a7043679df0c45add3b05e404fa8c9fd36db538deae39f569004ab995f4fcf2
size 4447684864

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2298db15ff26cadffe2eb15307e2eb6d278b2a56751ddd6db7f061f2996353f5
size 3179150336

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4a757b5ed01f279db926ee34e61ab631ead1c5694a36d3da01ae08197af0ee1c
size 4321976960

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c010cab23b319e1a20291109a5a99102a27e3b0a4dbd6653caedbe7ea189a320
size 4018938496

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9712cf94bf34af86e537ba32bf0ba392e8894cb6e01db232c182c5e139c22c91
size 3664519808

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:447587bd8f60d9050232148d34fdb2d88b15b2413fd7f8e095a4606ec60b45bf
size 4920756992

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:827bbbcaa31ac9cca7b49b60fa4092884e4dea878433153fe4f3b2efb59c0c7a
size 4692691712

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d34b6328341946a78e691f0de36b1a595efba37156410a6bd2d7ce4481b6530
size 5733012224

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a2242ad8c5152aadfdb1353b1759a0d4a6eaa207c919e781c5707c983e154bed
size 5599318784

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:380964006b5eef42f64ff784398742d28f8c3c978f31ba757df4e9e5cee9a016
size 6596033408

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b381f0d74fe7793a8551e2bf08de7b83ceb2b3935c260b45b946c9a1fb6a5cb5
size 8540805760

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a2661c59f42b44d840d48534e20209a871d868a2e0fd697711e8171389a40539
size 4988166
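Each LFS pointer above records the real file's SHA-256 hash (`oid sha256:...`) and its byte `size`, in the Git LFS v1 pointer format. The sketch below checks a downloaded GGUF against the pointer text you copied for it. It assumes that the pointer uses `sha256`, which is the case for every pointer here. The function names `parse_lfs_pointer` and `verify_against_pointer` are hypothetical helpers, not part of any Git LFS tool.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS v1 pointer ("key value" per line) into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid is stored as "<algorithm>:<hex digest>", e.g. "sha256:21f8...".
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def verify_against_pointer(path: str, pointer_text: str,
                           chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches the pointer's size and hash."""
    ptr = parse_lfs_pointer(pointer_text)
    if ptr["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {ptr['algo']}")
    # Cheap check first: a size mismatch means a truncated or wrong file.
    if os.path.getsize(path) != ptr["size"]:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so multi-GB GGUF files never sit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == ptr["digest"]
```

The size comparison runs first because it costs nothing. Hashing an 8 GB file takes noticeably longer.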

README.md: 294 changed lines

@@ -1,47 +1,259 @@
---
license: Apache License 2.0
#model-type:
## e.g. gpt, phi, llama, chatglm, baichuan
#- gpt
#domain:
## e.g. nlp, cv, audio, multi-modal
#- nlp
#language:
## List of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
#- cn
#metrics:
## e.g. CIDEr, BLEU, ROUGE
#- CIDEr
#tags:
## Any custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
#- pretrained
#tools:
## e.g. vllm, fastchat, llamacpp, AdaSeq
#- vllm
language:
- en
license: other
tags:
- axolotl
- generated_from_trainer
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- science
- physics
- chemistry
- biology
- math
- llama
- llama3
base_model: meta-llama/Meta-Llama-3-8B
datasets:
- allenai/ai2_arc
- camel-ai/physics
- camel-ai/chemistry
- camel-ai/biology
- camel-ai/math
- metaeval/reclor
- openbookqa
- mandyyyyii/scibench
- derek-thomas/ScienceQA
- TIGER-Lab/ScienceEval
- jondurbin/airoboros-3.2
- LDJnr/Capybara
- Cot-Alpaca-GPT4-From-OpenHermes-2.5
- STEM-AI-mtl/Electrical-engineering
- knowrohit07/saraswati-stem
- sablo/oasst2_curated
- lmsys/lmsys-chat-1m
- TIGER-Lab/MathInstruct
- bigbio/med_qa
- meta-math/MetaMathQA-40K
- openbookqa
- piqa
- metaeval/reclor
- derek-thomas/ScienceQA
- scibench
- sciq
- Open-Orca/SlimOrca
- migtissera/Synthia-v1.3
- TIGER-Lab/ScienceEval
- allenai/WildChat
- microsoft/orca-math-word-problems-200k
- openchat/openchat_sharegpt4_dataset
- teknium/GPTeacher-General-Instruct
- m-a-p/CodeFeedback-Filtered-Instruction
- totally-not-an-llm/EverythingLM-data-V3
- HuggingFaceH4/no_robots
- OpenAssistant/oasst_top1_2023-08-25
- WizardLM/WizardLM_evol_instruct_70k
model-index:
- name: Einstein-v6.1-Llama3-8B
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: AI2 Reasoning Challenge (25-Shot)
type: ai2_arc
config: ARC-Challenge
split: test
args:
num_few_shot: 25
metrics:
- type: acc_norm
value: 62.46
name: normalized accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: HellaSwag (10-Shot)
type: hellaswag
split: validation
args:
num_few_shot: 10
metrics:
- type: acc_norm
value: 82.41
name: normalized accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU (5-Shot)
type: cais/mmlu
config: all
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 66.19
name: accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: TruthfulQA (0-shot)
type: truthful_qa
config: multiple_choice
split: validation
args:
num_few_shot: 0
metrics:
- type: mc2
value: 55.1
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Winogrande (5-shot)
type: winogrande
config: winogrande_xl
split: validation
args:
num_few_shot: 5
metrics:
- type: acc
value: 79.32
name: accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GSM8k (5-shot)
type: gsm8k
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 66.11
name: accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v6.1-Llama3-8B
name: Open LLM Leaderboard
quantized_by: bartowski
pipeline_tag: text-generation
---
### The contributor of this model has not provided a more detailed model description. The model files and weights are available on the "Model Files" page.
#### You can download the model with the git clone command below or with the ModelScope SDK
SDK download
```bash
# Install ModelScope
pip install modelscope
```
```python
# Download the model with the ModelScope SDK
from modelscope import snapshot_download
model_dir = snapshot_download('bartowski/Einstein-v6.1-Llama3-8B-GGUF')
```
Git download
```
# Download the model with Git
git clone https://www.modelscope.cn/bartowski/Einstein-v6.1-Llama3-8B-GGUF.git
```
<p style="color: lightgrey;">If you are a contributor to this model, we invite you to complete the model card promptly, following the <a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">model contribution documentation</a>.</p>
## Llamacpp imatrix Quantizations of Einstein-v6.1-Llama3-8B
Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b2777">b2777</a> for quantization.
Original model: https://huggingface.co/Weyaxi/Einstein-v6.1-Llama3-8B
All quants were made using the imatrix option with the dataset provided by Kalomaze [here](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384).
## Prompt format
```
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
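If your runtime does not apply the ChatML template for you, you can fill it in yourself. The sketch below builds the prompt string from a list of role/content messages and leaves the assistant turn open so the model continues from there. It assumes a newline after `<|im_start|>assistant`, the usual ChatML convention. `build_chatml_prompt` is a hypothetical helper, not part of llama.cpp.

```python
def build_chatml_prompt(messages: list[dict[str, str]]) -> str:
    """Render messages like {"role": "user", "content": "..."} as ChatML."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    # Open the assistant turn; the model's reply is generated after this.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

For example, `build_chatml_prompt([{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is entropy?"}])` fills the `{system_prompt}` and `{prompt}` slots in the template. For multi-turn chats, append earlier user and assistant turns to the list in order.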
## Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Description |
| -------- | ---------- | --------- | ----------- |
| [Einstein-v6.1-Llama3-8B-Q8_0.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q8_0.gguf) | Q8_0 | 8.54GB | Extremely high quality, generally unneeded but max available quant. |
| [Einstein-v6.1-Llama3-8B-Q6_K.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q6_K.gguf) | Q6_K | 6.59GB | Very high quality, near perfect, *recommended*. |
| [Einstein-v6.1-Llama3-8B-Q5_K_M.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q5_K_M.gguf) | Q5_K_M | 5.73GB | High quality, *recommended*. |
| [Einstein-v6.1-Llama3-8B-Q5_K_S.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q5_K_S.gguf) | Q5_K_S | 5.59GB | High quality, *recommended*. |
| [Einstein-v6.1-Llama3-8B-Q4_K_M.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q4_K_M.gguf) | Q4_K_M | 4.92GB | Good quality, uses about 4.83 bits per weight, *recommended*. |
| [Einstein-v6.1-Llama3-8B-Q4_K_S.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q4_K_S.gguf) | Q4_K_S | 4.69GB | Slightly lower quality with more space savings, *recommended*. |
| [Einstein-v6.1-Llama3-8B-IQ4_NL.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ4_NL.gguf) | IQ4_NL | 4.67GB | Decent quality, slightly smaller than Q4_K_S with similar performance, *recommended*. |
| [Einstein-v6.1-Llama3-8B-IQ4_XS.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ4_XS.gguf) | IQ4_XS | 4.44GB | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
| [Einstein-v6.1-Llama3-8B-Q3_K_L.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q3_K_L.gguf) | Q3_K_L | 4.32GB | Lower quality but usable, good for low RAM availability. |
| [Einstein-v6.1-Llama3-8B-Q3_K_M.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q3_K_M.gguf) | Q3_K_M | 4.01GB | Even lower quality. |
| [Einstein-v6.1-Llama3-8B-IQ3_M.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ3_M.gguf) | IQ3_M | 3.78GB | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| [Einstein-v6.1-Llama3-8B-IQ3_S.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ3_S.gguf) | IQ3_S | 3.68GB | Lower quality, new method with decent performance, recommended over Q3_K_S quant, same size with better performance. |
| [Einstein-v6.1-Llama3-8B-Q3_K_S.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q3_K_S.gguf) | Q3_K_S | 3.66GB | Low quality, not recommended. |
| [Einstein-v6.1-Llama3-8B-IQ3_XS.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ3_XS.gguf) | IQ3_XS | 3.51GB | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| [Einstein-v6.1-Llama3-8B-IQ3_XXS.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ3_XXS.gguf) | IQ3_XXS | 3.27GB | Lower quality, new method with decent performance, comparable to Q3 quants. |
| [Einstein-v6.1-Llama3-8B-Q2_K.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-Q2_K.gguf) | Q2_K | 3.17GB | Very low quality but surprisingly usable. |
| [Einstein-v6.1-Llama3-8B-IQ2_M.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ2_M.gguf) | IQ2_M | 2.94GB | Very low quality, uses SOTA techniques to also be surprisingly usable. |
| [Einstein-v6.1-Llama3-8B-IQ2_S.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ2_S.gguf) | IQ2_S | 2.75GB | Very low quality, uses SOTA techniques to be usable. |
| [Einstein-v6.1-Llama3-8B-IQ2_XS.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ2_XS.gguf) | IQ2_XS | 2.60GB | Very low quality, uses SOTA techniques to be usable. |
| [Einstein-v6.1-Llama3-8B-IQ2_XXS.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ2_XXS.gguf) | IQ2_XXS | 2.39GB | Lower quality, uses SOTA techniques to be usable. |
| [Einstein-v6.1-Llama3-8B-IQ1_M.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ1_M.gguf) | IQ1_M | 2.16GB | Extremely low quality, *not* recommended. |
| [Einstein-v6.1-Llama3-8B-IQ1_S.gguf](https://huggingface.co/bartowski/Einstein-v6.1-Llama3-8B-GGUF/blob/main/Einstein-v6.1-Llama3-8B-IQ1_S.gguf) | IQ1_S | 2.01GB | Extremely low quality, *not* recommended. |
## Downloading using huggingface-cli
First, make sure you have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
Then, you can target the specific file you want:
```
huggingface-cli download bartowski/Einstein-v6.1-Llama3-8B-GGUF --include "Einstein-v6.1-Llama3-8B-Q4_K_M.gguf" --local-dir ./ --local-dir-use-symlinks False
```
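If you prefer to stay in Python, `huggingface_hub.hf_hub_download` fetches one file from the same repo. This is roughly what the CLI command above does with `--include` and `--local-dir`. The sketch assumes `huggingface_hub` is installed. `gguf_filename` and `download_quant` are hypothetical helpers that follow the file naming used in the table above.

```python
def gguf_filename(quant: str, base: str = "Einstein-v6.1-Llama3-8B") -> str:
    """Build a file name following the pattern in the table, e.g. ...-Q4_K_M.gguf."""
    return f"{base}-{quant}.gguf"


def download_quant(quant: str, local_dir: str = ".") -> str:
    """Download a single quant file and return its local path."""
    # Imported lazily so gguf_filename() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="bartowski/Einstein-v6.1-Llama3-8B-GGUF",
        filename=gguf_filename(quant),
        local_dir=local_dir,
    )
```

Calling `download_quant("Q4_K_M")` fetches the same file as the CLI example.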
If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
```
huggingface-cli download bartowski/Einstein-v6.1-Llama3-8B-GGUF --include "Einstein-v6.1-Llama3-8B-Q8_0.gguf/*" --local-dir Einstein-v6.1-Llama3-8B-Q8_0 --local-dir-use-symlinks False
```
You can either specify a new local-dir (Einstein-v6.1-Llama3-8B-Q8_0) or download them all in place (./).
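The `--include` value is a glob pattern. The sketch below shows which paths a pattern like the split-file one above would select. It assumes glob semantics like Python's `fnmatch`, which, to my knowledge, is what `huggingface_hub` uses for its allow-pattern filtering. The split part names in the listing are hypothetical, since none of the files in this repo is over 50GB.

```python
from fnmatch import fnmatchcase


def select_files(repo_files: list[str], pattern: str) -> list[str]:
    """Return the repo paths that an --include-style glob would match."""
    # Case-sensitive matching; note that "*" also matches across "/".
    return [path for path in repo_files if fnmatchcase(path, pattern)]


# Hypothetical repo listing: a single-file quant plus a quant split into a folder.
files = [
    "Einstein-v6.1-Llama3-8B-Q4_K_M.gguf",
    "Einstein-v6.1-Llama3-8B-Q8_0.gguf/part-00001.gguf",
    "Einstein-v6.1-Llama3-8B-Q8_0.gguf/part-00002.gguf",
    "README.md",
]
```

With the listing above, `select_files(files, "Einstein-v6.1-Llama3-8B-Q8_0.gguf/*")` returns only the two part files.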
## Which file should I choose?
A great write-up with charts comparing the performance of the various quant types is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9).
The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.
If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.
If you want the absolute maximum quality, add your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.
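The sizing rule above amounts to picking the largest file that fits the memory budget minus some headroom. Presumably that headroom covers the context cache and runtime overhead. The sketch below encodes the rule using the file sizes from the table. `pick_quant` is a hypothetical helper, and the default 2GB headroom is an assumption taken from the upper end of the 1-2GB range.

```python
from typing import Optional

# File sizes in GB, copied from the download table.
QUANT_SIZES_GB = {
    "Q8_0": 8.54, "Q6_K": 6.59, "Q5_K_M": 5.73, "Q5_K_S": 5.59,
    "Q4_K_M": 4.92, "Q4_K_S": 4.69, "IQ4_NL": 4.67, "IQ4_XS": 4.44,
    "Q3_K_L": 4.32, "Q3_K_M": 4.01, "IQ3_M": 3.78, "IQ3_S": 3.68,
    "Q3_K_S": 3.66, "IQ3_XS": 3.51, "IQ3_XXS": 3.27, "Q2_K": 3.17,
    "IQ2_M": 2.94, "IQ2_S": 2.75, "IQ2_XS": 2.60, "IQ2_XXS": 2.39,
    "IQ1_M": 2.16, "IQ1_S": 2.01,
}


def pick_quant(vram_gb: float, ram_gb: float = 0.0,
               headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest quant whose file fits the budget, or None.

    Pass only vram_gb to fit the whole model on the GPU (fastest); add
    ram_gb to trade speed for quality by also using system RAM.
    """
    budget = vram_gb + ram_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items()
               if size <= budget]
    return max(fitting)[1] if fitting else None
```

For example, an 8GB GPU alone gives a 6GB budget, so `pick_quant(8.0)` selects Q5_K_M (5.73GB). The helper only looks at size, so you still have to choose between I-quants and K-quants yourself.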
Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.
If you don't want to think too much, grab one of the K-quants. These are in the format 'QX_K_X', like Q5_K_M.
If you want to get more into the weeds, you can check out this extremely useful feature chart:
[llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix)
But basically, if you're aiming for below Q4 and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look toward the I-quants. These are in the format IQX_X, like IQ3_M. They are newer and offer better performance for their size.
These I-quants can also be used on CPU and Apple Metal, but they will be slower than their K-quant equivalents, so you'll have to decide on the speed vs. performance tradeoff.
The I-quants are *not* compatible with Vulkan, which AMD cards can also use, so if you have an AMD card, double-check whether you're using the rocBLAS build or the Vulkan build. At the time of writing, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.
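The I-quant vs. K-quant guidance above can be written as a small decision rule. The sketch below is one reading of it. The backend labels are illustrative strings, not llama.cpp build identifiers. On CPU and Metal the sketch favors speed by returning a K-quant, so pick an I-quant instead if quality per GB matters more to you. `quant_family` is a hypothetical helper.

```python
def quant_family(backend: str, target_bits: float) -> str:
    """Return "I-quant" or "K-quant" following the rule of thumb above.

    backend: e.g. "cublas", "rocblas", "vulkan", "cpu", "metal" (illustrative labels).
    target_bits: roughly the quant level you are aiming for (4 for Q4, 3 for Q3, ...).
    """
    backend = backend.lower()
    if backend == "vulkan":
        # I-quants are not supported on the Vulkan build, per the note above.
        return "K-quant"
    if target_bits < 4 and backend in {"cublas", "rocblas"}:
        # Below Q4 on CUDA/ROCm, I-quants give better quality for their size.
        return "I-quant"
    # Otherwise default to K-quants: simplest choice, and faster on CPU/Metal.
    return "K-quant"
```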
Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski

configuration.json (normal file): 1 changed line

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}