Llamacpp quants

This commit is contained in:
ai-modelscope
2024-07-11 23:06:11 +08:00
parent fc6f2b39c4
commit 9a613da0e0
80 changed files with 291 additions and 53 deletions

.gitattributes vendored

@@ -1,37 +1,60 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-IQ1_M.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-IQ1_S.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-IQ2_M.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-IQ2_S.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-IQ2_XS.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-IQ2_XXS.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-IQ3_S.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-IQ3_XS.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-IQ3_XXS.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-IQ4_NL.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-f32.gguf/Phi-3-medium-128k-instruct-f32-00001-of-00002.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct-f32.gguf/Phi-3-medium-128k-instruct-f32-00002-of-00002.gguf filter=lfs diff=lfs merge=lfs -text
Phi-3-medium-128k-instruct.imatrix filter=lfs diff=lfs merge=lfs -text


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
998375e6a045031159fbf315160504fdd6f4570ad2dd99235714f5eee9c4ed4c
1720699803.541717


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
3c3574930aff50056788f80fed9f9873ac808ff01411d12c6d9c4404dcd9c759
1720699668.6641228


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
4ac128777bcf71ddf40d72bfbfdf5c028b15a19d76a2cbc3a0a101560a3f1ff9
1720699594.338856


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
f119217ecda02e04166f9ce1347614ea4299a0450f3c3258f99a3e372e23f5b3
1720699800.3382757


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
8c769c4137173dd434c070e116e4b0599af2b12752ba4c7188a1bf8bf5372a55
1720699579.1741178


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
9dc512a2ddaed5576fe7ca1a981963c83c04079db3c9c8eabf87ea6690fa74f2
1720699920.5230262


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
a4a1df7bac1fe1a9991388bc78e5eb057999e9ade7e96d832cbdc377e4a267a8
1720699943.9514942


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
63e84a0c958c0bca68f6301057795284c79c36dafdc81ae4b2f084e603fd529e
1720700567.0469356


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
b4a79e3eaaf7daf0653a8b8cadd3b6e9d71be71cc0abbce723170cf8ded3acd4
1720702078.911839


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
07cdb03e7d491b53fa294c497d109707c15ef612180f5a728e802bc480e9327a
1720701498.0246139


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
177a001bf59ff2b7dd63393dfee1e53540a56a066ef5321582f8b552ac7218f3
1720702007.2237422


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
a9098e8e89e69b71592a1f86d06d835652a42e844f0eafad99f71ebbebef379d
1720702287.7485409


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
902f408660621f211a4cc11387e86754d1e5cbf15341a076a7342b1e9461a30b
1720701469.298739


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
d2c7f00f5b2ba6bbd185a9b404f02a2705a8c4709513734372f0d32efc20ab98
1720702637.392722


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
38b3f72bdd536b163168583ce8deef6ac4966444c67c699b1ee2e51e82d3fc42
1720703778.3648539


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
7c323ef2d5a5e9bc2a7bbb90a460301c90e271d9404545b8b1ab134ab228a87d
1720702705.0262878


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
d8ed734336b8b83977874afc85f11674bf3b49a92e8e02ac4197e6812f10c242
1720704264.7329867


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
d39678f87e2c17fccc545955f5d61cc26f3a592c56d503cfea0d7d0e7ac6cf81
1720703810.3809412


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
5881f155da96899eb0639ae35b8b8f072267b96ca0955926cd5ef8bf6cee9fbd
1720706051.8604083


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
a0da98dc6e5f1929e4a4343b4983e345e0202ce5ffdac578ee5947c79fdd01f9
1720705286.8964772


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
fb6337cbbe910dd9a8eb5078b30e09174853a2b5bbf8cebe0fff97d2e8ab9106
1720706019.9594212


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
3d344e88ec30ae94b506159a43106ce55126687a2427ff7821f1ebf5dd42a648
1720706636.2906826


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
658293f962c634fe5e82792f9ed1377659dc696f3242f9fcfa50b4784e4d43d0
1720707476.2532258


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
50d896f18ef862d6962ffb8a1d52daf857959dcd023f9295da514c5e13f77a1f
1720707234.8394835


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
499a8bb5daebfc977e82ecc53018aa645e92dff3d869c2ef74eddf68f04c858f
1720703818.0220778


@@ -0,0 +1,3 @@
c5d2399e2aaa92288f79a3230b6303767fa9e3d8
3b01534129b05996b082547519f06abe004751b2
1720703818.922302


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:998375e6a045031159fbf315160504fdd6f4570ad2dd99235714f5eee9c4ed4c
size 3242157088


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3c3574930aff50056788f80fed9f9873ac808ff01411d12c6d9c4404dcd9c759
size 2957997088


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4ac128777bcf71ddf40d72bfbfdf5c028b15a19d76a2cbc3a0a101560a3f1ff9
size 4717006368


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f119217ecda02e04166f9ce1347614ea4299a0450f3c3258f99a3e372e23f5b3
size 4338126368


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c769c4137173dd434c070e116e4b0599af2b12752ba4c7188a1bf8bf5372a55
size 4127405088


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9dc512a2ddaed5576fe7ca1a981963c83c04079db3c9c8eabf87ea6690fa74f2
size 3715757088


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a4a1df7bac1fe1a9991388bc78e5eb057999e9ade7e96d832cbdc377e4a267a8
size 6473977888


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:63e84a0c958c0bca68f6301057795284c79c36dafdc81ae4b2f084e603fd529e
size 6064889888


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b4a79e3eaaf7daf0653a8b8cadd3b6e9d71be71cc0abbce723170cf8ded3acd4
size 5806841888


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:07cdb03e7d491b53fa294c497d109707c15ef612180f5a728e802bc480e9327a
size 5453262368


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:177a001bf59ff2b7dd63393dfee1e53540a56a066ef5321582f8b552ac7218f3
size 7897125408


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a9098e8e89e69b71592a1f86d06d835652a42e844f0eafad99f71ebbebef379d
size 7466011168


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:902f408660621f211a4cc11387e86754d1e5cbf15341a076a7342b1e9461a30b
size 5143000608


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d2c7f00f5b2ba6bbd185a9b404f02a2705a8c4709513734372f0d32efc20ab98
size 7490297888


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:38b3f72bdd536b163168583ce8deef6ac4966444c67c699b1ee2e51e82d3fc42
size 6923411488


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7c323ef2d5a5e9bc2a7bbb90a460301c90e271d9404545b8b1ab134ab228a87d
size 6064889888


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d8ed734336b8b83977874afc85f11674bf3b49a92e8e02ac4197e6812f10c242
size 8566821408


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d39678f87e2c17fccc545955f5d61cc26f3a592c56d503cfea0d7d0e7ac6cf81
size 7954469408


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5881f155da96899eb0639ae35b8b8f072267b96ca0955926cd5ef8bf6cee9fbd
size 10074190368


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a0da98dc6e5f1929e4a4343b4983e345e0202ce5ffdac578ee5947c79fdd01f9
size 9621582368


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fb6337cbbe910dd9a8eb5078b30e09174853a2b5bbf8cebe0fff97d2e8ab9106
size 11453817888


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3d344e88ec30ae94b506159a43106ce55126687a2427ff7821f1ebf5dd42a648
size 14834712608


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:658293f962c634fe5e82792f9ed1377659dc696f3242f9fcfa50b4784e4d43d0
size 32010790752


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:50d896f18ef862d6962ffb8a1d52daf857959dcd023f9295da514c5e13f77a1f
size 23830902880


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:499a8bb5daebfc977e82ecc53018aa645e92dff3d869c2ef74eddf68f04c858f
size 5330287
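Each three-line block above (`version` / `oid` / `size`) is a Git LFS pointer file that stands in for the actual model binary. A minimal parsing sketch, assuming only the pointer format shown above:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, sha256, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    oid = fields["oid"]
    assert oid.startswith("sha256:"), "LFS pointers use sha256 object ids"
    return {
        "version": fields["version"],
        "sha256": oid.removeprefix("sha256:"),
        "size": int(fields["size"]),  # size of the real file, in bytes
    }

# The imatrix pointer from this commit:
ptr = """version https://git-lfs.github.com/spec/v1
oid sha256:499a8bb5daebfc977e82ecc53018aa645e92dff3d869c2ef74eddf68f04c858f
size 5330287
"""
print(parse_lfs_pointer(ptr)["size"])  # → 5330287
```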

README.md

@@ -1,47 +1,108 @@
---
license: Apache License 2.0
license: mit
license_link: https://huggingface.co/microsoft/Phi-3-medium-128k-instruct/resolve/main/LICENSE
#model-type:
## e.g. gpt, phi, llama, chatglm, baichuan, etc.
#- gpt
#domain:
## e.g. nlp, cv, audio, multi-modal
#- nlp
#language:
## list of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
#- cn
#metrics:
## e.g. CIDEr, BLEU, ROUGE, etc.
#- CIDEr
#tags:
## custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, etc.
#- pretrained
#tools:
## e.g. vllm, fastchat, llamacpp, AdaSeq, etc.
#- vllm
language:
- multilingual
pipeline_tag: text-generation
tags:
- nlp
- code
inference:
parameters:
temperature: 0.7
widget:
- messages:
- role: user
content: Can you provide ways to eat combinations of bananas and dragonfruits?
quantized_by: bartowski
---
### The contributor of this model has not provided a more detailed model introduction. Model files and weights can be browsed on the "Model Files" page.
#### You can download the model with the git clone command below or via the ModelScope SDK
SDK download
```bash
# Install ModelScope
pip install modelscope
```
```python
# Download the model via the SDK
from modelscope import snapshot_download
model_dir = snapshot_download('AI-ModelScope/Phi-3-medium-128k-instruct-GGUF')
```
Git download
```bash
# Download the model via git
git clone https://www.modelscope.cn/AI-ModelScope/Phi-3-medium-128k-instruct-GGUF.git
```
<p style="color: lightgrey;">If you are a contributor to this model, we invite you to complete this model card promptly, following the <a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">model contribution guide</a>.</p>
## Llamacpp imatrix Quantizations of Phi-3-medium-128k-instruct
Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> pull request <a href="https://github.com/ggerganov/llama.cpp/pull/7225">7225</a> for quantization.
Original model: https://huggingface.co/microsoft/Phi-3-medium-128k-instruct
All quants were made using the imatrix option with the dataset from [here](https://gist.github.com/bartowski1182/b6ac44691e994344625687afe3263b3a)
## Prompt format
```
<|user|> {prompt}<|end|><|assistant|><|end|>
```
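A minimal sketch of applying this card's single-turn template in Python (multi-turn chats would repeat the user/assistant pair; the template text is taken from the card as-is):

```python
def format_phi3_prompt(prompt: str) -> str:
    """Wrap a single user turn in the Phi-3 chat template from this card.

    The trailing <|end|> shown in the card's template is the model's stop
    token, so the string actually sent for generation ends at <|assistant|>.
    """
    return f"<|user|> {prompt}<|end|><|assistant|>"

print(format_phi3_prompt("Why is the sky blue?"))
```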
## Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Description |
| -------- | ---------- | --------- | ----------- |
| [Phi-3-medium-128k-instruct-Q8_0.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-Q8_0.gguf) | Q8_0 | 14.83GB | Extremely high quality, generally unneeded but max available quant. |
| [Phi-3-medium-128k-instruct-Q6_K.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-Q6_K.gguf) | Q6_K | 11.45GB | Very high quality, near perfect, *recommended*. |
| [Phi-3-medium-128k-instruct-Q5_K_M.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-Q5_K_M.gguf) | Q5_K_M | 10.07GB | High quality, *recommended*. |
| [Phi-3-medium-128k-instruct-Q5_K_S.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-Q5_K_S.gguf) | Q5_K_S | 9.62GB | High quality, *recommended*. |
| [Phi-3-medium-128k-instruct-Q4_K_M.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-Q4_K_M.gguf) | Q4_K_M | 8.56GB | Good quality, uses about 4.83 bits per weight, *recommended*. |
| [Phi-3-medium-128k-instruct-Q4_K_S.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-Q4_K_S.gguf) | Q4_K_S | 7.95GB | Slightly lower quality with more space savings, *recommended*. |
| [Phi-3-medium-128k-instruct-IQ4_NL.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-IQ4_NL.gguf) | IQ4_NL | 7.89GB | Decent quality, slightly smaller than Q4_K_S with similar performance, *recommended*. |
| [Phi-3-medium-128k-instruct-IQ4_XS.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-IQ4_XS.gguf) | IQ4_XS | 7.46GB | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
| [Phi-3-medium-128k-instruct-Q3_K_L.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-Q3_K_L.gguf) | Q3_K_L | 7.49GB | Lower quality but usable, good for low RAM availability. |
| [Phi-3-medium-128k-instruct-Q3_K_M.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-Q3_K_M.gguf) | Q3_K_M | 6.92GB | Even lower quality. |
| [Phi-3-medium-128k-instruct-IQ3_M.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-IQ3_M.gguf) | IQ3_M | 6.47GB | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| [Phi-3-medium-128k-instruct-IQ3_S.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-IQ3_S.gguf) | IQ3_S | 6.06GB | Lower quality, new method with decent performance, recommended over Q3_K_S quant, same size with better performance. |
| [Phi-3-medium-128k-instruct-Q3_K_S.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-Q3_K_S.gguf) | Q3_K_S | 6.06GB | Low quality, not recommended. |
| [Phi-3-medium-128k-instruct-IQ3_XS.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-IQ3_XS.gguf) | IQ3_XS | 5.80GB | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| [Phi-3-medium-128k-instruct-IQ3_XXS.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-IQ3_XXS.gguf) | IQ3_XXS | 5.45GB | Lower quality, new method with decent performance, comparable to Q3 quants. |
| [Phi-3-medium-128k-instruct-Q2_K.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-Q2_K.gguf) | Q2_K | 5.14GB | Very low quality but surprisingly usable. |
| [Phi-3-medium-128k-instruct-IQ2_M.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-IQ2_M.gguf) | IQ2_M | 4.71GB | Very low quality, uses SOTA techniques to also be surprisingly usable. |
| [Phi-3-medium-128k-instruct-IQ2_S.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-IQ2_S.gguf) | IQ2_S | 4.33GB | Very low quality, uses SOTA techniques to be usable. |
| [Phi-3-medium-128k-instruct-IQ2_XS.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-IQ2_XS.gguf) | IQ2_XS | 4.12GB | Very low quality, uses SOTA techniques to be usable. |
| [Phi-3-medium-128k-instruct-IQ2_XXS.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-IQ2_XXS.gguf) | IQ2_XXS | 3.71GB | Lower quality, uses SOTA techniques to be usable. |
| [Phi-3-medium-128k-instruct-IQ1_M.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-IQ1_M.gguf) | IQ1_M | 3.24GB | Extremely low quality, *not* recommended. |
| [Phi-3-medium-128k-instruct-IQ1_S.gguf](https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/blob/main/Phi-3-medium-128k-instruct-IQ1_S.gguf) | IQ1_S | 2.95GB | Extremely low quality, *not* recommended. |
## Downloading using huggingface-cli
First, make sure you have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
Then, you can target the specific file you want:
```
huggingface-cli download bartowski/Phi-3-medium-128k-instruct-GGUF --include "Phi-3-medium-128k-instruct-Q4_K_M.gguf" --local-dir ./
```
If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
```
huggingface-cli download bartowski/Phi-3-medium-128k-instruct-GGUF --include "Phi-3-medium-128k-instruct-Q8_0.gguf/*" --local-dir Phi-3-medium-128k-instruct-Q8_0
```
You can either specify a new local-dir (Phi-3-medium-128k-instruct-Q8_0) or download them all in place (./).
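The CLI commands above resolve each file to a plain HTTPS URL. As a sketch, assuming the standard `resolve` path layout (the same pattern used by the `license_link` in this card's frontmatter):

```python
def hf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL for a file in a Hugging Face repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

print(hf_resolve_url("bartowski/Phi-3-medium-128k-instruct-GGUF",
                     "Phi-3-medium-128k-instruct-Q4_K_M.gguf"))
# → https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/resolve/main/Phi-3-medium-128k-instruct-Q4_K_M.gguf
```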
## Which file should I choose?
A great write-up with charts showing various performance comparisons is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9).
The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.
If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.
If you want the absolute maximum quality, add your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.
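The sizing rule of thumb above can be sketched as a small helper. This is hypothetical (not part of llama.cpp); the sizes come from the table in this card:

```python
QUANT_SIZES_GB = {  # file sizes from the download table above
    "Q8_0": 14.83, "Q6_K": 11.45, "Q5_K_M": 10.07, "Q5_K_S": 9.62,
    "Q4_K_M": 8.56, "Q4_K_S": 7.95, "IQ4_NL": 7.89, "Q3_K_L": 7.49,
    "IQ4_XS": 7.46, "Q3_K_M": 6.92, "IQ3_M": 6.47, "IQ3_S": 6.06,
    "Q2_K": 5.14,
}

def pick_quant(memory_gb: float, headroom_gb: float = 1.5) -> str:
    """Return the largest quant that fits in memory_gb minus headroom."""
    budget = memory_gb - headroom_gb
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= budget}
    if not fitting:
        raise ValueError("No listed quant fits in this memory budget")
    return max(fitting, key=fitting.get)

print(pick_quant(12.0))  # → Q5_K_M on a 12GB card with 1.5GB headroom
```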
Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.
If you don't want to think too much, grab one of the K-quants. These are in format 'QX_K_X', like Q5_K_M.
If you want to get more into the weeds, you can check out this extremely useful feature chart:
[llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix)
But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in format IQX_X, like IQ3_M. These are newer and offer better performance for their size.
These I-quants can also be used on CPU and Apple Metal, but will be slower than their K-quant equivalent, so speed vs performance is a tradeoff you'll have to decide.
The I-quants are *not* compatible with Vulkan, which also supports AMD cards, so if you have an AMD card double check whether you're using the rocBLAS build or the Vulkan build. At the time of writing, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.
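The naming conventions described above can be summarized with a tiny helper (hypothetical, not part of llama.cpp):

```python
def quant_family(name: str) -> str:
    """Classify a quant by its name, per the conventions above:
    I-quants start with 'IQ', K-quants contain '_K', and the rest
    (e.g. Q8_0) are the older basic quant types."""
    if name.startswith("IQ"):
        return "I-quant"   # e.g. IQ3_M
    if "_K" in name:
        return "K-quant"   # e.g. Q5_K_M
    return "basic"         # e.g. Q8_0

for q in ("IQ3_M", "Q5_K_M", "Q8_0"):
    print(q, "->", quant_family(q))
```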
Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski

configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-generation", "allow_remote": true}