Initialize project; model provided by the ModelHub XC community

Model: bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF
Source: Original Platform
ModelHub XC
2026-05-29 04:32:16 +08:00
commit 2b1b5ef16e
29 changed files with 307 additions and 0 deletions

62
.gitattributes vendored Normal file

@@ -0,0 +1,62 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q6_K_L.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q5_K_L.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_K_L.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_1.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ4_NL.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q3_K_XL.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ3_XS.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ3_XXS.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q2_K_L.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ2_M.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-bf16.gguf filter=lfs diff=lfs merge=lfs -text
smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-imatrix.gguf filter=lfs diff=lfs merge=lfs -text
mmproj-smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-f16.gguf filter=lfs diff=lfs merge=lfs -text
mmproj-smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-bf16.gguf filter=lfs diff=lfs merge=lfs -text

164
README.md Normal file

@@ -0,0 +1,164 @@
---
quantized_by: bartowski
pipeline_tag: image-text-to-text
base_model_relation: quantized
base_model: smolagents/SmolVLM2-2.2B-Instruct-Agentic-GUI
---
## Llamacpp imatrix Quantizations of SmolVLM2-2.2B-Instruct-Agentic-GUI by smolagents
Using <a href="https://github.com/ggml-org/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggml-org/llama.cpp/releases/tag/b6714">b6714</a> for quantization.
Original model: https://huggingface.co/smolagents/SmolVLM2-2.2B-Instruct-Agentic-GUI
All quants were made using the imatrix option with the dataset from [here](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8), combined with a subset of combined_all_small.parquet from Ed Addario [here](https://huggingface.co/datasets/eaddario/imatrix-calibration/blob/main/combined_all_small.parquet).
Run them in [LM Studio](https://lmstudio.ai/)
Run them directly with [llama.cpp](https://github.com/ggml-org/llama.cpp), or any other llama.cpp based project
## Prompt format
No prompt format was found; check the original model page.
## Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-bf16.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-bf16.gguf) | bf16 | 3.63GB | false | Full BF16 weights. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q8_0.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q8_0.gguf) | Q8_0 | 1.93GB | false | Extremely high quality, generally unneeded but max available quant. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q6_K_L.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q6_K_L.gguf) | Q6_K_L | 1.54GB | false | Uses Q8_0 for embed and output weights. Very high quality, near perfect, *recommended*. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q6_K.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q6_K.gguf) | Q6_K | 1.49GB | false | Very high quality, near perfect, *recommended*. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q5_K_L.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q5_K_L.gguf) | Q5_K_L | 1.36GB | false | Uses Q8_0 for embed and output weights. High quality, *recommended*. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q5_K_M.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q5_K_M.gguf) | Q5_K_M | 1.30GB | false | High quality, *recommended*. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q5_K_S.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q5_K_S.gguf) | Q5_K_S | 1.26GB | false | High quality, *recommended*. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_K_L.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_K_L.gguf) | Q4_K_L | 1.19GB | false | Uses Q8_0 for embed and output weights. Good quality, *recommended*. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_1.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_1.gguf) | Q4_1 | 1.15GB | false | Legacy format, similar performance to Q4_K_S but with improved tokens/watt on Apple silicon. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_K_M.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_K_M.gguf) | Q4_K_M | 1.11GB | false | Good quality, default size for most use cases, *recommended*. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_K_S.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_K_S.gguf) | Q4_K_S | 1.06GB | false | Slightly lower quality with more space savings, *recommended*. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q3_K_XL.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q3_K_XL.gguf) | Q3_K_XL | 1.06GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_0.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_0.gguf) | Q4_0 | 1.05GB | false | Legacy format, offers online repacking for ARM and AVX CPU inference. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ4_NL.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ4_NL.gguf) | IQ4_NL | 1.05GB | false | Similar to IQ4_XS, but slightly larger. Offers online repacking for ARM CPU inference. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ4_XS.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ4_XS.gguf) | IQ4_XS | 0.99GB | false | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q3_K_L.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q3_K_L.gguf) | Q3_K_L | 0.98GB | false | Lower quality but usable, good for low RAM availability. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q3_K_M.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q3_K_M.gguf) | Q3_K_M | 0.90GB | false | Low quality. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ3_M.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ3_M.gguf) | IQ3_M | 0.85GB | false | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q3_K_S.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q3_K_S.gguf) | Q3_K_S | 0.82GB | false | Low quality, not recommended. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q2_K_L.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q2_K_L.gguf) | Q2_K_L | 0.81GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ3_XS.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ3_XS.gguf) | IQ3_XS | 0.78GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ3_XXS.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ3_XXS.gguf) | IQ3_XXS | 0.72GB | false | Lower quality, new method with decent performance, comparable to Q3 quants. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-Q2_K.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q2_K.gguf) | Q2_K | 0.71GB | false | Very low quality but surprisingly usable. |
| [SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ2_M.gguf](https://huggingface.co/bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF/blob/main/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-IQ2_M.gguf) | IQ2_M | 0.66GB | false | Relatively low quality, uses SOTA techniques to be surprisingly usable. |
## Embed/output weights
Some of these quants (Q3_K_XL, Q4_K_L, etc.) use the standard quantization method, but with the embedding and output weights quantized to Q8_0 instead of their normal default.
## Downloading using huggingface-cli
<details>
<summary>Click to view download instructions</summary>
First, make sure you have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
Then, you can target the specific file you want:
```
huggingface-cli download bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF --include "smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q4_K_M.gguf" --local-dir ./
```
If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
```
huggingface-cli download bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF --include "smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q8_0/*" --local-dir ./
```
You can either specify a new local-dir (smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-Q8_0) or download them all in place (./)
</details>
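The same single-file download can also be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package from the install step above is available. `quant_filename` and `download_quant` are hypothetical helper names, and the filename pattern is inferred from the file names listed in this repository.

```python
REPO_ID = "bartowski/smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI-GGUF"
# Assumption: every quant file shares this prefix, as in the table above.
FILE_PREFIX = "smolagents_SmolVLM2-2.2B-Instruct-Agentic-GUI"


def quant_filename(quant: str) -> str:
    """Build the GGUF filename for a quant type such as 'Q4_K_M'."""
    return f"{FILE_PREFIX}-{quant}.gguf"


def download_quant(quant: str, local_dir: str = "./") -> str:
    """Download one quant file and return its local path (needs network access)."""
    # Imported lazily so quant_filename works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=quant_filename(quant),
        local_dir=local_dir,
    )


# Example (requires network access):
# path = download_quant("Q4_K_M")
```

Calling `download_quant("Q4_K_M")` should fetch the same file as the `huggingface-cli download` command shown above.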
## ARM/AVX information
Previously, you would download Q4_0_4_4/4_8/8_8, and these would have their weights interleaved in memory in order to improve performance on ARM and AVX machines by loading up more data in one pass.
Now, however, there is something called "online repacking" for weights; details are in [this PR](https://github.com/ggml-org/llama.cpp/pull/9921). If you use Q4_0 and your hardware would benefit from repacking weights, llama.cpp will do it automatically on the fly.
As of llama.cpp build [b4282](https://github.com/ggml-org/llama.cpp/releases/tag/b4282) you will not be able to run the Q4_0_X_X files and will instead need to use Q4_0.
Additionally, if you want to get slightly better quality, you can use IQ4_NL thanks to [this PR](https://github.com/ggml-org/llama.cpp/pull/10541), which will also repack the weights for ARM, though only the 4_4 for now. The loading time may be slower, but it will result in an overall speed increase.
<details>
<summary>Click to view Q4_0_X_X information (deprecated)</summary>
I'm keeping this section to show the potential theoretical uplift in performance from using the Q4_0 with online repacking.
<details>
<summary>Click to view benchmarks on an AVX2 system (EPYC7702)</summary>
| model | size | params | backend | threads | test | t/s | % (vs Q4_0) |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |-------------: |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp512 | 204.03 ± 1.03 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp1024 | 282.92 ± 0.19 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp2048 | 259.49 ± 0.44 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg128 | 39.12 ± 0.27 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg256 | 39.31 ± 0.69 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg512 | 40.52 ± 0.03 | 100% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp512 | 301.02 ± 1.74 | 147% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp1024 | 287.23 ± 0.20 | 101% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp2048 | 262.77 ± 1.81 | 101% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg128 | 18.80 ± 0.99 | 48% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg256 | 24.46 ± 3.04 | 83% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg512 | 36.32 ± 3.59 | 90% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp512 | 271.71 ± 3.53 | 133% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp1024 | 279.86 ± 45.63 | 100% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp2048 | 320.77 ± 5.00 | 124% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg128 | 43.51 ± 0.05 | 111% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg256 | 43.35 ± 0.09 | 110% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg512 | 42.60 ± 0.31 | 105% |
Q4_0_8_8 offers a nice bump to prompt processing and a small bump to text generation.
</details>
</details>
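The "% (vs Q4_0)" column in the AVX2 benchmark table reads as each row's throughput divided by the Q4_0 throughput for the same test. A small sketch of that arithmetic for some of the Q4_0_8_8 rows follows. The round-to-nearest rule is an assumption, and not every row of the table need match it exactly. `percent_of_baseline` is a hypothetical helper name.

```python
# Mean tokens per second, copied from the AVX2 benchmark table (EPYC7702).
Q4_0_TPS = {"pp512": 204.03, "pp2048": 259.49, "tg128": 39.12, "tg256": 39.31, "tg512": 40.52}
Q4_0_8_8_TPS = {"pp512": 271.71, "pp2048": 320.77, "tg128": 43.51, "tg256": 43.35, "tg512": 42.60}


def percent_of_baseline(tps: float, baseline_tps: float) -> int:
    """Throughput relative to the baseline, as a whole-number percentage."""
    # Assumption: the table rounds to the nearest percent.
    return round(100 * tps / baseline_tps)


relative = {
    test: percent_of_baseline(Q4_0_8_8_TPS[test], Q4_0_TPS[test])
    for test in Q4_0_TPS
}
print(relative)
# → {'pp512': 133, 'pp2048': 124, 'tg128': 111, 'tg256': 110, 'tg512': 105}
```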
## Which file should I choose?
<details>
<summary>Click here for details</summary>
A great write-up with charts showing various performances is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9).
The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.
If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.
If you want the absolute maximum quality, add both your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.
Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.
If you don't want to think too much, grab one of the K-quants. These are in format 'QX_K_X', like Q5_K_M.
If you want to get more into the weeds, you can check out this extremely useful feature chart:
[llama.cpp feature matrix](https://github.com/ggml-org/llama.cpp/wiki/Feature-matrix)
But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in format IQX_X, like IQ3_M. These are newer and offer better performance for their size.
These I-quants can also be used on CPU, but will be slower than their K-quant equivalent, so you'll have to weigh speed against quality.
</details>
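The size rule above (pick a quant 1-2GB smaller than your available memory) can be sketched as a small lookup. This is only an illustration: the sizes are copied from the download table, `pick_quant` is a hypothetical helper name, and the 2GB default headroom is an assumption taken from the upper end of the 1-2GB guidance. It ranks by file size alone and does not encode the I-quant versus K-quant advice.

```python
from typing import Optional

# (quant type, file size in GB), copied from the download table above.
QUANT_SIZES_GB = [
    ("bf16", 3.63), ("Q8_0", 1.93), ("Q6_K_L", 1.54), ("Q6_K", 1.49),
    ("Q5_K_L", 1.36), ("Q5_K_M", 1.30), ("Q5_K_S", 1.26), ("Q4_K_L", 1.19),
    ("Q4_1", 1.15), ("Q4_K_M", 1.11), ("Q4_K_S", 1.06), ("Q3_K_XL", 1.06),
    ("Q4_0", 1.05), ("IQ4_NL", 1.05), ("IQ4_XS", 0.99), ("Q3_K_L", 0.98),
    ("Q3_K_M", 0.90), ("IQ3_M", 0.85), ("Q3_K_S", 0.82), ("Q2_K_L", 0.81),
    ("IQ3_XS", 0.78), ("IQ3_XXS", 0.72), ("Q2_K", 0.71), ("IQ2_M", 0.66),
]


def pick_quant(memory_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest quant whose file fits in memory_gb minus headroom_gb."""
    budget = memory_gb - headroom_gb
    # Largest files first; the sort is stable, so ties keep the table's order.
    for name, size in sorted(QUANT_SIZES_GB, key=lambda q: q[1], reverse=True):
        if size <= budget:
            return name
    return None  # Nothing fits within the budget.


print(pick_quant(3.0, headroom_gb=1.5))  # → Q6_K
print(pick_quant(2.0, headroom_gb=1.0))  # → IQ4_XS
```

Pass your GPU's VRAM as `memory_gb` for the fastest setup, or RAM plus VRAM combined if you want the largest quant you can run.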
## Credits
Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset.
Thank you ZeroWw for the inspiration to experiment with embed/output.
Thank you to LM Studio for sponsoring my work.
Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3948ffeaa499d5f9fcdcc62ad697c014819de9b3074e5fdcc14bec6b9e4d4481
size 872301152

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5f477be9e10a76481ebba81f5ee6b23f68facee900cfb31f951a8d368bfb796d
size 872301152

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b26109fdf8604c85693c741a8a46995c0ffc9e2223bf02397868baad6fb3c2fa
size 658366336

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4e4361224d5ee538e597a1e88f708e211d0fa75e4acaf1f6cf3dcab53a7af995
size 853829504

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c7d742b22e51d870bf7e2a2e1d03261ecd753ebe3b9d42b83e831643b0ae945
size 782657408

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d2e4cab84e669aa93984389340ddde26b943c4669b883ebfbf6979e78c3cde15
size 723640192

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:477f20b7463d209b8c108d4acf10d17102b4b199c8642b65440bf4785c600ada
size 1047719808

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b2911be2361a64d455d068c8d5ed26ab3c264c5f6acb40d17c861ca3fb3808a0
size 994234240

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dee119d72a43020e0c0259f13e23307d8f643fcd07f12bd20786a6cb01f10b4a
size 707919744

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b1c5ed8c913aa416f29c7591d4c787ca0552c005d2fb378642ede11198646755
size 806479744

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cfa25a5b935a3454be5c41df1be12d991142cff17931c1b8e9a30ad1c31aa7b3
size 976119680

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d7f8a1857e042c38fd7c83b0b143f4b19d37aa8f1526d0bb1e49f8156764fd76
size 903767936

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5dbbb21ccea18365710c59142d3bb4a05ae2c0fcdfefb19af409264fcbe5c7a2
size 820406144

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:56a02b50389cfe8b0baa8ff23ada7f67b298ae42e731c3b22b8e115e99d7f46f
size 1064429440

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4a6c9cd468b97521f70c5498ac73ab2e1accd9ff6f557e1174ac1ea90c2801a9
size 1050865536

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e5f6fc0890e3a0e3c1ca4c135c1418dff86a0b0b6889792ef45c013e7f089ef5
size 1154690944

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:beea9551e101fb400fb429a5818182c0ef5b320338767820f88e4978837cfcfc
size 1187506048

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a9e553b743421b711a7b7fa3bfc401b9e429188e470e7e6ba5ee2d22b06ae5e0
size 1112600448

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f796575efd709ae00adf70492649fedaae2d169f5706e498558efcb908409446
size 1056108416

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:34c63c144a0c9bf486abaae11514d2a22d99d3268a11da21495c3a6717c87875
size 1357375360

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3a9e9ab69b3103ac762ab91497a04ca4bd44cb3a1314944ceb936d9a7236ce4d
size 1295085440

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0c4b3eca60b3f6d7fd6e93f6d0662ae1cb1ab2dbf903dd4d7de8bbd2ee7c3564
size 1261662080

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1654561ce95c1b2494f67751aea2af68b88a6ebe7faf512455cbcd8741a36aba
size 1488975744

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:14e86eecc437c55b0991d98484ab5584227d697697c3cb57eca3a19bf7b786e8
size 1537861504

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:60a8f44d9ad650b77b54d06bc0341e7a5ab6f24a06d2bb6b764eab4d87435fe6
size 1927931776

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1b8bb4626b1e3166d01961a5c2925ed62c5166b9dfea304bb02c893d66cd75a5
size 3627116096

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d207142aac93c6902edc389c2044b2f6133c2e12661d8e695fc22c019e4e2f4f
size 1991968
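
Each three-line hunk above is a Git LFS pointer: a spec version, a SHA-256 object ID, and the file size in bytes. A downloaded file can be checked against its pointer roughly as follows. This is a sketch under stated assumptions: `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and which file each pointer belongs to is not shown in this view.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into {'version', 'oid', 'size'}."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False  # Cheap check first; skips hashing on a size mismatch.
    sha = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in chunks so multi-gigabyte GGUF files are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["oid"]


# The last pointer in this commit, copied verbatim.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d207142aac93c6902edc389c2044b2f6133c2e12661d8e695fc22c019e4e2f4f
size 1991968
"""
pointer = parse_lfs_pointer(POINTER)
```

The `size` field is in bytes, so dividing by 1e9 gives the decimal gigabyte figures used in the README table (for example, 3627116096 bytes is about 3.63GB).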