Initialize the project; model provided by the ModelHub XC community

Model: lactroiii/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF
Source: Original Platform
ModelHub XC
2026-04-15 20:10:05 +08:00
commit aa1eff353b
31 changed files with 353 additions and 0 deletions

.gitattributes vendored Normal file (64 lines added)

@@ -0,0 +1,64 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q6_K_L.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q5_K_L.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_K_L.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_1.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ4_NL.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q3_K_XL.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ3_XS.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q2_K_L.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ2_M.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ2_S.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ2_XS.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ2_XXS.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-f32/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-f32-00001-of-00003.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-f32/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-f32-00002-of-00003.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B-f32/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-f32-00003-of-00003.gguf filter=lfs diff=lfs merge=lfs -text
cognitivecomputations_Dolphin3.0-R1-Mistral-24B.imatrix filter=lfs diff=lfs merge=lfs -text

README.md Normal file (202 lines added)

@@ -0,0 +1,202 @@
---
quantized_by: bartowski
pipeline_tag: text-generation
datasets:
- cognitivecomputations/dolphin-r1
- OpenCoder-LLM/opc-sft-stage1
- OpenCoder-LLM/opc-sft-stage2
- microsoft/orca-agentinstruct-1M-v1
- microsoft/orca-math-word-problems-200k
- NousResearch/hermes-function-calling-v1
- AI-MO/NuminaMath-CoT
- AI-MO/NuminaMath-TIR
- allenai/tulu-3-sft-mixture
- cognitivecomputations/dolphin-coder
- HuggingFaceTB/smoltalk
- cognitivecomputations/samantha-data
- m-a-p/CodeFeedback-Filtered-Instruction
- m-a-p/Code-Feedback
base_model: cognitivecomputations/Dolphin3.0-R1-Mistral-24B
language:
- en
---
## Llamacpp imatrix Quantizations of Dolphin3.0-R1-Mistral-24B by cognitivecomputations
Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b4585">b4585</a> for quantization.
Original model: https://huggingface.co/cognitivecomputations/Dolphin3.0-R1-Mistral-24B
All quants made using imatrix option with dataset from [here](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8)
Run them in [LM Studio](https://lmstudio.ai/)
Run them directly with [llama.cpp](https://github.com/ggerganov/llama.cpp), or any other llama.cpp based project
## Prompt format
```
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
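For programmatic use, here is a minimal sketch that fills this template by hand and sends it to a local llama.cpp `llama-server` (assumed to be listening on `localhost:8080`; the host, port, and example prompts are placeholders, and the `/completion` endpoint and its fields follow llama.cpp's server API):
```
import requests

def build_prompt(system_prompt: str, user_prompt: str) -> str:
    # Fill the ChatML-style template shown above.
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

# llama-server's /completion endpoint takes a raw prompt plus sampling
# options and returns JSON with a "content" field.
resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": build_prompt(
            "You are Dolphin, a helpful AI assistant.",
            "Why is the sky blue?",
        ),
        "n_predict": 512,
        "stop": ["<|im_end|>"],
    },
    timeout=600,
)
print(resp.json()["content"])
```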
## Recommended reasoning system prompt
For reasoning, it's recommended you use the following system prompt:
```
You are Dolphin, an AI assistant that helps humanity, trained by Eric Hartford to specialize in reasoning and first-principles analysis.
When responding, always format your replies using <think>{reasoning}</think>{answer}. Use at least 6 reasoning steps and perform a root cause analysis before answering. However, if the answer is very easy and requires little thought, you may leave the <think></think> block empty.
Your responses should be detailed, structured with rich Markdown formatting, and engaging with emojis. Be extensive in your explanations, just as the greatest scientific minds would be. Always reason through the problem first, unless it's trivial, in which case you may answer directly.
```
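Replies that follow this system prompt arrive as `<think>{reasoning}</think>{answer}`, so a small sketch for splitting them (a hypothetical helper, not part of this repo):
```
import re

def split_reasoning(reply: str) -> tuple[str, str]:
    """Split a Dolphin reply into (reasoning, answer).

    The <think> block may be empty for trivial prompts, and absent if the
    model ignores the format, so fall back to an empty reasoning string.
    """
    match = re.match(r"\s*<think>(.*?)</think>(.*)", reply, flags=re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", reply.strip()

reasoning, answer = split_reasoning(
    "<think>Rayleigh scattering favors short wavelengths.</think>"
    "The sky looks blue because short (blue) wavelengths scatter the most."
)
print(answer)
```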
## Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| [Dolphin3.0-R1-Mistral-24B-f32.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/tree/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-f32) | f32 | 94.30GB | true | Full F32 weights. |
| [Dolphin3.0-R1-Mistral-24B-Q8_0.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q8_0.gguf) | Q8_0 | 25.05GB | false | Extremely high quality, generally unneeded but max available quant. |
| [Dolphin3.0-R1-Mistral-24B-Q6_K_L.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q6_K_L.gguf) | Q6_K_L | 19.67GB | false | Uses Q8_0 for embed and output weights. Very high quality, near perfect, *recommended*. |
| [Dolphin3.0-R1-Mistral-24B-Q6_K.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q6_K.gguf) | Q6_K | 19.35GB | false | Very high quality, near perfect, *recommended*. |
| [Dolphin3.0-R1-Mistral-24B-Q5_K_L.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q5_K_L.gguf) | Q5_K_L | 17.18GB | false | Uses Q8_0 for embed and output weights. High quality, *recommended*. |
| [Dolphin3.0-R1-Mistral-24B-Q5_K_M.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q5_K_M.gguf) | Q5_K_M | 16.76GB | false | High quality, *recommended*. |
| [Dolphin3.0-R1-Mistral-24B-Q5_K_S.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q5_K_S.gguf) | Q5_K_S | 16.30GB | false | High quality, *recommended*. |
| [Dolphin3.0-R1-Mistral-24B-Q4_1.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_1.gguf) | Q4_1 | 14.87GB | false | Legacy format, similar performance to Q4_K_S but with improved tokens/watt on Apple silicon. |
| [Dolphin3.0-R1-Mistral-24B-Q4_K_L.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_K_L.gguf) | Q4_K_L | 14.83GB | false | Uses Q8_0 for embed and output weights. Good quality, *recommended*. |
| [Dolphin3.0-R1-Mistral-24B-Q4_K_M.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_K_M.gguf) | Q4_K_M | 14.33GB | false | Good quality, default size for most use cases, *recommended*. |
| [Dolphin3.0-R1-Mistral-24B-Q4_K_S.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_K_S.gguf) | Q4_K_S | 13.55GB | false | Slightly lower quality with more space savings, *recommended*. |
| [Dolphin3.0-R1-Mistral-24B-Q4_0.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_0.gguf) | Q4_0 | 13.49GB | false | Legacy format, offers online repacking for ARM and AVX CPU inference. |
| [Dolphin3.0-R1-Mistral-24B-IQ4_NL.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ4_NL.gguf) | IQ4_NL | 13.47GB | false | Similar to IQ4_XS, but slightly larger. Offers online repacking for ARM CPU inference. |
| [Dolphin3.0-R1-Mistral-24B-Q3_K_XL.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q3_K_XL.gguf) | Q3_K_XL | 12.99GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
| [Dolphin3.0-R1-Mistral-24B-IQ4_XS.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ4_XS.gguf) | IQ4_XS | 12.76GB | false | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
| [Dolphin3.0-R1-Mistral-24B-Q3_K_L.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q3_K_L.gguf) | Q3_K_L | 12.40GB | false | Lower quality but usable, good for low RAM availability. |
| [Dolphin3.0-R1-Mistral-24B-Q3_K_M.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q3_K_M.gguf) | Q3_K_M | 11.47GB | false | Low quality. |
| [Dolphin3.0-R1-Mistral-24B-IQ3_M.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ3_M.gguf) | IQ3_M | 10.65GB | false | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| [Dolphin3.0-R1-Mistral-24B-Q3_K_S.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q3_K_S.gguf) | Q3_K_S | 10.40GB | false | Low quality, not recommended. |
| [Dolphin3.0-R1-Mistral-24B-IQ3_XS.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ3_XS.gguf) | IQ3_XS | 9.91GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
| [Dolphin3.0-R1-Mistral-24B-Q2_K_L.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q2_K_L.gguf) | Q2_K_L | 9.55GB | false | Uses Q8_0 for embed and output weights. Very low quality but surprisingly usable. |
| [Dolphin3.0-R1-Mistral-24B-Q2_K.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q2_K.gguf) | Q2_K | 8.89GB | false | Very low quality but surprisingly usable. |
| [Dolphin3.0-R1-Mistral-24B-IQ2_M.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ2_M.gguf) | IQ2_M | 8.11GB | false | Relatively low quality, uses SOTA techniques to be surprisingly usable. |
| [Dolphin3.0-R1-Mistral-24B-IQ2_S.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ2_S.gguf) | IQ2_S | 7.48GB | false | Low quality, uses SOTA techniques to be usable. |
| [Dolphin3.0-R1-Mistral-24B-IQ2_XS.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ2_XS.gguf) | IQ2_XS | 7.21GB | false | Low quality, uses SOTA techniques to be usable. |
| [Dolphin3.0-R1-Mistral-24B-IQ2_XXS.gguf](https://huggingface.co/bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF/blob/main/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-IQ2_XXS.gguf) | IQ2_XXS | 6.55GB | false | Very low quality, uses SOTA techniques to be usable. |
## Embed/output weights
Some of these quants (Q3_K_XL, Q4_K_L, etc.) are the standard quantization method with the embeddings and output weights quantized to Q8_0 instead of what they would normally default to.
## Downloading using huggingface-cli
<details>
<summary>Click to view download instructions</summary>
First, make sure you have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
Then, you can target the specific file you want:
```
huggingface-cli download bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF --include "cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_K_M.gguf" --local-dir ./
```
If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
```
huggingface-cli download bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF --include "cognitivecomputations_Dolphin3.0-R1-Mistral-24B-f32/*" --local-dir ./
```
You can either specify a new local-dir (cognitivecomputations_Dolphin3.0-R1-Mistral-24B-f32) or download them all in place (./). For this model, only the f32 weights exceed 50GB and are split.
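If you prefer Python to the CLI, the same downloads can be done with the `huggingface_hub` library; a sketch using its standard `hf_hub_download` and `snapshot_download` helpers:
```
from huggingface_hub import hf_hub_download, snapshot_download

# Single file, equivalent to the --include example above:
hf_hub_download(
    repo_id="bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF",
    filename="cognitivecomputations_Dolphin3.0-R1-Mistral-24B-Q4_K_M.gguf",
    local_dir=".",
)

# All shards of a split quant (here the f32 folder):
snapshot_download(
    repo_id="bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF",
    allow_patterns=["cognitivecomputations_Dolphin3.0-R1-Mistral-24B-f32/*"],
    local_dir=".",
)
```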
</details>
## ARM/AVX information
Previously, you would download Q4_0_4_4/4_8/8_8, and these would have their weights interleaved in memory in order to improve performance on ARM and AVX machines by loading up more data in one pass.
Now, however, there is something called "online repacking" for weights; details are in [this PR](https://github.com/ggerganov/llama.cpp/pull/9921). If you use Q4_0 and your hardware would benefit from repacking weights, it will do it automatically on the fly.
As of llama.cpp build [b4282](https://github.com/ggerganov/llama.cpp/releases/tag/b4282), you will not be able to run the Q4_0_X_X files and will instead need to use Q4_0.
Additionally, if you want to get slightly better quality for ARM, you can use IQ4_NL thanks to [this PR](https://github.com/ggerganov/llama.cpp/pull/10541), which will also repack the weights for ARM, though only the 4_4 for now. The loading time may be slower but it will result in an overall speed increase.
<details>
<summary>Click to view Q4_0_X_X information (deprecated)</summary>
I'm keeping this section to show the potential theoretical uplift in performance from using Q4_0 with online repacking.
<details>
<summary>Click to view benchmarks on an AVX2 system (EPYC7702)</summary>
| model | size | params | backend | threads | test | t/s | % (vs Q4_0) |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: | -------------: |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp512 | 204.03 ± 1.03 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp1024 | 282.92 ± 0.19 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | pp2048 | 259.49 ± 0.44 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg128 | 39.12 ± 0.27 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg256 | 39.31 ± 0.69 | 100% |
| qwen2 3B Q4_0 | 1.70 GiB | 3.09 B | CPU | 64 | tg512 | 40.52 ± 0.03 | 100% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp512 | 301.02 ± 1.74 | 147% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp1024 | 287.23 ± 0.20 | 101% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | pp2048 | 262.77 ± 1.81 | 101% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg128 | 18.80 ± 0.99 | 48% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg256 | 24.46 ± 3.04 | 83% |
| qwen2 3B Q4_K_M | 1.79 GiB | 3.09 B | CPU | 64 | tg512 | 36.32 ± 3.59 | 90% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp512 | 271.71 ± 3.53 | 133% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp1024 | 279.86 ± 45.63 | 100% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | pp2048 | 320.77 ± 5.00 | 124% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg128 | 43.51 ± 0.05 | 111% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg256 | 43.35 ± 0.09 | 110% |
| qwen2 3B Q4_0_8_8 | 1.69 GiB | 3.09 B | CPU | 64 | tg512 | 42.60 ± 0.31 | 105% |
Q4_0_8_8 offers a nice bump to prompt processing and a small bump to text generation.
</details>
</details>
## Which file should I choose?
<details>
<summary>Click here for details</summary>
A great write-up with charts showing various performance comparisons is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9).
The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.
If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.
If you want the absolute maximum quality, add both your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.
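As a rough sketch of that sizing rule (the 1-2GB headroom and the pooling of RAM + VRAM come from the guidance above; the helper name and example numbers are illustrative):
```
def max_quant_size_gb(vram_gb: float, ram_gb: float = 0.0,
                      headroom_gb: float = 2.0,
                      max_quality: bool = False) -> float:
    """Largest quant file size that should fit, per the rule of thumb above."""
    # Speed-first: fit the whole model in VRAM. Quality-first: pool
    # RAM + VRAM and accept partial CPU offload.
    budget = vram_gb + ram_gb if max_quality else vram_gb
    return budget - headroom_gb

# A 16GB GPU leaves a ~14GB budget, so from the table above Q4_K_S
# (13.55GB) fits comfortably and Q4_K_M (14.33GB) is borderline.
print(max_quant_size_gb(16))
```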
Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.
If you don't want to think too much, grab one of the K-quants. These are in format 'QX_K_X', like Q5_K_M.
If you want to get more into the weeds, you can check out this extremely useful feature chart:
[llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix)
But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in format IQX_X, like IQ3_M. These are newer and offer better performance for their size.
These I-quants can also be used on CPU and Apple Metal, but will be slower than their K-quant equivalent, so speed vs performance is a tradeoff you'll have to decide.
The I-quants are *not* compatible with Vulkan, which also supports AMD cards, so if you have an AMD card, double-check whether you're using the rocBLAS build or the Vulkan build. At the time of writing, LM Studio has a preview with ROCm support, and other inference engines have specific builds for ROCm.
</details>
## Credits
Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset.
Thank you ZeroWw for the inspiration to experiment with embed/output.
Thank you to LM Studio for sponsoring my work.
Want to support my work? Visit my ko-fi page here: https://ko-fi.com/bartowski

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f4cbf14d053da8fdc8097a957b0d6904b32da6e24d7e47645851d548ac6b0816
size 8114065248

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5a92fcb9f9b978b6c757f24a9e903a57ebfe17b324706c66c5d52c3598a0b877
size 7478366048

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1b4b01f829122f823ea4d487c99f00289be049256fae2b5958e6431508c2b102
size 7207045952

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:439f26ccae69dabcdf70f0f2474e7965d1c72a05b78d253f80299bda564f3ef6
size 6545132352

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:afa0379ecaa19c280fb1c775b97436d63948d7eddcadd18b3085d67ebb4db8d8
size 10650965184

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6ceeb4c44cdabf20194a44801df77ec4872f98a9520ee957d212eae0c86f8513
size 9907131584

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0457c26f7b70b20fc9cf144da7599f257bc42be9374d9d13e11c14662850406c
size 13468031488

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5798d31291117ece0ff368bdcfd741e13abd6764a950398f99b76a19addf05bb
size 12758931648

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:37b7de806e8f5b3f8d2b264a82c92f7d176ddcd1e04ca899b63986a074a057b8
size 8890339488

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:09d1f137e3639b5c5b91728de661f26a28ebcdb4160ee93334987c6523cb7242
size 9545709472

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1fa07d4f6847d2a49154bf80e201d2a6eed878bccaa178de540f46f7c5c197be
size 12400776384

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:97f3367832eef6e4c56c3594bdad994545821c6e473a588808f2f8fb989828ed
size 11474097344

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ce3bafb482d23a05cd943b193d8724f04a4eb5e98fe6902b3973e3bd98cb3de9
size 10400289984

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b385c561738611aabfb0809874da03e1fdbc48c8692d622d74890d4572e0ee1f
size 12987987872

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:daa37a4a3f8f2ab4ec6e16082e9b6a02250c5b3b301353c12bc399fcfb291b10
size 13494245888

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d896d5ddcdce4526b6b95539ea318bb25fc9e303edd710b4362363c44abb618d
size 14873123968

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:310325e7c9ef8a2af4495dd308966592665cb96e3192960fbb48695c9539c5c9
size 14832007072

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d67de1e94fb32742bd09ee8beebbeb36a4b544785a8f8413dc4d9490e04eda6c
size 14333925888

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1ced01f3eeb89a52bee5fa86038eb4f0561bdccb5951a4e3fd94f3debcc39d3d
size 13549296128

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2a85c1bba50aba330aa21b2ec01d60c3f8549ef4c8a6d483fdd39b5c2e6ae323
size 17178195872

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5bf37ca755fdb24f4f0de816098ac4c64c7590ea97083c523bc585f3b1e555d2
size 16764002048

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:476d97af8e95bcafc6fb932e636f599fe503be9f781a4f998a9c9467db70ffe9
size 16304430848

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:714f9376972d51661cdb01317a180363ceb66f4331f1306a836e442be938644c
size 19345957984

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:458153283c06ecae7abb03ff8243c8c744077feb0f06505fdfbb76924ad1e0e7
size 19671021472

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bcbcb76c75ee2d61d10f234269cff462f5d71afa981a6c857cd96583f26325b9
size 25054803872

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d4eb2ef5757aa495aee605e5ab18a15e9cd4d4632e6e74501e70a928eb12b5c1
size 39812534432

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0f97bfd471710d7ecc60cdff92eff08d68452901b1606f52e856092b9a9b5b9a
size 39804691808

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d091b2ebafe74c58e6b1d7199878b8084f94e537c84809e80b3c270a39232763
size 14680353984

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f4df472efadb79e26ae50c5aa0c10bd5e5f5a333f062fb014e58a9bf7a27a1a1
size 10003538