---
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- becnic/Qwen3-4B-Thinking-2507-Heretic
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
---
# Qwen3-4B-Thinking-2507-Heretic-GGUF
## Llamacpp imatrix Quantizations of Qwen3-4B-Thinking-2507-Heretic by becnic (from original Qwen3-4B-Thinking-2507)
Using llama.cpp release b7120 for quantization.
Original model: https://huggingface.co/becnic/Qwen3-4B-Thinking-2507-Heretic
Run them in [LM Studio](https://lmstudio.ai/)
Run them directly with [llama.cpp](https://github.com/ggerganov/llama.cpp), or any other llama.cpp based project
## Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| [Qwen3-4B-Thinking-2507-Heretic-Q8_0.gguf](https://huggingface.co/becnic/Qwen3-4B-Thinking-2507-Heretic-GGUF/blob/main/Qwen3-4B-Thinking-2507-Heretic-Q8_0.gguf) | Q8_0 | 4.28GB | false | Extremely high quality. |
## Downloading using huggingface-cli
First, make sure you have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
Then, you can target the specific file you want:
```
huggingface-cli download becnic/Qwen3-4B-Thinking-2507-Heretic-GGUF --include "Qwen3-4B-Thinking-2507-Heretic-Q8_0.gguf" --local-dir ./
```
If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
```
huggingface-cli download becnic/Qwen3-4B-Thinking-2507-Heretic-GGUF --include "Qwen3-4B-Thinking-2507-Heretic-Q8_0/*" --local-dir ./
```
## Abliteration parameters
| Parameter | Value |
| :-------- | :---: |
| **direction_index** | 19.42 |
| **attn.o_proj.max_weight** | 1.23 |
| **attn.o_proj.max_weight_position** | 22.34 |
| **attn.o_proj.min_weight** | 0.69 |
| **attn.o_proj.min_weight_distance** | 10.42 |
| **mlp.down_proj.max_weight** | 1.12 |
| **mlp.down_proj.max_weight_position** | 29.64 |
| **mlp.down_proj.min_weight** | 1.08 |
| **mlp.down_proj.min_weight_distance** | 20.24 |
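Abliteration works by identifying a "refusal direction" in the residual stream and projecting it out of the listed weight matrices, scaled per layer (the max/min weights and positions above parameterize that per-layer scaling). A minimal sketch of the core projection step, assuming a NumPy weight matrix and a precomputed direction vector (the function name and shapes here are illustrative, not the actual tool's API):

```python
import numpy as np

def ablate_direction(W, d, scale=1.0):
    """Remove the component along direction d from weight matrix W.

    W: (d_out, d_in) weight matrix that writes into the residual stream.
    d: (d_out,) refusal-direction vector (normalized internally).
    scale: per-layer ablation weight (cf. the max/min weights above).
    """
    d = d / np.linalg.norm(d)
    # Subtract the projection of W's output columns onto d.
    return W - scale * np.outer(d, d) @ W

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
d = rng.standard_normal(8)
W_abl = ablate_direction(W, d)
# With scale=1.0, the ablated weights have (near-)zero output along d.
print(np.abs((d / np.linalg.norm(d)) @ W_abl).max())
```

With `scale=1.0` this fully zeroes the component along `d`; fractional scales (and fractional, interpolated direction indices like the 19.42 above) trade off refusal removal against KL divergence from the original model.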
## Performance
| Metric | This model | Original model ([Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)) |
| :----- | :--------: | :---------------------------: |
| **KL divergence** | 0.06 | 0 *(by definition)* |
| **Refusals** | 6/100 | 96/100 |
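The KL divergence row measures how far the abliterated model's next-token distribution drifts from the original's (lower means less behavioral damage), which is why the original model scores 0 by definition. A minimal sketch of the per-position computation, assuming plain probability vectors (the evaluation prompts used for the table are not specified here):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) between two next-token probability distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p=0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.7, 0.2, 0.1]
print(kl_divergence(p, p))                          # → 0.0 (identical models)
print(round(kl_divergence(p, [0.6, 0.3, 0.1]), 4))  # → 0.0268 (slight drift)
```

A model-level score like the 0.06 above would then be an average of such per-position values over an evaluation set.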
## Model Overview
**Qwen3-4B-Thinking-2507** has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 4.0B
- Number of Parameters (Non-Embedding): 3.6B
- Number of Layers: 36
- Number of Attention Heads (GQA): 32 for Q and 8 for KV
- Context Length: **262,144 natively**.
**NOTE: This model supports only thinking mode. Specifying `enable_thinking=True` is no longer required.**
Additionally, to enforce model thinking, the default chat template automatically includes `<think>`. Therefore, it is normal for the model's output to contain only `</think>` without an explicit opening `<think>` tag.
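Because the opening tag is injected by the template, downstream code typically splits the raw generation on the closing `</think>` tag to separate the reasoning trace from the final answer. A minimal sketch of that parsing step (the helper name is illustrative):

```python
def split_thinking(text: str, close_tag: str = "</think>"):
    """Split raw model output into (reasoning, answer).

    The chat template already emits the opening <think> tag, so the
    generated text usually contains only the closing tag.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:  # no closing tag found: treat everything as the answer
        return "", head.strip()
    return head.strip(), tail.strip()

reasoning, answer = split_thinking("Let me check: 2+2=4.</think>The answer is 4.")
print(reasoning)  # → Let me check: 2+2=4.
print(answer)     # → The answer is 4.
```

Token-level pipelines do the same thing by searching for the ID of the `</think>` token rather than the string.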
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.