---
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- becnic/Qwen3-4B-Thinking-2507-Heretic
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
---

# Qwen3-4B-Thinking-2507-Heretic-GGUF

## Llamacpp imatrix Quantizations of Qwen3-4B-Thinking-2507-Heretic by becnic (from original Qwen3-4B-Thinking-2507)

Using llama.cpp release b7120 for quantization.

Original model: https://huggingface.co/becnic/Qwen3-4B-Thinking-2507-Heretic

Run them in [LM Studio](https://lmstudio.ai/).

Run them directly with [llama.cpp](https://github.com/ggerganov/llama.cpp), or any other llama.cpp-based project.

## Download a file (not the whole branch) from below:

| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| [Qwen3-4B-Thinking-2507-Heretic-Q8_0.gguf](https://huggingface.co/becnic/Qwen3-4B-Thinking-2507-Heretic-GGUF/blob/main/Qwen3-4B-Thinking-2507-Heretic-Q8_0.gguf) | Q8_0 | 4.28GB | false | Extremely high quality. |

## Downloading using huggingface-cli
<details>
  <summary>Click to view download instructions</summary>

First, make sure you have huggingface-cli installed:

```
pip install -U "huggingface_hub[cli]"
```

Then, you can target the specific file you want:

```
huggingface-cli download becnic/Qwen3-4B-Thinking-2507-Heretic-GGUF --include "Qwen3-4B-Thinking-2507-Heretic-Q8_0.gguf" --local-dir ./
```

If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:

```
huggingface-cli download becnic/Qwen3-4B-Thinking-2507-Heretic-GGUF --include "Qwen3-4B-Thinking-2507-Heretic-Q8_0.gguf/*" --local-dir ./
```

</details>
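As a quick sanity check after downloading, a small script can verify that the file really is a GGUF container. This is a minimal sketch based on the GGUF header layout (a 4-byte `GGUF` magic followed by a little-endian uint32 version); the file path is a placeholder for wherever you saved the quant:

```python
import struct

def check_gguf(path):
    """Return the GGUF container version if the file has a valid
    GGUF header, otherwise raise ValueError."""
    with open(path, "rb") as f:
        magic = f.read(4)          # bytes 0-3: magic, always b"GGUF"
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        # bytes 4-7: container version as little-endian uint32
        (version,) = struct.unpack("<I", f.read(4))
        return version

if __name__ == "__main__":
    # Placeholder path -- point this at your downloaded quant.
    print(check_gguf("./Qwen3-4B-Thinking-2507-Heretic-Q8_0.gguf"))
```

A truncated or interrupted download will usually fail this check (or fail loading in llama.cpp), in which case re-running the `huggingface-cli download` command above resumes the transfer.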
## Abliteration parameters

| Parameter | Value |
| :-------- | :---: |
| **direction_index** | 19.42 |
| **attn.o_proj.max_weight** | 1.23 |
| **attn.o_proj.max_weight_position** | 22.34 |
| **attn.o_proj.min_weight** | 0.69 |
| **attn.o_proj.min_weight_distance** | 10.42 |
| **mlp.down_proj.max_weight** | 1.12 |
| **mlp.down_proj.max_weight_position** | 29.64 |
| **mlp.down_proj.min_weight** | 1.08 |
| **mlp.down_proj.min_weight_distance** | 20.24 |

## Performance

| Metric | This model | Original model ([Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)) |
| :----- | :--------: | :---------------------------: |
| **KL divergence** | 0.06 | 0 *(by definition)* |
| **Refusals** | 6/100 | 96/100 |

## Model Overview

**Qwen3-4B-Thinking-2507** has the following features:

- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 4.0B
- Number of Parameters (Non-Embedding): 3.6B
- Number of Layers: 36
- Number of Attention Heads (GQA): 32 for Q and 8 for KV
- Context Length: **262,144 natively**

**NOTE: This model supports only thinking mode. Meanwhile, specifying `enable_thinking=True` is no longer required.** Additionally, to enforce model thinking, the default chat template automatically includes `<think>`. Therefore, it is normal for the model's output to contain only `</think>` without an explicit opening `<think>` tag.

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).

**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
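Because the chat template pre-fills the opening `<think>` tag, raw completions typically contain only the closing `</think>` marker. A minimal sketch of a helper that separates the reasoning trace from the final answer under that assumption (the function name is illustrative, not part of any library):

```python
def split_thinking(text):
    """Split raw model output into (reasoning, answer).

    The default chat template injects the opening <think> tag,
    so the completion usually contains only a closing </think>.
    If no marker is present, the whole text is treated as the answer.
    """
    marker = "</think>"
    if marker in text:
        reasoning, _, answer = text.partition(marker)
        return reasoning.strip(), answer.strip()
    return "", text.strip()
```

For example, `split_thinking("Let me check the arithmetic.</think>The answer is 4.")` yields the reasoning trace and `"The answer is 4."` separately, which is convenient when you only want to display the final answer.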