---
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- becnic/Qwen3-4B-Thinking-2507-Heretic
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
---

# Qwen3-4B-Thinking-2507-Heretic-GGUF

## llama.cpp imatrix quantizations of Qwen3-4B-Thinking-2507-Heretic by becnic (from the original Qwen3-4B-Thinking-2507)

Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b7120">b7120</a> for quantization.

Original model: https://huggingface.co/becnic/Qwen3-4B-Thinking-2507-Heretic

Run them in [LM Studio](https://lmstudio.ai/).

Run them directly with [llama.cpp](https://github.com/ggerganov/llama.cpp), or any other llama.cpp-based project.

## Download a file (not the whole branch) from below:

| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| [Qwen3-4B-Thinking-2507-Heretic-Q8_0.gguf](https://huggingface.co/becnic/Qwen3-4B-Thinking-2507-Heretic-GGUF/blob/main/Qwen3-4B-Thinking-2507-Heretic-Q8_0.gguf) | Q8_0 | 4.28GB | false | Extremely high quality |

## Downloading using huggingface-cli

<details>
<summary>Click to view download instructions</summary>

First, make sure you have huggingface-cli installed:

```
pip install -U "huggingface_hub[cli]"
```

Then, you can target the specific file you want:

```
huggingface-cli download becnic/Qwen3-4B-Thinking-2507-Heretic-GGUF --include "Qwen3-4B-Thinking-2507-Heretic-Q8_0.gguf" --local-dir ./
```

If the model is bigger than 50GB, it will have been split into multiple files. To download them all to a local folder, run:

```
huggingface-cli download becnic/Qwen3-4B-Thinking-2507-Heretic-GGUF --include "Qwen3-4B-Thinking-2507-Heretic-Q8_0.gguf/*" --local-dir ./
```

</details>

## Abliteration parameters

| Parameter | Value |
| :-------- | :---: |
| **direction_index** | 19.42 |
| **attn.o_proj.max_weight** | 1.23 |
| **attn.o_proj.max_weight_position** | 22.34 |
| **attn.o_proj.min_weight** | 0.69 |
| **attn.o_proj.min_weight_distance** | 10.42 |
| **mlp.down_proj.max_weight** | 1.12 |
| **mlp.down_proj.max_weight_position** | 29.64 |
| **mlp.down_proj.min_weight** | 1.08 |
| **mlp.down_proj.min_weight_distance** | 20.24 |

## Performance

| Metric | This model | Original model ([Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)) |
| :----- | :--------: | :---------------------------: |
| **KL divergence** | 0.06 | 0 *(by definition)* |
| **Refusals** | 6/100 | 96/100 |

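For intuition, KL divergence here measures how far the abliterated model's token distribution drifts from the original's (0.06 indicates a small shift). A minimal sketch of the discrete formula, using toy distributions rather than the actual evaluation data:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A model compared against itself diverges by 0, hence "0 (by definition)":
p = [0.7, 0.2, 0.1]
print(kl_divergence(p, p))  # 0.0
```
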
## Model Overview

**Qwen3-4B-Thinking-2507** has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 4.0B
- Number of Parameters (Non-Embedding): 3.6B
- Number of Layers: 36
- Number of Attention Heads (GQA): 32 for Q and 8 for KV
- Context Length: **262,144 natively**

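The GQA figures above mean 32 query heads share 8 key/value heads, so the KV cache holds a quarter of the head states full multi-head attention would need. A rough back-of-the-envelope sketch (the head dimension is an assumption, not stated in this card):

```python
# KV-cache arithmetic for the GQA configuration listed above.
num_q_heads = 32
num_kv_heads = 8
num_layers = 36
head_dim = 128        # assumed head dimension -- not stated in this card
bytes_per_elem = 2    # fp16/bf16 cache entries

# K and V tensors, per cached token, across all layers:
gqa_bytes = 2 * num_kv_heads * head_dim * bytes_per_elem * num_layers
mha_bytes = 2 * num_q_heads * head_dim * bytes_per_elem * num_layers

print(gqa_bytes)               # 147456 bytes (144 KiB) per cached token
print(mha_bytes // gqa_bytes)  # 4 -> GQA shrinks the cache 4x
```
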
**NOTE: This model supports only thinking mode, and specifying `enable_thinking=True` is no longer required.**

Additionally, to enforce model thinking, the default chat template automatically includes `<think>`. Therefore, it is normal for the model's output to contain only `</think>` without an explicit opening `<think>` tag.
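
Because only the closing tag appears, output parsing should split on `</think>` rather than expecting a matched pair. A minimal sketch (the function name is illustrative, not part of any API):

```python
def split_thinking(output: str):
    """Separate reasoning from the final answer in thinking-mode output.

    The default chat template pre-inserts <think>, so the model's text
    contains only the closing </think> tag.
    """
    reasoning, sep, answer = output.partition("</think>")
    if not sep:  # no closing tag: treat everything as the answer
        return "", output.strip()
    return reasoning.strip(), answer.strip()

reasoning, answer = split_thinking("Let me check...</think>The answer is 4.")
print(answer)  # The answer is 4.
```
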
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
**Supported languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.