---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
base_model:
- Qwen/Qwen3-4B-Instruct-2507
tags:
- SOMbliterated
- heretic
- uncensored
- decensored
- abliterated
- kohonen
---
# Qwen/Qwen3-4B-Instruct-2507 - SOMbliterated

This is a SOMbliterated (decensored) version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507), made using [Heretic](https://github.com/p-e-w/heretic) v1.2.0 with [Pull Request #196](https://github.com/p-e-w/heretic/pull/196) applied, which adds *multi-directional* abliteration, with the directions determined by trainable self-organizing neural networks (Self-Organizing Maps, also known as Kohonen networks).

The underlying assumption is that in recent, more advanced neural networks the refusal concept is not just a single direction but a complex manifold, much like numbers and days of the week are encoded as circles or helices. This manifold is then eliminated more surgically, from multiple sides, providing precise ablation instead of a complete lobotomy.
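To give a concrete picture of the self-organizing step, here is a minimal 1-D Kohonen network in plain NumPy. This is an illustration only, not Heretic's actual implementation: the function names and the toy data are made up. The idea is simply that the trained units settle into several prototype "refusal directions" instead of collapsing into a single mean direction.

```python
import numpy as np

def train_som(vectors, n_units=5, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Train a tiny 1-D Kohonen network (SOM) whose units become
    prototype directions for the input vectors."""
    rng = np.random.default_rng(seed)
    dim = vectors.shape[1]
    units = rng.normal(size=(n_units, dim))
    units /= np.linalg.norm(units, axis=1, keepdims=True)
    for epoch in range(epochs):
        # Learning rate and neighborhood width both shrink over time.
        lr = lr0 * (1 - epoch / epochs)
        sigma = sigma0 * (1 - epoch / epochs) + 0.5
        for v in vectors:
            # Best-matching unit: the prototype closest to this vector.
            bmu = np.argmin(np.linalg.norm(units - v, axis=1))
            # Pull the BMU and its grid neighbours toward the vector.
            grid_dist = np.abs(np.arange(n_units) - bmu)
            h = np.exp(-grid_dist**2 / (2 * sigma**2))
            units += lr * h[:, None] * (v - units)
    return units / np.linalg.norm(units, axis=1, keepdims=True)

# Toy data: noisy difference-of-means vectors scattered around two
# underlying "refusal" directions in a 16-dimensional space.
rng = np.random.default_rng(1)
d1 = rng.normal(size=16); d1 /= np.linalg.norm(d1)
d2 = rng.normal(size=16); d2 /= np.linalg.norm(d2)
samples = np.concatenate([
    d1 + 0.05 * rng.normal(size=(50, 16)),
    d2 + 0.05 * rng.normal(size=(50, 16)),
])
prototypes = train_som(samples, n_units=5)
print(prototypes.shape)  # (5, 16): five candidate ablation directions
```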

The method is based on the excellent paper [arXiv:2511.08379](https://arxiv.org/abs/2511.08379v2).

For this particular abliteration, **five** directions were used.
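For intuition about what ablating along five directions means, here is a hedged sketch of multi-directional ablation: each hidden state has its component along each (orthonormalized) refusal direction removed. The function name, the per-direction weights, and the random data are illustrative assumptions, not Heretic's exact code.

```python
import numpy as np

def ablate_directions(hidden, directions, weights=None):
    """Remove the components of `hidden` that lie along each of the
    given directions (illustrative multi-directional ablation).

    hidden:     (..., dim) activation vector(s)
    directions: (k, dim) refusal directions
    weights:    optional per-direction ablation strengths
    """
    # Orthonormalize so the repeated projections do not interact.
    q, _ = np.linalg.qr(directions.T)   # (dim, k), orthonormal columns
    if weights is None:
        weights = np.ones(q.shape[1])
    for i in range(q.shape[1]):
        u = q[:, i]
        hidden = hidden - weights[i] * np.outer(hidden @ u, u).reshape(hidden.shape)
    return hidden

rng = np.random.default_rng(0)
dirs = rng.normal(size=(5, 64))   # five hypothetical refusal directions
h = rng.normal(size=64)           # one hidden state
h_clean = ablate_directions(h, dirs)
# With full weights, the result has no component along any direction:
print(h_clean @ dirs.T)           # all entries are numerically ~0
```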

## Performance

Here I will compare this method with [Automated Gabliteration](https://github.com/Goekdeniz-Guelmez/gabliteration) by [Goekdeniz Guelmez](https://github.com/Goekdeniz-Guelmez). Gabliteration is also a multi-directional abliteration method.

| Metric | **This model** | [Gabliterated](https://huggingface.co/Goekdeniz-Guelmez/Qwen3-4B-Instruct-2507-gabliterated) | [Uncensored-HauhauCS-Aggressive](https://huggingface.co/HauhauCS/Qwen3-4B-2507-Instruct-Uncensored-HauhauCS-Aggressive) | Original model ([Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)) |
| :----- | :--------: | :--------: | :--------: | :---------------------------: |
| **KL divergence** | **0.0792** | 0.2522 | 0.1594 | 0 *(by definition)* |
| **Refusals** | **3/100** | 4/100 | 7/100 | 100/100 |

As can be seen, this model does far less damage to the original capabilities than Gabliteration, *while* also refusing less often!
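The KL divergence row measures how far the modified model's next-token distributions drift from the original model's (lower = less damage to capabilities). A minimal sketch of how such a number can be computed from raw logits, assuming a softmax over the vocabulary dimension; the variable names and toy logits are illustrative:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(logits_p, logits_q):
    """KL(P || Q) between next-token distributions given raw logits."""
    p = softmax(logits_p)
    log_p = np.log(p)
    log_q = np.log(softmax(logits_q))
    return np.sum(p * (log_p - log_q), axis=-1)

rng = np.random.default_rng(0)
orig = rng.normal(size=(4, 100))                    # original model logits
modified = orig + 0.1 * rng.normal(size=(4, 100))   # slightly perturbed model
print(kl_divergence(orig, orig))        # identical models -> exactly zero
print(kl_divergence(orig, modified).mean())  # small positive number
```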

26-03-29 - I'd like to highlight that these numbers are even better than those of HauhauCS's closed-source uncensoring method.

## Subjective results

Yes, it works.

## SOMbliteration parameters

| Parameter | Value |
| :-------- | :---: |
| **direction_index** | 19.10 |
| **attn.o_proj.max_weight.0** | 0.98 |
| **attn.o_proj.max_weight.1** | 1.12 |
| **attn.o_proj.max_weight.2** | 1.13 |
| **attn.o_proj.max_weight.3** | 1.44 |
| **attn.o_proj.max_weight.4** | 1.20 |
| **attn.o_proj.max_weight_position** | 25.57 |
| **attn.o_proj.min_weight.0** | 0.26 |
| **attn.o_proj.min_weight.1** | 0.98 |
| **attn.o_proj.min_weight.2** | 0.58 |
| **attn.o_proj.min_weight.3** | 1.37 |
| **attn.o_proj.min_weight.4** | 0.68 |
| **attn.o_proj.min_weight_distance** | 10.43 |
| **mlp.down_proj.max_weight.0** | 1.35 |
| **mlp.down_proj.max_weight.1** | 1.27 |
| **mlp.down_proj.max_weight.2** | 1.17 |
| **mlp.down_proj.max_weight.3** | 1.41 |
| **mlp.down_proj.max_weight.4** | 0.84 |
| **mlp.down_proj.max_weight_position** | 28.13 |
| **mlp.down_proj.min_weight.0** | 0.27 |
| **mlp.down_proj.min_weight.1** | 0.62 |
| **mlp.down_proj.min_weight.2** | 0.45 |
| **mlp.down_proj.min_weight.3** | 0.07 |
| **mlp.down_proj.min_weight.4** | 0.48 |
| **mlp.down_proj.min_weight_distance** | 3.03 |
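
As I understand these parameters, `max_weight`, `max_weight_position`, `min_weight`, and `min_weight_distance` describe a per-layer ablation profile for each direction: strongest near `max_weight_position`, falling off to `min_weight` at `min_weight_distance` layers away. The sketch below assumes a linear falloff and a 36-layer model; it is a plausible illustration of such a profile, not Heretic's exact kernel.

```python
import numpy as np

def layer_weights(n_layers, max_weight, max_weight_position,
                  min_weight, min_weight_distance):
    """Illustrative per-layer ablation strengths: peak near
    max_weight_position, linear falloff to min_weight at
    min_weight_distance layers away, clamped beyond that."""
    layers = np.arange(n_layers, dtype=float)
    dist = np.abs(layers - max_weight_position)
    t = np.clip(dist / min_weight_distance, 0.0, 1.0)
    return max_weight + (min_weight - max_weight) * t

# Direction 0 of attn.o_proj from the table above (36 layers assumed).
w = layer_weights(36, max_weight=0.98, max_weight_position=25.57,
                  min_weight=0.26, min_weight_distance=10.43)
print(w.argmax())  # the layer nearest the peak position
```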