ModelHub XC c599320d82 Initialize project; model provided by the ModelHub XC community
Model: Goekdeniz-Guelmez/Qwen3-0.6B-gabliterated
Source: Original Platform
2026-06-01 23:18:22 +08:00


---
base_model: Qwen/Qwen3-0.6B
tags:
- uncensored
- gabliteration
datasets:
- mlabonne/harmless_alpaca
- mlabonne/harmful_behaviors
library_name: gabliteration
arxiv: "2512.18901"
model-index:
- name: Qwen_Qwen3-0.6B-gabliterated
  results:
  - task:
      type: text-generation
    dataset:
      type: harmless_alpaca
      name: Harmless Alpaca
    metrics:
    - name: KL Divergence
      type: pass@1
      value: 0.0591
  - task:
      type: text-generation
    dataset:
      type: harmful_behaviors
      name: Harmful Behaviors
    metrics:
    - name: Refusal Rate
      type: pass@1
      value: 0.05
---
# Gabliterated Model Series
![Logo/JPG](gabliteration-logo.jpg)
## Overview
With this model series, I introduce **Gabliteration**, a neural weight modification technique that goes beyond traditional abliteration methods by using adaptive multi-directional projections with regularized layer selection.
Gabliteration addresses a fundamental limitation of existing abliteration methods: they compromise model quality while attempting to modify specific behavioral patterns.
```text
Refusal: 5/100
KL Div: 0.0591
Config:
  Samples: 400
  Skip: [4, 3]
  Layer: 0.66 (selected: 18)
  Scale: 0.48
  λ: 0.05
  k: 3
  β: 0.54
  Adaptive: False
  τ: 0.84
```
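The two headline numbers map onto the metadata: `Refusal: 5/100` is the refusal rate of 0.05, and `KL Div` indicates how far the modified model's outputs drift from the base model's. The evaluation procedure is not described here, so the following is only a rough sketch, assuming the KL divergence is taken between next-token distributions of the base and modified models and averaged over prompts. The helper `mean_kl` and the toy logits are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def mean_kl(base_logits, modified_logits):
    """Mean KL(base || modified) over rows of logits (hypothetical helper)."""
    log_p = log_softmax(base_logits)
    log_q = log_softmax(modified_logits)
    kl_per_row = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kl_per_row.mean())

# 5 refusals out of 100 harmful prompts
refusal_rate = 5 / 100

# Toy logits standing in for real model outputs (assumption, not real data)
base = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]])
modified = base + np.array([[0.0, 0.3, 0.0], [0.2, 0.0, 0.0]])
kl_value = mean_kl(base, modified)
```

A value near zero means the modified model's distribution stays close to the base model's on those prompts.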
## Model Variants
This series includes models ranging from 0.6B to 32B parameters, demonstrating the scalability and effectiveness of the Gabliteration technique across different model sizes.
## Quants
- [GGUF (mradermacher)]()
## Technical Background
Building upon the foundational work of Arditi et al. (2024) on single-direction abliteration, Gabliteration extends to a comprehensive multi-directional framework with theoretical guarantees.
My method employs singular value decomposition on difference matrices between harmful and harmless prompt representations to extract multiple refusal directions.
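The idea in the paragraph above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the author's implementation: it assumes harmful and harmless representations are paired row by row, takes the top-k right singular vectors of their difference matrix as refusal directions, and removes a scaled share of those directions from a weight matrix. The function names, the projection form, and the synthetic data are all hypothetical; `k=3` mirrors the config above.

```python
import numpy as np

def extract_directions(harmful, harmless, k):
    """Top-k right singular vectors (k x d) of the difference matrix."""
    diff = harmful - harmless                      # (n, d) difference matrix
    _, _, vt = np.linalg.svd(diff, full_matrices=False)
    return vt[:k]                                  # rows are orthonormal

def project_out(weight, directions, scale):
    """Subtract a scaled projection of the weight's output onto the directions."""
    projector = directions.T @ directions          # (d, d) subspace projector
    return weight - scale * (projector @ weight)

rng = np.random.default_rng(0)
n, d = 400, 16

# Synthetic activations (assumption): "harmful" rows are shifted along axis 0
true_dir = np.zeros(d)
true_dir[0] = 1.0
harmless = rng.normal(size=(n, d))
harmful = rng.normal(size=(n, d)) + 5.0 * true_dir

directions = extract_directions(harmful, harmless, k=3)

# A weight whose output lives in the hidden space: output = weight @ x
weight = rng.normal(size=(d, d))
fully_removed = project_out(weight, directions, scale=1.0)
partly_removed = project_out(weight, directions, scale=0.48)
```

With `scale=1.0` the weight can no longer write along the extracted directions; a smaller scale such as 0.48 leaves a fraction `1 - scale` of that component in place, which trades behavioral change against drift from the base model.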
### Dynamic Layer Selection
This model was created using fixed layer selection rather than adaptive selection (`Adaptive: False` in the config).
A fixed layer fraction of 0.66, chosen by empirical tuning, was used.
Selected layer: **18** (out of 28 total layers)
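As a quick check of these numbers, a layer fraction of 0.66 applied to 28 layers gives 18.48, which lands on layer 18. The helper below is hypothetical, and truncation is an assumed rounding rule (the source does not state one).

```python
def select_layer(layer_fraction: float, num_layers: int) -> int:
    """Hypothetical helper: map a layer fraction to a layer index by truncation."""
    return int(layer_fraction * num_layers)

# 0.66 * 28 = 18.48, truncated to 18
selected = select_layer(0.66, 28)
```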
## Citation
If you use these models, please cite the original research (paper coming later this year):
```
Gülmez, G. (2025). Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models. https://arxiv.org/abs/2512.18901
```
## Acknowledgments
This work builds upon the foundational research by Arditi et al. (2024) on refusal direction identification in large language models.