---
base_model: Qwen/Qwen3-0.6B
tags:
- uncensored
- gabliteration
datasets:
- mlabonne/harmless_alpaca
- mlabonne/harmful_behaviors
library_name: gabliteration
arxiv: "2512.18901"
model-index:
- name: Qwen_Qwen3-0.6B-gabliterated
  results:
  - task:
      type: text-generation
    dataset:
      type: harmless_alpaca
      name: Harmless Alpaca
    metrics:
    - name: KL Divergence
      type: pass@1
      value: 0.0591
  - task:
      type: text-generation
    dataset:
      type: harmful_behaviors
      name: Harmful Behaviors
    metrics:
    - name: Refusal Rate
      type: pass@1
      value: 0.05
---

# Gabliterated Model Series

![Logo/JPG](gabliteration-logo.jpg)

## Overview

With this model series, I introduce the first **Gabliteration**, a novel neural weight modification technique that advances beyond traditional abliteration methods through adaptive multi-directional projections with regularized layer selection.

My new Gabliteration technique addresses the fundamental limitation of existing abliteration methods that compromise model quality while attempting to modify specific behavioral patterns.

```text
Refusal: 5/100
KL Div: 0.0591
Config:
  Samples: 400
  Skip: [4, 3]
  Layer: 0.66 (selected: 18)
  Scale: 0.48
  λ: 0.05
  k: 3
  β: 0.54
  Adaptive: False
  τ: 0.84
```

## Model Variants

This series includes models ranging from 0.6B to 32B parameters, demonstrating the scalability and effectiveness of the Gabliteration technique across different model sizes.

## Quants

- [GGUF (mradermacher)]()

## Technical Background

Building upon the foundational work of Arditi et al. (2024) on single-direction abliteration, Gabliteration extends to a comprehensive multi-directional framework with theoretical guarantees.

My method employs singular value decomposition on difference matrices between harmful and harmless prompt representations to extract multiple refusal directions.

### Dynamic Layer Selection

This model was created using fixed layer selection. A fixed layer fraction was used based on empirical tuning.
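
As a rough illustration of how a fixed layer fraction and the SVD step from Technical Background could fit together, here is a minimal NumPy sketch. It is not the Gabliteration implementation. The function names `select_layer`, `refusal_directions`, and `project_out` are hypothetical, the activations are random stand-ins, and three things are assumptions rather than facts from this card: that the fraction is truncated to an integer index, that harmful and harmless prompts are paired row by row, and that the projection is a plain scaled orthogonal projection with no regularization (λ, β, and τ are not modeled).

```python
import numpy as np


def select_layer(num_layers: int, layer_fraction: float) -> int:
    """Map a layer fraction to a layer index.

    Assumption: the product is truncated toward zero. Rounding would
    give the same index for 28 layers and a fraction of 0.66.
    """
    return int(num_layers * layer_fraction)


def refusal_directions(harmful: np.ndarray, harmless: np.ndarray, k: int) -> np.ndarray:
    """Return the top-k right singular vectors of the difference matrix.

    harmful, harmless: (n_prompts, hidden_dim) activations at one layer.
    Assumption: row i of both matrices belongs to a matched prompt pair.
    The result has shape (k, hidden_dim) with orthonormal rows.
    """
    diff = harmful - harmless
    # Rows of vt are ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(diff, full_matrices=False)
    return vt[:k]


def project_out(weight: np.ndarray, directions: np.ndarray, scale: float) -> np.ndarray:
    """Shrink the part of `weight`'s output that lies along `directions`.

    weight: (hidden_dim, in_dim) matrix writing into the hidden space.
    Assumption: a plain scaled orthogonal projection, no regularization.
    """
    projector = directions.T @ directions  # (hidden_dim, hidden_dim)
    return weight - scale * (projector @ weight)


if __name__ == "__main__":
    # Values taken from the config block in this card.
    layer = select_layer(num_layers=28, layer_fraction=0.66)
    print(layer)  # 18

    # Synthetic stand-ins for activations collected at that layer.
    rng = np.random.default_rng(0)
    harmless = rng.normal(size=(400, 64))
    harmful = harmless + rng.normal(size=(400, 64))

    directions = refusal_directions(harmful, harmless, k=3)
    weight = rng.normal(size=(64, 32))
    modified = project_out(weight, directions, scale=0.48)
    print(directions.shape, modified.shape)
```

In this sketch, a scale of 1.0 removes the component along the extracted directions entirely, while a scale of 0.48 leaves 52% of it in place.
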
Selected layer: **18** (out of 28 total layers)

## Citation

If you use these models, please cite the original research (paper coming later this year):

```
Gülmez, G. (2025). Gabliteration: Adaptive Multi-Directional Neural Weight Modification for Selective Behavioral Alteration in Large Language Models. https://arxiv.org/abs/2512.18901
```

## Acknowledgments

This work builds upon the foundational research by Arditi et al. (2024) on refusal direction identification in large language models.