---
language:
- en
license: apache-2.0
base_model: DexopT/Qwen3-4B-Cybersecurity
tags:
- cybersecurity
- penetration-testing
- offensive-security
- infosec
- red-team
- abliteration
- uncensored
- heretic
- qwen3
- qwen
- unsloth
- fine-tuned
- text-generation
datasets:
- DexopT/cyber_heretic
- DexopT/heretic-bad-prompts
- DexopT/heretic-good-prompts
---
<div align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://huggingface.co/datasets/Qwen/documentation-images/resolve/main/Qwen-logo_White.png">
<source media="(prefers-color-scheme: light)" srcset="https://huggingface.co/datasets/Qwen/documentation-images/resolve/main/Qwen-logo_Black.png">
<img alt="Qwen Logo" src="https://huggingface.co/datasets/Qwen/documentation-images/resolve/main/Qwen-logo_Black.png" width="180"/>
</picture>
# Qwen3-4B-Cybersecurity-Heretic-16bit
**Qwen3-4B-Cybersecurity with refusal directions removed via Heretic abliteration**
[![Base Model](https://img.shields.io/badge/Base-Qwen3--4B--Cybersecurity-blue?style=flat-square)](https://huggingface.co/DexopT/Qwen3-4B-Cybersecurity)
[![Heretic](https://img.shields.io/badge/🔪_Abliteration-Heretic_v1.2.0-black?style=flat-square)](https://github.com/p-e-w/heretic)
[![GGUF](https://img.shields.io/badge/📦_GGUF-Q8_/_Q4-brightgreen?style=flat-square)](https://huggingface.co/DexopT/Qwen3-4B-Cybersecurity-GGUF)
[![Pass Rate](https://img.shields.io/badge/Pass_Rate-76%25_(38/50)-red?style=flat-square)](https://huggingface.co/DexopT/Qwen3-4B-Cybersecurity-Heretic-16bit)
[![License](https://img.shields.io/badge/License-Apache_2.0-yellow?style=flat-square)](https://apache.org/licenses/LICENSE-2.0)
</div>
---
> 🔵 **Base version (with refusals):** [DexopT/Qwen3-4B-Cybersecurity](https://huggingface.co/DexopT/Qwen3-4B-Cybersecurity)
## Model Description
**Qwen3-4B-Cybersecurity-Heretic-16bit** is [DexopT/Qwen3-4B-Cybersecurity](https://huggingface.co/DexopT/Qwen3-4B-Cybersecurity) with refusal directions surgically removed using [Heretic v1.2.0](https://github.com/p-e-w/heretic) — a technique that identifies and subtracts the model's "refusal direction" from its residual stream without retraining.
**Refusal test results:** 38/50 prompts answered (**76% pass rate**) using custom cybersecurity-specific bad/good prompt datasets.
---
## Model Family
| Model | Description | Link |
|-------|-------------|------|
| Qwen3-4B-Cybersecurity | Base fine-tuned model | [Link](https://huggingface.co/DexopT/Qwen3-4B-Cybersecurity) |
| **Qwen3-4B-Cybersecurity-Heretic-16bit** | Abliterated version (this repo) | 📍 You are here |
| Qwen3-4B-Cybersecurity-GGUF | Q8_0 + Q4_K_M quantized for llama.cpp | [Link](https://huggingface.co/DexopT/Qwen3-4B-Cybersecurity-GGUF) |
---
## Abliteration Details
| Parameter | Value |
|-----------|-------|
| Tool | [Heretic v1.2.0](https://github.com/p-e-w/heretic) |
| Trials run | 30 |
| Selected trial | Trial 24 |
| Refusals after abliteration | 49 / 100 |
| Pass rate | 76% (38/50) |
| KL divergence | 0.067 |
| Bad prompts dataset | [DexopT/heretic-bad-prompts](https://huggingface.co/datasets/DexopT/heretic-bad-prompts) |
| Good prompts dataset | [DexopT/heretic-good-prompts](https://huggingface.co/datasets/DexopT/heretic-good-prompts) |
| Format | 16bit merged safetensors |
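The KL divergence row measures how far the abliterated model's next-token distribution drifts from the base model's on harmless prompts (lower means capability is better preserved). A minimal numpy sketch of the per-position computation, for illustration only (Heretic's exact aggregation across prompts and positions may differ):

```python
import numpy as np

def kl_divergence(p_logits: np.ndarray, q_logits: np.ndarray) -> float:
    """KL(P || Q) for two next-token distributions given as raw logits."""
    # Softmax with max-subtraction for numerical stability
    p = np.exp(p_logits - p_logits.max())
    p /= p.sum()
    q = np.exp(q_logits - q_logits.max())
    q /= q.sum()
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

A value of 0.067 indicates the abliterated model's predictions stay close to the base model's.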
### What is Heretic Abliteration?
Heretic computes the "refusal direction": a vector in the model's residual stream that activates when the model encounters requests it would refuse. It then projects this direction out of the model's weight matrices, reducing refusals without any retraining. Unlike prompt-based jailbreaks, this permanently modifies the model weights.
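The core idea can be sketched in a few lines of numpy: estimate the refusal direction as the difference of mean activations on refused vs. answered prompts, then remove that direction from a weight matrix's output space. Function names and shapes here are illustrative, not Heretic's actual API:

```python
import numpy as np

def refusal_direction(refused_acts: np.ndarray, answered_acts: np.ndarray) -> np.ndarray:
    """Mean-difference estimate of the refusal direction (unit vector).

    Each input is an (n_prompts, hidden_dim) array of residual-stream activations.
    """
    d = refused_acts.mean(axis=0) - answered_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(W: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Return W' = (I - d d^T) W, so W' x has no component along d for any x."""
    return W - np.outer(d, d) @ W
```

Once every relevant weight matrix is ablated, no input can push the residual stream along `d`, which is what suppresses the refusal behavior.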
### Trial Selection
Trial 24 was selected from 30 trials as the point on the Pareto frontier best balancing refusal reduction against capability retention (measured as KL divergence from the base model):
| Trial | Refusals | KL Divergence | Notes |
|-------|----------|---------------|-------|
| **24 ✓** | **49/100** | **0.067** | **Best balance — selected** |
| 18 | 53/100 | 0.063 | |
| 21 | 54/100 | 0.047 | |
| 15 | 93/100 | 0.001 | Minimal abliteration |
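Using the numbers from the table above, the trade-off can be checked with a small Pareto-frontier helper (illustrative code, not part of Heretic):

```python
def pareto_front(trials):
    """trials: list of (trial_id, refusals, kl_div); lower is better on both axes."""
    def dominates(a, b):
        # a dominates b if it is no worse on both metrics and strictly better on one
        return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])
    return [t for t in trials if not any(dominates(o, t) for o in trials)]

trials = [(24, 49, 0.067), (18, 53, 0.063), (21, 54, 0.047), (15, 93, 0.001)]
front = pareto_front(trials)
```

All four listed trials sit on the frontier; trial 24 trades the largest (still small) KL divergence for the strongest refusal reduction.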
---
## Datasets Used
| Dataset | Role |
|---------|------|
| [DexopT/cyber_heretic](https://huggingface.co/datasets/DexopT/cyber_heretic) | Fine-tuning training data |
| [DexopT/heretic-bad-prompts](https://huggingface.co/datasets/DexopT/heretic-bad-prompts) | Heretic refusal-direction computation |
| [DexopT/heretic-good-prompts](https://huggingface.co/datasets/DexopT/heretic-good-prompts) | Heretic baseline computation |
- **Bad prompts cover:** WiFi/WPA2 cracking, ransomware, rogue AP, XSS, WAF bypass, malicious macros.
- **Good prompts cover:** SQL injection, reverse shells, keyloggers, privilege escalation, AD attacks.
---
## Usage
### Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model = AutoModelForCausalLM.from_pretrained(
    "DexopT/Qwen3-4B-Cybersecurity-Heretic-16bit",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DexopT/Qwen3-4B-Cybersecurity-Heretic-16bit")

messages = [
    {"role": "system", "content": "You are an expert cybersecurity assistant."},
    {"role": "user", "content": "Write a Python reverse shell payload."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.8)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
### MLX (Apple Silicon)
```bash
pip install mlx-lm

# Convert to MLX format with 8-bit quantization
mlx_lm.convert \
    --hf-path DexopT/Qwen3-4B-Cybersecurity-Heretic-16bit \
    --mlx-path ~/models/qwen3-heretic-mlx \
    --quantize --q-bits 8

mlx_lm.chat --model ~/models/qwen3-heretic-mlx
```
### LM Studio / llama.cpp / Ollama
Use the GGUF version: [DexopT/Qwen3-4B-Cybersecurity-GGUF](https://huggingface.co/DexopT/Qwen3-4B-Cybersecurity-GGUF)
---
## ⚠️ Disclaimer
This model is intended for **educational and research purposes only**. The abliteration process reduces but does not eliminate all safety behaviors. Use responsibly and only on systems you have explicit permission to test. The authors are not responsible for any misuse.
---
## Links
| | |
|---|---|
| 🔵 Base Model | [DexopT/Qwen3-4B-Cybersecurity](https://huggingface.co/DexopT/Qwen3-4B-Cybersecurity) |
| 📦 GGUF (Q8 + Q4) | [DexopT/Qwen3-4B-Cybersecurity-GGUF](https://huggingface.co/DexopT/Qwen3-4B-Cybersecurity-GGUF) |
| 📊 Training Dataset | [DexopT/cyber_heretic](https://huggingface.co/datasets/DexopT/cyber_heretic) |
| 🔪 Heretic Tool | [github.com/p-e-w/heretic](https://github.com/p-e-w/heretic) |
| 🔧 Original Base | [unsloth/Qwen3-4B-Instruct-2507](https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507) |
| 🏠 Qwen3 Collection | [Qwen3 on HuggingFace](https://huggingface.co/collections/Qwen/qwen3) |