ModelHub XC a5ce41067f Initialize project; model provided by the ModelHub XC community
Model: sci4ai/Qwen2.5-14B-Instruct-Abliterated
Source: Original Platform
2026-04-23 22:59:09 +08:00

---
license: apache-2.0
base_model: Qwen/Qwen2.5-14B-Instruct
tags:
- abliterated
- uncensored
- qwen2.5
language:
- en
pipeline_tag: text-generation
---
# Qwen2.5-14B-Instruct-abliterated
This is an abliterated version of [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) with refusal behavior removed via activation-based weight surgery.
## Method
Abliteration removes the "refusal direction" from the model's residual stream by:
1. **Collecting hidden states** from 200 harmful and 200 harmless prompts using single-sample forward passes (no padding artifacts)
2. **Computing per-layer refusal directions** as the normalized mean difference between harmful and harmless hidden states at the last token position
3. **Ablating weights** by orthogonalizing `o_proj` and `down_proj` weight matrices against each layer's refusal direction
This follows the approach from [Sumandora/remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) and [mlabonne's layerwise abliteration](https://huggingface.co/blog/mlabonne/abliteration), using plain `transformers` with `output_hidden_states=True` rather than TransformerLens.
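
Step 2 above can be illustrated with a minimal sketch. It assumes the last-token hidden states have already been collected into arrays of shape `(n_prompts, n_layers, hidden)`; the function name `refusal_directions`, the NumPy stand-in for torch tensors, and the random toy data are illustrative assumptions, not the code used to produce this model.

```python
import numpy as np

def refusal_directions(harmful, harmless):
    """Per-layer unit vectors pointing from the harmless mean to the harmful mean.

    harmful, harmless: arrays of shape (n_prompts, n_layers, hidden) holding
    last-token hidden states (hypothetical layout for this sketch).
    Returns an array of shape (n_layers, hidden).
    """
    # Mean over prompts, then the difference of the two means per layer.
    diff = harmful.mean(axis=0) - harmless.mean(axis=0)
    # Normalize each layer's direction to unit length.
    norms = np.linalg.norm(diff, axis=-1, keepdims=True)
    return diff / norms

# Toy data standing in for real activations: 200 prompts, 4 layers, hidden size 16.
rng = np.random.default_rng(0)
harmful = rng.normal(size=(200, 4, 16)) + 0.5
harmless = rng.normal(size=(200, 4, 16))

dirs = refusal_directions(harmful, harmless)
print(dirs.shape)
```

With real activations, each row of `dirs` would be the direction `d` for one layer.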
### Parameters
| Parameter | Value |
|-----------|-------|
| Layers ablated | 2 to 48 (47 of 48 layers) |
| Refusal weight | 1.0 (full removal) |
| Harmful prompts | 200 |
| Harmless prompts | 200 |
| Precision | bfloat16 |
| Hardware | NVIDIA A100 80GB (Vast.ai) |
### Weight surgery details
For each layer in the ablation range, the refusal direction `d` is projected out of:
- **`o_proj.weight`** (attention output): `W_new = W - d @ (d^T @ W)`
- **`down_proj.weight`** (MLP output): `W_new = W - d @ (d^T @ W)`
These are the matrices that write into the residual stream. By removing the refusal component from their output, the model can no longer inject refusal signals into the generation process.
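
The projection can be checked numerically. The sketch below is an illustration under assumptions: it uses NumPy rather than torch, a small random matrix in place of a real `o_proj` or `down_proj` weight, and a hypothetical helper `ablate_direction` whose `weight` argument is one plausible reading of the "refusal weight" parameter.

```python
import numpy as np

def ablate_direction(W, d, weight=1.0):
    """Return W with the component along d removed from its output space.

    W: (hidden, in_features) matrix that writes into the residual stream.
    d: (hidden,) refusal direction; normalized here for safety.
    weight: hypothetical scaling knob; 1.0 means full removal.
    """
    d = d / np.linalg.norm(d)
    # d @ W is the row vector d^T W; the outer product rebuilds d (d^T W).
    return W - weight * np.outer(d, d @ W)

rng = np.random.default_rng(0)
hidden, in_features = 8, 12
W = rng.normal(size=(hidden, in_features))
d = rng.normal(size=hidden)

W_new = ablate_direction(W, d)

# For any input x, the output of W_new has no component along d.
x = rng.normal(size=in_features)
unit_d = d / np.linalg.norm(d)
residual_component = unit_d @ (W_new @ x)
print(abs(residual_component) < 1e-9)
```

Because `I - d d^T` is a projection, applying the same ablation a second time leaves `W_new` unchanged.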
## Recommendations
**Recommended for agentic and tool-calling workloads.** The 14B is the sweet spot in this series for agentic tasks: it reliably follows tool-call formats, handles multi-step reasoning, and fits in 16GB VRAM once quantized to 4-bit (the bfloat16 weights alone take about 29 GB, at roughly 14.7B parameters and 2 bytes each). If tool-calling accuracy is your priority, prefer this over the 7B (less reliable) or 32B (overkill for most pipelines).
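
When sizing hardware, a rough estimate of weight memory by precision can help. The sketch below assumes about 14.7B parameters (the figure published for the base Qwen2.5-14B model) and counts weights only; the KV cache, activations, and framework overhead come on top. The helper name `weight_gib` is illustrative.

```python
def weight_gib(n_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

N_PARAMS = 14.7e9  # assumed parameter count for the 14B model

for label, bits in [("bfloat16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_gib(N_PARAMS, bits):.1f} GiB")
```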
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "sci4ai/Qwen2.5-14B-Instruct-Abliterated"

# Load in bfloat16; device_map="auto" places weights on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Apply the chat template, then sample and decode only the new tokens.
messages = [{"role": "user", "content": "Your prompt here"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True))
```
## Disclaimer
This model is provided for research purposes. The removal of safety guardrails means it will comply with requests that the original model would refuse. Users are responsible for how they use this model.