---
language:
- en
pipeline_tag: text-generation
tags:
- SafetyAlignment
---
Trained with [DeepRefusal](https://github.com/YuanBoXie/DeepRefusal) [1].

[1] [Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction, EMNLP 2025](https://arxiv.org/abs/2509.15202)