---
language:
- en
pipeline_tag: text-generation
tags:
- SafetyAlignment
---
Trained with [DeepRefusal](https://github.com/YuanBoXie/DeepRefusal), the method introduced in [1].

[1] [Beyond Surface Alignment: Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction, EMNLP 2025](https://arxiv.org/abs/2509.15202)