Initialize the project; model provided by the ModelHub XC community
Model: AI-ISL/DeepSeek-R1-Distill-Llama-8B-SP Source: Original Platform
README.md (new file, +37 lines)
---
license: apache-2.0
tags:
- chain-of-thought
- safety
- alignment
- reasoning
- large-language-model
library_name: transformers
inference: true
---
# SAFEPATH-R-8B

This model is the **SAFEPATH-aligned version of DeepSeek-R1-Distill-Llama-8B**, fine-tuned using prefix-only safety priming.

## Model Description

SAFEPATH applies a minimal alignment technique: it inserts the phrase *Let's think about safety first* (the Safety Primer) at the beginning of the reasoning block. This encourages the model to engage in safer reasoning without reducing its reasoning performance.
- 🔐 **Improved Safety**: Reduces harmful outputs on benchmarks such as StrongReject and BeaverTails, and is robust to jailbreak attacks
- 🧠 **Preserved Reasoning**: Maintains accuracy on MATH500, GPQA, and AIME24
- ⚡ **Efficiency**: Fine-tuned with only 20 steps (see the sketch below)
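To make the mechanism concrete, here is a schematic sketch of prefix-only priming. It is an illustration under stated assumptions, not the released SAFEPATH training code: the `<think>` delimiter is assumed from the DeepSeek-R1 output format, and the helper name is hypothetical. The idea is to open the reasoning block, append the primer, and mask the loss to the primer tokens only, which is why a handful of fine-tuning steps can suffice.

```python
# Schematic sketch of prefix-only safety priming (assumptions: a DeepSeek-R1-style
# "<think>" tag opens the reasoning block; illustrative, not the released code).
SAFETY_PRIMER = "Let's think about safety first"

def build_priming_example(tokenizer, prompt: str):
    # Target text: the prompt, the opened reasoning block, then the primer.
    # No full model response is needed for this objective.
    text = prompt + "<think>\n" + SAFETY_PRIMER
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    labels = input_ids.clone()
    # Compute the loss only on the primer tokens, so fine-tuning teaches the
    # model to emit the primer first while leaving everything else untouched.
    primer_len = len(tokenizer(SAFETY_PRIMER, add_special_tokens=False).input_ids)
    labels[:, :-primer_len] = -100  # -100 is ignored by the LM loss
    return input_ids, labels
```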
## Intended Use
This model is intended for research in:
- Safety alignment in Large Reasoning Models (LRMs)
- Robust reasoning under adversarial settings
- Chain-of-thought alignment studies

For details, see our [paper](https://arxiv.org/pdf/2505.14667).
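## Quick Start

A minimal inference sketch with the Transformers library (the repo id below is assumed from this commit's path, and the prompt and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the hub path matches this repository; point it at a local
# checkout of the weights if you are not loading from a hub.
model_id = "AI-ISL/DeepSeek-R1-Distill-Llama-8B-SP"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain how password managers keep credentials safe."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

With SAFEPATH priming, the reasoning block is expected to open with *Let's think about safety first* before the usual chain of thought.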
## Overview Results
<p align="left">
<img src="https://github.com/AI-ISL/AI-ISL.github.io/blob/main/static/images/safepath/main_results.png?raw=true" width="800"/>
</p>