Initialize the project; model provided by the ModelHub XC community
Model: AI-ISL/DeepSeek-R1-Distill-Llama-8B-SP Source: Original Platform
README.md (new file, +37 lines)
---
license: apache-2.0
tags:
- chain-of-thought
- safety
- alignment
- reasoning
- large-language-model
library_name: transformers
inference: true
---
# SAFEPATH-R-8B

This model is the **SAFEPATH-aligned version of DeepSeek-R1-Distill-Llama-8B**, fine-tuned using prefix-only safety priming.

## Model Description

SAFEPATH applies a minimal alignment technique: it inserts the phrase *Let's think about safety first* (the Safety Primer) at the beginning of the reasoning block. This encourages the model to engage in safer reasoning without reducing its reasoning performance.
- 🔐 **Improved Safety**: Reduces harmful outputs on benchmarks such as StrongReject and BeaverTails, and is robust to jailbreak attacks
- 🧠 **Preserved Reasoning**: Maintains accuracy on MATH500, GPQA, and AIME24
- ⚡ **Efficiency**: Fine-tuned with only 20 steps (see the sketch below)
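To make the mechanism concrete, here is a schematic sketch of prefix-only priming. It is an illustration under stated assumptions, not the released SAFEPATH training code: the `<think>` delimiter is assumed from the DeepSeek-R1 output format, and the helper name is hypothetical. The idea is to open the reasoning block, append the primer, and mask the loss to the primer tokens only, which is why a handful of fine-tuning steps can suffice.

```python
# Schematic sketch of prefix-only safety priming (assumptions: a DeepSeek-R1-style
# "<think>" tag opens the reasoning block; illustrative, not the released code).
SAFETY_PRIMER = "Let's think about safety first"

def build_priming_example(tokenizer, prompt: str):
    # Target text: the prompt, the opened reasoning block, then the primer.
    # No full model response is needed for this objective.
    text = prompt + "<think>\n" + SAFETY_PRIMER
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    labels = input_ids.clone()
    # Compute the loss only on the primer tokens, so fine-tuning teaches the
    # model to emit the primer first while leaving everything else untouched.
    primer_len = len(tokenizer(SAFETY_PRIMER, add_special_tokens=False).input_ids)
    labels[:, :-primer_len] = -100  # -100 is ignored by the LM loss
    return input_ids, labels
```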
## Intended Use
This model is intended for research in:
- Safety alignment in Large Reasoning Models (LRMs)
- Robust reasoning under adversarial settings
- Chain-of-thought alignment studies

For details, see our [paper](https://arxiv.org/pdf/2505.14667).
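## Quick Start

A minimal inference sketch with the Transformers library (the repo id below is assumed from this commit's path, and the prompt and generation settings are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the hub path matches this repository; point it at a local
# checkout of the weights if you are not loading from a hub.
model_id = "AI-ISL/DeepSeek-R1-Distill-Llama-8B-SP"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain how password managers keep credentials safe."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

With SAFEPATH priming, the reasoning block is expected to open with *Let's think about safety first* before the usual chain of thought.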
## Overview Results
<p align="left">
<img src="https://github.com/AI-ISL/AI-ISL.github.io/blob/main/static/images/safepath/main_results.png?raw=true" width="800"/>
</p>