Initialize project; model provided by the ModelHub XC community

Model: normster/RealGuardrails-Qwen2.5-7B-SFT-DPO Source: Original Platform
2026-05-13 02:11:34 +08:00
commit bb5e2f77a4
9 changed files with 151697 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,30 @@
+---
+license: mit
+datasets:
+- normster/RealGuardrails
+base_model:
+- Qwen/Qwen2.5-7B
+- normster/RealGuardrails-Qwen2.5-7B-SFT
+library_name: transformers
+---
+
+# RealGuardrails Models
+
+This model was trained on the [RealGuardrails](https://huggingface.co/datasets/normster/RealGuardrails) dataset, an instruction-tuning dataset focused on improving system prompt adherence and precedence. Training used our custom library [torchllms](https://github.com/normster/torchllms) in two stages: SFT on the `systemmix` split (150K examples), yielding [normster/RealGuardrails-Qwen2.5-7B-SFT](https://huggingface.co/normster/RealGuardrails-Qwen2.5-7B-SFT), followed by DPO on the `preferencemix` split (30K examples). The result was then converted back to a `transformers`-compatible checkpoint.
+
+## Training Hyperparameters
+
+| Name | Value |
+| :--- | :--- |
+| DPO beta | 0.01 |
+| optimizer | AdamW |
+| batch size | 128 |
+| learning rate | 1e-5 |
+| lr scheduler | cosine with 50 warmup steps |
+| betas | (0.9, 0.999) |
+| eps | 1e-8 |
+| weight decay | 0 |
+| epochs | 1 |
+| max grad norm | 1.0 |
+| precision | bf16 |
+| max length | 4096 |
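
To make the table above more concrete, here is a minimal sketch of how two of its entries could enter a training loop: the "cosine with 50 warmup steps" schedule and the DPO beta of 0.01. It is not taken from torchllms. The names `lr_at` and `dpo_loss` are hypothetical, the total step count is an estimate (about 30K pairs at batch size 128 for one epoch), and decaying to zero is an assumption, since the README does not state a minimum learning rate.

```python
import math

# Values taken from the hyperparameter table above.
PEAK_LR = 1e-5
WARMUP_STEPS = 50
DPO_BETA = 0.01

# Assumption: ~30K preference pairs / batch size 128 for one epoch.
# The exact step count is not stated in the README.
TOTAL_STEPS = 234


def lr_at(step, total_steps=TOTAL_STEPS, warmup_steps=WARMUP_STEPS, peak_lr=PEAK_LR):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed floor)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=DPO_BETA):
    """Standard DPO loss for one pair, given summed sequence log-probabilities."""
    # Scaled difference of policy-vs-reference log-ratios (chosen minus rejected).
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) equals softplus(-margin); computed in a stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

With beta at 0.01, a 10-nat gap between the chosen and rejected log-ratios moves the margin by only 0.1, so the loss stays close to log 2 unless the policy drifts far from the reference model.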