初始化项目,由ModelHub XC社区提供模型
Model: normster/RealGuardrails-Qwen2.5-7B-SFT-DPO Source: Original Platform
This commit is contained in:
30
README.md
Normal file
30
README.md
Normal file
@@ -0,0 +1,30 @@
|
||||
---
|
||||
license: mit
|
||||
datasets:
|
||||
- normster/RealGuardrails
|
||||
base_model:
|
||||
- Qwen/Qwen2.5-7B
|
||||
- normster/RealGuardrails-Qwen2.5-7B-SFT
|
||||
library_name: transformers
|
||||
---
|
||||
|
||||
# RealGuardrails Models
|
||||
|
||||
This model was trained on the [RealGuardrails](https://huggingface.co/datasets/normster/RealGuardrails) dataset, an instruction-tuning dataset focused on improving system prompt adherence and precedence. In particular, it was trained via SFT on the `systemmix` split (150K examples) using our custom training library [torchllms](https://github.com/normster/torchllms) (yielding [normster/RealGuardrails-Qwen2.5-7B-SFT](https://huggingface.co/normster/RealGuardrails-Qwen2.5-7B-SFT)), and then trained via DPO on the `preferencemix` split (30K examples), and converted back to a `transformers` compatible checkpoint.
|
||||
|
||||
## Training Hyperparameters
|
||||
|
||||
| Name | Value |
|
||||
| :--- | :--- |
|
||||
| DPO beta | 0.01 |
|
||||
| optimizer | AdamW |
|
||||
| batch size | 128 |
|
||||
| learning rate | 1e-5 |
|
||||
| lr scheduler | cosine with 50 warmup steps |
|
||||
| betas | (0.9, 0.999) |
|
||||
| eps | 1e-8 |
|
||||
| weight decay | 0 |
|
||||
| epochs | 1 |
|
||||
| max grad norm | 1.0 |
|
||||
| precision | bf16 |
|
||||
| max length | 4096 |
|
||||
Reference in New Issue
Block a user