---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/dpo-dataset-qwen-cot
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- unsloth
- qwen
- alignment
---
# exp033-dpo-wd005-merged

A merged SFT + DPO model. The repository contains the full 16-bit weights, so no adapter loading is required.
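Because the weights are already merged, the model can be loaded directly with `transformers` like any standalone checkpoint. A minimal sketch (the repo id below is a placeholder for this repository's actual id, and dtype/device settings are illustrative, not prescriptive):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "exp033-dpo-wd005-merged"  # placeholder: replace with this repo's full id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",  # full 16-bit weights; no PEFT adapter needed
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```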
## Training Pipeline
- SFT: tomofusa/exp015-blend-h-lora
- DPO: u-10bei/dpo-dataset-qwen-cot (1 epoch, lr=5e-07, beta=0.1)
## DPO Configuration
- Learning rate: 5e-07
- Beta: 0.1
- Loss type: ipo
- LoRA: r=64, alpha=128
- Max length: 1024
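The settings above map onto a `trl` + `peft` training setup roughly as follows. This is a hedged configuration sketch, not the actual training script; `output_dir` and the LoRA `target_modules` are assumptions not stated in this card:

```python
from peft import LoraConfig
from trl import DPOConfig

# LoRA adapter settings from the card: r=64, alpha=128.
peft_config = LoraConfig(
    r=64,
    lora_alpha=128,
    task_type="CAUSAL_LM",  # assumed; standard for decoder-only models
)

# DPO hyperparameters from the card: 1 epoch, lr=5e-07, beta=0.1, IPO loss.
training_args = DPOConfig(
    output_dir="exp033-dpo",  # placeholder
    num_train_epochs=1,
    learning_rate=5e-7,
    beta=0.1,
    loss_type="ipo",
    max_length=1024,
)
```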