Files
dpo-qwen-y-v35/README.md
ModelHub XC 9e3ca5e370 初始化项目,由ModelHub XC社区提供模型
Model: yamaTK/dpo-qwen-y-v35
Source: Original Platform
2026-05-22 04:44:18 +08:00

28 lines
493 B
Markdown

---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/dpo-dataset-qwen-cot
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- unsloth
- qwen
- alignment
---
# dpo-qwen-y-v35
DPO fine-tuned version of **Qwen/Qwen3-4B-Instruct-2507**.
Full-merged 16-bit weights. No adapter loading required.
## Training Configuration
- **Method**: DPO
- **Epochs**: 1
- **Learning rate**: 1e-07
- **Beta**: 0.1
- **Max sequence length**: 1024