---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/dpo-dataset-qwen-cot
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- unsloth
- qwen
- alignment
---

# exp033-dpo-wd005-merged

SFT + DPO merged model. Full 16-bit weights, no adapter loading required.

## Training Pipeline

1. **SFT**: tomofusa/exp015-blend-h-lora
2. **DPO**: u-10bei/dpo-dataset-qwen-cot (1 epoch, lr=5e-07, beta=0.1)

## DPO Configuration

- **Learning rate**: 5e-07
- **Beta**: 0.1
- **Loss type**: ipo
- **LoRA**: r=64, alpha=128
- **Max length**: 1024
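
To make the `ipo` loss type and `beta=0.1` setting concrete, here is a minimal sketch of the per-pair IPO objective in plain Python. It is an illustration under stated assumptions, not the training code used for this model: the function name `ipo_loss` and its argument names are hypothetical, and whether the log-probabilities are summed or averaged over tokens depends on the training library.

```python
def ipo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Illustrative IPO loss for a single preference pair.

    All inputs are sequence-level log-probabilities (assumed to be
    already summed or length-normalized by the caller).
    """
    # How much more the policy prefers "chosen" over "rejected",
    # measured relative to the frozen reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # IPO regresses this margin toward a fixed target of 1 / (2 * beta)
    # instead of pushing it to grow without bound.
    target = 1.0 / (2.0 * beta)
    return (margin - target) ** 2


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, target is 5.
    print(ipo_loss(-1.0, -1.0, -1.0, -1.0))
    # Margin exactly at the target: the loss vanishes.
    print(ipo_loss(-1.0, -6.0, -3.0, -3.0))
```

With `beta=0.1` the target margin is 5, so a smaller beta asks for a larger separation between chosen and rejected responses before the loss reaches zero.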