---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/dpo-dataset-qwen-cot
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- unsloth
- qwen
- alignment
---

# dpo-qwen-y-v35

DPO fine-tuned version of **Qwen/Qwen3-4B-Instruct-2507**. Full-merged 16-bit weights. No adapter loading required.

## Training Configuration

- **Method**: DPO
- **Epochs**: 1
- **Learning rate**: 1e-07
- **Beta**: 0.1
- **Max sequence length**: 1024
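
To illustrate how the **Beta** value listed above enters training, here is a minimal sketch of the standard DPO loss for a single preference pair in plain Python. The function name `dpo_loss` and the example log-probabilities are hypothetical and exist only to show the formula; this is not the training code used for this model, which this card does not include.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair. Illustrative only.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Beta scales the implicit reward margin between the two responses.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities: the policy favors the chosen response
# more than the reference model does, so the loss falls below log(2).
example = dpo_loss(-5.0, -15.0, -10.0, -10.0, beta=0.1)
```

When the policy and the reference model agree exactly, the margin is zero and the loss equals `log(2)`. A smaller beta shrinks the margin for the same log-ratio gap, so each pair pulls the policy away from the reference model more gently.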