Model: Yurori/qwen3-4b-dpo-qwen-cot-merged Source: Original Platform
This model is a fine-tuned version of Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO) via the Unsloth library. This repository contains the full-merged 16-bit weights. No adapter loading is required.