---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/dpo-dataset-qwen-cot
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- unsloth
- qwen
- alignment
---

# exp033-dpo-wd005-merged

Merged SFT + DPO model. This repository contains the full 16-bit weights, so no adapter (LoRA) loading is required.
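
A minimal inference sketch with transformers (not taken from the training repo); the repository id follows the sync note in the Description section, and the prompt and generation settings are illustrative:

```python
# Minimal inference sketch, assuming the repo id below and the model's default chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tomofusa/exp033-dpo-wd005-merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # full 16-bit weights; no PEFT adapter loading needed
    device_map="auto",    # requires accelerate
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```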

## Training Pipeline

  1. SFT: tomofusa/exp015-blend-h-lora
  2. DPO: u-10bei/dpo-dataset-qwen-cot (1 epoch, lr=5e-07, beta=0.1)
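
Since the release is a merged checkpoint rather than adapters, the final step of the pipeline presumably folds the LoRA weights back into the base model. A hedged sketch of such a merge with peft, where the DPO adapter path is a placeholder (the actual adapter artifact is not named in this card):

```python
# Sketch of producing a merged 16-bit checkpoint with peft.
# The DPO adapter path is a placeholder; only the SFT adapter id comes from this card.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-4B-Instruct-2507"
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")

# 1) Fold the SFT LoRA adapter into the base weights.
model = PeftModel.from_pretrained(base, "tomofusa/exp015-blend-h-lora").merge_and_unload()

# 2) Fold the DPO LoRA adapter on top (placeholder local path).
model = PeftModel.from_pretrained(model, "./dpo-lora-checkpoint").merge_and_unload()

# 3) Save the fully merged model plus tokenizer as a standalone checkpoint.
model.save_pretrained("exp033-dpo-wd005-merged")
AutoTokenizer.from_pretrained(base_id).save_pretrained("exp033-dpo-wd005-merged")
```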

## DPO Configuration

- Learning rate: 5e-07
- Beta: 0.1
- Loss type: ipo
- LoRA: r=64, alpha=128
- Max length: 1024
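
These settings map directly onto TRL's `DPOConfig` and peft's `LoraConfig`. The tags suggest the original run used Unsloth; the following is only an illustrative sketch with plain TRL, and the output directory, LoRA target modules, and batch settings are assumptions:

```python
# Illustrative DPO run with TRL; hyperparameters mirror the list above,
# while output_dir, target_modules, and batch settings are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen3-4B-Instruct-2507"  # the real run starts from the SFT checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Expects prompt/chosen/rejected columns in the preference dataset.
dataset = load_dataset("u-10bei/dpo-dataset-qwen-cot", split="train")

peft_config = LoraConfig(
    r=64,
    lora_alpha=128,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed targets
)

args = DPOConfig(
    output_dir="exp033-dpo-wd005",  # placeholder
    num_train_epochs=1,
    learning_rate=5e-7,
    beta=0.1,
    loss_type="ipo",
    max_length=1024,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```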
## Description

Model synced from source: tomofusa/exp033-dpo-wd005-merged