Model: yamaTK/dpo-qwen-y-v35 Source: Original Platform
base_model, datasets, language, license, library_name, pipeline_tag, tags
| base_model | datasets | language | license | library_name | pipeline_tag | tags | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen/Qwen3-4B-Instruct-2507 |
|
|
apache-2.0 | transformers | text-generation |
|
dpo-qwen-y-v35
DPO fine-tuned version of Qwen/Qwen3-4B-Instruct-2507. Full-merged 16-bit weights. No adapter loading required.
Training Configuration
- Method: DPO
- Epochs: 1
- Learning rate: 1e-07
- Beta: 0.1
- Max sequence length: 1024
Description
Languages
Jinja
100%