Model: kenzrx/dpo-qwen-cot-merged Source: Original Platform
| base_model | datasets | language | license | library_name | pipeline_tag | tags |
|---|---|---|---|---|---|---|
| kenzrx/qwen3-4b-sft-merged | | | apache-2.0 | transformers | text-generation | |
qwen3-4b-instruct-2507-sft-dpo-qwen-cot-merged
This repository provides fully merged 16-bit weights, so no adapter loading is required.
What this model is
This model was trained in two stages:
- SFT (Supervised Fine-Tuning) to learn from high-quality reference answers and their formatting
- DPO (Direct Preference Optimization) to align outputs toward preferred responses
Lineage
- Original base: Qwen/Qwen3-4B-Instruct-2507
- Stage 1 (SFT) output (merged): kenzrx/qwen3-4b-sft-merged
- Stage 2 (DPO) output (this repo): merged 16-bit weights
Training Objective (DPO)
The DPO stage trains the model to assign higher likelihood to the chosen output than to the rejected output for the same prompt, with the aim of improving alignment and the quality of structured responses.
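To make the objective concrete, here is a minimal sketch of the standard per-example DPO loss, written in plain Python so that it runs without a deep learning framework. It assumes the inputs are summed log-probabilities of each full response under the trained policy and under a frozen reference model (for this card, presumably the SFT checkpoint). The function name and argument names are illustrative and are not taken from the training code used for this model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    beta defaults to 0.1 to mirror the value listed in this card;
    everything else here is an illustrative assumption.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy prefers "chosen".
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), computed in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical log-probabilities, for illustration only.
loss_when_aligned = dpo_loss(-5.0, -15.0, -10.0, -10.0)
loss_when_misaligned = dpo_loss(-15.0, -5.0, -10.0, -10.0)
```

The loss equals log 2 when the policy and reference agree exactly, falls as the policy favors the chosen response more strongly than the reference does, and rises in the opposite case.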
Training Configuration (DPO)
- Start model (SFT merged): kenzrx/qwen3-4b-sft-merged
- Method: DPO (Direct Preference Optimization)
- Epochs: 1
- Learning rate: 1e-07
- Beta: 0.1
- Max sequence length: 1024
- LoRA Config (during training): r=8, alpha=16 (merged into final 16-bit weights)
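The LoRA entry above says the adapter was merged into the final weights. As a rough sketch of what merging means, the standard LoRA formulation adds the low-rank update to each adapted weight matrix as W + (alpha / r) * B A, so r=8 and alpha=16 give a scaling factor of 16 / 8 = 2. The example below uses plain nested lists and tiny hypothetical matrices; it assumes the standard scaling (not a variant such as rank-stabilized LoRA) and is not the merge code used for this repository.

```python
def merge_lora(weight, lora_a, lora_b, alpha):
    """Return W + (alpha / r) * (B @ A) for nested-list matrices.

    weight: d_out x d_in, lora_b: d_out x r, lora_a: r x d_in.
    The rank r is inferred from the number of rows in lora_a.
    """
    rank = len(lora_a)
    scale = alpha / rank
    merged = [row[:] for row in weight]  # copy so the input stays unchanged
    for i in range(len(weight)):
        for j in range(len(weight[0])):
            # (B @ A)[i][j], the low-rank update for this weight entry.
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            merged[i][j] += scale * delta
    return merged


# Tiny hypothetical example with rank 1 and alpha 2 (scale = 2).
base = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]
a = [[3.0, 4.0]]
merged = merge_lora(base, a, b, alpha=2)
```

After merging, the adapter matrices are no longer needed at inference time, which is why this repository can be loaded without any adapter library.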
Datasets
- SFT dataset: structured_data_with_cot_dataset_v2
- DPO preference dataset: structured_data_with_cot_dataset_v2
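This card does not document the schema of structured_data_with_cot_dataset_v2. Preference datasets for DPO commonly store a prompt together with a chosen and a rejected response, so the sketch below assumes the field names prompt, chosen, and rejected. Those names, the helper function, and the sample record are all hypothetical and should be checked against the actual dataset.

```python
REQUIRED_FIELDS = ("prompt", "chosen", "rejected")  # assumed field names


def is_valid_preference_record(record):
    """Check that a record looks usable for preference training.

    A record passes if every assumed field is a non-empty string and
    the chosen and rejected responses differ.
    """
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            return False
    return record["chosen"] != record["rejected"]


# Hypothetical record, for illustration only.
example = {
    "prompt": "Convert the following key-value pairs to JSON: name=Ada",
    "chosen": '{"name": "Ada"}',
    "rejected": "name: Ada",
}
example_is_valid = is_valid_preference_record(example)
```

A check like this is cheap to run before training and catches records where the preference signal is missing or degenerate.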
Usage
You can use this model directly with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Placeholder: replace with this repository's model id.
model_id = "your_id/your-repo-name"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Your question here"
messages = [
    {"role": "user", "content": prompt},
]

# return_dict=True returns input_ids and attention_mask as a mapping,
# which is what the **inputs unpacking in generate() expects.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))