| base_model | datasets | language | license | library_name | pipeline_tag | tags |
|---|---|---|---|---|---|---|
| kenzrx/qwen3-4b-sft-merged | | | apache-2.0 | transformers | text-generation | |
qwen3-4b-instruct-2507-sft-dpo-qwen-cot-merged
This repository provides fully merged 16-bit weights, so no adapter loading is required.
What this model is
This model was trained in two stages:
- SFT (Supervised Fine-Tuning) to learn high-quality reference answers / formatting
- DPO (Direct Preference Optimization) to align outputs toward preferred responses
Lineage
- Original base: Qwen/Qwen3-4B-Instruct-2507
- Stage 1 (SFT) output (merged): kenzrx/qwen3-4b-sft-merged
- Stage 2 (DPO) output (this repo): merged 16-bit weights
Training Objective (DPO)
The DPO stage optimizes the model to prefer the chosen output over the rejected output for the same prompt, with the aim of improving response alignment and the quality of structured outputs.
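As a rough illustration of that objective, here is a minimal sketch of the standard per-example DPO loss. It is not the training code used for this model: the function name and arguments are hypothetical, the inputs are assumed to be summed sequence log-probabilities, and the reference model is assumed to be the frozen SFT checkpoint. The default `beta=0.1` matches the value listed in the training configuration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen margin - rejected margin)))."""
    # Implicit rewards: how much more likely the policy makes each response
    # compared with the reference model (assumed here to be the SFT checkpoint).
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Toy log-probabilities (made up for illustration, not real model outputs).
prefers_chosen = dpo_loss(-5.0, -15.0, -10.0, -10.0)
prefers_rejected = dpo_loss(-15.0, -5.0, -10.0, -10.0)
print(prefers_chosen < prefers_rejected)
```

The loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model. When the policy and the reference agree exactly, the loss is log 2.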
Training Configuration (DPO)
- Start model (SFT merged): kenzrx/qwen3-4b-sft-merged
- Method: DPO (Direct Preference Optimization)
- Epochs: 1
- Learning rate: 1e-07
- Beta: 0.1
- Max sequence length: 1024
- LoRA Config (during training): r=8, alpha=16 (merged into final 16-bit weights)
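To make "merged into final 16-bit weights" concrete, the sketch below folds a LoRA update into a base weight matrix using plain Python lists. It assumes the standard LoRA scaling of alpha / r (which gives 16 / 8 = 2.0 for the values above). The helper names and the tiny matrices are hypothetical, and the toy adapter has rank 1 purely to keep the arithmetic readable. The actual merge was presumably done with a library routine rather than code like this.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, r=8, alpha=16):
    """Return W + (alpha / r) * (B @ A), the merged weight matrix."""
    scale = alpha / r  # 2.0 for r=8, alpha=16 (standard LoRA scaling, assumed)
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy 2x2 base weight with a rank-1 adapter (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # shape: out_features x rank
A = [[0.5, 0.25]]    # shape: rank x in_features
print(merge_lora(W, A, B))  # prints [[2.0, 0.5], [2.0, 2.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why this repository can be loaded without any adapter.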
Datasets
- SFT dataset: structured_data_with_cot_dataset_v2
- DPO preference dataset: structured_data_with_cot_dataset_v2
Usage
You can use this model directly with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "your_id/your-repo-name"  # placeholder: replace with this repository's ID

# Load the tokenizer and the merged 16-bit weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Your question here"
messages = [
    {"role": "user", "content": prompt},
]

# return_dict=True yields a mapping with input_ids and attention_mask,
# which is what the ** unpacking in generate() below requires.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
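For decoder-only models, `generate` returns the prompt tokens followed by the new tokens, so decoding the full sequence repeats the prompt. The sketch below shows one way to keep only the continuation. The helper name is hypothetical and the token IDs are made up, standing in for one row of `generate` output.

```python
def continuation_ids(output_ids, prompt_length):
    """Drop the first prompt_length tokens, keeping only newly generated ones."""
    return output_ids[prompt_length:]


# Toy token IDs (illustrative only): a 3-token prompt followed by 3 new tokens.
prompt_ids = [101, 7, 8]
full_output = prompt_ids + [42, 43, 2]
print(continuation_ids(full_output, len(prompt_ids)))  # prints [42, 43, 2]
```

The same slice works on a tensor row: pass `outputs[0]` together with the number of prompt tokens, then give the result to `tokenizer.decode`.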