base_model: kenzrx/qwen3-4b-sft-merged
datasets: structured_data_with_cot_dataset_v2
language: en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags: qwen, unsloth, transformers, text-generation, lora, merged, dpo, alignment, sft

qwen3-4b-instruct-2507-sft-dpo-qwen-cot-merged

This repository provides fully merged 16-bit weights, so no adapter loading is required.

What this model is

This model was trained in two stages:

  1. SFT (Supervised Fine-Tuning) to learn from high-quality reference answers and their formatting
  2. DPO (Direct Preference Optimization) to align outputs toward preferred responses

Lineage

  • Original base: Qwen/Qwen3-4B-Instruct-2507
  • Stage 1 (SFT) output (merged): kenzrx/qwen3-4b-sft-merged
  • Stage 2 (DPO) output (this repo): merged 16-bit weights

Training Objective (DPO)

The DPO stage trains the model to prefer the chosen output over the rejected output for the same prompt, which is intended to improve response alignment and the quality of structured outputs.
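
As a rough illustration of that objective, the standard DPO loss for a single preference pair can be written in plain Python. This is a sketch of the commonly published DPO formulation, not this repository's training code: the function name `dpo_loss` and the example log-probabilities are hypothetical, and `beta=0.1` simply mirrors the value listed in the training configuration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss computed from summed sequence log-probabilities."""
    # How much more the policy favors each response than the frozen reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)), written in a numerically stable softplus form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical values: when the policy matches the reference, the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Here the policy has shifted probability toward the chosen response, so the loss drops.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

The loss only depends on how the policy's preference margin differs from the reference model's, which is why the SFT checkpoint can serve as both the starting point and the reference.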

Training Configuration (DPO)

  • Start model (SFT merged): kenzrx/qwen3-4b-sft-merged
  • Method: DPO (Direct Preference Optimization)
  • Epochs: 1
  • Learning rate: 1e-07
  • Beta: 0.1
  • Max sequence length: 1024
  • LoRA Config (during training): r=8, alpha=16 (merged into final 16-bit weights)
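
The LoRA entry above implies a merge step in which the low-rank update is scaled by alpha / r and added to the frozen weight. A minimal sketch in plain Python, assuming the standard LoRA formulation W' = W + (alpha / r) * B A. The helper names and toy matrices are hypothetical and are not taken from this repository's training code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, r=8, alpha=16):
    """Return weight + (alpha / r) * (lora_b @ lora_a)."""
    scale = alpha / r  # 16 / 8 = 2.0 for the configuration listed above
    delta = matmul(lora_b, lora_a)  # shape: (out_features, in_features)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Toy example with rank 1 for readability (the run described above used r=8).
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen weight, 2x2
A = [[0.5, 0.5]]              # LoRA A, shape (r, in_features) = 1x2
B = [[1.0], [0.0]]            # LoRA B, shape (out_features, r) = 2x1
merged = merge_lora(W, A, B, r=1, alpha=2)
print(merged)  # [[2.0, 1.0], [0.0, 1.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the weights can be loaded without any adapter library.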

Datasets

  • SFT dataset: structured_data_with_cot_dataset_v2
  • DPO preference dataset: structured_data_with_cot_dataset_v2

Usage

You can use this model directly with transformers.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "your_id/your-repo-name"  # placeholder: replace with this model's Hub repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Your question here"
messages = [
    {"role": "user", "content": prompt},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # return input_ids and attention_mask as a mapping so **inputs works
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
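
For a decoder-only model, `generate` returns the prompt tokens followed by the completion, so the decoded string also contains the templated prompt text. A small sketch of trimming the prompt before decoding follows. `new_token_ids` is a hypothetical helper, shown on plain lists so it runs without loading the model.

```python
def new_token_ids(output_ids, prompt_length):
    """Drop the first prompt_length ids, keeping only the generated tokens."""
    return output_ids[prompt_length:]

# Toy ids: three prompt tokens followed by two generated tokens.
completion_ids = new_token_ids([101, 102, 103, 7, 8], prompt_length=3)
print(completion_ids)  # [7, 8]
```

In practice, `prompt_length` would be the number of tokens in the templated prompt passed to `generate`, and the trimmed ids would be given to `tokenizer.decode`.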
Description
Model synced from source: kenzrx/dpo-qwen-cot-merged