---
base_model: kenzrx/qwen3-4b-sft-merged
datasets:
- structured_data_with_cot_dataset_v2
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- qwen
- unsloth
- transformers
- text-generation
- lora
- merged
- dpo
- alignment
- sft
---

# qwen3-4b-instruct-2507-sft-dpo-qwen-cot-merged

This repository provides **full-merged 16-bit weights** (no adapter loading required).

## What this model is

This model was trained in **two stages**:

1) **SFT (Supervised Fine-Tuning)** to learn high-quality reference answers / formatting
2) **DPO (Direct Preference Optimization)** to align outputs toward preferred responses

### Lineage

- **Original base**: Qwen/Qwen3-4B-Instruct-2507
- **Stage 1 (SFT) output (merged)**: kenzrx/qwen3-4b-sft-merged
- **Stage 2 (DPO) output (this repo)**: merged 16-bit weights

## Training Objective (DPO)

The DPO stage optimizes the model to prefer **chosen** outputs over **rejected** outputs given the same prompt, improving response alignment and structured quality.

## Training Configuration (DPO)

- **Start model (SFT merged)**: kenzrx/qwen3-4b-sft-merged
- **Method**: DPO (Direct Preference Optimization)
- **Epochs**: 1
- **Learning rate**: 1e-07
- **Beta**: 0.1
- **Max sequence length**: 1024
- **LoRA Config (during training)**: r=8, alpha=16 (merged into final 16-bit weights)

## Datasets

- **SFT dataset**: structured_data_with_cot_dataset_v2
- **DPO preference dataset**: structured_data_with_cot_dataset_v2

## Usage

You can use this model directly with `transformers`.

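```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Placeholder: replace with the actual repository ID of this model.
model_id = "your_id/your-repo-name"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the repo ships merged 16-bit weights
    device_map="auto",
)

prompt = "Your question here"
messages = [
    {"role": "user", "content": prompt},
]

# return_dict=True yields a mapping (input_ids, attention_mask) so that
# it can be unpacked with ** in generate(); without it, a bare tensor
# is returned and the ** unpacking fails.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)

# Decodes the full sequence, i.e. the prompt followed by the completion.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```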
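
As a rough illustration of the objective described under "Training Objective (DPO)", the per-example DPO loss can be sketched in plain Python. This is a minimal sketch and not the training code used for this model: the function name `dpo_loss`, its arguments, and the sample log-probabilities are hypothetical, and only the default `beta=0.1` is taken from the configuration listed above. The reference model here would correspond to the SFT-merged starting point.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the policy or the frozen reference model.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin; positive when the policy favors the
    # chosen response more strongly than the reference does.
    x = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(x)) equals softplus(-x); branch for numerical stability.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Hypothetical log-probabilities, for illustration only.
# Policy identical to the reference: margin is 0, loss is log(2).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Policy favors the chosen response more than the reference: lower loss.
print(dpo_loss(-10.0, -18.0, -12.0, -15.0))
```

With a small `beta` such as 0.1, the loss changes slowly as the margin grows, which keeps the policy close to the reference model.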