ModelHub XC 859a65e14b Initialize project; model provided by the ModelHub XC community
Model: nakotsuko13/qwen3-4b-nako13-dpo-qwen-cot-merged
Source: Original Platform
2026-05-29 13:47:04 +08:00

---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/dpo-dataset-qwen-cot
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- unsloth
- qwen
- alignment
- structured-output
---
# qwen3-4b-nako13-dpo-qwen-cot-merged
This model is a fine-tuned variant of **Qwen/Qwen3-4B-Instruct-2507**, optimized for precise structured data generation.
It was developed through a **two-stage fine-tuning process** intended to combine high knowledge density with strict output formatting.
## Training Process
1. **Stage 1: SFT (Supervised Fine-Tuning)**
   - **Base Model**: Qwen/Qwen3-4B-Instruct-2507
   - **Adapter**: [nakotsuko13/qwen3-4b-nako13-structured-output-lora](https://huggingface.co/nakotsuko13/qwen3-4b-nako13-structured-output-lora)
   - **Focus**: Trained on 16,500+ samples to master JSON, XML, CSV, and YAML structures.
2. **Stage 2: DPO (Direct Preference Optimization)**
   - **Dataset**: u-10bei/dpo-dataset-qwen-cot
   - **Focus**: Optimized to eliminate conversational filler and provide direct, raw structured outputs.
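
For intuition about Stage 2, the per-pair DPO objective can be sketched in plain Python. This is a minimal illustration, not the training code used for this model: the function name `dpo_loss` and its argument names are hypothetical, and the inputs are assumed to be summed log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model. The default `beta=0.01` matches the value listed in the DPO configuration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.01):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio)))."""
    # How much more (or less) likely each response is under the policy than the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow for large |m|.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities, for illustration only.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # policy identical to the reference
print(dpo_loss(-8.0, -12.0, -10.0, -12.0))   # policy favors the chosen response more
```

When the policy matches the reference, the loss equals log 2 (about 0.693). It decreases as the policy raises the chosen response relative to the rejected one, and a small `beta` makes that decrease gradual.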
## Training Configuration (DPO)
- **Method**: DPO (Direct Preference Optimization)
- **Epochs**: 1
- **Learning rate**: 5e-07
- **Beta**: 0.01
- **Max sequence length**: 1024
- **LoRA Config**: r=64, alpha=128 (Merged into final weights)
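
To illustrate what "merged into final weights" means for `r=64, alpha=128`, here is a toy sketch of folding a LoRA update into a weight matrix. It assumes the standard LoRA scaling `alpha / r` (rank-stabilized variants scale differently, and the card does not say which was used). The helper names `lora_scaling`, `matmul`, and `merge_lora` are hypothetical, and the matrices are tiny stand-ins rather than real model weights.

```python
def lora_scaling(r=64, alpha=128):
    """Standard LoRA scaling factor (assumed convention): alpha / r."""
    return alpha / r


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W, A, B, r, alpha):
    """Return W + (alpha / r) * (B @ A), the merged weight matrix."""
    scale = lora_scaling(r, alpha)
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy example: a 2x2 weight with a rank-1 update, keeping the same alpha / r ratio of 2.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # shape (r, in_features) with r = 1
B = [[1.0], [3.0]]  # shape (out_features, r)
print(lora_scaling())                      # scaling for r=64, alpha=128
print(merge_lora(W, A, B, r=1, alpha=2))   # merged toy weights
```

After merging, the adapter matrices are no longer needed at inference time, which is why the model loads like any ordinary checkpoint.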
## Usage
This is a **fully merged 16-bit model**, so it can be loaded directly with standard `transformers` or `vLLM` without attaching a separate adapter.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "nakotsuko13/qwen3-4b-nako13-dpo-qwen-cot-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Test inference
prompt = "Your question here"
# return_dict=True yields input_ids and attention_mask, so **inputs works below
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
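
Since the model is tuned to emit raw structured output, it can be useful to check that a response actually parses before passing it downstream. The following is a minimal sketch using only the Python standard library. The helper names `strip_code_fence` and `parse_structured` are hypothetical and not part of this model or of `transformers`. YAML is omitted because it needs a third-party parser.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable


def strip_code_fence(text):
    """Remove a surrounding Markdown code fence if the model added one."""
    text = text.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]         # drop the closing fence line
        text = "\n".join(lines).strip()
    return text


def parse_structured(text, fmt):
    """Parse model output as 'json', 'xml', or 'csv'; raises if it is malformed."""
    payload = strip_code_fence(text)
    if fmt == "json":
        return json.loads(payload)
    if fmt == "xml":
        return ET.fromstring(payload)
    if fmt == "csv":
        return list(csv.reader(io.StringIO(payload)))
    raise ValueError(f"unsupported format: {fmt}")


# Hypothetical model responses, for illustration only.
print(parse_structured('{"name": "Alice", "age": 30}', "json"))
print(parse_structured("id,name\n1,Alice", "csv"))
```

A parsing exception here signals a malformed response, which the caller can handle by retrying generation or logging the failure.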
## Sources & License (IMPORTANT)
* **Training Data**: [u-10bei/dpo-dataset-qwen-cot]
* **License**: MIT License (as per the dataset terms).
* **Compliance**: Users must also comply with the license terms of the original base model.