nakotsuko13/qwen3-4b-nako13-dpo-qwen-cot-merged


ModelHub XC 859a65e14b Initialize project; model provided by the ModelHub XC community

Model: nakotsuko13/qwen3-4b-nako13-dpo-qwen-cot-merged
Source: Original Platform

2026-05-29 13:47:04 +08:00

.gitattributes
added_tokens.json
chat_template.jinja
config.json
merges.txt
model-00001-of-00002.safetensors
model-00002-of-00002.safetensors
model.safetensors.index.json
README.md
special_tokens_map.json
tokenizer_config.json
tokenizer.json
vocab.json


qwen3-4b-nako13-dpo-qwen-cot-merged

This model is a fine-tuned variant of Qwen/Qwen3-4B-Instruct-2507, optimized for precise structured data generation. It was developed through a two-stage fine-tuning process: supervised fine-tuning so that the model learns the target formats, followed by DPO so that it emits them strictly, without surrounding conversational text.

Training Process

Stage 1: SFT (Supervised Fine-Tuning)
- Base Model: Qwen/Qwen3-4B-Instruct-2507
- Adapter: nakotsuko13/qwen3-4b-nako13-structured-output-lora
- Focus: Trained on 16,500+ samples to master JSON, XML, CSV, and YAML structures.

Stage 2: DPO (Direct Preference Optimization)
- Dataset: u-10bei/dpo-dataset-qwen-cot
- Focus: Optimized to eliminate conversational filler and provide direct, raw structured outputs.
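
To make the Stage 2 goal concrete, here is a minimal sketch of what a "raw" structured output means in practice: the whole response must parse as the requested format, with no greeting or explanation around it. The preference pair below is hypothetical and written for illustration; it is not taken from u-10bei/dpo-dataset-qwen-cot, and the field names `prompt`, `chosen`, and `rejected` are an assumption about a typical DPO sample layout. The helper `is_raw_structured` is likewise illustrative and only covers JSON and XML via the Python standard library.

```python
import json
import xml.etree.ElementTree as ET


def is_raw_structured(text: str, fmt: str) -> bool:
    """Return True if the entire text parses as fmt ("json" or "xml")."""
    try:
        if fmt == "json":
            json.loads(text)
        elif fmt == "xml":
            ET.fromstring(text)
        else:
            raise ValueError(f"unsupported format: {fmt}")
    except (json.JSONDecodeError, ET.ParseError):
        # Any text around the payload makes parsing fail
        return False
    return True


# Hypothetical preference pair (assumed layout, not real dataset content)
pair = {
    "prompt": "Return the user as JSON with keys name and age.",
    "chosen": '{"name": "Alice", "age": 30}',
    "rejected": 'Sure! Here is the JSON you asked for:\n{"name": "Alice", "age": 30}',
}

print(is_raw_structured(pair["chosen"], "json"))    # True: payload only
print(is_raw_structured(pair["rejected"], "json"))  # False: filler before the payload
```

A check like this can also be applied to generated outputs as a quick sanity test before passing them to downstream parsers.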

Training Configuration (DPO)

Method: DPO (Direct Preference Optimization)
Epochs: 1
Learning rate: 5e-07
Beta: 0.01
Max sequence length: 1024
LoRA Config: r=64, alpha=128 (merged into the final weights)
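
To illustrate what the Beta value controls, the sketch below computes the standard DPO loss for a single preference pair from summed log-probabilities. It is not the author's training code, and the log-probability values are made up for illustration. Beta scales the margin between the policy's and the reference model's preference for the chosen response, so a small value such as 0.01 yields a gentler training signal for the same margin than a larger one.

```python
import math


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.01):
    """DPO loss for one pair, given summed log-probabilities of each response."""
    # How much more the policy favors chosen over rejected, relative to the reference
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) rewritten as log(1 + exp(-beta * margin));
    # fine for small illustrative values, not hardened against overflow
    return math.log1p(math.exp(-beta * margin))


# Hypothetical log-probabilities: the margin works out to 10
args = dict(pi_chosen=-40.0, pi_rejected=-55.0, ref_chosen=-45.0, ref_rejected=-50.0)

print(round(dpo_loss(**args, beta=0.01), 4))  # small beta: loss stays near log(2)
print(round(dpo_loss(**args, beta=0.1), 4))   # larger beta: same margin, lower loss
```

With a margin of zero the loss equals log(2) regardless of Beta, which is the starting point when the policy is initialized from the reference model.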

Usage

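This is a fully merged 16-bit model: the LoRA weights are already merged into the base model, so no adapter needs to be loaded. It can be used directly with the standard transformers library or vLLM.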

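from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "nakotsuko13/qwen3-4b-nako13-dpo-qwen-cot-merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Test inference
prompt = "Your question here"
# return_dict=True yields input_ids and attention_mask, so the result can be unpacked into generate()
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)  # follow the device chosen by device_map instead of hard-coding "cuda"
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))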
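
As a rough guide to what "16-bit" means for memory, the weight footprint can be estimated as the parameter count times two bytes. The sketch below assumes roughly 4 billion parameters, taking the "4B" in the model name at face value; the exact count may differ, and activations and the KV cache need additional memory on top of the weights.

```python
def weight_size_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in GiB."""
    return num_params * bytes_per_param / 2**30


approx_params = 4e9  # assumption: "4B" taken at face value, not an exact count
print(f"{weight_size_gib(approx_params):.2f} GiB")  # bfloat16 uses 2 bytes per parameter
```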

Sources & License (IMPORTANT)

Training Data: u-10bei/dpo-dataset-qwen-cot
License: MIT License (as per the dataset terms).
Compliance: Users must also comply with the license terms of the original base model.
