Files
adv_sft3J_dpo_merged/README.md
ModelHub XC 67fc5203c1 初始化项目,由ModelHub XC社区提供模型
Model: Hi-Satoh/adv_sft3J_dpo_merged
Source: Original Platform
2026-05-23 01:40:29 +08:00

56 lines
1.5 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- Hi-Satoh/test_dpo_dataset
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- unsloth
- qwen
- alignment
---
# 【課題】qwen3-4b-dpo-qwen-cot-merged
This model is a fine-tuned version of **Qwen/Qwen3-4B-Instruct-2507** using **Direct Preference Optimization (DPO)** via the **Unsloth** library.
This repository contains the **full-merged 16-bit weights**. No adapter loading is required.
## Training Objective
This model has been optimized using DPO to align its responses with preferred outputs, focusing on improving reasoning (Chain-of-Thought) and structured response quality based on the provided preference dataset.
## Training Configuration
- **Base model**: Qwen/Qwen3-4B-Instruct-2507
- **Method**: DPO (Direct Preference Optimization)
- **Epochs**: 2
- **Learning rate**: 1e-06
- **Beta**: 0.05
- **Max sequence length**: 4096
- **LoRA Config**: r=8, alpha=16 (merged into base)
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "Hi-Satoh/adv_sft3J_dpo_merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.float16,
device_map="auto"
)
```
## Sources & License (IMPORTANT)
* **Training Data**: [Hi-Satoh/test_dpo_dataset]
* **License**: MIT License. (As per dataset terms).
* **Compliance**: Users must follow the original base model's license terms.