初始化项目,由ModelHub XC社区提供模型
Model: lllyx/Qwen3-1.7B-SFT Source: Original Platform
This commit is contained in:
139
README.md
Normal file
139
README.md
Normal file
@@ -0,0 +1,139 @@
|
||||
---
|
||||
license: other
|
||||
language:
|
||||
- en
|
||||
- zh
|
||||
pipeline_tag: text-generation
|
||||
tags:
|
||||
- qwen
|
||||
- qwen3
|
||||
- math
|
||||
- sft
|
||||
- supervised-fine-tuning
|
||||
- full-finetuning
|
||||
- reasoning
|
||||
base_model: Qwen/Qwen3-1.7B-Base
|
||||
---
|
||||
|
||||
<h1 align="center">Qwen3-1.7B-SFT</h1>
|
||||
|
||||
<div align="center" style="line-height: 1;">
|
||||
<a href="https://arxiv.org/abs/2604.13016" style="margin: 2px;">
|
||||
<img alt="Paper" src="https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
|
||||
</a>
|
||||
<a href="https://github.com/thunlp/OPD" style="margin: 2px;">
|
||||
<img alt="Github" src="https://img.shields.io/badge/OPD-000000?style=for-the-badge&logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
|
||||
</a>
|
||||
<a href="https://huggingface.co/papers/2604.13016" style="margin: 2px;">
|
||||
<img alt="HF Papers" src="https://img.shields.io/badge/HF--Paper-%23FFD14D?style=for-the-badge&logo=huggingface&logoColor=black" style="display: inline-block; vertical-align: middle;"/>
|
||||
</a>
|
||||
<a href="https://x.com/HBX_hbx/status/2044464414829777354" style="margin: 2px;">
|
||||
<img alt="Twitter" src="https://img.shields.io/badge/Twitter-%23000000.svg?style=for-the-badge&logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
|
||||
</a>
|
||||
</div>
|
||||
|
||||
<br>
|
||||
|
||||
Qwen3-1.7B-SFT is a supervised fine-tuned model based on **Qwen3-1.7B-Base**, trained on the **DeepMath-4B** dataset for mathematical reasoning and problem-solving.
|
||||
|
||||
This model is associated with the paper:
|
||||
**Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe**
|
||||
Paper link: https://arxiv.org/abs/2604.13016
|
||||
|
||||
## Model Description
|
||||
|
||||
This model is obtained by full-parameter supervised fine-tuning (SFT) from `Qwen3-1.7B-Base`.
|
||||
The training is designed to improve the model's performance on math-focused instruction-following and reasoning tasks.
|
||||
This model is intended in **Section 5.1. Off-Policy Distillation from Teacher Rollouts as Cold Start**.
|
||||
|
||||
### Key characteristics
|
||||
|
||||
- **Base model**: Qwen3-1.7B-Base
|
||||
- **Training stage**: Supervised Fine-Tuning (SFT)
|
||||
- **Finetuning type**: Full finetuning
|
||||
- **Primary domain**: Mathematical reasoning
|
||||
- **Thinking mode**: Disabled during training (`enable_thinking: false`)
|
||||
- **Context length**: 20480 tokens
|
||||
|
||||
## Training Details
|
||||
|
||||
### Training configuration
|
||||
|
||||
- **Framework**: LLaMA-Factory
|
||||
- **Stage**: `sft`
|
||||
- **Finetuning type**: `full`
|
||||
- **DeepSpeed config**: `ds_z2_config.json`
|
||||
- **Kernel optimization**: `enable_liger_kernel: True`
|
||||
- **Precision**: `bf16`
|
||||
- **Gradient checkpointing**: enabled
|
||||
- **Learning rate**: `1e-5`
|
||||
- **Scheduler**: cosine
|
||||
- **Warmup ratio**: `0.1`
|
||||
- **Number of epochs**: `2.0`
|
||||
- **Per-device train batch size**: `16`
|
||||
- **Gradient accumulation steps**: `1`
|
||||
- **Validation split**: `0.01`
|
||||
- **Evaluation strategy**: every `100` steps
|
||||
- **Save strategy**: every `100` steps
|
||||
|
||||
### Dataset
|
||||
|
||||
- **Training dataset**: `deep_math_4b`
|
||||
|
||||
## Training Hyperparameters
|
||||
|
||||
For reproducibility, the core configuration is summarized below:
|
||||
|
||||
```
|
||||
model_name_or_path: ../model/Qwen3-1.7B-Base
|
||||
trust_remote_code: true
|
||||
|
||||
stage: sft
|
||||
do_train: true
|
||||
finetuning_type: full
|
||||
deepspeed: examples/deepspeed/ds_z2_config.json
|
||||
enable_liger_kernel: true
|
||||
|
||||
dataset: deep_math_4b
|
||||
template: qwen3
|
||||
enable_thinking: false
|
||||
cutoff_len: 20480
|
||||
preprocessing_num_workers: 64
|
||||
dataloader_num_workers: 32
|
||||
|
||||
output_dir: ../model/Qwen3-1.7B-Base-SFT-DeepMath-4B
|
||||
logging_steps: 5
|
||||
save_steps: 100
|
||||
plot_loss: true
|
||||
overwrite_output_dir: true
|
||||
save_only_model: true
|
||||
report_to: swanlab
|
||||
|
||||
per_device_train_batch_size: 16
|
||||
gradient_accumulation_steps: 1
|
||||
gradient_checkpointing: true
|
||||
learning_rate: 1.0e-5
|
||||
num_train_epochs: 2.0
|
||||
lr_scheduler_type: cosine
|
||||
warmup_ratio: 0.1
|
||||
bf16: true
|
||||
ddp_timeout: 180000000
|
||||
|
||||
val_size: 0.01
|
||||
per_device_eval_batch_size: 1
|
||||
eval_strategy: steps
|
||||
eval_steps: 100
|
||||
```
|
||||
|
||||
## Citation
|
||||
|
||||
If you use this model, please consider citing the related paper:
|
||||
```
|
||||
@article{li2026rethinking,
|
||||
title={Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe},
|
||||
author={Li, Yaxuan and Zuo, Yuxin and He, Bingxiang and Zhang, Jinqian and Xiao, Chaojun and Qian, Cheng and Yu, Tianyu and Gao, Huan-ang and Yang, Wenkai and Liu, Zhiyuan and Ding, Ning},
|
||||
journal={arXiv preprint arXiv:2604.13016},
|
||||
year={2026}
|
||||
}
|
||||
```
|
||||
Paper: https://arxiv.org/abs/2604.13016
|
||||
Reference in New Issue
Block a user