Initialize project; model provided by the ModelHub XC community

Model: taki555/Qwen3-4B-Instruct-2507-Art
Source: Original Platform
Author: ModelHub XC
Date: 2026-05-02 20:29:07 +08:00
Commit: c3f6550f63
14 changed files with 152316 additions and 0 deletions

README.md Normal file

@@ -0,0 +1,35 @@
---
base_model:
- Qwen/Qwen3-4B-Instruct-2507
datasets:
- taki555/DeepScaleR-Easy
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
# Art-Qwen3-4B-Instruct-2507
This is the Chain-of-Thought (CoT)-efficient version of the **Qwen3-4B-Instruct-2507** model, developed as part of the research presented in the paper [The Art of Efficient Reasoning: Data, Reward, and Optimization](https://huggingface.co/papers/2602.20945).
## Model Description
Art-Qwen3-4B is optimized to produce short yet accurate reasoning trajectories. Training combines reward shaping with reinforcement learning (RL) in a two-stage paradigm: length adaptation followed by reasoning refinement. The approach aims to retain the benefits of scaled reasoning while avoiding the heavy computational overhead typically associated with long CoT outputs.
The model was trained on the [DeepScaleR-Easy](https://huggingface.co/datasets/taki555/DeepScaleR-Easy) dataset.
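For intuition, here is a minimal sketch of a length-shaped reward of the kind described above. It is purely illustrative: the paper's actual reward formulation, token budget, and penalty weight are not reproduced here, and `shaped_reward`, `budget`, and the `0.5` coefficient are made-up names and values.

```python
# Purely illustrative sketch of a length-shaped RL reward, in the spirit of
# the two-stage training described above. NOT the paper's exact formulation;
# `budget` and the 0.5 penalty weight are assumed values for demonstration.
def shaped_reward(correct: bool, n_tokens: int, budget: int = 1024) -> float:
    accuracy = 1.0 if correct else 0.0
    # Linearly penalize responses that spill past the token budget.
    overflow = max(0.0, (n_tokens - budget) / budget)
    return accuracy - 0.5 * overflow
```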
- **Project Page:** [https://wutaiqiang.github.io/project/Art](https://wutaiqiang.github.io/project/Art)
- **Paper:** [The Art of Efficient Reasoning: Data, Reward, and Optimization](https://huggingface.co/papers/2602.20945)
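## Usage
A minimal sketch using the standard `transformers` text-generation API. The prompt and generation settings below are illustrative assumptions, not recommendations from the paper.

```python
# Minimal sketch: load the model with the standard transformers API.
# Generation settings are illustrative, not official defaults.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "taki555/Qwen3-4B-Instruct-2507-Art"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "What is 17 * 24? Reason briefly."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# A CoT-efficient model should need far fewer tokens than a long-CoT one.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```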
## Citation
```bibtex
@inproceedings{wu2026art,
  title  = {The Art of Efficient Reasoning: Data, Reward, and Optimization},
  author = {Taiqiang Wu and Zenan Xu and Bo Zhou and Ngai Wong},
  year   = {2026},
  url    = {https://arxiv.org/pdf/2602.20945}
}
```