Art-Qwen3-1.7B is optimized for efficient reasoning, aiming to produce short yet accurate thinking trajectories. It was trained with Reinforcement Learning (RL) using specialized reward shaping on the DeepScaleR-Easy dataset. Training follows a two-stage paradigm, length adaptation followed by reasoning refinement, to preserve accuracy while reducing computational overhead.
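To illustrate the general idea of length-aware reward shaping (this is a minimal sketch, not the paper's actual reward; the `target_len` and `alpha` values are hypothetical):

```python
def shaped_reward(is_correct: bool, length: int,
                  target_len: int = 2048, alpha: float = 0.5) -> float:
    """Toy length-aware reward: a correct answer earns full reward,
    minus a penalty that grows once the reasoning trace exceeds
    the target length (clipped at alpha). Incorrect answers get 0."""
    if not is_correct:
        return 0.0
    # Fraction by which the trace overshoots the target, clipped to [0, 1].
    overshoot = min(1.0, max(0.0, (length - target_len) / target_len))
    return 1.0 - alpha * overshoot

print(shaped_reward(True, 1024))   # short and correct: full reward
print(shaped_reward(True, 4096))   # correct but twice the target: penalized
print(shaped_reward(False, 512))   # incorrect: no reward
```

A reward of this shape pushes the policy toward shorter traces only when correctness is already achieved, which matches the stated goal of reducing overhead without sacrificing accuracy.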
@inproceedings{wu2026art,
  title  = {The Art of Efficient Reasoning: Data, Reward, and Optimization},
  author = {Taiqiang Wu and Zenan Xu and Bo Zhou and Ngai Wong},
  year   = {2026},
  url    = {https://arxiv.org/pdf/2602.20945}
}