
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets: taki555/DeepScaleR-Easy
language: en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation

Art-Qwen3-4B-Instruct-2507

This is the efficient Chain-of-Thought (CoT) version of the Qwen3-4B-Instruct-2507 model, developed as part of the research presented in the paper The Art of Efficient Reasoning: Data, Reward, and Optimization.

Model Description

Art-Qwen3-4B is optimized to produce short yet accurate reasoning trajectories. Training uses reward shaping and reinforcement learning (RL) in a two-stage paradigm: length adaptation followed by reasoning refinement. This approach aims to retain the benefits of scaled reasoning while avoiding the heavy computational overhead typically associated with long CoT outputs.
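
To make the idea of reward shaping for concise reasoning concrete, the sketch below shows a generic length-penalized reward of the kind used in efficient-reasoning RL. The function name, penalty coefficient, and length budget are hypothetical illustrations and do not reproduce the paper's actual reward design.

```python
# Hypothetical sketch of a length-shaped reward for concise CoT; the actual
# reward used to train Art-Qwen3-4B is specified in the paper, not here.
def shaped_reward(is_correct: bool, num_tokens: int,
                  length_budget: int = 1024, penalty: float = 0.5) -> float:
    """Correctness reward minus a penalty that grows with trajectory length."""
    correctness = 1.0 if is_correct else 0.0
    # Normalize length to the budget so the penalty stays in a bounded range.
    length_term = penalty * min(num_tokens / length_budget, 1.0)
    return correctness - length_term

# Example: a correct 300-token answer scores higher than a correct 2000-token one.
print(shaped_reward(True, 300))    # ~0.85
print(shaped_reward(True, 2000))   # 0.5
```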

The model was trained on the DeepScaleR-Easy dataset.
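
Usage

A minimal usage sketch with the standard transformers text-generation workflow; the prompt and generation settings below are illustrative, and the model id is taken from the source repository.

```python
# Minimal sketch: loading Art-Qwen3-4B and generating a short reasoning trace.
# Generation settings here are illustrative defaults, not official recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "taki555/Qwen3-4B-Instruct-2507-Art"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "If 3x + 7 = 22, what is x?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The model is tuned for short CoT, so a modest token budget is usually enough.
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```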

Citation

@inproceedings{wu2026art,
  title={The Art of Efficient Reasoning: Data, Reward, and Optimization},
  author={Taiqiang Wu and Zenan Xu and Bo Zhou and Ngai Wong},
  year={2026},
  url={https://arxiv.org/pdf/2602.20945}
}
Model synced from source: taki555/Qwen3-4B-Instruct-2507-Art