Qwen3-0.6B-Art/README.md

---
base_model:
- Qwen/Qwen3-0.6B
datasets:
- taki555/DeepScaleR-Easy
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---

# Qwen3-0.6B-Art

This is the Chain-of-Thought (CoT) efficient version of the [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) model, trained on the [DeepScaleR-Easy](https://huggingface.co/datasets/taki555/DeepScaleR-Easy) dataset. 

This model was introduced in the paper [The Art of Efficient Reasoning: Data, Reward, and Optimization](https://huggingface.co/papers/2602.20945). Check the [Project Page](https://wutaiqiang.github.io/project/Art) for more details.

## Model Description
Large Language Models (LLMs) consistently benefit from scaled Chain-of-Thought (CoT) reasoning but also suffer from heavy computational overhead. Efficient reasoning aims to incentivize short yet accurate thinking trajectories, typically through reward shaping with Reinforcement Learning (RL). 

This model follows a two-stage training paradigm: length adaptation and reasoning refinement. It is optimized to maintain a sufficient density of positive reward signals while avoiding the "short-is-correct" trap, demonstrating robust and generalized efficient reasoning capabilities.

## Citation

```bibtex
@inproceedings{wu2026art,
  title={The Art of Efficient Reasoning: Data, Reward, and Optimization},
  author={Taiqiang Wu and Zenan Xu and Bo Zhou and Ngai Wong},
  year={2026},
  url={https://arxiv.org/pdf/2602.20945}
}
```
初始化项目，由ModelHub XC社区提供模型 Model: taki555/Qwen3-0.6B-Art Source: Original Platform 2026-04-22 06:37:56 +08:00			`---`
			`base_model:`
			`- Qwen/Qwen3-0.6B`
			`datasets:`
			`- taki555/DeepScaleR-Easy`
			`language:`
			`- en`
			`license: apache-2.0`
			`pipeline_tag: text-generation`
			`library_name: transformers`
			`---`

			`# Qwen3-0.6B-Art`

			`This is the Chain-of-Thought (CoT) efficient version of the [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) model, trained on the [DeepScaleR-Easy](https://huggingface.co/datasets/taki555/DeepScaleR-Easy) dataset.`

			`This model was introduced in the paper [The Art of Efficient Reasoning: Data, Reward, and Optimization](https://huggingface.co/papers/2602.20945). Check the [Project Page](https://wutaiqiang.github.io/project/Art) for more details.`

			`## Model Description`
			`Large Language Models (LLMs) consistently benefit from scaled Chain-of-Thought (CoT) reasoning but also suffer from heavy computational overhead. Efficient reasoning aims to incentivize short yet accurate thinking trajectories, typically through reward shaping with Reinforcement Learning (RL).`

			`This model follows a two-stage training paradigm: length adaptation and reasoning refinement. It is optimized to maintain a sufficient density of positive reward signals while avoiding the "short-is-correct" trap, demonstrating robust and generalized efficient reasoning capabilities.`

			`## Citation`

			```bibtex
			`@inproceedings{wu2026art,`
			`title={The Art of Efficient Reasoning: Data, Reward, and Optimization},`
			`author={Taiqiang Wu and Zenan Xu and Bo Zhou and Ngai Wong},`
			`year={2026},`
			`url={https://arxiv.org/pdf/2602.20945}`
			`}`
			```