---
license: mit
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- tool-use
- reinforcement-learning
- agent
- travel-planning
---

# Agent-STAR-RL-1.5B

This repository contains the **Agent-STAR-RL-1.5B** model, which is part of the research presented in the paper "[Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe](https://huggingface.co/papers/2603.21972)".

Agent-STAR is a systematic study of the reinforcement learning (RL) design space for long-horizon tool-using agents, conducted on the [TravelPlanner](https://github.com/OSU-NLP-Group/TravelPlanner/) testbed. The model is trained using the **STAR** pipeline: **Data Synthesis → SFT → RL**.

## Model Details

- **Backbone:** Qwen2.5-1.5B-Instruct
- **Training Stage:** Reinforcement Learning (RL)
- **Primary Task:** Long-horizon tool orchestration and planning.
- **Paper:** [Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe](https://huggingface.co/papers/2603.21972)
- **Repository:** [GitHub - Agent-STAR](https://github.com/WxxShirley/Agent-STAR)
- **Dataset:** [Agent-STAR-TravelDataset](https://huggingface.co/datasets/xxwu/Agent-STAR-TravelDataset)

According to the paper's findings, smaller models like this 1.5B variant benefit from scale-aware recipes, including staged (curriculum-style) rewards and enhanced exploration, to handle the complex constraints of multi-turn environments.

## Usage

To run ReAct inference with the official implementation, use the following command structure:

```shell
cd Inference
python3 -u main.py \
    --model xxwu/Agent-STAR-RL-1.5B \
    --save_suffix your_suffix \
    --max_workers 20 \
    --split validation \
    --max_context 32768 \
    --max_turns 60
```

Note: You will need to prepare the [travel database](https://huggingface.co/datasets/xxwu/Agent-STAR-TravelDatabase) as described in the GitHub repository.
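The name `--max_turns` suggests a cap on how many action/observation rounds one episode may take. To illustrate what such a cap means in a ReAct-style loop, here is a minimal, self-contained sketch. It is not the code in `Inference/main.py`: the `Action` and `Episode` types, the `run_react_episode` helper, the scripted policy, and the stub `FlightSearch` tool are all hypothetical stand-ins for the model and the TravelPlanner tools. The text format of actions and observations is also an assumption, and the context-length budget (`--max_context`) is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


# Hypothetical action format: either a tool call or a final answer.
@dataclass
class Action:
    tool: Optional[str] = None          # tool to call; None when the agent finishes
    argument: str = ""                  # single string argument (a simplification)
    final_answer: Optional[str] = None  # set when the agent stops calling tools


@dataclass
class Episode:
    history: List[str] = field(default_factory=list)
    answer: Optional[str] = None
    turns_used: int = 0


def run_react_episode(
    query: str,
    policy: Callable[[List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    max_turns: int = 60,
) -> Episode:
    """Alternate actions and observations until the policy answers or max_turns runs out."""
    episode = Episode(history=[f"Query: {query}"])
    for turn in range(1, max_turns + 1):
        episode.turns_used = turn
        action = policy(episode.history)
        if action.final_answer is not None:
            episode.answer = action.final_answer
            break
        tool = tools.get(action.tool or "")
        if tool is None:
            observation = f"Error: unknown tool {action.tool!r}"
        else:
            observation = tool(action.argument)
        # Feed the observation back so the next turn can condition on it.
        episode.history.append(f"Action: {action.tool}[{action.argument}]")
        episode.history.append(f"Observation: {observation}")
    return episode  # answer stays None if the turn budget was exhausted


# Usage with a scripted policy standing in for the model (hypothetical).
def scripted_policy(history: List[str]) -> Action:
    if not any(line.startswith("Observation: ") for line in history):
        return Action(tool="FlightSearch", argument="Seattle->Denver")
    return Action(final_answer=history[-1][len("Observation: "):])


tools = {"FlightSearch": lambda route: f"2 flights found for {route}"}
episode = run_react_episode("Plan a 3-day trip", scripted_policy, tools, max_turns=60)
print(episode.turns_used, episode.answer)
```

Here the episode ends on the second turn. A policy that never emits a final answer would instead stop after `max_turns` iterations with `answer` left as `None`, which is why a generous turn budget matters for long-horizon tasks.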
## Citation

If you find Agent-STAR helpful to your work, please cite the following:

```bibtex
@misc{wu2026agentstar,
      title={Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe},
      author={Xixi Wu and Qianguo Sun and Ruiyang Zhang and Chao Song and Junlong Wu and Yiyan Qi and Hong Cheng},
      year={2026},
      eprint={2603.21972},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2603.21972},
}
```

## Acknowledgements

We thank the authors of [TravelPlanner](https://github.com/OSU-NLP-Group/TravelPlanner/) for their benchmark and the [rLLM](https://github.com/rllm-org/rllm/) framework contributors for supporting the RL training process.