Initialize project; model provided by the ModelHub XC community

Model: xxwu/Agent-STAR-RL-1.5B Source: Original Platform
2026-05-30 03:24:20 +08:00
commit e6e083e71c
11 changed files with 151826 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,65 @@
+---
+license: mit
+library_name: transformers
+pipeline_tag: text-generation
+base_model: Qwen/Qwen2.5-1.5B-Instruct
+tags:
+- tool-use
+- reinforcement-learning
+- agent
+- travel-planning
+---
+
+# Agent-STAR-RL-1.5B
+
+This repository contains the **Agent-STAR-RL-1.5B** model, which is part of the research presented in the paper "[Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe](https://huggingface.co/papers/2603.21972)".
+
+Agent-STAR is a systematic study of the reinforcement learning (RL) design space for long-horizon tool-using agents, conducted on the [TravelPlanner](https://github.com/OSU-NLP-Group/TravelPlanner/) testbed. The model is trained with the **STAR** pipeline: **Data Synthesis → SFT → RL**.
+
+## Model Details
+
+- **Backbone:** Qwen2.5-1.5B-Instruct
+- **Training Stage:** Reinforcement Learning (RL)
+- **Primary Task:** Long-horizon tool orchestration and planning
+- **Paper:** [Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe](https://huggingface.co/papers/2603.21972)
+- **Repository:** [GitHub - Agent-STAR](https://github.com/WxxShirley/Agent-STAR)
+- **Dataset:** [Agent-STAR-TravelDataset](https://huggingface.co/datasets/xxwu/Agent-STAR-TravelDataset)
+
+According to the paper's findings, smaller models such as this 1.5B variant benefit from scale-aware recipes, including staged (curriculum-style) rewards and enhanced exploration, to handle the complex constraints of multi-turn environments.
+
+## Usage
+
+To run ReAct inference with the official implementation from the GitHub repository, use a command of the following form:
+
+```shell
+cd Inference
+python3 -u main.py \
+  --model xxwu/Agent-STAR-RL-1.5B \
+  --save_suffix your_suffix \
+  --max_workers 20 \
+  --split validation \
+  --max_context 32768 \
+  --max_turns 60
+```
+
+Note: Before running inference, prepare the [travel database](https://huggingface.co/datasets/xxwu/Agent-STAR-TravelDatabase) as described in the GitHub repository.
+
+## Citation
+
+If you find Agent-STAR helpful to your work, please cite the following:
+
+```bibtex
+@misc{wu2026agentstar,
+      title={Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe},
+      author={Xixi Wu and Qianguo Sun and Ruiyang Zhang and Chao Song and Junlong Wu and Yiyan Qi and Hong Cheng},
+      year={2026},
+      eprint={2603.21972},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG},
+      url={https://arxiv.org/abs/2603.21972},
+}
+```
+
+## Acknowledgements
+
+We thank the authors of [TravelPlanner](https://github.com/OSU-NLP-Group/TravelPlanner/) for their benchmark and the [rLLM](https://github.com/rllm-org/rllm/) framework contributors for supporting the RL training process.