---
library_name: transformers
tags:
- reinforcement-learning
- retrieval
- search-agent
- s3
- on-policy
- research
model-index:
- name: s3-8-3-3-20steps
  results: []
---

# s3-8-3-3-20steps

**s3** is a reinforcement-learning–trained search agent that learns to plan retrieval and answer questions efficiently.
This release provides weights **for research replication** only. For usage, training, and evaluation, **follow our GitHub repo** (we intentionally do not include inference snippets here).

- 📄 Reference: *“s3: You Don’t Need That Much Data to Train a Search Agent via RL”* (EMNLP 2025 Main).
- 🧑‍💻 GitHub: https://github.com/pat-jj/s3

## What is in this repo?

A Hugging Face model folder with tokenizer files and sharded `*.safetensors` checkpoints exported from our VERL training runs (the “actor” policy).
The file layout mirrors the training outputs (e.g., `config.json`, `tokenizer.json`, `model-00001-of-00004.safetensors`, and so on).

## Important notes

- **We highly recommend training the model yourself via the GitHub repo.** In our experience, **testing/inference can be far more time-consuming than training**.
- **Do not** treat these weights as a drop-in general QA system; they are intended for the s3 pipelines described in the paper and codebase.
- **To run anything**, please **follow the GitHub instructions** end-to-end (env setup, datasets, evaluation scripts, and RL configs).

## Intended use & limitations

Research replication, ablations, and educational study of on-policy RL for retrieval-augmented search agents.
Commercial or safety-critical use is **not** advised without extensive review and additional safeguards.

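
Before following the GitHub instructions, it can help to confirm that a downloaded copy matches the file layout described under "What is in this repo?". The sketch below is not part of the s3 codebase and performs no inference. It uses only the Python standard library, and the function name `check_layout` and the folder path are hypothetical. It assumes shards follow the `model-XXXXX-of-YYYYY.safetensors` naming shown in the example above.

```python
import re
from pathlib import Path

# Pattern for sharded checkpoint names such as model-00001-of-00004.safetensors.
SHARD_RE = re.compile(r"^model-(\d{5})-of-(\d{5})\.safetensors$")


def check_layout(folder):
    """Return a list of problems found in a local model folder (empty if none)."""
    root = Path(folder)
    problems = []

    # Files named explicitly in this model card.
    for name in ("config.json", "tokenizer.json"):
        if not (root / name).is_file():
            problems.append(f"missing {name}")

    # Collect shard indices and the declared shard count from the file names.
    found, totals = set(), set()
    for path in root.glob("*.safetensors"):
        match = SHARD_RE.match(path.name)
        if match:
            found.add(int(match.group(1)))
            totals.add(int(match.group(2)))

    if not found:
        problems.append("no sharded *.safetensors files found")
    elif len(totals) != 1:
        problems.append(f"inconsistent shard totals: {sorted(totals)}")
    else:
        total = totals.pop()
        missing = sorted(set(range(1, total + 1)) - found)
        if missing:
            problems.append(f"missing shards: {missing}")

    return problems
```

Calling `check_layout("path/to/s3-8-3-3-20steps")` (a placeholder path) returns an empty list when the named files and every shard are present. It checks file names only, not tensor contents.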
## Citation

```bibtex
@inproceedings{jiang2025s3,
  title     = {s3: You Don't Need That Much Data to Train a Search Agent via RL},
  author    = {Jiang, Pengcheng and Xu, Xueqiang and Lin, Jiacheng and Xiao, Jinfeng and Wang, Zifeng and Sun, Jimeng and Han, Jiawei},
  year      = {2025},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
}
```

*Last updated:* 2025-09-29