ModelHub XC 93149644d8 Initialize project; model provided by the ModelHub XC community
Model: pat-jj/s3-8-3-3-20steps
Source: Original Platform
2026-04-22 10:37:50 +08:00

library_name: transformers
tags: reinforcement-learning, retrieval, search-agent, s3, on-policy, research
model-index name: s3-8-3-3-20steps

s3-8-3-3-20steps

s3 is a search agent trained with reinforcement learning that learns to plan retrieval and answer questions efficiently. This release provides weights for research replication only. For usage, training, and evaluation, follow our GitHub repo (we intentionally do not include inference snippets here).

  • 📄 Reference: “s3: You Don't Need That Much Data to Train a Search Agent via RL” (EMNLP 2025 Main).
  • 🧑‍💻 GitHub: https://github.com/pat-jj/s3

What is in this repo?

A Hugging Face model folder with tokenizer files and sharded *.safetensors checkpoints exported from our VERL training runs (the “actor” policy). The file layout mirrors the training outputs (e.g., config.json, tokenizer.json, and shards such as model-00001-of-00004.safetensors).
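
To sanity-check a local download against the layout described above, a minimal Python sketch (standard library only, no model loading or inference) might look like the following. The helper names `missing_shards` and `check_model_folder` are hypothetical and not part of the s3 codebase. The shard naming pattern is inferred from the single example file name above, and the total shard count is read from the file names themselves rather than assumed.

```python
import re
from pathlib import Path

# Pattern inferred from the example name model-00001-of-00004.safetensors (an assumption).
SHARD_RE = re.compile(r"^model-(\d{5})-of-(\d{5})\.safetensors$")


def missing_shards(filenames):
    """Return the sorted shard indices that are absent from `filenames`."""
    found, totals = set(), set()
    for name in filenames:
        match = SHARD_RE.match(name)
        if match:
            found.add(int(match.group(1)))
            totals.add(int(match.group(2)))
    if not totals:
        raise ValueError("no sharded safetensors files found")
    if len(totals) != 1:
        raise ValueError(f"inconsistent shard totals: {sorted(totals)}")
    total = totals.pop()
    return sorted(set(range(1, total + 1)) - found)


def check_model_folder(folder):
    """Hypothetical helper: report files missing from a local model folder."""
    names = [p.name for p in Path(folder).iterdir() if p.is_file()]
    return {
        # Only the two files named in the description are checked here.
        "missing_required": [f for f in ("config.json", "tokenizer.json") if f not in names],
        "missing_shards": missing_shards(names),
    }
```

Calling `check_model_folder("path/to/download")` returns a small report dictionary; two empty lists suggest the folder is complete with respect to the files named here. It does not validate checkpoint contents, and running the model still requires the GitHub instructions.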

Important notes

  • We highly recommend training the model yourself via the GitHub repo. In our experience, testing and inference can be far more time-consuming than training.
  • Do not treat these weights as a drop-in general QA system; they are intended for the s3 pipelines described in the paper and codebase.
  • To run anything, please follow the GitHub instructions end-to-end (env setup, datasets, evaluation scripts, and RL configs).

Intended use & limitations

Research replication, ablations, and educational study of on-policy RL for retrieval-augmented search agents. Commercial or safety-critical use is not advised without extensive review and additional safeguards.

Citation

@inproceedings{jiang2025s3,
  title = {s3: You Don't Need That Much Data to Train a Search Agent via RL},
  author = {Jiang, Pengcheng and Xu, Xueqiang and Lin, Jiacheng and Xiao, Jinfeng and Wang, Zifeng and Sun, Jimeng and Han, Jiawei},
  year = {2025},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
}

Last updated: 2025-09-29
