
---
library_name: transformers
tags:
- reinforcement-learning
- retrieval
- search-agent
- s3
- on-policy
- research
model-index:
- name: s3-8-3-3-20steps
  results: []
---
# s3-8-3-3-20steps
**s3** is a reinforcement-learning-trained search agent that learns to plan retrieval and answer questions efficiently.
This release provides weights **for research replication** only. For usage, training, and evaluation **follow our GitHub repo** (we intentionally do not include inference snippets here).
- 📄 Reference: *“s3: You Don't Need That Much Data to Train a Search Agent via RL”* (EMNLP 2025 Main).
- 🧑‍💻 GitHub: https://github.com/pat-jj/s3
## What is in this repo?
A Hugging Face model folder with tokenizer files and sharded `*.safetensors` checkpoints exported from our VERL training runs (the “actor” policy).
The file layout mirrors the training outputs (e.g., `config.json`, `tokenizer.json`, and weight shards such as `model-00001-of-00004.safetensors`).
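Before following the GitHub instructions, it can help to confirm that a local copy of this folder is complete. The following is a minimal sketch, not part of the s3 codebase: it assumes the sharded naming shown above (`model-XXXXX-of-YYYYY.safetensors`) and checks only the files named in this README. The helpers `missing_files` and `check_model_dir` are hypothetical names introduced for illustration, and it performs no loading or inference.

```python
import re
from pathlib import Path

# Shard names such as model-00001-of-00004.safetensors (assumed naming scheme).
SHARD_RE = re.compile(r"^model-(\d{5})-of-(\d{5})\.safetensors$")

# Non-weight files named in this README; a real export may contain more.
REQUIRED_FILES = ("config.json", "tokenizer.json")


def missing_files(filenames):
    """Return a sorted list of expected files that are absent from `filenames`."""
    present = set(filenames)
    missing = [name for name in REQUIRED_FILES if name not in present]

    # Collect (index, total) pairs from every shard file that is present.
    matches = [SHARD_RE.match(name) for name in present]
    pairs = {(int(m.group(1)), int(m.group(2))) for m in matches if m}
    totals = {total for _, total in pairs}

    if not totals:
        # No shard found at all; report a placeholder pattern.
        missing.append("model-*-of-*.safetensors")
    for total in totals:
        have = {idx for idx, t in pairs if t == total}
        for idx in range(1, total + 1):
            if idx not in have:
                missing.append(f"model-{idx:05d}-of-{total:05d}.safetensors")
    return sorted(missing)


def check_model_dir(path):
    """Hypothetical helper: list what is missing from a local model folder."""
    return missing_files(p.name for p in Path(path).iterdir() if p.is_file())
```

For example, `check_model_dir("./s3-8-3-3-20steps")` (the local path is an assumption) returns an empty list when every expected file is present. A single-file export named `model.safetensors` would be reported as missing shards, since the sketch covers only the sharded layout.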
## Important notes
- **We highly recommend training the model yourself via the GitHub repo.** In our experience, **testing/inference can be much heavier than training**.
- **Do not** treat these weights as a drop-in general QA system; they are intended for the s3 pipelines described in the paper and codebase.
- **To run anything**, please **follow the GitHub instructions** end-to-end (env setup, datasets, evaluation scripts, and RL configs).
## Intended use & limitations
Research replication, ablations, and educational study of on-policy RL for retrieval-augmented search agents.
Commercial or safety-critical use is **not** advised without extensive review and additional safeguards.
## Citation
```bibtex
@inproceedings{jiang2025s3,
  title = {s3: You Don't Need That Much Data to Train a Search Agent via RL},
  author = {Jiang, Pengcheng and Xu, Xueqiang and Lin, Jiacheng and Xiao, Jinfeng and Wang, Zifeng and Sun, Jimeng and Han, Jiawei},
  year = {2025},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
}
```
*Last updated:* 2025-09-29