Initialize project; model provided by the ModelHub XC community

Model: pat-jj/s3-8-3-3-20steps Source: Original Platform
2026-04-22 10:37:50 +08:00
commit 93149644d8
15 changed files with 152144 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,52 @@
+---
+library_name: transformers
+tags:
+  - reinforcement-learning
+  - retrieval
+  - search-agent
+  - s3
+  - on-policy
+  - research
+model-index:
+  - name: s3-8-3-3-20steps
+    results: []
+---
+
+# s3-8-3-3-20steps
+
+**s3** is a reinforcement-learning–trained search agent that learns to plan retrieval and answer questions efficiently.
+This release provides weights **for research replication** only. For usage, training, and evaluation, **follow our GitHub repo** (we intentionally do not include inference snippets here).
+
+- 📄 Reference: *“s3: You Don’t Need That Much Data to Train a Search Agent via RL”* (EMNLP 2025 Main).
+- 🧑‍💻 GitHub: https://github.com/pat-jj/s3
+
+## What is in this repo?
+
+A Hugging Face model folder with tokenizer files and sharded `*.safetensors` checkpoints exported from our VERL training runs (the “actor” policy).
+The file layout mirrors the training outputs (e.g., `config.json`, `tokenizer.json`, and weight shards such as `model-00001-of-00004.safetensors`).
+
+## Important notes
+
+- **We highly recommend training the model yourself via the GitHub repo.** In our experience, **testing/inference time can far exceed training time**.
+- **Do not** treat these weights as a drop-in general QA system; they are intended for the s3 pipelines described in the paper and codebase.
+- **To run anything**, please **follow the GitHub instructions** end-to-end (env setup, datasets, evaluation scripts, and RL configs).
+
+
+
+## Intended use & limitations
+
+These weights are intended for research replication, ablations, and educational study of on-policy RL for retrieval-augmented search agents.
+Commercial or safety-critical use is **not** advised without extensive review and additional safeguards.
+
+## Citation
+```bibtex
+@inproceedings{jiang2025s3,
+  title = {s3: You Don't Need That Much Data to Train a Search Agent via RL},
+  author = {Jiang, Pengcheng and Xu, Xueqiang and Lin, Jiacheng and Xiao, Jinfeng and Wang, Zifeng and Sun, Jimeng and Han, Jiawei},
+  year = {2025},
+  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
+}
+```
+
+
+*Last updated:* 2025-09-29