Initialize project; model provided by the ModelHub XC community
Model: pat-jj/s3-8-3-3-20steps Source: Original Platform
52
README.md
Normal file
@@ -0,0 +1,52 @@
---
library_name: transformers
tags:
- reinforcement-learning
- retrieval
- search-agent
- s3
- on-policy
- research
model-index:
- name: s3-8-3-3-20steps
  results: []
---

# s3-8-3-3-20steps

**s3** is a reinforcement-learning–trained search agent that learns to plan retrieval and answer questions efficiently.
This release provides weights **for research replication** only. For usage, training, and evaluation, **follow our GitHub repo** (we intentionally do not include inference snippets here).

- 📄 Reference: *“s3: You Don’t Need That Much Data to Train a Search Agent via RL”* (EMNLP 2025 Main).
- 🧑‍💻 GitHub: https://github.com/pat-jj/s3

## What is in this repo?

This repo is a Hugging Face model folder containing tokenizer files and sharded `*.safetensors` checkpoints exported from our VERL training runs (the “actor” policy).
File layout mirrors the training outputs (e.g., `config.json`, `tokenizer.json`, and `model-00001-of-00004.safetensors`, etc.).
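
As a minimal sketch (not part of the official s3 tooling), the layout described above can be sanity-checked on a local copy of the folder before following the GitHub instructions. The helper name `missing_files` is hypothetical, and the shard naming pattern `model-XXXXX-of-YYYYY.safetensors` is an assumption based on the example file name above. The check only inspects file names; it does not load or run the model.

```python
import re
from pathlib import Path

# Assumed shard naming, inferred from the example file name in this README.
SHARD_RE = re.compile(r"^model-(\d{5})-of-(\d{5})\.safetensors$")


def missing_files(model_dir, required=("config.json", "tokenizer.json")):
    """Return a sorted list of expected-but-absent file names in model_dir."""
    names = {p.name for p in Path(model_dir).iterdir() if p.is_file()}
    missing = [name for name in required if name not in names]

    shards = [m for m in map(SHARD_RE.match, names) if m]
    if not shards:
        # No shard found at all, so the total count is unknown.
        missing.append("model-*-of-*.safetensors")
        return sorted(missing)

    # Take the shard count from the "-of-YYYYY" suffix of the files present.
    total = max(int(m.group(2)) for m in shards)
    for i in range(1, total + 1):
        expected = f"model-{i:05d}-of-{total:05d}.safetensors"
        if expected not in names:
            missing.append(expected)
    return sorted(missing)
```

Calling `missing_files("path/to/local/model")` (a placeholder path) returns an empty list when every expected file is present, and otherwise lists what is absent.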

## Important notes

- **We highly recommend training the model yourself via the GitHub repo.** In our experience, **testing/inference can be far heavier than training**.
- **Do not** treat these weights as a drop-in general QA system; they are intended for the s3 pipelines described in the paper and codebase.
- **To run anything**, please **follow the GitHub instructions** end-to-end (environment setup, datasets, evaluation scripts, and RL configs).

## Intended use & limitations

Intended for research replication, ablations, and educational study of on-policy RL for retrieval-augmented search agents.
Commercial or safety-critical use is **not** advised without extensive review and additional safeguards.

## Citation
```bibtex
@inproceedings{jiang2025s3,
  title = {s3: You Don't Need That Much Data to Train a Search Agent via RL},
  author = {Jiang, Pengcheng and Xu, Xueqiang and Lin, Jiacheng and Xiao, Jinfeng and Wang, Zifeng and Sun, Jimeng and Han, Jiawei},
  year = {2025},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
}
```

*Last updated:* 2025-09-29