s3-8-3-3-20steps


ModelHub XC 93149644d8 Initialize project; model provided by the ModelHub XC community

Model: pat-jj/s3-8-3-3-20steps
Source: Original Platform

2026-04-22 10:37:50 +08:00

.gitattributes
added_tokens.json
config.json
generation_config.json
merges.txt
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
model.safetensors.index.json
README.md
special_tokens_map.json
tokenizer_config.json
tokenizer.json
vocab.json


What is in this repo?

This is a Hugging Face model folder containing tokenizer files and sharded *.safetensors checkpoints exported from our VERL training runs (the “actor” policy). The file layout mirrors the training outputs (e.g., config.json, tokenizer.json, and model-00001-of-00004.safetensors).
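Because the weights are split across four shards, a quick sanity check after downloading is to read model.safetensors.index.json and confirm that every shard it references is present. The sketch below assumes the standard Hugging Face index layout (a "weight_map" object mapping each tensor name to its shard file name); the helper names load_index, summarize_shards, and missing_shards are hypothetical and are not part of the s3 codebase.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_index(folder: Path) -> dict:
    """Parse model.safetensors.index.json from a downloaded model folder."""
    return json.loads((folder / "model.safetensors.index.json").read_text())


def summarize_shards(index: dict) -> dict[str, list[str]]:
    """Group tensor names by the shard file that stores them.

    Assumes the index's "weight_map" maps tensor name -> shard file name.
    """
    shards: dict[str, list[str]] = defaultdict(list)
    for name, shard in index["weight_map"].items():
        shards[shard].append(name)
    # Sort shards and tensor names so the output is deterministic.
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


def missing_shards(index: dict, available: set[str]) -> list[str]:
    """Return shard files referenced by the index but absent from `available`."""
    needed = set(index["weight_map"].values())
    return sorted(needed - available)
```

For a local copy of this folder, one might call missing_shards(load_index(folder), {p.name for p in folder.glob("*.safetensors")}); an empty list means all four shards referenced by the index were found.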

Important notes

We highly recommend training the model yourself via the GitHub repo. In our experience, testing/inference can be much heavier than training.
Do not treat these weights as a drop-in general QA system; they are intended for the s3 pipelines described in the paper and codebase.
To run anything, please follow the GitHub instructions end-to-end (env setup, datasets, evaluation scripts, and RL configs).

Intended use & limitations

Research replication, ablations, and educational study of on-policy RL for retrieval-augmented search agents. Commercial or safety-critical use is not advised without extensive review and additional safeguards.

Citation

@inproceedings{jiang2025s3,
  title = {s3: You Don't Need That Much Data to Train a Search Agent via RL},
  author = {Jiang, Pengcheng and Xu, Xueqiang and Lin, Jiacheng and Xiao, Jinfeng and Wang, Zifeng and Sun, Jimeng and Han, Jiawei},
  year = {2025},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
}

Last updated: 2025-09-29
