Files
WideSeek-R1-4b/README.md
ModelHub XC 3bc9eed457 初始化项目,由ModelHub XC社区提供模型
Model: RLinf/WideSeek-R1-4b
Source: Original Platform
2026-05-03 14:35:40 +08:00

2.7 KiB

license, language, base_model, library_name, pipeline_tag, tags, metrics, model-index
license language base_model library_name pipeline_tag tags metrics model-index
apache-2.0
en
Qwen/Qwen3-4B
transformers text-generation
arxiv:2602.04634
accuracy
name results
WideSeek-R1-4B
task dataset metrics
type
WideSearch
type name
WideSearch WideSearch
type value
accuracy 40.0

WideSeek-R1-4B

Overview

image

Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where a single agent solves long-horizon problems with multi-turn reasoning and tool use. However, as tasks grow broader, the key bottleneck shifts from individual competence to organizational capability.

In this work, we explore a complementary dimension of width scaling with multi-agent systems to address broad information seeking. Existing multi-agent systems often rely on hand-crafted workflows and turn-taking interactions that fail to parallelize work effectively. To bridge this gap, we propose WideSeek-R1, a lead-agent-subagent framework trained via multi-agent reinforcement learning (MARL) to synergize scalable orchestration and parallel execution. By utilizing a shared LLM with isolated contexts and specialized tools, WideSeek-R1 jointly optimizes the lead agent and parallel subagents on a curated dataset of 20k broad information-seeking tasks.

Extensive experiments show that WideSeek-R1-4B achieves an item F1 score of 40.0% on the WideSearch benchmark, which is comparable to the performance of single-agent DeepSeek-R1-671B. Furthermore, WideSeek-R1-4B exhibits consistent performance gains as the number of parallel subagents increases, highlighting the effectiveness of width scaling.

For more details, see our project page

Citation

If you use this model in your research, please cite our paper:

@article{xu2026wideseek,
  title   = {WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning},
  author  = {Xu, Zelai and Xu, Zhexuan and Zhang, Ruize and Zhu, Chunyang and Yu, Shi and Liu, Weilin and Zhang, Quanlu and Ding, Wenbo and Yu, Chao and Wang, Yu},
  journal = {arXiv preprint arXiv:2602.04634},
  year    = {2026},
}