4.6 KiB
base_model, language, library_name, license, tags, pipeline_tag
| base_model | language | library_name | license | tags | pipeline_tag | |||
|---|---|---|---|---|---|---|---|---|
|
|
transformers | mit |
|
text-generation |
🔍 WebExplorer-8B
A state-of-the-art 8B parameter web agent model designed for complex information-seeking tasks and long-horizon reasoning.
🌟 Overview
WebExplorer-8B is an advanced web navigation agent trained on WebExplorer-QA. The model demonstrates exceptional performance on challenging information-seeking benchmarks while maintaining efficiency with only 8 billion parameters.
✨ Key Features
- 🌐 Long-horizon Reasoning: Supports up to 128K context length and 100 tool calling turns
- 🛠️ Tool Utilization: Masters search and browse functionalities
- 🏆 State-of-the-art Performance: Achieves best-in-class results among models under 10B parameters
🏗️ Model Architecture
Built on Qwen3-8B base model and trained through a two-phase approach:
- Supervised Fine-tuning (SFT): Cold-start initialization with high-quality trajectories
- Reinforcement Learning (RL): Enhanced using GRPO algorithm with progressive context expansion
📊 Performance
WebExplorer-8B achieves state-of-the-art performance across multiple information-seeking benchmarks at its scale:
| Model | BC-en | BC-zh | GAIA | WebWalkerQA | FRAMES | Xbench-DS | HLE |
|---|---|---|---|---|---|---|---|
| OpenAI-o3† | 50.9 | 58.1 | 70.5† | 71.7 | 84.0 | 66.7 | 20.2 |
| Claude-4-Sonnet† | 12.2 | 29.1 | 68.3† | 61.7 | 80.7 | 64.6 | 20.3 |
| GLM-4.5 | 26.4 | 37.5 | 66.0† | 65.6† | 78.9† | 70.0† | 21.2† |
| DeepSeek-V3.1 | 30.0 | 49.2 | 63.1† | 61.2† | 83.7 | 71.2 | 29.8 |
| Kimi-K2† | 14.1 | 28.8 | 57.7 | 63.0 | 72.0 | 50.0 | 18.1 |
| ==== | ==== | ==== | ==== | ==== | ==== | ==== | ==== |
| WebShaper-72B | - | - | 60.0 | 52.2 | - | - | - |
| WebShaper-32B (QwQ) | - | - | 53.3 | 49.7 | - | - | - |
| WebShaper-32B | - | - | 52.4 | 51.4 | - | - | - |
| WebSailor-72B | 12.0 | 30.1 | 55.4 | - | - | 55.0 | - |
| WebSailor-32B | 10.5 | 25.5 | 53.2 | - | - | 53.3 | - |
| WebSailor-7B | 6.7 | 14.2 | 33.0 | - | - | 34.3 | - |
| ASearcher-Web-QwQ | 5.2 | 15.6 | 52.8 | 34.3 | 70.9 | 42.1 | 12.5 |
| WebThinker-32B | 2.8 | - | 48.5 | 46.5 | - | - | 15.8 |
| MiroThinker-32B-DPO-v0.1 | 13.0 | 17.0 | 57.3 | 49.3 | 71.7 | - | 11.8 |
| MiroThinker-8B-DPO-v0.1 | 8.7 | 13.6 | 46.6 | 45.7 | 64.4 | - | - |
| WebExplorer-8B (SFT) | 7.9 | 21.3 | 43.7 | 59.8 | 72.6 | 47.5 | 16.0 |
| WebExplorer-8B (RL) | 15.7 | 32.0 | 50.0 | 62.7 | 75.7 | 53.7 | 17.3 |
Accuracy (%) of web agents on information-seeking benchmarks. BC-en and BC-zh denote BrowseComp-en and BrowseComp-zh respectively. XBench-DS refers to XBench-DeepSearch. Bold indicates the best performance among open-source models < 100B, while underlined values represent the best performance among models < 10B parameters. All scores of WebExplorer-8B are computed as Avg@4 using LLM-as-Judge. Entries marked with a dagger (†) were reproduced by us under our scaffold: on model name = entire row; on a number = that entry only.
🛠️ Tool Schema
WebExplorer-8B supports two tools for web interaction:
1. Browse Tool
{
"name": "browse",
"type": "function",
"description": "Extract specific information from a webpage",
"parameters": {
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "Target URL to browse. The webpage content will be processed by the LLM for information extraction."
},
"query": {
"type": "string",
"description": "Specific query about the webpage content. The LLM will analyze the content to answer this query."
}
},
"required": ["url", "query"]
}
}
2. Search Tool
{
"name": "search",
"type": "function",
"description": "Perform web search queries",
"parameters": {
"type": "object",
"properties": {
"queries": {
"type": "array",
"items": {
"type": "string"
},
"description": "List of search queries. Returns search results containing title, URL, and snippet for each query."
}
},
"required": ["queries"]
}
}