🔍 WebExplorer-8B

A state-of-the-art 8B parameter web agent model designed for complex information-seeking tasks and long-horizon reasoning.

🌟 Overview

WebExplorer-8B is an advanced web navigation agent trained on WebExplorer-QA. The model demonstrates exceptional performance on challenging information-seeking benchmarks while maintaining efficiency with only 8 billion parameters.

✨ Key Features

🌐 Long-horizon Reasoning: Supports up to 128K context length and 100 tool calling turns
🛠️ Tool Utilization: Masters search and browse functionalities
🏆 State-of-the-art Performance: Achieves best-in-class results among models under 10B parameters

🏗️ Model Architecture

Built on Qwen3-8B base model and trained through a two-phase approach:

Supervised Fine-tuning (SFT): Cold-start initialization with high-quality trajectories
Reinforcement Learning (RL): Enhanced using GRPO algorithm with progressive context expansion

📊 Performance

WebExplorer-8B achieves state-of-the-art performance across multiple information-seeking benchmarks at its scale:

Model	BC-en	BC-zh	GAIA	WebWalkerQA	FRAMES	Xbench-DS	HLE
OpenAI-o3†	50.9	58.1	70.5†	71.7	84.0	66.7	20.2
Claude-4-Sonnet†	12.2	29.1	68.3†	61.7	80.7	64.6	20.3
GLM-4.5	26.4	37.5	66.0†	65.6†	78.9†	70.0†	21.2†
DeepSeek-V3.1	30.0	49.2	63.1†	61.2†	83.7	71.2	29.8
Kimi-K2†	14.1	28.8	57.7	63.0	72.0	50.0	18.1
====	====	====	====	====	====	====	====
WebShaper-72B	-	-	60.0	52.2	-	-	-
WebShaper-32B (QwQ)	-	-	53.3	49.7	-	-	-
WebShaper-32B	-	-	52.4	51.4	-	-	-
WebSailor-72B	12.0	30.1	55.4	-	-	55.0	-
WebSailor-32B	10.5	25.5	53.2	-	-	53.3	-
WebSailor-7B	6.7	14.2	33.0	-	-	34.3	-
ASearcher-Web-QwQ	5.2	15.6	52.8	34.3	70.9	42.1	12.5
WebThinker-32B	2.8	-	48.5	46.5	-	-	15.8
MiroThinker-32B-DPO-v0.1	13.0	17.0	57.3	49.3	71.7	-	11.8
MiroThinker-8B-DPO-v0.1	8.7	13.6	46.6	45.7	64.4	-	-
WebExplorer-8B (SFT)	7.9	21.3	43.7	59.8	72.6	47.5	16.0
WebExplorer-8B (RL)	15.7	32.0	50.0	62.7	75.7	53.7	17.3

Accuracy (%) of web agents on information-seeking benchmarks. BC-en and BC-zh denote BrowseComp-en and BrowseComp-zh respectively. XBench-DS refers to XBench-DeepSearch. Bold indicates the best performance among open-source models < 100B, while underlined values represent the best performance among models < 10B parameters. All scores of WebExplorer-8B are computed as Avg@4 using LLM-as-Judge. Entries marked with a dagger (†) were reproduced by us under our scaffold: on model name = entire row; on a number = that entry only.

🛠️ Tool Schema

WebExplorer-8B supports two tools for web interaction:

1. Browse Tool

{
    "name": "browse",
    "type": "function",
    "description": "Extract specific information from a webpage",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {
                "type": "string",
                "description": "Target URL to browse. The webpage content will be processed by the LLM for information extraction."
            },
            "query": {
                "type": "string",
                "description": "Specific query about the webpage content. The LLM will analyze the content to answer this query."
            }
        },
        "required": ["url", "query"]
    }
}

2. Search Tool

{
    "name": "search",
    "type": "function",
    "description": "Perform web search queries",
    "parameters": {
        "type": "object",
        "properties": {
            "queries": {
                "type": "array",
                "items": {
                    "type": "string"
                },
                "description": "List of search queries. Returns search results containing title, URL, and snippet for each query."
            }
        },
        "required": ["queries"]
    }
}

4.6 KiB Raw Blame History