Files
WebExplorer-8B/README.md
2025-09-08 09:37:01 +00:00

4.6 KiB

base_model, language, library_name, license, tags, pipeline_tag
base_model language library_name license tags pipeline_tag
Qwen/Qwen3-8B
en
transformers mit
LLM
text-generation

🔍 WebExplorer-8B

A state-of-the-art 8B parameter web agent model designed for complex information-seeking tasks and long-horizon reasoning.

🌟 Overview

WebExplorer-8B is an advanced web navigation agent trained on WebExplorer-QA. The model demonstrates exceptional performance on challenging information-seeking benchmarks while maintaining efficiency with only 8 billion parameters.

Key Features

  • 🌐 Long-horizon Reasoning: Supports up to 128K context length and 100 tool calling turns
  • 🛠️ Tool Utilization: Masters search and browse functionalities
  • 🏆 State-of-the-art Performance: Achieves best-in-class results among models under 10B parameters

🏗️ Model Architecture

Built on Qwen3-8B base model and trained through a two-phase approach:

  1. Supervised Fine-tuning (SFT): Cold-start initialization with high-quality trajectories
  2. Reinforcement Learning (RL): Enhanced using GRPO algorithm with progressive context expansion

📊 Performance

WebExplorer-8B achieves state-of-the-art performance across multiple information-seeking benchmarks at its scale:

Model BC-en BC-zh GAIA WebWalkerQA FRAMES Xbench-DS HLE
OpenAI-o3† 50.9 58.1 70.5† 71.7 84.0 66.7 20.2
Claude-4-Sonnet† 12.2 29.1 68.3† 61.7 80.7 64.6 20.3
GLM-4.5 26.4 37.5 66.0† 65.6† 78.9† 70.0† 21.2†
DeepSeek-V3.1 30.0 49.2 63.1† 61.2† 83.7 71.2 29.8
Kimi-K2† 14.1 28.8 57.7 63.0 72.0 50.0 18.1
==== ==== ==== ==== ==== ==== ==== ====
WebShaper-72B - - 60.0 52.2 - - -
WebShaper-32B (QwQ) - - 53.3 49.7 - - -
WebShaper-32B - - 52.4 51.4 - - -
WebSailor-72B 12.0 30.1 55.4 - - 55.0 -
WebSailor-32B 10.5 25.5 53.2 - - 53.3 -
WebSailor-7B 6.7 14.2 33.0 - - 34.3 -
ASearcher-Web-QwQ 5.2 15.6 52.8 34.3 70.9 42.1 12.5
WebThinker-32B 2.8 - 48.5 46.5 - - 15.8
MiroThinker-32B-DPO-v0.1 13.0 17.0 57.3 49.3 71.7 - 11.8
MiroThinker-8B-DPO-v0.1 8.7 13.6 46.6 45.7 64.4 - -
WebExplorer-8B (SFT) 7.9 21.3 43.7 59.8 72.6 47.5 16.0
WebExplorer-8B (RL) 15.7 32.0 50.0 62.7 75.7 53.7 17.3

Accuracy (%) of web agents on information-seeking benchmarks. BC-en and BC-zh denote BrowseComp-en and BrowseComp-zh respectively. XBench-DS refers to XBench-DeepSearch. Bold indicates the best performance among open-source models < 100B, while underlined values represent the best performance among models < 10B parameters. All scores of WebExplorer-8B are computed as Avg@4 using LLM-as-Judge. Entries marked with a dagger (†) were reproduced by us under our scaffold: on model name = entire row; on a number = that entry only.

🛠️ Tool Schema

WebExplorer-8B supports two tools for web interaction:

1. Browse Tool

{
    "name": "browse",
    "type": "function",
    "description": "Extract specific information from a webpage",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {
                "type": "string",
                "description": "Target URL to browse. The webpage content will be processed by the LLM for information extraction."
            },
            "query": {
                "type": "string",
                "description": "Specific query about the webpage content. The LLM will analyze the content to answer this query."
            }
        },
        "required": ["url", "query"]
    }
}

2. Search Tool

{
    "name": "search",
    "type": "function",
    "description": "Perform web search queries",
    "parameters": {
        "type": "object",
        "properties": {
            "queries": {
                "type": "array",
                "items": {
                    "type": "string"
                },
                "description": "List of search queries. Returns search results containing title, URL, and snippet for each query."
            }
        },
        "required": ["queries"]
    }
}