---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen3-4B-Thinking
tags:
- deep-research
- react-agent
- reinforcement-learning
- search-agent
- agentic-rl
pipeline_tag: text-generation
library_name: transformers
---
# LiteResearcher-4B
<p align="center"> <img src="assets/logo.png" alt="LiteResearcher Logo" width="400">
</p>
<p align="center"> <a href="https://simplex-ai-inc.github.io/LiteResearcher/">🌐 Project Page</a>
<a href="https://github.com/simplex-ai-inc/LiteResearcher">💻 Code</a>
<a href="https://arxiv.org/abs/2604.17931">📄 Paper</a>
</p>
**LiteResearcher-4B** is a 4B-parameter deep research agent trained via scalable agentic reinforcement learning. Despite its small size, it matches **Claude-4.5-Sonnet** on GAIA and outperforms open-source models up to **8× larger**.
## Key Results
| Benchmark | LiteResearcher-4B | Notable Comparison |
|---|---|---|
| **GAIA-Text** | **71.3%** | = Claude-4.5-Sonnet (71.2%) |
| **Xbench-DS** | **78.0%** | > Tongyi DeepSearch 30B (75.0%) |
| **Frames** | **83.1%** | > Claude-4-Sonnet (80.7%) |
| **WebWalkerQA** | **72.7%** | > Tongyi DeepSearch 30B (72.2%) |
All with only **4B parameters**, up to 8× smaller than comparable open-source models.
## Model Details
- **Architecture**: Qwen3ForCausalLM (Qwen3-4B-Thinking base)
- **Parameters**: 4B
- **Max Context**: 262,144 tokens
- **Training**: Two-stage difficulty-aware curriculum RL with virtual world environment
- **Agent Mode**: ReAct-style with `search` and `visit` tools
## How It Works
LiteResearcher operates as a ReAct agent that iteratively:
1. **Thinks** about what information is needed
2. **Searches** the web via Google
3. **Visits** webpages to extract evidence
4. **Answers** when sufficient information is gathered
The model uses `<think>`, `<tool_call>`, and `<answer>` tags to structure its reasoning.
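A rough sketch of that loop is below. It is illustrative only: `generate`, `web_search`, and `visit_page` are hypothetical callables you supply, and the JSON payload inside `<tool_call>` is assumed to follow the Qwen convention of a `{"name": ..., "arguments": ...}` object.
```python
import json
import re

def react_loop(question, generate, web_search, visit_page, max_turns=16):
    """Illustrative ReAct loop: run the model until it emits an <answer> tag."""
    history = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = generate(history)  # model output with <think>/<tool_call>/<answer> tags
        history.append({"role": "assistant", "content": reply})

        answer = re.search(r"<answer>(.*?)</answer>", reply, re.S)
        if answer:
            return answer.group(1).strip()

        call = re.search(r"<tool_call>(.*?)</tool_call>", reply, re.S)
        if call:
            spec = json.loads(call.group(1))  # assumed {"name": ..., "arguments": {...}}
            tool = {"search": web_search, "visit": visit_page}.get(spec["name"])
            result = tool(**spec["arguments"]) if tool else f"unknown tool {spec['name']!r}"
            history.append({"role": "tool", "content": str(result)})
    return None  # no answer within the turn budget
```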
## Quick Start
### With the Inference Framework
```bash
git clone https://github.com/simplex-ai-inc/LiteResearcher.git
cd LiteResearcher
pip install -r requirements.txt
# Configure API keys
cp .env.example .env
# Edit .env with your SERPER_KEY_ID and SCRAPEDO_API_KEY
# Start SGLang server
python -m sglang.launch_server \
--model-path simplex-ai-inc/LiteResearcher-4B \
--port 6001 --tp 2
# Run inference
bash scripts/run_all.sh \
--model simplex-ai-inc/LiteResearcher-4B \
--dataset data/example.jsonl
```
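The `--tp 2` flag shards the model across two GPUs via tensor parallelism; on a single GPU you can omit it (SGLang defaults to `--tp 1`), since a 4B model in bf16 fits on one 24 GB card at moderate context lengths.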
### Direct Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model (device_map="auto" places weights on available GPUs)
model_name = "simplex-ai-inc/LiteResearcher-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a deep research assistant..."},
    {"role": "user", "content": "Who won the Nobel Prize in Physics in 2024?"}
]

# Build the chat prompt and sample; do_sample=True is needed for
# temperature/top_p to take effect in transformers' generate()
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
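Note that plain `generate` does not execute tools: on questions that require web access the model will stop after emitting a `<tool_call>` and expects a tool response to be appended to the conversation, so multi-step research tasks need an agent loop like the sketch above, or the inference framework.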
## Training
LiteResearcher is trained with a three-component framework:
1. **Co-constructed Training Data & Corpus** — 32M+ webpages, 1M+ domains, covering five atomic search capabilities (direct retrieval, aggregation, enumeration, cross-verification, statistics)
2. **Stable Local Tool Environment** — Local search engine (BGE-M3 + Milvus) and local browse tool (PostgreSQL) enabling 73.2M tool calls during training at zero marginal cost
3. **Difficulty-Aware Curriculum RL** — Multi-stage training that progressively increases task difficulty and context length
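As a toy illustration of the third component, the sketch below samples a training batch whose difficulty distribution shifts upward with the stage index. The stage boundary, weighting, and `difficulty` field are invented for this example; the real schedule is defined by the training framework.
```python
import random

def sample_batch(tasks, stage, batch_size=32):
    """Toy difficulty-aware sampler: later stages admit and upweight harder tasks."""
    max_difficulty = 0.5 + 0.25 * stage  # invented boundary; widens each stage
    pool = [t for t in tasks if t["difficulty"] <= max_difficulty]
    weights = [1.0 + stage * t["difficulty"] for t in pool]  # upweight hard tasks later
    return random.choices(pool, weights=weights, k=batch_size)
```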
## Benchmark Results
LiteResearcher-4B consistently outperforms open-source models up to 8× larger and matches or exceeds proprietary systems across eight benchmarks.
| Model | Size | GAIA | BrowseComp (en) | BrowseComp (zh) | HLE | Frames | WebWalkerQA | MAIA | Xbench-DS |
|---|---|---|---|---|---|---|---|---|---|
| **Commercial Models** | | | | | | | | | |
| Claude-4-Sonnet | - | 68.3 | 12.2 | 29.1 | 20.3 | 80.7 | 61.7 | - | 64.6 |
| Claude-4.5-Sonnet | - | 71.2 | 19.6 | 40.8 | 24.5 | 85.0 | - | 53.4 | 66.0 |
| DeepSeek-V3.2 | - | 63.5 | 67.6 | 65.0 | 40.8 | 80.2 | - | 38.5 | 71.0 |
| DeepSeek-V3.1 | - | 63.1 | 30.0 | 49.2 | 29.8 | 83.7 | 61.2 | - | 71.0 |
| Minimax-M2 | - | 75.7 | 44.0 | 48.5 | 31.8 | - | - | - | 72.0 |
| OpenAI-GPT-5-high | - | 76.4 | 54.9 | 65.0 | 35.2 | - | - | 51.4 | 77.8 |
| GLM-4.6 | - | 71.9 | 45.1 | 49.5 | 30.4 | - | - | - | 70.0 |
| Kimi-Researcher | - | - | - | - | 26.9 | 78.8 | - | 36.0 | 69.0 |
| Kimi-K2-0905 | - | 60.2 | 7.4 | 22.2 | 21.7 | 58.1 | - | 25.2 | 61.0 |
| **Open-Source Models** | | | | | | | | | |
| Mirothinker | 8B | 66.4 | 31.1 | 40.2 | 21.5 | 80.6 | 60.6 | 40.4 | 60.6 |
| Tongyi DeepSearch | 30B | 70.9 | 43.4 | 46.7 | 32.9 | **90.6** | 72.2 | - | 75.0 |
| ASearcher QWQ v2 | 32B | 58.7 | - | - | - | 74.5 | - | - | 51.1 |
| WebSailor | 30B | 53.2 | - | - | - | - | - | - | 53.3 |
| WebDancer (QwQ) | 32B | 51.5 | 3.8 | 18.0 | - | - | 47.9 | - | 38.3 |
| WebExplorer | 8B | 50.0 | 15.7 | 32.0 | 17.3 | 75.7 | 62.7 | - | 53.7 |
| DeepMiner | 32B | 58.7 | 33.5 | 40.1 | - | - | - | - | 62.0 |
| AFM-RL | 32B | 55.3 | 11.1 | - | 18.0 | - | 63.0 | - | - |
| SFR-DeepResearch | 20B | 66.0 | - | - | 28.7 | 82.8 | - | - | - |
| AgentCPM-Explore | 4B | 63.9 | 24.1 | 29.1 | 19.1 | 82.7 | 68.1 | 40.5 | 70.0 |
| **LiteResearcher** | **4B** | **71.3** | 27.5\* | 32.5\* | 22.0 | 83.1 | **72.7** | **41.8** | **78.0** |
Best open-source results in **bold**. Results with \* use a 64k context window with a memory mechanism.
## Citation
```bibtex
@article{li2026literesearcher,
  title   = {LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
  author  = {Wanli Li and Bince Qu and Bo Pan and Jianyu Zhang and Zheng Liu and Pan Zhang and Wei Chen and Bo Zhang},
  journal = {arXiv preprint arXiv:2604.17931},
  year    = {2026}
}
```
## License
This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).