---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen3-4B-Thinking
tags:
- deep-research
- react-agent
- reinforcement-learning
- search-agent
- agentic-rl
pipeline_tag: text-generation
library_name: transformers
---

# LiteResearcher-4B

<p align="center"> <img src="assets/logo.png" alt="LiteResearcher Logo" width="400">
</p>

<p align="center"> <a href="https://simplex-ai-inc.github.io/LiteResearcher/">🌐 Project Page</a> •
<a href="https://github.com/simplex-ai-inc/LiteResearcher">💻 Code</a> •
<a href="https://arxiv.org/abs/2604.17931">📄 Paper</a>
</p>

**LiteResearcher-4B** is a 4B-parameter deep research agent trained via scalable agentic reinforcement learning. Despite its small size, it matches **Claude-4.5-Sonnet** on GAIA and outperforms open-source models up to 8× larger.

## Key Results

| Benchmark | LiteResearcher-4B | Notable Comparison |
|---|---|---|
| **GAIA-Text** | **71.3%** | ≈ Claude-4.5-Sonnet (71.2%) |
| **Xbench-DS** | **78.0%** | > Tongyi DeepSearch 30B (75.0%) |
| **Frames** | **83.1%** | > Claude-4-Sonnet (80.7%) |
| **WebWalkerQA** | **72.7%** | > Tongyi DeepSearch 30B (72.2%) |

All with only **4B parameters**, while the comparable open-source models range from 8B to 32B.

## Model Details

- **Architecture**: Qwen3ForCausalLM (Qwen3-4B-Thinking base)
- **Parameters**: 4B
- **Max Context**: 262,144 tokens
- **Training**: Two-stage difficulty-aware curriculum RL with a virtual world environment
- **Agent Mode**: ReAct-style with `search` and `visit` tools

## How It Works

LiteResearcher operates as a ReAct agent that iteratively:

1. **Thinks** about what information is needed
2. **Searches** the web via Google
3. **Visits** webpages to extract evidence
4. **Answers** when sufficient information is gathered

The model structures its reasoning with `<think>`, `<tool_call>`, and `<answer>` tags; a minimal parsing sketch follows.
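
The sketch below shows how a driver loop might split one model turn into these tagged segments. It is an illustration only: the exact JSON schema inside `<tool_call>` is an assumption, and the repository's inference framework is the authoritative implementation.

```python
import json
import re

# Hypothetical parser for the tag protocol described above. The JSON payload
# inside <tool_call> (a "name" plus "arguments") is an assumed schema.
TAG_RE = re.compile(r"<(think|tool_call|answer)>(.*?)</\1>", re.DOTALL)

def parse_turn(completion: str) -> dict:
    """Split one model completion into its tagged segments."""
    parsed = {"think": None, "tool_call": None, "answer": None}
    for tag, body in TAG_RE.findall(completion):
        parsed[tag] = json.loads(body) if tag == "tool_call" else body.strip()
    return parsed

turn = parse_turn(
    "<think>I need the 2024 laureates.</think>"
    '<tool_call>{"name": "search", "arguments": {"query": "Nobel Prize in Physics 2024"}}</tool_call>'
)
assert turn["tool_call"]["name"] in ("search", "visit")
```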

## Quick Start

### With the Inference Framework

```bash
git clone https://github.com/simplex-ai-inc/LiteResearcher.git
cd LiteResearcher
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your SERPER_KEY_ID and SCRAPEDO_API_KEY

# Start SGLang server
python -m sglang.launch_server \
  --model-path simplex-ai-inc/LiteResearcher-4B \
  --port 6001 --tp 2

# Run inference
bash scripts/run_all.sh \
  --model simplex-ai-inc/LiteResearcher-4B \
  --dataset data/example.jsonl
```
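
Once the server is running you can also hit it directly for a quick smoke test: SGLang exposes an OpenAI-compatible endpoint, so a standard client works. The snippet below is a sketch under that assumption; it exercises the raw model only, without the framework's search and visit tools.

```python
# Quick smoke test against the local SGLang server started above.
# Assumes SGLang's default OpenAI-compatible /v1 endpoint on port 6001.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:6001/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="simplex-ai-inc/LiteResearcher-4B",
    messages=[{"role": "user", "content": "Who won the Nobel Prize in Physics in 2024?"}],
    temperature=0.6,
    top_p=0.95,
    max_tokens=4096,
)
print(response.choices[0].message.content)
```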

### Direct Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "simplex-ai-inc/LiteResearcher-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a deep research assistant..."},
    {"role": "user", "content": "Who won the Nobel Prize in Physics in 2024?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature/top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
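
A bare `generate` call produces only the first turn. To run the full research loop, tool results must be fed back and generation repeated until the model emits `<answer>`. The sketch below shows the shape of such a loop, reusing `parse_turn` from the earlier sketch; the `<tool_response>` feedback format and the stub tool helpers are assumptions, so prefer the repository's inference framework for real use.

```python
# Hypothetical multi-turn driver, reusing tokenizer/model from the block above
# and parse_turn() from the earlier sketch. The tool helpers are stubs.
def run_search(query: str) -> str:
    """Placeholder: wire this to a real search backend."""
    raise NotImplementedError

def run_visit(url: str) -> str:
    """Placeholder: wire this to a real page-fetch backend."""
    raise NotImplementedError

def research(question: str, max_turns: int = 10) -> str | None:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        inputs = tokenizer([text], return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True,
                                 temperature=0.6, top_p=0.95)
        completion = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:],
                                      skip_special_tokens=True)
        turn = parse_turn(completion)
        if turn["answer"] is not None:  # model committed to a final answer
            return turn["answer"]
        call = turn["tool_call"]  # otherwise it should have requested a tool
        if call is None:
            break
        result = run_search(**call["arguments"]) if call["name"] == "search" else run_visit(**call["arguments"])
        messages.append({"role": "assistant", "content": completion})
        # Assumed feedback format; the real framework may use a dedicated role.
        messages.append({"role": "user", "content": f"<tool_response>{result}</tool_response>"})
    return None  # turn budget exhausted without a final answer
```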

## Training

LiteResearcher is trained with a three-component framework:

1. **Co-constructed Training Data & Corpus**: 32M+ webpages across 1M+ domains, covering five atomic search capabilities (direct retrieval, aggregation, enumeration, cross-verification, statistics)
2. **Stable Local Tool Environment**: a local search engine (BGE-M3 + Milvus) and a local browse tool (PostgreSQL), enabling 73.2M tool calls during training at zero marginal cost (see the sketch after this list)
3. **Difficulty-Aware Curriculum RL**: multi-stage training that progressively increases task difficulty and context length
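
To make the local search engine in component 2 concrete, here is a rough sketch of dense retrieval with BGE-M3 over Milvus. It is illustrative only: the collection name, field names, and Milvus Lite storage path are invented, not the project's actual configuration.

```python
# Illustrative local search tool: BGE-M3 dense embeddings queried via Milvus.
# Collection and field names are made up for this example.
from FlagEmbedding import BGEM3FlagModel
from pymilvus import MilvusClient

encoder = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
milvus = MilvusClient("corpus.db")  # Milvus Lite file; a server URI also works

def local_search(query: str, top_k: int = 5) -> list[dict]:
    """Return the top_k most similar corpus chunks for a query."""
    vec = encoder.encode([query])["dense_vecs"][0]
    hits = milvus.search(
        collection_name="webpages",
        data=[vec],
        limit=top_k,
        output_fields=["url", "text"],
    )
    return [hit["entity"] for hit in hits[0]]
```

Serving search and browsing from a local corpus is what lets training issue tens of millions of tool calls without per-call API cost or live-web flakiness.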

## Benchmark Results

LiteResearcher-4B consistently outperforms open-source models up to 8× larger and matches or exceeds proprietary systems across eight benchmarks.

| Model | Size | GAIA | BrowseComp (en) | BrowseComp (zh) | HLE | Frames | WebWalkerQA | MAIA | Xbench-DS |
|---|---|---|---|---|---|---|---|---|---|
| **Commercial Models** | | | | | | | | | |
| Claude-4-Sonnet | - | 68.3 | 12.2 | 29.1 | 20.3 | 80.7 | 61.7 | - | 64.6 |
| Claude-4.5-Sonnet | - | 71.2 | 19.6 | 40.8 | 24.5 | 85.0 | - | 53.4 | 66.0 |
| DeepSeek-V3.2 | - | 63.5 | 67.6 | 65.0 | 40.8 | 80.2 | - | 38.5 | 71.0 |
| DeepSeek-V3.1 | - | 63.1 | 30.0 | 49.2 | 29.8 | 83.7 | 61.2 | - | 71.0 |
| Minimax-M2 | - | 75.7 | 44.0 | 48.5 | 31.8 | - | - | - | 72.0 |
| OpenAI-GPT-5-high | - | 76.4 | 54.9 | 65.0 | 35.2 | - | - | 51.4 | 77.8 |
| GLM-4.6 | - | 71.9 | 45.1 | 49.5 | 30.4 | - | - | - | 70.0 |
| Kimi-Researcher | - | - | - | - | 26.9 | 78.8 | - | 36.0 | 69.0 |
| Kimi-K2-0905 | - | 60.2 | 7.4 | 22.2 | 21.7 | 58.1 | - | 25.2 | 61.0 |
| **Open-Source Models** | | | | | | | | | |
| Mirothinker | 8B | 66.4 | 31.1 | 40.2 | 21.5 | 80.6 | 60.6 | 40.4 | 60.6 |
| Tongyi DeepSearch | 30B | 70.9 | **43.4** | **46.7** | **32.9** | **90.6** | 72.2 | - | 75.0 |
| ASearcher QWQ v2 | 32B | 58.7 | - | - | - | 74.5 | - | - | 51.1 |
| WebSailor | 30B | 53.2 | - | - | - | - | - | - | 53.3 |
| WebDancer (QwQ) | 32B | 51.5 | 3.8 | 18.0 | - | - | 47.9 | - | 38.3 |
| WebExplorer | 8B | 50.0 | 15.7 | 32.0 | 17.3 | 75.7 | 62.7 | - | 53.7 |
| DeepMiner | 32B | 58.7 | 33.5 | 40.1 | - | - | - | - | 62.0 |
| AFM-RL | 32B | 55.3 | 11.1 | - | 18.0 | - | 63.0 | - | - |
| SFR-DeepResearch | 20B | 66.0 | - | - | 28.7 | 82.8 | - | - | - |
| AgentCPM-Explore | 4B | 63.9 | 24.1 | 29.1 | 19.1 | 82.7 | 68.1 | 40.5 | 70.0 |
| **LiteResearcher** | **4B** | **71.3** | 27.5\* | 32.5\* | 22.0 | 83.1 | **72.7** | **41.8** | **78.0** |

Best open-source results are in **bold**. HLE is Humanity's Last Exam. Results marked \* use a 64k context window with a memory mechanism.

## Citation

```bibtex
@article{li2026literesearcher,
  title={LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
  author={Wanli Li and Bince Qu and Bo Pan and Jianyu Zhang and Zheng Liu and Pan Zhang and Wei Chen and Bo Zhang},
  journal={arXiv preprint arXiv:2604.17931},
  year={2026}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).