---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen3-4B-Thinking
tags:
- deep-research
- react-agent
- reinforcement-learning
- search-agent
- agentic-rl
pipeline_tag: text-generation
library_name: transformers
---

# LiteResearcher-4B

<p align="center"> <img src="assets/logo.png" alt="LiteResearcher Logo" width="400">
</p>

<p align="center"> <a href="https://simplex-ai-inc.github.io/LiteResearcher/">🌐 Project Page</a> •
<a href="https://github.com/simplex-ai-inc/LiteResearcher">💻 Code</a> •
<a href="https://arxiv.org/abs/2604.17931">📄 Paper</a>
</p>

**LiteResearcher-4B** is a 4B-parameter deep research agent trained via scalable agentic reinforcement learning. Despite its small size, it matches **Claude-4.5-Sonnet** on GAIA and outperforms open-source models up to 8× larger.

## Key Results

| Benchmark | LiteResearcher-4B | Notable Comparison |
|---|---|---|
| **GAIA-Text** | **71.3%** | ≈ Claude-4.5-Sonnet (71.2%) |
| **Xbench-DS** | **78.0%** | > Tongyi DeepSearch 30B (75.0%) |
| **Frames** | **83.1%** | > Claude-4-Sonnet (80.7%) |
| **WebWalkerQA** | **72.7%** | > Tongyi DeepSearch 30B (72.2%) |

All with only **4B parameters**, while the comparable open-source models range from 8B to 32B.

## Model Details

- **Architecture**: Qwen3ForCausalLM (Qwen3-4B-Thinking base)
- **Parameters**: 4B
- **Max Context**: 262,144 tokens
- **Training**: Two-stage difficulty-aware curriculum RL with a virtual world environment
- **Agent Mode**: ReAct-style with `search` and `visit` tools

## How It Works

LiteResearcher operates as a ReAct agent that iteratively:

1. **Thinks** about what information is needed
2. **Searches** the web via Google
3. **Visits** webpages to extract evidence
4. **Answers** when sufficient information is gathered

The model structures its reasoning with `<think>`, `<tool_call>`, and `<answer>` tags; a minimal parsing sketch follows.
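
The sketch below shows how a driver loop might split one model turn into these tagged segments. It is an illustration only: the exact JSON schema inside `<tool_call>` is an assumption, and the repository's inference framework is the authoritative implementation.

```python
import json
import re

# Hypothetical parser for the tag protocol described above. The JSON payload
# inside <tool_call> (a "name" plus "arguments") is an assumed schema.
TAG_RE = re.compile(r"<(think|tool_call|answer)>(.*?)</\1>", re.DOTALL)

def parse_turn(completion: str) -> dict:
    """Split one model completion into its tagged segments."""
    parsed = {"think": None, "tool_call": None, "answer": None}
    for tag, body in TAG_RE.findall(completion):
        parsed[tag] = json.loads(body) if tag == "tool_call" else body.strip()
    return parsed

turn = parse_turn(
    "<think>I need the 2024 laureates.</think>"
    '<tool_call>{"name": "search", "arguments": {"query": "Nobel Prize in Physics 2024"}}</tool_call>'
)
assert turn["tool_call"]["name"] in ("search", "visit")
```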

## Quick Start

### With the Inference Framework

```bash
git clone https://github.com/simplex-ai-inc/LiteResearcher.git
cd LiteResearcher
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your SERPER_KEY_ID and SCRAPEDO_API_KEY

# Start SGLang server
python -m sglang.launch_server \
  --model-path simplex-ai-inc/LiteResearcher-4B \
  --port 6001 --tp 2

# Run inference
bash scripts/run_all.sh \
  --model simplex-ai-inc/LiteResearcher-4B \
  --dataset data/example.jsonl
```
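
Once the server is running you can also hit it directly for a quick smoke test: SGLang exposes an OpenAI-compatible endpoint, so a standard client works. The snippet below is a sketch under that assumption; it exercises the raw model only, without the framework's search and visit tools.

```python
# Quick smoke test against the local SGLang server started above.
# Assumes SGLang's default OpenAI-compatible /v1 endpoint on port 6001.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:6001/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="simplex-ai-inc/LiteResearcher-4B",
    messages=[{"role": "user", "content": "Who won the Nobel Prize in Physics in 2024?"}],
    temperature=0.6,
    top_p=0.95,
    max_tokens=4096,
)
print(response.choices[0].message.content)
```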

### Direct Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "simplex-ai-inc/LiteResearcher-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a deep research assistant..."},
    {"role": "user", "content": "Who won the Nobel Prize in Physics in 2024?"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature/top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
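
A bare `generate` call produces only the first turn. To run the full research loop, tool results must be fed back and generation repeated until the model emits `<answer>`. The sketch below shows the shape of such a loop, reusing `parse_turn` from the earlier sketch; the `<tool_response>` feedback format and the stub tool helpers are assumptions, so prefer the repository's inference framework for real use.

```python
# Hypothetical multi-turn driver, reusing tokenizer/model from the block above
# and parse_turn() from the earlier sketch. The tool helpers are stubs.
def run_search(query: str) -> str:
    """Placeholder: wire this to a real search backend."""
    raise NotImplementedError

def run_visit(url: str) -> str:
    """Placeholder: wire this to a real page-fetch backend."""
    raise NotImplementedError

def research(question: str, max_turns: int = 10) -> str | None:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        inputs = tokenizer([text], return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True,
                                 temperature=0.6, top_p=0.95)
        completion = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:],
                                      skip_special_tokens=True)
        turn = parse_turn(completion)
        if turn["answer"] is not None:  # model committed to a final answer
            return turn["answer"]
        call = turn["tool_call"]  # otherwise it should have requested a tool
        if call is None:
            break
        result = run_search(**call["arguments"]) if call["name"] == "search" else run_visit(**call["arguments"])
        messages.append({"role": "assistant", "content": completion})
        # Assumed feedback format; the real framework may use a dedicated role.
        messages.append({"role": "user", "content": f"<tool_response>{result}</tool_response>"})
    return None  # turn budget exhausted without a final answer
```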

## Training

LiteResearcher is trained with a three-component framework:

1. **Co-constructed Training Data & Corpus**: 32M+ webpages across 1M+ domains, covering five atomic search capabilities (direct retrieval, aggregation, enumeration, cross-verification, statistics)
2. **Stable Local Tool Environment**: a local search engine (BGE-M3 + Milvus) and a local browse tool (PostgreSQL), enabling 73.2M tool calls during training at zero marginal cost (see the sketch after this list)
3. **Difficulty-Aware Curriculum RL**: multi-stage training that progressively increases task difficulty and context length
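
To make the local search engine in component 2 concrete, here is a rough sketch of dense retrieval with BGE-M3 over Milvus. It is illustrative only: the collection name, field names, and Milvus Lite storage path are invented, not the project's actual configuration.

```python
# Illustrative local search tool: BGE-M3 dense embeddings queried via Milvus.
# Collection and field names are made up for this example.
from FlagEmbedding import BGEM3FlagModel
from pymilvus import MilvusClient

encoder = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
milvus = MilvusClient("corpus.db")  # Milvus Lite file; a server URI also works

def local_search(query: str, top_k: int = 5) -> list[dict]:
    """Return the top_k most similar corpus chunks for a query."""
    vec = encoder.encode([query])["dense_vecs"][0]
    hits = milvus.search(
        collection_name="webpages",
        data=[vec],
        limit=top_k,
        output_fields=["url", "text"],
    )
    return [hit["entity"] for hit in hits[0]]
```

Serving search and browsing from a local corpus is what lets training issue tens of millions of tool calls without per-call API cost or live-web flakiness.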

## Benchmark Results

LiteResearcher-4B consistently outperforms open-source models up to 8× larger and matches or exceeds proprietary systems across eight benchmarks.

| Model | Size | GAIA | BrowseComp (en) | BrowseComp (zh) | HLE | Frames | WebWalkerQA | MAIA | Xbench-DS |
|---|---|---|---|---|---|---|---|---|---|
| **Commercial Models** | | | | | | | | | |
| Claude-4-Sonnet | - | 68.3 | 12.2 | 29.1 | 20.3 | 80.7 | 61.7 | - | 64.6 |
| Claude-4.5-Sonnet | - | 71.2 | 19.6 | 40.8 | 24.5 | 85.0 | - | 53.4 | 66.0 |
| DeepSeek-V3.2 | - | 63.5 | 67.6 | 65.0 | 40.8 | 80.2 | - | 38.5 | 71.0 |
| DeepSeek-V3.1 | - | 63.1 | 30.0 | 49.2 | 29.8 | 83.7 | 61.2 | - | 71.0 |
| Minimax-M2 | - | 75.7 | 44.0 | 48.5 | 31.8 | - | - | - | 72.0 |
| OpenAI-GPT-5-high | - | 76.4 | 54.9 | 65.0 | 35.2 | - | - | 51.4 | 77.8 |
| GLM-4.6 | - | 71.9 | 45.1 | 49.5 | 30.4 | - | - | - | 70.0 |
| Kimi-Researcher | - | - | - | - | 26.9 | 78.8 | - | 36.0 | 69.0 |
| Kimi-K2-0905 | - | 60.2 | 7.4 | 22.2 | 21.7 | 58.1 | - | 25.2 | 61.0 |
| **Open-Source Models** | | | | | | | | | |
| Mirothinker | 8B | 66.4 | 31.1 | 40.2 | 21.5 | 80.6 | 60.6 | 40.4 | 60.6 |
| Tongyi DeepSearch | 30B | 70.9 | **43.4** | **46.7** | **32.9** | **90.6** | 72.2 | - | 75.0 |
| ASearcher QWQ v2 | 32B | 58.7 | - | - | - | 74.5 | - | - | 51.1 |
| WebSailor | 30B | 53.2 | - | - | - | - | - | - | 53.3 |
| WebDancer (QwQ) | 32B | 51.5 | 3.8 | 18.0 | - | - | 47.9 | - | 38.3 |
| WebExplorer | 8B | 50.0 | 15.7 | 32.0 | 17.3 | 75.7 | 62.7 | - | 53.7 |
| DeepMiner | 32B | 58.7 | 33.5 | 40.1 | - | - | - | - | 62.0 |
| AFM-RL | 32B | 55.3 | 11.1 | - | 18.0 | - | 63.0 | - | - |
| SFR-DeepResearch | 20B | 66.0 | - | - | 28.7 | 82.8 | - | - | - |
| AgentCPM-Explore | 4B | 63.9 | 24.1 | 29.1 | 19.1 | 82.7 | 68.1 | 40.5 | 70.0 |
| **LiteResearcher** | **4B** | **71.3** | 27.5\* | 32.5\* | 22.0 | 83.1 | **72.7** | **41.8** | **78.0** |

Best open-source results are in **bold**. HLE is Humanity's Last Exam. Results marked \* use a 64k context window with a memory mechanism.

## Citation

```bibtex
@article{li2026literesearcher,
  title={LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
  author={Wanli Li and Bince Qu and Bo Pan and Jianyu Zhang and Zheng Liu and Pan Zhang and Wei Chen and Bo Zhang},
  journal={arXiv preprint arXiv:2604.17931},
  year={2026}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).