---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen3-4B-Thinking
tags:
- deep-research
- react-agent
- reinforcement-learning
- search-agent
- agentic-rl
pipeline_tag: text-generation
library_name: transformers
---

# LiteResearcher-4B

*LiteResearcher logo*

🌐 Project Page | 💻 Code | 📄 Paper

**LiteResearcher-4B** is a 4B-parameter deep research agent trained via scalable agentic reinforcement learning. Despite its small size, it matches **Claude-4.5-Sonnet** on GAIA and outperforms open-source models up to **8× larger**.

## Key Results

| Benchmark | LiteResearcher-4B | Notable Comparison |
|---|---|---|
| **GAIA-Text** | **71.3%** | ≈ Claude-4.5-Sonnet (71.2%) |
| **Xbench-DS** | **78.0%** | > Tongyi DeepSearch 30B (75.0%) |
| **Frames** | **83.1%** | > Claude-4-Sonnet (80.7%) |
| **WebWalkerQA** | **72.7%** | > Tongyi DeepSearch 30B (72.2%) |

All with only **4B parameters**, 8–32× smaller than comparable models.

## Model Details

- **Architecture**: Qwen3ForCausalLM (Qwen3-4B-Thinking base)
- **Parameters**: 4B
- **Max Context**: 262,144 tokens
- **Training**: Two-stage difficulty-aware curriculum RL in a virtual world environment
- **Agent Mode**: ReAct-style, with `search` and `visit` tools

## How It Works

LiteResearcher operates as a ReAct agent that iteratively:

1. **Thinks** about what information is needed
2. **Searches** the web via Google
3. **Visits** webpages to extract evidence
4. **Answers** when sufficient information is gathered

The model structures its reasoning with `<think>`, `<tool_call>`, and `<answer>` tags (a minimal driver loop is sketched after the Training section below).

## Quick Start

### With the Inference Framework

```bash
git clone https://github.com/simplex-ai-inc/LiteResearcher.git
cd LiteResearcher
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your SERPER_KEY_ID and SCRAPEDO_API_KEY

# Start the SGLang server
python -m sglang.launch_server \
  --model-path simplex-ai-inc/LiteResearcher-4B \
  --port 6001 --tp 2

# Run inference
bash scripts/run_all.sh \
  --model simplex-ai-inc/LiteResearcher-4B \
  --dataset data/example.jsonl
```

### Direct Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "simplex-ai-inc/LiteResearcher-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a deep research assistant..."},
    {"role": "user", "content": "Who won the Nobel Prize in Physics in 2024?"},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Sampling settings recommended for thinking-mode Qwen3 models
outputs = model.generate(
    **inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

## Training

LiteResearcher is trained with a three-component framework:

1. **Co-constructed Training Data & Corpus**: 32M+ webpages across 1M+ domains, covering five atomic search capabilities (direct retrieval, aggregation, enumeration, cross-verification, statistics)
2. **Stable Local Tool Environment**: a local search engine (BGE-M3 + Milvus) and a local browse tool (PostgreSQL), enabling 73.2M tool calls during training at zero marginal cost (sketched just below)
3. **Difficulty-Aware Curriculum RL**: multi-stage training that progressively increases task difficulty and context length
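To make component 2 concrete, here is a minimal sketch of what a local dense-retrieval `search` tool over a BGE-M3 + Milvus index could look like. The collection name, field names, and connection URI are illustrative assumptions, not the project's actual configuration.

```python
# Hedged sketch: a local "search" tool backed by BGE-M3 embeddings and Milvus.
# Collection name, schema, and URI below are assumptions for illustration.
from FlagEmbedding import BGEM3FlagModel  # pip install FlagEmbedding
from pymilvus import MilvusClient         # pip install pymilvus

embedder = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
client = MilvusClient(uri="http://localhost:19530")

def local_search(query: str, top_k: int = 10) -> list[dict]:
    """Embed a query with BGE-M3 and retrieve the top-k pages from Milvus."""
    # BGE-M3 can emit dense, sparse, and multi-vector outputs; we use dense.
    query_vec = embedder.encode([query])["dense_vecs"][0]
    hits = client.search(
        collection_name="web_corpus",     # assumed collection name
        data=[query_vec.tolist()],
        limit=top_k,
        output_fields=["title", "text"],  # assumed field names
    )
    return [hit["entity"] for hit in hits[0]]
```

Serving retrieval from a frozen local snapshot rather than the live web is what makes tens of millions of rollout tool calls affordable and reproducible during RL training.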
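Returning to the ReAct loop described under How It Works, the sketch below shows one way a driver could parse the model's tagged output and dispatch tools. The JSON tool-call format and the `search`/`visit` stubs are assumptions for illustration; the released inference framework implements the real loop against Serper search and Scrape.do browsing.

```python
import json
import re

# Hypothetical tool stubs; a real driver would wire these to web search
# and a page scraper (or to the local training-time tool environment).
def search(query: str) -> str:
    return f"(stub) search results for: {query}"

def visit(url: str) -> str:
    return f"(stub) page content of: {url}"

TOOLS = {"search": search, "visit": visit}

def run_episode(generate, question: str, max_turns: int = 20) -> str:
    """Drive one ReAct episode; `generate(messages) -> str` wraps the model."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})

        # Terminal state: the agent decided it has gathered enough evidence.
        answer = re.search(r"<answer>(.*?)</answer>", reply, re.DOTALL)
        if answer:
            return answer.group(1).strip()

        # Otherwise expect a tool call, e.g.
        # {"name": "search", "arguments": {"query": "..."}}.
        call = re.search(r"<tool_call>(.*?)</tool_call>", reply, re.DOTALL)
        if call:
            spec = json.loads(call.group(1))
            result = TOOLS[spec["name"]](**spec["arguments"])
            messages.append(
                {"role": "user", "content": f"<tool_response>{result}</tool_response>"}
            )
    return "(no answer within the turn budget)"
```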
## Benchmark Results

LiteResearcher-4B consistently outperforms open-source models up to 8× larger and matches or exceeds proprietary systems across eight benchmarks.

| Model | Size | GAIA | BrowseComp (en) | BrowseComp (zh) | Humanity | Frames | WebWalkerQA | MAIA | Xbench-DS |
|---|---|---|---|---|---|---|---|---|---|
| **Commercial Models** | | | | | | | | | |
| Claude-4-Sonnet | - | 68.3 | 12.2 | 29.1 | 20.3 | 80.7 | 61.7 | - | 64.6 |
| Claude-4.5-Sonnet | - | 71.2 | 19.6 | 40.8 | 24.5 | 85.0 | - | 53.4 | 66.0 |
| DeepSeek-V3.2 | - | 63.5 | 67.6 | 65.0 | 40.8 | 80.2 | - | 38.5 | 71.0 |
| DeepSeek-V3.1 | - | 63.1 | 30.0 | 49.2 | 29.8 | 83.7 | 61.2 | - | 71.0 |
| Minimax-M2 | - | 75.7 | 44.0 | 48.5 | 31.8 | - | - | - | 72.0 |
| OpenAI-GPT-5-high | - | 76.4 | 54.9 | 65.0 | 35.2 | - | - | 51.4 | 77.8 |
| GLM-4.6 | - | 71.9 | 45.1 | 49.5 | 30.4 | - | - | - | 70.0 |
| Kimi-Researcher | - | - | - | - | 26.9 | 78.8 | - | 36.0 | 69.0 |
| Kimi-K2-0905 | - | 60.2 | 7.4 | 22.2 | 21.7 | 58.1 | - | 25.2 | 61.0 |
| **Open-Source Models** | | | | | | | | | |
| Mirothinker | 8B | 66.4 | 31.1 | 40.2 | 21.5 | 80.6 | 60.6 | 40.4 | 60.6 |
| Tongyi DeepSearch | 30B | 70.9 | 43.4 | 46.7 | 32.9 | **90.6** | 72.2 | - | 75.0 |
| ASearcher QwQ v2 | 32B | 58.7 | - | - | - | 74.5 | - | - | 51.1 |
| WebSailor | 30B | 53.2 | - | - | - | - | - | - | 53.3 |
| WebDancer (QwQ) | 32B | 51.5 | 3.8 | 18.0 | - | - | 47.9 | - | 38.3 |
| WebExplorer | 8B | 50.0 | 15.7 | 32.0 | 17.3 | 75.7 | 62.7 | - | 53.7 |
| DeepMiner | 32B | 58.7 | 33.5 | 40.1 | - | - | - | - | 62.0 |
| AFM-RL | 32B | 55.3 | 11.1 | - | 18.0 | - | 63.0 | - | - |
| SFR-DeepResearch | 20B | 66.0 | - | - | 28.7 | 82.8 | - | - | - |
| AgentCPM-Explore | 4B | 63.9 | 24.1 | 29.1 | 19.1 | 82.7 | 68.1 | 40.5 | 70.0 |
| **LiteResearcher** | **4B** | **71.3** | 27.5\* | 32.5\* | 22.0 | 83.1 | **72.7** | **41.8** | **78.0** |

Best open-source results in **bold**. Results marked \* use a 64k context window with a memory mechanism.

## Citation

```bibtex
@article{li2026literesearcher,
  title={LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
  author={Wanli Li and Bince Qu and Bo Pan and Jianyu Zhang and Zheng Liu and Pan Zhang and Wei Chen and Bo Zhang},
  journal={arXiv preprint arXiv:2604.17931},
  year={2026}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).