Initialize project; model provided by the ModelHub XC community
Model: inclusionAI/DR-Venus-4B-RL-GGUF Source: Original Platform
40
.gitattributes
vendored
Normal file
@@ -0,0 +1,40 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
DR-Venus-4B-RL.F16.gguf filter=lfs diff=lfs merge=lfs -text
DR-Venus-4B-RL.Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
DR-Venus-4B-RL.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
DR-Venus-4B-RL.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
DR-Venus-4B-RL.Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
3
DR-Venus-4B-RL.F16.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f651064e02a019d3d860c08bbc956f5bd65d2137fdaa47767eaddfad23549980
size 8829196896
3
DR-Venus-4B-RL.Q3_K_M.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4bd7d9ce3597354742904130e0f11092a9104e1273d1e387f485e2d71065c89f
size 2242746976
3
DR-Venus-4B-RL.Q4_K_M.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c48f4d60d2010861ded239f92675d33dc548f1a6472b35cb163b79017b822669
size 2716067936
3
DR-Venus-4B-RL.Q5_K_M.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7041ee9949b7989d384920185ff391a6f576ae55655a8f894cf97be248098ae8
size 3156920416
3
DR-Venus-4B-RL.Q6_K.gguf
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d91f31266a154ee038b54fe8f43491c9f5c526338d4f5396a9afc3569614217a
size 3625326176
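
The five `.gguf` entries above are Git LFS pointer files: each records the SHA-256 digest and byte size of the real weights rather than the weights themselves. As a minimal sketch, a downloaded file could be checked against its pointer as follows. The helper names `parse_lfs_pointer` and `verify_download` are hypothetical and not part of this repository; the pointer text is copied from the Q4_K_M entry in this commit.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,          # hash algorithm named in the oid, e.g. "sha256"
        "digest": digest,      # hex digest of the real file
        "size": int(fields["size"]),  # size of the real file in bytes
    }


def verify_download(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Hash in chunks so multi-gigabyte GGUF files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents for DR-Venus-4B-RL.Q4_K_M.gguf, as listed in this commit.
POINTER_Q4_K_M = """version https://git-lfs.github.com/spec/v1
oid sha256:c48f4d60d2010861ded239f92675d33dc548f1a6472b35cb163b79017b822669
size 2716067936
"""

pointer = parse_lfs_pointer(POINTER_Q4_K_M)
print(f"expected size: {pointer['size'] / 2**30:.2f} GiB")
```

After downloading, `verify_download("DR-Venus-4B-RL.Q4_K_M.gguf", pointer)` would return `True` only when both the size and the digest match; the local file path here is an assumption.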
151
README.md
Normal file
@@ -0,0 +1,151 @@
# DR-Venus-4B-RL-GGUF

DR-Venus-4B-RL-GGUF is the reinforcement-learned [DR-Venus](https://github.com/inclusionAI/DR-Venus) checkpoint built on top of [inclusionAI/DR-Venus-4B-SFT](https://huggingface.co/inclusionAI/DR-Venus-4B-SFT). It is a 4B deep research agent designed for long-horizon web research with explicit tool use, evidence collection, and answer generation.

This model is trained entirely on open data. Starting from the SFT checkpoint, DR-Venus-4B-RL applies long-horizon agentic RL with IGPO-style information gain rewards and format-aware turn-level supervision to improve execution reliability under long tool-use trajectories.

## What This Model Is For

This checkpoint is intended for:

- long-horizon deep research with tool-augmented reasoning
- improving execution reliability beyond supervised imitation
- evidence-grounded answering with `search` and `visit`
- deployment in the official [DR-Venus inference pipeline](https://github.com/inclusionAI/DR-Venus/tree/master/Inference)

It is not primarily optimized for:

- plain chat without tools
- generic short-context instruction following
- use cases that do not need multi-step retrieval and browsing

## Model Details

- Base model: [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507)
- Initialization checkpoint: [inclusionAI/DR-Venus-4B-SFT](https://huggingface.co/inclusionAI/DR-Venus-4B-SFT)
- Training stage: agentic reinforcement learning
- Training framework: [`verl`](https://github.com/volcengine/verl) + [IGPO](https://github.com/GuoqingWang1/IGPO) algorithm
- Tool setting: `search` + `visit`
- Maximum rollout horizon: `200` interaction steps
- Maximum rollout context length: `256K`
- Intended domain: long-horizon open-domain research and evidence-grounded question answering

## How DR-Venus Builds RL Supervision

DR-Venus-4B-RL is trained with dense turn-level supervision tailored to deep research:

1. The model starts from the [DR-Venus supervised checkpoint](https://huggingface.co/inclusionAI/DR-Venus-4B-SFT).
2. For each query, the agent interacts with the environment over multi-turn `search` and `visit` trajectories.
3. IGPO uses information gain rewards to measure whether an intermediate turn increases the model's probability of producing the ground-truth answer.
4. Information gain rewards are combined with outcome rewards and turn-level format-aware penalties.
5. The policy is optimized using an IGPO objective with fine-grained credit assignment, specifically tailored for the long-horizon nature of deep research rollouts.

This design improves supervision density, credit assignment, and data efficiency compared with sparse trajectory-level RL alone.
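
Steps 3 to 5 above can be illustrated with a small sketch. It is not the DR-Venus or IGPO implementation: the additive way the three reward terms are combined, the function names, and the example probabilities are all assumptions made for illustration. Only the discount factor `0.95` and the format penalty scale `1.0` come from this model card.

```python
from typing import List


def information_gain_rewards(answer_probs: List[float]) -> List[float]:
    """Per-turn information gain.

    answer_probs[0] is the policy's probability of the ground-truth answer
    before any tool call; answer_probs[t] is the probability after turn t.
    """
    return [answer_probs[t] - answer_probs[t - 1] for t in range(1, len(answer_probs))]


def turn_rewards(
    info_gain: List[float],
    format_ok: List[bool],
    outcome_reward: float,
    format_penalty_scale: float = 1.0,
) -> List[float]:
    """Combine information gain, format penalties, and the outcome reward.

    Assumption: a malformed turn is charged a fixed penalty, and the outcome
    reward is added to the final turn only.
    """
    rewards = [
        gain - (0.0 if ok else format_penalty_scale)
        for gain, ok in zip(info_gain, format_ok)
    ]
    rewards[-1] += outcome_reward
    return rewards


def discounted_returns(rewards: List[float], gamma: float = 0.95) -> List[float]:
    """Turn-level credit: each turn receives its reward plus discounted future rewards."""
    returns: List[float] = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    return returns[::-1]


# Hypothetical four-turn rollout: the third turn is malformed and loses ground.
probs = [0.10, 0.10, 0.35, 0.30, 0.80]
gains = information_gain_rewards(probs)
rewards = turn_rewards(gains, [True, True, False, True], outcome_reward=1.0)
returns = discounted_returns(rewards)
print([round(value, 3) for value in returns])
```

The point of the sketch is that every turn gets its own signal: a turn that raises the answer probability earns credit even when the final answer is wrong, and a malformed turn is penalized even when the final answer is right.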

## Training Data

This model is trained from open-data supervision constructed from:

- the DR-Venus SFT checkpoint as initialization
- [REDSearcher 1K RL query-answer pairs](https://huggingface.co/datasets/Zchu/REDSearcher_RL_1K)
- online rollouts with the DR-Venus `search` + `visit` tool environment

In the current paper setup:

- RL is performed entirely on open query-answer pairs
- rollout groups are sampled with long-horizon agent interaction
- generation is performed with up to `200` interaction steps per query

For more implementation details, please refer to the [DR-Venus GitHub repository](https://github.com/inclusionAI/DR-Venus).

## Training Recipe

The RL checkpoint is trained with the following setup reported in the current paper draft:

- algorithm: IGPO-style agentic RL
- rollout group size: `8`
- training batch size: `16`
- learning rate: `1e-6`
- rollout temperature: `1.0`
- rollout top-p: `0.95`
- maximum context length: `256K`
- maximum generation length per turn: `8,192`
- discount factor: `0.95`
- format penalty scale: `1.0`
- training framework: [`verl`](https://github.com/volcengine/verl) with vLLM rollout engine and FSDP trainer

The current paper configuration also enables browse-aware IG assignment and IG-scale style reward balancing.
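
For reference, the values listed above can be collected in one place. This is a plain Python sketch, not a `verl` configuration file: the class name, field names, and the two derived quantities are hypothetical. It assumes that the batch size counts queries (each sampled `8` times) and that `256K` means 262,144 tokens; neither is confirmed by this model card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RLRecipe:
    """Hyperparameters as reported in the model card; field names are illustrative."""

    rollout_group_size: int = 8
    train_batch_size: int = 16
    learning_rate: float = 1e-6
    rollout_temperature: float = 1.0
    rollout_top_p: float = 0.95
    max_context_tokens: int = 256 * 1024  # assumption: "256K" = 262,144 tokens
    max_tokens_per_turn: int = 8192
    max_interaction_steps: int = 200
    discount_factor: float = 0.95
    format_penalty_scale: float = 1.0

    def rollouts_per_step(self) -> int:
        # Assumption: the batch holds queries, each rolled out group_size times.
        return self.train_batch_size * self.rollout_group_size

    def generation_budget_upper_bound(self) -> int:
        # Tokens the agent could emit if every step used its full per-turn budget.
        return self.max_interaction_steps * self.max_tokens_per_turn


recipe = RLRecipe()
print(recipe.rollouts_per_step())
print(recipe.generation_budget_upper_bound() > recipe.max_context_tokens)
```

Under these assumptions, the per-turn budget multiplied by the step limit is far larger than the context window, so the context length rather than the step count is the limit a maximal rollout would reach first.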

## Evaluation Summary

DR-Venus-4B-RL improves over the SFT checkpoint on five of the six tracked deep research benchmarks, and on each of those five it posts the highest score among the open models under 9B listed below.

### Results Against Open Models Under 9B

| Model | BrowseComp | BrowseComp-ZH | GAIA (Text-Only) | xBench-DS-2505 | xBench-DS-2510 | DeepSearchQA |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| DeepDive-9B-SFT | 5.6 | 15.7 | -- | 35.0 | -- | -- |
| DeepDive-9B-RL | 6.3 | 15.1 | -- | 38.0 | -- | -- |
| WebSailor-7B | 6.7 | 14.2 | 37.9 | 34.3 | -- | -- |
| OffSeeker-8B-SFT | 10.6 | 24.2 | 47.6 | 48.0 | -- | -- |
| OffSeeker-8B-DPO | 12.8 | 26.6 | 51.5 | 49.0 | -- | -- |
| WebExplorer-8B-RL | 15.7 | 32.0 | 50.0 | 53.7 | 23.0 | 17.8 |
| AgentCPM-Explore-4B | 24.1 | 29.1 | 63.9 | 70.0 | 34.0 | 32.8 |
| DR-Venus-4B-SFT | 26.8 | 35.7 | 65.4 | 69.0 | 35.3 | 37.7 |
| DR-Venus-4B-RL | 29.1 | 37.7 | 64.4 | 74.7 | 40.7 | 39.6 |

Relative to the SFT checkpoint, DR-Venus-4B-RL improves:

- BrowseComp by `+2.3`
- BrowseComp-ZH by `+2.0`
- xBench-DS-2505 by `+5.7`
- xBench-DS-2510 by `+5.4`
- DeepSearchQA by `+1.9`

GAIA (Text-Only) is the exception: it drops from `65.4` to `64.4`.

These gains are associated with better formatting accuracy, more reliable tool use, and stronger long-horizon execution stability.
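
The per-benchmark differences can be recomputed directly from the last two rows of the table. The snippet below is a small check using only the scores reported above; the variable names are illustrative.

```python
BENCHMARKS = [
    "BrowseComp",
    "BrowseComp-ZH",
    "GAIA (Text-Only)",
    "xBench-DS-2505",
    "xBench-DS-2510",
    "DeepSearchQA",
]
# Scores copied from the DR-Venus-4B-SFT and DR-Venus-4B-RL rows of the table.
SFT_SCORES = [26.8, 35.7, 65.4, 69.0, 35.3, 37.7]
RL_SCORES = [29.1, 37.7, 64.4, 74.7, 40.7, 39.6]

# RL minus SFT, rounded to one decimal to match the table's precision.
deltas = {
    name: round(rl - sft, 1)
    for name, sft, rl in zip(BENCHMARKS, SFT_SCORES, RL_SCORES)
}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}")

improved = [name for name, delta in deltas.items() if delta > 0]
```

Here `improved` contains five of the six benchmarks; GAIA (Text-Only) is the one where the RL checkpoint scores lower than the SFT checkpoint.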

## Usage

This checkpoint should be used with the official [DR-Venus inference pipeline](https://github.com/inclusionAI/DR-Venus/tree/master/Inference).

```bash
git clone https://github.com/inclusionAI/DR-Venus
cd DR-Venus/Inference
pip install -r requirements.txt
# then configure the model path in run_demo.sh or run_web_demo.sh
bash run_demo.sh
```

For reproducing RL training or understanding the rollout setup, see the [`RL`](https://github.com/inclusionAI/DR-Venus/tree/master/RL) directory in the official repository.

## License and Release Notes

Please verify license compatibility with:

- the upstream base model
- the released supervision data
- the external tools and judge models used in training or evaluation

This section can be updated later with the final project-specific license statement.

## Citation

If you use this checkpoint, please cite the [DR-Venus project](https://github.com/inclusionAI/DR-Venus).

```bibtex
@article{venus2026drvenus,
  title={DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data},
  author={Venus Team and Dai, Sunhao and Deng, Yong and Lin, Jinzhen and Song, Yusheng and Wang, Guoqing and Wu, Xiaofeng and Zhou, Yuqi and Yang, Shuo and Ying, Zhenzhe and Zhang, Zhanwei and Meng, Changhua and Wang, Weiqiang},
  journal={arXiv preprint arXiv:2604.19859},
  year={2026}
}
```

## Links

- GitHub: [https://github.com/inclusionAI/DR-Venus](https://github.com/inclusionAI/DR-Venus)
- RL code: [https://github.com/inclusionAI/DR-Venus/tree/master/RL](https://github.com/inclusionAI/DR-Venus/tree/master/RL)
- Inference code: [https://github.com/inclusionAI/DR-Venus/tree/master/Inference](https://github.com/inclusionAI/DR-Venus/tree/master/Inference)
- SFT model: [https://huggingface.co/inclusionAI/DR-Venus-4B-SFT](https://huggingface.co/inclusionAI/DR-Venus-4B-SFT)
- RL model: [https://huggingface.co/inclusionAI/DR-Venus-4B-RL](https://huggingface.co/inclusionAI/DR-Venus-4B-RL)
- Collection: [https://huggingface.co/collections/inclusionAI/dr-venus](https://huggingface.co/collections/inclusionAI/dr-venus)