---
license: apache-2.0
base_model:
- Qwen/Qwen3-VL-4B-Instruct
pipeline_tag: image-text-to-text
---
<h1 align="center">Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation</h1>
<h3 align="center">Joy is dopamine's handiwork, whether in humans or in robotics.</h3>
<p align="center">
<a href="https://arxiv.org/abs/2512.23703"><img src="https://img.shields.io/badge/arXiv-2512.23703-b31b1b.svg" alt="arXiv"></a>
&nbsp;
<a href="https://robo-dopamine.github.io/"><img src="https://img.shields.io/badge/%F0%9F%8F%A0%20Project-Homepage-blue" alt="Project Homepage"></a>
&nbsp;
<a href="https://github.com/FlagOpen/Robo-Dopamine"><img src="https://img.shields.io/badge/🔍%20Code-Github-orange" alt="Github"></a>
&nbsp;
<a href="https://huggingface.co/collections/tanhuajie2001/robo-dopamine"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Weights-Huggingface-yellow" alt="Weights"></a>
&nbsp;
<a href="#"><img src="https://img.shields.io/badge/🤗%20Dataset-Stay%20tuned-green.svg" alt="Dataset"></a>
</p>
<div style="text-align: center; background-color: white;">
<img src="https://github.com/FlagOpen/Robo-Dopamine/raw/main/assets/teasor.png" alt="Robo-Dopamine teaser" width="100%">
</div>

## 🗞️ News
- **`2026-04-05`**: 🤗 We released the [Robo-Dopamine-GRM-2.0-4B-Preview](https://huggingface.co/tanhuajie2001/Robo-Dopamine-GRM-2.0-4B-Preview) model.
- **`2026-03-05`**: 🤗 We released the [Robo-Dopamine-GRM-2.0-8B-Preview](https://huggingface.co/tanhuajie2001/Robo-Dopamine-GRM-2.0-8B-Preview) model. More general, more powerful!
- **`2026-03-02`**: 🤗 We released the [Robo-Dopamine-GRM-8B](https://huggingface.co/tanhuajie2001/Robo-Dopamine-GRM-8B) model.
- **`2026-02-22`**: 🔥🔥🔥 **Robo-Dopamine** has been accepted to CVPR 2026! See you in Denver, Colorado, USA!
- **`2026-02-10`**: ⚡ We released the data generation pipeline and fine-tuning code. ***Try fine-tuning on your own data!***
- **`2026-01-26`**: 🔍 We released the [Robo-Dopamine-Bench](https://huggingface.co/datasets/tanhuajie2001/Robo-Dopamine-Bench) benchmark and evaluation code.
- **`2026-01-08`**: 🤗 We released the [Robo-Dopamine-GRM-3B](https://huggingface.co/tanhuajie2001/Robo-Dopamine-GRM-3B) model and inference code.
- **`2025-12-30`**: ✨ ***Code, dataset, and weights are coming soon! Stay tuned for updates.***
- **`2025-12-30`**: 🔥 We launched the [Project Page](https://robo-dopamine.github.io/) of **Robo-Dopamine**.
## 🤖 Overview
**Robo-Dopamine** is composed of two core components:

- ***(a) Dopamine-Reward Modeling Method.*** At its heart is the General Reward Model (GRM), a vision-language model that is prompted with a task description and conditioned on multi-view images of the initial, goal, "BEFORE," and "AFTER" states to predict a relative progress (or regress) hop from the BEFORE state to the AFTER state. To keep this signal stable and accurate, we employ *Multi-Perspective Progress Fusion*, which combines incremental, forward-anchored, and backward-anchored predictions into a final fused reward.
- ***(b) Dopamine-RL Training Framework.*** Dopamine-RL first adapts the pre-trained GRM to a new task using a single demonstration (*One-Shot GRM Adaptation*). It then applies a theoretically grounded *Policy-Invariant Reward Shaping* method that converts the GRM's dense output into a reward signal that accelerates learning without altering the optimal policy.

The approach is compatible with a wide range of RL algorithms.

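
The two ideas above can be illustrated with a short, self-contained Python sketch. It is a toy under stated assumptions, not the Robo-Dopamine implementation. The fusion step assumes that all three modes report progress on a [0, 1] scale, that incremental hops can be summed (with clipping) into a progress curve, and that fusion is a weighted mean with equal default weights. The shaping step uses classic potential-based reward shaping (Ng, Harada, and Russell, 1999), `r + γΦ(s') − Φ(s)`, with the fused progress as the potential Φ; whether this matches the paper's exact formulation is an assumption. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def integrate_hops(hops: Sequence[float], start: float = 0.0) -> List[float]:
    """Accumulate per-step progress hops into a progress curve.

    Assumption (not from the paper): hops are additive and progress is clipped to [0, 1].
    """
    progress, curve = start, []
    for hop in hops:
        progress = min(1.0, max(0.0, progress + hop))
        curve.append(progress)
    return curve


def fuse_progress(
    incremental_hops: Sequence[float],
    forward: Sequence[float],
    backward: Sequence[float],
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> List[float]:
    """Hypothetical fusion: weighted mean of three per-frame progress estimates."""
    incremental = integrate_hops(incremental_hops)
    if not (len(incremental) == len(forward) == len(backward)):
        raise ValueError("all three estimates must cover the same frames")
    w_inc, w_fwd, w_bwd = weights
    return [
        w_inc * i + w_fwd * f + w_bwd * b
        for i, f, b in zip(incremental, forward, backward)
    ]


def shaped_rewards(
    env_rewards: Sequence[float], potential: Sequence[float], gamma: float = 0.99
) -> List[float]:
    """Potential-based shaping: r_t + gamma * phi(s_{t+1}) - phi(s_t).

    potential[t] is phi(s_t), so it needs one entry per state: len(env_rewards) + 1.
    """
    if len(potential) != len(env_rewards) + 1:
        raise ValueError("need len(env_rewards) + 1 potential values")
    return [r + gamma * potential[t + 1] - potential[t] for t, r in enumerate(env_rewards)]


if __name__ == "__main__":
    # Made-up numbers for a 3-step episode; a real run would use GRM outputs.
    fused = fuse_progress([0.3, 0.3, 0.4], [0.25, 0.6, 0.95], [0.35, 0.6, 1.0])
    phi = [0.0] + fused  # prepend the potential of the initial state s_0
    print(shaped_rewards([0.0, 0.0, 1.0], phi, gamma=1.0))
```

With γ = 1 the shaping terms telescope, so an episode's return shifts by Φ(s_T) − Φ(s_0). Ng et al. state the undiscounted episodic guarantee with the potential of the terminal (absorbing) state fixed at zero, so that the shift depends only on the start state; the toy numbers above do not enforce this.
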
<div align="center">
<img src="https://github.com/FlagOpen/Robo-Dopamine/raw/main/assets/method.png" alt="Robo-Dopamine method overview" style="width:100%;vertical-align:middle">
</div>

## 🛠️ Setup
```bash
# clone repo.
git clone https://github.com/FlagOpen/Robo-Dopamine.git
cd Robo-Dopamine
# build conda env.
conda create -n robo-dopamine python=3.10
conda activate robo-dopamine
pip install -r requirements.txt
```
## 💡 Simple Inference
```python
import os

from examples.inference import GRMInference

model = GRMInference("tanhuajie2001/Robo-Dopamine-GRM-2.0-4B-Preview")

TASK_INSTRUCTION = "organize the table"
BASE_DEMO_PATH = "./examples/demo_table"
OUTPUT_ROOT = "./results"

# Note: if no target/goal image is available, set GOAL_IMAGE_PATH to the
# blank image "./examples/demo_table/blank_goal.png".
GOAL_IMAGE_PATH = "./examples/demo_table/goal_image.png"  # or "./examples/demo_table/blank_goal.png"

# Select the prediction mode: "forward" (Forward-Mode), "incremental" (Incremental-Mode),
# or "backward" (Backward-Mode).
PREDICTION_MODE = "forward"

# Multi-view usage: one video per camera.
output_dir = model.run_pipeline(
    cam_high_path=os.path.join(BASE_DEMO_PATH, "cam_high.mp4"),
    cam_left_path=os.path.join(BASE_DEMO_PATH, "cam_left_wrist.mp4"),
    cam_right_path=os.path.join(BASE_DEMO_PATH, "cam_right_wrist.mp4"),
    out_root=OUTPUT_ROOT,
    task=TASK_INSTRUCTION,
    frame_interval=10,  # adjust as desired, but avoid very small values in "incremental" mode
    batch_size=1,  # increase batch_size above 1 if you have enough GPU memory
    goal_image=GOAL_IMAGE_PATH,
    eval_mode=PREDICTION_MODE,
    visualize=True,
)
print(f"Episode ({BASE_DEMO_PATH}) processed with multi-view {PREDICTION_MODE}-mode. Output at: {output_dir}")

# Single-view usage: pass the cam_high video to all three camera slots.
output_dir = model.run_pipeline(
    cam_high_path=os.path.join(BASE_DEMO_PATH, "cam_high.mp4"),
    cam_left_path=os.path.join(BASE_DEMO_PATH, "cam_high.mp4"),  # repeat cam_high
    cam_right_path=os.path.join(BASE_DEMO_PATH, "cam_high.mp4"),  # repeat cam_high
    out_root=OUTPUT_ROOT,
    task=TASK_INSTRUCTION,
    frame_interval=10,  # adjust as desired, but avoid very small values in "incremental" mode
    batch_size=1,  # increase batch_size above 1 if you have enough GPU memory
    goal_image=GOAL_IMAGE_PATH,
    eval_mode=PREDICTION_MODE,
    visualize=True,
)
print(f"Episode ({BASE_DEMO_PATH}) processed with single-view {PREDICTION_MODE}-mode. Output at: {output_dir}")
```
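
If you switch between multi-view and single-view inputs often, a small helper can build the three camera arguments for `run_pipeline`. The sketch below is hypothetical (`camera_paths` is not part of the repository) and assumes every episode folder uses the same file names as `./examples/demo_table`. For single-view input it repeats `cam_high.mp4` in all three slots, as the single-view example does, and it can optionally fail early if a video file is missing.

```python
import os
from typing import Dict


def camera_paths(
    episode_dir: str, multi_view: bool = True, check_exists: bool = False
) -> Dict[str, str]:
    """Build the cam_*_path keyword arguments for run_pipeline (hypothetical helper).

    Assumes the demo file layout: cam_high.mp4, cam_left_wrist.mp4, cam_right_wrist.mp4.
    """
    high = os.path.join(episode_dir, "cam_high.mp4")
    if multi_view:
        left = os.path.join(episode_dir, "cam_left_wrist.mp4")
        right = os.path.join(episode_dir, "cam_right_wrist.mp4")
    else:
        # Single-view: reuse the top camera video for all three slots.
        left = right = high
    paths = {"cam_high_path": high, "cam_left_path": left, "cam_right_path": right}
    if check_exists:
        # Report each missing file once, in a stable order.
        missing = sorted({p for p in paths.values() if not os.path.isfile(p)})
        if missing:
            raise FileNotFoundError(f"missing video file(s): {missing}")
    return paths
```

The returned dictionary can be unpacked into the call shown above, e.g. `model.run_pipeline(**camera_paths(BASE_DEMO_PATH, multi_view=False), out_root=OUTPUT_ROOT, task=TASK_INSTRUCTION, frame_interval=10, batch_size=1, goal_image=GOAL_IMAGE_PATH, eval_mode=PREDICTION_MODE, visualize=True)`.
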
## 📑 Citation
If you find our work helpful, feel free to cite it:
```bibtex
@article{tan2025robo,
title={Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation},
author={Tan, Huajie and Chen, Sixiang and Xu, Yijie and Wang, Zixiao and Ji, Yuheng and Chi, Cheng and Lyu, Yaoxu and Zhao, Zhongxia and Chen, Xiansheng and Co, Peterson and others},
journal={arXiv preprint arXiv:2512.23703},
year={2025}
}
```