初始化项目,由ModelHub XC社区提供模型
Model: longvideoagent/longvideoagent-qwen2.5-7b Source: Original Platform
This commit is contained in:
84
README.md
Normal file
84
README.md
Normal file
@@ -0,0 +1,84 @@
|
||||
---
|
||||
license: apache-2.0
|
||||
language:
|
||||
- en
|
||||
tags:
|
||||
- long-video
|
||||
- video-understanding
|
||||
- video-qa
|
||||
- agent
|
||||
- qwen2.5
|
||||
- transformers
|
||||
- longtvqa
|
||||
pipeline_tag: text-generation
|
||||
base_model: Qwen/Qwen2.5-7B-Instruct
|
||||
library_name: transformers
|
||||
---
|
||||
|
||||
# LongVideoAgent Qwen2.5-7B
|
||||
|
||||
This repository hosts the released LLM checkpoint for **LongVideoAgent**, a multi-agent framework for long-video question answering. This model is a **Qwen2.5-7B-based checkpoint** used in the LongVideoAgent project.
|
||||
|
||||
Project links:
|
||||
|
||||
- Project page: https://longvideoagent.github.io/
|
||||
- Code: https://github.com/longvideoagent/LongVideoAgent
|
||||
- Paper: https://arxiv.org/abs/2512.20618
|
||||
|
||||
## Overview
|
||||
|
||||
LongVideoAgent decomposes long-video reasoning into multiple roles:
|
||||
|
||||
- `MasterAgent` for planning and answer generation
|
||||
- `GroundingAgent` for subtitle-based temporal grounding
|
||||
- `VisionAgent` for local visual evidence extraction
|
||||
|
||||
This checkpoint is intended for use with the official LongVideoAgent codebase and evaluation pipeline.
|
||||
|
||||
## LongTVQA+ Performance
|
||||
|
||||
Compared with the Qwen2.5-7B-Instruct baseline, this checkpoint improves LongTVQA+ accuracy by 6.66 percentage points.
|
||||
|
||||
| Model | LongTVQA+ Acc | Delta |
|
||||
| --- | ---: | ---: |
|
||||
| Qwen2.5-7B-Instruct | 57.33% | - |
|
||||
| LongVideoAgent Qwen2.5-7B | 64.00% | +6.66% |
|
||||
|
||||
## Intended Use
|
||||
|
||||
Use this model for:
|
||||
|
||||
- Research on long-video question answering
|
||||
- Reproducing LongVideoAgent experiments
|
||||
- Studying agentic reasoning over long videos
|
||||
|
||||
This checkpoint is **not** a general-purpose video model by itself. For inference and evaluation, please use the official repository:
|
||||
|
||||
- https://github.com/longvideoagent/LongVideoAgent
|
||||
|
||||
## Usage
|
||||
|
||||
Please follow the setup and inference instructions in the official repository and project documentation:
|
||||
|
||||
- https://github.com/longvideoagent/LongVideoAgent
|
||||
- https://longvideoagent.github.io/
|
||||
|
||||
If you use this checkpoint in your work, please also cite the LongVideoAgent paper below.
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
|
||||
@misc{liu2025longvideoagentmultiagentreasoninglong,
|
||||
title={LongVideoAgent: Multi-Agent Reasoning with Long Videos},
|
||||
author={Runtao Liu and Ziyi Liu and Jiaqi Tang and Yue Ma and Renjie Pi and Jipeng Zhang and Qifeng Chen},
|
||||
year={2025},
|
||||
eprint={2512.20618},
|
||||
archivePrefix={arXiv},
|
||||
primaryClass={cs.AI},
|
||||
url={https://arxiv.org/abs/2512.20618},
|
||||
}
|
||||
```
|
||||
|
||||
## Acknowledgement
|
||||
|
||||
This checkpoint is built for the LongVideoAgent project and is based on **Qwen2.5-7B-Instruct**.
|
||||
Reference in New Issue
Block a user