longvideoagent-qwen2.5-7b/README.md

---
license: apache-2.0
language:
  - en
tags:
  - long-video
  - video-understanding
  - video-qa
  - agent
  - qwen2.5
  - transformers
  - longtvqa
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: transformers
---

# LongVideoAgent Qwen2.5-7B

This repository hosts the released LLM checkpoint for **LongVideoAgent**, a multi-agent framework for long-video question answering. This model is a **Qwen2.5-7B-based checkpoint** used in the LongVideoAgent project.

Project links:

- Project page: https://longvideoagent.github.io/
- Code: https://github.com/longvideoagent/LongVideoAgent
- Paper: https://arxiv.org/abs/2512.20618

## Overview

LongVideoAgent decomposes long-video reasoning into multiple roles:

- `MasterAgent` for planning and answer generation
- `GroundingAgent` for subtitle-based temporal grounding
- `VisionAgent` for local visual evidence extraction

This checkpoint is intended for use with the official LongVideoAgent codebase and evaluation pipeline.

## LongTVQA+ Performance

Compared with the Qwen2.5-7B-Instruct baseline, this checkpoint improves LongTVQA+ accuracy by 6.66 percentage points.

| Model | LongTVQA+ Acc | Delta |
| --- | ---: | ---: |
| Qwen2.5-7B-Instruct | 57.33% | - |
| LongVideoAgent Qwen2.5-7B | 64.00% | +6.66% |

## Intended Use

Use this model for:

- Research on long-video question answering
- Reproducing LongVideoAgent experiments
- Studying agentic reasoning over long videos

This checkpoint is **not** a general-purpose video model by itself. For inference and evaluation, please use the official repository:

- https://github.com/longvideoagent/LongVideoAgent

## Usage

Please follow the setup and inference instructions in the official repository and project documentation:

- https://github.com/longvideoagent/LongVideoAgent
- https://longvideoagent.github.io/

If you use this checkpoint in your work, please also cite the LongVideoAgent paper below.

## Citation

```bibtex
@misc{liu2025longvideoagentmultiagentreasoninglong,
  title={LongVideoAgent: Multi-Agent Reasoning with Long Videos},
  author={Runtao Liu and Ziyi Liu and Jiaqi Tang and Yue Ma and Renjie Pi and Jipeng Zhang and Qifeng Chen},
  year={2025},
  eprint={2512.20618},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2512.20618},
}
```

## Acknowledgement

This checkpoint is built for the LongVideoAgent project and is based on **Qwen2.5-7B-Instruct**.