---
license: apache-2.0
language:
- en
tags:
- long-video
- video-understanding
- video-qa
- agent
- qwen3
- transformers
- longtvqa
base_model: Qwen/Qwen3-4B-Thinking-2507
library_name: transformers
---

# LongVideoAgent Qwen3-4B

This repository hosts the released LLM checkpoint for **LongVideoAgent**, a multi-agent framework for long-video question answering. The checkpoint is based on **Qwen3-4B** and serves as the reasoning model in the LongVideoAgent project.

## Overview

This model was trained using the official repository: [longvideoagent/LongVideoAgent](https://github.com/longvideoagent/LongVideoAgent).

LongVideoAgent uses a multi-agent collaboration framework to decompose complex long-video reasoning into specialized roles. For the detailed methodology and agent architecture, please refer to our paper on arXiv: [https://arxiv.org/abs/2512.20618](https://arxiv.org/abs/2512.20618).

This checkpoint is intended for use with the official LongVideoAgent codebase and evaluation pipeline.

## Performance

On the **LongTVQA+** test set, this model achieves **72%** accuracy, while `gpt-4o-mini` achieves 74% on the same benchmark.

This shows that the model delivers reasoning performance comparable to advanced closed-source models while using far fewer parameters.

## Intended Use

Use this model for:

- Research on long-video question answering
- Reproducing LongVideoAgent experiments
- Studying agentic reasoning over long videos

This checkpoint is **not** a general-purpose video model by itself. For inference and evaluation, please use the official repository:

- https://github.com/longvideoagent/LongVideoAgent

## Usage

**Note on context length:** This model natively supports a context length of **262,144 tokens**. If you hit out-of-memory (OOM) errors or have limited VRAM during inference, reduce the maximum context length in your vLLM parameters, e.g. `max_model_len=120000`.

Please follow the setup and inference instructions in the official repository and project documentation:

- https://github.com/longvideoagent/LongVideoAgent
- https://longvideoagent.github.io/

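As a minimal sketch, the context-length cap mentioned above can be applied when serving the checkpoint with vLLM's OpenAI-compatible server; the checkpoint path below is a placeholder, and flag defaults may differ across vLLM versions:

```shell
# Serve the checkpoint with vLLM, capping the context window below the
# native 262,144 tokens to avoid OOM on smaller GPUs.
# Replace <path-to-checkpoint> with your local download of this repository.
vllm serve <path-to-checkpoint> --max-model-len 120000
```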
If you use this checkpoint in your work, please also cite the LongVideoAgent paper below.

## Citation

```bibtex
@misc{liu2025longvideoagentmultiagentreasoninglong,
  title={LongVideoAgent: Multi-Agent Reasoning with Long Videos},
  author={Runtao Liu and Ziyi Liu and Jiaqi Tang and Yue Ma and Renjie Pi and Jipeng Zhang and Qifeng Chen},
  year={2025},
  eprint={2512.20618},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2512.20618},
}
```