---
license: apache-2.0
language:
- en
tags:
- long-video
- video-understanding
- video-qa
- agent
- qwen3
- transformers
- longtvqa
base_model: Qwen/Qwen3-4B-Thinking-2507
library_name: transformers
---

# LongVideoAgent Qwen3-4B

This repository hosts the released LLM checkpoint for **LongVideoAgent**, a multi-agent framework for long-video question answering.

This model is a **Qwen3-4B-based checkpoint** used in the LongVideoAgent project.

## Overview

This model was trained with the official repository: [longvideoagent/LongVideoAgent](https://github.com/longvideoagent/LongVideoAgent).

LongVideoAgent uses a multi-agent collaboration framework to decompose complex long-video reasoning into specialized roles. For the detailed methodology and agent architecture, please refer to our paper on arXiv: [https://arxiv.org/abs/2512.20618](https://arxiv.org/abs/2512.20618).

This checkpoint is intended for use with the official LongVideoAgent codebase and evaluation pipeline.

## Performance

On the **LongTVQA+** test set, this model achieves an accuracy of **72%**, while `gpt-4o-mini` achieves 74% on the same benchmark. The checkpoint therefore comes within two percentage points of an advanced closed-source model while using a significantly smaller parameter count.

## Intended Use

Use this model for:

- Research on long-video question answering
- Reproducing LongVideoAgent experiments
- Studying agentic reasoning over long videos

This checkpoint is **not** a general-purpose video model by itself. For inference and evaluation, please use the official repository:

- https://github.com/longvideoagent/LongVideoAgent

## Usage

**Note on Context Length:** This model natively supports a context length of **262,144** tokens. If you run into Out-Of-Memory (OOM) errors or have limited VRAM during inference, reduce the maximum context length in your vLLM parameters, for example `max_model_len=120000`.

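
The context-length note above can be sketched in Python as follows. This is a minimal illustration, not the official LongVideoAgent inference code: the helper names `build_engine_kwargs` and `load_engine` are hypothetical, and the checkpoint path you pass in is your own. It assumes vLLM is installed and relies only on the `model` and `max_model_len` arguments of `vllm.LLM`.

```python
# Native context length stated in this model card.
NATIVE_CONTEXT_LEN = 262_144


def build_engine_kwargs(model_path: str, max_model_len: int = 120_000) -> dict:
    """Return keyword arguments for vllm.LLM with a capped context window.

    Hypothetical helper for illustration; not part of the LongVideoAgent repo.
    """
    if not 0 < max_model_len <= NATIVE_CONTEXT_LEN:
        raise ValueError(
            f"max_model_len must be in (0, {NATIVE_CONTEXT_LEN}], got {max_model_len}"
        )
    return {"model": model_path, "max_model_len": max_model_len}


def load_engine(model_path: str, max_model_len: int = 120_000):
    """Create a vLLM engine whose context window is capped at max_model_len."""
    # Imported lazily so build_engine_kwargs works without vLLM installed.
    from vllm import LLM

    return LLM(**build_engine_kwargs(model_path, max_model_len))
```

Calling `load_engine("path/to/checkpoint")` (a placeholder path) would start an engine limited to 120,000 tokens. A smaller `max_model_len` reduces the KV-cache memory that vLLM reserves, at the cost of rejecting longer prompts.
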
Please follow the setup and inference instructions in the official repository and project documentation:

- https://github.com/longvideoagent/LongVideoAgent
- https://longvideoagent.github.io/

If you use this checkpoint in your work, please also cite the LongVideoAgent paper below.

## Citation

```bibtex
@misc{liu2025longvideoagentmultiagentreasoninglong,
      title={LongVideoAgent: Multi-Agent Reasoning with Long Videos},
      author={Runtao Liu and Ziyi Liu and Jiaqi Tang and Yue Ma and Renjie Pi and Jipeng Zhang and Qifeng Chen},
      year={2025},
      eprint={2512.20618},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2512.20618},
}
```