---
license: apache-2.0
language:
- en
tags:
- long-video
- video-understanding
- video-qa
- agent
- qwen3
- transformers
- longtvqa
base_model: Qwen/Qwen3-4B-Thinking-2507
library_name: transformers
---

# LongVideoAgent Qwen3-4B

This repository hosts the released LLM checkpoint for **LongVideoAgent**, a multi-agent framework for long-video question answering. The checkpoint is based on **Qwen3-4B** and serves as the reasoning LLM in the LongVideoAgent project.

## Overview

This model was trained with the official repository: [longvideoagent/LongVideoAgent](https://github.com/longvideoagent/LongVideoAgent).

LongVideoAgent uses a multi-agent collaboration framework that decomposes complex long-video reasoning into specialized roles. For the detailed methodology and agent architecture, see the paper on arXiv: [https://arxiv.org/abs/2512.20618](https://arxiv.org/abs/2512.20618).

This checkpoint is intended for use with the official LongVideoAgent codebase and evaluation pipeline.

## Performance

On the **LongTVQA+** test set, this model achieves an accuracy of **72%**, while `gpt-4o-mini` achieves 74% on the same benchmark.

This shows that the model reaches reasoning performance close to that of an advanced closed-source model while using far fewer parameters.

## Intended Use

Use this model for:

- Research on long-video question answering
- Reproducing LongVideoAgent experiments
- Studying agentic reasoning over long videos

This checkpoint is **not** a general-purpose video model by itself. For inference and evaluation, please use the official repository:

- https://github.com/longvideoagent/LongVideoAgent

## Usage

**Note on Context Length:** This model natively supports a context length of **262,144** tokens. If you hit Out-Of-Memory (OOM) errors or have limited VRAM during inference, reduce the maximum context length in your vLLM parameters, e.g. `max_model_len=120000`.
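
As a hedged sketch (the exact command depends on your vLLM version and hardware), the same cap can be passed on the command line when serving the checkpoint; `120000` is an example value, not a requirement:

```shell
# Illustrative only: serve the checkpoint with a reduced context window
# (--max-model-len) so the KV cache fits in limited VRAM.
vllm serve longvideoagent/longvideoagent-qwen3-4b \
    --max-model-len 120000
```

Lower values trade maximum input length for memory headroom; pick the largest value that fits your GPU.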

Please follow the setup and inference instructions in the official repository and project documentation:

- https://github.com/longvideoagent/LongVideoAgent
- https://longvideoagent.github.io/

If you use this checkpoint in your work, please also cite the LongVideoAgent paper below.

## Citation

```bibtex
@misc{liu2025longvideoagentmultiagentreasoninglong,
      title={LongVideoAgent: Multi-Agent Reasoning with Long Videos},
      author={Runtao Liu and Ziyi Liu and Jiaqi Tang and Yue Ma and Renjie Pi and Jipeng Zhang and Qifeng Chen},
      year={2025},
      eprint={2512.20618},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2512.20618},
}
```