# Qwen2.5-VL-3B-Instruct-Traffic **Qwen2.5-VL-3B-Instruct-Traffic** is a multimodal model fine-tuned on the **MITS (Multimodal Intelligent Traffic Surveillance)** dataset for intelligent traffic surveillance scenarios. - **Tasks:** recognition, counting, localization, background awareness, reasoning - **Data:** 170,400 images + ~5M instruction-following VQA pairs from MITS - **Modality:** Image + Text → Text - **Domain:** traffic scenes (congestion, accidents, construction, smoke/fireworks, unusual weather, spills, etc.) ## Quick Links - 📚 Dataset: [`zhaokaikai/Multimodal_Intelligent_Traffic_Surveillance`](https://www.modelscope.cn/datasets/zhaokaikai/Multimodal_Intelligent_Traffic_Surveillance) - 💻 Usage & examples: please refer to the GitHub repo **https://github.com/LifeIsSoSolong/Multimodal-Intelligent-Traffic-Surveillance-Dataset-Models** ## Intended Use - Urban traffic monitoring, incident analysis, visual question answering for transportation management - Research on ITS-specific multimodal reasoning and instruction following ## Model Inputs/Outputs - **Input:** an image (traffic scene) + a natural language instruction/question - **Output:** a natural language response (e.g., description, count, event reasoning) ## Training Summary - Objective: instruction tuning on MITS traffic QA - Backbone family: Qwen2.5-VL 3B Instruct - Notes: align vision-language features to traffic-centric concepts and events ## Limitations & Notes - The model may make mistakes on rare objects or extreme weather/night scenes not well represented in training. - Not a safety-critical system; human verification is required for real-world decisions. ## License - Follow the licenses of this model and the MITS dataset as stated on their ModelScope pages. ## Citation If you use this model or dataset, please cite: ```bibtex @article{zhao2025mits, title = {MITS: A large-scale multimodal benchmark dataset for Intelligent Traffic Surveillance}, author = {Zhao, Kaikai and Liu, Zhaoxiang and Wang, Peng and Wang, Xin and Ma, Zhicheng and Xu, Yajun and Zhang, Wenjing and Nan, Yibing and Wang, Kai and Lian, Shiguo}, journal = {Image and Vision Computing}, pages = {105736}, year = {2025}, publisher = {Elsevier} } ``` ## Contact Unicom AI