Initialize project; model provided by the ModelHub XC community

Model: Mungert/Qwen3-VL-2B-Instruct-GGUF
Source: Original Platform
This commit is contained in:
ModelHub XC
2026-06-07 15:23:21 +08:00
commit da0aca854a
25 changed files with 403 additions and 0 deletions

.gitattributes
@@ -0,0 +1,58 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-f16_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-q3_k_m.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-q3_k_s.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-q5_k_m.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-q6_k_m.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-q4_k_s.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-q4_0.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-q4_1.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-q5_0.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-q5_1.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-iq3_xs.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-iq3_xxs.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-iq3_m.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-iq4_xs.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-iq4_nl.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-imatrix.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-q8_0.mmproj filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-f16.mmproj filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-bf16.mmproj filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-f32.mmproj filter=lfs diff=lfs merge=lfs -text
Qwen3-VL-2B-Instruct-bf16.gguf filter=lfs diff=lfs merge=lfs -text
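Git decides which files go through LFS by matching each path against the patterns above. There is no generic `*.gguf` rule. Every GGUF and `.mmproj` file is listed by exact name, so a file that is not listed would be committed as a regular blob. The sketch below approximates that matching with Python's `fnmatch`. It is a simplification, not Git's own matcher: it does not handle negation, macros, or every `**` edge case. `LFS_PATTERNS` and `is_lfs_tracked` are illustrative names, and `LFS_PATTERNS` holds only a subset of the rules.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Illustrative subset of the patterns in the .gitattributes above.
LFS_PATTERNS = [
    "*.safetensors",
    "*.bin",
    "*.tar.*",
    "saved_model/**/*",
    "Qwen3-VL-2B-Instruct-q4_k_m.gguf",
    "Qwen3-VL-2B-Instruct-f16.mmproj",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching (simplified, not Git's exact rules).

    A pattern without a slash is matched against the file's basename at any
    depth; a pattern with a slash is matched against the whole relative path.
    """
    p = PurePosixPath(path)
    for pattern in patterns:
        target = str(p) if "/" in pattern else p.name
        if fnmatchcase(target, pattern):
            return True
    return False


if __name__ == "__main__":
    for name in [
        "Qwen3-VL-2B-Instruct-q4_k_m.gguf",
        "Qwen3-VL-2B-Instruct-q2_k.gguf",  # hypothetical file, not listed above
        "weights/model.safetensors",
        "saved_model/variables/data.pb",
        "README.md",
    ]:
        print(f"{name}: {is_lfs_tracked(name)}")
```

Running it reports `True` for the listed `q4_k_m` file and `False` for the hypothetical, unlisted `q2_k` file.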

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:16d44cc27d57b7760648c03a5320e7b07a7dc1a8aa464b0b81691a1ea45418cc
size 3447350496

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:109453d03520d7dd492ddc30d930fc50649977e7ee3fcee097a5371660657e43
size 822540544

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1063d48a3a211c4a0a2904ad4ee5a55772c2ddf2ac11b420c868d05d53a41512
size 819394816

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f53d4d18aaa54d791e548ba60e2a2343e5e84f91d0bb20e074582ed12f3cceba
size 2621596896

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:133741b445b97c488337f2221824391a043b6aae3d1307306f820aba63833015
size 1627846912

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:42887aab81e14ba4d4d34e6bdc1cd0d8074ea29399bfcac6941f531c405e44f6
size 2094560

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eb066599beb9be8a43be978f8ae30c84fa73423b307db96bd24a573a3497120e
size 914308608

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1a55cf4e4548babcfc5e3a8ba2957baba28fc86403b17c5859d9767c3b48c18b
size 843169280

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dce5f1734536a4646c155d0bb1ddc9e155d544275220a33bbc2c773f5cbe7db7
size 823901696

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:340fdc1d468943bc173d15fe74d9d21b39f8df9fe427188d60198967fef2bead
size 974202368

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2821875c094469a9ebeb881be6a8565f0ff6262b001d7c190409e185e5191903
size 1010384384

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:754c206b125122888960f943b854e4efec33c1a8ff0759c36ef7fce7cef8a819
size 934559232

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f52fc6009852cfa94af250c11d5360955b65f6214295e5af22f3ab38b6276779
size 893232640

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c234f751ca53f696e50c43ef96c64183135e25f2c80139cd8a262a83afdec6bf
size 1129784832

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:146d127929c223d9121c82f83a193a3fcd7cdc07a18b86b4c9f5b5c8ee60d323
size 1101178368

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6511400510949509aa7e681070a715d0f98b4cab84879f8c4e3039d44d0bfb29
size 1174218240

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:601a3fb063880f74e5b1cdf2e1fcd60ca6ad10a071e0cd47d4ed27462a970bc1
size 1068842496

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:58eba5d929289184d8b9e7437c2cf95ec242b3cd5d596d9e6e1971b6668daf56
size 1305945600

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c74387ce7769893bd559ad5fec977b70731631109dce50ad9028e50f114c36b6
size 1394025984

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b42784a1c0f6afb06a384e51bcca0ef295fd9fa7c53c0eab8b5c42b8cc2a8338
size 1336583680

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:72b720bdb3aa483375016cd7d454057c1cd80ee3f298184ff5e2e001ce5ea15f
size 1493116416

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f3af156efd3b0c4a3a020d963e2ee336af0a075efc8c50efd8964f9f48076de7
size 1834427616

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:34b7469b7bbfe9964d714fc7c5b3909d2d482752f26708a0f438b1a2ce1838e8
size 445053184
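Each hunk above is a Git LFS pointer file, not the weights themselves. A pointer holds three things: a `version` line naming the pointer spec, the SHA-256 `oid` of the real file, and the real file's `size` in bytes. After downloading a file, you can check it against its pointer. The sketch below is a minimal version of that check. The names `LfsPointer`, `parse_lfs_pointer`, and `verify_file` are hypothetical, and only the three keys shown here are interpreted. The size is compared first, so a truncated download fails before any hashing.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LfsPointer:
    oid: str   # hex SHA-256 digest of the real file
    size: int  # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS v1 pointer (other keys ignored)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return LfsPointer(oid=digest, size=int(fields["size"]))


def verify_file(path: Path, pointer: LfsPointer, chunk_size: int = 1 << 20) -> bool:
    """Check the byte size first, then stream the file through SHA-256."""
    if path.stat().st_size != pointer.size:
        return False
    sha = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer.oid


if __name__ == "__main__":
    # The last pointer shown above; its file name is not visible in this view.
    pointer = parse_lfs_pointer(
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:34b7469b7bbfe9964d714fc7c5b3909d2d482752f26708a0f438b1a2ce1838e8\n"
        "size 445053184\n"
    )
    print(pointer)
```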

README.md
@@ -0,0 +1,276 @@
---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---
# <span style="color: #7FFF7F;">Qwen3-VL-2B-Instruct GGUF Models</span>
## <span style="color: #7F7FFF;">Model Generation Details</span>
This model was generated using [llama.cpp](https://github.com/ggerganov/llama.cpp) at commit [`16724b5b6`](https://github.com/ggerganov/llama.cpp/commit/16724b5b6836a2d4b8936a5824d2ff27c52b4517).
---
<a href="https://readyforquantum.com/huggingface_gguf_selection_guide.html" style="color: #7FFF7F;">
Click here to get info on choosing the right GGUF model format
</a>
---
<!--Begin Original Model Card-->
<a href="https://huggingface.co/spaces/akhaliq/Qwen3-VL-2B-Instruct" target="_blank" style="margin: 2px;">
<img alt="Demo" src="https://img.shields.io/badge/Demo-536af5" style="display: inline-block; vertical-align: middle;"/>
</a>
# Qwen3-VL-2B-Instruct
Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.
Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning-enhanced Thinking editions for flexible, on-demand deployment.
#### Key Enhancements:
* **Visual Agent**: Operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks.
* **Visual Coding Boost**: Generates Draw.io/HTML/CSS/JS from images/videos.
* **Advanced Spatial Perception**: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI.
* **Long Context & Video Understanding**: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing.
* **Enhanced Multimodal Reasoning**: Excels in STEM/Math—causal analysis and logical, evidence-based answers.
* **Upgraded Visual Recognition**: Broader, higher-quality pretraining is able to “recognize everything”—celebrities, anime, products, landmarks, flora/fauna, etc.
* **Expanded OCR**: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing.
* **Text Understanding on par with pure LLMs**: Seamless text-vision fusion for lossless, unified comprehension.
#### Model Architecture Updates:
<p align="center">
<img src="https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3-VL/qwen3vl_arc.jpg" width="80%"/>
</p>
1. **Interleaved-MRoPE**: Full-frequency allocation over time, width, and height via robust positional embeddings, enhancing long-horizon video reasoning.
2. **DeepStack**: Fuses multi-level ViT features to capture fine-grained details and sharpen image-text alignment.
3. **Text-Timestamp Alignment:** Moves beyond T-RoPE to precise, timestamp-grounded event localization for stronger video temporal modeling.
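The sketch below makes "full-frequency allocation" concrete. It assigns each rotary frequency slot to the time (`t`), height (`h`), or width (`w`) axis in two ways.

* **Chunked layout.** As in the original MRoPE of Qwen2-VL, each axis gets one contiguous band. One axis therefore sees only the fastest-rotating frequencies and another only the slowest.
* **Interleaved layout.** The slots cycle `t, h, w`, so every axis spans nearly the whole range.

The split `(24, 20, 20)` over 64 slots and the exact round-robin rule are assumptions chosen for illustration, not details stated in this card.

```python
from collections import Counter


def chunked_layout(sections):
    """Contiguous bands: all t slots first, then h, then w (slot 0 = fastest frequency)."""
    t, h, w = sections
    return ["t"] * t + ["h"] * h + ["w"] * w


def interleaved_layout(sections):
    """Round-robin t, h, w; once h and w are used up, the leftover slots go to t.

    Assumes h == w <= t, as in the example split below.
    """
    t, h, w = sections
    layout = []
    for i in range(t + h + w):
        if i % 3 == 1 and i < 3 * h:
            layout.append("h")
        elif i % 3 == 2 and i < 3 * w:
            layout.append("w")
        else:
            layout.append("t")
    return layout


def frequency_span(layout, axis):
    """First and last slot index owned by `axis` (higher index = slower rotation)."""
    owned = [i for i, a in enumerate(layout) if a == axis]
    return owned[0], owned[-1]


if __name__ == "__main__":
    sections = (24, 20, 20)  # assumed split of 64 frequency slots into (t, h, w)
    for name, layout in [
        ("chunked", chunked_layout(sections)),
        ("interleaved", interleaved_layout(sections)),
    ]:
        spans = {axis: frequency_span(layout, axis) for axis in "thw"}
        print(f"{name:12} first slots: {''.join(layout[:9])}  counts: {dict(Counter(layout))}  spans: {spans}")
```

With these numbers, the chunked layout confines `w` to slots 44 to 63, the slowest frequencies. In the interleaved layout, `t`, `h`, and `w` span slots 0 to 63, 1 to 58, and 2 to 59 respectively.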
This is the weight repository for Qwen3-VL-2B-Instruct.
---
## Model Performance
**Multimodal performance**
![](https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3-VL/qwen3vl_2b_32b_vl_instruct.jpg)
**Pure text performance**
![](https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3-VL/qwen3vl_2b_32b_text_instruct.jpg)
## Quickstart
Below, we provide simple examples to show how to use Qwen3-VL with 🤖 ModelScope and 🤗 Transformers.
Qwen3-VL support is included in the latest Hugging Face `transformers`; we advise building from source with the following command:
```bash
pip install git+https://github.com/huggingface/transformers
# pip install transformers==4.57.0 # currently, V4.57.0 is not released
```
### Using 🤗 Transformers to Chat
The following snippet shows how to use the chat model with `transformers`:
```python
import torch  # needed for the optional bfloat16 / flash_attention_2 variant below
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

# Default: load the model on the available device(s)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-2B-Instruct", dtype="auto", device_map="auto"
)

# We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
# model = Qwen3VLForConditionalGeneration.from_pretrained(
#     "Qwen/Qwen3-VL-2B-Instruct",
#     dtype=torch.bfloat16,
#     attn_implementation="flash_attention_2",
#     device_map="auto",
# )

processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-2B-Instruct")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Inference: generate, then strip the prompt tokens from each output sequence
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
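The snippet's comments mention multi-image scenarios. For those, the same `messages` structure can carry several `image` entries in one turn. `build_image_messages` below is a hypothetical helper, not part of `transformers`. It assumes the processor accepts multiple images in a single `content` list, and the image paths are placeholders.

```python
from typing import Iterable


def build_image_messages(image_sources: Iterable[str], prompt: str) -> list[dict]:
    """Build a single user turn: every image entry first, then the text prompt."""
    content = [{"type": "image", "image": src} for src in image_sources]
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]


if __name__ == "__main__":
    messages = build_image_messages(
        ["page1.png", "page2.png"],  # placeholder local paths or URLs
        "Compare these two pages.",
    )
    print(messages)
```

Its return value can be passed to `processor.apply_chat_template` in place of `messages` in the snippet above.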
### Generation Hyperparameters
#### VL
```bash
export greedy='false'
export top_p=0.8
export top_k=20
export temperature=0.7
export repetition_penalty=1.0
export presence_penalty=1.5
export out_seq_length=16384
```
#### Text
```bash
export greedy='false'
export top_p=1.0
export top_k=40
export repetition_penalty=1.0
export presence_penalty=2.0
export temperature=1.0
export out_seq_length=32768
```
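The variables above correspond to a conventional sampling pipeline. To illustrate what each knob does, the sketch below applies one preset to a single step's logits in plain Python:

* **Presence penalty:** subtracts a flat amount from every token that has already appeared.
* **Temperature:** rescales the logits.
* **Top-k, then top-p (nucleus) filtering:** trims the candidate set before sampling.

This is a generic sketch, not the Qwen team's serving code. `repetition_penalty=1.0` is a no-op and is left out. `out_seq_length` (presumably the output-length cap) governs the whole decoding loop rather than one step. `sample_next_token` and the preset dictionaries are hypothetical names.

```python
import math
import random

# The two presets above, restricted to the knobs this sketch implements.
VL_PRESET = {"top_p": 0.8, "top_k": 20, "temperature": 0.7, "presence_penalty": 1.5}
TEXT_PRESET = {"top_p": 1.0, "top_k": 40, "temperature": 1.0, "presence_penalty": 2.0}


def sample_next_token(logits, generated_ids, *, top_p, top_k, temperature,
                      presence_penalty, rng=random):
    """Sample one token id from `logits` (a list of floats); assumes temperature > 0."""
    # Presence penalty: a flat deduction for every token id already generated.
    seen = set(generated_ids)
    scores = [x - presence_penalty if i in seen else x for i, x in enumerate(logits)]
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    ranked = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)
    # Top-k: keep the k most likely tokens, then renormalize.
    ranked = ranked[:top_k]
    mass = sum(p for p, _ in ranked)
    ranked = [(p / mass, i) for p, i in ranked]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in ranked:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break
    # random.choices renormalizes the surviving weights itself.
    return rng.choices([i for _, i in kept], weights=[p for p, _ in kept])[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.5, 0.3, -1.0]  # hypothetical 4-token vocabulary
    history = [0]  # token 0 has already been generated, so it is penalized
    print(sample_next_token(toy_logits, history, rng=random.Random(42), **VL_PRESET))
```

Applying top-p after top-k, on the renormalized top-k distribution, mirrors the order in which many inference stacks chain these filters. Other orderings exist.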
## Citation
If you find our work helpful, feel free to cite us.
```
@misc{qwen3technicalreport,
title={Qwen3 Technical Report},
author={Qwen Team},
year={2025},
eprint={2505.09388},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.09388},
}
@article{Qwen2.5-VL,
title={Qwen2.5-VL Technical Report},
author={Bai, Shuai and Chen, Keqin and Liu, Xuejing and Wang, Jialin and Ge, Wenbin and Song, Sibo and Dang, Kai and Wang, Peng and Wang, Shijie and Tang, Jun and Zhong, Humen and Zhu, Yuanzhi and Yang, Mingkun and Li, Zhaohai and Wan, Jianqiang and Wang, Pengfei and Ding, Wei and Fu, Zheren and Xu, Yiheng and Ye, Jiabo and Zhang, Xi and Xie, Tianbao and Cheng, Zesen and Zhang, Hang and Yang, Zhibo and Xu, Haiyang and Lin, Junyang},
journal={arXiv preprint arXiv:2502.13923},
year={2025}
}
@article{Qwen2VL,
title={Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution},
author={Wang, Peng and Bai, Shuai and Tan, Sinan and Wang, Shijie and Fan, Zhihao and Bai, Jinze and Chen, Keqin and Liu, Xuejing and Wang, Jialin and Ge, Wenbin and Fan, Yang and Dang, Kai and Du, Mengfei and Ren, Xuancheng and Men, Rui and Liu, Dayiheng and Zhou, Chang and Zhou, Jingren and Lin, Junyang},
journal={arXiv preprint arXiv:2409.12191},
year={2024}
}
@article{Qwen-VL,
title={Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond},
author={Bai, Jinze and Bai, Shuai and Yang, Shusheng and Wang, Shijie and Tan, Sinan and Wang, Peng and Lin, Junyang and Zhou, Chang and Zhou, Jingren},
journal={arXiv preprint arXiv:2308.12966},
year={2023}
}
```
<!--End Original Model Card-->
---
# <span id="testllm" style="color: #7F7FFF;">🚀 If you find these models useful</span>
Help me test my **AI-Powered Quantum Network Monitor Assistant** with **quantum-ready security checks**:
👉 [Quantum Network Monitor](https://readyforquantum.com/?assistant=open&utm_source=huggingface&utm_medium=referral&utm_campaign=huggingface_repo_readme)
The full open-source code for the Quantum Network Monitor service is available in my GitHub repos (the ones with NetworkMonitor in the name): [Source Code Quantum Network Monitor](https://github.com/Mungert69). You will also find the code I use to quantize the models, if you want to do it yourself: [GGUFModelBuilder](https://github.com/Mungert69/GGUFModelBuilder)
💬 **How to test**:
Choose an **AI assistant type**:
- `TurboLLM` (GPT-4.1-mini)
- `HugLLM` (Hugging Face open-source models)
- `TestLLM` (Experimental CPU-only)
### **What I'm Testing**
I'm pushing the limits of **small open-source models for AI network monitoring**, specifically:
- **Function calling** against live network services
- **How small can a model go** while still handling:
  - Automated **Nmap security scans**
  - **Quantum-readiness checks**
  - **Network Monitoring tasks**
🟡 **TestLLM**: current experimental model (llama.cpp on 2 CPU threads in a Hugging Face Docker space):
- **Zero-configuration setup**
- ⏳ 30s load time (slow inference but **no API costs**). No token limit, since the cost is low.
- 🔧 **Help wanted!** If you're into **edge-device AI**, let's collaborate!
### **Other Assistants**
🟢 **TurboLLM**: uses **gpt-4.1-mini**:
- **It performs very well**, but unfortunately OpenAI charges per token, so token usage is limited.
- **Create custom cmd processors to run .NET code on Quantum Network Monitor Agents**
- **Real-time network diagnostics and monitoring**
- **Security Audits**
- **Penetration testing** (Nmap/Metasploit)
🔵 **HugLLM**: latest open-source models:
- 🌐 Runs on the Hugging Face Inference API. Performs pretty well using the latest models hosted on Novita.
### 💡 **Example commands you could test**:
1. `"Give me info on my website's SSL certificate"`
2. `"Check if my server is using quantum-safe encryption for communication"`
3. `"Run a comprehensive security audit on my server"`
4. `"Create a cmd processor to .. (whatever you want)"` Note: you need to install a [Quantum Network Monitor Agent](https://readyforquantum.com/Download/?utm_source=huggingface&utm_medium=referral&utm_campaign=huggingface_repo_readme) to run the .NET code on. This is a very flexible and powerful feature. Use with caution!
### Final Word
I fund the servers used to create these model files, run the Quantum Network Monitor service, and pay for inference from Novita and OpenAI—all out of my own pocket. All the code behind the model creation and the Quantum Network Monitor project is [open source](https://github.com/Mungert69). Feel free to use whatever you find helpful.
If you appreciate the work, please consider [buying me a coffee](https://www.buymeacoffee.com/mahadeva) ☕. Your support helps cover service costs and allows me to raise token limits for everyone.
I'm also open to job opportunities or sponsorship.
Thank you! 😊