Update README.md

This commit is contained in:
Cherrytest
2025-02-17 10:01:32 +00:00
parent 406085e2f9
commit 0421c94133


@@ -1,4 +1,3 @@
 ---
 license: apache-2.0
 language:
@@ -7,6 +6,8 @@ pipeline_tag: image-text-to-text
 tags:
 - multimodal
 library_name: transformers
+base_model:
+- Qwen/Qwen2.5-VL-7B-Instruct
 ---
 
 # Qwen2.5-VL-7B-Instruct-AWQ
@@ -98,25 +99,25 @@ from qwen_vl_utils import process_vision_info
 # default: Load the model on the available device(s)
 model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
-    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
+    "Qwen/Qwen2.5-VL-7B-Instruct-AWQ", torch_dtype="auto", device_map="auto"
 )
 
 # We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
 # model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
-#     "Qwen/Qwen2.5-VL-7B-Instruct",
+#     "Qwen/Qwen2.5-VL-7B-Instruct-AWQ",
 #     torch_dtype=torch.bfloat16,
 #     attn_implementation="flash_attention_2",
 #     device_map="auto",
 # )
 
 # default processor
-processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
+processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct-AWQ")
 
 # The default range for the number of visual tokens per image in the model is 4-16384.
 # You can set min_pixels and max_pixels according to your needs, such as a token range of 256-1280, to balance performance and cost.
 # min_pixels = 256*28*28
 # max_pixels = 1280*28*28
-# processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels)
+# processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct-AWQ", min_pixels=min_pixels, max_pixels=max_pixels)
 
 messages = [
     {
@@ -206,7 +207,7 @@ The model supports a wide range of resolution inputs. By default, it uses the na
 min_pixels = 256 * 28 * 28
 max_pixels = 1280 * 28 * 28
 processor = AutoProcessor.from_pretrained(
-    "Qwen/Qwen2.5-VL-7B-Instruct", min_pixels=min_pixels, max_pixels=max_pixels
+    "Qwen/Qwen2.5-VL-7B-Instruct-AWQ", min_pixels=min_pixels, max_pixels=max_pixels
 )
 ```
@@ -273,6 +274,26 @@ However, it should be noted that this method has a significant impact on the per
 At the same time, for long video inputs, since MRoPE itself is more economical with ids, the max_position_embeddings can be directly modified to a larger value, such as 64k.
 
+### Benchmark
+#### Performance of Quantized Models
+This section reports the generation performance of quantized models (including GPTQ and AWQ) of the Qwen2.5-VL series. Specifically, we report:
+
+- MMMU_VAL (Accuracy)
+- DocVQA_VAL (Accuracy)
+- MMBench_DEV_EN (Accuracy)
+- MathVista_MINI (Accuracy)
+
+We use [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) to evaluate all models.
+
+| Model Size | Quantization | MMMU_VAL | DocVQA_VAL | MMBench_DEV_EN | MathVista_MINI |
+| --- | --- | --- | --- | --- | --- |
+| Qwen2.5-VL-72B-Instruct | BF16<br><sup>([🤗](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct)[🤖](https://modelscope.cn/models/qwen/Qwen2.5-VL-72B-Instruct)) | 70.0 | 96.1 | 88.2 | 75.3 |
+| | AWQ<br><sup>([🤗](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct-AWQ)[🤖](https://modelscope.cn/models/qwen/Qwen2.5-VL-72B-Instruct-AWQ)) | 69.1 | 96.0 | 87.9 | 73.8 |
+| Qwen2.5-VL-7B-Instruct | BF16<br><sup>([🤗](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)[🤖](https://modelscope.cn/models/qwen/Qwen2.5-VL-7B-Instruct)) | 58.4 | 94.9 | 84.1 | 67.9 |
+| | AWQ<br><sup>([🤗](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct-AWQ)[🤖](https://modelscope.cn/models/qwen/Qwen2.5-VL-7B-Instruct-AWQ)) | 55.6 | 94.6 | 84.2 | 64.7 |
+| Qwen2.5-VL-3B-Instruct | BF16<br><sup>([🤗](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)[🤖](https://modelscope.cn/models/qwen/Qwen2.5-VL-3B-Instruct)) | 51.7 | 93.0 | 79.8 | 61.4 |
+| | AWQ<br><sup>([🤗](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct-AWQ)[🤖](https://modelscope.cn/models/qwen/Qwen2.5-VL-3B-Instruct-AWQ)) | 49.1 | 91.8 | 78.0 | 58.8 |
+
 ## Citation
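
The context line in the last hunk mentions that `max_position_embeddings` can be raised to a larger value such as 64k for long video inputs, but the diff does not show how. Below is a minimal sketch of one way to apply that suggestion to the AWQ checkpoint referenced in this commit, using the standard `transformers` config-override path; the 65536 value is illustrative, and it is an assumption here that `max_position_embeddings` is exposed on the top-level model config rather than a nested sub-config.

```python
# Sketch (not part of the commit above): raise max_position_embeddings before loading,
# as suggested in the README for long video inputs. Adjust the value to your context length.
from transformers import AutoConfig, AutoProcessor, Qwen2_5_VLForConditionalGeneration

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct-AWQ")
config.max_position_embeddings = 65536  # ~64k; assumed to be a top-level config attribute

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct-AWQ",
    config=config,            # load weights with the overridden position limit
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct-AWQ")
```

Editing `max_position_embeddings` directly in the checkpoint's `config.json` achieves the same effect without the programmatic override.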