Fix and Clean up chat-template requirement for VLM (#6114)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Author: XinyuanTong
Date: 2025-05-10 09:14:09 -07:00
Committed by: GitHub
Parent: c178abdabc
Commit: 9d8ec2e67e
16 changed files with 104 additions and 195 deletions

@@ -5,7 +5,7 @@
Host the VLM:
```
-python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --chat-template qwen2-vl --port 30000
+python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --port 30000
```
It's recommended to reduce the memory usage by appending something like `--mem-fraction-static 0.6` to the command above.
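Once the server is up, it can be queried through its OpenAI-compatible chat endpoint. A minimal sketch of building such a request payload, assuming the standard OpenAI vision message format (the image URL and question here are placeholders):

```python
# Sketch: build an OpenAI-style chat-completion payload for the hosted VLM.
# The endpoint path (/v1/chat/completions) and content-part schema are the
# OpenAI vision format; image URL and question below are hypothetical.
import json


def build_vlm_request(image_url: str, question: str) -> dict:
    """Build a multimodal chat-completion payload for the hosted VLM."""
    return {
        "model": "Qwen/Qwen2-VL-7B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }


payload = build_vlm_request("https://example.com/cat.png", "What is in the image?")
print(json.dumps(payload, indent=2))
```

POSTing this JSON to `http://localhost:30000/v1/chat/completions` would exercise the server launched above.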

@@ -2,7 +2,7 @@
Bench the sglang-hosted VLM with the MMMU benchmark
Usage:
-Host the VLM: python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --chat-template qwen2-vl --port 30000
+Host the VLM: python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --port 30000
Benchmark: python benchmark/mmmu/bench_sglang.py --port 30000 --concurrency 16
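The `--concurrency 16` flag caps how many benchmark requests are in flight at once. A minimal sketch of that bounded-concurrency pattern, with an `asyncio.Semaphore` and a placeholder coroutine standing in for the real HTTP request (not the benchmark script's actual implementation):

```python
# Sketch: run n_requests tasks with at most `concurrency` in flight,
# the pattern suggested by bench_sglang.py's --concurrency flag.
import asyncio


async def fake_request(i: int) -> int:
    # Stand-in for an HTTP call to the sglang server.
    await asyncio.sleep(0.01)
    return i


async def run_all(n_requests: int, concurrency: int) -> list:
    sem = asyncio.Semaphore(concurrency)

    async def bounded(i: int) -> int:
        async with sem:  # blocks while `concurrency` tasks are running
            return await fake_request(i)

    # gather preserves submission order in its result list
    return await asyncio.gather(*(bounded(i) for i in range(n_requests)))


results = asyncio.run(run_all(32, 16))
print(len(results))  # 32
```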