Fix and Clean up chat-template requirement for VLM (#6114)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Author: XinyuanTong
Date: 2025-05-10 09:14:09 -07:00
Committed by: GitHub
Parent: c178abdabc
Commit: 9d8ec2e67e
16 changed files with 104 additions and 195 deletions

@@ -5,7 +5,7 @@
Host the VLM:
```
-python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --chat-template qwen2-vl --port 30000
+python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --port 30000
```
It's recommended to reduce the memory usage by appending something like `--mem-fraction-static 0.6` to the command above.
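Once the server is up, it can be queried through its OpenAI-compatible chat endpoint. A minimal sketch of building such a request payload, assuming the standard OpenAI vision message format (the image URL and question here are placeholders):

```python
# Sketch: build an OpenAI-style chat-completion payload for the hosted VLM.
# The endpoint path (/v1/chat/completions) and content-part schema are the
# OpenAI vision format; image URL and question below are hypothetical.
import json


def build_vlm_request(image_url: str, question: str) -> dict:
    """Build a multimodal chat-completion payload for the hosted VLM."""
    return {
        "model": "Qwen/Qwen2-VL-7B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
    }


payload = build_vlm_request("https://example.com/cat.png", "What is in the image?")
print(json.dumps(payload, indent=2))
```

POSTing this JSON to `http://localhost:30000/v1/chat/completions` would exercise the server launched above.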

@@ -2,7 +2,7 @@
Bench the sglang-hosted VLM with the MMMU benchmark
Usage:
-Host the VLM: python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --chat-template qwen2-vl --port 30000
+Host the VLM: python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --port 30000
Benchmark: python benchmark/mmmu/bench_sglang.py --port 30000 --concurrency 16
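The `--concurrency 16` flag caps how many benchmark requests are in flight at once. A minimal sketch of that bounded-concurrency pattern, with an `asyncio.Semaphore` and a placeholder coroutine standing in for the real HTTP request (not the benchmark script's actual implementation):

```python
# Sketch: run n_requests tasks with at most `concurrency` in flight,
# the pattern suggested by bench_sglang.py's --concurrency flag.
import asyncio


async def fake_request(i: int) -> int:
    # Stand-in for an HTTP call to the sglang server.
    await asyncio.sleep(0.01)
    return i


async def run_all(n_requests: int, concurrency: int) -> list:
    sem = asyncio.Semaphore(concurrency)

    async def bounded(i: int) -> int:
        async with sem:  # blocks while `concurrency` tasks are running
            return await fake_request(i)

    # gather preserves submission order in its result list
    return await asyncio.gather(*(bounded(i) for i in range(n_requests)))


results = asyncio.run(run_all(32, 16))
print(len(results))  # 32
```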