Fix and Clean up chat-template requirement for VLM (#6114)

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
This commit is contained in:
XinyuanTong
2025-05-10 09:14:09 -07:00
committed by GitHub
parent c178abdabc
commit 9d8ec2e67e
16 changed files with 104 additions and 195 deletions

View File

@@ -2,16 +2,11 @@
These models accept multi-modal inputs (e.g., images and text) and generate text output. They augment language models with visual encoders and require a specific chat template for handling vision prompts.
```{important}
We need to specify `--chat-template` for VLMs because the chat template provided in HuggingFace tokenizer only supports text. If you do not specify a vision models `--chat-template`, the server uses HuggingFaces default template, which only supports text and the images wont be passed in.
```
## Example launch Command
```shell
python3 -m sglang.launch_server \
--model-path meta-llama/Llama-3.2-11B-Vision-Instruct \ # example HF/local path
--chat-template llama_3_vision \ # required chat template
--host 0.0.0.0 \
--port 30000 \
```