Fix and Clean up chat-template requirement for VLM (#6114)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
@@ -27,11 +27,7 @@
    "source": [
     "## Launch A Server\n",
     "\n",
-    "Launch the server in your terminal and wait for it to initialize.\n",
-    "\n",
-    "**Remember to add** `--chat-template` **for example** `--chat-template=qwen2-vl` **to specify the [vision chat template](https://docs.sglang.ai/backend/openai_api_vision.html#Chat-Template), otherwise, the server will only support text (images won’t be passed in), which can lead to degraded performance.**\n",
-    "\n",
-    "We need to specify `--chat-template` for vision language models because the chat template provided in Hugging Face tokenizer only supports text."
+    "Launch the server in your terminal and wait for it to initialize."
    ]
   },
   {
@@ -51,8 +47,7 @@
     "\n",
     "vision_process, port = launch_server_cmd(\n",
     "    \"\"\"\n",
-    "python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-7B-Instruct \\\n",
-    "    --chat-template=qwen2-vl\n",
+    "python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-7B-Instruct\n",
     "\"\"\"\n",
     ")\n",
     "\n",
@@ -250,27 +245,6 @@
    "source": [
     "terminate_process(vision_process)"
    ]
-   },
-   {
-    "cell_type": "markdown",
-    "metadata": {},
-    "source": [
-     "## Chat Template\n",
-     "\n",
-     "As mentioned before, if you do not specify a vision model's `--chat-template`, the server uses Hugging Face's default template, which only supports text.\n",
-     "\n",
-     "We list popular vision models with their chat templates:\n",
-     "\n",
-     "- [meta-llama/Llama-3.2-Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) uses `llama_3_vision`.\n",
-     "- [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) uses `qwen2-vl`.\n",
-     "- [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) uses `gemma-it`.\n",
-     "- [openbmb/MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V) uses `minicpmv`.\n",
-     "- [deepseek-ai/deepseek-vl2](https://huggingface.co/deepseek-ai/deepseek-vl2) uses `deepseek-vl2`.\n",
-     "- [LlaVA-OneVision](https://huggingface.co/lmms-lab/llava-onevision-qwen2-7b-ov) uses `chatml-llava`.\n",
-     "- [LLaVA-NeXT](https://huggingface.co/collections/lmms-lab/llava-next-6623288e2d61edba3ddbf5ff) uses `chatml-llava`.\n",
-     "- [Llama3-LLaVA-NeXT](https://huggingface.co/lmms-lab/llama3-llava-next-8b) uses `llava_llama_3`.\n",
-     "- [LLaVA-v1.5 / 1.6](https://huggingface.co/liuhaotian/llava-v1.6-34b) uses `vicuna_v1.1`."
-    ]
-   }
+   }
  ],
  "metadata": {
@@ -136,7 +136,7 @@ Detailed example in [openai compatible api](https://docs.sglang.ai/backend/opena
 Launch a server:
 
 ```bash
-python3 -m sglang.launch_server --model-path lmms-lab/llava-onevision-qwen2-7b-ov --chat-template chatml-llava
+python3 -m sglang.launch_server --model-path lmms-lab/llava-onevision-qwen2-7b-ov
 ```
 
 Download an image:
@@ -3,7 +3,7 @@
 SGLang provides robust support for embedding models by integrating efficient serving mechanisms with its flexible programming interface. This integration allows for streamlined handling of embedding tasks, facilitating faster and more accurate retrieval and semantic search operations. SGLang's architecture enables better resource utilization and reduced latency in embedding model deployment.
 
 ```{important}
-They are executed with `--is-embedding` and some may require `--trust-remote-code` and/or `--chat-template`
+They are executed with `--is-embedding` and some may require `--trust-remote-code`
 ```
 
 ## Example launch Command
@@ -13,7 +13,6 @@ python3 -m sglang.launch_server \
     --model-path Alibaba-NLP/gme-Qwen2-VL-2B-Instruct \ # example HF/local path
     --is-embedding \
     --host 0.0.0.0 \
-    --chat-template gme-qwen2-vl \ # set chat template
     --port 30000 \
 ```
 
@@ -2,16 +2,11 @@
 
 These models accept multi-modal inputs (e.g., images and text) and generate text output. They augment language models with visual encoders and require a specific chat template for handling vision prompts.
 
-```{important}
-We need to specify `--chat-template` for VLMs because the chat template provided in HuggingFace tokenizer only supports text. If you do not specify a vision model’s `--chat-template`, the server uses HuggingFace’s default template, which only supports text and the images won’t be passed in.
-```
-
 ## Example launch Command
 
 ```shell
 python3 -m sglang.launch_server \
   --model-path meta-llama/Llama-3.2-11B-Vision-Instruct \ # example HF/local path
-  --chat-template llama_3_vision \ # required chat template
   --host 0.0.0.0 \
   --port 30000 \
 ```