From c85cc045f893293e3b44e24d2e1f01ddc5849ea8 Mon Sep 17 00:00:00 2001
From: huqi
Date: Tue, 30 Dec 2025 16:09:07 +0800
Subject: [PATCH] Docs: Remove deprecated --task parameter for embedding models
 (#5257)

Fixes #3376

- Remove --task embed from vllm serve command in Qwen3_embedding.md
- Remove task='embed' parameter from LLM constructor in Python example

The --task parameter has been deprecated in recent vLLM versions in favor
of automatic model type detection.

- vLLM version: release/v0.13.0
- vLLM main: https://github.com/vllm-project/vllm/commit/ad32e3e19ccf0526cb6744a5fed09a138a5fb2f9

---------

Signed-off-by: hu-qi
---
 docs/source/tutorials/Qwen3_embedding.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/source/tutorials/Qwen3_embedding.md b/docs/source/tutorials/Qwen3_embedding.md
index 5aca58e8..667c1de6 100644
--- a/docs/source/tutorials/Qwen3_embedding.md
+++ b/docs/source/tutorials/Qwen3_embedding.md
@@ -30,7 +30,7 @@ Using the Qwen3-Embedding-8B model as an example, first run the docker container
 ### Online Inference
 
 ```bash
-vllm serve Qwen/Qwen3-Embedding-8B --task embed --host 127.0.0.1 --port 8888
+vllm serve Qwen/Qwen3-Embedding-8B --runner pooling --host 127.0.0.1 --port 8888
 ```
 
 Once your server is started, you can query the model with input prompts.
@@ -71,7 +71,6 @@ if __name__=="__main__":
     input_texts = queries + documents
 
     model = LLM(model="Qwen/Qwen3-Embedding-8B",
-                task="embed",
                 distributed_executor_backend="mp")
 
     outputs = model.embed(input_texts)