Docs: Remove deprecated --task parameter for embedding models (#5257)
Fixes #3376
- Replace --task embed with --runner pooling in the vllm serve command in Qwen3_embedding.md
- Remove task='embed' parameter from LLM constructor in Python example
The --task parameter has been deprecated in recent vLLM versions
in favor of automatic model type detection.
- vLLM version: release/v0.13.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: hu-qi <huqi1024@gmail.com>
@@ -30,7 +30,7 @@ Using the Qwen3-Embedding-8B model as an example, first run the docker container
 ### Online Inference
 
 ```bash
-vllm serve Qwen/Qwen3-Embedding-8B --task embed --host 127.0.0.1 --port 8888
+vllm serve Qwen/Qwen3-Embedding-8B --runner pooling --host 127.0.0.1 --port 8888
 ```
 
 Once your server is started, you can query the model with input prompts.
@@ -71,7 +71,6 @@ if __name__=="__main__":
 input_texts = queries + documents
 
 model = LLM(model="Qwen/Qwen3-Embedding-8B",
-task="embed",
 distributed_executor_backend="mp")
 
 outputs = model.embed(input_texts)
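
To check the updated `vllm serve` command end to end, the running server can be queried over HTTP. The following is a minimal sketch and is not part of this commit. It assumes the server exposes vLLM's OpenAI-compatible `/v1/embeddings` route on the host and port used in the diff; `build_embeddings_request` and `fetch_embeddings` are hypothetical helper names introduced here for illustration.

```python
import json
import urllib.request


def build_embeddings_request(texts, host="127.0.0.1", port=8888,
                             model="Qwen/Qwen3-Embedding-8B"):
    """Build a POST request for the server's OpenAI-compatible embeddings route.

    The defaults mirror the `vllm serve` command in the diff; adjust them if
    the server was started with different values.
    """
    payload = {"model": model, "input": list(texts)}
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embeddings(texts, **kwargs):
    """Send the request and return one embedding vector per input text."""
    request = build_embeddings_request(texts, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Sort by index so the vectors line up with the order of `texts`.
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

With the server from the first hunk running, `fetch_embeddings(["What is the capital of China?"])` should return a list containing a single embedding vector. Building the request separately from sending it keeps the payload easy to inspect without a live server.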