[Doc] Add qwen3 embedding 8b guide (#1734)

1. Add the tutorials for qwen3-embedding-8b
2. Remove VLLM_USE_V1=1 from the docs; it is no longer needed as of 0.9.2


- vLLM version: v0.9.2
- vLLM main: 5923ab9524

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
wangxiyuan
2025-07-11 17:40:17 +08:00
committed by GitHub
parent 9c560b009a
commit b5b7e0ecc7
12 changed files with 106 additions and 31 deletions


@@ -24,8 +24,6 @@ import os
from vllm import LLM
-os.environ["VLLM_USE_V1"] = "1"
model = LLM(model="Qwen/Qwen2-7B-Instruct")
outputs = model.generate("Hello, how are you?")
```
@@ -46,8 +44,6 @@ offline example:
import os
from vllm import LLM
-os.environ["VLLM_USE_V1"] = "1"
# TorchAirGraph currently works only without chunked-prefill
model = LLM(model="deepseek-ai/DeepSeek-R1-0528", additional_config={"torchair_graph_config": {"enabled": True},"ascend_scheduler_config": {"enabled": True,}})
outputs = model.generate("Hello, how are you?")
@@ -71,8 +67,6 @@ offline example:
import os
from vllm import LLM
-os.environ["VLLM_USE_V1"] = "1"
model = LLM(model="someother_model_weight", enforce_eager=True)
outputs = model.generate("Hello, how are you?")
```
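
To confirm that no documentation snippet still sets the removed variable, a quick scan along these lines may help. This is a minimal sketch and not part of the commit: the helper name `find_stale_env_lines` and the sample text are hypothetical, and it only flags lines that mention `VLLM_USE_V1` in either the Python or the shell form.

```python
import re

# Matches both the Python form (os.environ["VLLM_USE_V1"] = "1")
# and the shell form (export VLLM_USE_V1="1").
STALE_PATTERN = re.compile(r"\bVLLM_USE_V1\b")


def find_stale_env_lines(text):
    """Return (line_number, stripped_line) pairs that still mention VLLM_USE_V1."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if STALE_PATTERN.search(line)
    ]


# Hypothetical sample mixing stale and unrelated settings.
sample = '''import os
from vllm import LLM
os.environ["VLLM_USE_V1"] = "1"
export VLLM_USE_V1="1"
os.environ["VLLM_USE_MODELSCOPE"] = "True"
'''

for number, line in find_stale_env_lines(sample):
    print(f"line {number}: {line}")
```

Running the same function over each documentation file's contents would list any leftover references; an empty result suggests the cleanup is complete.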


@@ -40,7 +40,6 @@ The following is a simple example of how to use sleep mode.
from vllm.utils import GiB_bytes
-os.environ["VLLM_USE_V1"] = "1"
os.environ["VLLM_USE_MODELSCOPE"] = "True"
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
@@ -77,7 +76,6 @@ The following is a simple example of how to use sleep mode.
```bash
export VLLM_SERVER_DEV_MODE="1"
-export VLLM_USE_V1="1"
export VLLM_WORKER_MULTIPROC_METHOD="spawn"
export VLLM_USE_MODELSCOPE="True"