[Doc] Add qwen3 embedding 8b guide (#1734)

1. Add a tutorial for Qwen3-Embedding-8B.
2. Remove `export VLLM_USE_V1=1` from the docs; the variable is no longer needed as of vLLM v0.9.2.
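
For context, a minimal sketch of how the new embedding tutorial's serve command might look. The `--task embed` flag and the port are assumptions for illustration, not taken from this diff; as the change above notes, no `VLLM_USE_V1` export is required on v0.9.2+.

```bash
# Hypothetical serve command for the new embedding guide; flags are assumptions.
# Note: "export VLLM_USE_V1=1" is intentionally absent -- it has no effect since v0.9.2.
vllm serve Qwen/Qwen3-Embedding-8B \
    --task embed \
    --tensor-parallel-size 2 \
    --port 8000
```

Once up, the server exposes an OpenAI-compatible `/v1/embeddings` endpoint that clients can query.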


- vLLM version: v0.9.2
- vLLM main: 5923ab9524

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Author: wangxiyuan
Date: 2025-07-11 17:40:17 +08:00
Committed by: GitHub
Parent: 9c560b009a
Commit: b5b7e0ecc7
12 changed files with 106 additions and 31 deletions
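
The tutorial's client-side script is not shown in the hunks below, but an embedding guide typically ends by comparing the returned vectors. As a small self-contained illustration (plain Python, no server required; the toy vectors stand in for real embedding outputs), cosine similarity can be computed as:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding vectors from the server.
print(round(cosine_similarity([1.0, 0.0], [1.0, 0.0]), 3))  # identical -> 1.0
print(round(cosine_similarity([1.0, 0.0], [0.0, 1.0]), 3))  # orthogonal -> 0.0
```

In practice the vectors would come from the server's `/v1/embeddings` response rather than being hard-coded.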


@@ -60,7 +60,6 @@ Run the following command to start the vLLM server:
 ```{code-block} bash
 :substitutions:
-export VLLM_USE_V1=1
 vllm serve Qwen/Qwen3-0.6B \
 --tensor-parallel-size 1 \
 --enforce-eager \
@@ -90,7 +89,6 @@ Run the following command to start the vLLM server:
 ```{code-block} bash
 :substitutions:
-export VLLM_USE_V1=1
 vllm serve Qwen/Qwen2.5-7B-Instruct \
 --tensor-parallel-size 2 \
 --enforce-eager \
@@ -129,7 +127,7 @@ Run the following command to start the vLLM server:
 ```{code-block} bash
 :substitutions:
-VLLM_USE_V1=1 vllm serve /home/pangu-pro-moe-mode/ \
+vllm serve /home/pangu-pro-moe-mode/ \
 --tensor-parallel-size 4 \
 --enable-expert-parallel \
 --dtype "float16" \
@@ -321,7 +319,7 @@ if __name__ == "__main__":
 Run script:
 ```bash
-VLLM_USE_V1=1 python example.py
+python example.py
 ```
 If you run this script successfully, you can see the info shown below: