[Doc] Add qwen3 embedding 8b guide (#1734)

1. Add the tutorials for qwen3-embedding-8b.
2. Remove VLLM_USE_V1=1 from the docs; it is no longer needed as of 0.9.2.

- vLLM version: v0.9.2
- vLLM main: 5923ab9524

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-11 17:40:17 +08:00
parent 9c560b009a
commit b5b7e0ecc7
12 changed files with 106 additions and 31 deletions
--- a/docs/source/tutorials/multi_npu_qwen3_moe.md
+++ b/docs/source/tutorials/multi_npu_qwen3_moe.md
@@ -35,9 +35,6 @@ export VLLM_USE_MODELSCOPE=True

 # Set `max_split_size_mb` to reduce memory fragmentation and avoid out of memory
 export PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256
-
-# For vllm-ascend 0.9.2+, the V1 engine is enabled by default and no longer needs to be explicitly specified.
-export VLLM_USE_V1=1
 ```

 ### Online Inference on Multi-NPU