[doc]update --max-num-seqs in Qwen3-235b tutorial (#6197)

### What this PR does / why we need it?
This PR updates `--max-num-seqs` in the Qwen3-235B single-node deployment
tutorial to ensure the model correctly runs in graph mode.

- vLLM version: v0.14.0
- vLLM main:
d68209402d

Signed-off-by: Angazenn <supperccell@163.com>
Author: Angazenn
Date: 2026-01-23 17:11:10 +08:00
Committed by: GitHub
Parent: af4dbb6b26
Commit: 1e116829ac


@@ -112,7 +112,7 @@ vllm serve vllm-ascend/Qwen3-235B-A22B-w8a8 \
     --seed 1024 \
     --quantization ascend \
     --served-model-name qwen3 \
-    --max-num-seqs 4 \
+    --max-num-seqs 32 \
     --max-model-len 133000 \
     --max-num-batched-tokens 8096 \
     --enable-expert-parallel \
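
For context, the updated hunk corresponds to a serve invocation roughly like the sketch below. Only the flags visible in the diff are confirmed by this change; the trailing flags of the tutorial's full command are cut off by the hunk and are not reproduced here.

```shell
# Sketch of the updated single-node serve command from the tutorial.
# Only the flags shown in the diff hunk above are confirmed; the original
# command continues beyond --enable-expert-parallel (elided here).
vllm serve vllm-ascend/Qwen3-235B-A22B-w8a8 \
    --seed 1024 \
    --quantization ascend \
    --served-model-name qwen3 \
    --max-num-seqs 32 \
    --max-model-len 133000 \
    --max-num-batched-tokens 8096 \
    --enable-expert-parallel
```

Raising `--max-num-seqs` from 4 to 32 increases the maximum number of concurrent sequences per batch, which per the PR description is required for the deployment to enter graph mode correctly.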