[doc]update --max-num-seqs in Qwen3-235b tutorial (#6197)
### What this PR does / why we need it?
This PR updates --max-num-seqs in the Qwen3-235B single-node deployment
tutorial so that the deployment enters graph mode correctly.
- vLLM version: v0.14.0
- vLLM main: d68209402d
Signed-off-by: Angazenn <supperccell@163.com>
@@ -112,7 +112,7 @@ vllm serve vllm-ascend/Qwen3-235B-A22B-w8a8 \
   --seed 1024 \
   --quantization ascend \
   --served-model-name qwen3 \
-  --max-num-seqs 4 \
+  --max-num-seqs 32 \
   --max-model-len 133000 \
   --max-num-batched-tokens 8096 \
   --enable-expert-parallel \