[doc][main] Correct mistakes in doc (#4945)

### What this PR does / why we need it?
Corrects mistakes in the documentation.

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
Authored by lilinsiman on 2025-12-12 19:17:10 +08:00; committed by GitHub
parent f708d919f8
commit fc818f1509
9 changed files with 18 additions and 28 deletions


@@ -70,13 +70,13 @@ msgstr "运行 docker 容器,在单个 NPU 上启动 vLLM 服务器:"
 #: ../../tutorials/single_npu_multimodal.md:154
 msgid ""
 "Add `--max_model_len` option to avoid ValueError that the "
-"Qwen2.5-VL-7B-Instruct model's max seq len (128000) is larger than the "
+"Qwen2.5-VL-7B-Instruct model's max_model_len (128000) is larger than the "
 "maximum number of tokens that can be stored in KV cache. This will differ "
 "with different NPU series base on the HBM size. Please modify the value "
 "according to a suitable value for your NPU series."
 msgstr ""
 "新增 `--max_model_len` 选项,以避免出现 ValueError,即 Qwen2.5-VL-7B-Instruct "
-"模型的最大序列长度(128000)大于 KV 缓存可存储的最大 token 数。该数值会根据不同 NPU 系列的 HBM 大小而不同。请根据你的 NPU"
+"模型的最大模型长度(128000)大于 KV 缓存可存储的最大 token 数。该数值会根据不同 NPU 系列的 HBM 大小而不同。请根据你的 NPU"
 " 系列,将该值设置为合适的数值。"

 #: ../../tutorials/single_npu_multimodal.md:157
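For context, the doc passage being corrected here recommends passing `--max_model_len` when serving Qwen2.5-VL-7B-Instruct so the requested context length fits in the KV cache. A minimal sketch of such a launch command (the value 32768 is illustrative, not from this PR; choose a value that fits your NPU's HBM):

```shell
# Sketch only: cap the model's context length so the KV cache fits in
# device memory. 32768 is an example value; tune it for your NPU series.
vllm serve Qwen/Qwen2.5-VL-7B-Instruct \
    --max-model-len 32768
```

With a larger HBM you can raise (or drop) the cap; without it, vLLM falls back to the model's configured maximum (128000 here), which may exceed what the KV cache can hold and raise the ValueError described above.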