[doc][main] Correct mistakes in doc (#4945)
### What this PR does / why we need it?
Correct a mistake in the docs: the Qwen2.5-VL-7B-Instruct tutorial note about `--max_model_len` referred to the model's "max seq len" where it should say `max_model_len`; the English source string and the zh_CN translation are updated accordingly.
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: lilinsiman <lilinsiman@gmail.com>
@@ -70,13 +70,13 @@ msgstr "运行 docker 容器,在单个 NPU 上启动 vLLM 服务器:"
 #: ../../tutorials/single_npu_multimodal.md:154
 msgid ""
 "Add `--max_model_len` option to avoid ValueError that the "
-"Qwen2.5-VL-7B-Instruct model's max seq len (128000) is larger than the "
+"Qwen2.5-VL-7B-Instruct model's max_model_len (128000) is larger than the "
 "maximum number of tokens that can be stored in KV cache. This will differ "
 "with different NPU series base on the HBM size. Please modify the value "
 "according to a suitable value for your NPU series."
 msgstr ""
 "新增 `--max_model_len` 选项,以避免出现 ValueError,即 Qwen2.5-VL-7B-Instruct "
-"模型的最大序列长度(128000)大于 KV 缓存可存储的最大 token 数。该数值会根据不同 NPU 系列的 HBM 大小而不同。请根据你的 NPU"
+"模型的最大模型长度(128000)大于 KV 缓存可存储的最大 token 数。该数值会根据不同 NPU 系列的 HBM 大小而不同。请根据你的 NPU"
 " 系列,将该值设置为合适的数值。"

 #: ../../tutorials/single_npu_multimodal.md:157
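For context, the corrected string refers to capping vLLM's `max_model_len` so the KV cache fits in the NPU's HBM. A minimal sketch using vLLM's offline `LLM` API is shown below; the cap of 16384 is only an illustrative value, not a recommendation for any particular NPU series.

```python
from vllm import LLM

# Cap the context length below the model's default max_model_len
# (128000 tokens) so the KV cache fits within the NPU's HBM.
# 16384 is an illustrative value; pick one suited to your NPU series.
llm = LLM(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    max_model_len=16384,
)
```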