Files
xc-llm-ascend/docs/source/locale/zh_CN/LC_MESSAGES/faqs.po
herizhen ff76c6780e [releases/v0.18.0][Doc][Misc] Modifying Configuration Parameters (#8618)
### What this PR does / why we need it?
This PR renames the environment variable VLLM_NIXL_ABORT_REQUEST_TIMEOUT
to VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT to align with the Mooncake
connector naming convention. It also updates the documentation and test
configurations to reflect this change and adjusts the suggested timeout
value in the documentation to 480 seconds for consistency.

### Does this PR introduce _any_ user-facing change?
Yes. The environment variable for configuring the abort request timeout
has been renamed. Users should update their environment settings from
VLLM_NIXL_ABORT_REQUEST_TIMEOUT to VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT.
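
A minimal sketch of the migration for existing deployments (shell shown for illustration only; how and where you set the variable depends on your own launch scripts):

```bash
# Old variable name (renamed by this PR), remove it from your environment:
unset VLLM_NIXL_ABORT_REQUEST_TIMEOUT
# New Mooncake-aligned name; 480 seconds is the value suggested in the updated docs:
export VLLM_MOONCAKE_ABORT_REQUEST_TIMEOUT=480
```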

### How was this patch tested?
The changes were verified by updating the corresponding test
configuration files and ensuring consistency across the documentation.

---------

Signed-off-by: herizhen <1270637059@qq.com>
Signed-off-by: herizhen <59841270+herizhen@users.noreply.github.com>
2026-04-23 16:23:31 +08:00

# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.18.0\n"
#: ../../source/faqs.md:1
msgid "FAQs"
msgstr "常见问题解答"
#: ../../source/faqs.md:3
msgid "Version Specific FAQs"
msgstr "版本特定常见问题"
#: ../../source/faqs.md:5
msgid ""
"[[v0.17.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-"
"ascend/issues/7173)"
msgstr ""
"[[v0.17.0rc1] 常见问题与反馈](https://github.com/vllm-project/vllm-"
"ascend/issues/7173)"
#: ../../source/faqs.md:6
msgid ""
"[[v0.13.0] FAQ & Feedback](https://github.com/vllm-project/vllm-"
"ascend/issues/6583)"
msgstr ""
"[[v0.13.0] 常见问题与反馈](https://github.com/vllm-project/vllm-"
"ascend/issues/6583)"
#: ../../source/faqs.md:8
msgid "General FAQs"
msgstr "通用常见问题"
#: ../../source/faqs.md:10
msgid "1. What devices are currently supported?"
msgstr "1.目前支持哪些设备?"
#: ../../source/faqs.md:12
msgid ""
"Currently, **ONLY** Atlas A2 series (Ascend-cann-kernels-910b), Atlas A3 "
"series (Atlas-A3-cann-kernels) and Atlas 300I (Ascend-cann-kernels-310p) "
"series are supported:"
msgstr ""
"目前,**仅**支持 Atlas A2 系列Ascend-cann-kernels-910b、Atlas A3 系列Atlas-A3"
"-cann-kernels和 Atlas 300IAscend-cann-kernels-310p系列"
#: ../../source/faqs.md:14
msgid ""
"Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 "
"Box16, Atlas 300T A2)"
msgstr ""
"Atlas A2 训练系列Atlas 800T A2、Atlas 900 A2 PoD、Atlas 200T A2 Box16、Atlas "
"300T A2"
#: ../../source/faqs.md:15
msgid "Atlas 800I A2 Inference series (Atlas 800I A2)"
msgstr "Atlas 800I A2 推理系列Atlas 800I A2"
#: ../../source/faqs.md:16
msgid ""
"Atlas A3 Training series (Atlas 800T A3, Atlas 900 A3 SuperPoD, Atlas "
"9000 A3 SuperPoD)"
msgstr "Atlas A3 训练系列Atlas 800T A3、Atlas 900 A3 SuperPoD、Atlas 9000 A3 SuperPoD"
#: ../../source/faqs.md:17
msgid "Atlas 800I A3 Inference series (Atlas 800I A3)"
msgstr "Atlas 800I A3 推理系列Atlas 800I A3"
#: ../../source/faqs.md:18
msgid "[Experimental] Atlas 300I Inference series (Atlas 300I Duo)."
msgstr "[实验性] Atlas 300I 推理系列Atlas 300I Duo。"
#: ../../source/faqs.md:19
msgid ""
"[Experimental] Currently for 310I Duo the stable version is vllm-ascend "
"v0.10.0rc1."
msgstr "[实验性] 目前对于 310I Duo稳定版本是 vllm-ascend v0.10.0rc1。"
#: ../../source/faqs.md:21
msgid "Below series are NOT supported yet:"
msgstr "以下系列目前尚不支持:"
#: ../../source/faqs.md:23
msgid "Atlas 200I A2 (Ascend-cann-kernels-310b) unplanned yet"
msgstr "Atlas 200I A2Ascend-cann-kernels-310b尚未计划支持"
#: ../../source/faqs.md:24
msgid "Ascend 910, Ascend 910 Pro B (Ascend-cann-kernels-910) unplanned yet"
msgstr "Ascend 910、Ascend 910 Pro BAscend-cann-kernels-910尚未计划支持"
#: ../../source/faqs.md:26
msgid ""
"From a technical view, vllm-ascend supports devices if torch-npu is "
"supported. Otherwise, we have to implement it by using custom ops. We "
"also welcome you to join us to improve together."
msgstr ""
"从技术角度看,如果 torch-npu 支持某设备,则 vllm-ascend "
"也支持该设备。否则,我们需要通过自定义算子来实现。我们也欢迎您加入我们,共同改进。"
#: ../../source/faqs.md:28
msgid "2. How to get our docker containers?"
msgstr "2.如何获取我们的 Docker 容器?"
#: ../../source/faqs.md:30
msgid ""
"You can get our containers at `Quay.io`, e.g., [<u>vllm-"
"ascend</u>](https://quay.io/repository/ascend/vllm-ascend?tab=tags) and "
"[<u>cann</u>](https://quay.io/repository/ascend/cann?tab=tags)."
msgstr ""
"您可以在 `Quay.io` 获取我们的容器,例如:[<u>vllm-"
"ascend</u>](https://quay.io/repository/ascend/vllm-ascend?tab=tags) 和 "
"[<u>cann</u>](https://quay.io/repository/ascend/cann?tab=tags)。"
#: ../../source/faqs.md:32
msgid ""
"If you are in China, you can use `daocloud` or some other mirror sites to"
" accelerate your downloading:"
msgstr "如果您在中国,可以使用 `daocloud` 或其他镜像站点来加速下载:"
#: ../../source/faqs.md:42
msgid "Load Docker Images for offline environment"
msgstr "为离线环境加载 Docker 镜像"
#: ../../source/faqs.md:44
msgid ""
"If you want to use container image for offline environments (no internet "
"connection), you need to download container image in an environment with "
"internet access:"
msgstr "如果您想在离线环境(无互联网连接)中使用容器镜像,您需要在有互联网访问权限的环境中下载容器镜像:"
#: ../../source/faqs.md:46
msgid "**Exporting Docker images:**"
msgstr "**导出 Docker 镜像:**"
#: ../../source/faqs.md:58
msgid "**Importing Docker images in environment without internet access:**"
msgstr "**在无互联网访问权限的环境中导入 Docker 镜像:**"
#: ../../source/faqs.md:70
msgid "3. What models does vllm-ascend supports?"
msgstr "3.vllm-ascend 支持哪些模型?"
#: ../../source/faqs.md:72
msgid ""
"Find more details "
"[<u>here</u>](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_models.html)."
msgstr "更多详细信息请参见[<u>此处</u>](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_models.html)。"
#: ../../source/faqs.md:74
msgid "4. How to get in touch with our community?"
msgstr "4.如何与我们的社区取得联系?"
#: ../../source/faqs.md:76
msgid ""
"There are many channels that you can communicate with our community "
"developers / users:"
msgstr "您可以通过多种渠道与我们的社区开发者/用户进行交流:"
#: ../../source/faqs.md:78
msgid ""
"Submit a GitHub [<u>issue</u>](https://github.com/vllm-project/vllm-"
"ascend/issues?page=1)."
msgstr ""
"提交一个 GitHub [<u>issue</u>](https://github.com/vllm-project/vllm-"
"ascend/issues?page=1)。"
#: ../../source/faqs.md:79
msgid ""
"Join our [<u>weekly "
"meeting</u>](https://docs.google.com/document/d/1hCSzRTMZhIB8vRq1_qOOjx4c9uYUxvdQvDsMV2JcSrw/edit?tab=t.0#heading=h.911qu8j8h35z)"
" and share your ideas."
msgstr "参加我们的[<u>每周例会</u>](https://docs.google.com/document/d/1hCSzRTMZhIB8vRq1_qOOjx4c9uYUxvdQvDsMV2JcSrw/edit?tab=t.0#heading=h.911qu8j8h35z)并分享您的想法。"
#: ../../source/faqs.md:80
msgid ""
"Join our [<u>WeChat</u>](https://github.com/vllm-project/vllm-"
"ascend/issues/227) group and ask your questions."
msgstr ""
"加入我们的[<u>微信群</u>](https://github.com/vllm-project/vllm-"
"ascend/issues/227)并提出您的问题。"
#: ../../source/faqs.md:81
msgid ""
"Join our ascend channel in [<u>vLLM forums</u>](https://discuss.vllm.ai/c"
"/hardware-support/vllm-ascend-support/6) and publish your topics."
msgstr ""
"加入我们在 [<u>vLLM 论坛</u>](https://discuss.vllm.ai/c/hardware-support/vllm-"
"ascend-support/6) 的 ascend 频道并发布您的主题。"
#: ../../source/faqs.md:83
msgid "5. What features does vllm-ascend V1 supports?"
msgstr "5.vllm-ascend V1 支持哪些功能?"
#: ../../source/faqs.md:85
msgid ""
"Find more details "
"[<u>here</u>](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html)."
msgstr "更多详细信息请参见[<u>此处</u>](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html)。"
#: ../../source/faqs.md:87
msgid ""
"6. How to solve the problem of \"Failed to infer device type\" or "
"\"libatb.so: cannot open shared object file\"?"
msgstr "6.如何解决“无法推断设备类型”或“libatb.so无法打开共享对象文件”的问题"
#: ../../source/faqs.md:89
msgid ""
"Basically, the reason is that the NPU environment is not configured "
"correctly. You can:"
msgstr "基本上,原因是 NPU 环境未正确配置。您可以:"
#: ../../source/faqs.md:91
msgid "try `source /usr/local/Ascend/nnal/atb/set_env.sh` to enable NNAL package."
msgstr "尝试运行 `source /usr/local/Ascend/nnal/atb/set_env.sh` 以启用 NNAL 包。"
#: ../../source/faqs.md:92
msgid ""
"try `source /usr/local/Ascend/ascend-toolkit/set_env.sh` to enable CANN "
"package."
msgstr "尝试运行 `source /usr/local/Ascend/ascend-toolkit/set_env.sh` 以启用 CANN 包。"
#: ../../source/faqs.md:93
msgid "try `npu-smi info` to check whether the NPU is working."
msgstr "尝试运行 `npu-smi info` 来检查 NPU 是否正常工作。"
#: ../../source/faqs.md:95
msgid ""
"If the above steps are not working, you can try the following code in "
"Python to check whether there are any errors:"
msgstr "如果上述步骤无效,您可以在 Python 中尝试以下代码来检查是否有任何错误:"
#: ../../source/faqs.md:103
msgid "If all above steps are not working, feel free to submit a GitHub issue."
msgstr "如果以上所有步骤都无法解决问题,请随时提交一个 GitHub issue。"
#: ../../source/faqs.md:105
msgid "7. How vllm-ascend work with vLLM?"
msgstr "7.vllm-ascend 如何与 vLLM 协同工作?"
#: ../../source/faqs.md:107
msgid ""
"`vllm-ascend` is a hardware plugin for vLLM. The version of `vllm-ascend`"
" is the same as the version of `vllm`. For example, if you use `vllm` "
"0.9.1, you should use vllm-ascend 0.9.1 as well. For the main branch, we "
"ensure that `vllm-ascend` and `vllm` are compatible at every commit."
msgstr ""
"`vllm-ascend` 是 vLLM 的一个硬件插件。`vllm-ascend` 的版本与 `vllm` 的版本相同。例如,如果您使用 "
"`vllm` 0.9.1,您也应该使用 vllm-ascend 0.9.1。对于主分支,我们确保 `vllm-ascend` 和 `vllm` "
"在每次提交时都是兼容的。"
#: ../../source/faqs.md:109
msgid "8. Does vllm-ascend support Prefill Disaggregation feature?"
msgstr "8.vllm-ascend 是否支持 Prefill Disaggregation 功能?"
#: ../../source/faqs.md:111
msgid ""
"Yes, vllm-ascend supports Prefill Disaggregation feature with Mooncake "
"backend. See the [official "
"tutorial](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html)"
" for example."
msgstr ""
"是的vllm-ascend 支持通过 Mooncake 后端实现 Prefill Disaggregation "
"功能。示例请参见[官方教程](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html)。"
#: ../../source/faqs.md:113
msgid "9. Does vllm-ascend support quantization method?"
msgstr "9.vllm-ascend 是否支持量化方法?"
#: ../../source/faqs.md:115
msgid ""
"Currently, w8a8, w4a8, and w4a4 quantization methods are already "
"supported by vllm-ascend."
msgstr "目前vllm-ascend 已支持 w8a8、w4a8 和 w4a4 量化方法。"
#: ../../source/faqs.md:117
msgid "10. How is vllm-ascend tested?"
msgstr "10.vllm-ascend 是如何测试的?"
#: ../../source/faqs.md:119
msgid ""
"vllm-ascend is tested in three aspects: functions, performance, and "
"accuracy."
msgstr "vllm-ascend 在三个方面进行测试:功能、性能和精度。"
#: ../../source/faqs.md:121
msgid ""
"**Functional test**: We added CI, including part of vllm's native unit "
"tests and vllm-ascend's own unit tests. In vllm-ascend's tests, we test "
"basic functionalities, popular model availability, and [supported "
"features](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html)"
" through E2E test."
msgstr ""
"**功能测试**:我们添加了 CI包括部分 vllm 的原生单元测试和 vllm-ascend 自身的单元测试。在 vllm-ascend "
"的测试中,我们通过端到端测试来验证基本功能、主流模型的可用性以及[支持的功能](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html)。"
#: ../../source/faqs.md:123
msgid ""
"**Performance test**: We provide [benchmark](https://github.com/vllm-"
"project/vllm-ascend/tree/main/benchmarks) tools for E2E performance "
"benchmark, which can be easily re-run locally. We will publish a perf "
"website to show the performance test results for each pull request."
msgstr ""
"**性能测试**:我们提供了用于端到端性能基准测试的[基准测试](https://github.com/vllm-project/vllm-"
"ascend/tree/main/benchmarks)工具,可以方便地在本地重新运行。我们将发布一个性能网站,展示每个拉取请求的性能测试结果。"
#: ../../source/faqs.md:125
msgid ""
"**Accuracy test**: We are working on adding accuracy test to the CI as "
"well."
msgstr "**准确性测试**:我们正在努力将准确性测试也添加到 CI 中。"
#: ../../source/faqs.md:127
msgid ""
"**Nightly test**: we'll run full test every night to make sure the code "
"is working."
msgstr "**夜间测试**:我们将每晚运行完整测试,以确保代码正常工作。"
#: ../../source/faqs.md:129
msgid ""
"For each release, we'll publish the performance test and accuracy test "
"report in the future."
msgstr "对于每个版本,我们未来都将发布性能测试和准确性测试报告。"
#: ../../source/faqs.md:131
msgid "11. How to fix the error \"InvalidVersion\" when using vllm-ascend?"
msgstr "11.使用 vllm-ascend 时如何修复 \"InvalidVersion\" 错误?"
#: ../../source/faqs.md:133
msgid ""
"The problem is usually caused by the installation of a development or "
"editable version of the vLLM package. In this case, we provide the "
"environment variable `VLLM_VERSION` to let users specify the version of "
"vLLM package to use. Please set the environment variable `VLLM_VERSION` "
"to the version of the vLLM package you have installed. The format of "
"`VLLM_VERSION` should be `X.Y.Z`."
msgstr ""
"此问题通常是由于安装了开发版或可编辑版本的 vLLM 包引起的。为此,我们提供了环境变量 `VLLM_VERSION`,允许用户指定要使用的 "
"vLLM 包版本。请将环境变量 `VLLM_VERSION` 设置为你已安装的 vLLM 包的版本。`VLLM_VERSION` 的格式应为 "
"`X.Y.Z`。"
#: ../../source/faqs.md:135
msgid "12. How to handle the out-of-memory issue?"
msgstr "12.如何处理内存不足问题?"
#: ../../source/faqs.md:137
msgid ""
"OOM errors typically occur when the model exceeds the memory capacity of "
"a single NPU. For general guidance, you can refer to [vLLM OOM "
"troubleshooting "
"documentation](https://docs.vllm.ai/en/latest/usage/troubleshooting/#out-"
"of-memory)."
msgstr ""
"当模型超出单个 NPU 的内存容量时,通常会发生 OOM内存不足错误。一般性指导可参考 [vLLM OOM "
"故障排除文档](https://docs.vllm.ai/en/latest/usage/troubleshooting/#out-of-"
"memory)。"
#: ../../source/faqs.md:139
msgid ""
"In scenarios where NPUs have limited high bandwidth memory (on-chip "
"memory) capacity, dynamic memory allocation/deallocation during inference"
" can exacerbate memory fragmentation, leading to OOM. To address this:"
msgstr "在 NPU 的高带宽内存(片上内存)容量有限的场景下,推理过程中的动态内存分配/释放会加剧内存碎片,导致 OOM。为解决此问题"
#: ../../source/faqs.md:141
msgid ""
"**Limit `--max-model-len`**: It can save the on-chip memory usage for KV "
"cache initialization step."
msgstr "**限制 `--max-model-len`**:它可以节省 KV 缓存初始化步骤的片上内存使用量。"
#: ../../source/faqs.md:143
msgid ""
"**Adjust `--gpu-memory-utilization`**: If unspecified, the default value "
"is `0.9`. You can decrease this value to reserve more memory to reduce "
"fragmentation risks. See details in: [vLLM - Inference and Serving - "
"Engine Arguments](https://docs.vllm.ai/en/latest/cli/serve/#-gpu-memory-"
"utilization)."
msgstr ""
"**调整 `--gpu-memory-utilization`**:如果未指定,默认值为 "
"`0.9`。你可以降低此值以预留更多内存,从而减少碎片风险。详情参见:[vLLM - 推理与服务 - "
"引擎参数](https://docs.vllm.ai/en/latest/cli/serve/#-gpu-memory-utilization)。"
#: ../../source/faqs.md:145
msgid ""
"**Configure `PYTORCH_NPU_ALLOC_CONF`**: Set this environment variable to "
"optimize NPU memory management. For example, you can use `export "
"PYTORCH_NPU_ALLOC_CONF=expandable_segments:True` to enable virtual memory"
" feature to mitigate memory fragmentation caused by frequent dynamic "
"memory size adjustments during runtime. See details in "
"[PYTORCH_NPU_ALLOC_CONF](https://www.hiascend.com/document/detail/zh/Pytorch/700/comref/Envvariables/Envir_012.html)."
msgstr ""
"**配置 `PYTORCH_NPU_ALLOC_CONF`**:设置此环境变量以优化 NPU 内存管理。例如,你可以使用 `export "
"PYTORCH_NPU_ALLOC_CONF=expandable_segments:True` "
"来启用虚拟内存功能,以缓解运行时频繁动态调整内存大小导致的内存碎片问题。详情参见:[PYTORCH_NPU_ALLOC_CONF](https://www.hiascend.com/document/detail/zh/Pytorch/700/comref/Envvariables/Envir_012.html)。"
#: ../../source/faqs.md:147
msgid "13. Failed to enable NPU graph mode when running DeepSeek"
msgstr "13.运行 DeepSeek 时无法启用 NPU 图模式"
#: ../../source/faqs.md:149
msgid ""
"Enabling NPU graph mode for DeepSeek may trigger an error. This is "
"because when both MLA (Multi-Head Latent Attention) and NPU graph mode "
"are active, the number of queries per KV head must be 32, 64, or 128. "
"However, DeepSeek-V2-Lite has only 16 attention heads, which results in "
"16 queries per KV—a value outside the supported range. Support for NPU "
"graph mode on DeepSeek-V2-Lite will be added in a future update."
msgstr ""
"为 DeepSeek 启用 NPU 图模式可能会触发错误。这是因为当 MLA多头潜在注意力和 NPU 图模式同时激活时,每个 KV 头的查询数必须为 "
"32、64 或 128。然而DeepSeek-V2-Lite 只有 16 个注意力头,导致每个 KV 有 16 个查询,该值超出了支持范围。对 "
"DeepSeek-V2-Lite 的 NPU 图模式支持将在未来的更新中添加。"
#: ../../source/faqs.md:151
#, python-brace-format
msgid ""
"And if you're using DeepSeek-V3 or DeepSeek-R1, please make sure after "
"the tensor parallel split, `num_heads`/`num_kv_heads` is {32, 64, 128}."
msgstr ""
"如果你正在使用 DeepSeek-V3 或 DeepSeek-R1请确保在张量并行切分后`num_heads`/`num_kv_heads` "
"的值为 {32, 64, 128} 中的一个。"
#: ../../source/faqs.md:158
msgid ""
"14. Failed to reinstall vllm-ascend from source after uninstalling vllm-"
"ascend"
msgstr "14.卸载 vllm-ascend 后无法从源码重新安装 vllm-ascend"
#: ../../source/faqs.md:160
msgid ""
"You may encounter the problem of C/C++ compilation failure when "
"reinstalling vllm-ascend from source using pip. If the installation "
"fails, use `python setup.py install` (recommended) to install, or use "
"`python setup.py clean` to clear the cache."
msgstr ""
"使用 pip 从源码重新安装 vllm-ascend 时,可能会遇到 C/C++ 编译失败的问题。如果安装失败,请使用 `python "
"setup.py install`(推荐)进行安装,或使用 `python setup.py clean` 清除缓存。"
#: ../../source/faqs.md:162
msgid "15. How to generate deterministic results when using vllm-ascend?"
msgstr "15.使用 vllm-ascend 时如何生成确定性结果?"
#: ../../source/faqs.md:164
msgid "There are several factors that affect output determinism:"
msgstr "有几个因素会影响输出的确定性:"
#: ../../source/faqs.md:166
msgid ""
"Sampler method: using **greedy sampling** by setting `temperature=0` in "
"`SamplingParams`, e.g.:"
msgstr "采样方法:通过在 `SamplingParams` 中设置 `temperature=0` 来使用 **贪婪采样**,例如:"
#: ../../source/faqs.md:191
msgid "Set the following environment parameters:"
msgstr "设置以下环境参数:"
#: ../../source/faqs.md:200
msgid ""
"16. How to fix the error \"ImportError: Please install vllm[audio] for "
"audio support\" for the Qwen2.5-Omni model"
msgstr ""
"16.对于 Qwen2.5-Omni 模型,如何修复 \"ImportError: Please install vllm[audio] for"
" audio support\" 错误?"
#: ../../source/faqs.md:202
msgid ""
"The `Qwen2.5-Omni` model requires the `librosa` package to be installed, "
"you need to install the `qwen-omni-utils` package to ensure all "
"dependencies are met, run `pip install qwen-omni-utils`. This package "
"will install `librosa` and its related dependencies, resolving the "
"`ImportError: No module named 'librosa'` issue and ensuring that the "
"audio processing functionality works correctly."
msgstr ""
"`Qwen2.5-Omni` 模型需要安装 `librosa` 包,你需要安装 `qwen-omni-utils` 包以确保满足所有依赖,运行 "
"`pip install qwen-omni-utils`。此包将安装 `librosa` 及其相关依赖,解决 `ImportError: No "
"module named 'librosa'` 问题,并确保音频处理功能正常工作。"
#: ../../source/faqs.md:205
msgid ""
"17. How to troubleshoot and resolve size capture failures resulting from "
"stream resource exhaustion, and what are the underlying causes?"
msgstr "17.如何排查和解决因流资源耗尽导致的尺寸捕获失败,其根本原因是什么?"
#: ../../source/faqs.md:213
msgid "Recommended mitigation strategies:"
msgstr "推荐的缓解策略:"
#: ../../source/faqs.md:215
#, python-brace-format
msgid ""
"Manually configure the compilation_config parameter with a reduced size "
"set: '{\"cudagraph_capture_sizes\":[size1, size2, size3, ...]}'."
msgstr ""
"手动配置 compilation_config "
"参数,使用缩减后的尺寸集合:'{\"cudagraph_capture_sizes\":[size1, size2, size3, ...]}'。"
#: ../../source/faqs.md:216
msgid ""
"Employ ACLgraph's full graph mode as an alternative to the piecewise "
"approach."
msgstr "采用 ACLgraph 的全图模式作为分段方法的替代方案。"
#: ../../source/faqs.md:218
msgid ""
"Root cause analysis: The current stream requirement calculation for size "
"captures only accounts for measurable factors including: data parallel "
"size, tensor parallel size, expert parallel configuration, piece graph "
"count, multistream-overlap shared expert settings, and HCCL communication"
" mode (AIV/AICPU). However, numerous unquantifiable elements, such as "
"operator characteristics and specific hardware features, consume "
"additional streams outside of this calculation framework, resulting in "
"stream resource exhaustion during size capture operations."
msgstr ""
"根本原因分析:当前尺寸捕获的流需求计算仅考虑了可测量的因素,包括:数据并行大小、张量并行大小、专家并行配置、分段图数量、多流重叠共享专家设置以及 "
"HCCL "
"通信模式AIV/AICPU。然而许多不可量化的元素例如算子特性和特定硬件特性在此计算框架之外消耗了额外的流导致尺寸捕获操作期间流资源耗尽。"
#: ../../source/faqs.md:221
msgid "18. How to install custom version of torch_npu?"
msgstr "18.如何安装自定义版本的 torch_npu"
#: ../../source/faqs.md:223
msgid ""
"torch-npu will be overridden when installing vllm-ascend. If you need to"
" install a specific version of torch-npu, you can manually install the "
"specified version of torch-npu after vllm-ascend is installed."
msgstr ""
"安装 vllm-ascend 时会覆盖 torch-npu。如果你需要安装特定版本的 torch-npu可以在 vllm-ascend "
"安装后手动安装指定版本的 torch-npu。"
#: ../../source/faqs.md:225
msgid ""
"19. On certain systems (e.g., Kylin OS), `docker pull` may fail with an "
"`invalid tar header` error"
msgstr "19.在某些系统上(例如 Kylin OS`docker pull` 可能因 `invalid tar header` 错误而失败"
#: ../../source/faqs.md:227
msgid ""
"On certain operating systems, such as Kylin OS, you may encounter an "
"`invalid tar header` error during the `docker pull` process:"
msgstr "在某些操作系统上,例如 Kylin OS你可能会在 `docker pull` 过程中遇到 `invalid tar header` 错误:"
#: ../../source/faqs.md:233
msgid ""
"This is often due to system compatibility issues. You can resolve this by"
" using an offline loading method with a second machine."
msgstr "这通常是由于系统兼容性问题。你可以使用第二台机器通过离线加载方法来解决此问题。"
#: ../../source/faqs.md:235
msgid ""
"On a separate host machine (e.g., a standard Ubuntu server), pull the "
"image for the target ARM64 architecture and package it into a `.tar` "
"file."
msgstr "在一台独立的主机上(例如,标准的 Ubuntu 服务器),拉取目标 ARM64 架构的镜像并将其打包成 `.tar` 文件。"
#: ../../source/faqs.md:248
msgid "Transfer the image archive"
msgstr "传输镜像归档文件"
#: ../../source/faqs.md:250
msgid ""
"Copy the `vllm_ascend_<tag>.tar` file (where `<tag>` is the image tag you"
" used) to your target machine"
msgstr "将 `vllm_ascend_<tag>.tar` 文件(其中 `<tag>` 是你使用的镜像标签)复制到你的目标机器"
#: ../../source/faqs.md:252
msgid ""
"20. Why am I getting an error when executing the script to start a Docker"
" container? The error message is: \"operation not permitted\""
msgstr "20.为什么执行启动 Docker 容器的脚本时会出错?错误信息是:\"operation not permitted\""
#: ../../source/faqs.md:254
msgid ""
"When using `--shm-size`, you may need to add the `--privileged=true` flag"
" to your `docker run` command to grant the container necessary "
"permissions. Please be aware that using `--privileged=true` grants the "
"container extensive privileges on the host system, which can be a "
"security risk. Only use this option if you understand the implications "
"and trust the container's source."
msgstr ""
"使用 `--shm-size` 时,你可能需要在 `docker run` 命令中添加 `--privileged=true` "
"标志,以授予容器必要的权限。请注意,使用 `--privileged=true` "
"会授予容器在主机系统上的广泛权限,这可能带来安全风险。只有在理解其影响并信任容器来源的情况下才使用此选项。"
#: ../../source/faqs.md:256
msgid "21. How to achieve low latency in a small batch scenario?"
msgstr "21.如何在小批量场景下实现低延迟?"
#: ../../source/faqs.md:258
msgid ""
"The performance of `torch_npu.npu_fused_infer_attention_score` in small "
"batch scenarios is not satisfactory, mainly due to the lack of flash "
"decoding function. We offer an alternative operator in "
"`tools/install_flash_infer_attention_score_ops_a2.sh` and "
"`tools/install_flash_infer_attention_score_ops_a3.sh`, you can install it"
" using the following instruction:"
msgstr ""
"`torch_npu.npu_fused_infer_attention_score` 在小批量场景下的性能不理想,主要是由于缺乏 Flash "
"Decoding 功能。我们在 `tools/install_flash_infer_attention_score_ops_a2.sh` 和 "
"`tools/install_flash_infer_attention_score_ops_a3.sh` "
"中提供了一个替代算子,你可以使用以下指令安装它:"
#: ../../source/faqs.md:266
msgid ""
"**NOTE**: Don't set `additional_config.pa_shape_list` when using this "
"method; otherwise, it will lead to another attention operator. "
"**Important**: Please make sure you're using the **official image** of "
"`vllm-ascend`; otherwise, you **must change** the directory `/vllm-"
"workspace` in `tools/install_flash_infer_attention_score_ops_a2.sh` or "
"`tools/install_flash_infer_attention_score_ops_a3.sh` to your own, or "
"create one. If you're not the root user, you need `sudo` **privileges** "
"to run this script."
msgstr ""
"**注意**:使用此方法时不要设置 "
"`additional_config.pa_shape_list`;否则会导致使用另一个注意力算子。**重要**:请确保你使用的是 `vllm-"
"ascend` 的**官方镜像**;否则,你**必须将** "
"`tools/install_flash_infer_attention_score_ops_a2.sh` 或 "
"`tools/install_flash_infer_attention_score_ops_a3.sh` 中的目录 `/vllm-"
"workspace` **更改为你自己的目录**,或者创建一个。如果你不是 root 用户,则需要 `sudo` **权限**来运行此脚本。"
#: ../../source/faqs.md:269
msgid ""
"22. How to set `SOC_VERSION` when building from source on a CPU-only "
"machine?"
msgstr "22.在仅含 CPU 的机器上从源码构建时,如何设置 `SOC_VERSION`"
#: ../../source/faqs.md:271
msgid ""
"When building from source (e.g. `pip install -e .`), the build may try to"
" infer the target chip via `npu-smi`. If `npu-smi` is not available "
"(common in CPU-only build environments), you must set `SOC_VERSION` "
"manually before installation."
msgstr ""
"从源码构建时(例如 `pip install -e .`),构建过程可能会尝试通过 `npu-smi` 推断目标芯片。如果 `npu-smi` "
"不可用(在仅含 CPU 的构建环境中很常见),则必须在安装前手动设置 `SOC_VERSION`。"
#: ../../source/faqs.md:273
msgid "You can use the defaults from `Dockerfile*` as a reference. For example:"
msgstr "你可以参考 `Dockerfile*` 中的默认值。例如:"
#: ../../source/faqs.md:289
msgid "23. Compilation error occasionally encounters with triton-ascend"
msgstr "23.triton-ascend 偶尔遇到编译错误"
#: ../../source/faqs.md:291
msgid ""
"As shown in [#7782](https://github.com/vllm-project/vllm-"
"ascend/issues/7782), triton-ascend occasionally encounters compilation "
"errors, which is a known issue in triton-ascend 3.2.0. To avoid this "
"issue, please use the official docker images or install the specific "
"triton-ascend version as following:"
msgstr ""
"如 [#7782](https://github.com/vllm-project/vllm-ascend/issues/7782) 所示"
"triton-ascend 偶尔会遇到编译错误,这是 triton-ascend 3.2.0 中的一个已知问题。为避免此问题,请使用官方 "
"docker 镜像或按以下方式安装特定的 triton-ascend 版本:"
#: ../../source/faqs.md:300
msgid "24. Why TPOT increases drastically as concurrency grows?"
msgstr "24.为什么 TPOT 随着并发增长而急剧增加?"
#: ../../source/faqs.md:302
msgid ""
"When testing a vLLM server, one may find that TPOT increases as "
"concurrency increases (for example, TPOT increases by 0.5 ~ 1ms when "
"concurrency increases by 4). This phenomenon is normal in most cases. "
"However, sometimes TPOT may increase dramatically (10 to 100ms for "
"example) as concurrency grows. This is possibly caused by "
"[**PREEMPTION**](https://docs.vllm.ai/en/latest/configuration/optimization/#preemption)"
" in vLLM. Generally, when your server hits KV cache limits, vLLM tries to"
" free KV cache of requests to ensure sufficient space for other requests,"
" which is called preemption in vLLM. When a request is preempted, the "
"default behavior is to recompute the KV cache of this request again in "
"the future, which is why the performance might drop significantly. There "
"are several ways to verify this:"
msgstr ""
"在测试 vLLM 服务器时,可能会发现 TPOT 随着并发度的增加而增加(例如,并发度增加 4 时TPOT 增加 0.5 ~ "
"1ms。在大多数情况下这种现象是正常的。然而有时随着并发度的增长TPOT 可能会急剧增加(例如增加 10 到 100ms。这可能是由 "
"vLLM 中的 "
"[**抢占**](https://docs.vllm.ai/en/latest/configuration/optimization/#preemption)"
" 引起的。通常,当服务器达到 KV 缓存限制时vLLM 会尝试释放请求的 KV 缓存,以确保为其他请求提供足够的空间,这在 vLLM "
"中称为抢占。当一个请求被抢占时,默认行为是在未来重新计算该请求的 KV 缓存,这就是性能可能显著下降的原因。有几种方法可以验证这一点:"
#: ../../source/faqs.md:305
msgid ""
"vLLM usually logs stats on your server. You might see metrics like `GPU "
"KV cache usage: 99.0%,`. When reaching 100%, it triggers preemption."
msgstr ""
"vLLM 通常会在服务器上记录统计信息。您可能会看到类似 `GPU KV cache usage: 99.0%,` 的指标。当达到 100% "
"时,会触发抢占。"
#: ../../source/faqs.md:306
msgid ""
"When launching a vLLM server, you will see logs like `GPU KV cache size: "
"66340 tokens` and `Maximum concurrency for 16,384 tokens per request: "
"4.05`. These are estimated KV cache capacity for a single DP group. You "
"can adjust the overall request traffic according to this."
msgstr ""
"启动 vLLM 服务器时,您会看到类似 `GPU KV cache size: 66340 tokens` 和 `Maximum "
"concurrency for 16,384 tokens per request: 4.05` 的日志。这些是针对单个 DP 组的估计 KV "
"缓存容量。您可以据此调整总体请求流量。"
#: ../../source/faqs.md:308
msgid ""
"Preemption cannot be avoided completely since KV cache usage always has a"
" limit. But there are methods to reduce the chances of preemption. As is "
"suggested in "
"[**PREEMPTION**](https://docs.vllm.ai/en/latest/configuration/optimization/#preemption),"
" the core strategy is to increase available KV cache. For example, one "
"can increase `--gpu-memory-utilization` or decrease `--max-num-seqs` && "
"`--max-num-batched-tokens`."
msgstr ""
"抢占无法完全避免,因为 KV 缓存的使用总是有限制的。但有方法可以减少抢占的发生几率。正如 "
"[**抢占**](https://docs.vllm.ai/en/latest/configuration/optimization/#preemption)"
" 中所建议的,核心策略是增加可用的 KV 缓存。例如,可以增加 `--gpu-memory-utilization` 或减少 `--max-"
"num-seqs` 和 `--max-num-batched-tokens`。"
#~ msgid ""
#~ "[[v0.7.3.post1] FAQ & Feedback](https://github.com"
#~ "/vllm-project/vllm-ascend/issues/1007)"
#~ msgstr ""
#~ "[[v0.7.3.post1] 常见问题与反馈](https://github.com/vllm-project"
#~ "/vllm-ascend/issues/1007)"
#~ msgid "7. How does vllm-ascend perform?"
#~ msgstr "7. vllm-ascend 的性能如何?"
#~ msgid ""
#~ "Currently, only some models are "
#~ "improved. Such as `Qwen2.5 VL`, `Qwen3`,"
#~ " `Deepseek V3`. Others are not good"
#~ " enough. From 0.9.0rc2, Qwen and "
#~ "Deepseek works with graph mode to "
#~ "play a good performance. What's more,"
#~ " you can install `mindie-turbo` with"
#~ " `vllm-ascend v0.7.3` to speed up "
#~ "the inference as well."
#~ msgstr ""
#~ "目前,只有部分模型得到了改进,例如 `Qwen2.5 VL`、`Qwen3` 和 "
#~ "`Deepseek V3`。其他模型的效果还不够理想。从 0.9.0rc2 版本开始Qwen "
#~ "和 Deepseek 已支持图模式,以获得更好的性能。此外,您还可以在 `vllm-"
#~ "ascend v0.7.3` 上安装 `mindie-turbo` "
#~ "来进一步加速推理。"
#~ msgid ""
#~ "Currently, only 1P1D is supported on "
#~ "V0 Engine. For V1 Engine or NPND"
#~ " support, We will make it stable "
#~ "and supported by vllm-ascend in "
#~ "the future."
#~ msgstr "目前V0 引擎仅支持 1P1D。对于 V1 引擎或 NPND 的支持,我们将在未来使其稳定并由 vllm-ascend 提供支持。"
#~ msgid ""
#~ "Currently, w8a8 quantization is already "
#~ "supported by vllm-ascend originally on"
#~ " v0.8.4rc2 or higher, If you're using"
#~ " vllm 0.7.3 version, w8a8 quantization "
#~ "is supporeted with the integration of"
#~ " vllm-ascend and mindie-turbo, please"
#~ " use `pip install vllm-ascend[mindie-"
#~ "turbo]`."
#~ msgstr ""
#~ "目前w8a8 量化已在 v0.8.4rc2 或更高版本的 vllm-"
#~ "ascend 中原生支持。如果您使用的是 vllm 0.7.3 版本,通过集成 "
#~ "vllm-ascend 和 mindie-turbo 也支持 w8a8"
#~ " 量化,请使用 `pip install vllm-ascend[mindie-"
#~ "turbo]`。"
#~ msgid "11. How to run w8a8 DeepSeek model?"
#~ msgstr "11. 如何运行 w8a8 DeepSeek 模型?"
#~ msgid ""
#~ "Please following the [inferencing "
#~ "tutorial](https://vllm-"
#~ "ascend.readthedocs.io/en/latest/tutorials/multi_node.html) and"
#~ " replace model to DeepSeek."
#~ msgstr ""
#~ "请按照[推理教程](https://vllm-"
#~ "ascend.readthedocs.io/en/latest/tutorials/multi_node.html)进行操作,并将模型替换为"
#~ " DeepSeek。"
#~ msgid ""
#~ "12. There is no output in log "
#~ "when loading models using vllm-ascend,"
#~ " How to solve it?"
#~ msgstr "12. 使用 vllm-ascend 加载模型时日志没有输出,如何解决?"
#~ msgid ""
#~ "If you're using vllm 0.7.3 version, "
#~ "this is a known progress bar "
#~ "display issue in VLLM, which has "
#~ "been resolved in [this PR](https://github.com"
#~ "/vllm-project/vllm/pull/12428), please cherry-"
#~ "pick it locally by yourself. Otherwise,"
#~ " please fill up an issue."
#~ msgstr ""
#~ "如果您使用的是 vllm 0.7.3 版本,这是 VLLM "
#~ "中一个已知的进度条显示问题,已在 [此 PR](https://github.com/vllm-"
#~ "project/vllm/pull/12428) 中解决,请自行在本地进行 cherry-"
#~ "pick。否则请提交一个 issue。"
#~ msgid ""
#~ "You may encounter the following error"
#~ " if running DeepSeek with NPU graph"
#~ " mode enabled. The allowed number of"
#~ " queries per kv when enabling both"
#~ " MLA and Graph mode only support "
#~ "{32, 64, 128}, **Thus this is not"
#~ " supported for DeepSeek-V2-Lite**, as it"
#~ " only has 16 attention heads. The "
#~ "NPU graph mode support on "
#~ "DeepSeek-V2-Lite will be done in the "
#~ "future."
#~ msgstr ""
#~ "如果在启用 NPU 图模式的情况下运行 DeepSeek您可能会遇到以下错误。当同时启用 "
#~ "MLA 和图模式时,每个 kv 允许的查询数仅支持 {32, 64, "
#~ "128}**因此这不支持 DeepSeek-V2-Lite**,因为它只有 16 "
#~ "个注意力头。未来将增加对 DeepSeek-V2-Lite 的 NPU 图模式支持。"