# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR , 2025.
#
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME \n"
"Language: zh_CN\n"
"Language-Team: zh_CN \n"
"Plural-Forms: nplurals=1; plural=0;\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.18.0\n"

#: ../../source/faqs.md:1
msgid "FAQs"
msgstr "常见问题解答"

#: ../../source/faqs.md:3
msgid "Version Specific FAQs"
msgstr "版本特定常见问题"

#: ../../source/faqs.md:5
msgid ""
"[[v0.17.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-"
"ascend/issues/7173)"
msgstr ""
"[[v0.17.0rc1] 常见问题与反馈](https://github.com/vllm-project/vllm-"
"ascend/issues/7173)"

#: ../../source/faqs.md:6
msgid ""
"[[v0.13.0] FAQ & Feedback](https://github.com/vllm-project/vllm-"
"ascend/issues/6583)"
msgstr ""
"[[v0.13.0] 常见问题与反馈](https://github.com/vllm-project/vllm-"
"ascend/issues/6583)"

#: ../../source/faqs.md:8
msgid "General FAQs"
msgstr "通用常见问题"

#: ../../source/faqs.md:10
msgid "1. What devices are currently supported?"
msgstr "1. 目前支持哪些设备?"

#: ../../source/faqs.md:12
msgid ""
"Currently, **ONLY** Atlas A2 series (Ascend-cann-kernels-910b), Atlas A3 "
"series (Atlas-A3-cann-kernels) and Atlas 300I (Ascend-cann-kernels-310p) "
"series are supported:"
msgstr ""
"目前,**仅**支持 Atlas A2 系列(Ascend-cann-kernels-910b)、Atlas A3 系列(Atlas-A3"
"-cann-kernels)和 Atlas 300I(Ascend-cann-kernels-310p)系列:"

#: ../../source/faqs.md:14
msgid ""
"Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 "
"Box16, Atlas 300T A2)"
msgstr ""
"Atlas A2 训练系列(Atlas 800T A2、Atlas 900 A2 PoD、Atlas 200T A2 Box16、Atlas "
"300T A2)"

#: ../../source/faqs.md:15
msgid "Atlas 800I A2 Inference series (Atlas 800I A2)"
msgstr "Atlas 800I A2 推理系列(Atlas 800I A2)"

#: ../../source/faqs.md:16
msgid ""
"Atlas A3 Training series (Atlas 800T A3, Atlas 900 A3 SuperPoD, Atlas "
"9000 A3 SuperPoD)"
msgstr "Atlas A3 训练系列(Atlas 800T A3、Atlas 900 A3 SuperPoD、Atlas 9000 A3 SuperPoD)"

#: ../../source/faqs.md:17
msgid "Atlas 800I A3 Inference series (Atlas 800I A3)"
msgstr "Atlas 800I A3 推理系列(Atlas 800I A3)"

#: ../../source/faqs.md:18
msgid "[Experimental] Atlas 300I Inference series (Atlas 300I Duo)."
msgstr "[实验性] Atlas 300I 推理系列(Atlas 300I Duo)。"

#: ../../source/faqs.md:19
msgid ""
"[Experimental] Currently for 310I Duo the stable version is vllm-ascend "
"v0.10.0rc1."
msgstr "[实验性] 目前对于 310I Duo,稳定版本是 vllm-ascend v0.10.0rc1。"

#: ../../source/faqs.md:21
msgid "Below series are NOT supported yet:"
msgstr "以下系列目前尚不支持:"

#: ../../source/faqs.md:23
msgid "Atlas 200I A2 (Ascend-cann-kernels-310b) unplanned yet"
msgstr "Atlas 200I A2(Ascend-cann-kernels-310b)尚未计划支持"

#: ../../source/faqs.md:24
msgid "Ascend 910, Ascend 910 Pro B (Ascend-cann-kernels-910) unplanned yet"
msgstr "Ascend 910、Ascend 910 Pro B(Ascend-cann-kernels-910)尚未计划支持"

#: ../../source/faqs.md:26
msgid ""
"From a technical view, vllm-ascend supports devices if torch-npu is "
"supported. Otherwise, we have to implement it by using custom ops. We "
"also welcome you to join us to improve together."
msgstr ""
"从技术角度看,如果 torch-npu 支持某设备,则 vllm-ascend "
"也支持该设备。否则,我们需要通过自定义算子来实现。我们也欢迎您加入我们,共同改进。"

#: ../../source/faqs.md:28
msgid "2. How to get our docker containers?"
msgstr "2. 如何获取我们的 Docker 容器?"
#: ../../source/faqs.md:30 msgid "" "You can get our containers at `Quay.io`, e.g., [vllm-" "ascend](https://quay.io/repository/ascend/vllm-ascend?tab=tags) and " "[cann](https://quay.io/repository/ascend/cann?tab=tags)." msgstr "" "您可以在 `Quay.io` 获取我们的容器,例如:[vllm-" "ascend](https://quay.io/repository/ascend/vllm-ascend?tab=tags) 和 " "[cann](https://quay.io/repository/ascend/cann?tab=tags)。" #: ../../source/faqs.md:32 msgid "" "If you are in China, you can use `daocloud` or some other mirror sites to" " accelerate your downloading:" msgstr "如果您在中国,可以使用 `daocloud` 或其他镜像站点来加速下载:" #: ../../source/faqs.md:42 msgid "Load Docker Images for offline environment" msgstr "为离线环境加载 Docker 镜像" #: ../../source/faqs.md:44 msgid "" "If you want to use container image for offline environments (no internet " "connection), you need to download container image in an environment with " "internet access:" msgstr "如果您想在离线环境(无互联网连接)中使用容器镜像,您需要在有互联网访问权限的环境中下载容器镜像:" #: ../../source/faqs.md:46 msgid "**Exporting Docker images:**" msgstr "**导出 Docker 镜像:**" #: ../../source/faqs.md:58 msgid "**Importing Docker images in environment without internet access:**" msgstr "**在无互联网访问权限的环境中导入 Docker 镜像:**" #: ../../source/faqs.md:70 msgid "3. What models does vllm-ascend supports?" msgstr "3. vllm-ascend 支持哪些模型?" #: ../../source/faqs.md:72 msgid "" "Find more details " "[here](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_models.html)." msgstr "更多详细信息请参见[此处](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_models.html)。" #: ../../source/faqs.md:74 msgid "4. How to get in touch with our community?" msgstr "4. 如何与我们的社区取得联系?" #: ../../source/faqs.md:76 msgid "" "There are many channels that you can communicate with our community " "developers / users:" msgstr "您可以通过多种渠道与我们的社区开发者/用户进行交流:" #: ../../source/faqs.md:78 msgid "" "Submit a GitHub [issue](https://github.com/vllm-project/vllm-" "ascend/issues?page=1)." msgstr "" "提交一个 GitHub [issue](https://github.com/vllm-project/vllm-" "ascend/issues?page=1)。" #: ../../source/faqs.md:79 msgid "" "Join our [weekly " "meeting](https://docs.google.com/document/d/1hCSzRTMZhIB8vRq1_qOOjx4c9uYUxvdQvDsMV2JcSrw/edit?tab=t.0#heading=h.911qu8j8h35z)" " and share your ideas." msgstr "参加我们的[每周例会](https://docs.google.com/document/d/1hCSzRTMZhIB8vRq1_qOOjx4c9uYUxvdQvDsMV2JcSrw/edit?tab=t.0#heading=h.911qu8j8h35z)并分享您的想法。" #: ../../source/faqs.md:80 msgid "" "Join our [WeChat](https://github.com/vllm-project/vllm-" "ascend/issues/227) group and ask your questions." msgstr "" "加入我们的[微信群](https://github.com/vllm-project/vllm-" "ascend/issues/227)并提出您的问题。" #: ../../source/faqs.md:81 msgid "" "Join our ascend channel in [vLLM forums](https://discuss.vllm.ai/c" "/hardware-support/vllm-ascend-support/6) and publish your topics." msgstr "" "加入我们在 [vLLM 论坛](https://discuss.vllm.ai/c/hardware-support/vllm-" "ascend-support/6) 的 ascend 频道并发布您的主题。" #: ../../source/faqs.md:83 msgid "5. What features does vllm-ascend V1 supports?" msgstr "5. vllm-ascend V1 支持哪些功能?" #: ../../source/faqs.md:85 msgid "" "Find more details " "[here](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html)." msgstr "更多详细信息请参见[此处](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html)。" #: ../../source/faqs.md:87 msgid "" "6. How to solve the problem of \"Failed to infer device type\" or " "\"libatb.so: cannot open shared object file\"?" msgstr "6. 
如何解决“无法推断设备类型”或“libatb.so:无法打开共享对象文件”的问题?" #: ../../source/faqs.md:89 msgid "" "Basically, the reason is that the NPU environment is not configured " "correctly. You can:" msgstr "基本上,原因是 NPU 环境未正确配置。您可以:" #: ../../source/faqs.md:91 msgid "try `source /usr/local/Ascend/nnal/atb/set_env.sh` to enable NNAL package." msgstr "尝试运行 `source /usr/local/Ascend/nnal/atb/set_env.sh` 以启用 NNAL 包。" #: ../../source/faqs.md:92 msgid "" "try `source /usr/local/Ascend/ascend-toolkit/set_env.sh` to enable CANN " "package." msgstr "尝试运行 `source /usr/local/Ascend/ascend-toolkit/set_env.sh` 以启用 CANN 包。" #: ../../source/faqs.md:93 msgid "try `npu-smi info` to check whether the NPU is working." msgstr "尝试运行 `npu-smi info` 来检查 NPU 是否正常工作。" #: ../../source/faqs.md:95 msgid "" "If the above steps are not working, you can try the following code in " "Python to check whether there are any errors:" msgstr "如果上述步骤无效,您可以在 Python 中尝试以下代码来检查是否有任何错误:" #: ../../source/faqs.md:103 msgid "If all above steps are not working, feel free to submit a GitHub issue." msgstr "如果以上所有步骤都无法解决问题,请随时提交一个 GitHub issue。" #: ../../source/faqs.md:105 msgid "7. How vllm-ascend work with vLLM?" msgstr "7. vllm-ascend 如何与 vLLM 协同工作?" #: ../../source/faqs.md:107 msgid "" "`vllm-ascend` is a hardware plugin for vLLM. The version of `vllm-ascend`" " is the same as the version of `vllm`. For example, if you use `vllm` " "0.9.1, you should use vllm-ascend 0.9.1 as well. For the main branch, we " "ensure that `vllm-ascend` and `vllm` are compatible at every commit." msgstr "" "`vllm-ascend` 是 vLLM 的一个硬件插件。`vllm-ascend` 的版本与 `vllm` 的版本相同。例如,如果您使用 " "`vllm` 0.9.1,您也应该使用 vllm-ascend 0.9.1。对于主分支,我们确保 `vllm-ascend` 和 `vllm` " "在每次提交时都是兼容的。" #: ../../source/faqs.md:109 msgid "8. Does vllm-ascend support Prefill Disaggregation feature?" msgstr "8. vllm-ascend 是否支持 Prefill Disaggregation 功能?" #: ../../source/faqs.md:111 msgid "" "Yes, vllm-ascend supports Prefill Disaggregation feature with Mooncake " "backend. See the [official " "tutorial](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html)" " for example." msgstr "" "是的,vllm-ascend 支持通过 Mooncake 后端实现 Prefill Disaggregation " "功能。示例请参见[官方教程](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html)。" #: ../../source/faqs.md:113 msgid "9. Does vllm-ascend support quantization method?" msgstr "9. vllm-ascend 是否支持量化方法?" #: ../../source/faqs.md:115 msgid "" "Currently, w8a8, w4a8, and w4a4 quantization methods are already " "supported by vllm-ascend." msgstr "目前,vllm-ascend 已支持 w8a8、w4a8 和 w4a4 量化方法。" #: ../../source/faqs.md:117 msgid "10. How is vllm-ascend tested?" msgstr "10. vllm-ascend 是如何测试的?" #: ../../source/faqs.md:119 msgid "" "vllm-ascend is tested in three aspects: functions, performance, and " "accuracy." msgstr "vllm-ascend 在三个方面进行测试:功能、性能和精度。" #: ../../source/faqs.md:121 msgid "" "**Functional test**: We added CI, including part of vllm's native unit " "tests and vllm-ascend's own unit tests. In vllm-ascend's tests, we test " "basic functionalities, popular model availability, and [supported " "features](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html)" " through E2E test." 
msgstr "" "**功能测试**:我们添加了 CI,包括部分 vllm 的原生单元测试和 vllm-ascend 自身的单元测试。在 vllm-ascend " "的测试中,我们通过端到端测试来验证基本功能、主流模型的可用性以及[支持的功能](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html)。" #: ../../source/faqs.md:123 msgid "" "**Performance test**: We provide [benchmark](https://github.com/vllm-" "project/vllm-ascend/tree/main/benchmarks) tools for E2E performance " "benchmark, which can be easily re-run locally. We will publish a perf " "website to show the performance test results for each pull request." msgstr "" "**性能测试**:我们提供了用于端到端性能基准测试的[基准测试](https://github.com/vllm-project/vllm-" "ascend/tree/main/benchmarks)工具,可以方便地在本地重新运行。我们将发布一个性能网站,展示每个拉取请求的性能测试结果。" #: ../../source/faqs.md:125 msgid "" "**Accuracy test**: We are working on adding accuracy test to the CI as " "well." msgstr "**准确性测试**:我们正在努力将准确性测试也添加到 CI 中。" #: ../../source/faqs.md:127 msgid "" "**Nightly test**: we'll run full test every night to make sure the code " "is working." msgstr "**夜间测试**:我们将每晚运行完整测试,以确保代码正常工作。" #: ../../source/faqs.md:129 msgid "" "For each release, we'll publish the performance test and accuracy test " "report in the future." msgstr "对于每个版本,我们未来都将发布性能测试和准确性测试报告。" #: ../../source/faqs.md:131 msgid "11. How to fix the error \"InvalidVersion\" when using vllm-ascend?" msgstr "11. 使用 vllm-ascend 时如何修复 \"InvalidVersion\" 错误?" #: ../../source/faqs.md:133 msgid "" "The problem is usually caused by the installation of a development or " "editable version of the vLLM package. In this case, we provide the " "environment variable `VLLM_VERSION` to let users specify the version of " "vLLM package to use. Please set the environment variable `VLLM_VERSION` " "to the version of the vLLM package you have installed. The format of " "`VLLM_VERSION` should be `X.Y.Z`." msgstr "" "此问题通常是由于安装了开发版或可编辑版本的 vLLM 包引起的。为此,我们提供了环境变量 `VLLM_VERSION`,允许用户指定要使用的 " "vLLM 包版本。请将环境变量 `VLLM_VERSION` 设置为你已安装的 vLLM 包的版本。`VLLM_VERSION` 的格式应为 " "`X.Y.Z`。" #: ../../source/faqs.md:135 msgid "12. How to handle the out-of-memory issue?" msgstr "12. 如何处理内存不足问题?" #: ../../source/faqs.md:137 msgid "" "OOM errors typically occur when the model exceeds the memory capacity of " "a single NPU. For general guidance, you can refer to [vLLM OOM " "troubleshooting " "documentation](https://docs.vllm.ai/en/latest/usage/troubleshooting/#out-" "of-memory)." msgstr "" "当模型超出单个 NPU 的内存容量时,通常会发生 OOM(内存不足)错误。一般性指导可参考 [vLLM OOM " "故障排除文档](https://docs.vllm.ai/en/latest/usage/troubleshooting/#out-of-" "memory)。" #: ../../source/faqs.md:139 msgid "" "In scenarios where NPUs have limited high bandwidth memory (on-chip " "memory) capacity, dynamic memory allocation/deallocation during inference" " can exacerbate memory fragmentation, leading to OOM. To address this:" msgstr "在 NPU 的高带宽内存(片上内存)容量有限的场景下,推理过程中的动态内存分配/释放会加剧内存碎片,导致 OOM。为解决此问题:" #: ../../source/faqs.md:141 msgid "" "**Limit `--max-model-len`**: It can save the on-chip memory usage for KV " "cache initialization step." msgstr "**限制 `--max-model-len`**:它可以节省 KV 缓存初始化步骤的片上内存使用量。" #: ../../source/faqs.md:143 msgid "" "**Adjust `--gpu-memory-utilization`**: If unspecified, the default value " "is `0.9`. You can decrease this value to reserve more memory to reduce " "fragmentation risks. See details in: [vLLM - Inference and Serving - " "Engine Arguments](https://docs.vllm.ai/en/latest/cli/serve/#-gpu-memory-" "utilization)." 
msgstr "" "**调整 `--gpu-memory-utilization`**:如果未指定,默认值为 " "`0.9`。你可以降低此值以预留更多内存,从而减少碎片风险。详情参见:[vLLM - 推理与服务 - " "引擎参数](https://docs.vllm.ai/en/latest/cli/serve/#-gpu-memory-utilization)。" #: ../../source/faqs.md:145 msgid "" "**Configure `PYTORCH_NPU_ALLOC_CONF`**: Set this environment variable to " "optimize NPU memory management. For example, you can use `export " "PYTORCH_NPU_ALLOC_CONF=expandable_segments:True` to enable virtual memory" " feature to mitigate memory fragmentation caused by frequent dynamic " "memory size adjustments during runtime. See details in " "[PYTORCH_NPU_ALLOC_CONF](https://www.hiascend.com/document/detail/zh/Pytorch/700/comref/Envvariables/Envir_012.html)." msgstr "" "**配置 `PYTORCH_NPU_ALLOC_CONF`**:设置此环境变量以优化 NPU 内存管理。例如,你可以使用 `export " "PYTORCH_NPU_ALLOC_CONF=expandable_segments:True` " "来启用虚拟内存功能,以缓解运行时频繁动态调整内存大小导致的内存碎片问题。详情参见:[PYTORCH_NPU_ALLOC_CONF](https://www.hiascend.com/document/detail/zh/Pytorch/700/comref/Envvariables/Envir_012.html)。" #: ../../source/faqs.md:147 msgid "13. Failed to enable NPU graph mode when running DeepSeek" msgstr "13. 运行 DeepSeek 时无法启用 NPU 图模式" #: ../../source/faqs.md:149 msgid "" "Enabling NPU graph mode for DeepSeek may trigger an error. This is " "because when both MLA (Multi-Head Latent Attention) and NPU graph mode " "are active, the number of queries per KV head must be 32, 64, or 128. " "However, DeepSeek-V2-Lite has only 16 attention heads, which results in " "16 queries per KV—a value outside the supported range. Support for NPU " "graph mode on DeepSeek-V2-Lite will be added in a future update." msgstr "" "为 DeepSeek 启用 NPU 图模式可能会触发错误。这是因为当 MLA(多头潜在注意力)和 NPU 图模式同时激活时,每个 KV 头的查询数必须为 " "32、64 或 128。然而,DeepSeek-V2-Lite 只有 16 个注意力头,导致每个 KV 有 16 个查询,该值超出了支持范围。对 " "DeepSeek-V2-Lite 的 NPU 图模式支持将在未来的更新中添加。" #: ../../source/faqs.md:151 #, python-brace-format msgid "" "And if you're using DeepSeek-V3 or DeepSeek-R1, please make sure after " "the tensor parallel split, `num_heads`/`num_kv_heads` is {32, 64, 128}." msgstr "" "如果你正在使用 DeepSeek-V3 或 DeepSeek-R1,请确保在张量并行切分后,`num_heads`/`num_kv_heads` " "的值为 {32, 64, 128} 中的一个。" #: ../../source/faqs.md:158 msgid "" "14. Failed to reinstall vllm-ascend from source after uninstalling vllm-" "ascend" msgstr "14. 卸载 vllm-ascend 后无法从源码重新安装 vllm-ascend" #: ../../source/faqs.md:160 msgid "" "You may encounter the problem of C/C++ compilation failure when " "reinstalling vllm-ascend from source using pip. If the installation " "fails, use `python setup.py install` (recommended) to install, or use " "`python setup.py clean` to clear the cache." msgstr "" "使用 pip 从源码重新安装 vllm-ascend 时,可能会遇到 C/C++ 编译失败的问题。如果安装失败,请使用 `python " "setup.py install`(推荐)进行安装,或使用 `python setup.py clean` 清除缓存。" #: ../../source/faqs.md:162 msgid "15. How to generate deterministic results when using vllm-ascend?" msgstr "15. 使用 vllm-ascend 时如何生成确定性结果?" #: ../../source/faqs.md:164 msgid "There are several factors that affect output determinism:" msgstr "有几个因素会影响输出的确定性:" #: ../../source/faqs.md:166 msgid "" "Sampler method: using **greedy sampling** by setting `temperature=0` in " "`SamplingParams`, e.g.:" msgstr "采样方法:通过在 `SamplingParams` 中设置 `temperature=0` 来使用 **贪婪采样**,例如:" #: ../../source/faqs.md:191 msgid "Set the following environment parameters:" msgstr "设置以下环境参数:" #: ../../source/faqs.md:200 msgid "" "16. How to fix the error \"ImportError: Please install vllm[audio] for " "audio support\" for the Qwen2.5-Omni model?" msgstr "" "16. 
对于 Qwen2.5-Omni 模型,如何修复 \"ImportError: Please install vllm[audio] for" " audio support\" 错误?" #: ../../source/faqs.md:202 msgid "" "The `Qwen2.5-Omni` model requires the `librosa` package to be installed, " "you need to install the `qwen-omni-utils` package to ensure all " "dependencies are met, run `pip install qwen-omni-utils`. This package " "will install `librosa` and its related dependencies, resolving the " "`ImportError: No module named 'librosa'` issue and ensuring that the " "audio processing functionality works correctly." msgstr "" "`Qwen2.5-Omni` 模型需要安装 `librosa` 包,你需要安装 `qwen-omni-utils` 包以确保满足所有依赖,运行 " "`pip install qwen-omni-utils`。此包将安装 `librosa` 及其相关依赖,解决 `ImportError: No " "module named 'librosa'` 问题,并确保音频处理功能正常工作。" #: ../../source/faqs.md:205 msgid "" "17. How to troubleshoot and resolve size capture failures resulting from " "stream resource exhaustion, and what are the underlying causes?" msgstr "17. 如何排查和解决因流资源耗尽导致的尺寸捕获失败,其根本原因是什么?" #: ../../source/faqs.md:213 msgid "Recommended mitigation strategies:" msgstr "推荐的缓解策略:" #: ../../source/faqs.md:215 #, python-brace-format msgid "" "Manually configure the compilation_config parameter with a reduced size " "set: '{\"cudagraph_capture_sizes\":[size1, size2, size3, ...]}'." msgstr "" "手动配置 compilation_config " "参数,使用缩减后的尺寸集合:'{\"cudagraph_capture_sizes\":[size1, size2, size3, ...]}'。" #: ../../source/faqs.md:216 msgid "" "Employ ACLgraph's full graph mode as an alternative to the piecewise " "approach." msgstr "采用 ACLgraph 的全图模式作为分段方法的替代方案。" #: ../../source/faqs.md:218 msgid "" "Root cause analysis: The current stream requirement calculation for size " "captures only accounts for measurable factors including: data parallel " "size, tensor parallel size, expert parallel configuration, piece graph " "count, multistream-overlap shared expert settings, and HCCL communication" " mode (AIV/AICPU). However, numerous unquantifiable elements, such as " "operator characteristics and specific hardware features, consume " "additional streams outside of this calculation framework, resulting in " "stream resource exhaustion during size capture operations." msgstr "" "根本原因分析:当前尺寸捕获的流需求计算仅考虑了可测量的因素,包括:数据并行大小、张量并行大小、专家并行配置、分段图数量、多流重叠共享专家设置以及 " "HCCL " "通信模式(AIV/AICPU)。然而,许多不可量化的元素,例如算子特性和特定硬件特性,在此计算框架之外消耗了额外的流,导致尺寸捕获操作期间流资源耗尽。" #: ../../source/faqs.md:221 msgid "18. How to install custom version of torch_npu?" msgstr "18. 如何安装自定义版本的 torch_npu?" #: ../../source/faqs.md:223 msgid "" "torch-npu will be overridden when installing vllm-ascend. If you need to" " install a specific version of torch-npu, you can manually install the " "specified version of torch-npu after vllm-ascend is installed." msgstr "" "安装 vllm-ascend 时会覆盖 torch-npu。如果你需要安装特定版本的 torch-npu,可以在 vllm-ascend " "安装后手动安装指定版本的 torch-npu。" #: ../../source/faqs.md:225 msgid "" "19. On certain systems (e.g., Kylin OS), `docker pull` may fail with an " "`invalid tar header` error" msgstr "19. 在某些系统上(例如 Kylin OS),`docker pull` 可能因 `invalid tar header` 错误而失败" #: ../../source/faqs.md:227 msgid "" "On certain operating systems, such as Kylin OS, you may encounter an " "`invalid tar header` error during the `docker pull` process:" msgstr "在某些操作系统上,例如 Kylin OS,你可能会在 `docker pull` 过程中遇到 `invalid tar header` 错误:" #: ../../source/faqs.md:233 msgid "" "This is often due to system compatibility issues. You can resolve this by" " using an offline loading method with a second machine." 
msgstr "这通常是由于系统兼容性问题。你可以使用第二台机器通过离线加载方法来解决此问题。" #: ../../source/faqs.md:235 msgid "" "On a separate host machine (e.g., a standard Ubuntu server), pull the " "image for the target ARM64 architecture and package it into a `.tar` " "file." msgstr "在一台独立的主机上(例如,标准的 Ubuntu 服务器),拉取目标 ARM64 架构的镜像并将其打包成 `.tar` 文件。" #: ../../source/faqs.md:248 msgid "Transfer the image archive" msgstr "传输镜像归档文件" #: ../../source/faqs.md:250 msgid "" "Copy the `vllm_ascend_.tar` file (where `` is the image tag you" " used) to your target machine" msgstr "将 `vllm_ascend_.tar` 文件(其中 `` 是你使用的镜像标签)复制到你的目标机器" #: ../../source/faqs.md:252 msgid "" "20. Why am I getting an error when executing the script to start a Docker" " container? The error message is: \"operation not permitted\"" msgstr "20. 为什么执行启动 Docker 容器的脚本时会出错?错误信息是:\"operation not permitted\"" #: ../../source/faqs.md:254 msgid "" "When using `--shm-size`, you may need to add the `--privileged=true` flag" " to your `docker run` command to grant the container necessary " "permissions. Please be aware that using `--privileged=true` grants the " "container extensive privileges on the host system, which can be a " "security risk. Only use this option if you understand the implications " "and trust the container's source." msgstr "" "使用 `--shm-size` 时,你可能需要在 `docker run` 命令中添加 `--privileged=true` " "标志,以授予容器必要的权限。请注意,使用 `--privileged=true` " "会授予容器在主机系统上的广泛权限,这可能带来安全风险。只有在理解其影响并信任容器来源的情况下才使用此选项。" #: ../../source/faqs.md:256 msgid "21. How to achieve low latency in a small batch scenario?" msgstr "21. 如何在小批量场景下实现低延迟?" #: ../../source/faqs.md:258 msgid "" "The performance of `torch_npu.npu_fused_infer_attention_score` in small " "batch scenarios is not satisfactory, mainly due to the lack of flash " "decoding function. We offer an alternative operator in " "`tools/install_flash_infer_attention_score_ops_a2.sh` and " "`tools/install_flash_infer_attention_score_ops_a3.sh`, you can install it" " using the following instruction:" msgstr "" "`torch_npu.npu_fused_infer_attention_score` 在小批量场景下的性能不理想,主要是由于缺乏 Flash " "Decoding 功能。我们在 `tools/install_flash_infer_attention_score_ops_a2.sh` 和 " "`tools/install_flash_infer_attention_score_ops_a3.sh` " "中提供了一个替代算子,你可以使用以下指令安装它:" #: ../../source/faqs.md:266 msgid "" "**NOTE**: Don't set `additional_config.pa_shape_list` when using this " "method; otherwise, it will lead to another attention operator. " "**Important**: Please make sure you're using the **official image** of " "`vllm-ascend`; otherwise, you **must change** the directory `/vllm-" "workspace` in `tools/install_flash_infer_attention_score_ops_a2.sh` or " "`tools/install_flash_infer_attention_score_ops_a3.sh` to your own, or " "create one. If you're not the root user, you need `sudo` **privileges** " "to run this script." msgstr "" "**注意**:使用此方法时不要设置 " "`additional_config.pa_shape_list`;否则会导致使用另一个注意力算子。**重要**:请确保你使用的是 `vllm-" "ascend` 的**官方镜像**;否则,你**必须将** " "`tools/install_flash_infer_attention_score_ops_a2.sh` 或 " "`tools/install_flash_infer_attention_score_ops_a3.sh` 中的目录 `/vllm-" "workspace` **更改为你自己的目录**,或者创建一个。如果你不是 root 用户,则需要 `sudo` **权限**来运行此脚本。" #: ../../source/faqs.md:269 msgid "" "22. How to set `SOC_VERSION` when building from source on a CPU-only " "machine?" msgstr "22. 在仅含 CPU 的机器上从源码构建时,如何设置 `SOC_VERSION`?" #: ../../source/faqs.md:271 msgid "" "When building from source (e.g. `pip install -e .`), the build may try to" " infer the target chip via `npu-smi`. 
If `npu-smi` is not available " "(common in CPU-only build environments), you must set `SOC_VERSION` " "manually before installation." msgstr "" "从源码构建时(例如 `pip install -e .`),构建过程可能会尝试通过 `npu-smi` 推断目标芯片。如果 `npu-smi` " "不可用(在仅含 CPU 的构建环境中很常见),则必须在安装前手动设置 `SOC_VERSION`。" #: ../../source/faqs.md:273 msgid "You can use the defaults from `Dockerfile*` as a reference. For example:" msgstr "你可以参考 `Dockerfile*` 中的默认值。例如:" #: ../../source/faqs.md:289 msgid "23. Compilation error occasionally encounters with triton-ascend" msgstr "23. triton-ascend 偶尔遇到编译错误" #: ../../source/faqs.md:291 msgid "" "As shown in [#7782](https://github.com/vllm-project/vllm-" "ascend/issues/7782), triton-ascend occasionally encounters compilation " "errors, which is a known issue in triton-ascend 3.2.0. To avoid this " "issue, please use the official docker images or install the specific " "triton-ascend version as following:" msgstr "" "如 [#7782](https://github.com/vllm-project/vllm-ascend/issues/7782) 所示" ",triton-ascend 偶尔会遇到编译错误,这是 triton-ascend 3.2.0 中的一个已知问题。为避免此问题,请使用官方 " "docker 镜像或按以下方式安装特定的 triton-ascend 版本:" #: ../../source/faqs.md:300 msgid "24. Why TPOT increases drastically as concurrency grows?" msgstr "24. 为什么 TPOT 随着并发增长而急剧增加?" #: ../../source/faqs.md:302 msgid "" "When testing a vLLM server, one may find that TPOT increases as " "concurrency increases (for example, TPOT increases by 0.5 ~ 1ms when " "concurrency increases by 4). This phenomenon is normal in most cases. " "However, sometimes TPOT may increase dramatically (10 to 100ms for " "example) as concurrency grows. This is possibly caused by " "[**PREEMPTION**](https://docs.vllm.ai/en/latest/configuration/optimization/#preemption)" " in vLLM. Generally, when your server hits KV cache limits, vLLM tries to" " free KV cache of requests to ensure sufficient space for other requests," " which is called preemption in vLLM. When a request is preempted, the " "default behavior is to recompute the KV cache of this request again in " "the future, which is why the performance might drop significantly. There " "are several ways to verify this:" msgstr "" "在测试 vLLM 服务器时,可能会发现 TPOT 随着并发度的增加而增加(例如,并发度增加 4 时,TPOT 增加 0.5 ~ " "1ms)。在大多数情况下,这种现象是正常的。然而,有时随着并发度的增长,TPOT 可能会急剧增加(例如增加 10 到 100ms)。这可能是由 " "vLLM 中的 " "[**抢占**](https://docs.vllm.ai/en/latest/configuration/optimization/#preemption)" " 引起的。通常,当服务器达到 KV 缓存限制时,vLLM 会尝试释放请求的 KV 缓存,以确保为其他请求提供足够的空间,这在 vLLM " "中称为抢占。当一个请求被抢占时,默认行为是在未来重新计算该请求的 KV 缓存,这就是性能可能显著下降的原因。有几种方法可以验证这一点:" #: ../../source/faqs.md:305 msgid "" "vLLM usually logs stats on your server. You might see metrics like `GPU " "KV cache usage: 99.0%,`. When reaching 100%, it triggers preemption." msgstr "" "vLLM 通常会在服务器上记录统计信息。您可能会看到类似 `GPU KV cache usage: 99.0%,` 的指标。当达到 100% " "时,会触发抢占。" #: ../../source/faqs.md:306 msgid "" "When launching a vLLM server, you will see logs like `GPU KV cache size: " "66340 tokens` and `Maximum concurrency for 16,384 tokens per request: " "4.05`. These are estimated KV cache capacity for a single DP group. You " "can adjust the overall request traffic according to this." msgstr "" "启动 vLLM 服务器时,您会看到类似 `GPU KV cache size: 66340 tokens` 和 `Maximum " "concurrency for 16,384 tokens per request: 4.05` 的日志。这些是针对单个 DP 组的估计 KV " "缓存容量。您可以据此调整总体请求流量。" #: ../../source/faqs.md:308 msgid "" "Preemption cannot be avoided completely since KV cache usage always has a" " limit. But there are methods to reduce the chances of preemption. 
As is " "suggested in " "[**PREEMPTION**](https://docs.vllm.ai/en/latest/configuration/optimization/#preemption)," " the core strategy is to increase available KV cache. For example, one " "can increase `--gpu-memory-utilization` or decrease `--max-num-seqs` && " "`--max-num-batched-tokens`." msgstr "" "抢占无法完全避免,因为 KV 缓存的使用总是有限制的。但有方法可以减少抢占的发生几率。正如 " "[**抢占**](https://docs.vllm.ai/en/latest/configuration/optimization/#preemption)" " 中所建议的,核心策略是增加可用的 KV 缓存。例如,可以增加 `--gpu-memory-utilization` 或减少 `--max-" "num-seqs` 和 `--max-num-batched-tokens`。" #~ msgid "" #~ "[[v0.7.3.post1] FAQ & Feedback](https://github.com" #~ "/vllm-project/vllm-ascend/issues/1007)" #~ msgstr "" #~ "[[v0.7.3.post1] 常见问题与反馈](https://github.com/vllm-project" #~ "/vllm-ascend/issues/1007)" #~ msgid "7. How does vllm-ascend perform?" #~ msgstr "7. vllm-ascend 的性能如何?" #~ msgid "" #~ "Currently, only some models are " #~ "improved. Such as `Qwen2.5 VL`, `Qwen3`," #~ " `Deepseek V3`. Others are not good" #~ " enough. From 0.9.0rc2, Qwen and " #~ "Deepseek works with graph mode to " #~ "play a good performance. What's more," #~ " you can install `mindie-turbo` with" #~ " `vllm-ascend v0.7.3` to speed up " #~ "the inference as well." #~ msgstr "" #~ "目前,只有部分模型得到了改进,例如 `Qwen2.5 VL`、`Qwen3` 和 " #~ "`Deepseek V3`。其他模型的效果还不够理想。从 0.9.0rc2 版本开始,Qwen " #~ "和 Deepseek 已支持图模式,以获得更好的性能。此外,您还可以在 `vllm-" #~ "ascend v0.7.3` 上安装 `mindie-turbo` " #~ "来进一步加速推理。" #~ msgid "" #~ "Currently, only 1P1D is supported on " #~ "V0 Engine. For V1 Engine or NPND" #~ " support, We will make it stable " #~ "and supported by vllm-ascend in " #~ "the future." #~ msgstr "目前,V0 引擎仅支持 1P1D。对于 V1 引擎或 NPND 的支持,我们将在未来使其稳定并由 vllm-ascend 提供支持。" #~ msgid "" #~ "Currently, w8a8 quantization is already " #~ "supported by vllm-ascend originally on" #~ " v0.8.4rc2 or higher, If you're using" #~ " vllm 0.7.3 version, w8a8 quantization " #~ "is supporeted with the integration of" #~ " vllm-ascend and mindie-turbo, please" #~ " use `pip install vllm-ascend[mindie-" #~ "turbo]`." #~ msgstr "" #~ "目前,w8a8 量化已在 v0.8.4rc2 或更高版本的 vllm-" #~ "ascend 中原生支持。如果您使用的是 vllm 0.7.3 版本,通过集成 " #~ "vllm-ascend 和 mindie-turbo 也支持 w8a8" #~ " 量化,请使用 `pip install vllm-ascend[mindie-" #~ "turbo]`。" #~ msgid "11. How to run w8a8 DeepSeek model?" #~ msgstr "11. 如何运行 w8a8 DeepSeek 模型?" #~ msgid "" #~ "Please following the [inferencing " #~ "tutorial](https://vllm-" #~ "ascend.readthedocs.io/en/latest/tutorials/multi_node.html) and" #~ " replace model to DeepSeek." #~ msgstr "" #~ "请按照[推理教程](https://vllm-" #~ "ascend.readthedocs.io/en/latest/tutorials/multi_node.html)进行操作,并将模型替换为" #~ " DeepSeek。" #~ msgid "" #~ "12. There is no output in log " #~ "when loading models using vllm-ascend," #~ " How to solve it?" #~ msgstr "12. 使用 vllm-ascend 加载模型时日志没有输出,如何解决?" #~ msgid "" #~ "If you're using vllm 0.7.3 version, " #~ "this is a known progress bar " #~ "display issue in VLLM, which has " #~ "been resolved in [this PR](https://github.com" #~ "/vllm-project/vllm/pull/12428), please cherry-" #~ "pick it locally by yourself. Otherwise," #~ " please fill up an issue." #~ msgstr "" #~ "如果您使用的是 vllm 0.7.3 版本,这是 VLLM " #~ "中一个已知的进度条显示问题,已在 [此 PR](https://github.com/vllm-" #~ "project/vllm/pull/12428) 中解决,请自行在本地进行 cherry-" #~ "pick。否则,请提交一个 issue。" #~ msgid "" #~ "You may encounter the following error" #~ " if running DeepSeek with NPU graph" #~ " mode enabled. 
The allowed number of" #~ " queries per kv when enabling both" #~ " MLA and Graph mode only support " #~ "{32, 64, 128}, **Thus this is not" #~ " supported for DeepSeek-V2-Lite**, as it" #~ " only has 16 attention heads. The " #~ "NPU graph mode support on " #~ "DeepSeek-V2-Lite will be done in the " #~ "future." #~ msgstr "" #~ "如果在启用 NPU 图模式的情况下运行 DeepSeek,您可能会遇到以下错误。当同时启用 " #~ "MLA 和图模式时,每个 kv 允许的查询数仅支持 {32, 64, " #~ "128},**因此这不支持 DeepSeek-V2-Lite**,因为它只有 16 " #~ "个注意力头。未来将增加对 DeepSeek-V2-Lite 的 NPU 图模式支持。"