[v0.18.0][Doc] Translated Doc files 2026-04-22 (#8565)

## Auto-Translation Summary

Translated **43** file(s):

- <code>docs/source/locale/zh_CN/LC_MESSAGES/community/versioning_policy.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/KV_Cache_Pool_Guide.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/cpu_binding.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/disaggregated_prefill.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/eplb_swift_balancer.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/npugraph_ex.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/patch.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/quantization.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/contribution/index.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/contribution/multi_node_test.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_ais_bench.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_evalscope.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_lm_eval.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_opencompass.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/faqs.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/installation.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/long_sequence_context_parallel_multi_node.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_colocated_mooncake_multi_instance.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_multi_node.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_single_node.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/DeepSeek-V3.1.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/GLM4.x.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/GLM5.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/PaddleOCR-VL.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen-VL-Dense.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-235B-A22B.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3.5-397B-A17B.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3_embedding.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/configuration/additional_config.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/Fine_grained_TP.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/batch_invariance.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/context_parallel.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/cpu_binding.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/dynamic_batch.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/eplb_swift_balancer.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/kv_pool.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/layer_sharding.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/netloader.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/npugraph_ex.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/sleep_mode.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/ucm_deployment.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/weight_prefetch.po</code>

---

[Workflow run](https://github.com/vllm-project/vllm-ascend/actions/runs/24767290887)

Signed-off-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
Co-authored-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
2026-04-23 11:06:05 +08:00
parent 9e31e4f234
commit 0c458aa6dc
43 changed files with 1389 additions and 1012 deletions
--- a/docs/source/locale/zh_CN/LC_MESSAGES/faqs.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/faqs.po
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version:  vllm-ascend\n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-04-14 09:08+0000\n"
+"POT-Creation-Date: 2026-04-22 08:13+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -57,14 +57,16 @@ msgid ""
 "series (Atlas-A3-cann-kernels) and Atlas 300I (Ascend-cann-kernels-310p) "
 "series are supported:"
 msgstr ""
-"目前，**仅**支持 Atlas A2 系列（Ascend-cann-kernels-910b）、Atlas A3 系列（Atlas-A3-cann-kernels）和 Atlas 300I（Ascend-cann-kernels-310p）系列："
+"目前，**仅**支持 Atlas A2 系列（Ascend-cann-kernels-910b）、Atlas A3 系列（Atlas-A3"
+"-cann-kernels）和 Atlas 300I（Ascend-cann-kernels-310p）系列："

 #: ../../source/faqs.md:14
 msgid ""
 "Atlas A2 Training series (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 "
 "Box16, Atlas 300T A2)"
 msgstr ""
-"Atlas A2 训练系列（Atlas 800T A2、Atlas 900 A2 PoD、Atlas 200T A2 Box16、Atlas 300T A2）"
+"Atlas A2 训练系列（Atlas 800T A2、Atlas 900 A2 PoD、Atlas 200T A2 Box16、Atlas "
+"300T A2）"

 #: ../../source/faqs.md:15
 msgid "Atlas 800I A2 Inference series (Atlas 800I A2)"
@@ -74,8 +76,7 @@ msgstr "Atlas 800I A2 推理系列（Atlas 800I A2）"
 msgid ""
 "Atlas A3 Training series (Atlas 800T A3, Atlas 900 A3 SuperPoD, Atlas "
 "9000 A3 SuperPoD)"
-msgstr ""
-"Atlas A3 训练系列（Atlas 800T A3、Atlas 900 A3 SuperPoD、Atlas 9000 A3 SuperPoD）"
+msgstr "Atlas A3 训练系列（Atlas 800T A3、Atlas 900 A3 SuperPoD、Atlas 9000 A3 SuperPoD）"

 #: ../../source/faqs.md:17
 msgid "Atlas 800I A3 Inference series (Atlas 800I A3)"
@@ -109,7 +110,8 @@ msgid ""
 "supported. Otherwise, we have to implement it by using custom ops. We "
 "also welcome you to join us to improve together."
 msgstr ""
-"从技术角度看，如果 torch-npu 支持某设备，则 vllm-ascend 也支持该设备。否则，我们需要通过自定义算子来实现。我们也欢迎您加入我们，共同改进。"
+"从技术角度看，如果 torch-npu 支持某设备，则 vllm-ascend "
+"也支持该设备。否则，我们需要通过自定义算子来实现。我们也欢迎您加入我们，共同改进。"

 #: ../../source/faqs.md:28
 msgid "2. How to get our docker containers?"
@@ -158,8 +160,7 @@ msgstr "3. vllm-ascend 支持哪些模型？"
 msgid ""
 "Find more details "
 "[<u>here</u>](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_models.html)."
-msgstr ""
-"更多详细信息请参见[<u>此处</u>](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_models.html)。"
+msgstr "更多详细信息请参见[<u>此处</u>](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_models.html)。"

 #: ../../source/faqs.md:74
 msgid "4. How to get in touch with our community?"
@@ -190,13 +191,17 @@ msgstr "参加我们的[<u>每周例会</u>](https://docs.google.com/document/d/
 msgid ""
 "Join our [<u>WeChat</u>](https://github.com/vllm-project/vllm-"
 "ascend/issues/227) group and ask your questions."
-msgstr "加入我们的[<u>微信群</u>](https://github.com/vllm-project/vllm-ascend/issues/227)并提出您的问题。"
+msgstr ""
+"加入我们的[<u>微信群</u>](https://github.com/vllm-project/vllm-"
+"ascend/issues/227)并提出您的问题。"

 #: ../../source/faqs.md:81
 msgid ""
 "Join our ascend channel in [<u>vLLM forums</u>](https://discuss.vllm.ai/c"
 "/hardware-support/vllm-ascend-support/6) and publish your topics."
-msgstr "加入我们在 [<u>vLLM 论坛</u>](https://discuss.vllm.ai/c/hardware-support/vllm-ascend-support/6) 的 ascend 频道并发布您的主题。"
+msgstr ""
+"加入我们在 [<u>vLLM 论坛</u>](https://discuss.vllm.ai/c/hardware-support/vllm-"
+"ascend-support/6) 的 ascend 频道并发布您的主题。"

 #: ../../source/faqs.md:83
 msgid "5. What features does vllm-ascend V1 supports?"
@@ -206,8 +211,7 @@ msgstr "5. vllm-ascend V1 支持哪些功能？"
 msgid ""
 "Find more details "
 "[<u>here</u>](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html)."
-msgstr ""
-"更多详细信息请参见[<u>此处</u>](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html)。"
+msgstr "更多详细信息请参见[<u>此处</u>](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html)。"

 #: ../../source/faqs.md:87
 msgid ""
@@ -256,7 +260,9 @@ msgid ""
 "0.9.1, you should use vllm-ascend 0.9.1 as well. For the main branch, we "
 "ensure that `vllm-ascend` and `vllm` are compatible at every commit."
 msgstr ""
-"`vllm-ascend` 是 vLLM 的一个硬件插件。`vllm-ascend` 的版本与 `vllm` 的版本相同。例如，如果您使用 `vllm` 0.9.1，您也应该使用 vllm-ascend 0.9.1。对于主分支，我们确保 `vllm-ascend` 和 `vllm` 在每次提交时都是兼容的。"
+"`vllm-ascend` 是 vLLM 的一个硬件插件。`vllm-ascend` 的版本与 `vllm` 的版本相同。例如，如果您使用 "
+"`vllm` 0.9.1，您也应该使用 vllm-ascend 0.9.1。对于主分支，我们确保 `vllm-ascend` 和 `vllm` "
+"在每次提交时都是兼容的。"

 #: ../../source/faqs.md:109
 msgid "8. Does vllm-ascend support Prefill Disaggregation feature?"
@@ -269,7 +275,8 @@ msgid ""
 "tutorial](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html)"
 " for example."
 msgstr ""
-"是的，vllm-ascend 支持通过 Mooncake 后端实现 Prefill Disaggregation 功能。示例请参见[官方教程](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html)。"
+"是的，vllm-ascend 支持通过 Mooncake 后端实现 Prefill Disaggregation "
+"功能。示例请参见[官方教程](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html)。"

 #: ../../source/faqs.md:113
 msgid "9. Does vllm-ascend support quantization method?"
@@ -299,7 +306,8 @@ msgid ""
 "features](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html)"
 " through E2E test."
 msgstr ""
-"**功能测试**：我们添加了 CI，包括部分 vllm 的原生单元测试和 vllm-ascend 自身的单元测试。在 vllm-ascend 的测试中，我们通过端到端测试来验证基本功能、主流模型的可用性以及[支持的功能](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html)。"
+"**功能测试**：我们添加了 CI，包括部分 vllm 的原生单元测试和 vllm-ascend 自身的单元测试。在 vllm-ascend "
+"的测试中，我们通过端到端测试来验证基本功能、主流模型的可用性以及[支持的功能](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html)。"

 #: ../../source/faqs.md:123
 msgid ""
@@ -308,13 +316,14 @@ msgid ""
 "benchmark, which can be easily re-run locally. We will publish a perf "
 "website to show the performance test results for each pull request."
 msgstr ""
-"**性能测试**：我们提供了用于端到端性能基准测试的[基准测试](https://github.com/vllm-project/vllm-ascend/tree/main/benchmarks)工具，可以方便地在本地重新运行。我们将发布一个性能网站，展示每个拉取请求的性能测试结果。"
+"**性能测试**：我们提供了用于端到端性能基准测试的[基准测试](https://github.com/vllm-project/vllm-"
+"ascend/tree/main/benchmarks)工具，可以方便地在本地重新运行。我们将发布一个性能网站，展示每个拉取请求的性能测试结果。"

 #: ../../source/faqs.md:125
 msgid ""
 "**Accuracy test**: We are working on adding accuracy test to the CI as "
 "well."
-msgstr "**准确性测试**：我们正在努力将准确性测试也添加到CI中。"
+msgstr "**准确性测试**：我们正在努力将准确性测试也添加到 CI 中。"

 #: ../../source/faqs.md:127
 msgid ""
@@ -341,8 +350,9 @@ msgid ""
 "to the version of the vLLM package you have installed. The format of "
 "`VLLM_VERSION` should be `X.Y.Z`."
 msgstr ""
-"此问题通常是由于安装了开发版或可编辑版本的 vLLM 包引起的。为此，我们提供了环境变量 `VLLM_VERSION`，允许用户指定要使用的 vLLM "
-"包版本。请将环境变量 `VLLM_VERSION` 设置为你已安装的 vLLM 包的版本。`VLLM_VERSION` 的格式应为 `X.Y.Z`。"
+"此问题通常是由于安装了开发版或可编辑版本的 vLLM 包引起的。为此，我们提供了环境变量 `VLLM_VERSION`，允许用户指定要使用的 "
+"vLLM 包版本。请将环境变量 `VLLM_VERSION` 设置为你已安装的 vLLM 包的版本。`VLLM_VERSION` 的格式应为 "
+"`X.Y.Z`。"

 #: ../../source/faqs.md:135
 msgid "12. How to handle the out-of-memory issue?"
@@ -356,20 +366,22 @@ msgid ""
 "documentation](https://docs.vllm.ai/en/latest/usage/troubleshooting/#out-"
 "of-memory)."
 msgstr ""
-"当模型超出单个 NPU 的内存容量时，通常会发生 OOM（内存不足）错误。一般性指导可参考 [vLLM OOM 故障排除文档](https://docs.vllm.ai/en/latest/usage/troubleshooting/#out-of-memory)。"
+"当模型超出单个 NPU 的内存容量时，通常会发生 OOM（内存不足）错误。一般性指导可参考 [vLLM OOM "
+"故障排除文档](https://docs.vllm.ai/en/latest/usage/troubleshooting/#out-of-"
+"memory)。"

 #: ../../source/faqs.md:139
 msgid ""
-"In scenarios where NPUs have limited high bandwidth memory (HBM) "
-"capacity, dynamic memory allocation/deallocation during inference can "
-"exacerbate memory fragmentation, leading to OOM. To address this:"
-msgstr "在 NPU 的高带宽内存容量有限的场景下，推理过程中的动态内存分配/释放会加剧内存碎片，导致 OOM。为解决此问题："
+"In scenarios where NPUs have limited high bandwidth memory (on-chip "
+"memory) capacity, dynamic memory allocation/deallocation during inference"
+" can exacerbate memory fragmentation, leading to OOM. To address this:"
+msgstr "在 NPU 的高带宽内存（片上内存）容量有限的场景下，推理过程中的动态内存分配/释放会加剧内存碎片，导致 OOM。为解决此问题："

 #: ../../source/faqs.md:141
 msgid ""
-"**Limit `--max-model-len`**: It can save the HBM usage for KV cache "
-"initialization step."
-msgstr "**限制 `--max-model-len`**：它可以节省 KV 缓存初始化步骤的 HBM 使用量。"
+"**Limit `--max-model-len`**: It can save the on-chip memory usage for KV "
+"cache initialization step."
+msgstr "**限制 `--max-model-len`**：它可以节省 KV 缓存初始化步骤的片上内存使用量。"

 #: ../../source/faqs.md:143
 msgid ""
@@ -379,7 +391,9 @@ msgid ""
 "Engine Arguments](https://docs.vllm.ai/en/latest/cli/serve/#-gpu-memory-"
 "utilization)."
 msgstr ""
-"**调整 `--gpu-memory-utilization`**：如果未指定，默认值为 `0.9`。你可以降低此值以预留更多内存，从而减少碎片风险。详情参见：[vLLM - 推理与服务 - 引擎参数](https://docs.vllm.ai/en/latest/cli/serve/#-gpu-memory-utilization)。"
+"**调整 `--gpu-memory-utilization`**：如果未指定，默认值为 "
+"`0.9`。你可以降低此值以预留更多内存，从而减少碎片风险。详情参见：[vLLM - 推理与服务 - "
+"引擎参数](https://docs.vllm.ai/en/latest/cli/serve/#-gpu-memory-utilization)。"

 #: ../../source/faqs.md:145
 msgid ""
@@ -401,22 +415,24 @@ msgstr "13. 运行 DeepSeek 时无法启用 NPU 图模式"
 #: ../../source/faqs.md:149
 msgid ""
 "Enabling NPU graph mode for DeepSeek may trigger an error. This is "
-"because when both MLA and NPU graph mode are active, the number of "
-"queries per KV head must be 32, 64, or 128. However, DeepSeek-V2-Lite has"
-" only 16 attention heads, which results in 16 queries per KV—a value "
-"outside the supported range. Support for NPU graph mode on "
-"DeepSeek-V2-Lite will be added in a future update."
+"because when both MLA (Multi-Head Latent Attention) and NPU graph mode "
+"are active, the number of queries per KV head must be 32, 64, or 128. "
+"However, DeepSeek-V2-Lite has only 16 attention heads, which results in "
+"16 queries per KV—a value outside the supported range. Support for NPU "
+"graph mode on DeepSeek-V2-Lite will be added in a future update."
 msgstr ""
-"为 DeepSeek 启用 NPU 图模式可能会触发错误。这是因为当 MLA 和 NPU 图模式同时激活时，每个 KV 头的查询数必须为 32、64 或 "
-"128。然而，DeepSeek-V2-Lite 只有 16 个注意力头，导致每个 KV 有 16 个查询，该值超出了支持范围。对 "
+"为 DeepSeek 启用 NPU 图模式可能会触发错误。这是因为当 MLA（多头潜在注意力）和 NPU 图模式同时激活时，每个 KV 头的查询数必须为 "
+"32、64 或 128。然而，DeepSeek-V2-Lite 只有 16 个注意力头，导致每个 KV 有 16 个查询，该值超出了支持范围。对 "
 "DeepSeek-V2-Lite 的 NPU 图模式支持将在未来的更新中添加。"

 #: ../../source/faqs.md:151
+#, python-brace-format
 msgid ""
 "And if you're using DeepSeek-V3 or DeepSeek-R1, please make sure after "
 "the tensor parallel split, `num_heads`/`num_kv_heads` is {32, 64, 128}."
 msgstr ""
-"如果你正在使用 DeepSeek-V3 或 DeepSeek-R1，请确保在张量并行切分后，`num_heads`/`num_kv_heads` 的值为 {32, 64, 128} 中的一个。"
+"如果你正在使用 DeepSeek-V3 或 DeepSeek-R1，请确保在张量并行切分后，`num_heads`/`num_kv_heads` "
+"的值为 {32, 64, 128} 中的一个。"

 #: ../../source/faqs.md:158
 msgid ""
@@ -431,7 +447,8 @@ msgid ""
 "fails, use `python setup.py install` (recommended) to install, or use "
 "`python setup.py clean` to clear the cache."
 msgstr ""
-"使用 pip 从源码重新安装 vllm-ascend 时，可能会遇到 C/C++ 编译失败的问题。如果安装失败，请使用 `python setup.py install`（推荐）进行安装，或使用 `python setup.py clean` 清除缓存。"
+"使用 pip 从源码重新安装 vllm-ascend 时，可能会遇到 C/C++ 编译失败的问题。如果安装失败，请使用 `python "
+"setup.py install`（推荐）进行安装，或使用 `python setup.py clean` 清除缓存。"

 #: ../../source/faqs.md:162
 msgid "15. How to generate deterministic results when using vllm-ascend?"
@@ -445,8 +462,7 @@ msgstr "有几个因素会影响输出的确定性："
 msgid ""
 "Sampler method: using **greedy sampling** by setting `temperature=0` in "
 "`SamplingParams`, e.g.:"
-msgstr ""
-"采样方法：通过在 `SamplingParams` 中设置 `temperature=0` 来使用 **贪婪采样**，例如："
+msgstr "采样方法：通过在 `SamplingParams` 中设置 `temperature=0` 来使用 **贪婪采样**，例如："

 #: ../../source/faqs.md:191
 msgid "Set the following environment parameters:"
@@ -456,7 +472,9 @@ msgstr "设置以下环境参数："
 msgid ""
 "16. How to fix the error \"ImportError: Please install vllm[audio] for "
 "audio support\" for the Qwen2.5-Omni model？"
-msgstr "16. 对于 Qwen2.5-Omni 模型，如何修复 \"ImportError: Please install vllm[audio] for audio support\" 错误？"
+msgstr ""
+"16. 对于 Qwen2.5-Omni 模型，如何修复 \"ImportError: Please install vllm[audio] for"
+" audio support\" 错误？"

 #: ../../source/faqs.md:202
 msgid ""
@@ -467,7 +485,9 @@ msgid ""
 "`ImportError: No module named 'librosa'` issue and ensuring that the "
 "audio processing functionality works correctly."
 msgstr ""
-"`Qwen2.5-Omni` 模型需要安装 `librosa` 包，你需要安装 `qwen-omni-utils` 包以确保满足所有依赖，运行 `pip install qwen-omni-utils`。此包将安装 `librosa` 及其相关依赖，解决 `ImportError: No module named 'librosa'` 问题，并确保音频处理功能正常工作。"
+"`Qwen2.5-Omni` 模型需要安装 `librosa` 包，你需要安装 `qwen-omni-utils` 包以确保满足所有依赖，运行 "
+"`pip install qwen-omni-utils`。此包将安装 `librosa` 及其相关依赖，解决 `ImportError: No "
+"module named 'librosa'` 问题，并确保音频处理功能正常工作。"

 #: ../../source/faqs.md:205
 msgid ""
@@ -480,10 +500,13 @@ msgid "Recommended mitigation strategies:"
 msgstr "推荐的缓解策略："

 #: ../../source/faqs.md:215
+#, python-brace-format
 msgid ""
 "Manually configure the compilation_config parameter with a reduced size "
 "set: '{\"cudagraph_capture_sizes\":[size1, size2, size3, ...]}'."
-msgstr "手动配置 compilation_config 参数，使用缩减后的尺寸集合：'{\"cudagraph_capture_sizes\":[size1, size2, size3, ...]}'。"
+msgstr ""
+"手动配置 compilation_config "
+"参数，使用缩减后的尺寸集合：'{\"cudagraph_capture_sizes\":[size1, size2, size3, ...]}'。"

 #: ../../source/faqs.md:216
 msgid ""
@@ -502,7 +525,9 @@ msgid ""
 "additional streams outside of this calculation framework, resulting in "
 "stream resource exhaustion during size capture operations."
 msgstr ""
-"根本原因分析：当前尺寸捕获的流需求计算仅考虑了可测量的因素，包括：数据并行大小、张量并行大小、专家并行配置、分段图数量、多流重叠共享专家设置以及 HCCL 通信模式（AIV/AICPU）。然而，许多不可量化的因素，例如算子特性和特定硬件特性，在此计算框架之外消耗了额外的流，导致尺寸捕获操作期间流资源耗尽。"
+"根本原因分析：当前尺寸捕获的流需求计算仅考虑了可测量的因素，包括：数据并行大小、张量并行大小、专家并行配置、分段图数量、多流重叠共享专家设置以及 "
+"HCCL "
+"通信模式（AIV/AICPU）。然而，许多不可量化的元素，例如算子特性和特定硬件特性，在此计算框架之外消耗了额外的流，导致尺寸捕获操作期间流资源耗尽。"

 #: ../../source/faqs.md:221
 msgid "18. How to install custom version of torch_npu?"
@@ -513,7 +538,9 @@ msgid ""
 "torch-npu will be overridden  when installing vllm-ascend. If you need to"
 " install a specific version of torch-npu, you can manually install the "
 "specified version of torch-npu after vllm-ascend is installed."
-msgstr "安装 vllm-ascend 时会覆盖 torch-npu。如果你需要安装特定版本的 torch-npu，可以在 vllm-ascend 安装后手动安装指定版本的 torch-npu。"
+msgstr ""
+"安装 vllm-ascend 时会覆盖 torch-npu。如果你需要安装特定版本的 torch-npu，可以在 vllm-ascend "
+"安装后手动安装指定版本的 torch-npu。"

 #: ../../source/faqs.md:225
 msgid ""
@@ -565,7 +592,9 @@ msgid ""
 "security risk. Only use this option if you understand the implications "
 "and trust the container's source."
 msgstr ""
-"使用 `--shm-size` 时，你可能需要在 `docker run` 命令中添加 `--privileged=true` 标志，以授予容器必要的权限。请注意，使用 `--privileged=true` 会授予容器在主机系统上的广泛权限，这可能带来安全风险。只有在理解其影响并信任容器来源的情况下才使用此选项。"
+"使用 `--shm-size` 时，你可能需要在 `docker run` 命令中添加 `--privileged=true` "
+"标志，以授予容器必要的权限。请注意，使用 `--privileged=true` "
+"会授予容器在主机系统上的广泛权限，这可能带来安全风险。只有在理解其影响并信任容器来源的情况下才使用此选项。"

 #: ../../source/faqs.md:256
 msgid "21. How to achieve low latency in a small batch scenario?"
@@ -580,7 +609,10 @@ msgid ""
 "`tools/install_flash_infer_attention_score_ops_a3.sh`, you can install it"
 " using the following instruction:"
 msgstr ""
-"`torch_npu.npu_fused_infer_attention_score` 在小批量场景下的性能不理想，主要是由于缺乏 Flash Decoding 功能。我们在 `tools/install_flash_infer_attention_score_ops_a2.sh` 和 `tools/install_flash_infer_attention_score_ops_a3.sh` 中提供了一个替代算子，你可以使用以下指令安装它："
+"`torch_npu.npu_fused_infer_attention_score` 在小批量场景下的性能不理想，主要是由于缺乏 Flash "
+"Decoding 功能。我们在 `tools/install_flash_infer_attention_score_ops_a2.sh` 和 "
+"`tools/install_flash_infer_attention_score_ops_a3.sh` "
+"中提供了一个替代算子，你可以使用以下指令安装它："

 #: ../../source/faqs.md:266
 msgid ""
@@ -593,7 +625,12 @@ msgid ""
 "create one. If you're not the root user, you need `sudo` **privileges** "
 "to run this script."
 msgstr ""
-"**注意**：使用此方法时不要设置 `additional_config.pa_shape_list`；否则会导致使用另一个注意力算子。**重要**：请确保你使用的是 `vllm-ascend` 的**官方镜像**；否则，你**必须将** `tools/install_flash_infer_attention_score_ops_a2.sh` 或 `tools/install_flash_infer_attention_score_ops_a3.sh` 中的目录 `/vllm-workspace` **更改为你自己的目录**，或者创建一个。如果你不是 root 用户，则需要 `sudo` **权限**来运行此脚本。"
+"**注意**：使用此方法时不要设置 "
+"`additional_config.pa_shape_list`；否则会导致使用另一个注意力算子。**重要**：请确保你使用的是 `vllm-"
+"ascend` 的**官方镜像**；否则，你**必须将** "
+"`tools/install_flash_infer_attention_score_ops_a2.sh` 或 "
+"`tools/install_flash_infer_attention_score_ops_a3.sh` 中的目录 `/vllm-"
+"workspace` **更改为你自己的目录**，或者创建一个。如果你不是 root 用户，则需要 `sudo` **权限**来运行此脚本。"

 #: ../../source/faqs.md:269
 msgid ""
@@ -608,7 +645,8 @@ msgid ""
 "(common in CPU-only build environments), you must set `SOC_VERSION` "
 "manually before installation."
 msgstr ""
-"从源码构建时（例如 `pip install -e .`），构建过程可能会尝试通过 `npu-smi` 推断目标芯片。如果 `npu-smi` 不可用（在仅含 CPU 的构建环境中很常见），则必须在安装前手动设置 `SOC_VERSION`。"
+"从源码构建时（例如 `pip install -e .`），构建过程可能会尝试通过 `npu-smi` 推断目标芯片。如果 `npu-smi` "
+"不可用（在仅含 CPU 的构建环境中很常见），则必须在安装前手动设置 `SOC_VERSION`。"

 #: ../../source/faqs.md:273
 msgid "You can use the defaults from `Dockerfile*` as a reference. For example:"
@@ -626,7 +664,9 @@ msgid ""
 "issue, please use the official docker images or install the specific "
 "triton-ascend version as following:"
 msgstr ""
-"如 [#7782](https://github.com/vllm-project/vllm-ascend/issues/7782) 所示，triton-ascend 偶尔会遇到编译错误，这是 triton-ascend 3.2.0 中的一个已知问题。为避免此问题，请使用官方 docker 镜像或按以下方式安装特定的 triton-ascend 版本："
+"如 [#7782](https://github.com/vllm-project/vllm-ascend/issues/7782) 所示"
+"，triton-ascend 偶尔会遇到编译错误，这是 triton-ascend 3.2.0 中的一个已知问题。为避免此问题，请使用官方 "
+"docker 镜像或按以下方式安装特定的 triton-ascend 版本："

 #: ../../source/faqs.md:300
 msgid "24. Why TPOT increases drastically as concurrency grows?"
@@ -647,14 +687,20 @@ msgid ""
 "the future, which is why the performance might drop significantly. There "
 "are several ways to verify this:"
 msgstr ""
-"在测试 vLLM 服务器时，可能会发现 TPOT 随着并发度的增加而增加（例如，并发度增加 4 时，TPOT 增加 0.5 ~ 1ms）。在大多数情况下，这种现象是正常的。然而，有时随着并发度的增长，TPOT 可能会急剧增加（例如增加 10 到 100ms）。这可能是由 vLLM 中的 [**抢占**](https://docs.vllm.ai/en/latest/configuration/optimization/#preemption) 引起的。通常，当服务器达到 KV 缓存限制时，vLLM 会尝试释放请求的 KV 缓存，以确保为其他请求提供足够的空间，这在 vLLM 中称为抢占。当一个请求被抢占时，默认行为是在未来重新计算该请求的 KV 缓存，这就是性能可能显著下降的原因。有几种方法可以验证这一点："
+"在测试 vLLM 服务器时，可能会发现 TPOT 随着并发度的增加而增加（例如，并发度增加 4 时，TPOT 增加 0.5 ~ "
+"1ms）。在大多数情况下，这种现象是正常的。然而，有时随着并发度的增长，TPOT 可能会急剧增加（例如增加 10 到 100ms）。这可能是由 "
+"vLLM 中的 "
+"[**抢占**](https://docs.vllm.ai/en/latest/configuration/optimization/#preemption)"
+" 引起的。通常，当服务器达到 KV 缓存限制时，vLLM 会尝试释放请求的 KV 缓存，以确保为其他请求提供足够的空间，这在 vLLM "
+"中称为抢占。当一个请求被抢占时，默认行为是在未来重新计算该请求的 KV 缓存，这就是性能可能显著下降的原因。有几种方法可以验证这一点："

 #: ../../source/faqs.md:305
 msgid ""
 "vLLM usually logs stats on your server. You might see metrics like `GPU "
 "KV cache usage: 99.0%,`. When reaching 100%, it triggers preemption."
 msgstr ""
-"vLLM 通常会在服务器上记录统计信息。您可能会看到类似 `GPU KV cache usage: 99.0%,` 的指标。当达到 100% 时，会触发抢占。"
+"vLLM 通常会在服务器上记录统计信息。您可能会看到类似 `GPU KV cache usage: 99.0%,` 的指标。当达到 100% "
+"时，会触发抢占。"

 #: ../../source/faqs.md:306
 msgid ""
@@ -663,7 +709,9 @@ msgid ""
 "4.05`. These are estimated KV cache capacity for a single DP group. You "
 "can adjust the overall request traffic according to this."
 msgstr ""
-"启动 vLLM 服务器时，您会看到类似 `GPU KV cache size: 66340 tokens` 和 `Maximum concurrency for 16,384 tokens per request: 4.05` 的日志。这些是针对单个 DP 组的估计 KV 缓存容量。您可以据此调整总体请求流量。"
+"启动 vLLM 服务器时，您会看到类似 `GPU KV cache size: 66340 tokens` 和 `Maximum "
+"concurrency for 16,384 tokens per request: 4.05` 的日志。这些是针对单个 DP 组的估计 KV "
+"缓存容量。您可以据此调整总体请求流量。"

 #: ../../source/faqs.md:308
 msgid ""
@@ -675,7 +723,10 @@ msgid ""
 "can increase `--gpu-memory-utilization` or decrease `--max-num-seqs` && "
 "`--max-num-batched-tokens`."
 msgstr ""
-"抢占无法完全避免，因为 KV 缓存的使用总是有限制的。但有方法可以减少抢占的发生几率。正如 [**抢占**](https://docs.vllm.ai/en/latest/configuration/optimization/#preemption) 中所建议的，核心策略是增加可用的 KV 缓存。例如，可以增加 `--gpu-memory-utilization` 或减少 `--max-num-seqs` 和 `--max-num-batched-tokens`。"
+"抢占无法完全避免，因为 KV 缓存的使用总是有限制的。但有方法可以减少抢占的发生几率。正如 "
+"[**抢占**](https://docs.vllm.ai/en/latest/configuration/optimization/#preemption)"
+" 中所建议的，核心策略是增加可用的 KV 缓存。例如，可以增加 `--gpu-memory-utilization` 或减少 `--max-"
+"num-seqs` 和 `--max-num-batched-tokens`。"

 #~ msgid ""
 #~ "[[v0.7.3.post1] FAQ & Feedback](https://github.com"
@@ -701,7 +752,8 @@ msgstr ""
 #~ "目前，只有部分模型得到了改进，例如 `Qwen2.5 VL`、`Qwen3` 和 "
 #~ "`Deepseek V3`。其他模型的效果还不够理想。从 0.9.0rc2 版本开始，Qwen "
 #~ "和 Deepseek 已支持图模式，以获得更好的性能。此外，您还可以在 `vllm-"
-#~ "ascend v0.7.3` 上安装 `mindie-turbo` 来进一步加速推理。"
+#~ "ascend v0.7.3` 上安装 `mindie-turbo` "
+#~ "来进一步加速推理。"

 #~ msgid ""
 #~ "Currently, only 1P1D is supported on "
@@ -721,7 +773,11 @@ msgstr ""
 #~ " use `pip install vllm-ascend[mindie-"
 #~ "turbo]`."
 #~ msgstr ""
-#~ "目前，w8a8 量化已在 v0.8.4rc2 或更高版本的 vllm-ascend 中原生支持。如果您使用的是 vllm 0.7.3 版本，通过集成 vllm-ascend 和 mindie-turbo 也支持 w8a8 量化，请使用 `pip install vllm-ascend[mindie-turbo]`。"
+#~ "目前，w8a8 量化已在 v0.8.4rc2 或更高版本的 vllm-"
+#~ "ascend 中原生支持。如果您使用的是 vllm 0.7.3 版本，通过集成 "
+#~ "vllm-ascend 和 mindie-turbo 也支持 w8a8"
+#~ " 量化，请使用 `pip install vllm-ascend[mindie-"
+#~ "turbo]`。"

 #~ msgid "11. How to run w8a8 DeepSeek model?"
 #~ msgstr "11. 如何运行 w8a8 DeepSeek 模型？"
@@ -733,7 +789,8 @@ msgstr ""
 #~ " replace model to DeepSeek."
 #~ msgstr ""
 #~ "请按照[推理教程](https://vllm-"
-#~ "ascend.readthedocs.io/en/latest/tutorials/multi_node.html)进行操作，并将模型替换为 DeepSeek。"
+#~ "ascend.readthedocs.io/en/latest/tutorials/multi_node.html)进行操作，并将模型替换为"
+#~ " DeepSeek。"

 #~ msgid ""
 #~ "12. There is no output in log "
@@ -750,7 +807,10 @@ msgstr ""
 #~ "pick it locally by yourself. Otherwise,"
 #~ " please fill up an issue."
 #~ msgstr ""
-#~ "如果您使用的是 vllm 0.7.3 版本，这是 VLLM 中一个已知的进度条显示问题，已在 [此 PR](https://github.com/vllm-project/vllm/pull/12428) 中解决，请自行在本地进行 cherry-pick。否则，请提交一个 issue。"
+#~ "如果您使用的是 vllm 0.7.3 版本，这是 VLLM "
+#~ "中一个已知的进度条显示问题，已在 [此 PR](https://github.com/vllm-"
+#~ "project/vllm/pull/12428) 中解决，请自行在本地进行 cherry-"
+#~ "pick。否则，请提交一个 issue。"

 #~ msgid ""
 #~ "You may encounter the following error"
@@ -765,4 +825,7 @@ msgstr ""
 #~ "DeepSeek-V2-Lite will be done in the "
 #~ "future."
 #~ msgstr ""
-#~ "如果在启用 NPU 图模式的情况下运行 DeepSeek，您可能会遇到以下错误。当同时启用 MLA 和图模式时，每个 kv 允许的查询数仅支持 {32, 64, 128}，**因此这不支持 DeepSeek-V2-Lite**，因为它只有 16 个注意力头。未来将增加对 DeepSeek-V2-Lite 的 NPU 图模式支持。"
+#~ "如果在启用 NPU 图模式的情况下运行 DeepSeek，您可能会遇到以下错误。当同时启用 "
+#~ "MLA 和图模式时，每个 kv 允许的查询数仅支持 {32, 64, "
+#~ "128}，**因此这不支持 DeepSeek-V2-Lite**，因为它只有 16 "
+#~ "个注意力头。未来将增加对 DeepSeek-V2-Lite 的 NPU 图模式支持。"