# [v0.18.0][Doc] Translated Doc files 2026-04-15 (#8309)
## Auto-Translation Summary

Translated **19** file(s):

- `docs/source/locale/zh_CN/LC_MESSAGES/community/contributors.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/community/versioning_policy.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/KV_Cache_Pool_Guide.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/ModelRunner_prepare_inputs.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/cpu_binding.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/long_sequence_context_parallel_multi_node.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/long_sequence_context_parallel_single_node.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_multi_node.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_single_node.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Kimi-K2.5.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen2.5-Omni.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-Dense.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3.5-397B-A17B.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/Fine_grained_TP.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/epd_disaggregation.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/external_dp.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/large_scale_ep.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/release_notes.po`

---

[Workflow run](https://github.com/vllm-project/vllm-ascend/actions/runs/24447109402)

Signed-off-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
Co-authored-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -53,7 +53,12 @@ msgid ""
"memory usage, it would introduce additional communication and small "
"operator overhead. Therefore, we will not enable the DCP feature on node "
"d."
msgstr "以 Deepseek-V3.1-w8a8 模型为例,使用 3 台 Atlas 800T A3 服务器部署“1P1D”架构。节点 p 跨多台机器部署,而节点 d 部署在单台机器上。假设预填充服务器的 IP 为 192.0.0.1(预填充 1)和 192.0.0.2(预填充 2),解码器服务器为 192.0.0.3(解码器 1)。每台服务器使用 8 个 NPU(16 个芯片)部署一个服务实例。在当前示例中,我们将在节点 p 上启用上下文并行特性以改善 TTFT。虽然在节点 d 上启用 DCP 特性可以减少内存使用,但会引入额外的通信和小算子开销。因此,我们不会在节点 d 上启用 DCP 特性。"
msgstr ""
"以 Deepseek-V3.1-w8a8 模型为例,使用 3 台 Atlas 800T A3 服务器部署“1P1D”架构。节点 p "
"跨多台机器部署,而节点 d 部署在单台机器上。假设预填充服务器的 IP 为 192.0.0.1(预填充 1)和 192.0.0.2(预填充 "
"2),解码器服务器为 192.0.0.3(解码器 1)。每台服务器使用 8 个 NPU(16 个芯片)部署一个服务实例。在当前示例中,我们将在节点"
" p 上启用上下文并行特性以改善 TTFT。虽然在节点 d 上启用 DCP "
"特性可以减少内存使用,但会引入额外的通信和小算子开销。因此,我们不会在节点 d 上启用 DCP 特性。"

#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:13
msgid "Environment Preparation"
@@ -69,7 +74,11 @@ msgid ""
"model weight](https://www.modelscope.cn/models/Eco-"
"Tech/DeepSeek-V3.1-w8a8). Please modify `torch_dtype` from `float16` to "
"`bfloat16` in `config.json`."
msgstr "`DeepSeek-V3.1_w8a8mix_mtp`(混合 MTP 量化版本):[下载模型权重](https://www.modelscope.cn/models/Eco-Tech/DeepSeek-V3.1-w8a8)。请在 `config.json` 中将 `torch_dtype` 从 `float16` 修改为 `bfloat16`。"
msgstr ""
"`DeepSeek-V3.1_w8a8mix_mtp`(混合 MTP "
"量化版本):[下载模型权重](https://www.modelscope.cn/models/Eco-"
"Tech/DeepSeek-V3.1-w8a8)。请在 `config.json` 中将 `torch_dtype` 从 `float16` "
"修改为 `bfloat16`。"

#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:19
msgid ""
@@ -86,7 +95,9 @@ msgid ""
"Refer to [verify multi-node communication "
"environment](../../installation.md#verify-multi-node-communication) to "
"verify multi-node communication."
msgstr "请参考[验证多节点通信环境](../../installation.md#verify-multi-node-communication)来验证多节点通信。"
msgstr ""
"请参考[验证多节点通信环境](../../installation.md#verify-multi-node-"
"communication)来验证多节点通信。"
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:25
|
||||
msgid "Installation"
|
||||
@@ -101,7 +112,9 @@ msgid ""
|
||||
"Select an image based on your machine type and start the Docker image on "
|
||||
"your node, refer to [using Docker](../../installation.md#set-up-using-"
|
||||
"docker)."
|
||||
msgstr "根据您的机器类型选择镜像并在节点上启动 Docker 镜像,请参考[使用 Docker](../../installation.md#set-up-using-docker)。"
|
||||
msgstr ""
|
||||
"根据您的机器类型选择镜像并在节点上启动 Docker 镜像,请参考[使用 Docker](../../installation.md#set-"
|
||||
"up-using-docker)。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:64
|
||||
msgid "You need to set up environment on each node."
|
||||
@@ -119,7 +132,10 @@ msgid ""
|
||||
"socket listeners. To avoid any issues, port conflicts should be "
|
||||
"prevented. Additionally, ensure that each node's engine_id is uniquely "
|
||||
"assigned to avoid conflicts."
|
||||
msgstr "我们可以分别在预填充器/解码器节点上运行以下脚本来启动服务器。请注意,每个 P/D 节点将占用从 kv_port 到 kv_port + num_chips 的端口范围来初始化 socket 监听器。为避免任何问题,应防止端口冲突。此外,请确保每个节点的 engine_id 被唯一分配以避免冲突。"
|
||||
msgstr ""
|
||||
"我们可以分别在预填充器/解码器节点上运行以下脚本来启动服务器。请注意,每个 P/D 节点将占用从 kv_port 到 kv_port + "
|
||||
"num_chips 的端口范围来初始化 socket 监听器。为避免任何问题,应防止端口冲突。此外,请确保每个节点的 engine_id "
|
||||
"被唯一分配以避免冲突。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:70
|
||||
msgid ""
|
||||
@@ -154,7 +170,10 @@ msgid ""
|
||||
"[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-"
|
||||
"project/vllm-"
|
||||
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
msgstr "在与预填充服务实例相同的节点上运行代理服务器。您可以在仓库的示例中找到代理程序:[load_balance_proxy_server_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
msgstr ""
|
||||
"在与预填充服务实例相同的节点上运行代理服务器。您可以在仓库的示例中找到代理程序:[load_balance_proxy_server_example.py](https://github.com"
|
||||
"/vllm-project/vllm-"
|
||||
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:301
|
||||
msgid "**Notice:** The parameters are explained as follows:"
|
||||
@@ -193,21 +212,29 @@ msgid ""
|
||||
"state is also counted in metrics such as TTFT and TPOT. Therefore, when "
|
||||
"testing performance, it is generally recommended that `--max-num-seqs` * "
|
||||
"`--data-parallel-size` >= the actual total concurrency."
|
||||
msgstr "`--max-num-seqs` 表示每个 DP 组允许处理的最大请求数。如果发送到服务的请求数量超过此限制,超出的请求将保持在等待状态,不会被调度。请注意,在等待状态所花费的时间也会计入 TTFT 和 TPOT 等指标。因此,在测试性能时,通常建议 `--max-num-seqs` * `--data-parallel-size` >= 实际总并发数。"
|
||||
msgstr ""
|
||||
"`--max-num-seqs` 表示每个 DP "
|
||||
"组允许处理的最大请求数。如果发送到服务的请求数量超过此限制,超出的请求将保持在等待状态,不会被调度。请注意,在等待状态所花费的时间也会计入 "
|
||||
"TTFT 和 TPOT 等指标。因此,在测试性能时,通常建议 `--max-num-seqs` * `--data-parallel-size` "
|
||||
">= 实际总并发数。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:309
|
||||
msgid ""
|
||||
"`--max-num-batched-tokens` represents the maximum number of tokens that "
|
||||
"the model can process in a single step. Currently, vLLM v1 scheduling "
|
||||
"enables ChunkPrefill/SplitFuse by default, which means:"
|
||||
msgstr "`--max-num-batched-tokens` 表示模型单步可以处理的最大 token 数。目前,vLLM v1 调度默认启用 ChunkPrefill/SplitFuse,这意味着:"
|
||||
msgstr ""
|
||||
"`--max-num-batched-tokens` 表示模型单步可以处理的最大 token 数。目前,vLLM v1 调度默认启用 "
|
||||
"ChunkPrefill/SplitFuse,这意味着:"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:310
|
||||
msgid ""
|
||||
"(1) If the input length of a request is greater than `--max-num-batched-"
|
||||
"tokens`, it will be divided into multiple rounds of computation according"
|
||||
" to `--max-num-batched-tokens`;"
|
||||
msgstr "(1)如果请求的输入长度大于 `--max-num-batched-tokens`,它将根据 `--max-num-batched-tokens` 被分成多轮计算;"
|
||||
msgstr ""
|
||||
"(1)如果请求的输入长度大于 `--max-num-batched-tokens`,它将根据 `--max-num-batched-tokens`"
|
||||
" 被分成多轮计算;"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:311
|
||||
msgid ""
|
||||
@@ -236,14 +263,22 @@ msgid ""
|
||||
"during actual inference (e.g., due to uneven EP load), setting `--gpu-"
|
||||
"memory-utilization` too high may lead to OOM (Out of Memory) issues "
|
||||
"during actual inference. The default value is `0.9`."
|
||||
msgstr "`--gpu-memory-utilization` 表示 vLLM 将用于实际推理的 HBM 比例。其核心功能是计算可用的 kv_cache 大小。在预热阶段(vLLM 中称为 profile run),vLLM 会记录输入大小为 `--max-num-batched-tokens` 的推理过程中的峰值 GPU 内存使用量。然后,可用的 kv_cache 大小计算为:`--gpu-memory-utilization` * HBM 大小 - 峰值 GPU 内存使用量。因此,`--gpu-memory-utilization` 的值越大,可用的 kv_cache 就越多。然而,由于预热阶段的 GPU 内存使用量可能与实际推理期间不同(例如,由于 EP 负载不均),将 `--gpu-memory-utilization` 设置得过高可能导致实际推理时出现 OOM(内存不足)问题。默认值为 `0.9`。"
|
||||
msgstr ""
|
||||
"`--gpu-memory-utilization` 表示 vLLM 将用于实际推理的 HBM 比例。其核心功能是计算可用的 kv_cache "
|
||||
"大小。在预热阶段(vLLM 中称为 profile run),vLLM 会记录输入大小为 `--max-num-batched-tokens` "
|
||||
"的推理过程中的峰值 GPU 内存使用量。然后,可用的 kv_cache 大小计算为:`--gpu-memory-utilization` * "
|
||||
"HBM 大小 - 峰值 GPU 内存使用量。因此,`--gpu-memory-utilization` 的值越大,可用的 kv_cache "
|
||||
"就越多。然而,由于预热阶段的 GPU 内存使用量可能与实际推理期间不同(例如,由于 EP 负载不均),将 `--gpu-memory-"
|
||||
"utilization` 设置得过高可能导致实际推理时出现 OOM(内存不足)问题。默认值为 `0.9`。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:314
|
||||
msgid ""
|
||||
"`--enable-expert-parallel` indicates that EP is enabled. Note that vLLM "
|
||||
"does not support a mixed approach of ETP and EP; that is, MoE can either "
|
||||
"use pure EP or pure TP."
|
||||
msgstr "`--enable-expert-parallel` 表示启用了 EP。请注意,vLLM 不支持 ETP 和 EP 的混合方法;也就是说,MoE 只能使用纯 EP 或纯 TP。"
|
||||
msgstr ""
|
||||
"`--enable-expert-parallel` 表示启用了 EP。请注意,vLLM 不支持 ETP 和 EP 的混合方法;也就是说,MoE "
|
||||
"只能使用纯 EP 或纯 TP。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:315
|
||||
msgid ""
|
||||
@@ -266,7 +301,11 @@ msgid ""
|
||||
"\"PIECEWISE\" and \"FULL_DECODE_ONLY\" are supported. The graph mode is "
|
||||
"mainly used to reduce the cost of operator dispatch. Currently, "
|
||||
"\"FULL_DECODE_ONLY\" is recommended."
|
||||
msgstr "`--compilation-config` 包含与 aclgraph 图模式相关的配置。最重要的配置是 \"cudagraph_mode\" 和 \"cudagraph_capture_sizes\",其含义如下:\"cudagraph_mode\":表示特定的图模式。目前支持 \"PIECEWISE\" 和 \"FULL_DECODE_ONLY\"。图模式主要用于降低算子调度的开销。目前推荐使用 \"FULL_DECODE_ONLY\"。"
|
||||
msgstr ""
|
||||
"`--compilation-config` 包含与 aclgraph 图模式相关的配置。最重要的配置是 \"cudagraph_mode\" 和"
|
||||
" \"cudagraph_capture_sizes\",其含义如下:\"cudagraph_mode\":表示特定的图模式。目前支持 "
|
||||
"\"PIECEWISE\" 和 \"FULL_DECODE_ONLY\"。图模式主要用于降低算子调度的开销。目前推荐使用 "
|
||||
"\"FULL_DECODE_ONLY\"。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:319
|
||||
msgid ""
|
||||
@@ -276,14 +315,19 @@ msgid ""
|
||||
" inputs between levels are automatically padded to the next level. "
|
||||
"Currently, the default setting is recommended. Only in some scenarios is "
|
||||
"it necessary to set this separately to achieve optimal performance."
|
||||
msgstr "\"cudagraph_capture_sizes\":表示不同级别的图模式。默认值为 [1, 2, 4, 8, 16, 24, 32, 40,..., `--max-num-seqs`]。在图模式下,不同级别图的输入是固定的,级别之间的输入会自动填充到下一级别。目前推荐使用默认设置。仅在部分场景中,需要单独设置此参数以达到最佳性能。"
|
||||
msgstr ""
|
||||
"\"cudagraph_capture_sizes\":表示不同级别的图模式。默认值为 [1, 2, 4, 8, 16, 24, 32, "
|
||||
"40,..., `--max-num-"
|
||||
"seqs`]。在图模式下,不同级别图的输入是固定的,级别之间的输入会自动填充到下一级别。目前推荐使用默认设置。仅在部分场景中,需要单独设置此参数以达到最佳性能。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:320
|
||||
msgid ""
|
||||
"`export VLLM_ASCEND_ENABLE_FLASHCOMM1=1` indicates that Flashcomm1 "
|
||||
"optimization is enabled. Currently, this optimization is only supported "
|
||||
"for MoE in scenarios where tensor-parallel-size > 1."
|
||||
msgstr "`export VLLM_ASCEND_ENABLE_FLASHCOMM1=1` 表示启用了 Flashcomm1 优化。目前,此优化仅在 tensor-parallel-size > 1 的场景下对 MoE 提供支持。"
|
||||
msgstr ""
|
||||
"`export VLLM_ASCEND_ENABLE_FLASHCOMM1=1` 表示启用了 Flashcomm1 优化。目前,此优化仅在 "
|
||||
"tensor-parallel-size > 1 的场景下对 MoE 提供支持。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:321
|
||||
msgid ""
|
||||
@@ -291,7 +335,9 @@ msgid ""
|
||||
"parallel is enabled. This environment variable is required in the PD "
|
||||
"architecture but not needed in the PD co-locate deployment scenario. It "
|
||||
"will be removed in the future."
|
||||
msgstr "`export VLLM_ASCEND_ENABLE_CONTEXT_PARALLEL=1` 表示启用了上下文并行。此环境变量在 PD 架构中是必需的,但在 PD 共置部署场景中不需要。未来将被移除。"
|
||||
msgstr ""
|
||||
"`export VLLM_ASCEND_ENABLE_CONTEXT_PARALLEL=1` 表示启用了上下文并行。此环境变量在 PD "
|
||||
"架构中是必需的,但在 PD 共置部署场景中不需要。未来将被移除。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:323
|
||||
msgid "**Notice:**"
|
||||
@@ -314,22 +360,18 @@ msgid "Accuracy Evaluation"
|
||||
msgstr "精度评估"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:330
|
||||
msgid "Here are two accuracy evaluation methods."
|
||||
msgstr "以下是两种精度评估方法。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:332
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:344
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:342
|
||||
msgid "Using AISBench"
|
||||
msgstr "使用 AISBench"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:334
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:332
|
||||
msgid ""
|
||||
"Refer to [Using "
|
||||
"AISBench](../../developer_guide/evaluation/using_ais_bench.md) for "
|
||||
"details."
|
||||
msgstr "详情请参考[使用 AISBench](../../developer_guide/evaluation/using_ais_bench.md)。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:336
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:334
|
||||
msgid ""
|
||||
"After execution, you can get the result, here is the result of "
|
||||
"`DeepSeek-V3.1-w8a8` for reference only."
|
||||
@@ -375,52 +417,55 @@ msgstr "生成"
|
||||
msgid "86.67"
|
||||
msgstr "86.67"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:342
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:340
|
||||
msgid "Performance"
|
||||
msgstr "性能"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:346
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:344
|
||||
msgid ""
|
||||
"Refer to [Using AISBench for performance "
|
||||
"evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-"
|
||||
"performance-evaluation) for details."
|
||||
msgstr "详情请参阅[使用 AISBench 进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation)。"
|
||||
msgstr ""
|
||||
"详情请参阅[使用 AISBench "
|
||||
"进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-"
|
||||
"performance-evaluation)。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:348
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:346
|
||||
msgid "Using vLLM Benchmark"
|
||||
msgstr "使用 vLLM 基准测试"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:350
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:348
|
||||
msgid "Run performance evaluation of `DeepSeek-V3.1-w8a8` as an example."
|
||||
msgstr "以运行 `DeepSeek-V3.1-w8a8` 的性能评估为例。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:352
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:350
|
||||
msgid ""
|
||||
"Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/benchmarking/) "
|
||||
"for more details."
|
||||
msgstr "更多详情请参阅 [vllm 基准测试](https://docs.vllm.ai/en/latest/benchmarking/)。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:354
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:352
|
||||
msgid "There are three `vllm bench` subcommands:"
|
||||
msgstr "`vllm bench` 包含三个子命令:"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:356
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:354
|
||||
msgid "`latency`: Benchmark the latency of a single batch of requests."
|
||||
msgstr "`latency`:对单批请求的延迟进行基准测试。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:357
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:355
|
||||
msgid "`serve`: Benchmark the online serving throughput."
|
||||
msgstr "`serve`:对在线服务吞吐量进行基准测试。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:358
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:356
|
||||
msgid "`throughput`: Benchmark offline inference throughput."
|
||||
msgstr "`throughput`:对离线推理吞吐量进行基准测试。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:360
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:358
|
||||
msgid "Take the `serve` as an example. Run the code as follows."
|
||||
msgstr "以 `serve` 为例,按如下方式运行代码。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:367
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:365
|
||||
msgid ""
|
||||
"After about several minutes, you can get the performance evaluation "
|
||||
"result."
|
||||
|
||||
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -38,7 +38,9 @@ msgid ""
|
||||
"Using the `Qwen3-235B-A22B-w8a8` (Quantized version) model as an example,"
|
||||
" use 1 Atlas 800 A3 (64G × 16) server to deploy the single node \"pd co-"
|
||||
"locate\" architecture."
|
||||
msgstr "以 `Qwen3-235B-A22B-w8a8`(量化版本)模型为例,使用 1 台 Atlas 800 A3(64G × 16)服务器部署单节点 \"pd co-locate\" 架构。"
|
||||
msgstr ""
|
||||
"以 `Qwen3-235B-A22B-w8a8`(量化版本)模型为例,使用 1 台 Atlas 800 A3(64G × 16)服务器部署单节点 "
|
||||
"\"pd co-locate\" 架构。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:9
|
||||
msgid "Environment Preparation"
|
||||
@@ -53,7 +55,10 @@ msgid ""
|
||||
"`Qwen3-235B-A22B-w8a8` (Quantized version): requires 1 Atlas 800 A3 (64G "
|
||||
"× 16) node. [Download model weight](https://modelscope.cn/models/vllm-"
|
||||
"ascend/Qwen3-235B-A22B-W8A8)"
|
||||
msgstr "`Qwen3-235B-A22B-w8a8`(量化版本):需要 1 个 Atlas 800 A3(64G × 16)节点。[下载模型权重](https://modelscope.cn/models/vllm-ascend/Qwen3-235B-A22B-W8A8)"
|
||||
msgstr ""
|
||||
"`Qwen3-235B-A22B-w8a8`(量化版本):需要 1 个 Atlas 800 A3(64G × "
|
||||
"16)节点。[下载模型权重](https://modelscope.cn/models/vllm-ascend/Qwen3-235B-A22B-"
|
||||
"W8A8)"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:15
|
||||
msgid ""
|
||||
@@ -69,6 +74,42 @@ msgstr "使用 Docker 运行"
|
||||
msgid "Start a Docker container on each node."
|
||||
msgstr "在每个节点上启动一个 Docker 容器。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "dataset"
msgstr "数据集"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "version"
msgstr "版本"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "metric"
msgstr "指标"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "mode"
msgstr "模式"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "vllm-api-general-chat"
msgstr "vllm-api-general-chat"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "aime2024"
msgstr "aime2024"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "-"
msgstr "-"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "accuracy"
msgstr "准确率"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "gen"
msgstr "生成"
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:63
|
||||
msgid "Deployment"
|
||||
msgstr "部署"
|
||||
@@ -81,7 +122,9 @@ msgstr "单节点部署"
|
||||
msgid ""
|
||||
"`Qwen3-235B-A22B-w8a8` can be deployed on 1 Atlas 800 A3(64G*16). "
|
||||
"Quantized version needs to start with parameter `--quantization ascend`."
|
||||
msgstr "`Qwen3-235B-A22B-w8a8` 可以部署在 1 台 Atlas 800 A3(64G*16)上。量化版本需要使用参数 `--quantization ascend` 启动。"
|
||||
msgstr ""
|
||||
"`Qwen3-235B-A22B-w8a8` 可以部署在 1 台 Atlas 800 A3(64G*16)上。量化版本需要使用参数 "
|
||||
"`--quantization ascend` 启动。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:70
|
||||
msgid "Run the following script to execute online 128k inference."
|
||||
@@ -98,7 +141,10 @@ msgid ""
|
||||
"for vllm version below `v0.12.0` use parameter: `--rope_scaling "
|
||||
"'{\"rope_type\":\"yarn\",\"factor\":4,\"original_max_position_embeddings\":32768}'"
|
||||
" \\`"
|
||||
msgstr "对于 vllm 版本低于 `v0.12.0`,使用参数:`--rope_scaling '{\"rope_type\":\"yarn\",\"factor\":4,\"original_max_position_embeddings\":32768}' \\`"
|
||||
msgstr ""
|
||||
"对于 vllm 版本低于 `v0.12.0`,使用参数:`--rope_scaling "
|
||||
"'{\"rope_type\":\"yarn\",\"factor\":4,\"original_max_position_embeddings\":32768}'"
|
||||
" \\`"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:109
|
||||
#, python-brace-format
|
||||
@@ -107,7 +153,10 @@ msgid ""
|
||||
"'{\"rope_parameters\": "
|
||||
"{\"rope_type\":\"yarn\",\"rope_theta\":1000000,\"factor\":4,\"original_max_position_embeddings\":32768}}'"
|
||||
" \\`"
|
||||
msgstr "对于 vllm 版本 `v0.12.0`,使用参数:`--hf-overrides '{\"rope_parameters\": {\"rope_type\":\"yarn\",\"rope_theta\":1000000,\"factor\":4,\"original_max_position_embeddings\":32768}}' \\`"
|
||||
msgstr ""
|
||||
"对于 vllm 版本 `v0.12.0`,使用参数:`--hf-overrides '{\"rope_parameters\": "
|
||||
"{\"rope_type\":\"yarn\",\"rope_theta\":1000000,\"factor\":4,\"original_max_position_embeddings\":32768}}'"
|
||||
" \\`"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:111
|
||||
msgid "The parameters are explained as follows:"
|
||||
@@ -146,21 +195,29 @@ msgid ""
|
||||
"state is also counted in metrics such as TTFT and TPOT. Therefore, when "
|
||||
"testing performance, it is generally recommended that `--max-num-seqs` * "
|
||||
"`--data-parallel-size` >= the actual total concurrency."
|
||||
msgstr "`--max-num-seqs` 表示每个 DP 组允许处理的最大请求数。如果发送到服务的请求数量超过此限制,超出的请求将保持在等待状态,不会被调度。请注意,在等待状态所花费的时间也会计入 TTFT 和 TPOT 等指标。因此,在测试性能时,通常建议 `--max-num-seqs` * `--data-parallel-size` >= 实际总并发数。"
|
||||
msgstr ""
|
||||
"`--max-num-seqs` 表示每个 DP "
|
||||
"组允许处理的最大请求数。如果发送到服务的请求数量超过此限制,超出的请求将保持在等待状态,不会被调度。请注意,在等待状态所花费的时间也会计入 "
|
||||
"TTFT 和 TPOT 等指标。因此,在测试性能时,通常建议 `--max-num-seqs` * `--data-parallel-size` "
|
||||
">= 实际总并发数。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:118
|
||||
msgid ""
|
||||
"`--max-num-batched-tokens` represents the maximum number of tokens that "
|
||||
"the model can process in a single step. Currently, vLLM v1 scheduling "
|
||||
"enables ChunkPrefill/SplitFuse by default, which means:"
|
||||
msgstr "`--max-num-batched-tokens` 表示模型单步可以处理的最大 token 数。目前,vLLM v1 调度默认启用 ChunkPrefill/SplitFuse,这意味着:"
|
||||
msgstr ""
|
||||
"`--max-num-batched-tokens` 表示模型单步可以处理的最大 token 数。目前,vLLM v1 调度默认启用 "
|
||||
"ChunkPrefill/SplitFuse,这意味着:"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:119
|
||||
msgid ""
|
||||
"(1) If the input length of a request is greater than `--max-num-batched-"
|
||||
"tokens`, it will be divided into multiple rounds of computation according"
|
||||
" to `--max-num-batched-tokens`;"
|
||||
msgstr "(1)如果请求的输入长度大于 `--max-num-batched-tokens`,它将根据 `--max-num-batched-tokens` 被分成多轮计算;"
|
||||
msgstr ""
|
||||
"(1)如果请求的输入长度大于 `--max-num-batched-tokens`,它将根据 `--max-num-batched-tokens`"
|
||||
" 被分成多轮计算;"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:120
|
||||
msgid ""
|
||||
@@ -189,14 +246,22 @@ msgid ""
|
||||
"during actual inference (e.g., due to uneven EP load), setting `--gpu-"
|
||||
"memory-utilization` too high may lead to OOM (Out of Memory) issues "
|
||||
"during actual inference. The default value is `0.9`."
|
||||
msgstr "`--gpu-memory-utilization` 表示 vLLM 将用于实际推理的 HBM 比例。其核心功能是计算可用的 kv_cache 大小。在预热阶段(vLLM 中称为 profile run),vLLM 会记录输入大小为 `--max-num-batched-tokens` 的推理过程中的峰值 GPU 内存使用量。然后,可用的 kv_cache 大小计算为:`--gpu-memory-utilization` * HBM 大小 - 峰值 GPU 内存使用量。因此,`--gpu-memory-utilization` 的值越大,可用的 kv_cache 就越多。然而,由于预热阶段的 GPU 内存使用量可能与实际推理时不同(例如,由于 EP 负载不均),将 `--gpu-memory-utilization` 设置得过高可能导致实际推理时出现 OOM(内存不足)问题。默认值为 `0.9`。"
|
||||
msgstr ""
|
||||
"`--gpu-memory-utilization` 表示 vLLM 将用于实际推理的 HBM 比例。其核心功能是计算可用的 kv_cache "
|
||||
"大小。在预热阶段(vLLM 中称为 profile run),vLLM 会记录输入大小为 `--max-num-batched-tokens` "
|
||||
"的推理过程中的峰值 GPU 内存使用量。然后,可用的 kv_cache 大小计算为:`--gpu-memory-utilization` * "
|
||||
"HBM 大小 - 峰值 GPU 内存使用量。因此,`--gpu-memory-utilization` 的值越大,可用的 kv_cache "
|
||||
"就越多。然而,由于预热阶段的 GPU 内存使用量可能与实际推理时不同(例如,由于 EP 负载不均),将 `--gpu-memory-"
|
||||
"utilization` 设置得过高可能导致实际推理时出现 OOM(内存不足)问题。默认值为 `0.9`。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:123
|
||||
msgid ""
|
||||
"`--enable-expert-parallel` indicates that EP is enabled. Note that vLLM "
|
||||
"does not support a mixed approach of ETP and EP; that is, MoE can either "
|
||||
"use pure EP or pure TP."
|
||||
msgstr "`--enable-expert-parallel` 表示启用了 EP。请注意,vLLM 不支持 ETP 和 EP 的混合方法;也就是说,MoE 要么使用纯 EP,要么使用纯 TP。"
|
||||
msgstr ""
|
||||
"`--enable-expert-parallel` 表示启用了 EP。请注意,vLLM 不支持 ETP 和 EP 的混合方法;也就是说,MoE "
|
||||
"要么使用纯 EP,要么使用纯 TP。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:124
|
||||
msgid ""
|
||||
@@ -219,7 +284,11 @@ msgid ""
|
||||
"\"PIECEWISE\" and \"FULL_DECODE_ONLY\" are supported. The graph mode is "
|
||||
"mainly used to reduce the cost of operator dispatch. Currently, "
|
||||
"\"FULL_DECODE_ONLY\" is recommended."
|
||||
msgstr "`--compilation-config` 包含与 aclgraph 图模式相关的配置。最重要的配置是 \"cudagraph_mode\" 和 \"cudagraph_capture_sizes\",其含义如下:\"cudagraph_mode\":表示具体的图模式。目前支持 \"PIECEWISE\" 和 \"FULL_DECODE_ONLY\"。图模式主要用于降低算子调度的开销。目前推荐使用 \"FULL_DECODE_ONLY\"。"
|
||||
msgstr ""
|
||||
"`--compilation-config` 包含与 aclgraph 图模式相关的配置。最重要的配置是 \"cudagraph_mode\" 和"
|
||||
" \"cudagraph_capture_sizes\",其含义如下:\"cudagraph_mode\":表示具体的图模式。目前支持 "
|
||||
"\"PIECEWISE\" 和 \"FULL_DECODE_ONLY\"。图模式主要用于降低算子调度的开销。目前推荐使用 "
|
||||
"\"FULL_DECODE_ONLY\"。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:128
|
||||
msgid ""
|
||||
@@ -229,14 +298,19 @@ msgid ""
|
||||
" inputs between levels are automatically padded to the next level. "
|
||||
"Currently, the default setting is recommended. Only in some scenarios is "
|
||||
"it necessary to set this separately to achieve optimal performance."
|
||||
msgstr "\"cudagraph_capture_sizes\":表示不同级别的图模式。默认值为 [1, 2, 4, 8, 16, 24, 32, 40,..., `--max-num-seqs`]。在图模式下,不同级别图的输入是固定的,级别之间的输入会自动填充到下一个级别。目前推荐使用默认设置。仅在部分场景中,需要单独设置此参数以达到最佳性能。"
|
||||
msgstr ""
|
||||
"\"cudagraph_capture_sizes\":表示不同级别的图模式。默认值为 [1, 2, 4, 8, 16, 24, 32, "
|
||||
"40,..., `--max-num-"
|
||||
"seqs`]。在图模式下,不同级别图的输入是固定的,级别之间的输入会自动填充到下一个级别。目前推荐使用默认设置。仅在部分场景中,需要单独设置此参数以达到最佳性能。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:129
|
||||
msgid ""
|
||||
"`export VLLM_ASCEND_ENABLE_FLASHCOMM1=1` indicates that Flashcomm1 "
|
||||
"optimization is enabled. Currently, this optimization is only supported "
|
||||
"for MoE in scenarios where tp_size > 1."
|
||||
msgstr "`export VLLM_ASCEND_ENABLE_FLASHCOMM1=1` 表示启用了 Flashcomm1 优化。目前,此优化仅在 tp_size > 1 的场景下对 MoE 支持。"
|
||||
msgstr ""
|
||||
"`export VLLM_ASCEND_ENABLE_FLASHCOMM1=1` 表示启用了 Flashcomm1 优化。目前,此优化仅在 "
|
||||
"tp_size > 1 的场景下对 MoE 支持。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:133
|
||||
msgid "tp_size needs to be divisible by dcp_size"
|
||||
@@ -246,120 +320,85 @@ msgstr "tp_size 需要能被 dcp_size 整除"
|
||||
msgid ""
|
||||
"decode context parallel size must be less than or equal to max_dcp_size, "
|
||||
"where max_dcp_size = tensor_parallel_size // total_num_kv_heads."
|
||||
msgstr "解码上下文并行大小必须小于或等于 max_dcp_size,其中 max_dcp_size = tensor_parallel_size // total_num_kv_heads。"
|
||||
msgstr ""
|
||||
"解码上下文并行大小必须小于或等于 max_dcp_size,其中 max_dcp_size = tensor_parallel_size // "
|
||||
"total_num_kv_heads。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:136
|
||||
msgid "Accuracy Evaluation"
|
||||
msgstr "精度评估"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:138
|
||||
msgid "Here are two accuracy evaluation methods."
|
||||
msgstr "以下是两种精度评估方法。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:140
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:152
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:150
|
||||
msgid "Using AISBench"
|
||||
msgstr "使用 AISBench"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:142
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:140
|
||||
msgid ""
|
||||
"Refer to [Using "
|
||||
"AISBench](../../developer_guide/evaluation/using_ais_bench.md) for "
|
||||
"details."
|
||||
msgstr "详情请参阅[使用 AISBench](../../developer_guide/evaluation/using_ais_bench.md)。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:144
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:142
|
||||
msgid ""
|
||||
"After execution, you can get the result, here is the result of `Qwen3"
|
||||
"-235B-A22B-w8a8` for reference only."
|
||||
msgstr "执行后,您可以获得结果,以下是 `Qwen3-235B-A22B-w8a8` 的结果,仅供参考。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "dataset"
msgstr "数据集"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "version"
msgstr "版本"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "metric"
msgstr "指标"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "mode"
msgstr "模式"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "vllm-api-general-chat"
msgstr "vllm-api-general-chat"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "aime2024"
msgstr "aime2024"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "-"
msgstr "-"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "accuracy"
msgstr "准确率"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "gen"
msgstr "生成"

#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:21
msgid "83.33"
msgstr "83.33"
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:150
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:148
|
||||
msgid "Performance"
|
||||
msgstr "性能"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:154
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:152
|
||||
msgid ""
|
||||
"Refer to [Using AISBench for performance "
|
||||
"evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-"
|
||||
"performance-evaluation) for details."
|
||||
msgstr "详情请参阅[使用 AISBench 进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation)。"
|
||||
msgstr ""
|
||||
"详情请参阅[使用 AISBench "
|
||||
"进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-"
|
||||
"performance-evaluation)。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:156
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:154
|
||||
msgid "Using vLLM Benchmark"
|
||||
msgstr "使用 vLLM Benchmark"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:158
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:156
|
||||
msgid "Run performance evaluation of `Qwen3-235B-A22B-w8a8` as an example."
|
||||
msgstr "以运行 `Qwen3-235B-A22B-w8a8` 的性能评估为例。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:160
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:158
|
||||
msgid ""
|
||||
"Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/benchmarking/) "
|
||||
"for more details."
|
||||
msgstr "更多详情请参阅 [vllm benchmark](https://docs.vllm.ai/en/latest/benchmarking/)。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:162
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:160
|
||||
msgid "There are three `vllm bench` subcommands:"
|
||||
msgstr "`vllm bench` 有三个子命令:"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:164
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:162
|
||||
msgid "`latency`: Benchmark the latency of a single batch of requests."
|
||||
msgstr "`latency`:对单批请求的延迟进行基准测试。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:165
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:163
|
||||
msgid "`serve`: Benchmark the online serving throughput."
|
||||
msgstr "`serve`:对在线服务吞吐量进行基准测试。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:166
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:164
|
||||
msgid "`throughput`: Benchmark offline inference throughput."
|
||||
msgstr "`throughput`:对离线推理吞吐量进行基准测试。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:168
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:166
|
||||
msgid "Take the `serve` as an example. Run the code as follows."
|
||||
msgstr "以 `serve` 为例。运行代码如下。"
|
||||
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:175
|
||||
#: ../../source/tutorials/features/long_sequence_context_parallel_single_node.md:173
|
||||
msgid ""
|
||||
"After about several minutes, you can get the performance evaluation "
|
||||
"result."
|
||||
|
||||
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -41,7 +41,10 @@ msgid ""
|
||||
"prefiller server is 192.0.0.1 (prefill 1) and 192.0.0.2 (prefill 2), and "
|
||||
"the decoder servers are 192.0.0.3 (decoder 1) and 192.0.0.4 (decoder 2). "
|
||||
"On each server, use 8 NPUs 16 chips to deploy one service instance."
|
||||
msgstr "以 Deepseek-r1-w8a8 模型为例,使用 4 台 Atlas 800T A3 服务器部署 \"2P1D\" 架构。假设预填充服务器 IP 为 192.0.0.1(预填充节点 1)和 192.0.0.2(预填充节点 2),解码服务器 IP 为 192.0.0.3(解码节点 1)和 192.0.0.4(解码节点 2)。每台服务器使用 8 个 NPU(16 个芯片)部署一个服务实例。"
|
||||
msgstr ""
|
||||
"以 Deepseek-r1-w8a8 模型为例,使用 4 台 Atlas 800T A3 服务器部署 \"2P1D\" 架构。假设预填充服务器 "
|
||||
"IP 为 192.0.0.1(预填充节点 1)和 192.0.0.2(预填充节点 2),解码服务器 IP 为 192.0.0.3(解码节点 1)和"
|
||||
" 192.0.0.4(解码节点 2)。每台服务器使用 8 个 NPU(16 个芯片)部署一个服务实例。"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:9
|
||||
msgid "Verify Multi-Node Communication Environment"
|
||||
@@ -137,7 +140,10 @@ msgid ""
|
||||
" by Moonshot AI.Installation and Compilation Guide: <https://github.com"
|
||||
"/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries> First, we"
|
||||
" need to obtain the Mooncake project. Refer to the following command:"
|
||||
msgstr "Mooncake 是月之暗面(Moonshot AI)提供的领先 LLM 服务 Kimi 的推理平台。安装与编译指南:<https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries> 首先,我们需要获取 Mooncake 项目。参考以下命令:"
|
||||
msgstr ""
|
||||
"Mooncake 是月之暗面(Moonshot AI)提供的领先 LLM 服务 Kimi "
|
||||
"的推理平台。安装与编译指南:<https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file"
|
||||
"#build-and-use-binaries> 首先,我们需要获取 Mooncake 项目。参考以下命令:"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:183
|
||||
msgid "(Optional) Replace go install url if the network is poor"
|
||||
@@ -185,7 +191,10 @@ msgid ""
|
||||
"socket listeners. To avoid any issues, port conflicts should be "
|
||||
"prevented. Additionally, ensure that each node's engine_id is uniquely "
|
||||
"assigned to avoid conflicts."
|
||||
msgstr "我们可以分别运行以下脚本来在预填充器/解码器节点上启动服务器。请注意,每个 P/D 节点将占用从 kv_port 到 kv_port + num_chips 的端口范围来初始化 socket 监听器。为避免问题,应防止端口冲突。此外,请确保每个节点的 engine_id 被唯一分配,以避免冲突。"
|
||||
msgstr ""
|
||||
"我们可以分别运行以下脚本来在预填充器/解码器节点上启动服务器。请注意,每个 P/D 节点将占用从 kv_port 到 kv_port + "
|
||||
"num_chips 的端口范围来初始化 socket 监听器。为避免问题,应防止端口冲突。此外,请确保每个节点的 engine_id "
|
||||
"被唯一分配,以避免冲突。"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:227
|
||||
msgid "kv_port Configuration Guide"
|
||||
@@ -198,7 +207,10 @@ msgid ""
|
||||
"npu_per_node × 1000)`. If `kv_port` overlaps with this range, "
|
||||
"intermittent port conflicts may occur. To avoid this, configure `kv_port`"
|
||||
" according to the table below:"
|
||||
msgstr "在 Ascend NPU 上,Mooncake 使用 AscendDirectTransport 进行 RDMA 数据传输,它会在 `[20000, 20000 + npu_per_node × 1000)` 范围内随机分配端口。如果 `kv_port` 与此范围重叠,可能会发生间歇性端口冲突。为避免此问题,请根据下表配置 `kv_port`:"
|
||||
msgstr ""
|
||||
"在 Ascend NPU 上,Mooncake 使用 AscendDirectTransport 进行 RDMA 数据传输,它会在 "
|
||||
"`[20000, 20000 + npu_per_node × 1000)` 范围内随机分配端口。如果 `kv_port` "
|
||||
"与此范围重叠,可能会发生间歇性端口冲突。为避免此问题,请根据下表配置 `kv_port`:"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:132
|
||||
msgid "NPUs per Node"
|
||||
@@ -242,7 +254,9 @@ msgid ""
|
||||
"during startup, it may be caused by kv_port conflicting with randomly "
|
||||
"allocated AscendDirectTransport ports. Increase your kv_port value to "
|
||||
"avoid the reserved range."
|
||||
msgstr "如果在启动时偶尔看到 `zmq.error.ZMQError: Address already in use`,可能是由于 kv_port 与随机分配的 AscendDirectTransport 端口冲突所致。请增加您的 kv_port 值以避开保留范围。"
|
||||
msgstr ""
|
||||
"如果在启动时偶尔看到 `zmq.error.ZMQError: Address already in use`,可能是由于 kv_port "
|
||||
"与随机分配的 AscendDirectTransport 端口冲突所致。请增加您的 kv_port 值以避开保留范围。"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:240
|
||||
msgid "launch_online_dp.py"
|
||||
@@ -251,9 +265,12 @@ msgstr "launch_online_dp.py"
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:242
|
||||
msgid ""
|
||||
"Use `launch_online_dp.py` to launch external dp vllm servers. "
|
||||
"[launch\\_online\\_dp.py](https://github.com/vllm-project/vllm-"
|
||||
"[launch_online_dp.py](https://github.com/vllm-project/vllm-"
|
||||
"ascend/blob/main/examples/external_online_dp/launch_online_dp.py)"
|
||||
msgstr ""
|
||||
"使用 `launch_online_dp.py` 启动外部解耦 vllm "
|
||||
"服务器。[launch_online_dp.py](https://github.com/vllm-project/vllm-"
|
||||
"ascend/blob/main/examples/external_online_dp/launch_online_dp.py)"
|
||||
msgstr "使用 `launch_online_dp.py` 启动外部解耦 vllm 服务器。[launch\\_online\\_dp.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/launch_online_dp.py)"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:245
|
||||
msgid "run_dp_template.sh"
|
||||
@@ -262,9 +279,12 @@ msgstr "run_dp_template.sh"
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:247
|
||||
msgid ""
|
||||
"Modify `run_dp_template.sh` on each node. "
|
||||
"[run\\_dp\\_template.sh](https://github.com/vllm-project/vllm-"
|
||||
"[run_dp_template.sh](https://github.com/vllm-project/vllm-"
|
||||
"ascend/blob/main/examples/external_online_dp/run_dp_template.sh)"
|
||||
msgstr ""
|
||||
"在每个节点上修改 `run_dp_template.sh`。[run_dp_template.sh](https://github.com"
|
||||
"/vllm-project/vllm-"
|
||||
"ascend/blob/main/examples/external_online_dp/run_dp_template.sh)"
|
||||
msgstr "在每个节点上修改 `run_dp_template.sh`。[run\\_dp\\_template.sh](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/run_dp_template.sh)"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:250
|
||||
@@ -321,7 +341,12 @@ msgid ""
|
||||
"MooncakeLayerwiseConnector.[load\\_balance\\_proxy\\_layerwise\\_server\\_example.py](https://github.com"
|
||||
"/vllm-project/vllm-"
|
||||
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_layerwise_server_example.py)"
|
||||
msgstr "**`load_balance_proxy_layerwise_server_example.py`**:请求首先被路由到 D 节点,然后根据需要转发到 P 节点。此代理设计用于与 MooncakeLayerwiseConnector 配合使用。[load\\_balance\\_proxy\\_layerwise\\_server\\_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_layerwise_server_example.py)"
|
||||
msgstr ""
|
||||
"**`load_balance_proxy_layerwise_server_example.py`**:请求首先被路由到 D "
|
||||
"节点,然后根据需要转发到 P 节点。此代理设计用于与 MooncakeLayerwiseConnector "
|
||||
"配合使用。[load_balance_proxy_layerwise_server_example.py](https://github.com"
|
||||
"/vllm-project/vllm-"
|
||||
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_layerwise_server_example.py)"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:756
|
||||
msgid ""
|
||||
@@ -331,7 +356,12 @@ msgid ""
|
||||
"MooncakeConnector.[load\\_balance\\_proxy\\_server\\_example.py](https://github.com"
|
||||
"/vllm-project/vllm-"
|
||||
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
msgstr "**`load_balance_proxy_server_example.py`**:请求首先被路由到 P 节点,然后转发到 D 节点进行后续处理。此代理设计用于与 MooncakeConnector 配合使用。[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
msgstr ""
|
||||
"**`load_balance_proxy_server_example.py`**:请求首先被路由到 P 节点,然后转发到 D "
|
||||
"节点进行后续处理。此代理设计用于与 MooncakeConnector "
|
||||
"配合使用。[load\\_balance\\_proxy\\_server\\_example.py](https://github.com"
|
||||
"/vllm-project/vllm-"
|
||||
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:814
|
||||
msgid "Parameter"
|
||||
@@ -371,7 +401,7 @@ msgstr "--prefiller-ports"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:814
|
||||
msgid "Ports of prefiller nodes"
|
||||
msgstr "预填充节点的端口"
|
||||
msgstr "预填充节点端口"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:814
|
||||
msgid "--decoder-hosts"
|
||||
@@ -379,7 +409,7 @@ msgstr "--decoder-hosts"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:814
|
||||
msgid "Hosts of decoder nodes"
|
||||
msgstr "解码器节点的主机地址"
|
||||
msgstr "解码器节点主机地址"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:814
|
||||
msgid "--decoder-ports"
|
||||
@@ -387,7 +417,7 @@ msgstr "--decoder-ports"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:814
|
||||
msgid "Ports of decoder nodes"
|
||||
msgstr "解码器节点的端口"
|
||||
msgstr "解码器节点端口"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:877
|
||||
msgid ""
|
||||
@@ -396,9 +426,8 @@ msgid ""
|
||||
"project/vllm-"
|
||||
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
msgstr ""
|
||||
"您可以在代码仓库的示例中找到代理程序,"
|
||||
"[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-"
|
||||
"project/vllm-"
|
||||
"您可以在代码仓库的示例中找到代理程序,[load\\_balance\\_proxy\\_server\\_example.py](https://github.com"
|
||||
"/vllm-project/vllm-"
|
||||
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:879
|
||||
@@ -411,8 +440,8 @@ msgid ""
|
||||
"[aisbench](https://gitee.com/aisbench/benchmark) Execute the following "
|
||||
"commands to install aisbench"
|
||||
msgstr ""
|
||||
"我们推荐使用 aisbench 工具进行性能评估。"
|
||||
"[aisbench](https://gitee.com/aisbench/benchmark) 执行以下命令安装 aisbench"
|
||||
"我们推荐使用 aisbench 工具进行性能评估。[aisbench](https://gitee.com/aisbench/benchmark)"
|
||||
" 执行以下命令安装 aisbench"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:889
|
||||
msgid ""
|
||||
@@ -443,7 +472,9 @@ msgstr "以 gsm8k 数据集为例,执行以下命令来评估性能。"
|
||||
msgid ""
|
||||
"For more details for commands and parameters for aisbench, refer to "
|
||||
"[aisbench](https://gitee.com/aisbench/benchmark)"
|
||||
msgstr "有关 aisbench 命令和参数的更多详细信息,请参考 [aisbench](https://gitee.com/aisbench/benchmark)"
|
||||
msgstr ""
|
||||
"有关 aisbench 命令和参数的更多详细信息,请参考 "
|
||||
"[aisbench](https://gitee.com/aisbench/benchmark)"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:932
|
||||
msgid "FAQ"
|
||||
@@ -459,8 +490,7 @@ msgid ""
|
||||
"warm-up to achieve best performance, we recommend preheating the service "
|
||||
"with some requests before conducting performance tests to achieve the "
|
||||
"best end-to-end throughput."
|
||||
msgstr ""
|
||||
"由于部分 NPU 算子的计算需要经过多轮预热才能达到最佳性能,我们建议在进行性能测试前,先用一些请求预热服务,以获得最佳的端到端吞吐量。"
|
||||
msgstr "由于部分 NPU 算子的计算需要经过多轮预热才能达到最佳性能,我们建议在进行性能测试前,先用一些请求预热服务,以获得最佳的端到端吞吐量。"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_multi_node.md:938
|
||||
msgid "Verification"
|
||||
|
||||
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -24,7 +24,7 @@ msgid "Prefill-Decode Disaggregation (Qwen2.5-VL)"
msgstr "预填充-解码解耦架构 (Qwen2.5-VL)"

#: ../../source/tutorials/features/pd_disaggregation_mooncake_single_node.md:3
msgid "Getting Start"
msgid "Getting Started"
msgstr "开始使用"
#: ../../source/tutorials/features/pd_disaggregation_mooncake_single_node.md:5
|
||||
@@ -36,10 +36,10 @@ msgstr "vLLM-Ascend 现已支持预填充-解码 (PD) 解耦架构。本指南
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_single_node.md:7
|
||||
msgid ""
|
||||
"Using the Qwen2.5-VL-7B-Instruct model as an example, use vllm-ascend "
|
||||
"Using the Qwen2.5-VL-7B-Instruct model as an example, use vLLM-Ascend "
|
||||
"v0.11.0rc1 (with vLLM v0.11.0) on 1 Atlas 800T A2 server to deploy the "
|
||||
"\"1P1D\" architecture. Assume the IP address is 192.0.0.1."
|
||||
msgstr "以 Qwen2.5-VL-7B-Instruct 模型为例,在 1 台 Atlas 800T A2 服务器上使用 vllm-ascend v0.11.0rc1 (包含 vLLM v0.11.0) 部署 \"1P1D\" 架构。假设 IP 地址为 192.0.0.1。"
|
||||
msgstr "以 Qwen2.5-VL-7B-Instruct 模型为例,在 1 台 Atlas 800T A2 服务器上使用 vLLM-Ascend v0.11.0rc1 (包含 vLLM v0.11.0) 部署 \"1P1D\" 架构。假设 IP 地址为 192.0.0.1。"
|
||||
|
||||
#: ../../source/tutorials/features/pd_disaggregation_mooncake_single_node.md:9
|
||||
msgid "Verify Communication Environment"
|
||||
|
||||
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -35,7 +35,8 @@ msgid ""
|
||||
"language understanding with advanced agentic capabilities, instant and "
|
||||
"thinking modes, as well as conversational and agentic paradigms."
|
||||
msgstr ""
|
||||
"Kimi K2.5 是一个开源的、原生的多模态智能体模型,通过在 Kimi-K2-Base 基础上持续预训练约 15 万亿视觉和文本混合令牌构建而成。它无缝集成了视觉与语言理解能力、先进的智能体能力、即时与思考模式,以及对话式和智能体范式。"
|
||||
"Kimi K2.5 是一个开源的、原生的多模态智能体模型,通过在 Kimi-K2-Base 基础上持续预训练约 15 "
|
||||
"万亿视觉和文本混合令牌构建而成。它无缝集成了视觉与语言理解能力、先进的智能体能力、即时与思考模式,以及对话式和智能体范式。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:7
|
||||
msgid "The `Kimi-K2.5` model is first supported in `vllm-ascend:v0.17.0rc1`."
|
||||
@@ -58,7 +59,9 @@ msgid ""
|
||||
"Refer to [supported "
|
||||
"features](../../user_guide/support_matrix/supported_models.md) to get the"
|
||||
" model's supported feature matrix."
|
||||
msgstr "请参考 [支持的特性](../../user_guide/support_matrix/supported_models.md) 获取模型支持的特性矩阵。"
|
||||
msgstr ""
|
||||
"请参考 [支持的特性](../../user_guide/support_matrix/supported_models.md) "
|
||||
"获取模型支持的特性矩阵。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:15
|
||||
msgid ""
|
||||
@@ -78,14 +81,18 @@ msgstr "模型权重"
|
||||
msgid ""
|
||||
"`Kimi-K2.5-w4a8`(Quantized version for w4a8): [Download model "
|
||||
"weight](https://modelscope.cn/models/Eco-Tech/Kimi-K2.5-W4A8)."
|
||||
msgstr "`Kimi-K2.5-w4a8`(w4a8量化版本):[下载模型权重](https://modelscope.cn/models/Eco-Tech/Kimi-K2.5-W4A8)。"
|
||||
msgstr ""
|
||||
"`Kimi-K2.5-w4a8`(w4a8量化版本):[下载模型权重](https://modelscope.cn/models/Eco-"
|
||||
"Tech/Kimi-K2.5-W4A8)。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:22
|
||||
msgid ""
|
||||
"`kimi-k2.5-eagle3`(Eagle3 MTP draft model for accelerating inference of "
|
||||
"Kimi-K2.5): [Download model "
|
||||
"weight](https://huggingface.co/lightseekorg/kimi-k2.5-eagle3)"
|
||||
msgstr "`kimi-k2.5-eagle3`(用于加速 Kimi-K2.5 推理的 Eagle3 MTP 草稿模型):[下载模型权重](https://huggingface.co/lightseekorg/kimi-k2.5-eagle3)"
|
||||
msgstr ""
|
||||
"`kimi-k2.5-eagle3`(用于加速 Kimi-K2.5 推理的 Eagle3 MTP "
|
||||
"草稿模型):[下载模型权重](https://huggingface.co/lightseekorg/kimi-k2.5-eagle3)"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:24
|
||||
msgid ""
|
||||
@@ -102,7 +109,9 @@ msgid ""
|
||||
"If you want to deploy multi-node environment, you need to verify multi-"
|
||||
"node communication according to [verify multi-node communication "
|
||||
"environment](../../installation.md#verify-multi-node-communication)."
|
||||
msgstr "如果您想部署多节点环境,需要根据 [验证多节点通信环境](../../installation.md#verify-multi-node-communication) 验证多节点通信。"
|
||||
msgstr ""
|
||||
"如果您想部署多节点环境,需要根据 [验证多节点通信环境](../../installation.md#verify-multi-node-"
|
||||
"communication) 验证多节点通信。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:30
|
||||
msgid "Installation"
|
||||
@@ -117,21 +126,26 @@ msgid ""
|
||||
"Select an image based on your machine type and start the docker image on "
|
||||
"your node, refer to [using docker](../../installation.md#set-up-using-"
|
||||
"docker)."
|
||||
msgstr "根据您的机器类型选择镜像,并在节点上启动 docker 镜像,请参考 [使用 docker](../../installation.md#set-up-using-docker)。"
|
||||
msgstr ""
|
||||
"根据您的机器类型选择镜像,并在节点上启动 docker 镜像,请参考 [使用 docker](../../installation.md#set-"
|
||||
"up-using-docker)。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:36
|
||||
msgid "A3 series"
|
||||
msgstr "A3 系列"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:43
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:86
|
||||
msgid "Start the docker image on your each node."
|
||||
msgstr "在您的每个节点上启动 docker 镜像。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:45
|
||||
msgid "A2 series"
|
||||
msgstr "A2 系列"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:86
|
||||
msgid "Start the docker image on your each node."
|
||||
msgstr "在您的每个节点上启动 docker 镜像。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:119
|
||||
msgid ""
|
||||
"In addition, if you don't want to use the docker image as above, you can "
|
||||
@@ -169,7 +183,6 @@ msgid "Run the following script to execute online inference."
|
||||
msgstr "运行以下脚本执行在线推理。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:176
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:645
|
||||
msgid "**Notice:** The parameters are explained as follows:"
|
||||
msgstr "**注意:** 参数解释如下:"
|
||||
|
||||
@@ -180,7 +193,9 @@ msgid ""
|
||||
"reduce TPOT in v1 scheduler. However, TTFT may degrade in some scenarios."
|
||||
" Furthermore, enabling this feature is not recommended in scenarios where"
|
||||
" PD is separated."
|
||||
msgstr "设置环境变量 `VLLM_ASCEND_BALANCE_SCHEDULING=1` 启用均衡调度。这可能有助于提高 v1 调度器中的输出吞吐量并降低 TPOT。然而,在某些场景下 TTFT 可能会下降。此外,在 PD 分离的场景中不建议启用此功能。"
|
||||
msgstr ""
|
||||
"设置环境变量 `VLLM_ASCEND_BALANCE_SCHEDULING=1` 启用均衡调度。这可能有助于提高 v1 "
|
||||
"调度器中的输出吞吐量并降低 TPOT。然而,在某些场景下 TTFT 可能会下降。此外,在 PD 分离的场景中不建议启用此功能。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:180
|
||||
msgid ""
|
||||
@@ -195,7 +210,9 @@ msgid ""
|
||||
" with an input length of 3.5K and output length of 1.5K, a value of "
|
||||
"`16384` is sufficient, however, for precision testing, please set it at "
|
||||
"least `35000`."
|
||||
msgstr "`--max-model-len` 指定最大上下文长度——即单个请求的输入和输出令牌总数。对于输入长度 3.5K 和输出长度 1.5K 的性能测试,`16384` 的值就足够了,但对于精度测试,请至少将其设置为 `35000`。"
|
||||
msgstr ""
|
||||
"`--max-model-len` 指定最大上下文长度——即单个请求的输入和输出令牌总数。对于输入长度 3.5K 和输出长度 1.5K "
|
||||
"的性能测试,`16384` 的值就足够了,但对于精度测试,请至少将其设置为 `35000`。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:182
|
||||
msgid ""
|
||||
@@ -244,14 +261,18 @@ msgstr "Prefill-Decode 分离"
|
||||
msgid ""
|
||||
"We recommend using Mooncake for deployment: "
|
||||
"[Mooncake](../features/pd_disaggregation_mooncake_multi_node.md)."
|
||||
msgstr "我们建议使用 Mooncake 进行部署:[Mooncake](../features/pd_disaggregation_mooncake_multi_node.md)。"
|
||||
msgstr ""
|
||||
"我们建议使用 Mooncake "
|
||||
"进行部署:[Mooncake](../features/pd_disaggregation_mooncake_multi_node.md)。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:326
|
||||
msgid ""
|
||||
"Take Atlas 800 A3 (64G × 16) for example, we recommend to deploy 2P1D (4 "
|
||||
"nodes) rather than 1P1D (2 nodes), because there is no enough NPU memory "
|
||||
"to serve high concurrency in 1P1D case."
|
||||
msgstr "以 Atlas 800 A3(64G × 16)为例,我们建议部署 2P1D(4 个节点)而不是 1P1D(2 个节点),因为在 1P1D 情况下没有足够的 NPU 内存来服务高并发。"
|
||||
msgstr ""
|
||||
"以 Atlas 800 A3(64G × 16)为例,我们建议部署 2P1D(4 个节点)而不是 1P1D(2 个节点),因为在 1P1D "
|
||||
"情况下没有足够的 NPU 内存来服务高并发。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:328
|
||||
msgid "`Kimi-K2.5-w4a8 2P1D` require 4 Atlas 800 A3 (64G × 16)."
|
||||
@@ -263,14 +284,20 @@ msgid ""
|
||||
"to deploy a `launch_dp_program.py` script and a `run_dp_template.sh` "
|
||||
"script on each node and deploy a `proxy.sh` script on prefill master node"
|
||||
" to forward requests."
|
||||
msgstr "要运行 vllm-ascend `Prefill-Decode Disaggregation` 服务,您需要在每个节点上部署一个 `launch_dp_program.py` 脚本和一个 `run_dp_template.sh` 脚本,并在 prefill 主节点上部署一个 `proxy.sh` 脚本来转发请求。"
|
||||
msgstr ""
|
||||
"要运行 vllm-ascend `Prefill-Decode Disaggregation` 服务,您需要在每个节点上部署一个 "
|
||||
"`launch_dp_program.py` 脚本和一个 `run_dp_template.sh` 脚本,并在 prefill 主节点上部署一个 "
|
||||
"`proxy.sh` 脚本来转发请求。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:332
|
||||
msgid ""
|
||||
"`launch_online_dp.py` to launch external dp vllm servers. "
|
||||
"[launch\\_online\\_dp.py](https://github.com/vllm-project/vllm-"
|
||||
"ascend/blob/main/examples/external_online_dp/launch_online_dp.py)"
|
||||
msgstr "`launch_online_dp.py` 用于启动外部 dp vllm 服务器。[launch\\_online\\_dp.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/external_online_dp/launch_online_dp.py)"
|
||||
msgstr ""
|
||||
"`launch_online_dp.py` 用于启动外部 dp vllm "
|
||||
"服务器。[launch\\_online\\_dp.py](https://github.com/vllm-project/vllm-"
|
||||
"ascend/blob/main/examples/external_online_dp/launch_online_dp.py)"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:335
|
||||
msgid "Prefill Node 0 `run_dp_template.sh` script"
|
||||
@@ -288,6 +315,10 @@ msgstr "Decode 节点 0 `run_dp_template.sh` 脚本"
|
||||
msgid "Decode Node 1 `run_dp_template.sh` script"
|
||||
msgstr "Decode 节点 1 `run_dp_template.sh` 脚本"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:645
|
||||
msgid "**Notice:** The parameters are explained as follows:"
|
||||
msgstr "**注意:** 参数解释如下:"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:648
|
||||
msgid ""
|
||||
"`VLLM_ASCEND_ENABLE_FLASHCOMM1=1`: enables the communication optimization"
|
||||
@@ -300,7 +331,9 @@ msgid ""
|
||||
"significantly improve performance but consumes more NPU memory. In the "
|
||||
"Prefill-Decode (PD) separation scenario, enable MLAPO only on decode "
|
||||
"nodes."
|
||||
msgstr "`VLLM_ASCEND_ENABLE_MLAPO=1`:启用融合算子,这可以显著提高性能但会消耗更多 NPU 内存。在 Prefill-Decode(PD)分离场景中,仅在 decode 节点上启用 MLAPO。"
|
||||
msgstr ""
|
||||
"`VLLM_ASCEND_ENABLE_MLAPO=1`:启用融合算子,这可以显著提高性能但会消耗更多 NPU 内存。在 Prefill-"
|
||||
"Decode(PD)分离场景中,仅在 decode 节点上启用 MLAPO。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:650
|
||||
msgid ""
|
||||
@@ -316,7 +349,9 @@ msgid ""
|
||||
"the min is `n = 1` and the max is `n = max-num-seqs`. For other values, "
|
||||
"it is recommended to set them to the number of frequently occurring "
|
||||
"requests on the Decode (D) node."
|
||||
msgstr "`cudagraph_capture_sizes`:推荐值为 `n x (mtp + 1)`。最小值为 `n = 1`,最大值为 `n = max-num-seqs`。对于其他值,建议将其设置为 Decode(D)节点上频繁出现的请求数量。"
|
||||
msgstr ""
|
||||
"`cudagraph_capture_sizes`:推荐值为 `n x (mtp + 1)`。最小值为 `n = 1`,最大值为 `n = "
|
||||
"max-num-seqs`。对于其他值,建议将其设置为 Decode(D)节点上频繁出现的请求数量。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:652
|
||||
msgid ""
|
||||
@@ -325,7 +360,8 @@ msgid ""
|
||||
"requests will be sent to the prefill node to recompute the KV Cache. In "
|
||||
"the PD separation scenario, it is recommended to enable this "
|
||||
"configuration on both prefill and decode nodes simultaneously."
|
||||
msgstr "`recompute_scheduler_enable: true`:启用重计算调度器。当 decode 节点的键值缓存(KV Cache)不足时,请求将被发送到 prefill 节点以重新计算 KV Cache。在 PD 分离场景中,建议同时在 prefill 和 decode 节点上启用此配置。"
|
||||
msgstr ""
|
||||
"`recompute_scheduler_enable: true`:启用重计算调度器。当解码节点的键值缓存(KV Cache)不足时,请求将被发送到预填充节点以重新计算 KV Cache。在 PD 分离场景中,建议同时在预填充和解码节点上启用此配置。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:653
|
||||
msgid ""
|
||||
@@ -333,7 +369,8 @@ msgid ""
|
||||
"(TP) size is 1 or `enable_shared_expert_dp: true`, an additional stream "
|
||||
"is enabled to overlap the computation process of shared experts for "
|
||||
"improved efficiency."
|
||||
msgstr "`multistream_overlap_shared_expert: true`:当张量并行(TP)大小为 1 或 `enable_shared_expert_dp: true` 时,启用额外的流来重叠共享专家的计算过程以提高效率。"
|
||||
msgstr ""
|
||||
"`multistream_overlap_shared_expert: true`:当张量并行(TP)大小为 1 或 `enable_shared_expert_dp: true` 时,启用额外的流来重叠共享专家的计算过程以提高效率。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:655
|
||||
msgid "run server for each node:"
|
||||
@@ -341,7 +378,7 @@ msgstr "为每个节点运行服务器:"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:668
|
||||
msgid "Run the `proxy.sh` script on the prefill master node"
|
||||
msgstr "在 prefill 主节点上运行 `proxy.sh` 脚本"
|
||||
msgstr "在预填充主节点上运行 `proxy.sh` 脚本"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:670
|
||||
msgid ""
|
||||
@@ -350,7 +387,8 @@ msgid ""
|
||||
"[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-"
|
||||
"project/vllm-"
|
||||
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
msgstr "在与 prefiller 服务实例相同的节点上运行一个代理服务器。您可以在仓库的示例中找到代理程序:[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
msgstr ""
|
||||
"在与预填充服务实例相同的节点上运行一个代理服务器。您可以在仓库的示例中找到代理程序:[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:726
|
||||
msgid "Functional Verification"
|
||||
@@ -567,8 +605,8 @@ msgid ""
|
||||
msgstr "**问:启动失败,提示 HCCL 端口冲突(地址已被占用)。我该怎么办?**"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:812
|
||||
msgid "A: Clean up old processes and restart: `pkill -f VLLM*`."
|
||||
msgstr "答:清理旧进程并重启:`pkill -f VLLM*`。"
|
||||
msgid "A: Clean up old processes and restart: `pkill -f vLLM*`."
|
||||
msgstr "答:清理旧进程并重启:`pkill -f vLLM*`。"
|
||||
|
||||
#: ../../source/tutorials/models/Kimi-K2.5.md:814
|
||||
msgid "**Q: How to handle OOM or unstable startup?**"
|
||||
|
||||
@@ -8,7 +8,7 @@ msgid ""
|
||||
msgstr ""
|
||||
"Project-Id-Version: vllm-ascend \n"
|
||||
"Report-Msgid-Bugs-To: \n"
|
||||
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
|
||||
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
|
||||
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
|
||||
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
|
||||
"Language: zh_CN\n"
|
||||
@@ -42,7 +42,9 @@ msgid ""
|
||||
"including supported features, feature configuration, environment "
|
||||
"preparation, single-NPU and multi-NPU deployment, accuracy and "
|
||||
"performance evaluation."
|
||||
msgstr "`Qwen2.5-Omni` 模型自 `vllm-ascend:v0.11.0rc0` 版本起获得支持。本文档将展示该模型的主要验证步骤,包括支持的特性、特性配置、环境准备、单NPU和多NPU部署、精度和性能评估。"
|
||||
msgstr ""
|
||||
"`Qwen2.5-Omni` 模型自 `vllm-ascend:v0.11.0rc0` "
|
||||
"版本起获得支持。本文档将展示该模型的主要验证步骤,包括支持的特性、特性配置、环境准备、单NPU和多NPU部署、精度和性能评估。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen2.5-Omni.md:9
|
||||
msgid "Supported Features"
|
||||
@@ -73,13 +75,17 @@ msgstr "模型权重"
|
||||
msgid ""
|
||||
"`Qwen2.5-Omni-3B`(BF16): [Download model "
|
||||
"weight](https://modelscope.cn/models/Qwen/Qwen2.5-Omni-3B)"
|
||||
msgstr "`Qwen2.5-Omni-3B`(BF16): [下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-Omni-3B)"
|
||||
msgstr ""
|
||||
"`Qwen2.5-Omni-3B`(BF16): "
|
||||
"[下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-Omni-3B)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen2.5-Omni.md:20
|
||||
msgid ""
|
||||
"`Qwen2.5-Omni-7B`(BF16): [Download model "
|
||||
"weight](https://modelscope.cn/models/Qwen/Qwen2.5-Omni-7B)"
|
||||
msgstr "`Qwen2.5-Omni-7B`(BF16): [下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-Omni-7B)"
|
||||
msgstr ""
|
||||
"`Qwen2.5-Omni-7B`(BF16): "
|
||||
"[下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-Omni-7B)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen2.5-Omni.md:22
|
||||
msgid "Following examples use the 7B version by default."
|
||||
@@ -98,7 +104,9 @@ msgid ""
|
||||
"Select an image based on your machine type and start the docker image on "
|
||||
"your node, refer to [using docker](../../installation.md#set-up-using-"
|
||||
"docker)."
|
||||
msgstr "根据您的机器类型选择镜像并在节点上启动 docker 镜像,请参考[使用 docker](../../installation.md#set-up-using-docker)。"
|
||||
msgstr ""
|
||||
"根据您的机器类型选择镜像并在节点上启动 docker 镜像,请参考[使用 docker](../../installation.md#set-"
|
||||
"up-using-docker)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen2.5-Omni.md:65
|
||||
msgid "Deployment"
|
||||
@@ -114,18 +122,22 @@ msgstr "单 NPU (Qwen2.5-Omni-7B)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen2.5-Omni.md:72
|
||||
msgid ""
|
||||
"The **environment variable** `LOCAL_MEDIA_PATH` which **allows** API "
|
||||
"requests to read local images or videos from directories specified by the"
|
||||
" server file system. Please note this is a security risk. Should only be "
|
||||
"enabled in trusted environments."
|
||||
msgstr "**环境变量** `LOCAL_MEDIA_PATH` **允许** API 请求从服务器文件系统指定的目录读取本地图像或视频。请注意,这存在安全风险。应仅在受信任的环境中启用。"
|
||||
"The environment variable `LOCAL_MEDIA_PATH` which allows API requests to "
|
||||
"read local images or videos from directories specified by the server file"
|
||||
" system. Please note this is a security risk. Should only be enabled in "
|
||||
"trusted environments."
|
||||
msgstr ""
|
||||
"环境变量 `LOCAL_MEDIA_PATH` 允许 API "
|
||||
"请求从服务器文件系统指定的目录读取本地图像或视频。请注意,这存在安全风险。应仅在受信任的环境中启用。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen2.5-Omni.md:92
|
||||
msgid ""
|
||||
"Now vllm-ascend docker image should contain vllm[audio] build part, if "
|
||||
"you encounter *audio not supported issue* by any chance, please re-build "
|
||||
"vllm with [audio] flag."
|
||||
msgstr "当前 vllm-ascend docker 镜像应包含 vllm[audio] 构建部分,如果您遇到*音频不支持的问题*,请使用 [audio] 标志重新构建 vllm。"
|
||||
msgstr ""
|
||||
"当前 vllm-ascend docker 镜像应包含 vllm[audio] 构建部分,如果您遇到*音频不支持的问题*,请使用 [audio] "
|
||||
"标志重新构建 vllm。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen2.5-Omni.md:100
|
||||
msgid ""
|
||||
@@ -162,8 +174,8 @@ msgid "Functional Verification"
|
||||
msgstr "功能验证"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen2.5-Omni.md:131
|
||||
msgid "If your service **starts** successfully, you can see the info shown below:"
|
||||
msgstr "如果您的服务**启动**成功,您可以看到如下所示的信息:"
|
||||
msgid "If your service starts successfully, you can see the info shown below:"
|
||||
msgstr "如果您的服务启动成功,您可以看到如下所示的信息:"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen2.5-Omni.md:139
|
||||
msgid "Once your server is started, you can query the model with input prompts:"
|
||||
@@ -258,7 +270,10 @@ msgid ""
|
||||
"Refer to [Using AISBench for performance "
|
||||
"evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-"
|
||||
"performance-evaluation) for details."
|
||||
msgstr "详情请参考[使用 AISBench 进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation)。"
|
||||
msgstr ""
|
||||
"详情请参考[使用 AISBench "
|
||||
"进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-"
|
||||
"performance-evaluation)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen2.5-Omni.md:194
|
||||
msgid "Using vLLM Benchmark"
|
||||
|
||||
@@ -8,7 +8,7 @@ msgid ""
|
||||
msgstr ""
|
||||
"Project-Id-Version: vllm-ascend \n"
|
||||
"Report-Msgid-Bugs-To: \n"
|
||||
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
|
||||
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
|
||||
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
|
||||
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
|
||||
"Language: zh_CN\n"
|
||||
@@ -35,9 +35,8 @@ msgid ""
|
||||
"advancements in reasoning, instruction-following, agent capabilities, and"
|
||||
" multilingual support."
|
||||
msgstr ""
|
||||
"Qwen3 是 Qwen 系列最新一代的大语言模型,提供了一套完整的稠密模型和专家混合"
|
||||
"(MoE) 模型。基于广泛的训练,Qwen3 在推理、指令遵循、智能体能力和多语言支持方"
|
||||
"面实现了突破性进展。"
|
||||
"Qwen3 是 Qwen 系列最新一代的大语言模型,提供了一套完整的稠密模型和专家混合(MoE) 模型。基于广泛的训练,Qwen3 "
|
||||
"在推理、指令遵循、智能体能力和多语言支持方面实现了突破性进展。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:7
|
||||
msgid ""
|
||||
@@ -47,18 +46,15 @@ msgid ""
|
||||
"optimization points. We will also explore how adjusting service "
|
||||
"parameters can maximize throughput performance across various scenarios."
|
||||
msgstr ""
|
||||
"欢迎阅读在 vLLM-Ascend 环境中优化 Qwen 稠密模型的教程。本指南将帮助您为您的用"
|
||||
"例配置最有效的设置,并通过实际示例突出关键优化点。我们还将探讨如何调整服务参"
|
||||
"数以在各种场景下最大化吞吐性能。"
|
||||
"欢迎阅读在 vLLM-Ascend 环境中优化 Qwen "
|
||||
"稠密模型的教程。本指南将帮助您为您的用例配置最有效的设置,并通过实际示例突出关键优化点。我们还将探讨如何调整服务参数以在各种场景下最大化吞吐性能。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:9
|
||||
msgid ""
|
||||
"This document will show the main verification steps of the model, "
|
||||
"including supported features, feature configuration, environment "
|
||||
"preparation, accuracy and performance evaluation."
|
||||
msgstr ""
|
||||
"本文档将展示模型的主要验证步骤,包括支持的特性、特性配置、环境准备、精度和性"
|
||||
"能评估。"
|
||||
msgstr "本文档将展示模型的主要验证步骤,包括支持的特性、特性配置、环境准备、精度和性能评估。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:11
|
||||
msgid ""
|
||||
@@ -68,11 +64,9 @@ msgid ""
|
||||
"20250429). This example requires version **v0.11.0rc2**. Earlier versions"
|
||||
" may lack certain features."
|
||||
msgstr ""
|
||||
"Qwen3 稠密模型首次在 "
|
||||
"[v0.8.4rc2](https://github.com/vllm-project/vllm-"
|
||||
"Qwen3 稠密模型首次在 [v0.8.4rc2](https://github.com/vllm-project/vllm-"
|
||||
"ascend/blob/main/docs/source/user_guide/release_notes.md#v084rc2---"
|
||||
"20250429) 中得到支持。本示例需要版本 **v0.11.0rc2**。更早的版本可能缺少某些特"
|
||||
"性。"
|
||||
"20250429) 中得到支持。本示例需要版本 **v0.11.0rc2**。更早的版本可能缺少某些特性。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:13
|
||||
msgid "Supported Features"
|
||||
@@ -84,16 +78,14 @@ msgid ""
|
||||
"features](../../user_guide/support_matrix/supported_models.md) to get the"
|
||||
" model's supported feature matrix."
|
||||
msgstr ""
|
||||
"请参考 [支持的特性](../../user_guide/support_matrix/supported_models."
|
||||
"md) 以获取模型支持的特性矩阵。"
|
||||
"请参考 [支持的特性](../../user_guide/support_matrix/supported_models.md) "
|
||||
"以获取模型支持的特性矩阵。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:17
|
||||
msgid ""
|
||||
"Refer to [feature guide](../../user_guide/feature_guide/index.md) to get "
|
||||
"the feature's configuration."
|
||||
msgstr ""
|
||||
"请参考 [特性指南](../../user_guide/feature_guide/index.md) 以获取特性的配置信"
|
||||
"息。"
|
||||
msgstr "请参考 [特性指南](../../user_guide/feature_guide/index.md) 以获取特性的配置信息。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:19
|
||||
msgid "Environment Preparation"
|
||||
@@ -109,9 +101,9 @@ msgid ""
|
||||
"Atlas 800I A2 (64G × 1) card. [Download model "
|
||||
"weight](https://modelers.cn/models/Modelers_Park/Qwen3-0.6B)"
|
||||
msgstr ""
|
||||
"`Qwen3-0.6B`(BF16 版本): 需要 1 张 Atlas 800 A3 (64G × 2) 卡或 1 张 Atlas "
|
||||
"800I A2 (64G × 1) 卡。[下载模型权重](https://modelers.cn/models/"
|
||||
"Modelers_Park/Qwen3-0.6B)"
|
||||
"`Qwen3-0.6B`(BF16 版本): 需要 1 张 Atlas 800 A3 (64G × 2) 卡或 1 张 Atlas 800I A2"
|
||||
" (64G × 1) "
|
||||
"卡。[下载模型权重](https://modelers.cn/models/Modelers_Park/Qwen3-0.6B)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:24
|
||||
msgid ""
|
||||
@@ -119,9 +111,9 @@ msgid ""
|
||||
"Atlas 800I A2 (64G × 1) card. [Download model "
|
||||
"weight](https://modelers.cn/models/Modelers_Park/Qwen3-1.7B)"
|
||||
msgstr ""
|
||||
"`Qwen3-1.7B`(BF16 版本): 需要 1 张 Atlas 800 A3 (64G × 2) 卡或 1 张 Atlas "
|
||||
"800I A2 (64G × 1) 卡。[下载模型权重](https://modelers.cn/models/"
|
||||
"Modelers_Park/Qwen3-1.7B)"
|
||||
"`Qwen3-1.7B`(BF16 版本): 需要 1 张 Atlas 800 A3 (64G × 2) 卡或 1 张 Atlas 800I A2"
|
||||
" (64G × 1) "
|
||||
"卡。[下载模型权重](https://modelers.cn/models/Modelers_Park/Qwen3-1.7B)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:25
|
||||
msgid ""
|
||||
@@ -129,9 +121,8 @@ msgid ""
|
||||
"Atlas 800I A2 (64G × 1) card. [Download model "
|
||||
"weight](https://modelers.cn/models/Modelers_Park/Qwen3-4B)"
|
||||
msgstr ""
|
||||
"`Qwen3-4B`(BF16 版本): 需要 1 张 Atlas 800 A3 (64G × 2) 卡或 1 张 Atlas "
|
||||
"800I A2 (64G × 1) 卡。[下载模型权重](https://modelers.cn/models/"
|
||||
"Modelers_Park/Qwen3-4B)"
|
||||
"`Qwen3-4B`(BF16 版本): 需要 1 张 Atlas 800 A3 (64G × 2) 卡或 1 张 Atlas 800I A2 "
|
||||
"(64G × 1) 卡。[下载模型权重](https://modelers.cn/models/Modelers_Park/Qwen3-4B)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:26
|
||||
msgid ""
|
||||
@@ -139,9 +130,8 @@ msgid ""
|
||||
"Atlas 800I A2 (64G × 1) card. [Download model "
|
||||
"weight](https://modelers.cn/models/Modelers_Park/Qwen3-8B)"
|
||||
msgstr ""
|
||||
"`Qwen3-8B`(BF16 版本): 需要 1 张 Atlas 800 A3 (64G × 2) 卡或 1 张 Atlas "
|
||||
"800I A2 (64G × 1) 卡。[下载模型权重](https://modelers.cn/models/"
|
||||
"Modelers_Park/Qwen3-8B)"
|
||||
"`Qwen3-8B`(BF16 版本): 需要 1 张 Atlas 800 A3 (64G × 2) 卡或 1 张 Atlas 800I A2 "
|
||||
"(64G × 1) 卡。[下载模型权重](https://modelers.cn/models/Modelers_Park/Qwen3-8B)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:27
|
||||
msgid ""
|
||||
@@ -149,9 +139,8 @@ msgid ""
|
||||
"Atlas 800I A2 (64G × 1) cards. [Download model "
|
||||
"weight](https://modelers.cn/models/Modelers_Park/Qwen3-14B)"
|
||||
msgstr ""
|
||||
"`Qwen3-14B`(BF16 版本): 需要 1 张 Atlas 800 A3 (64G × 2) 卡或 2 张 Atlas "
|
||||
"800I A2 (64G × 1) 卡。[下载模型权重](https://modelers.cn/models/"
|
||||
"Modelers_Park/Qwen3-14B)"
|
||||
"`Qwen3-14B`(BF16 版本): 需要 1 张 Atlas 800 A3 (64G × 2) 卡或 2 张 Atlas 800I A2 "
|
||||
"(64G × 1) 卡。[下载模型权重](https://modelers.cn/models/Modelers_Park/Qwen3-14B)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:28
|
||||
msgid ""
|
||||
@@ -159,9 +148,8 @@ msgid ""
|
||||
"Atlas 800I A2 (64G × 4) cards. [Download model "
|
||||
"weight](https://modelers.cn/models/Modelers_Park/Qwen3-32B)"
|
||||
msgstr ""
|
||||
"`Qwen3-32B`(BF16 版本): 需要 2 张 Atlas 800 A3 (64G × 4) 卡或 4 张 Atlas "
|
||||
"800I A2 (64G × 4) 卡。[下载模型权重](https://modelers.cn/models/"
|
||||
"Modelers_Park/Qwen3-32B)"
|
||||
"`Qwen3-32B`(BF16 版本): 需要 2 张 Atlas 800 A3 (64G × 4) 卡或 4 张 Atlas 800I A2 "
|
||||
"(64G × 4) 卡。[下载模型权重](https://modelers.cn/models/Modelers_Park/Qwen3-32B)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:29
|
||||
msgid ""
|
||||
@@ -169,9 +157,9 @@ msgid ""
|
||||
"cards or 4 Atlas 800I A2 (64G × 4) cards. [Download model "
|
||||
"weight](https://www.modelscope.cn/models/vllm-ascend/Qwen3-32B-W8A8)"
|
||||
msgstr ""
|
||||
"`Qwen3-32B-W8A8`(量化版本): 需要 2 张 Atlas 800 A3 (64G × 4) 卡或 4 张 "
|
||||
"Atlas 800I A2 (64G × 4) 卡。[下载模型权重](https://www.modelscope.cn/"
|
||||
"models/vllm-ascend/Qwen3-32B-W8A8)"
|
||||
"`Qwen3-32B-W8A8`(量化版本): 需要 2 张 Atlas 800 A3 (64G × 4) 卡或 4 张 Atlas 800I "
|
||||
"A2 (64G × 4) 卡。[下载模型权重](https://www.modelscope.cn/models/vllm-"
|
||||
"ascend/Qwen3-32B-W8A8)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:31
|
||||
msgid ""
|
||||
@@ -195,8 +183,8 @@ msgid ""
|
||||
"node communication according to [verify multi-node communication "
|
||||
"environment](../../installation.md#verify-multi-node-communication)."
|
||||
msgstr ""
|
||||
"如果您想部署多节点环境,需要根据 [验证多节点通信环境](../../installation."
|
||||
"md#verify-multi-node-communication) 来验证多节点通信。"
|
||||
"如果您想部署多节点环境,需要根据 [验证多节点通信环境](../../installation.md#verify-multi-node-"
|
||||
"communication) 来验证多节点通信。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:39
|
||||
msgid "Installation"
|
||||
@@ -208,8 +196,9 @@ msgid ""
|
||||
"Currently, we provide the all-in-one images.[Download "
|
||||
"images](https://quay.io/repository/ascend/vllm-ascend?tab=tags)"
|
||||
msgstr ""
|
||||
"您可以使用我们的官方 docker 镜像来支持 Qwen3 稠密模型。目前,我们提供一体化镜"
|
||||
"像。[下载镜像](https://quay.io/repository/ascend/vllm-ascend?tab=tags)"
|
||||
"您可以使用我们的官方 docker 镜像来支持 Qwen3 "
|
||||
"稠密模型。目前,我们提供一体化镜像。[下载镜像](https://quay.io/repository/ascend/vllm-"
|
||||
"ascend?tab=tags)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:44
|
||||
msgid "Docker Pull (by tag)"
|
||||
@@ -227,18 +216,15 @@ msgid ""
|
||||
" (`pip install -e`) to help developer immediately take place changes "
|
||||
"without requiring a new installation."
|
||||
msgstr ""
|
||||
"默认工作目录是 `/workspace`,vLLM 和 vLLM Ascend 代码放置在 `/vllm-"
|
||||
"workspace` 中,并以 [开发模式](https://setuptools.pypa.io/en/latest/"
|
||||
"userguide/development_mode.html) (`pip install -e`) 安装,以帮助开发者立即应用"
|
||||
"更改而无需重新安装。"
|
||||
"默认工作目录是 `/workspace`,vLLM 和 vLLM Ascend 代码放置在 `/vllm-workspace` 中,并以 "
|
||||
"[开发模式](https://setuptools.pypa.io/en/latest/userguide/development_mode.html)"
|
||||
" (`pip install -e`) 安装,以帮助开发者立即应用更改而无需重新安装。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:92
|
||||
msgid ""
|
||||
"In the [Run docker container](./Qwen3-Dense.md#run-docker-container), "
|
||||
"detailed explanations are provided through specific examples."
|
||||
msgstr ""
|
||||
"在 [运行 docker 容器](./Qwen3-Dense.md#run-docker-container) 中,通过具体示例"
|
||||
"提供了详细说明。"
|
||||
msgstr "在 [运行 docker 容器](./Qwen3-Dense.md#run-docker-container) 中,通过具体示例提供了详细说明。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:94
|
||||
msgid ""
|
||||
@@ -273,11 +259,10 @@ msgid ""
|
||||
"max_num_batched_tokens, and cudagraph_capture_sizes, to achieve the best "
|
||||
"performance."
|
||||
msgstr ""
|
||||
"在本节中,我们将演示在 vLLM-Ascend 中调整超参数以实现最大推理吞吐性能的最佳实"
|
||||
"践。通过定制服务级配置以适应不同的用例,您可以确保您的系统在各种场景下都能达"
|
||||
"到最佳性能。我们将指导您如何根据观察到的现象(例如 max_model_len、"
|
||||
"max_num_batched_tokens 和 cudagraph_capture_sizes)来微调超参数,以获得最佳性"
|
||||
"能。"
|
||||
"在本节中,我们将演示在 vLLM-Ascend "
|
||||
"中调整超参数以实现最大推理吞吐性能的最佳实践。通过定制服务级配置以适应不同的用例,您可以确保您的系统在各种场景下都能达到最佳性能。我们将指导您如何根据观察到的现象(例如"
|
||||
" max_model_len、max_num_batched_tokens 和 "
|
||||
"cudagraph_capture_sizes)来微调超参数,以获得最佳性能。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:104
|
||||
msgid "The specific example scenario is as follows:"
|
||||
@@ -364,11 +349,9 @@ msgid ""
|
||||
" these scenarios and this parameter will be removed."
|
||||
msgstr ""
|
||||
"**[可选]** `--additional-config '{\"pa_shape_list\":[48,64,72,80]}'`: "
|
||||
"`pa_shape_list` 指定了您希望切换到 PA 算子的批次大小。这是一个临时的调优旋"
|
||||
"钮。目前,注意力算子调度默认使用 FIA 算子。在某些批次大小(并发)设置下,FIA "
|
||||
"可能性能不佳。通过设置 `pa_shape_list`,当运行时批次大小与列出的值之一匹配时,"
|
||||
"vLLM-Ascend 将用 PA 算子替换 FIA 算子以防止性能下降。未来,FIA 将针对这些场景"
|
||||
"进行优化,此参数将被移除。"
|
||||
"`pa_shape_list` 指定了您希望切换到 PA 算子的批次大小。这是一个临时的调优旋钮。目前,注意力算子调度默认使用 FIA "
|
||||
"算子。在某些批次大小(并发)设置下,FIA 可能性能不佳。通过设置 `pa_shape_list`,当运行时批次大小与列出的值之一匹配时"
|
||||
",vLLM-Ascend 将用 PA 算子替换 FIA 算子以防止性能下降。未来,FIA 将针对这些场景进行优化,此参数将被移除。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:198
|
||||
#, python-brace-format
|
||||
@@ -381,10 +364,10 @@ msgid ""
|
||||
"\"FULL_DECODE_ONLY\", "
|
||||
"\"cudagraph_capture_sizes\":[1,8,24,48,60,64,72,76]}'`."
|
||||
msgstr ""
|
||||
"如果需要极致性能,可以启用 cudagraph_capture_sizes 参数,参考:[关键优化"
|
||||
"点](./Qwen3-Dense.md#key-optimization-points)、[优化亮点](./Qwen3-"
|
||||
"Dense.md#optimization-highlights)。以下是批次大小为 72 的示例:`--compilation-"
|
||||
"config '{\"cudagraph_mode\": \"FULL_DECODE_ONLY\", "
|
||||
"如果需要极致性能,可以启用 cudagraph_capture_sizes 参数,参考:[关键优化点](./Qwen3-Dense.md#key-"
|
||||
"optimization-points)、[优化亮点](./Qwen3-Dense.md#optimization-"
|
||||
"highlights)。以下是批次大小为 72 的示例:`--compilation-config '{\"cudagraph_mode\": "
|
||||
"\"FULL_DECODE_ONLY\", "
|
||||
"\"cudagraph_capture_sizes\":[1,8,24,48,60,64,72,76]}'`。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:201
|
||||
@@ -423,7 +406,7 @@ msgid ""
|
||||
"Refer to [Using "
|
||||
"AISBench](../../developer_guide/evaluation/using_ais_bench.md) for "
|
||||
"details."
|
||||
msgstr "详情请参阅[使用AISBench](../../developer_guide/evaluation/using_ais_bench.md)。"
|
||||
msgstr "详情请参阅[使用 AISBench](../../developer_guide/evaluation/using_ais_bench.md)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:273
|
||||
msgid ""
|
||||
@@ -512,11 +495,13 @@ msgid ""
|
||||
"Refer to [Using AISBench for performance "
|
||||
"evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-"
|
||||
"performance-evaluation) for details."
|
||||
msgstr "详情请参阅[使用AISBench进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation)。"
|
||||
msgstr ""
|
||||
"详情请参阅[使用 AISBench 进行性能评估](../../developer_guide/evaluation/using_ais_bench.md"
|
||||
"#execute-performance-evaluation)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:287
|
||||
msgid "Using vLLM Benchmark"
|
||||
msgstr "使用vLLM基准测试"
|
||||
msgstr "使用 vLLM 基准测试"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:289
|
||||
msgid "Run performance evaluation of `Qwen3-32B-W8A8` as an example."
|
||||
@@ -526,7 +511,7 @@ msgstr "以运行 `Qwen3-32B-W8A8` 的性能评估为例。"
|
||||
msgid ""
|
||||
"Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/benchmarking/) "
|
||||
"for more details."
|
||||
msgstr "更多详情请参阅[vllm基准测试](https://docs.vllm.ai/en/latest/benchmarking/)。"
|
||||
msgstr "更多详情请参阅 [vLLM 基准测试](https://docs.vllm.ai/en/latest/benchmarking/)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:293
|
||||
msgid "There are three `vllm bench` subcommands:"
|
||||
@@ -564,11 +549,11 @@ msgid ""
|
||||
"significantly improve the performance of Qwen Dense models. These "
|
||||
"techniques are designed to enhance throughput and efficiency across "
|
||||
"various scenarios."
|
||||
msgstr "本节将介绍能显著提升Qwen Dense模型性能的关键优化点。这些技术旨在提升各种场景下的吞吐量和效率。"
|
||||
msgstr "本节将介绍能显著提升 Qwen Dense 模型性能的关键优化点。这些技术旨在提升各种场景下的吞吐量和效率。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:316
|
||||
msgid "1. Rope Optimization"
|
||||
msgstr "1. Rope优化"
|
||||
msgstr "1. Rope 优化"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:318
|
||||
msgid ""
|
||||
@@ -578,7 +563,9 @@ msgid ""
|
||||
"performed during the first layer of the forward pass. For subsequent "
|
||||
"layers, the position encoding is directly reused, eliminating redundant "
|
||||
"calculations and significantly speeding up inference in decode phase."
|
||||
msgstr "Rope优化通过修改位置编码过程来提升模型效率。具体来说,它确保 `cos_sin_cache` 及相关索引选择操作仅在正向传播的第一层执行。对于后续层,位置编码被直接复用,消除了冗余计算,并显著加快了解码阶段的推理速度。"
|
||||
msgstr ""
|
||||
"Rope 优化通过修改位置编码过程来提升模型效率。具体来说,它确保 `cos_sin_cache` "
|
||||
"及相关索引选择操作仅在正向传播的第一层执行。对于后续层,位置编码被直接复用,消除了冗余计算,并显著加快了解码阶段的推理速度。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:320
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:326
|
||||
@@ -590,14 +577,14 @@ msgstr "此优化默认启用,无需设置任何额外的环境变量。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:322
|
||||
msgid "2. AddRMSNormQuant Fusion"
|
||||
msgstr "2. AddRMSNormQuant融合"
|
||||
msgstr "2. AddRMSNormQuant 融合"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:324
|
||||
msgid ""
|
||||
"AddRMSNormQuant fusion merges the Address-wise Multi-Scale Normalization "
|
||||
"and Quantization operations, allowing for more efficient memory access "
|
||||
"and computation, thereby enhancing throughput."
|
||||
msgstr "AddRMSNormQuant融合将地址感知多尺度归一化与量化操作合并,实现了更高效的内存访问和计算,从而提升了吞吐量。"
|
||||
msgstr "AddRMSNormQuant 融合将地址感知多尺度归一化与量化操作合并,实现了更高效的内存访问和计算,从而提升了吞吐量。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:328
|
||||
msgid "3. FlashComm_v1"
|
||||
@@ -612,7 +599,9 @@ msgid ""
|
||||
"processing. In quantization scenarios, FlashComm_v1 also reduces the "
|
||||
"communication overhead by decreasing the bit-level data transfer, which "
|
||||
"further minimizes the end-to-end latency during the prefill phase."
|
||||
msgstr "FlashComm_v1通过将传统的allreduce集合通信分解为reduce-scatter和all-gather,显著提升了大批量场景下的性能。这种分解有助于减少RMSNorm令牌维度的计算,从而实现更高效的处理。在量化场景中,FlashComm_v1还通过减少比特级数据传输来降低通信开销,从而进一步最小化预填充阶段的端到端延迟。"
|
||||
msgstr ""
|
||||
"FlashComm_v1 通过将传统的 allreduce 集合通信分解为 reduce-scatter 和 all-"
|
||||
"gather,显著提升了大批量场景下的性能。这种分解有助于减少 RMSNorm 令牌维度的计算,从而实现更高效的处理。在量化场景中,FlashComm_v1 还通过减少比特级数据传输来降低通信开销,从而进一步最小化预填充阶段的端到端延迟。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:332
|
||||
msgid ""
|
||||
@@ -626,7 +615,9 @@ msgid ""
|
||||
"exceeds the threshold. This ensures that the feature is only activated in"
|
||||
" scenarios where it improves performance, avoiding potential degradation "
|
||||
"in lower-concurrency situations."
|
||||
msgstr "需要注意的是,将allreduce通信分解为reduce-scatter和all-gather操作仅在无显著通信降级的高并发场景下有益。在其他情况下,这种分解可能导致明显的性能下降。为缓解此问题,当前实现采用基于阈值的方法,仅当每个推理调度的实际令牌数超过阈值时才启用FlashComm_v1。这确保了该功能仅在能提升性能的场景下激活,避免了在低并发情况下可能出现的性能下降。"
|
||||
msgstr ""
|
||||
"需要注意的是,将 allreduce 通信分解为 reduce-scatter 和 all-"
|
||||
"gather 操作仅在无显著通信降级的高并发场景下有益。在其他情况下,这种分解可能导致明显的性能下降。为缓解此问题,当前实现采用基于阈值的方法,仅当每个推理调度的实际令牌数超过阈值时才启用 FlashComm_v1。这确保了该功能仅在能提升性能的场景下激活,避免了在低并发情况下可能出现的性能下降。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:334
|
||||
msgid ""
|
||||
@@ -636,7 +627,7 @@ msgstr "此优化需要设置环境变量 `VLLM_ASCEND_ENABLE_FLASHCOMM1 = 1`
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:336
|
||||
msgid "4. Matmul and ReduceScatter Fusion"
|
||||
msgstr "4. 矩阵乘法和ReduceScatter融合"
|
||||
msgstr "4. 矩阵乘法和 ReduceScatter 融合"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:338
|
||||
msgid ""
|
||||
@@ -648,7 +639,7 @@ msgid ""
|
||||
"communication steps, improves computational efficiency, and allows for "
|
||||
"better resource utilization, resulting in enhanced throughput, especially"
|
||||
" in large-scale distributed environments."
|
||||
msgstr "一旦启用FlashComm_v1,可以应用额外的优化。此优化融合了矩阵乘法和ReduceScatter操作,并包含分片优化。矩阵乘法计算被视为一个流水线,而ReduceScatter和反量化操作则在另一个独立的流水线中处理。这种方法显著减少了通信步骤,提高了计算效率,并实现了更好的资源利用,从而提升了吞吐量,尤其在大规模分布式环境中效果显著。"
|
||||
msgstr "一旦启用 FlashComm_v1,可以应用额外的优化。此优化融合了矩阵乘法和 ReduceScatter 操作,并包含分片优化。矩阵乘法计算被视为一个流水线,而 ReduceScatter 和反量化操作则在另一个独立的流水线中处理。这种方法显著减少了通信步骤,提高了计算效率,并实现了更好的资源利用,从而提升了吞吐量,尤其在大规模分布式环境中效果显著。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:340
|
||||
msgid ""
|
||||
@@ -658,7 +649,7 @@ msgid ""
|
||||
" is currently used to mitigate this problem. The optimization is only "
|
||||
"applied when the token count exceeds the threshold, ensuring that it is "
|
||||
"not enabled in cases where it could negatively impact performance."
|
||||
msgstr "此优化在FlashComm_v1激活后会自动启用。然而,由于融合后在小并发场景下存在性能下降的问题,目前采用基于阈值的方法来缓解此问题。该优化仅在令牌数超过阈值时应用,确保在可能对性能产生负面影响的情况下不被启用。"
|
||||
msgstr "此优化在 FlashComm_v1 激活后会自动启用。然而,由于融合后在小并发场景下存在性能下降的问题,目前采用基于阈值的方法来缓解此问题。该优化仅在令牌数超过阈值时应用,确保在可能对性能产生负面影响的情况下不被启用。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:342
|
||||
msgid "5. Weight Prefetching"
|
||||
@@ -681,7 +672,7 @@ msgid ""
|
||||
"preloaded to L2 cache ahead of time, reducing MTE utilization during the "
|
||||
"MLP computations and indirectly improving Cube computation efficiency by "
|
||||
"minimizing resource contention and optimizing data flow."
|
||||
msgstr "在稠密模型场景中,MLP的gate_up_proj和down_proj线性层通常表现出相对较高的MTE利用率。为解决此问题,我们创建了一个专门用于权重预取的独立流水线,该流水线与MLP之前的原始向量计算流水线(如RMSNorm和SiLU)并行运行。这种方法允许权重提前预加载到L2缓存中,从而降低MLP计算期间的MTE利用率,并通过最小化资源争用和优化数据流,间接提升Cube计算效率。"
|
||||
msgstr "在稠密模型场景中,MLP 的 gate_up_proj 和 down_proj 线性层通常表现出相对较高的 MTE 利用率。为解决此问题,我们创建了一个专门用于权重预取的独立流水线,该流水线与 MLP 之前的原始向量计算流水线(如 RMSNorm 和 SiLU)并行运行。这种方法允许权重提前预加载到 L2 缓存中,从而降低 MLP 计算期间的 MTE 利用率,并通过最小化资源争用和优化数据流,间接提升 Cube 计算效率。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:348
|
||||
#, python-brace-format
|
||||
@@ -695,11 +686,17 @@ msgid ""
|
||||
"\"enabled\": true, \"prefetch_ratio\": { \"mlp\": { \"gate_up\": 1.0, "
|
||||
"\"down\": 1.0}}}. See User Guide->Feature Guide->Weight Prefetch Guide "
|
||||
"for details."
|
||||
msgstr "之前用于启用MLP权重预取的环境变量 `VLLM_ASCEND_ENABLE_PREFETCH_MLP`,以及用于设置MLP gate_up_proj和down_proj权重预取大小的 `VLLM_ASCEND_MLP_GATE_UP_PREFETCH_SIZE` 和 `VLLM_ASCEND_MLP_DOWN_PREFETCH_SIZE` 已被弃用。请改用以下配置:`\"weight_prefetch_config\": { \"enabled\": true, \"prefetch_ratio\": { \"mlp\": { \"gate_up\": 1.0, \"down\": 1.0}}}`。详情请参阅用户指南->功能指南->权重预取指南。"
|
||||
msgstr ""
|
||||
"此前用于启用MLP权重预取的环境变量 `VLLM_ASCEND_ENABLE_PREFETCH_MLP`,以及用于设置MLP "
|
||||
"gate_up_proj和down_proj权重预取大小的 `VLLM_ASCEND_MLP_GATE_UP_PREFETCH_SIZE` 和 "
|
||||
"`VLLM_ASCEND_MLP_DOWN_PREFETCH_SIZE` "
|
||||
"已被弃用。请改用以下配置:`\"weight_prefetch_config\": { \"enabled\": true, "
|
||||
"\"prefetch_ratio\": { \"mlp\": { \"gate_up\": 1.0, \"down\": "
|
||||
"1.0}}}`。详情请参阅用户指南->功能指南->权重预取指南。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:350
|
||||
msgid "6. Zerolike Elimination"
|
||||
msgstr "6. Zerolike消除"
|
||||
msgstr "6. 类零消除"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:352
|
||||
msgid ""
|
||||
@@ -731,7 +728,9 @@ msgid ""
|
||||
"The configuration compilation_config = { \"cudagraph_mode\": "
|
||||
"\"FULL_DECODE_ONLY\"} is used when starting the service. This setup is "
|
||||
"necessary to enable the aclgraph's full decode-only mode."
|
||||
msgstr "启动服务时使用配置 `compilation_config = { \"cudagraph_mode\": \"FULL_DECODE_ONLY\"}`。此设置对于启用aclgraph的完全仅解码模式是必需的。"
|
||||
msgstr ""
|
||||
"启动服务时使用配置 `compilation_config = { \"cudagraph_mode\": "
|
||||
"\"FULL_DECODE_ONLY\"}`。此设置对于启用aclgraph的完全仅解码模式是必需的。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:362
|
||||
msgid "8. Asynchronous Scheduling"
|
||||
@@ -785,13 +784,11 @@ msgid ""
|
||||
"18MB. The reason for this is that, at this value, the vector computations"
|
||||
" of RMSNorm and SiLU can effectively hide the prefetch stream, thereby "
|
||||
"accelerating the Matmul computations of the two linear layers."
|
||||
msgstr ""
|
||||
"例如,在上述实际场景中,我将MLP中gate_up_proj和down_proj的预取缓冲区大小设置为18MB。"
|
||||
"这样做的原因是,在此数值下,RMSNorm和SiLU的向量计算能够有效隐藏预取流,从而加速两个线性层的Matmul计算。"
|
||||
msgstr "例如,在上述实际场景中,我将MLP中gate_up_proj和down_proj的预取缓冲区大小设置为18MB。这样做的原因是,在此数值下,RMSNorm和SiLU的向量计算能够有效隐藏预取流,从而加速两个线性层的Matmul计算。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:378
|
||||
msgid "2.Max-num-batched-tokens"
|
||||
msgstr "2.最大批处理令牌数"
|
||||
msgstr "2. 最大批处理令牌数"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:380
|
||||
msgid ""
|
||||
@@ -802,24 +799,22 @@ msgid ""
|
||||
"processed per batch, potentially leading to inefficiencies. Conversely, "
|
||||
"setting it too large increases the risk of Out of Memory (OOM) errors due"
|
||||
" to excessive memory consumption."
|
||||
msgstr ""
|
||||
"最大批处理令牌数参数决定了单批次可处理的令牌数量上限。调整此值有助于平衡吞吐量与内存使用。"
|
||||
"若设置过小,每批次处理的令牌数较少,可能降低效率,从而对端到端性能产生负面影响。"
|
||||
"反之,若设置过大,则会因内存消耗过高而增加内存溢出(OOM)错误的风险。"
|
||||
msgstr "最大批处理令牌数参数决定了单批次可处理的令牌数量上限。调整此值有助于平衡吞吐量与内存使用。若设置过小,每批次处理的令牌数较少,可能降低效率,从而对端到端性能产生负面影响。反之,若设置过大,则会因内存消耗过高而增加内存溢出(OOM)错误的风险。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:382
|
||||
msgid ""
|
||||
"In the above real-world scenario, we not only conducted extensive testing"
|
||||
" to determine the most cost-effective value, but also took into account "
|
||||
"the accumulation of decode tokens when enabling chunked prefill. If the "
|
||||
"value is set too small, a single request may被分块多次,并且在推理的早期阶段,一个批次可能只包含少量解码令牌。这可能导致端到端吞吐量达不到预期。"
|
||||
msgstr ""
|
||||
"在上述实际场景中,我们不仅通过大量测试确定了最具性价比的数值,还考虑了启用分块预填充时解码令牌的累积问题。"
|
||||
"若该值设置过小,单个请求可能被多次分块处理,且在推理早期阶段,单个批次可能仅包含少量解码令牌,从而导致端到端吞吐量无法达到预期。"
|
||||
"value is set too small, a single request may be chunked multiple times, "
|
||||
"and during the early stages of inference, a batch may contain only a "
|
||||
"small number of decode tokens. This can result in the end-to-end "
|
||||
"throughput falling short of expectations."
|
||||
msgstr "在上述实际场景中,我们不仅通过大量测试确定了最具性价比的数值,还考虑了启用分块预填充时解码令牌的累积问题。若该值设置过小,单个请求可能被多次分块处理,且在推理早期阶段,单个批次可能仅包含少量解码令牌,从而导致端到端吞吐量无法达到预期。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:384
|
||||
msgid "3.Cudagraph_capture_sizes"
|
||||
msgstr "3.CUDA图捕获尺寸"
|
||||
msgstr "3. CUDA图捕获尺寸"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:386
|
||||
msgid ""
|
||||
@@ -827,8 +822,7 @@ msgid ""
|
||||
"captures during the inference process. Adjusting this value determines "
|
||||
"how much of the computation graph is captured at once, which can "
|
||||
"significantly impact both performance and memory usage."
|
||||
msgstr ""
|
||||
"CUDA图捕获尺寸参数控制推理过程中图捕获的粒度。调整此值决定了单次捕获的计算图范围,这对性能和内存使用均有显著影响。"
|
||||
msgstr "CUDA图捕获尺寸参数控制推理过程中图捕获的粒度。调整此值决定了单次捕获的计算图范围,这对性能和内存使用均有显著影响。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:388
|
||||
msgid ""
|
||||
@@ -839,9 +833,7 @@ msgid ""
|
||||
" between two sizes, the framework will automatically pad the token count "
|
||||
"to the larger size. This often leads to actual performance deviating from"
|
||||
" the expected or even degrading."
|
||||
msgstr ""
|
||||
"若未手动指定此列表,系统将自动填充一系列均匀分布的值,这通常能保证良好性能。"
|
||||
"但若需进一步微调,手动指定数值将获得更佳效果。这是因为当批次大小介于两个尺寸之间时,框架会自动将令牌数填充至较大尺寸,这常导致实际性能偏离预期甚至下降。"
|
||||
msgstr "若未手动指定此列表,系统将自动填充一系列均匀分布的值,这通常能保证良好性能。但若需进一步微调,手动指定数值将获得更佳效果。这是因为当批次大小介于两个尺寸之间时,框架会自动将令牌数填充至较大尺寸,这常导致实际性能偏离预期甚至下降。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:390
|
||||
msgid ""
|
||||
@@ -850,9 +842,7 @@ msgid ""
|
||||
"actually included in the cudagraph_capture_sizes list. This way, during "
|
||||
"the decode phase, padding operations are essentially avoided, ensuring "
|
||||
"the reliability of the experimental data."
|
||||
msgstr ""
|
||||
"因此,如上述实际场景所示,在调整基准测试请求并发度时,我们始终确保并发度实际包含在CUDA图捕获尺寸列表中。"
|
||||
"这样在解码阶段基本避免了填充操作,从而保证了实验数据的可靠性。"
|
||||
msgstr "因此,如上述实际场景所示,在调整基准测试请求并发度时,我们始终确保并发度实际包含在CUDA图捕获尺寸列表中。这样在解码阶段基本避免了填充操作,从而保证了实验数据的可靠性。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Dense.md:392
|
||||
msgid ""
|
||||
@@ -861,6 +851,4 @@ msgid ""
|
||||
"not meet this condition will be automatically filtered out. Therefore, I "
|
||||
"recommend incrementally adding concurrency based on the TP size after "
|
||||
"enabling FlashComm_v1."
|
||||
msgstr ""
|
||||
"需特别注意,若启用FlashComm_v1,此列表中的值必须是TP尺寸的整数倍。不满足此条件的任何值都将被自动过滤。"
|
||||
"因此,建议在启用FlashComm_v1后,基于TP尺寸逐步增加并发度。"
|
||||
msgstr "需特别注意,若启用FlashComm_v1,此列表中的值必须是TP尺寸的整数倍。不满足此条件的任何值都将被自动过滤。因此,建议在启用FlashComm_v1后,基于TP尺寸逐步增加并发度。"
|
||||
|
||||
@@ -8,7 +8,7 @@ msgid ""
|
||||
msgstr ""
|
||||
"Project-Id-Version: vllm-ascend \n"
|
||||
"Report-Msgid-Bugs-To: \n"
|
||||
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
|
||||
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
|
||||
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
|
||||
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
|
||||
"Language: zh_CN\n"
|
||||
@@ -37,7 +37,9 @@ msgid ""
|
||||
"equipped with chain-of-thought reasoning, supporting audio, video, and "
|
||||
"text input, with text output."
|
||||
msgstr ""
|
||||
"Qwen3-Omni 是原生端到端多语言全模态基础模型。它能处理文本、图像、音频和视频,并以文本和自然语音形式提供实时流式响应。我们引入了多项架构升级以提升性能和效率。Qwen3-Omni-30B-A3B 的 Thinking 模型包含思考器组件,具备思维链推理能力,支持音频、视频和文本输入,输出为文本。"
|
||||
"Qwen3-Omni "
|
||||
"是原生端到端多语言全模态基础模型。它能处理文本、图像、音频和视频,并以文本和自然语音形式提供实时流式响应。我们引入了多项架构升级以提升性能和效率。Qwen3"
|
||||
"-Omni-30B-A3B 的 Thinking 模型包含思考器组件,具备思维链推理能力,支持音频、视频和文本输入,输出为文本。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:7
|
||||
msgid ""
|
||||
@@ -55,14 +57,18 @@ msgid ""
|
||||
"Refer to [supported features](https://docs.vllm.ai/projects/ascend/zh-"
|
||||
"cn/latest/user_guide/support_matrix/supported_models.html) to get the "
|
||||
"model's supported feature matrix."
|
||||
msgstr "请参考 [支持的功能](https://docs.vllm.ai/projects/ascend/zh-cn/latest/user_guide/support_matrix/supported_models.html) 以获取模型支持的功能矩阵。"
|
||||
msgstr ""
|
||||
"请参考 [支持的功能](https://docs.vllm.ai/projects/ascend/zh-"
|
||||
"cn/latest/user_guide/support_matrix/supported_models.html) 以获取模型支持的功能矩阵。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:13
|
||||
msgid ""
|
||||
"Refer to [feature guide](https://docs.vllm.ai/projects/ascend/zh-"
|
||||
"cn/latest/user_guide/feature_guide/index.html) to get the feature's "
|
||||
"configuration."
|
||||
msgstr "请参考 [功能指南](https://docs.vllm.ai/projects/ascend/zh-cn/latest/user_guide/feature_guide/index.html) 以获取功能的配置信息。"
|
||||
msgstr ""
|
||||
"请参考 [功能指南](https://docs.vllm.ai/projects/ascend/zh-"
|
||||
"cn/latest/user_guide/feature_guide/index.html) 以获取功能的配置信息。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:15
|
||||
msgid "Environment Preparation"
|
||||
@@ -74,18 +80,20 @@ msgstr "模型权重"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:19
|
||||
msgid ""
|
||||
"`Qwen3-Omni-30B-A3B-Thinking` requires 2 NPU Cards(64G × 2).[Download "
|
||||
"`Qwen3-Omni-30B-A3B-Thinking` requires 2 NPU Cards (64G × 2).[Download "
|
||||
"model weight](https://modelscope.cn/models/Qwen/Qwen3-Omni-30B-A3B-"
|
||||
"Thinking) It is recommended to download the model weight to the shared "
|
||||
"directory of multiple nodes, such as `/root/.cache/`"
|
||||
msgstr ""
|
||||
"`Qwen3-Omni-30B-A3B-Thinking` 需要 2 张 NPU 卡 (64G × 2)。[下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-Omni-30B-A3B-Thinking)。建议将模型权重下载到多节点的共享目录,例如 `/root/.cache/`。"
|
||||
"`Qwen3-Omni-30B-A3B-Thinking` 需要 2 张 NPU 卡 (64G × "
|
||||
"2)。[下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-Omni-30B-A3B-"
|
||||
"Thinking)。建议将模型权重下载到多节点的共享目录,例如 `/root/.cache/`。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:22
|
||||
msgid "Installation"
|
||||
msgstr "安装"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:24
|
||||
msgid "Use docker image"
|
||||
msgstr "使用 Docker 镜像"
|
||||
|
||||
@@ -100,9 +108,11 @@ msgid ""
|
||||
"Select an image based on your machine type and start the docker image on "
|
||||
"your node, refer to [using docker](../../installation.md#set-up-using-"
|
||||
"docker)."
|
||||
msgstr "根据您的机器类型选择镜像并在节点上启动 Docker 镜像,请参考 [使用 Docker](../../installation.md#set-up-using-docker)。"
|
||||
msgstr ""
|
||||
"根据您的机器类型选择镜像并在节点上启动 Docker 镜像,请参考 [使用 Docker](../../installation.md#set-"
|
||||
"up-using-docker)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:32
|
||||
msgid "Build from source"
|
||||
msgstr "从源码构建"
|
||||
|
||||
@@ -114,7 +124,9 @@ msgstr "您可以从源码构建所有组件。"
|
||||
msgid ""
|
||||
"Install `vllm-ascend`, refer to [set up using "
|
||||
"python](../../installation.md#set-up-using-python)."
|
||||
msgstr "安装 `vllm-ascend`,请参考 [使用 Python 设置](../../installation.md#set-up-using-python)。"
|
||||
msgstr ""
|
||||
"安装 `vllm-ascend`,请参考 [使用 Python 设置](../../installation.md#set-up-using-"
|
||||
"python)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:71
|
||||
msgid "Please install system dependencies"
|
||||
@@ -146,7 +158,9 @@ msgid ""
|
||||
"Atlas A2 with 64 GB of NPU card memory, tensor-parallel-size should be at"
|
||||
" least 1, and for 32 GB of memory, tensor-parallel-size should be at "
|
||||
"least 2."
|
||||
msgstr "运行以下脚本在多 NPU 上启动 vLLM 服务器:对于具有 64 GB NPU 卡内存的 Atlas A2,tensor-parallel-size 应至少为 1;对于 32 GB 内存,tensor-parallel-size 应至少为 2。"
|
||||
msgstr ""
|
||||
"运行以下脚本在多 NPU 上启动 vLLM 服务器:对于具有 64 GB NPU 卡内存的 Atlas A2,tensor-parallel-"
|
||||
"size 应至少为 1;对于 32 GB 内存,tensor-parallel-size 应至少为 2。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:188
|
||||
msgid "Functional Verification"
|
||||
@@ -173,25 +187,31 @@ msgid ""
|
||||
"As an example, take the `gsm8k` `omnibench` `bbh` dataset as a test "
|
||||
"dataset, and run accuracy evaluation of `Qwen3-Omni-30B-A3B-Thinking` in "
|
||||
"online mode."
|
||||
msgstr "以 `gsm8k`、`omnibench`、`bbh` 数据集作为测试数据集为例,在在线模式下运行 `Qwen3-Omni-30B-A3B-Thinking` 的精度评估。"
|
||||
msgstr ""
|
||||
"以 `gsm8k`、`omnibench`、`bbh` 数据集作为测试数据集为例,在在线模式下运行 `Qwen3-Omni-30B-A3B-"
|
||||
"Thinking` 的精度评估。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:239
|
||||
msgid ""
|
||||
"Refer to Using "
|
||||
"evalscope(<https://docs.vllm.ai/projects/ascend/en/latest/developer_guide/evaluation/using_evalscope.html"
|
||||
"#install-evalscope-using-pip>) for `evalscope`installation."
|
||||
msgstr "关于 `evalscope` 的安装,请参考使用 evalscope (<https://docs.vllm.ai/projects/ascend/en/latest/developer_guide/evaluation/using_evalscope.html#install-evalscope-using-pip>)。"
|
||||
msgstr ""
|
||||
"关于 `evalscope` 的安装,请参考使用 evalscope "
|
||||
"(<https://docs.vllm.ai/projects/ascend/en/latest/developer_guide/evaluation/using_evalscope.html"
|
||||
"#install-evalscope-using-pip>)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:240
|
||||
msgid "Run `evalscope` to execute the accuracy evaluation."
|
||||
msgstr "运行 `evalscope` 以执行精度评估。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:255
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:296
|
||||
msgid ""
|
||||
"After execution, you can get the result, here is the result of `Qwen3"
|
||||
"-Omni-30B-A3B-Thinking` in vllm-ascend:0.13.0rc1 for reference only."
|
||||
msgstr "执行后,您可以获得结果。以下是 `Qwen3-Omni-30B-A3B-Thinking` 在 vllm-ascend:0.13.0rc1 中的结果,仅供参考。"
|
||||
msgstr ""
|
||||
"执行后,您可以获得结果。以下是 `Qwen3-Omni-30B-A3B-Thinking` 在 vllm-ascend:0.13.0rc1 "
|
||||
"中的结果,仅供参考。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:269
|
||||
msgid "Performance"
|
||||
@@ -207,7 +227,9 @@ msgid ""
|
||||
"example. Refer to vllm benchmark for more details. Refer to [vllm "
|
||||
"benchmark](https://docs.vllm.ai/en/latest/benchmarking/) for more "
|
||||
"details."
|
||||
msgstr "以运行 `Qwen3-Omni-30B-A3B-Thinking` 的性能评估为例。更多详情请参考 vllm 基准测试。更多详情请参考 [vllm 基准测试](https://docs.vllm.ai/en/latest/benchmarking/)。"
|
||||
msgstr ""
|
||||
"以运行 `Qwen3-Omni-30B-A3B-Thinking` 的性能评估为例。更多详情请参考 vllm 基准测试。更多详情请参考 [vllm"
|
||||
" 基准测试](https://docs.vllm.ai/en/latest/benchmarking/)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:277
|
||||
msgid "There are three `vllm bench` subcommands:"
|
||||
@@ -227,4 +249,12 @@ msgstr "`throughput`:对离线推理吞吐量进行基准测试。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:283
|
||||
msgid "Take the `serve` as an example. Run the code as follows."
|
||||
msgstr "以 `serve` 为例。按如下方式运行代码。"
|
||||
msgstr "以 `serve` 为例。按如下方式运行代码。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:296
|
||||
msgid ""
|
||||
"After execution, you can get the result, here is the result of `Qwen3"
|
||||
"-Omni-30B-A3B-Thinking` in vllm-ascend:0.13.0rc1 for reference only."
|
||||
msgstr ""
|
||||
"执行后,您可以获得结果。以下是 `Qwen3-Omni-30B-A3B-Thinking` 在 vllm-ascend:0.13.0rc1 "
|
||||
"中的结果,仅供参考。"
|
||||
@@ -8,7 +8,7 @@ msgid ""
|
||||
msgstr ""
|
||||
"Project-Id-Version: vllm-ascend \n"
|
||||
"Report-Msgid-Bugs-To: \n"
|
||||
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
|
||||
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
|
||||
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
|
||||
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
|
||||
"Language: zh_CN\n"
|
||||
@@ -79,7 +79,10 @@ msgid ""
|
||||
"`Qwen3.5-397B-A17B`(BF16 version): require 2 Atlas 800 A3 (64G × 16) "
|
||||
"nodes or 4 Atlas 800 A2 (64G × 8) nodes. [Download model "
|
||||
"weight](https://www.modelscope.cn/models/Qwen/Qwen3.5-397B-A17B)"
|
||||
msgstr "`Qwen3.5-397B-A17B` (BF16 版本):需要 2 个 Atlas 800 A3 (64G × 16) 节点或 4 个 Atlas 800 A2 (64G × 8) 节点。[下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3.5-397B-A17B)"
|
||||
msgstr ""
|
||||
"`Qwen3.5-397B-A17B` (BF16 版本):需要 2 个 Atlas 800 A3 (64G × 16) 节点或 4 个 "
|
||||
"Atlas 800 A2 (64G × 8) "
|
||||
"节点。[下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3.5-397B-A17B)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:22
|
||||
msgid ""
|
||||
@@ -87,7 +90,10 @@ msgid ""
|
||||
"× 16) node or 2 Atlas 800 A2 (64G × 8) nodes. [Download model "
|
||||
"weight](https://www.modelscope.cn/models/Eco-Tech/Qwen3.5-397B-A17B-"
|
||||
"w8a8-mtp)"
|
||||
msgstr "`Qwen3.5-397B-A17B-w8a8` (量化版本):需要 1 个 Atlas 800 A3 (64G × 16) 节点或 2 个 Atlas 800 A2 (64G × 8) 节点。[下载模型权重](https://www.modelscope.cn/models/Eco-Tech/Qwen3.5-397B-A17B-w8a8-mtp)"
|
||||
msgstr ""
|
||||
"`Qwen3.5-397B-A17B-w8a8` (量化版本):需要 1 个 Atlas 800 A3 (64G × 16) 节点或 2 个 "
|
||||
"Atlas 800 A2 (64G × 8) 节点。[下载模型权重](https://www.modelscope.cn/models/Eco-"
|
||||
"Tech/Qwen3.5-397B-A17B-w8a8-mtp)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:24
|
||||
msgid ""
|
||||
@@ -104,13 +110,15 @@ msgid ""
|
||||
"If you want to deploy multi-node environment, you need to verify multi-"
|
||||
"node communication according to [verify multi-node communication "
|
||||
"environment](../../installation.md#verify-multi-node-communication)."
|
||||
msgstr "如果您想部署多节点环境,需要根据[验证多节点通信环境](../../installation.md#verify-multi-node-communication)来验证多节点通信。"
|
||||
msgstr ""
|
||||
"如果您想部署多节点环境,需要根据[验证多节点通信环境](../../installation.md#verify-multi-node-"
|
||||
"communication)来验证多节点通信。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:30
|
||||
msgid "Installation"
|
||||
msgstr "安装"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:34
|
||||
msgid "Use docker image"
|
||||
msgstr "使用 Docker 镜像"
|
||||
|
||||
@@ -119,16 +127,20 @@ msgid ""
|
||||
"For example, using images `quay.io/ascend/vllm-ascend:v0.17.0rc1`(for "
|
||||
"Atlas 800 A2) and `quay.io/ascend/vllm-ascend:v0.17.0rc1-a3`(for Atlas "
|
||||
"800 A3)."
|
||||
msgstr "例如,使用镜像 `quay.io/ascend/vllm-ascend:v0.17.0rc1`(适用于 Atlas 800 A2)和 `quay.io/ascend/vllm-ascend:v0.17.0rc1-a3`(适用于 Atlas 800 A3)。"
|
||||
msgstr ""
|
||||
"例如,使用镜像 `quay.io/ascend/vllm-ascend:v0.17.0rc1`(适用于 Atlas 800 A2)和 "
|
||||
"`quay.io/ascend/vllm-ascend:v0.17.0rc1-a3`(适用于 Atlas 800 A3)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:38
|
||||
msgid ""
|
||||
"Select an image based on your machine type and start the docker image on "
|
||||
"your node, refer to [using docker](../../installation.md#set-up-using-"
|
||||
"docker)."
|
||||
msgstr "根据您的机器类型选择镜像并在节点上启动 Docker 镜像,请参考[使用 Docker](../../installation.md#set-up-using-docker)。"
|
||||
msgstr ""
|
||||
"根据您的机器类型选择镜像并在节点上启动 Docker 镜像,请参考[使用 Docker](../../installation.md#set-"
|
||||
"up-using-docker)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:76
|
||||
msgid "Build from source"
|
||||
msgstr "从源码构建"
|
||||
|
||||
@@ -140,7 +152,9 @@ msgstr "您可以从源码构建所有组件。"
|
||||
msgid ""
|
||||
"Install `vllm-ascend`, refer to [set up using "
|
||||
"python](../../installation.md#set-up-using-python)."
|
||||
msgstr "安装 `vllm-ascend`,请参考[使用 Python 设置](../../installation.md#set-up-using-python)。"
|
||||
msgstr ""
|
||||
"安装 `vllm-ascend`,请参考[使用 Python 设置](../../installation.md#set-up-using-"
|
||||
"python)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:84
|
||||
msgid ""
|
||||
@@ -158,39 +172,42 @@ msgstr "单节点部署"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:90
|
||||
msgid ""
|
||||
"`Qwen3.5-397B-A17B` can be deployed on 2 Atlas 800 A3(64G*16) or 4 Atlas "
|
||||
"800 A2(64G*8). `Qwen3.5-397B-A17B-w8a8` can be deployed on 1 Atlas 800 "
|
||||
"A3(64G*16) or 2 Atlas 800 A2(64G*8), need to start with parameter "
|
||||
"`--quantization ascend`."
|
||||
msgstr "`Qwen3.5-397B-A17B` 可以部署在 2 个 Atlas 800 A3(64G*16) 或 4 个 Atlas 800 A2(64G*8) 上。`Qwen3.5-397B-A17B-w8a8` 可以部署在 1 个 Atlas 800 A3(64G*16) 或 2 个 Atlas 800 A2(64G*8) 上,需要使用参数 `--quantization ascend` 启动。"
|
||||
"`Qwen3.5-397B-A17B-w8a8` can be deployed on 1 Atlas 800 A3(64G*16) or 2 "
|
||||
"Atlas 800 A2(64G*8), need to start with parameter `--quantization "
|
||||
"ascend`."
|
||||
msgstr ""
|
||||
"`Qwen3.5-397B-A17B-w8a8` 可以部署在 1 个 Atlas 800 A3(64G*16) 或 2 个 Atlas 800 "
|
||||
"A2(64G*8) 上,需要使用参数 `--quantization ascend` 启动。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:93
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:92
|
||||
msgid ""
|
||||
"Run the following script to execute online 128k inference On 1 Atlas 800 "
|
||||
"A3(64G*16)."
|
||||
msgstr "在 1 个 Atlas 800 A3(64G*16) 上运行以下脚本以执行在线 128k 推理。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:134
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:133
|
||||
msgid "**Notice:**"
|
||||
msgstr "**注意:**"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:136
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:135
|
||||
msgid "The parameters are explained as follows:"
|
||||
msgstr "参数解释如下:"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:138
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:137
|
||||
msgid ""
|
||||
"`--data-parallel-size` 1 and `--tensor-parallel-size` 16 are common "
|
||||
"settings for data parallelism (DP) and tensor parallelism (TP) sizes."
|
||||
msgstr "`--data-parallel-size` 1 和 `--tensor-parallel-size` 16 是数据并行 (DP) 和张量并行 (TP) 大小的常见设置。"
|
||||
msgstr ""
|
||||
"`--data-parallel-size` 1 和 `--tensor-parallel-size` 16 是数据并行 (DP) 和张量并行 "
|
||||
"(TP) 大小的常见设置。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:139
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:138
|
||||
msgid ""
|
||||
"`--max-model-len` represents the context length, which is the maximum "
|
||||
"value of the input plus output for a single request."
|
||||
msgstr "`--max-model-len` 表示上下文长度,即单个请求的输入加输出的最大值。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:140
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:139
|
||||
msgid ""
|
||||
"`--max-num-seqs` indicates the maximum number of requests that each DP "
|
||||
"group is allowed to process. If the number of requests sent to the "
|
||||
@@ -199,36 +216,44 @@ msgid ""
|
||||
"state is also counted in metrics such as TTFT and TPOT. Therefore, when "
|
||||
"testing performance, it is generally recommended that `--max-num-seqs` * "
|
||||
"`--data-parallel-size` >= the actual total concurrency."
|
||||
msgstr "`--max-num-seqs` 表示每个 DP 组允许处理的最大请求数。如果发送到服务的请求数超过此限制,多余的请求将保持在等待状态,不会被调度。请注意,在等待状态所花费的时间也会计入 TTFT 和 TPOT 等指标。因此,在测试性能时,通常建议 `--max-num-seqs` * `--data-parallel-size` >= 实际总并发数。"
|
||||
msgstr ""
|
||||
"`--max-num-seqs` 表示每个 DP "
|
||||
"组允许处理的最大请求数。如果发送到服务的请求数超过此限制,多余的请求将保持在等待状态,不会被调度。请注意,在等待状态所花费的时间也会计入 TTFT"
|
||||
" 和 TPOT 等指标。因此,在测试性能时,通常建议 `--max-num-seqs` * `--data-parallel-size` >= "
|
||||
"实际总并发数。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:141
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:140
|
||||
msgid ""
|
||||
"`--max-num-batched-tokens` represents the maximum number of tokens that "
|
||||
"the model can process in a single step. Currently, vLLM v1 scheduling "
|
||||
"enables ChunkPrefill/SplitFuse by default, which means:"
|
||||
msgstr "`--max-num-batched-tokens` 表示模型单步可以处理的最大 token 数。目前,vLLM v1 调度默认启用 ChunkPrefill/SplitFuse,这意味着:"
|
||||
msgstr ""
|
||||
"`--max-num-batched-tokens` 表示模型单步可以处理的最大 token 数。目前,vLLM v1 调度默认启用 "
|
||||
"ChunkPrefill/SplitFuse,这意味着:"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:142
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:141
|
||||
msgid ""
|
||||
"(1) If the input length of a request is greater than `--max-num-batched-"
|
||||
"tokens`, it will be divided into multiple rounds of computation according"
|
||||
" to `--max-num-batched-tokens`;"
|
||||
msgstr "(1) 如果请求的输入长度大于 `--max-num-batched-tokens`,它将根据 `--max-num-batched-tokens` 被分成多轮计算;"
|
||||
msgstr ""
|
||||
"(1) 如果请求的输入长度大于 `--max-num-batched-tokens`,它将根据 `--max-num-batched-"
|
||||
"tokens` 被分成多轮计算;"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:143
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:142
|
||||
msgid ""
|
||||
"(2) Decode requests are prioritized for scheduling, and prefill requests "
|
||||
"are scheduled only if there is available capacity."
|
||||
msgstr "(2) 解码请求优先调度,只有在有可用容量时才调度预填充请求。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:144
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:143
|
||||
msgid ""
|
||||
"Generally, if `--max-num-batched-tokens` is set to a larger value, the "
|
||||
"overall latency will be lower, but the pressure on GPU memory (activation"
|
||||
" value usage) will be greater."
|
||||
msgstr "通常,如果 `--max-num-batched-tokens` 设置得较大,整体延迟会更低,但 GPU 内存(激活值使用)的压力会更大。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:145
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:144
|
||||
msgid ""
|
||||
"`--gpu-memory-utilization` represents the proportion of HBM that vLLM "
|
||||
"will use for actual inference. Its essential function is to calculate the"
|
||||
@@ -242,16 +267,24 @@ msgid ""
|
||||
"during actual inference (e.g., due to uneven EP load), setting `--gpu-"
|
||||
"memory-utilization` too high may lead to OOM (Out of Memory) issues "
|
||||
"during actual inference. The default value is `0.9`."
|
||||
msgstr "`--gpu-memory-utilization` 表示 vLLM 将用于实际推理的 HBM 比例。其核心功能是计算可用的 kv_cache 大小。在预热阶段(vLLM 中称为 profile run),vLLM 会记录输入大小为 `--max-num-batched-tokens` 的推理过程中的峰值 GPU 内存使用量。然后,可用的 kv_cache 大小计算为:`--gpu-memory-utilization` * HBM 大小 - 峰值 GPU 内存使用量。因此,`--gpu-memory-utilization` 的值越大,可用的 kv_cache 就越多。然而,由于预热阶段的 GPU 内存使用量可能与实际推理时不同(例如,由于 EP 负载不均),将 `--gpu-memory-utilization` 设置得过高可能导致实际推理时出现 OOM(内存不足)问题。默认值为 `0.9`。"
|
||||
msgstr ""
|
||||
"`--gpu-memory-utilization` 表示 vLLM 将用于实际推理的 HBM 比例。其核心功能是计算可用的 kv_cache "
|
||||
"大小。在预热阶段(vLLM 中称为 profile run),vLLM 会记录输入大小为 `--max-num-batched-tokens` "
|
||||
"的推理过程中的峰值 GPU 内存使用量。然后,可用的 kv_cache 大小计算为:`--gpu-memory-utilization` * "
|
||||
"HBM 大小 - 峰值 GPU 内存使用量。因此,`--gpu-memory-utilization` 的值越大,可用的 kv_cache "
|
||||
"就越多。然而,由于预热阶段的 GPU 内存使用量可能与实际推理时不同(例如,由于 EP 负载不均),将 `--gpu-memory-"
|
||||
"utilization` 设置得过高可能导致实际推理时出现 OOM(内存不足)问题。默认值为 `0.9`。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:146
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:145
|
||||
msgid ""
|
||||
"`--enable-expert-parallel` indicates that EP is enabled. Note that vLLM "
|
||||
"does not support a mixed approach of ETP and EP; that is, MoE can either "
|
||||
"use pure EP or pure TP."
|
||||
msgstr "`--enable-expert-parallel` 表示启用了 EP。请注意,vLLM 不支持 ETP 和 EP 的混合方法;也就是说,MoE 要么使用纯 EP,要么使用纯 TP。"
|
||||
msgstr ""
|
||||
"`--enable-expert-parallel` 表示启用了 EP。请注意,vLLM 不支持 ETP 和 EP 的混合方法;也就是说,MoE "
|
||||
"要么使用纯 EP,要么使用纯 TP。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:147
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:146
|
||||
msgid ""
|
||||
"`--no-enable-prefix-caching` indicates that prefix caching is disabled. "
|
||||
"To enable it, for mamba-like models Qwen3.5, set `--enable-prefix-"
|
||||
@@ -259,15 +292,19 @@ msgid ""
|
||||
"implementation of hybrid kv cache might result in a very large block_size"
|
||||
" when scheduling. For example, the block_size may be adjusted to 2048, "
|
||||
"which means that any prefix shorter than 2048 will never be cached."
|
||||
msgstr "`--no-enable-prefix-caching` 表示前缀缓存被禁用。要启用它,对于类似 Mamba 的模型 Qwen3.5,请设置 `--enable-prefix-caching` 和 `--mamba-cache-mode align`。请注意,当前混合 kv cache 的实现可能在调度时导致非常大的 block_size。例如,block_size 可能被调整为 2048,这意味着任何短于 2048 的前缀将永远不会被缓存。"
|
||||
msgstr ""
|
||||
"`--no-enable-prefix-caching` 表示前缀缓存被禁用。要启用它,对于类似 Mamba 的模型 Qwen3.5,请设置 "
|
||||
"`--enable-prefix-caching` 和 `--mamba-cache-mode align`。请注意,当前混合 kv cache "
|
||||
"的实现可能在调度时导致非常大的 block_size。例如,block_size 可能被调整为 2048,这意味着任何短于 2048 "
|
||||
"的前缀将永远不会被缓存。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:148
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:147
|
||||
msgid ""
|
||||
"`--quantization` \"ascend\" indicates that quantization is used. To "
|
||||
"disable quantization, remove this option."
|
||||
msgstr "`--quantization` \"ascend\" 表示使用了量化。要禁用量化,请移除此选项。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:149
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:148
|
||||
msgid ""
|
||||
"`--compilation-config` contains configurations related to the aclgraph "
|
||||
"graph mode. The most significant configurations are \"cudagraph_mode\" "
|
||||
@@ -276,9 +313,13 @@ msgid ""
|
||||
"\"PIECEWISE\" and \"FULL_DECODE_ONLY\" are supported. The graph mode is "
|
||||
"mainly used to reduce the cost of operator dispatch. Currently, "
|
||||
"\"FULL_DECODE_ONLY\" is recommended."
|
||||
msgstr "`--compilation-config` 包含与 aclgraph 图模式相关的配置。最重要的配置是 \"cudagraph_mode\" 和 \"cudagraph_capture_sizes\",其含义如下:\"cudagraph_mode\":表示特定的图模式。目前支持 \"PIECEWISE\" 和 \"FULL_DECODE_ONLY\"。图模式主要用于降低算子调度的开销。目前推荐使用 \"FULL_DECODE_ONLY\"。"
|
||||
msgstr ""
|
||||
"`--compilation-config` 包含与 aclgraph 图模式相关的配置。最重要的配置是 \"cudagraph_mode\" 和"
|
||||
" \"cudagraph_capture_sizes\",其含义如下:\"cudagraph_mode\":表示特定的图模式。目前支持 "
|
||||
"\"PIECEWISE\" 和 \"FULL_DECODE_ONLY\"。图模式主要用于降低算子调度的开销。目前推荐使用 "
|
||||
"\"FULL_DECODE_ONLY\"。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:151
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:150
|
||||
msgid ""
|
||||
"\"cudagraph_capture_sizes\": represents different levels of graph modes. "
|
||||
"The default value is [1, 2, 4, 8, 16, 24, 32, 40,..., `--max-num-seqs`]. "
|
||||
@@ -286,164 +327,124 @@ msgid ""
|
||||
" inputs between levels are automatically padded to the next level. "
|
||||
"Currently, the default setting is recommended. Only in some scenarios is "
|
||||
"it necessary to set this separately to achieve optimal performance."
|
||||
msgstr "\"cudagraph_capture_sizes\":表示不同级别的图模式。默认值为 [1, 2, 4, 8, 16, 24, 32, 40,..., `--max-num-seqs`]。在图模式下,不同级别图的输入是固定的,级别之间的输入会自动填充到下一个级别。目前推荐使用默认设置。只有在某些场景下,才需要单独设置此参数以达到最佳性能。"
|
||||
msgstr ""
|
||||
"\"cudagraph_capture_sizes\":表示不同级别的图模式。默认值为 [1, 2, 4, 8, 16, 24, 32, "
|
||||
"40,..., `--max-num-"
|
||||
"seqs`]。在图模式下,不同级别图的输入是固定的,级别之间的输入会自动填充到下一个级别。目前推荐使用默认设置。只有在某些场景下,才需要单独设置此参数以达到最佳性能。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:153
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:152
|
||||
msgid "Multi-node Deployment with MP (Recommended)"
|
||||
msgstr "使用 MP 的多节点部署(推荐)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:155
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:154
|
||||
msgid ""
|
||||
"Assume you have 2 Atlas 800 A2 nodes, and want to deploy the `Qwen3.5"
|
||||
"-397B-A17B` model across multiple nodes."
|
||||
msgstr "假设您有 2 个 Atlas 800 A2 节点,并希望跨多个节点部署 `Qwen3.5-397B-A17B` 模型。"
|
||||
"-397B-A17B-w8a8-mtp` model across multiple nodes."
|
||||
msgstr "假设您有 2 个 Atlas 800 A2 节点,并希望跨多个节点部署 `Qwen3.5-397B-A17B-w8a8-mtp` 模型。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:157
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:156
|
||||
msgid "Node 0"
|
||||
msgstr "节点 0"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:203
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:202
|
||||
msgid "Node1"
|
||||
msgstr "节点 1"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:253
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:252
|
||||
msgid ""
|
||||
"If the service starts successfully, the following information will be "
|
||||
"displayed on node 0:"
|
||||
msgstr "如果服务启动成功,节点 0 上将显示以下信息:"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:264
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:263
|
||||
msgid "Multi-node Deployment with Ray"
|
||||
msgstr "使用 Ray 的多节点部署"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:266
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:265
|
||||
msgid "refer to [Ray Distributed (Qwen/Qwen3-235B-A22B)](../features/ray.md)."
|
||||
msgstr "请参考 [Ray 分布式 (Qwen/Qwen3-235B-A22B)](../features/ray.md)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:268
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:267
|
||||
msgid "Prefill-Decode Disaggregation"
|
||||
msgstr "预填充-解码解耦"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:270
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:269
|
||||
msgid ""
|
||||
"We recommend using Mooncake for deployment: "
|
||||
"[Mooncake](../features/pd_disaggregation_mooncake_multi_node.md)."
|
||||
msgstr "我们推荐使用 Mooncake 进行部署:[Mooncake](../features/pd_disaggregation_mooncake_multi_node.md)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:272
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:271
|
||||
msgid ""
|
||||
"Take Atlas 800 A3 (64G × 16) for example, we recommend to deploy 1P1D (3 "
|
||||
"nodes) to run Qwen3.5-397B-A17B."
|
||||
msgstr "以 Atlas 800 A3 (64G × 16) 为例,我们建议部署 1P1D(3 个节点)来运行 Qwen3.5-397B-A17B。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:274
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:273
|
||||
msgid "`Qwen3.5-397B-A17B-w8a8-mtp 1P1D` require 3 Atlas 800 A3 (64G × 16)."
|
||||
msgstr "`Qwen3.5-397B-A17B-w8a8-mtp 1P1D` 需要 3 个 Atlas 800 A3 (64G × 16)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:276
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:275
|
||||
msgid ""
|
||||
"To run the vllm-ascend `Prefill-Decode Disaggregation` service, you need "
|
||||
"to deploy `run_p.sh` 、`run_d0.sh` and `run_d1.sh` script on each node and"
|
||||
" deploy a `proxy.sh` script on prefill master node to forward requests."
|
||||
msgstr "要运行 vllm-ascend `Prefill-Decode Disaggregation` 服务,您需要在每个节点上部署 `run_p.sh`、`run_d0.sh` 和 `run_d1.sh` 脚本,并在预填充主节点上部署一个 `proxy.sh` 脚本来转发请求。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:278
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:277
|
||||
msgid "Prefill Node 0 `run_p.sh` script"
|
||||
msgstr "预填充节点 0 `run_p.sh` 脚本"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:353
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:352
|
||||
msgid "Decode Node 0 `run_d0.sh` script"
|
||||
msgstr "解码节点 0 `run_d0.sh` 脚本"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:433
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:432
|
||||
msgid "Decode Node 1 `run_d1.sh` script"
|
||||
msgstr "解码节点 1 `run_d1.sh` 脚本"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:512
|
||||
msgid "**Notice:** The parameters are explained as follows:"
|
||||
msgstr "**注意:** 参数说明如下:"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:515
|
||||
msgid ""
|
||||
"`--async-scheduling`: enables the asynchronous scheduling function. When "
|
||||
"Multi-Token Prediction (MTP) is enabled, asynchronous scheduling of "
|
||||
"operator delivery can be implemented to overlap the operator delivery "
|
||||
"latency."
|
||||
msgstr ""
|
||||
"`--async-scheduling`:启用异步调度功能。当启用多令牌预测(MTP)时,可以实现算子交付的异步调度,以重叠算子交付延迟。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:516
|
||||
msgid ""
|
||||
"`cudagraph_capture_sizes`: The recommended value is `n x (mtp + 1)`. And "
|
||||
"the min is `n = 1` and the max is `n = max-num-seqs`. For other values, "
|
||||
"it is recommended to set them to the number of frequently occurring "
|
||||
"requests on the Decode (D) node."
|
||||
msgstr ""
|
||||
"`cudagraph_capture_sizes`:推荐值为 `n x (mtp + 1)`。最小值为 `n = 1`,最大值为 `n = max-num-seqs`。对于其他值,建议设置为解码(D)节点上频繁出现的请求数量。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:517
|
||||
msgid ""
|
||||
"`recompute_scheduler_enable: true`: enables the recomputation scheduler. "
|
||||
"When the Key-Value Cache (KV Cache) of the decode node is insufficient, "
|
||||
"requests will be sent to the prefill node to recompute the KV Cache. In "
|
||||
"the PD separation scenario, it is recommended to enable this "
|
||||
"configuration on both prefill and decode nodes simultaneously."
|
||||
msgstr ""
|
||||
"`recompute_scheduler_enable: true`:启用重计算调度器。当解码节点的键值缓存(KV Cache)不足时,请求将被发送到预填充节点以重新计算 KV Cache。在 PD 分离场景下,建议同时在预填充节点和解码节点上启用此配置。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:518
|
||||
msgid ""
|
||||
"`no-enable-prefix-caching`: The prefix-cache feature is enabled by "
|
||||
"default. You can use the `--no-enable-prefix-caching` parameter to "
|
||||
"disable this feature. Notice: for Prefill-Decode disaggregation feature, "
|
||||
"known issue on D node: [#7944](https://github.com/vllm-project/vllm-"
|
||||
"ascend/issues/7944)"
|
||||
msgstr ""
|
||||
"`no-enable-prefix-caching`:前缀缓存功能默认启用。您可以使用 `--no-enable-prefix-caching` 参数禁用此功能。注意:对于预填充-解码分离功能,D 节点上的已知问题:[#7944](https://github.com/vllm-project/vllm-ascend/issues/7944)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:520
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:519
|
||||
msgid "Run the `proxy.sh` script on the prefill master node"
|
||||
msgstr "在预填充主节点上运行 `proxy.sh` 脚本"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:522
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:521
|
||||
msgid ""
|
||||
"Run a proxy server on the same node with the prefiller service instance. "
|
||||
"You can get the proxy program in the repository's examples: "
|
||||
"[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-"
|
||||
"project/vllm-"
|
||||
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
msgstr ""
|
||||
"在与预填充服务实例相同的节点上运行一个代理服务器。您可以在仓库的示例中找到代理程序:[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
msgstr "在与预填充服务实例相同的节点上运行一个代理服务器。您可以在仓库的示例中找到代理程序:[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:548
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:547
|
||||
msgid "Functional Verification"
|
||||
msgstr "功能验证"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:550
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:549
|
||||
msgid "Once your server is started, you can query the model with input prompts:"
|
||||
msgstr "服务器启动后,您可以使用输入提示词查询模型:"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:563
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:562
|
||||
msgid "Accuracy Evaluation"
|
||||
msgstr "精度评估"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:565
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:564
|
||||
msgid "Here are two accuracy evaluation methods."
|
||||
msgstr "以下是两种精度评估方法。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:567
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:579
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:566
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:578
|
||||
msgid "Using AISBench"
|
||||
msgstr "使用 AISBench"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:569
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:568
|
||||
msgid ""
|
||||
"Refer to [Using "
|
||||
"AISBench](../../developer_guide/evaluation/using_ais_bench.md) for "
|
||||
"details."
|
||||
msgstr "详情请参阅[使用 AISBench](../../developer_guide/evaluation/using_ais_bench.md)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:571
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:570
|
||||
msgid ""
|
||||
"After execution, you can get the result, here is the result of `Qwen3.5"
|
||||
"-397B-A17B-w8a8` in `vllm-ascend:v0.17.0rc1` for reference only."
|
||||
@@ -489,53 +490,53 @@ msgstr "生成"
|
||||
msgid "96.74"
|
||||
msgstr "96.74"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:577
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:576
|
||||
msgid "Performance"
|
||||
msgstr "性能"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:581
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:580
|
||||
msgid ""
|
||||
"Refer to [Using AISBench for performance "
|
||||
"evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-"
|
||||
"performance-evaluation) for details."
|
||||
msgstr "详情请参阅[使用 AISBench 进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:583
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:582
|
||||
msgid "Using vLLM Benchmark"
|
||||
msgstr "使用 vLLM Benchmark"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:585
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:584
|
||||
msgid "Run performance evaluation of `Qwen3.5-397B-A17B-w8a8` as an example."
|
||||
msgstr "以运行 `Qwen3.5-397B-A17B-w8a8` 的性能评估为例。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:587
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:586
|
||||
msgid ""
|
||||
"Refer to [vllm "
|
||||
"benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) "
|
||||
"for more details."
|
||||
msgstr "更多详情请参阅 [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html)。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:589
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:588
|
||||
msgid "There are three `vllm bench` subcommands:"
|
||||
msgstr "`vllm bench` 有三个子命令:"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:591
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:590
|
||||
msgid "`latency`: Benchmark the latency of a single batch of requests."
|
||||
msgstr "`latency`:对单批请求的延迟进行基准测试。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:592
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:591
|
||||
msgid "`serve`: Benchmark the online serving throughput."
|
||||
msgstr "`serve`:对在线服务吞吐量进行基准测试。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:593
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:592
|
||||
msgid "`throughput`: Benchmark offline inference throughput."
|
||||
msgstr "`throughput`:对离线推理吞吐量进行基准测试。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:595
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:594
|
||||
msgid "Take the `serve` as an example. Run the code as follows."
|
||||
msgstr "以 `serve` 为例。运行代码如下。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:602
|
||||
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:601
|
||||
msgid ""
|
||||
"After about several minutes, you can get the performance evaluation "
|
||||
"result."