[v0.18.0][Doc] Translated Doc files 2026-04-15 (#8309)
## Auto-Translation Summary

Translated **19** file(s):

- `docs/source/locale/zh_CN/LC_MESSAGES/community/contributors.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/community/versioning_policy.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/KV_Cache_Pool_Guide.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/ModelRunner_prepare_inputs.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/cpu_binding.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/long_sequence_context_parallel_multi_node.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/long_sequence_context_parallel_single_node.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_multi_node.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_single_node.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Kimi-K2.5.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen2.5-Omni.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-Dense.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3.5-397B-A17B.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/Fine_grained_TP.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/epd_disaggregation.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/external_dp.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/large_scale_ep.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/release_notes.po`

---

[Workflow run](https://github.com/vllm-project/vllm-ascend/actions/runs/24447109402)

Signed-off-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
Co-authored-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
**docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/Fine_grained_TP.po**

```diff
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: vllm-ascend \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-04-14 09:08+0000\n"
+"POT-Creation-Date: 2026-04-15 09:41+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -20,8 +20,8 @@ msgstr ""
 "Generated-By: Babel 2.18.0\n"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:1
-msgid "Fine-Grained Tensor Parallelism (Finegrained TP)"
-msgstr "细粒度张量并行 (Finegrained TP)"
+msgid "Fine-Grained Tensor Parallelism (Fine-grained TP)"
+msgstr "细粒度张量并行 (Fine-grained TP)"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:3
 msgid "Overview"
@@ -37,7 +37,10 @@ msgid ""
 "model head (lm_head), attention output projection (o_proj), and MLP "
 "blocks—via the `finegrained_tp_config` parameter."
 msgstr ""
-"细粒度张量并行 (Fine-grained TP) 扩展了标准张量并行,允许为**不同的模型组件设置独立的张量并行规模**。与对所有层应用单一的全局 `tensor_parallel_size` 不同,细粒度 TP 允许用户通过 `finegrained_tp_config` 参数为关键模块(如嵌入层、语言模型头部 (lm_head)、注意力输出投影层 (o_proj) 和 MLP 块)配置独立的 TP 规模。"
+"细粒度张量并行 (Fine-grained TP) "
+"扩展了标准张量并行,允许为**不同的模型组件设置独立的张量并行规模**。与对所有层应用单一的全局 `tensor_parallel_size` "
+"不同,细粒度 TP 允许用户通过 `finegrained_tp_config` 参数为关键模块(如嵌入层、语言模型头部 "
+"(lm_head)、注意力输出投影层 (o_proj) 和 MLP 块)配置独立的 TP 规模。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:7
 msgid ""
@@ -47,10 +50,11 @@ msgid ""
 "compatible with standard dense transformer architectures and integrates "
 "seamlessly into vLLM’s serving pipeline."
 msgstr ""
-"此功能支持在单个模型内使用异构并行策略,从而能更精细地控制跨设备的权重分布、内存布局和通信模式。该特性与标准的密集 Transformer 架构兼容,并能无缝集成到 vLLM 的服务流水线中。"
+"此功能支持在单个模型内使用异构并行策略,从而能更精细地控制跨设备的权重分布、内存布局和通信模式。该特性与标准的密集 Transformer "
+"架构兼容,并能无缝集成到 vLLM 的服务流水线中。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:11
-msgid "Benefits of Finegrained TP"
+msgid "Benefits of Fine-grained TP"
 msgstr "细粒度 TP 的优势"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:13
@@ -62,11 +66,12 @@ msgstr "细粒度张量并行通过有针对性的权重分片带来两个主要
 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:15
 msgid ""
 "**Reduced Per-Device Memory Footprint**: Fine-grained TP shards large "
-"weight matrices(e.g., LM Head, o_proj)across devices, lowering peak "
+"weight matrices (e.g., LM Head, o_proj) across devices, lowering peak "
 "memory usage and enabling larger batches or deployment on memory-limited "
 "hardware—without quantization."
 msgstr ""
-"**降低单设备内存占用**: 细粒度 TP 将大型权重矩阵(例如 LM Head、o_proj)分片到多个设备上,降低了峰值内存使用量,从而支持更大的批次或在内存受限的硬件上进行部署——无需量化。"
+"**降低单设备内存占用**: 细粒度 TP 将大型权重矩阵(例如 LM "
+"Head、o_proj)分片到多个设备上,降低了峰值内存使用量,从而支持更大的批次或在内存受限的硬件上进行部署——无需量化。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:18
 msgid ""
@@ -76,7 +81,9 @@ msgid ""
 "efficiency—especially for latency-sensitive layers like LM Head and "
 "o_proj."
 msgstr ""
-"**加速 GEMM 中的内存访问**: 在解码密集型工作负载中,GEMM 性能通常受内存带宽限制。权重分片减少了每个设备需要获取的权重数据量,从而降低了 DRAM 流量并提高了带宽效率——对于 LM Head 和 o_proj 等延迟敏感层尤其如此。"
+"**加速 GEMM 中的内存访问**: 在解码密集型工作负载中,GEMM "
+"性能通常受内存带宽限制。权重分片减少了每个设备需要获取的权重数据量,从而降低了 DRAM 流量并提高了带宽效率——对于 LM Head 和 "
+"o_proj 等延迟敏感层尤其如此。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:21
 msgid ""
@@ -99,7 +106,9 @@ msgid ""
 "Fine-grained TP is **model-agnostic** and supports all standard dense "
 "transformer architectures, including Llama, Qwen, DeepSeek (base/dense "
 "variants), and others."
-msgstr "细粒度 TP 是**模型无关的**,支持所有标准的密集 Transformer 架构,包括 Llama、Qwen、DeepSeek(基础/密集变体)等。"
+msgstr ""
+"细粒度 TP 是**模型无关的**,支持所有标准的密集 Transformer 架构,包括 "
+"Llama、Qwen、DeepSeek(基础/密集变体)等。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:31
 msgid "Component & Execution Mode Support"
@@ -161,7 +170,9 @@ msgstr "⚠️ 注意:"
 msgid ""
 "`o_proj` TP is only supported in Graph mode during Decode, because "
 "dummy_run in eager mode will not trigger o_proj."
-msgstr "`o_proj` TP 仅在 Decode 阶段的 Graph 模式下受支持,因为 eager 模式下的 dummy_run 不会触发 o_proj。"
+msgstr ""
+"`o_proj` TP 仅在 Decode 阶段的 Graph 模式下受支持,因为 eager 模式下的 dummy_run 不会触发 "
+"o_proj。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:43
 msgid ""
@@ -194,7 +205,7 @@ msgid ""
 msgstr "⚠️ 违反这些约束将导致运行时错误或未定义行为。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:56
-msgid "How to Use Finegrained TP"
+msgid "How to Use Fine-grained TP"
 msgstr "如何使用细粒度 TP"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:58
@@ -222,7 +233,9 @@ msgid ""
 "decode instances in an environment of 32 cards Ascend 910B*64G (A2), with"
 " parallel configuration as DP32+EP32, and fine-grained TP size of 8; the "
 "performance data is as follows."
-msgstr "为评估细粒度 TP 在大规模服务场景中的有效性,我们使用模型 **DeepSeek-R1-W8A8**,在 32 卡 Ascend 910B*64G (A2) 环境中部署 PD 分离的解码实例,并行配置为 DP32+EP32,细粒度 TP 规模为 8;性能数据如下。"
+msgstr ""
+"为评估细粒度 TP 在大规模服务场景中的有效性,我们使用模型 **DeepSeek-R1-W8A8**,在 32 卡 Ascend "
+"910B*64G (A2) 环境中部署 PD 分离的解码实例,并行配置为 DP32+EP32,细粒度 TP 规模为 8;性能数据如下。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md
 msgid "Module"
@@ -304,4 +317,6 @@ msgid ""
 "PD separation, where models are typically deployed in all-DP mode. In "
 "this setup, sharding weight-heavy layers reduces redundant storage and "
 "memory pressure."
-msgstr "细粒度 TP 在 PD 分离的**解码实例**中**最有效**,因为模型通常以全 DP 模式部署。在此设置中,对权重密集的层进行分片可以减少冗余存储和内存压力。"
+msgstr ""
+"细粒度 TP 在 PD 分离的**解码实例**中**最有效**,因为模型通常以全 DP "
+"模式部署。在此设置中,对权重密集的层进行分片可以减少冗余存储和内存压力。"
```
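The Fine_grained_TP strings above explain that per-module TP sizes are set through the `finegrained_tp_config` parameter. A minimal sketch of what such a launch might look like, assuming vLLM's `additional_config` passthrough and hypothetical per-module keys (the real schema lives in the Fine_grained_TP guide itself, not in this diff):

```python
# Hypothetical sketch: the finegrained_tp_config keys below are assumed for
# illustration; consult the Fine_grained_TP guide for the actual schema.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # any standard dense model, per the guide
    tensor_parallel_size=1,            # global TP; decode instances often run all-DP
    additional_config={
        "finegrained_tp_config": {     # per-module TP sizes (assumed key names)
            "lm_head": 8,              # shard the LM head across 8 NPUs
            "o_proj": 8,               # o_proj TP: Graph mode decode only, per the note above
        }
    },
)
```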
**docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/epd_disaggregation.po**

```diff
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: vllm-ascend \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-04-14 09:08+0000\n"
+"POT-Creation-Date: 2026-04-15 09:41+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -34,7 +34,8 @@ msgid ""
 "Deploying these two stages in independent vLLM instances brings three "
 "practical benefits:"
 msgstr ""
-"**解耦编码器** 将多模态大语言模型的视觉编码器阶段运行在与预填充/解码器阶段分离的进程中。将这两个阶段部署在独立的 vLLM 实例中,带来三个实际好处:"
+"**解耦编码器** 将多模态大语言模型的视觉编码器阶段运行在与预填充/解码器阶段分离的进程中。将这两个阶段部署在独立的 vLLM "
+"实例中,带来三个实际好处:"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:7
 msgid "**Independent, fine-grained scaling**"
@@ -89,8 +90,8 @@ msgid ""
 "Design doc: <https://docs.google.com/document/d"
 "/1aed8KtC6XkXtdoV87pWT0a8OJlZ-CpnuLLzmR8l9BAE>"
 msgstr ""
-"设计文档:<https://docs.google.com/document/d"
-"/1aed8KtC6XkXtdoV87pWT0a8OJlZ-CpnuLLzmR8l9BAE>"
+"设计文档:<https://docs.google.com/document/d/1aed8KtC6XkXtdoV87pWT0a8OJlZ-"
+"CpnuLLzmR8l9BAE>"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:27
 msgid "Usage"
@@ -107,16 +108,16 @@ msgid ""
 "1 Encoder instance + 1 PD instance: "
 "`examples/online_serving/disaggregated_encoder/disagg_1e1pd/`"
 msgstr ""
-"1 个编码器实例 + 1 个 PD 实例:"
-"`examples/online_serving/disaggregated_encoder/disagg_1e1pd/`"
+"1 个编码器实例 + 1 个 PD "
+"实例:`examples/online_serving/disaggregated_encoder/disagg_1e1pd/`"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:35
 msgid ""
 "1 Encoder instance + 1 Prefill instance + 1 Decode instance: "
 "`examples/online_serving/disaggregated_encoder/disagg_1e1p1d/`"
 msgstr ""
-"1 个编码器实例 + 1 个预填充实例 + 1 个解码实例:"
-"`examples/online_serving/disaggregated_encoder/disagg_1e1p1d/`"
+"1 个编码器实例 + 1 个预填充实例 + 1 "
+"个解码实例:`examples/online_serving/disaggregated_encoder/disagg_1e1p1d/`"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:40
 msgid "Development"
@@ -154,7 +155,8 @@ msgid ""
 "instance to the PD instance. All related code is under "
 "`vllm/distributed/ec_transfer`."
 msgstr ""
-"一个连接器将编码器缓存 (EC) 嵌入向量从编码器实例传输到 PD 实例。所有相关代码位于 `vllm/distributed/ec_transfer` 目录下。"
+"一个连接器将编码器缓存 (EC) 嵌入向量从编码器实例传输到 PD 实例。所有相关代码位于 "
+"`vllm/distributed/ec_transfer` 目录下。"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:53
 msgid "Key abstractions"
@@ -175,7 +177,7 @@ msgid "*Worker role* – loads the embeddings into memory."
 msgstr "*工作进程角色* – 将嵌入向量加载到内存中。"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:59
-msgid "**EPD Load Balance Proxy** -"
+msgid "**EPD Load Balancing Proxy** -"
 msgstr "**EPD 负载均衡代理** -"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:60
@@ -200,12 +202,14 @@ msgid ""
 " to facilitate the kv transfer between P and D. For step-by-step "
 "deployment and configuration of Mooncake, refer to the following guide:"
 " "
-"[https://docs.vllm.ai/projects/ascend/en/latest/tutorials/pd_disaggregation_mooncake_multi_node.html](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html)"
+"[https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html)"
 msgstr ""
-"我们使用来自 `vllm_ascend/distributed/kv_transfer/kv_p2p/mooncake_layerwise_connector.py` 的 **MooncakeLayerwiseConnector** 创建示例设置,并参考 "
-"`examples/disaggregated_prefill_v1/load_balance_proxy_layerwise_server_example.py` 来促进 P 和 D 之间的 KV 传输。关于 Mooncake 的逐步部署和配置,请参考以下指南:"
-" "
-"[https://docs.vllm.ai/projects/ascend/en/latest/tutorials/pd_disaggregation_mooncake_multi_node.html](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html)"
+"我们使用来自 "
+"`vllm_ascend/distributed/kv_transfer/kv_p2p/mooncake_layerwise_connector.py`"
+" 的 **MooncakeLayerwiseConnector** 创建示例设置,并参考 "
+"`examples/disaggregated_prefill_v1/load_balance_proxy_layerwise_server_example.py`"
+" 来促进 P 和 D 之间的 KV 传输。关于 Mooncake 的逐步部署和配置,请参考以下指南: "
+"[https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html)"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:66
 msgid ""
@@ -218,7 +222,10 @@ msgid ""
 "`docs/source/developer_guide/Design_Documents/disaggregated_prefill.md` "
 "shows the brief idea about the disaggregated prefill."
 msgstr ""
-"对于 PD 解耦部分,当使用 MooncakeLayerwiseConnector 时:请求首先进入解码器实例,解码器通过元服务器反向触发一个远程预填充任务。然后预填充节点执行推理,并将 KV 缓存逐层推送到解码器,实现计算与传输的重叠。一旦传输完成,解码器无缝地继续后续的令牌生成。`docs/source/developer_guide/Design_Documents/disaggregated_prefill.md` 展示了关于解耦预填充的简要思路。"
+"对于 PD 解耦部分,当使用 MooncakeLayerwiseConnector "
+"时:请求首先进入解码器实例,解码器通过元服务器反向触发一个远程预填充任务。然后预填充节点执行推理,并将 KV "
+"缓存逐层推送到解码器,实现计算与传输的重叠。一旦传输完成,解码器无缝地继续后续的令牌生成。`docs/source/developer_guide/Design_Documents/disaggregated_prefill.md`"
+" 展示了关于解耦预填充的简要思路。"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:69
 msgid "Limitations"
```
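The epd_disaggregation strings above describe the MooncakeLayerwiseConnector pushing KV cache to the decoder layer by layer, so that transfer overlaps compute. A toy sketch of that scheduling idea only, using plain Python threads and a queue rather than the actual connector code:

```python
# Toy illustration of the layer-wise overlap described above: the prefill side
# pushes each layer's KV as soon as it is computed, so the transfer of layer i
# overlaps the compute of layer i+1. Not vllm-ascend's connector code.
import queue
import threading
import time

NUM_LAYERS = 4
kv_queue: queue.Queue = queue.Queue()

def prefill_node() -> None:
    for i in range(NUM_LAYERS):
        time.sleep(0.01)                  # stand-in for this layer's compute
        kv_queue.put(f"kv-of-layer-{i}")  # push immediately; no wait for the full pass

def decode_node() -> None:
    for _ in range(NUM_LAYERS):
        kv = kv_queue.get()               # receives while prefill keeps computing
        print("decode received", kv)

t = threading.Thread(target=prefill_node)
t.start()
decode_node()
t.join()
```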
**docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/external_dp.po**

```diff
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: vllm-ascend \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-04-14 09:08+0000\n"
+"POT-Creation-Date: 2026-04-15 09:41+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -35,10 +35,12 @@ msgid ""
 "vLLM deployment, with its own endpoint, and have an external router "
 "balance HTTP requests between them, making use of appropriate real-time "
 "telemetry from each server for routing decisions."
-msgstr "在这种情况下,将每个数据并行等级视为一个独立的 vLLM 部署(拥有自己的端点),并使用一个外部路由器在它们之间平衡 HTTP 请求,同时利用来自每个服务器的适当实时遥测数据来做出路由决策,会更加方便。"
+msgstr ""
+"在这种情况下,将每个数据并行等级视为一个独立的 vLLM 部署(拥有自己的端点),并使用一个外部路由器在它们之间平衡 HTTP "
+"请求,同时利用来自每个服务器的适当实时遥测数据来做出路由决策,会更加方便。"

 #: ../../source/user_guide/feature_guide/external_dp.md:7
-msgid "Getting Start"
+msgid "Getting Started"
 msgstr "开始使用"

 #: ../../source/user_guide/feature_guide/external_dp.md:9
@@ -47,7 +49,9 @@ msgid ""
 "DP](https://docs.vllm.ai/en/latest/serving/data_parallel_deployment/?h=external"
 "#external-load-balancing) is already natively supported by vLLM. In vllm-"
 "ascend we provide two enhanced functionalities:"
-msgstr "[外部数据并行](https://docs.vllm.ai/en/latest/serving/data_parallel_deployment/?h=external#external-load-balancing) 功能已由 vLLM 原生支持。在 vllm-ascend 中,我们提供了两项增强功能:"
+msgstr ""
+"[外部数据并行](https://docs.vllm.ai/en/latest/serving/data_parallel_deployment/?h=external"
+"#external-load-balancing) 功能已由 vLLM 原生支持。在 vllm-ascend 中,我们提供了两项增强功能:"

 #: ../../source/user_guide/feature_guide/external_dp.md:11
 msgid ""
@@ -85,7 +89,9 @@ msgid ""
 "parallel. These can be mock servers or actual vLLM servers. Note that "
 "this proxy also works with only one vLLM server running, but will fall "
 "back to direct request forwarding which is meaningless."
-msgstr "首先,您需要至少运行两个处于数据并行模式的 vLLM 服务器。这些可以是模拟服务器或实际的 vLLM 服务器。请注意,此代理在仅运行一个 vLLM 服务器时也能工作,但会退化为直接请求转发,这没有意义。"
+msgstr ""
+"首先,您需要至少运行两个处于数据并行模式的 vLLM 服务器。这些可以是模拟服务器或实际的 vLLM 服务器。请注意,此代理在仅运行一个 vLLM"
+" 服务器时也能工作,但会退化为直接请求转发,这没有意义。"

 #: ../../source/user_guide/feature_guide/external_dp.md:29
 msgid ""
@@ -93,7 +99,9 @@ msgid ""
 "launch script in `examples/external_online_dp`. For scenarios of large DP"
 " size across multiple nodes, we recommend using our launch script for "
 "convenience."
-msgstr "您可以手动逐个启动外部 vLLM 数据并行服务器,也可以使用 `examples/external_online_dp` 中的启动脚本。对于跨多个节点的大规模数据并行场景,我们建议使用我们的启动脚本以方便操作。"
+msgstr ""
+"您可以手动逐个启动外部 vLLM 数据并行服务器,也可以使用 `examples/external_online_dp` "
+"中的启动脚本。对于跨多个节点的大规模数据并行场景,我们建议使用我们的启动脚本以方便操作。"

 #: ../../source/user_guide/feature_guide/external_dp.md:31
 msgid "Manually Launch"
@@ -112,7 +120,12 @@ msgid ""
 " instances in one command on each node. It will internally call "
 "`examples/external_online_dp/run_dp_template.sh` for each DP rank with "
 "proper DP-related parameters."
-msgstr "首先,您需要根据您的 vLLM 配置修改 `examples/external_online_dp/run_dp_template.sh`。然后,您可以使用 `examples/external_online_dp/launch_online_dp.py` 在每个节点上通过一条命令启动多个 vLLM 实例。它将在内部为每个数据并行等级调用 `examples/external_online_dp/run_dp_template.sh`,并传入适当的数据并行相关参数。"
+msgstr ""
+"首先,您需要根据您的 vLLM 配置修改 "
+"`examples/external_online_dp/run_dp_template.sh`。然后,您可以使用 "
+"`examples/external_online_dp/launch_online_dp.py` 在每个节点上通过一条命令启动多个 vLLM "
+"实例。它将在内部为每个数据并行等级调用 "
+"`examples/external_online_dp/run_dp_template.sh`,并传入适当的数据并行相关参数。"

 #: ../../source/user_guide/feature_guide/external_dp.md:43
 msgid "An example of running external DP in one single node:"
@@ -131,7 +144,9 @@ msgid ""
 "After all vLLM DP instances are launched, you can now launch the load-"
 "balance proxy server, which serves as an entrypoint for coming requests "
 "and load-balances them between vLLM DP instances."
-msgstr "所有 vLLM 数据并行实例启动后,您现在可以启动负载均衡代理服务器。该服务器作为传入请求的入口点,并在各个 vLLM 数据并行实例之间进行负载均衡。"
+msgstr ""
+"所有 vLLM 数据并行实例启动后,您现在可以启动负载均衡代理服务器。该服务器作为传入请求的入口点,并在各个 vLLM "
+"数据并行实例之间进行负载均衡。"

 #: ../../source/user_guide/feature_guide/external_dp.md:70
 msgid "The proxy server has the following features:"
```
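The external_dp strings above describe a load-balance proxy that acts as the entrypoint and spreads requests across the vLLM DP instances. A minimal round-robin sketch of that idea follows; the backend ports are placeholders, and the real proxy under `examples/external_online_dp` additionally folds per-server telemetry into its routing decisions:

```python
# Minimal round-robin load-balance proxy sketch for the idea described above.
# Backend URLs are placeholders for two vLLM DP instances launched beforehand.
import itertools
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKENDS = itertools.cycle(["http://127.0.0.1:8001", "http://127.0.0.1:8002"])

class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        backend = next(BACKENDS)  # pick the next DP instance's endpoint
        req = urllib.request.Request(
            backend + self.path, data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
            status = resp.status
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), ProxyHandler).serve_forever()
```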
**docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/large_scale_ep.po**

```diff
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: vllm-ascend \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-04-14 09:08+0000\n"
+"POT-Creation-Date: 2026-04-15 09:41+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -24,7 +24,7 @@ msgid "Distributed DP Server With Large-Scale Expert Parallelism"
 msgstr "分布式数据并行服务器与大规模专家并行"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:3
-msgid "Getting Start"
+msgid "Getting Started"
 msgstr "快速开始"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:5
@@ -42,7 +42,11 @@ msgid ""
 "independently, while the decoder nodes use the 192.0.0.5 node as the "
 "master node."
 msgstr ""
-"vLLM-Ascend 现已支持在大规模**专家并行(EP)**场景下的预填充-解码(PD)解耦。为获得更好的性能,vLLM-Ascend 中应用了分布式数据并行服务器。在 PD 分离场景下,可以根据 PD 节点的不同特性实施不同的优化策略,从而实现更灵活的模型部署。以 DeepSeek 模型为例,使用 8 台 Atlas 800T A3 服务器部署模型。假设服务器 IP 从 192.0.0.1 开始到 192.0.0.8 结束。使用前 4 台服务器作为预填充节点,后 4 台服务器作为解码节点。并且预填充节点独立部署为主节点,而解码节点使用 192.0.0.5 节点作为主节点。"
+"vLLM-Ascend 现已支持在大规模**专家并行(EP)**场景下的预填充-解码(PD)解耦。为获得更好的性能,vLLM-Ascend "
+"中应用了分布式数据并行服务器。在 PD 分离场景下,可以根据 PD 节点的不同特性实施不同的优化策略,从而实现更灵活的模型部署。以 "
+"DeepSeek 模型为例,使用 8 台 Atlas 800T A3 服务器部署模型。假设服务器 IP 从 192.0.0.1 开始到 "
+"192.0.0.8 结束。使用前 4 台服务器作为预填充节点,后 4 台服务器作为解码节点。并且预填充节点独立部署为主节点,而解码节点使用 "
+"192.0.0.5 节点作为主节点。"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:8
 msgid "Verify Multi-Node Communication Environment"
@@ -65,7 +69,8 @@ msgid ""
 "the Atlas A3 generation, both intra-node and inter-node connectivity are "
 "via HCCS."
 msgstr ""
-"所有 NPU 必须互连。对于 Atlas A2 代,节点内连接通过 HCCS,节点间连接通过 RDMA。对于 Atlas A3 代,节点内和节点间连接均通过 HCCS。"
+"所有 NPU 必须互连。对于 Atlas A2 代,节点内连接通过 HCCS,节点间连接通过 RDMA。对于 Atlas A3 "
+"代,节点内和节点间连接均通过 HCCS。"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:15
 msgid "Verification Process"
@@ -145,7 +150,9 @@ msgid ""
 "master node independently, while the decoder nodes use the 192.0.0.5 node"
 " as the master node. This leads to differences in 'dp_size_local' and "
 "'dp_rank_start'"
-msgstr "请注意,预填充节点和解码节点可能具有不同的配置。在此示例中,每个预填充节点独立部署为主节点,而解码节点使用 192.0.0.5 节点作为主节点。这导致了 'dp_size_local' 和 'dp_rank_start' 的差异。"
+msgstr ""
+"请注意,预填充节点和解码节点可能具有不同的配置。在此示例中,每个预填充节点独立部署为主节点,而解码节点使用 192.0.0.5 "
+"节点作为主节点。这导致了 'dp_size_local' 和 'dp_rank_start' 的差异。"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:319
 msgid "Example proxy for Distributed DP Server"
@@ -251,7 +258,10 @@ msgid ""
 "[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-"
 "project/vllm-"
 "ascend/blob/v0.9.1-dev/examples/disaggregate_prefill_v1/load_balance_proxy_server_example.py)"
-msgstr "您可以在仓库的示例中找到代理程序,[load_balance_proxy_server_example.py](https://github.com/vllm-project/vllm-ascend/blob/v0.9.1-dev/examples/disaggregate_prefill_v1/load_balance_proxy_server_example.py)"
+msgstr ""
+"您可以在仓库的示例中找到代理程序,[load_balance_proxy_server_example.py](https://github.com"
+"/vllm-project/vllm-"
+"ascend/blob/v0.9.1-dev/examples/disaggregate_prefill_v1/load_balance_proxy_server_example.py)"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:366
 msgid "Benchmark"
@@ -262,7 +272,9 @@ msgid ""
 "We recommend using aisbench tool to assess performance. "
 "[aisbench](https://gitee.com/aisbench/benchmark). Execute the following "
 "commands to install aisbench"
-msgstr "我们推荐使用 aisbench 工具评估性能。[aisbench](https://gitee.com/aisbench/benchmark)。执行以下命令安装 aisbench"
+msgstr ""
+"我们推荐使用 aisbench "
+"工具评估性能。[aisbench](https://gitee.com/aisbench/benchmark)。执行以下命令安装 aisbench"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:376
 msgid ""
@@ -281,7 +293,9 @@ msgid ""
 "You can change the configuration in the directory "
 ":`benchmark/ais_bench/benchmark/configs/models/vllm_api` Take "
 "`vllm_api_stream_chat.py` as an example:"
-msgstr "您可以在目录:`benchmark/ais_bench/benchmark/configs/models/vllm_api` 中更改配置。以 `vllm_api_stream_chat.py` 为例:"
+msgstr ""
+"您可以在目录:`benchmark/ais_bench/benchmark/configs/models/vllm_api` 中更改配置。以 "
+"`vllm_api_stream_chat.py` 为例:"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:411
 msgid ""
@@ -293,7 +307,9 @@ msgstr "以 gsm8k 数据集为例,执行以下命令评估性能。"
 msgid ""
 "For more details on commands and parameters for aisbench, refer to "
 "[aisbench](https://gitee.com/aisbench/benchmark)"
-msgstr "有关 aisbench 命令和参数的更多详细信息,请参考 [aisbench](https://gitee.com/aisbench/benchmark)"
+msgstr ""
+"有关 aisbench 命令和参数的更多详细信息,请参考 "
+"[aisbench](https://gitee.com/aisbench/benchmark)"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:419
 msgid "Prefill & Decode Configuration Details"
@@ -368,7 +384,9 @@ msgid ""
 "is 7K. In this scenario, we give a recommended configuration for "
 "distributed DP server with high EP. Here we use 4 nodes for prefill and 4"
 " nodes for decode."
-msgstr "例如,如果平均输入长度为 3.5k,输出长度为 1.1k,上下文长度为 16k,输入数据集的最大长度为 7K。在此场景下,我们为具有高 EP 的分布式数据并行服务器提供了一个推荐配置。这里我们使用 4 个节点进行预填充,4 个节点进行解码。"
+msgstr ""
+"例如,如果平均输入长度为 3.5k,输出长度为 1.1k,上下文长度为 16k,输入数据集的最大长度为 7K。在此场景下,我们为具有高 EP "
+"的分布式数据并行服务器提供了一个推荐配置。这里我们使用 4 个节点进行预填充,4 个节点进行解码。"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:282
 msgid "node"
```
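The large_scale_ep strings above note that each prefill node is deployed as its own master while the four decode nodes share 192.0.0.5 as master, which is why 'dp_size_local' and 'dp_rank_start' differ between the two groups. A small sketch of that rank arithmetic, with an assumed per-node DP size:

```python
# Illustrative arithmetic for the 4-prefill + 4-decode layout described above.
# DP_PER_NODE is assumed for illustration; the actual per-node DP size depends
# on the parallel configuration chosen in the guide.
PREFILL_NODES = ["192.0.0.1", "192.0.0.2", "192.0.0.3", "192.0.0.4"]
DECODE_NODES = ["192.0.0.5", "192.0.0.6", "192.0.0.7", "192.0.0.8"]
DP_PER_NODE = 16  # assumed DP ranks per node

# Each prefill node is its own master: an independent DP group per node,
# so local ranks always start at 0.
for ip in PREFILL_NODES:
    print(f"prefill {ip}: dp_size_local={DP_PER_NODE} dp_rank_start=0 master={ip}")

# Decode nodes form one DP group with 192.0.0.5 as master, so each node's
# ranks start at its offset within the shared group.
for i, ip in enumerate(DECODE_NODES):
    print(f"decode  {ip}: dp_size_local={DP_PER_NODE} "
          f"dp_rank_start={i * DP_PER_NODE} master={DECODE_NODES[0]}")
```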
Diffs for the remaining translated files are suppressed because they are too large.