[v0.18.0][Doc] Translated Doc files 2026-04-15 (#8309)
## Auto-Translation Summary

Translated **19** file(s):

- `docs/source/locale/zh_CN/LC_MESSAGES/community/contributors.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/community/versioning_policy.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/KV_Cache_Pool_Guide.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/ModelRunner_prepare_inputs.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/cpu_binding.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/long_sequence_context_parallel_multi_node.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/long_sequence_context_parallel_single_node.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_multi_node.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_single_node.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Kimi-K2.5.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen2.5-Omni.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-Dense.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3.5-397B-A17B.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/Fine_grained_TP.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/epd_disaggregation.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/external_dp.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/large_scale_ep.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/release_notes.po`

---

[Workflow run](https://github.com/vllm-project/vllm-ascend/actions/runs/24447109402)

Signed-off-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
Co-authored-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
**docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/Fine_grained_TP.po**

```diff
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: vllm-ascend \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-04-14 09:08+0000\n"
+"POT-Creation-Date: 2026-04-15 09:41+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -20,8 +20,8 @@ msgstr ""
 "Generated-By: Babel 2.18.0\n"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:1
-msgid "Fine-Grained Tensor Parallelism (Finegrained TP)"
-msgstr "细粒度张量并行 (Finegrained TP)"
+msgid "Fine-Grained Tensor Parallelism (Fine-grained TP)"
+msgstr "细粒度张量并行 (Fine-grained TP)"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:3
 msgid "Overview"
@@ -37,7 +37,10 @@ msgid ""
 "model head (lm_head), attention output projection (o_proj), and MLP "
 "blocks—via the `finegrained_tp_config` parameter."
 msgstr ""
-"细粒度张量并行 (Fine-grained TP) 扩展了标准张量并行,允许为**不同的模型组件设置独立的张量并行规模**。与对所有层应用单一的全局 `tensor_parallel_size` 不同,细粒度 TP 允许用户通过 `finegrained_tp_config` 参数为关键模块(如嵌入层、语言模型头部 (lm_head)、注意力输出投影层 (o_proj) 和 MLP 块)配置独立的 TP 规模。"
+"细粒度张量并行 (Fine-grained TP) "
+"扩展了标准张量并行,允许为**不同的模型组件设置独立的张量并行规模**。与对所有层应用单一的全局 `tensor_parallel_size` "
+"不同,细粒度 TP 允许用户通过 `finegrained_tp_config` 参数为关键模块(如嵌入层、语言模型头部 "
+"(lm_head)、注意力输出投影层 (o_proj) 和 MLP 块)配置独立的 TP 规模。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:7
 msgid ""
@@ -47,10 +50,11 @@ msgid ""
 "compatible with standard dense transformer architectures and integrates "
 "seamlessly into vLLM’s serving pipeline."
 msgstr ""
-"此功能支持在单个模型内使用异构并行策略,从而能更精细地控制跨设备的权重分布、内存布局和通信模式。该特性与标准的密集 Transformer 架构兼容,并能无缝集成到 vLLM 的服务流水线中。"
+"此功能支持在单个模型内使用异构并行策略,从而能更精细地控制跨设备的权重分布、内存布局和通信模式。该特性与标准的密集 Transformer "
+"架构兼容,并能无缝集成到 vLLM 的服务流水线中。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:11
-msgid "Benefits of Finegrained TP"
+msgid "Benefits of Fine-grained TP"
 msgstr "细粒度 TP 的优势"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:13
@@ -62,11 +66,12 @@ msgstr "细粒度张量并行通过有针对性的权重分片带来两个主要
 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:15
 msgid ""
 "**Reduced Per-Device Memory Footprint**: Fine-grained TP shards large "
-"weight matrices(e.g., LM Head, o_proj)across devices, lowering peak "
+"weight matrices (e.g., LM Head, o_proj) across devices, lowering peak "
 "memory usage and enabling larger batches or deployment on memory-limited "
 "hardware—without quantization."
 msgstr ""
-"**降低单设备内存占用**: 细粒度 TP 将大型权重矩阵(例如 LM Head、o_proj)分片到多个设备上,降低了峰值内存使用量,从而支持更大的批次或在内存受限的硬件上进行部署——无需量化。"
+"**降低单设备内存占用**: 细粒度 TP 将大型权重矩阵(例如 LM "
+"Head、o_proj)分片到多个设备上,降低了峰值内存使用量,从而支持更大的批次或在内存受限的硬件上进行部署——无需量化。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:18
 msgid ""
@@ -76,7 +81,9 @@ msgid ""
 "efficiency—especially for latency-sensitive layers like LM Head and "
 "o_proj."
 msgstr ""
-"**加速 GEMM 中的内存访问**: 在解码密集型工作负载中,GEMM 性能通常受内存带宽限制。权重分片减少了每个设备需要获取的权重数据量,从而降低了 DRAM 流量并提高了带宽效率——对于 LM Head 和 o_proj 等延迟敏感层尤其如此。"
+"**加速 GEMM 中的内存访问**: 在解码密集型工作负载中,GEMM "
+"性能通常受内存带宽限制。权重分片减少了每个设备需要获取的权重数据量,从而降低了 DRAM 流量并提高了带宽效率——对于 LM Head 和 "
+"o_proj 等延迟敏感层尤其如此。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:21
 msgid ""
@@ -99,7 +106,9 @@ msgid ""
 "Fine-grained TP is **model-agnostic** and supports all standard dense "
 "transformer architectures, including Llama, Qwen, DeepSeek (base/dense "
 "variants), and others."
-msgstr "细粒度 TP 是**模型无关的**,支持所有标准的密集 Transformer 架构,包括 Llama、Qwen、DeepSeek(基础/密集变体)等。"
+msgstr ""
+"细粒度 TP 是**模型无关的**,支持所有标准的密集 Transformer 架构,包括 "
+"Llama、Qwen、DeepSeek(基础/密集变体)等。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:31
 msgid "Component & Execution Mode Support"
@@ -161,7 +170,9 @@ msgstr "⚠️ 注意:"
 msgid ""
 "`o_proj` TP is only supported in Graph mode during Decode, because "
 "dummy_run in eager mode will not trigger o_proj."
-msgstr "`o_proj` TP 仅在 Decode 阶段的 Graph 模式下受支持,因为 eager 模式下的 dummy_run 不会触发 o_proj。"
+msgstr ""
+"`o_proj` TP 仅在 Decode 阶段的 Graph 模式下受支持,因为 eager 模式下的 dummy_run 不会触发 "
+"o_proj。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:43
 msgid ""
@@ -194,7 +205,7 @@ msgid ""
 msgstr "⚠️ 违反这些约束将导致运行时错误或未定义行为。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:56
-msgid "How to Use Finegrained TP"
+msgid "How to Use Fine-grained TP"
 msgstr "如何使用细粒度 TP"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md:58
@@ -222,7 +233,9 @@ msgid ""
 "decode instances in an environment of 32 cards Ascend 910B*64G (A2), with"
 " parallel configuration as DP32+EP32, and fine-grained TP size of 8; the "
 "performance data is as follows."
-msgstr "为评估细粒度 TP 在大规模服务场景中的有效性,我们使用模型 **DeepSeek-R1-W8A8**,在 32 卡 Ascend 910B*64G (A2) 环境中部署 PD 分离的解码实例,并行配置为 DP32+EP32,细粒度 TP 规模为 8;性能数据如下。"
+msgstr ""
+"为评估细粒度 TP 在大规模服务场景中的有效性,我们使用模型 **DeepSeek-R1-W8A8**,在 32 卡 Ascend "
+"910B*64G (A2) 环境中部署 PD 分离的解码实例,并行配置为 DP32+EP32,细粒度 TP 规模为 8;性能数据如下。"

 #: ../../source/user_guide/feature_guide/Fine_grained_TP.md
 msgid "Module"
@@ -304,4 +317,6 @@ msgid ""
 "PD separation, where models are typically deployed in all-DP mode. In "
 "this setup, sharding weight-heavy layers reduces redundant storage and "
 "memory pressure."
-msgstr "细粒度 TP 在 PD 分离的**解码实例**中**最有效**,因为模型通常以全 DP 模式部署。在此设置中,对权重密集的层进行分片可以减少冗余存储和内存压力。"
+msgstr ""
+"细粒度 TP 在 PD 分离的**解码实例**中**最有效**,因为模型通常以全 DP "
+"模式部署。在此设置中,对权重密集的层进行分片可以减少冗余存储和内存压力。"
```
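The Fine_grained_TP strings above explain that per-module TP sizes are set through the `finegrained_tp_config` parameter. A minimal sketch of what such a launch might look like, assuming vLLM's `additional_config` passthrough and hypothetical per-module keys (the real schema lives in the Fine_grained_TP guide itself, not in this diff):

```python
# Hypothetical sketch: the finegrained_tp_config keys below are assumed for
# illustration; consult the Fine_grained_TP guide for the actual schema.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # any standard dense model, per the guide
    tensor_parallel_size=1,            # global TP; decode instances often run all-DP
    additional_config={
        "finegrained_tp_config": {     # per-module TP sizes (assumed key names)
            "lm_head": 8,              # shard the LM head across 8 NPUs
            "o_proj": 8,               # o_proj TP: Graph mode decode only, per the note above
        }
    },
)
```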
**docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/epd_disaggregation.po**

```diff
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: vllm-ascend \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-04-14 09:08+0000\n"
+"POT-Creation-Date: 2026-04-15 09:41+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -34,7 +34,8 @@ msgid ""
 "Deploying these two stages in independent vLLM instances brings three "
 "practical benefits:"
 msgstr ""
-"**解耦编码器** 将多模态大语言模型的视觉编码器阶段运行在与预填充/解码器阶段分离的进程中。将这两个阶段部署在独立的 vLLM 实例中,带来三个实际好处:"
+"**解耦编码器** 将多模态大语言模型的视觉编码器阶段运行在与预填充/解码器阶段分离的进程中。将这两个阶段部署在独立的 vLLM "
+"实例中,带来三个实际好处:"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:7
 msgid "**Independent, fine-grained scaling**"
@@ -89,8 +90,8 @@ msgid ""
 "Design doc: <https://docs.google.com/document/d"
 "/1aed8KtC6XkXtdoV87pWT0a8OJlZ-CpnuLLzmR8l9BAE>"
 msgstr ""
-"设计文档:<https://docs.google.com/document/d"
-"/1aed8KtC6XkXtdoV87pWT0a8OJlZ-CpnuLLzmR8l9BAE>"
+"设计文档:<https://docs.google.com/document/d/1aed8KtC6XkXtdoV87pWT0a8OJlZ-"
+"CpnuLLzmR8l9BAE>"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:27
 msgid "Usage"
@@ -107,16 +108,16 @@ msgid ""
 "1 Encoder instance + 1 PD instance: "
 "`examples/online_serving/disaggregated_encoder/disagg_1e1pd/`"
 msgstr ""
-"1 个编码器实例 + 1 个 PD 实例:"
-"`examples/online_serving/disaggregated_encoder/disagg_1e1pd/`"
+"1 个编码器实例 + 1 个 PD "
+"实例:`examples/online_serving/disaggregated_encoder/disagg_1e1pd/`"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:35
 msgid ""
 "1 Encoder instance + 1 Prefill instance + 1 Decode instance: "
 "`examples/online_serving/disaggregated_encoder/disagg_1e1p1d/`"
 msgstr ""
-"1 个编码器实例 + 1 个预填充实例 + 1 个解码实例:"
-"`examples/online_serving/disaggregated_encoder/disagg_1e1p1d/`"
+"1 个编码器实例 + 1 个预填充实例 + 1 "
+"个解码实例:`examples/online_serving/disaggregated_encoder/disagg_1e1p1d/`"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:40
 msgid "Development"
@@ -154,7 +155,8 @@ msgid ""
 "instance to the PD instance. All related code is under "
 "`vllm/distributed/ec_transfer`."
 msgstr ""
-"一个连接器将编码器缓存 (EC) 嵌入向量从编码器实例传输到 PD 实例。所有相关代码位于 `vllm/distributed/ec_transfer` 目录下。"
+"一个连接器将编码器缓存 (EC) 嵌入向量从编码器实例传输到 PD 实例。所有相关代码位于 "
+"`vllm/distributed/ec_transfer` 目录下。"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:53
 msgid "Key abstractions"
@@ -175,7 +177,7 @@ msgid "*Worker role* – loads the embeddings into memory."
 msgstr "*工作进程角色* – 将嵌入向量加载到内存中。"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:59
-msgid "**EPD Load Balance Proxy** -"
+msgid "**EPD Load Balancing Proxy** -"
 msgstr "**EPD 负载均衡代理** -"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:60
@@ -200,12 +202,14 @@ msgid ""
 " to facilitate the kv transfer between P and D. For step-by-step "
 "deployment and configuration of Mooncake, refer to the following guide:"
 " "
-"[https://docs.vllm.ai/projects/ascend/en/latest/tutorials/pd_disaggregation_mooncake_multi_node.html](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html)"
+"[https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html)"
 msgstr ""
-"我们使用来自 `vllm_ascend/distributed/kv_transfer/kv_p2p/mooncake_layerwise_connector.py` 的 **MooncakeLayerwiseConnector** 创建示例设置,并参考 "
-"`examples/disaggregated_prefill_v1/load_balance_proxy_layerwise_server_example.py` 来促进 P 和 D 之间的 KV 传输。关于 Mooncake 的逐步部署和配置,请参考以下指南:"
-" "
-"[https://docs.vllm.ai/projects/ascend/en/latest/tutorials/pd_disaggregation_mooncake_multi_node.html](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html)"
+"我们使用来自 "
+"`vllm_ascend/distributed/kv_transfer/kv_p2p/mooncake_layerwise_connector.py`"
+" 的 **MooncakeLayerwiseConnector** 创建示例设置,并参考 "
+"`examples/disaggregated_prefill_v1/load_balance_proxy_layerwise_server_example.py`"
+" 来促进 P 和 D 之间的 KV 传输。关于 Mooncake 的逐步部署和配置,请参考以下指南: "
+"[https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/features/pd_disaggregation_mooncake_multi_node.html)"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:66
 msgid ""
@@ -218,7 +222,10 @@ msgid ""
 "`docs/source/developer_guide/Design_Documents/disaggregated_prefill.md` "
 "shows the brief idea about the disaggregated prefill."
 msgstr ""
-"对于 PD 解耦部分,当使用 MooncakeLayerwiseConnector 时:请求首先进入解码器实例,解码器通过元服务器反向触发一个远程预填充任务。然后预填充节点执行推理,并将 KV 缓存逐层推送到解码器,实现计算与传输的重叠。一旦传输完成,解码器无缝地继续后续的令牌生成。`docs/source/developer_guide/Design_Documents/disaggregated_prefill.md` 展示了关于解耦预填充的简要思路。"
+"对于 PD 解耦部分,当使用 MooncakeLayerwiseConnector "
+"时:请求首先进入解码器实例,解码器通过元服务器反向触发一个远程预填充任务。然后预填充节点执行推理,并将 KV "
+"缓存逐层推送到解码器,实现计算与传输的重叠。一旦传输完成,解码器无缝地继续后续的令牌生成。`docs/source/developer_guide/Design_Documents/disaggregated_prefill.md`"
+" 展示了关于解耦预填充的简要思路。"

 #: ../../source/user_guide/feature_guide/epd_disaggregation.md:69
 msgid "Limitations"
```
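The epd_disaggregation strings above describe the MooncakeLayerwiseConnector pushing KV cache to the decoder layer by layer, so that transfer overlaps compute. A toy sketch of that scheduling idea only, using plain Python threads and a queue rather than the actual connector code:

```python
# Toy illustration of the layer-wise overlap described above: the prefill side
# pushes each layer's KV as soon as it is computed, so the transfer of layer i
# overlaps the compute of layer i+1. Not vllm-ascend's connector code.
import queue
import threading
import time

NUM_LAYERS = 4
kv_queue: queue.Queue = queue.Queue()

def prefill_node() -> None:
    for i in range(NUM_LAYERS):
        time.sleep(0.01)                  # stand-in for this layer's compute
        kv_queue.put(f"kv-of-layer-{i}")  # push immediately; no wait for the full pass

def decode_node() -> None:
    for _ in range(NUM_LAYERS):
        kv = kv_queue.get()               # receives while prefill keeps computing
        print("decode received", kv)

t = threading.Thread(target=prefill_node)
t.start()
decode_node()
t.join()
```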
**docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/external_dp.po**

```diff
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: vllm-ascend \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-04-14 09:08+0000\n"
+"POT-Creation-Date: 2026-04-15 09:41+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -35,10 +35,12 @@ msgid ""
 "vLLM deployment, with its own endpoint, and have an external router "
 "balance HTTP requests between them, making use of appropriate real-time "
 "telemetry from each server for routing decisions."
-msgstr "在这种情况下,将每个数据并行等级视为一个独立的 vLLM 部署(拥有自己的端点),并使用一个外部路由器在它们之间平衡 HTTP 请求,同时利用来自每个服务器的适当实时遥测数据来做出路由决策,会更加方便。"
+msgstr ""
+"在这种情况下,将每个数据并行等级视为一个独立的 vLLM 部署(拥有自己的端点),并使用一个外部路由器在它们之间平衡 HTTP "
+"请求,同时利用来自每个服务器的适当实时遥测数据来做出路由决策,会更加方便。"

 #: ../../source/user_guide/feature_guide/external_dp.md:7
-msgid "Getting Start"
+msgid "Getting Started"
 msgstr "开始使用"

 #: ../../source/user_guide/feature_guide/external_dp.md:9
@@ -47,7 +49,9 @@ msgid ""
 "DP](https://docs.vllm.ai/en/latest/serving/data_parallel_deployment/?h=external"
 "#external-load-balancing) is already natively supported by vLLM. In vllm-"
 "ascend we provide two enhanced functionalities:"
-msgstr "[外部数据并行](https://docs.vllm.ai/en/latest/serving/data_parallel_deployment/?h=external#external-load-balancing) 功能已由 vLLM 原生支持。在 vllm-ascend 中,我们提供了两项增强功能:"
+msgstr ""
+"[外部数据并行](https://docs.vllm.ai/en/latest/serving/data_parallel_deployment/?h=external"
+"#external-load-balancing) 功能已由 vLLM 原生支持。在 vllm-ascend 中,我们提供了两项增强功能:"

 #: ../../source/user_guide/feature_guide/external_dp.md:11
 msgid ""
@@ -85,7 +89,9 @@ msgid ""
 "parallel. These can be mock servers or actual vLLM servers. Note that "
 "this proxy also works with only one vLLM server running, but will fall "
 "back to direct request forwarding which is meaningless."
-msgstr "首先,您需要至少运行两个处于数据并行模式的 vLLM 服务器。这些可以是模拟服务器或实际的 vLLM 服务器。请注意,此代理在仅运行一个 vLLM 服务器时也能工作,但会退化为直接请求转发,这没有意义。"
+msgstr ""
+"首先,您需要至少运行两个处于数据并行模式的 vLLM 服务器。这些可以是模拟服务器或实际的 vLLM 服务器。请注意,此代理在仅运行一个 vLLM"
+" 服务器时也能工作,但会退化为直接请求转发,这没有意义。"

 #: ../../source/user_guide/feature_guide/external_dp.md:29
 msgid ""
@@ -93,7 +99,9 @@ msgid ""
 "launch script in `examples/external_online_dp`. For scenarios of large DP"
 " size across multiple nodes, we recommend using our launch script for "
 "convenience."
-msgstr "您可以手动逐个启动外部 vLLM 数据并行服务器,也可以使用 `examples/external_online_dp` 中的启动脚本。对于跨多个节点的大规模数据并行场景,我们建议使用我们的启动脚本以方便操作。"
+msgstr ""
+"您可以手动逐个启动外部 vLLM 数据并行服务器,也可以使用 `examples/external_online_dp` "
+"中的启动脚本。对于跨多个节点的大规模数据并行场景,我们建议使用我们的启动脚本以方便操作。"

 #: ../../source/user_guide/feature_guide/external_dp.md:31
 msgid "Manually Launch"
@@ -112,7 +120,12 @@ msgid ""
 " instances in one command on each node. It will internally call "
 "`examples/external_online_dp/run_dp_template.sh` for each DP rank with "
 "proper DP-related parameters."
-msgstr "首先,您需要根据您的 vLLM 配置修改 `examples/external_online_dp/run_dp_template.sh`。然后,您可以使用 `examples/external_online_dp/launch_online_dp.py` 在每个节点上通过一条命令启动多个 vLLM 实例。它将在内部为每个数据并行等级调用 `examples/external_online_dp/run_dp_template.sh`,并传入适当的数据并行相关参数。"
+msgstr ""
+"首先,您需要根据您的 vLLM 配置修改 "
+"`examples/external_online_dp/run_dp_template.sh`。然后,您可以使用 "
+"`examples/external_online_dp/launch_online_dp.py` 在每个节点上通过一条命令启动多个 vLLM "
+"实例。它将在内部为每个数据并行等级调用 "
+"`examples/external_online_dp/run_dp_template.sh`,并传入适当的数据并行相关参数。"

 #: ../../source/user_guide/feature_guide/external_dp.md:43
 msgid "An example of running external DP in one single node:"
@@ -131,7 +144,9 @@ msgid ""
 "After all vLLM DP instances are launched, you can now launch the load-"
 "balance proxy server, which serves as an entrypoint for coming requests "
 "and load-balances them between vLLM DP instances."
-msgstr "所有 vLLM 数据并行实例启动后,您现在可以启动负载均衡代理服务器。该服务器作为传入请求的入口点,并在各个 vLLM 数据并行实例之间进行负载均衡。"
+msgstr ""
+"所有 vLLM 数据并行实例启动后,您现在可以启动负载均衡代理服务器。该服务器作为传入请求的入口点,并在各个 vLLM "
+"数据并行实例之间进行负载均衡。"

 #: ../../source/user_guide/feature_guide/external_dp.md:70
 msgid "The proxy server has the following features:"
```
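The external_dp strings above describe a load-balance proxy that acts as the entrypoint and spreads requests across the vLLM DP instances. A minimal round-robin sketch of that idea follows; the backend ports are placeholders, and the real proxy under `examples/external_online_dp` additionally folds per-server telemetry into its routing decisions:

```python
# Minimal round-robin load-balance proxy sketch for the idea described above.
# Backend URLs are placeholders for two vLLM DP instances launched beforehand.
import itertools
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKENDS = itertools.cycle(["http://127.0.0.1:8001", "http://127.0.0.1:8002"])

class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        backend = next(BACKENDS)  # pick the next DP instance's endpoint
        req = urllib.request.Request(
            backend + self.path, data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
            status = resp.status
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), ProxyHandler).serve_forever()
```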
**docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/large_scale_ep.po**

```diff
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: vllm-ascend \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-04-14 09:08+0000\n"
+"POT-Creation-Date: 2026-04-15 09:41+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -24,7 +24,7 @@ msgid "Distributed DP Server With Large-Scale Expert Parallelism"
 msgstr "分布式数据并行服务器与大规模专家并行"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:3
-msgid "Getting Start"
+msgid "Getting Started"
 msgstr "快速开始"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:5
@@ -42,7 +42,11 @@ msgid ""
 "independently, while the decoder nodes use the 192.0.0.5 node as the "
 "master node."
 msgstr ""
-"vLLM-Ascend 现已支持在大规模**专家并行(EP)**场景下的预填充-解码(PD)解耦。为获得更好的性能,vLLM-Ascend 中应用了分布式数据并行服务器。在 PD 分离场景下,可以根据 PD 节点的不同特性实施不同的优化策略,从而实现更灵活的模型部署。以 DeepSeek 模型为例,使用 8 台 Atlas 800T A3 服务器部署模型。假设服务器 IP 从 192.0.0.1 开始到 192.0.0.8 结束。使用前 4 台服务器作为预填充节点,后 4 台服务器作为解码节点。并且预填充节点独立部署为主节点,而解码节点使用 192.0.0.5 节点作为主节点。"
+"vLLM-Ascend 现已支持在大规模**专家并行(EP)**场景下的预填充-解码(PD)解耦。为获得更好的性能,vLLM-Ascend "
+"中应用了分布式数据并行服务器。在 PD 分离场景下,可以根据 PD 节点的不同特性实施不同的优化策略,从而实现更灵活的模型部署。以 "
+"DeepSeek 模型为例,使用 8 台 Atlas 800T A3 服务器部署模型。假设服务器 IP 从 192.0.0.1 开始到 "
+"192.0.0.8 结束。使用前 4 台服务器作为预填充节点,后 4 台服务器作为解码节点。并且预填充节点独立部署为主节点,而解码节点使用 "
+"192.0.0.5 节点作为主节点。"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:8
 msgid "Verify Multi-Node Communication Environment"
@@ -65,7 +69,8 @@ msgid ""
 "the Atlas A3 generation, both intra-node and inter-node connectivity are "
 "via HCCS."
 msgstr ""
-"所有 NPU 必须互连。对于 Atlas A2 代,节点内连接通过 HCCS,节点间连接通过 RDMA。对于 Atlas A3 代,节点内和节点间连接均通过 HCCS。"
+"所有 NPU 必须互连。对于 Atlas A2 代,节点内连接通过 HCCS,节点间连接通过 RDMA。对于 Atlas A3 "
+"代,节点内和节点间连接均通过 HCCS。"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:15
 msgid "Verification Process"
@@ -145,7 +150,9 @@ msgid ""
 "master node independently, while the decoder nodes use the 192.0.0.5 node"
 " as the master node. This leads to differences in 'dp_size_local' and "
 "'dp_rank_start'"
-msgstr "请注意,预填充节点和解码节点可能具有不同的配置。在此示例中,每个预填充节点独立部署为主节点,而解码节点使用 192.0.0.5 节点作为主节点。这导致了 'dp_size_local' 和 'dp_rank_start' 的差异。"
+msgstr ""
+"请注意,预填充节点和解码节点可能具有不同的配置。在此示例中,每个预填充节点独立部署为主节点,而解码节点使用 192.0.0.5 "
+"节点作为主节点。这导致了 'dp_size_local' 和 'dp_rank_start' 的差异。"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:319
 msgid "Example proxy for Distributed DP Server"
@@ -251,7 +258,10 @@ msgid ""
 "[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-"
 "project/vllm-"
 "ascend/blob/v0.9.1-dev/examples/disaggregate_prefill_v1/load_balance_proxy_server_example.py)"
-msgstr "您可以在仓库的示例中找到代理程序,[load_balance_proxy_server_example.py](https://github.com/vllm-project/vllm-ascend/blob/v0.9.1-dev/examples/disaggregate_prefill_v1/load_balance_proxy_server_example.py)"
+msgstr ""
+"您可以在仓库的示例中找到代理程序,[load_balance_proxy_server_example.py](https://github.com"
+"/vllm-project/vllm-"
+"ascend/blob/v0.9.1-dev/examples/disaggregate_prefill_v1/load_balance_proxy_server_example.py)"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:366
 msgid "Benchmark"
@@ -262,7 +272,9 @@ msgid ""
 "We recommend using aisbench tool to assess performance. "
 "[aisbench](https://gitee.com/aisbench/benchmark). Execute the following "
 "commands to install aisbench"
-msgstr "我们推荐使用 aisbench 工具评估性能。[aisbench](https://gitee.com/aisbench/benchmark)。执行以下命令安装 aisbench"
+msgstr ""
+"我们推荐使用 aisbench "
+"工具评估性能。[aisbench](https://gitee.com/aisbench/benchmark)。执行以下命令安装 aisbench"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:376
 msgid ""
@@ -281,7 +293,9 @@ msgid ""
 "You can change the configuration in the directory "
 ":`benchmark/ais_bench/benchmark/configs/models/vllm_api` Take "
 "`vllm_api_stream_chat.py` as an example:"
-msgstr "您可以在目录:`benchmark/ais_bench/benchmark/configs/models/vllm_api` 中更改配置。以 `vllm_api_stream_chat.py` 为例:"
+msgstr ""
+"您可以在目录:`benchmark/ais_bench/benchmark/configs/models/vllm_api` 中更改配置。以 "
+"`vllm_api_stream_chat.py` 为例:"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:411
 msgid ""
@@ -293,7 +307,9 @@ msgstr "以 gsm8k 数据集为例,执行以下命令评估性能。"
 msgid ""
 "For more details on commands and parameters for aisbench, refer to "
 "[aisbench](https://gitee.com/aisbench/benchmark)"
-msgstr "有关 aisbench 命令和参数的更多详细信息,请参考 [aisbench](https://gitee.com/aisbench/benchmark)"
+msgstr ""
+"有关 aisbench 命令和参数的更多详细信息,请参考 "
+"[aisbench](https://gitee.com/aisbench/benchmark)"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:419
 msgid "Prefill & Decode Configuration Details"
@@ -368,7 +384,9 @@ msgid ""
 "is 7K. In this scenario, we give a recommended configuration for "
 "distributed DP server with high EP. Here we use 4 nodes for prefill and 4"
 " nodes for decode."
-msgstr "例如,如果平均输入长度为 3.5k,输出长度为 1.1k,上下文长度为 16k,输入数据集的最大长度为 7K。在此场景下,我们为具有高 EP 的分布式数据并行服务器提供了一个推荐配置。这里我们使用 4 个节点进行预填充,4 个节点进行解码。"
+msgstr ""
+"例如,如果平均输入长度为 3.5k,输出长度为 1.1k,上下文长度为 16k,输入数据集的最大长度为 7K。在此场景下,我们为具有高 EP "
+"的分布式数据并行服务器提供了一个推荐配置。这里我们使用 4 个节点进行预填充,4 个节点进行解码。"

 #: ../../source/user_guide/feature_guide/large_scale_ep.md:282
 msgid "node"
```
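The large_scale_ep strings above note that each prefill node is deployed as its own master while the four decode nodes share 192.0.0.5 as master, which is why 'dp_size_local' and 'dp_rank_start' differ between the two groups. A small sketch of that rank arithmetic, with an assumed per-node DP size:

```python
# Illustrative arithmetic for the 4-prefill + 4-decode layout described above.
# DP_PER_NODE is assumed for illustration; the actual per-node DP size depends
# on the parallel configuration chosen in the guide.
PREFILL_NODES = ["192.0.0.1", "192.0.0.2", "192.0.0.3", "192.0.0.4"]
DECODE_NODES = ["192.0.0.5", "192.0.0.6", "192.0.0.7", "192.0.0.8"]
DP_PER_NODE = 16  # assumed DP ranks per node

# Each prefill node is its own master: an independent DP group per node,
# so local ranks always start at 0.
for ip in PREFILL_NODES:
    print(f"prefill {ip}: dp_size_local={DP_PER_NODE} dp_rank_start=0 master={ip}")

# Decode nodes form one DP group with 192.0.0.5 as master, so each node's
# ranks start at its offset within the shared group.
for i, ip in enumerate(DECODE_NODES):
    print(f"decode  {ip}: dp_size_local={DP_PER_NODE} "
          f"dp_rank_start={i * DP_PER_NODE} master={DECODE_NODES[0]}")
```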
Diffs for the remaining translated files are suppressed because they are too large.