[v0.18.0][Doc] Translated Doc files 2026-04-15 (#8309)
## Auto-Translation Summary

Translated **19** file(s):

- <code>docs/source/locale/zh_CN/LC_MESSAGES/community/contributors.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/community/versioning_policy.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/KV_Cache_Pool_Guide.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/ModelRunner_prepare_inputs.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/cpu_binding.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/long_sequence_context_parallel_multi_node.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/long_sequence_context_parallel_single_node.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_multi_node.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_single_node.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Kimi-K2.5.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen2.5-Omni.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-Dense.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3.5-397B-A17B.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/Fine_grained_TP.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/epd_disaggregation.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/external_dp.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/large_scale_ep.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/release_notes.po</code>

---

[Workflow run](https://github.com/vllm-project/vllm-ascend/actions/runs/24447109402)
Signed-off-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
Co-authored-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -46,32 +46,42 @@ msgid ""
"including HBM, DRAM, and SSD, making a pool for KV Cache storage while "
"making the prefix of requests visible across all nodes, increasing the "
"cache hit rate for all requests."
msgstr "因此,我们提出了 KV 缓存池,旨在利用包括 HBM、DRAM 和 SSD 在内的多种存储类型,构建一个 KV 缓存存储池,同时使请求的前缀在所有节点间可见,从而提高所有请求的缓存命中率。"
msgstr ""
"因此,我们提出了 KV 缓存池,旨在利用包括 HBM、DRAM 和 SSD 在内的多种存储类型,构建一个 KV "
"缓存存储池,同时使请求的前缀在所有节点间可见,从而提高所有请求的缓存命中率。"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:11
msgid ""
"vLLM Ascend currently supports [MooncakeStore](https://github.com"
"/kvcache-ai/Mooncake), one of the most recognized KV Cache storage "
"engines."
msgstr "vLLM Ascend 目前支持 [MooncakeStore](https://github.com/kvcache-ai/Mooncake),这是最受认可的 KV 缓存存储引擎之一。"
msgstr ""
"vLLM Ascend 目前支持 [MooncakeStore](https://github.com/kvcache-"
"ai/Mooncake),这是最受认可的 KV 缓存存储引擎之一。"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:13
msgid ""
"While one can utilize Mooncake Store in vLLM V1 engine by setting it as a"
" remote backend of LMCache with GPU (see "
"While one can utilize MooncakeStore in vLLM V1 engine by setting it as a "
"remote backend of LMCache with GPU (see "
"[Tutorial](https://github.com/LMCache/LMCache/blob/dev/examples/kv_cache_reuse/remote_backends/mooncakestore/README.md)),"
" we find it would be better to integrate a connector that directly "
"supports Mooncake Store and can utilize the data transfer strategy that "
"supports MooncakeStore and can utilize the data transfer strategy that "
"best fits Huawei NPU hardware."
msgstr "虽然可以通过将 Mooncake Store 设置为 GPU 上 LMCache 的远程后端来在 vLLM V1 引擎中使用它(参见[教程](https://github.com/LMCache/LMCache/blob/dev/examples/kv_cache_reuse/remote_backends/mooncakestore/README.md)),但我们认为集成一个直接支持 Mooncake Store 并能利用最适合华为 NPU 硬件的数据传输策略的连接器会更好。"
msgstr ""
"虽然可以通过将 MooncakeStore 设置为 GPU 上 LMCache 的远程后端来在 vLLM V1 "
"引擎中使用它(参见[教程](https://github.com/LMCache/LMCache/blob/dev/examples/kv_cache_reuse/remote_backends/mooncakestore/README.md)),但我们认为集成一个直接支持"
" MooncakeStore 并能利用最适合华为 NPU 硬件的数据传输策略的连接器会更好。"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:15
msgid ""
"Hence, we propose to integrate Mooncake Store with a brand new "
"Hence, we propose to integrate MooncakeStore with a brand new "
"**MooncakeStoreConnectorV1**, which is indeed largely inspired by "
"**LMCacheConnectorV1** (see the `How is MooncakeStoreConnectorV1 "
"Implemented?` section)."
msgstr "因此,我们提议将 Mooncake Store 与全新的 **MooncakeStoreConnectorV1** 集成,该连接器的设计在很大程度上受到了 **LMCacheConnectorV1** 的启发(参见 `MooncakeStoreConnectorV1 是如何实现的?` 部分)。"
msgstr ""
"因此,我们提议将 MooncakeStore 与全新的 **MooncakeStoreConnectorV1** "
"集成,该连接器的设计在很大程度上受到了 **LMCacheConnectorV1** 的启发(参见 "
"`MooncakeStoreConnectorV1 是如何实现的?` 部分)。"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:17
msgid "Usage"
@@ -79,17 +89,21 @@ msgstr "使用方法"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:19
msgid ""
"vLLM Ascend currently supports Mooncake Store for KV Cache Pool. To "
"enable Mooncake Store, one needs to configure `kv-transfer-config` and "
"choose `MooncakeStoreConnector` as the KV Connector."
msgstr "vLLM Ascend 目前支持使用 Mooncake Store 作为 KV 缓存池。要启用 Mooncake Store,需要配置 `kv-transfer-config` 并选择 `MooncakeStoreConnector` 作为 KV 连接器。"
"vLLM Ascend currently supports MooncakeStore for KV Cache Pool. To enable"
" MooncakeStore, one needs to configure `kv-transfer-config` and choose "
"`MooncakeStoreConnector` as the KV Connector."
msgstr ""
"vLLM Ascend 目前支持使用 MooncakeStore 作为 KV 缓存池。要启用 MooncakeStore,需要配置 `kv-"
"transfer-config` 并选择 `MooncakeStoreConnector` 作为 KV 连接器。"
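The entry above only names the connector. As a minimal sketch, assuming vLLM's standard `--kv-transfer-config` JSON format (the `kv_role` value and any extra keys are illustrative and should be checked against the version in use), selecting the connector might look like:

```python
import json

# Hypothetical kv-transfer-config payload selecting the Mooncake-backed
# connector; "kv_role" here is illustrative only.
kv_transfer_config = {
    "kv_connector": "MooncakeStoreConnector",
    "kv_role": "kv_both",
}

# The JSON string would be passed on the command line, e.g.:
#   vllm serve <model> --kv-transfer-config '<this JSON>'
config_arg = json.dumps(kv_transfer_config)
print(config_arg)
```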

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:21
msgid ""
"For step-by-step deployment and configuration, please refer to the [KV "
"Pool User "
"Guide](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/feature_guide/kv_pool.html)."
msgstr "关于逐步部署和配置,请参考 [KV 池用户指南](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/feature_guide/kv_pool.html)。"
msgstr ""
"关于逐步部署和配置,请参考 [KV "
"池用户指南](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/feature_guide/kv_pool.html)。"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:23
msgid "How it works?"
@@ -114,7 +128,9 @@ msgid ""
"efficient caching both locally (in HBM) and globally (via Mooncake), "
"ensuring that frequently used prefixes remain hot while less frequently "
"accessed KV data can spill over to lower-cost memory."
msgstr "当与 vLLM 的前缀缓存机制结合时,该池能够实现本地(HBM 中)和全局(通过 Mooncake)的高效缓存,确保常用前缀保持热状态,而访问频率较低的 KV 数据则可以溢出到成本更低的内存中。"
msgstr ""
"当与 vLLM 的前缀缓存机制结合时,该池能够实现本地(HBM 中)和全局(通过 "
"Mooncake)的高效缓存,确保常用前缀保持热状态,而访问频率较低的 KV 数据则可以溢出到成本更低的内存中。"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:31
msgid "1. Combining KV Cache Pool with HBM Prefix Caching"
@@ -125,7 +141,9 @@ msgid ""
"Prefix Caching with HBM is already supported by the vLLM V1 Engine. By "
"introducing KV Connector V1, users can seamlessly combine HBM-based "
"Prefix Caching with Mooncake-backed KV Pool."
msgstr "vLLM V1 引擎已支持基于 HBM 的前缀缓存。通过引入 KV Connector V1,用户可以无缝地将基于 HBM 的前缀缓存与 Mooncake 支持的 KV 池结合起来。"
msgstr ""
"vLLM V1 引擎已支持基于 HBM 的前缀缓存。通过引入 KV Connector V1,用户可以无缝地将基于 HBM 的前缀缓存与 "
"Mooncake 支持的 KV 池结合起来。"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:36
msgid ""
@@ -133,7 +151,9 @@ msgid ""
"which is enabled by default in vLLM V1 unless the "
"`--no_enable_prefix_caching` flag is set, and setting up the KV Connector"
" for KV Pool (e.g., the MooncakeStoreConnector)."
msgstr "用户只需启用前缀缓存(在 vLLM V1 中默认启用,除非设置了 `--no_enable_prefix_caching` 标志)并为 KV 池设置 KV 连接器(例如 MooncakeStoreConnector),即可同时启用这两个功能。"
msgstr ""
"用户只需启用前缀缓存(在 vLLM V1 中默认启用,除非设置了 `--no_enable_prefix_caching` 标志)并为 KV "
"池设置 KV 连接器(例如 MooncakeStoreConnector),即可同时启用这两个功能。"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:38
msgid "**Workflow**:"
@@ -149,7 +169,9 @@ msgid ""
" the connector. If there are additional hits in the KV Pool, we get the "
"**additional blocks only** from the KV Pool, and get the rest of the "
"blocks directly from HBM to minimize the data transfer latency."
msgstr "获取 HBM 上的命中令牌数量后,引擎通过连接器查询 KV 池。如果在 KV 池中有额外的命中,我们**仅从 KV 池获取额外的块**,其余块则直接从 HBM 获取,以最小化数据传输延迟。"
msgstr ""
"获取 HBM 上的命中令牌数量后,引擎通过连接器查询 KV 池。如果在 KV 池中有额外的命中,我们**仅从 KV "
"池获取额外的块**,其余块则直接从 HBM 获取,以最小化数据传输延迟。"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:44
msgid ""
@@ -173,7 +195,9 @@ msgid ""
"Currently, we only perform put and get operations of KV Pool for "
"**Prefill Nodes**, and Decode Nodes get their KV Cache from Mooncake P2P "
"KV Connector, i.e., MooncakeConnector."
msgstr "目前,我们仅对**预填充节点**执行 KV 池的 put 和 get 操作,解码节点则通过 Mooncake P2P KV 连接器(即 MooncakeConnector)获取其 KV 缓存。"
msgstr ""
"目前,我们仅对**预填充节点**执行 KV 池的 put 和 get 操作,解码节点则通过 Mooncake P2P KV 连接器(即 "
"MooncakeConnector)获取其 KV 缓存。"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:52
msgid ""
@@ -182,15 +206,20 @@ msgid ""
"Nodes, while not sacrificing the data transfer efficiency between Prefill"
" and Decode nodes with P2P KV Connector that transfers KV Caches between "
"NPU devices directly."
msgstr "这样做的主要好处是,我们可以通过为预填充节点使用来自 HBM 和 KV 池的前缀缓存来减少计算量,从而保持性能增益,同时又不牺牲预填充节点与解码节点之间的数据传输效率,因为 P2P KV 连接器直接在 NPU 设备间传输 KV 缓存。"
msgstr ""
"这样做的主要好处是,我们可以通过为预填充节点使用来自 HBM 和 KV "
"池的前缀缓存来减少计算量,从而保持性能增益,同时又不牺牲预填充节点与解码节点之间的数据传输效率,因为 P2P KV 连接器直接在 NPU "
"设备间传输 KV 缓存。"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:54
msgid ""
"To enable this feature, we need to set up both Mooncake Connector and "
"Mooncake Store Connector with a Multi Connector, which is a KV Connector "
"MooncakeStore Connector with a Multi Connector, which is a KV Connector "
"class provided by vLLM that can call multiple KV Connectors in a specific"
" order."
msgstr "要启用此功能,我们需要使用 Multi Connector 来设置 Mooncake Connector 和 Mooncake Store Connector。Multi Connector 是 vLLM 提供的一个 KV 连接器类,可以按特定顺序调用多个 KV 连接器。"
msgstr ""
"要启用此功能,我们需要使用 Multi Connector 来设置 Mooncake Connector 和 MooncakeStore "
"Connector。Multi Connector 是 vLLM 提供的一个 KV 连接器类,可以按特定顺序调用多个 KV 连接器。"
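The MultiConnector wiring described above can be sketched as a nested `kv-transfer-config`. The key names below (`kv_connector_extra_config`, `connectors`) and the `kv_role` values follow vLLM's MultiConnector convention as an assumption; verify them against the vLLM version in use:

```python
import json

# Hypothetical nested config: MultiConnector calls the P2P MooncakeConnector
# and the pool-backed MooncakeStoreConnector in order.
kv_transfer_config = {
    "kv_connector": "MultiConnector",
    "kv_role": "kv_both",
    "kv_connector_extra_config": {
        "connectors": [
            {"kv_connector": "MooncakeConnector", "kv_role": "kv_both"},
            {"kv_connector": "MooncakeStoreConnector", "kv_role": "kv_both"},
        ]
    },
}
print(json.dumps(kv_transfer_config, indent=2))
```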

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:56
msgid ""
@@ -208,7 +237,9 @@ msgid ""
"V1: through implementing the required methods defined in the KV connector"
" V1 base class, one can integrate a third-party KV cache transfer/storage"
" backend into the vLLM framework."
msgstr "**MooncakeStoreConnectorV1** 继承自 vLLM V1 中的 KV Connector V1 类:通过实现 KV 连接器 V1 基类中定义的必要方法,可以将第三方 KV 缓存传输/存储后端集成到 vLLM 框架中。"
msgstr ""
"**MooncakeStoreConnectorV1** 继承自 vLLM V1 中的 KV Connector V1 类:通过实现 KV 连接器"
" V1 基类中定义的必要方法,可以将第三方 KV 缓存传输/存储后端集成到 vLLM 框架中。"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:62
msgid ""
@@ -220,7 +251,12 @@ msgid ""
"that allows async `get` and `put` of KV caches with multi-threading, and "
"NPU-related data transfer optimization such as removing the `LocalBuffer`"
" in LMCache to remove redundant data transfer."
msgstr "MooncakeStoreConnectorV1 也在很大程度上借鉴了 LMCacheConnectorV1,包括用于查找 KV 缓存键的 `Lookup Engine`/`Lookup Client` 设计,以及用于将令牌处理为前缀感知哈希的 `ChunkedTokenDatabase` 类和其他哈希相关设计。在此基础上,我们还添加了自己的设计,包括允许通过多线程异步 `get` 和 `put` KV 缓存的 `KVTransferThread`,以及与 NPU 相关的数据传输优化,例如移除 LMCache 中的 `LocalBuffer` 以消除冗余数据传输。"
msgstr ""
"MooncakeStoreConnectorV1 也在很大程度上借鉴了 LMCacheConnectorV1,包括用于查找 KV 缓存键的 "
"`Lookup Engine`/`Lookup Client` 设计,以及用于将令牌处理为前缀感知哈希的 "
"`ChunkedTokenDatabase` 类和其他哈希相关设计。在此基础上,我们还添加了自己的设计,包括允许通过多线程异步 `get` 和 "
"`put` KV 缓存的 `KVTransferThread`,以及与 NPU 相关的数据传输优化,例如移除 LMCache 中的 "
"`LocalBuffer` 以消除冗余数据传输。"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:64
msgid ""
@@ -268,7 +304,8 @@ msgstr ""
"`wait_for_layer_load`:可选;在分层 + 异步 KV 加载场景中等待层加载。\n"
"`save_kv_layer`:可选;执行分层 KV 缓存放入 KV 池的操作。\n"
"`wait_for_save`:如果异步保存/放入 KV 缓存,则等待 KV 保存完成。\n"
"`get_finished`:获取已完成 KV 传输的请求,如果 `put` 完成则为 `done_sending`,如果 `get` 完成则为 `done_receiving`。"
"`get_finished`:获取已完成 KV 传输的请求,如果 `put` 完成则为 `done_sending`,如果 `get` 完成则为 "
"`done_receiving`。"
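The worker-side methods listed in that entry can be sketched as a skeleton class. This is only an illustrative outline based on the method names above; the real vLLM KV connector V1 base class has additional methods and different signatures:

```python
from abc import ABC, abstractmethod

class KVConnectorV1Sketch(ABC):
    """Illustrative skeleton of the worker-side hooks described above."""

    def wait_for_layer_load(self, layer_name: str) -> None:
        """Optional: block until one layer's KV is loaded (layer-wise async load)."""

    def save_kv_layer(self, layer_name: str, kv_layer) -> None:
        """Optional: put one layer's KV cache into the KV pool."""

    @abstractmethod
    def wait_for_save(self) -> None:
        """Block until asynchronous KV saves (puts) have completed."""

    @abstractmethod
    def get_finished(self, finished_req_ids):
        """Return (done_sending, done_receiving): request ids whose put/get finished."""

class NoOpConnector(KVConnectorV1Sketch):
    # Trivial subclass showing the required overrides.
    def wait_for_save(self) -> None:
        pass

    def get_finished(self, finished_req_ids):
        return set(), set()
```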

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:82
msgid "DFX"
@@ -293,9 +330,9 @@ msgstr "限制"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:89
msgid ""
"Currently, Mooncake Store for vLLM-Ascend only supports DRAM as the "
"Currently, MooncakeStore for vLLM-Ascend only supports DRAM as the "
"storage for KV Cache pool."
msgstr "目前,vLLM-Ascend 的 Mooncake Store 仅支持 DRAM 作为 KV 缓存池的存储。"
msgstr "目前,vLLM-Ascend 的 MooncakeStore 仅支持 DRAM 作为 KV 缓存池的存储。"

#: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:91
msgid ""
@@ -306,4 +343,6 @@ msgid ""
"situation by falling back the request and re-compute everything assuming "
"there's no prefix cache hit (or even better, revert only one block and "
"keep using the Prefix Caches before that)."
msgstr "目前,如果我们成功查找到一个键并发现它存在,但在调用 KV 池的 get 函数时失败,我们仅输出一条日志表明 get 操作失败并继续执行;因此,该特定请求的准确性可能会受到影响。我们将通过回退请求并假设没有前缀缓存命中来重新计算所有内容(或者更好的是,仅回退一个块并继续使用该块之前的前缀缓存)来处理这种情况。"
msgstr ""
"目前,如果我们成功查找到一个键并发现其存在,但在调用 KV 池的 get 函数时获取失败,我们仅输出一条日志表明 get "
"操作失败并继续执行;因此,该特定请求的准确性可能会受到影响。我们将通过回退请求并假设没有前缀缓存命中来重新计算所有内容(或者更优的方案是,仅回退一个块并继续使用该块之前的前缀缓存)来处理这种情况。"

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -88,13 +88,15 @@ msgid ""
"At last, these `Token IDs` are required to be fed into a model, and "
"`positions` should also be sent into the model to create `Rope` (Rotary "
"positional embedding). Both of them are the inputs of the model."
msgstr "最后,这些 `Token IDs` 需要输入到模型中,`positions` 也需要送入模型以创建 `Rope`(旋转位置编码)。两者共同构成模型的输入。"
msgstr ""
"最后,这些 `Token IDs` 需要输入到模型中,`positions` 也需要送入模型以创建 "
"`Rope`(旋转位置编码)。两者共同构成模型的输入。"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:38
msgid ""
"**Note**: The `Token IDs` are the inputs of a model, so we also call them"
" `Inputs IDs`."
msgstr "**注意**:`Token IDs` 是模型的输入,因此我们也称它们为 `Inputs IDs`。"
" `Input IDs`."
msgstr "**注意**:`Token IDs` 是模型的输入,因此我们也称它们为 `Input IDs`。"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:40
msgid "2. Build inputs attention metadata"
@@ -185,14 +187,19 @@ msgid ""
"len)`. Here, `max num request` is the maximum count of concurrent "
"requests allowed in a forward batch and `max model len` is the maximum "
"token count that can be handled at one request sequence in this model."
msgstr "**Token IDs table**:存储每个请求的 token IDs(即模型的输入)。此表的形状为 `(max num request, max model len)`。其中,`max num request` 是前向批次中允许的最大并发请求数,`max model len` 是该模型中单个请求序列可以处理的最大 token 数量。"
msgstr ""
"**Token IDs table**:存储每个请求的 token IDs(即模型的输入)。此表的形状为 `(max num request, "
"max model len)`。其中,`max num request` 是前向批次中允许的最大并发请求数,`max model len` "
"是该模型中单个请求序列可以处理的最大 token 数量。"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:62
msgid ""
"**Block table**: translates the logical address (within its sequence) of "
"each block to its global physical address in the device's memory. The "
"shape of this table is `(max num request, max model len / block size)`"
msgstr "**Block table**:将每个块在其序列内的逻辑地址转换为其在设备内存中的全局物理地址。此表的形状为 `(max num request, max model len / block size)`"
msgstr ""
"**Block table**:将每个块在其序列内的逻辑地址转换为其在设备内存中的全局物理地址。此表的形状为 `(max num request,"
" max model len / block size)`"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:64
msgid ""
@@ -255,13 +262,14 @@ msgid "Obtain inputs"
msgstr "获取输入"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:103
#, python-brace-format
msgid ""
"As the maximum number of tokens that can be scheduled is 10, the "
"scheduled tokens of each request can be represented as `{'0': 3, '1': 2, "
"'2': 5}`. Note that `request_2` uses chunked prefill, leaving 3 prompt "
"tokens unscheduled."
msgstr "由于一次可调度的最大 token 数为 10,每个请求的已调度 token 可以表示为 `{'0': 3, '1': 2, '2': 5}`。注意 `request_2` 使用了分块预填充,留下了 3 个提示 token 未调度。"
msgstr ""
"由于一次可调度的最大 token 数为 10,每个请求的已调度 token 可以表示为 `{'0': 3, '1': 2, '2': 5}`。注意"
" `request_2` 使用了分块预填充,留下了 3 个提示 token 未调度。"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:105
msgid "1. Get token positions"
@@ -273,7 +281,10 @@ msgid ""
"assigned to **request_0**, tokens 3–4 to **request_1**, and tokens 5–9 to"
" **request_2**. To represent this mapping, we use `request indices`, for "
"example, `request indices`: `[0, 0, 0, 1, 1, 2, 2, 2, 2, 2]`."
msgstr "首先,确定每个 token 属于哪个请求:token 0–2 分配给 **request_0**,token 3–4 分配给 **request_1**,token 5–9 分配给 **request_2**。为了表示这种映射,我们使用 `request indices`,例如,`request indices`:`[0, 0, 0, 1, 1, 2, 2, 2, 2, 2]`。"
msgstr ""
"首先,确定每个 token 属于哪个请求:token 0–2 分配给 **request_0**,token 3–4 分配给 "
"**request_1**,token 5–9 分配给 **request_2**。为了表示这种映射,我们使用 `request "
"indices`,例如,`request indices`:`[0, 0, 0, 1, 1, 2, 2, 2, 2, 2]`。"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:109
msgid ""
@@ -281,7 +292,10 @@ msgid ""
"position of current scheduled tokens** (`request_0: [0 + 0, 0 + 1, 0 + "
"2]`, `request_1: [0 + 0, 0 + 1]`, `request_2: [0 + 0, 0 + 1,..., 0 + 4]`)"
" and then concatenate them together (`[0, 1, 2, 0, 1, 0, 1, 2, 3, 4]`)."
msgstr "对于每个请求,使用 **已计算 token 的数量** + **当前调度 token 的相对位置**(`request_0: [0 + 0, 0 + 1, 0 + 2]`,`request_1: [0 + 0, 0 + 1]`,`request_2: [0 + 0, 0 + 1,..., 0 + 4]`),然后将它们连接在一起(`[0, 1, 2, 0, 1, 0, 1, 2, 3, 4]`)。"
msgstr ""
"对于每个请求,使用 **已计算 token 的数量** + **当前调度 token 的相对位置**(`request_0: [0 + 0, 0 "
"+ 1, 0 + 2]`,`request_1: [0 + 0, 0 + 1]`,`request_2: [0 + 0, 0 + 1,..., 0"
" + 4]`),然后将它们连接在一起(`[0, 1, 2, 0, 1, 0, 1, 2, 3, 4]`)。"
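The position computation described in that entry can be sketched with the running example's numbers (all values taken from the document's example):

```python
# Per-request counts from the example: nothing computed yet, and the
# scheduler assigned {'0': 3, '1': 2, '2': 5} tokens this step.
num_computed = [0, 0, 0]
num_scheduled = [3, 2, 5]

request_indices = []
positions = []
for req, n in enumerate(num_scheduled):
    request_indices += [req] * n
    # position = tokens already computed + relative position in this step
    positions += [num_computed[req] + i for i in range(n)]

print(request_indices)  # [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
print(positions)        # [0, 1, 2, 0, 1, 0, 1, 2, 3, 4]
```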

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:111
msgid ""
@@ -293,7 +307,9 @@ msgstr "注意:在实际代码中,有一种更高效的方法(使用 `requ
msgid ""
"Finally, `token positions` can be obtained as `[0, 1, 2, 0, 1, 0, 1, 2, "
"3, 4]`. This variable is **token level**."
msgstr "最后,`token positions` 可以获取为 `[0, 1, 2, 0, 1, 0, 1, 2, 3, 4]`。此变量是 **token 级别** 的。"
msgstr ""
"最后,`token positions` 可以获取为 `[0, 1, 2, 0, 1, 0, 1, 2, 3, 4]`。此变量是 **token "
"级别** 的。"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:115
msgid "2. Get token indices"
@@ -326,14 +342,19 @@ msgstr "注意 `T_x_x` 是一个 `int32`。"
msgid ""
"Let's say `M = max model len`. Then we can use `token positions` together"
" with `request indices` of each token to construct `token indices`."
msgstr "假设 `M = max model len`。那么我们可以使用 `token positions` 以及每个 token 的 `request indices` 来构造 `token indices`。"
msgstr ""
"假设 `M = max model len`。那么我们可以使用 `token positions` 以及每个 token 的 `request "
"indices` 来构造 `token indices`。"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:137
msgid ""
"So `token indices` = `[0 + 0 * M, 1 + 0 * M, 2 + 0 * M, 0 + 1 * M, 1 + 1 "
"* M, 0 + 2 * M, 1 + 2 * M, 2 + 2 * M, 3 + 2 * M, 4 + 2 * M]` = `[0, 1, 2,"
" 12, 13, 24, 25, 26, 27, 28]`"
msgstr "所以 `token indices` = `[0 + 0 * M, 1 + 0 * M, 2 + 0 * M, 0 + 1 * M, 1 + 1 * M, 0 + 2 * M, 1 + 2 * M, 2 + 2 * M, 3 + 2 * M, 4 + 2 * M]` = `[0, 1, 2, 12, 13, 24, 25, 26, 27, 28]`"
msgstr ""
"所以 `token indices` = `[0 + 0 * M, 1 + 0 * M, 2 + 0 * M, 0 + 1 * M, 1 + 1 "
"* M, 0 + 2 * M, 1 + 2 * M, 2 + 2 * M, 3 + 2 * M, 4 + 2 * M]` = `[0, 1, 2,"
" 12, 13, 24, 25, 26, 27, 28]`"
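The index construction in that entry is a flat index into the `(max num request, max model len)` Token IDs table; with the example's `M = 12` it can be checked directly:

```python
M = 12  # max model len in the running example
request_indices = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
positions = [0, 1, 2, 0, 1, 0, 1, 2, 3, 4]

# Flat (row-major) index into the Token IDs table: row = request, col = position.
token_indices = [p + r * M for p, r in zip(positions, request_indices)]
print(token_indices)  # [0, 1, 2, 12, 13, 24, 25, 26, 27, 28]
```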

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:139
msgid "3. Retrieve the Token IDs"
@@ -353,7 +374,9 @@ msgstr "如前所述,我们将这些 `Token IDs` 称为 `Input IDs`。"
msgid ""
"`Input IDs` = `[T_0_0, T_0_1, T_0_2, T_1_0, T_1_1, T_2_0, T_2_1, T_3_2, "
"T_3_3, T_3_4]`"
msgstr "`Input IDs` = `[T_0_0, T_0_1, T_0_2, T_1_0, T_1_1, T_2_0, T_2_1, T_3_2, T_3_3, T_3_4]`"
msgstr ""
"`Input IDs` = `[T_0_0, T_0_1, T_0_2, T_1_0, T_1_1, T_2_0, T_2_1, T_3_2, "
"T_3_3, T_3_4]`"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:151
#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:237
@@ -367,7 +390,8 @@ msgid ""
"model len / block size)`, where `max model len / block size = 12 / 2 = "
"6`."
msgstr ""
"在当前的**块表**中,我们使用第一个块(即 block_0)来标记未使用的块。块的形状为 `(最大请求数, 最大模型长度 / 块大小)`,其中 `最大模型长度 / 块大小 = 12 / 2 = 6`。"
"在当前的**块表**中,我们使用第一个块(即 block_0)来标记未使用的块。块的形状为 `(最大请求数, 最大模型长度 / 块大小)`,其中 "
"`最大模型长度 / 块大小 = 12 / 2 = 6`。"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:165
msgid "The KV cache block in the device memory is like:"
@@ -434,7 +458,11 @@ msgid ""
" / 2] = [0, 0, 1, 6, 6, 12, 12, 13, 13, 14]`. This could be used to "
"select `device block number` from `block table`."
msgstr ""
"(**令牌级别**) 使用一个简单的公式计算`块表索引`:`request indices * K + positions / block size`。因此它等于 `[0 * 6 + 0 / 2, 0 * 6 + 1 / 2, 0 * 6 + 2 / 2, 1 * 6 + 0 / 2, 1 * 6 + 1 / 2, 2 * 6 + 0 / 2, 2 * 6 + 1 / 2, 2 * 6 + 2 / 2, 2 * 6 + 3 / 2, 2 * 6 + 4 / 2] = [0, 0, 1, 6, 6, 12, 12, 13, 13, 14]`。这可用于从`块表`中选择`设备块编号`。"
"(**令牌级别**) 使用一个简单的公式计算`块表索引`:`request indices * K + positions / block "
"size`。因此它等于 `[0 * 6 + 0 / 2, 0 * 6 + 1 / 2, 0 * 6 + 2 / 2, 1 * 6 + 0 / 2,"
" 1 * 6 + 1 / 2, 2 * 6 + 0 / 2, 2 * 6 + 1 / 2, 2 * 6 + 2 / 2, 2 * 6 + 3 / "
"2, 2 * 6 + 4 / 2] = [0, 0, 1, 6, 6, 12, 12, 13, 13, "
"14]`。这可用于从`块表`中选择`设备块编号`。"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:194
msgid ""
@@ -443,14 +471,17 @@ msgid ""
"block_table[block_table_indices]`. So `device block number=[1, 1, 2, 3, "
"3, 4, 4, 5, 5, 6]`"
msgstr ""
"(**令牌级别**) 使用`块表索引`为每个已调度的令牌选择出`设备块编号`。伪代码为 `block_numbers = block_table[block_table_indices]`。因此 `设备块编号=[1, 1, 2, 3, 3, 4, 4, 5, 5, 6]`"
"(**令牌级别**) 使用`块表索引`为每个已调度的令牌选择出`设备块编号`。伪代码为 `block_numbers = "
"block_table[block_table_indices]`。因此 `设备块编号=[1, 1, 2, 3, 3, 4, 4, 5, 5, "
"6]`"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:195
msgid ""
"(**Token level**) `block offsets` could be computed by `block offsets = "
"positions % block size = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]`."
msgstr ""
"(**令牌级别**) `块内偏移`可以通过 `block offsets = positions % block size = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]` 计算得出。"
"(**令牌级别**) `块内偏移`可以通过 `block offsets = positions % block size = [0, 1, 0,"
" 0, 1, 0, 1, 0, 1, 0]` 计算得出。"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:196
msgid ""
@@ -458,7 +489,8 @@ msgid ""
"mapping`: `device block number * block size + block_offsets = [2, 3, 4, "
"6, 7, 8, 9, 10, 11, 12]`"
msgstr ""
"最后,使用`块内偏移`和`设备块编号`创建`槽映射`:`设备块编号 * 块大小 + 块内偏移 = [2, 3, 4, 6, 7, 8, 9, 10, 11, 12]`"
"最后,使用`块内偏移`和`设备块编号`创建`槽映射`:`设备块编号 * 块大小 + 块内偏移 = [2, 3, 4, 6, 7, 8, 9, "
"10, 11, 12]`"
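The block-table-index, block-number, block-offset, and slot-mapping steps above can be sketched end to end. The intermediate and final lists come from the document's example; the flattened `block_table` below is a hypothetical layout chosen only so that its entries reproduce the example's device block numbers:

```python
block_size = 2
K = 6  # max model len / block size = 12 / 2

request_indices = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
positions = [0, 1, 2, 0, 1, 0, 1, 2, 3, 4]

# Flattened (max num request, K) block table; 0 (block_0) marks unused slots.
block_table = [1, 2, 0, 0, 0, 0,   # request_0
               3, 0, 0, 0, 0, 0,   # request_1
               4, 5, 6, 0, 0, 0]   # request_2

# block table index = request index * K + position // block size
block_table_indices = [r * K + p // block_size
                       for r, p in zip(request_indices, positions)]
block_numbers = [block_table[i] for i in block_table_indices]
block_offsets = [p % block_size for p in positions]
slot_mapping = [n * block_size + o
                for n, o in zip(block_numbers, block_offsets)]

print(block_table_indices)  # [0, 0, 1, 6, 6, 12, 12, 13, 13, 14]
print(block_numbers)        # [1, 1, 2, 3, 3, 4, 4, 5, 5, 6]
print(slot_mapping)         # [2, 3, 4, 6, 7, 8, 9, 10, 11, 12]
```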

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:198
msgid "(**Request level**) As we know the scheduled token count is `[3, 2, 5]`:"
@@ -538,7 +570,9 @@ msgid ""
"**Note**: **T_0_3**, **T_1_2** are new Token IDs of **request_0** and "
"**request_1** respectively. They are sampled from the output of the "
"model."
msgstr "**注意**:**T_0_3**、**T_1_2** 分别是 **request_0** 和 **request_1** 的新令牌 ID。它们是从模型输出中采样得到的。"
msgstr ""
"**注意**:**T_0_3**、**T_1_2** 分别是 **request_0** 和 **request_1** 的新令牌 "
"ID。它们是从模型输出中采样得到的。"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:234
msgid "`token indices`: `[3, 14, 29, 30, 31]`"
@@ -553,7 +587,9 @@ msgid ""
"We allocate the blocks `7` and `8` to `request_1` and `request_2` "
"respectively, as they need more space in device to store KV cache "
"following token generation or chunked prefill."
msgstr "我们将块 `7` 和 `8` 分别分配给 `request_1` 和 `request_2`,因为它们在令牌生成或分块预填充后需要更多设备空间来存储 KV 缓存。"
msgstr ""
"我们将块 `7` 和 `8` 分别分配给 `request_1` 和 "
"`request_2`,因为它们在令牌生成或分块预填充后需要更多设备空间来存储 KV 缓存。"

#: ../../source/developer_guide/Design_Documents/ModelRunner_prepare_inputs.md:241
msgid "Current **Block Table**:"

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -35,9 +35,8 @@ msgid ""
"Ascend NPUs and is automatically executed during worker initialization "
"when enabled."
msgstr ""
"CPU 绑定将 vLLM Ascend 工作进程和关键线程固定到特定的 CPU 核心,以减少 CPU-"
"NPU 跨 NUMA 流量,并在多进程工作负载下稳定延迟。它专为运行 Ascend NPU 的 ARM "
"服务器设计,启用后会在工作进程初始化期间自动执行。"
"CPU 绑定将 vLLM Ascend 工作进程和关键线程固定到特定的 CPU 核心,以减少 CPU-NPU 跨 NUMA "
"流量,并在多进程工作负载下稳定延迟。它专为运行 Ascend NPU 的 ARM 服务器设计,启用后会在工作进程初始化期间自动执行。"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:7
msgid "Background"
@@ -53,10 +52,9 @@ msgid ""
"purely a host‑side affinity policy and does not change model execution "
"logic."
msgstr ""
"在多插槽 ARM 系统上,操作系统调度器可能会将 vLLM 线程放置在远离本地 NPU 的 "
"CPU 上,从而导致 NUMA 跨域流量和延迟抖动。CPU 绑定强制执行一种确定性的 CPU "
"放置策略,并可选地将 NPU IRQ 绑定到同一个 CPU 池。这与其他性能特性(如图模式"
"或动态批处理)不同,因为它纯粹是主机端的亲和性策略,不改变模型执行逻辑。"
"在多插槽 ARM 系统上,操作系统调度器可能会将 vLLM 线程放置在远离本地 NPU 的 CPU 上,从而导致 NUMA "
"跨域流量和延迟抖动。CPU 绑定强制执行一种确定性的 CPU 放置策略,并可选地将 NPU IRQ 绑定到同一个 CPU "
"池。这与其他性能特性(如图模式或动态批处理)不同,因为它纯粹是主机端的亲和性策略,不改变模型执行逻辑。"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:11
msgid "Design & How it works"
@@ -71,8 +69,8 @@ msgid ""
"**Allowed CPU list**: The cpuset from /proc/self/status "
"(Cpus_allowed_list). All allocations are constrained to this list."
msgstr ""
"**允许的 CPU 列表**:来自 /proc/self/status (Cpus_allowed_list) 的 cpuset。"
"所有分配都受限于此列表。"
"**允许的 CPU 列表**:来自 /proc/self/status (Cpus_allowed_list) 的 "
"cpuset。所有分配都受限于此列表。"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:16
msgid ""
@@ -86,8 +84,7 @@ msgstr ""
msgid ""
"**CPU pool per NPU**: The CPU list assigned to each logical NPU ID based "
"on the binding mode."
msgstr ""
"**每个 NPU 的 CPU 池**:根据绑定模式分配给每个逻辑 NPU ID 的 CPU 列表。"
msgstr "**每个 NPU 的 CPU 池**:根据绑定模式分配给每个逻辑 NPU ID 的 CPU 列表。"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:18
msgid "**Binding modes & Device behavior**:"
@@ -119,8 +116,8 @@ msgid ""
"logical NPUs**, ensuring each NPU is assigned a contiguous segment of CPU"
" cores. This prevents CPU core overlap across multiple process groups."
msgstr ""
"根据**全局逻辑 NPU 总数**均匀分割允许的 CPU 列表,确保每个 NPU 被分配一个连"
"续的 CPU 核心段。这可以防止多个进程组之间的 CPU 核心重叠。"
"根据**全局逻辑 NPU 总数**均匀分割允许的 CPU 列表,确保每个 NPU 被分配一个连续的 CPU 核心段。这可以防止多个进程组之间的 "
"CPU 核心重叠。"
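The even-split behavior described in that entry (and the remainder handling visible in the later global_slice examples) can be sketched as follows. This is a hypothetical reconstruction of the idea, not vLLM Ascend's actual implementation; the function name and remainder policy are assumptions:

```python
# Hypothetical "global_slice": split the allowed CPU list into contiguous,
# non-overlapping segments, one per global logical NPU, giving the first
# (len % total_npus) NPUs one extra core each.
def global_slice(allowed_cpus, total_npus):
    base, rem = divmod(len(allowed_cpus), total_npus)
    pools, start = [], 0
    for npu in range(total_npus):
        size = base + (1 if npu < rem else 0)
        pools.append(allowed_cpus[start:start + size])
        start += size
    return pools

# Even split: 24 allowed CPUs over 8 logical NPUs -> 3 CPUs per NPU.
pools = global_slice(list(range(24)), 8)
print(pools[0])  # [0, 1, 2]
```

With 17 CPUs and 3 NPUs this yields pools of sizes 6, 6, 5, which matches the later example where NPU1's pool (6..11) spans a NUMA boundary.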
|
||||
|
||||
#: ../../source/developer_guide/Design_Documents/cpu_binding.md
|
||||
msgid "A2 / 310P / Others"
|
||||
@@ -136,8 +133,8 @@ msgid ""
|
||||
"If multiple NPUs are assigned to a single NUMA node (which may cause "
|
||||
"bandwidth contention), the CPU allocation extends to adjacent NUMA nodes."
|
||||
msgstr ""
|
||||
"基于 NPU 拓扑亲和性 (`npu-smi info -t topo`) 分配 CPU。如果多个 NPU 被分配"
|
||||
"到单个 NUMA 节点(可能导致带宽争用),则 CPU 分配会扩展到相邻的 NUMA 节点。"
|
||||
"基于 NPU 拓扑亲和性 (`npu-smi info -t topo`) 分配 CPU。如果多个 NPU 被分配到单个 NUMA "
|
||||
"节点(可能导致带宽争用),则 CPU 分配会扩展到相邻的 NUMA 节点。"
|
||||
|
||||
#: ../../source/developer_guide/Design_Documents/cpu_binding.md:25
|
||||
msgid "**Default**: enabled (enable_cpu_binding = true)."
|
||||
@@ -151,8 +148,7 @@ msgstr "**回退**:如果 NPU 拓扑亲和性不可用,则使用 global_slic
|
||||
msgid ""
|
||||
"**Failure handling**: Any exception in binding is logged as a warning and"
|
||||
" **binding is skipped for that rank**."
|
||||
msgstr ""
|
||||
"**故障处理**:绑定过程中的任何异常都会记录为警告,并且**跳过该等级的绑定**。"
|
||||
msgstr "**故障处理**:绑定过程中的任何异常都会记录为警告,并且**跳过该等级的绑定**。"
|
||||
|
||||
#: ../../source/developer_guide/Design_Documents/cpu_binding.md:29
|
||||
msgid "Execution flow (simplified)"
|
@@ -373,9 +369,7 @@ msgstr "`IRQ`: 600-601, `Main`: 602-637, `ACL`: 638, `Release`: 639"
msgid ""
"This layout remains deterministic even when multiple processes share the "
"same cpuset, because slicing is based on the global logical NPU ID."
-msgstr ""
-"即使多个进程共享同一个 cpuset,此布局也保持确定性,因为切片是基于全局逻辑 "
-"NPU ID 的。"
+msgstr "即使多个进程共享同一个 cpuset,此布局也保持确定性,因为切片是基于全局逻辑 NPU ID 的。"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:86
msgid "Example 2: A3 global_slice, even split"
@@ -389,6 +383,10 @@ msgstr "示例 2:A3 global_slice,均匀分割"
msgid "**Inputs**:"
msgstr "**输入**:"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:90
msgid "allowed_cpus = [0..23] (24 CPUs)"
msgstr "allowed_cpus = [0..23] (24个CPU)"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:91
msgid ""
"NUMA nodes = 0..1 (2 NUMA nodes, symmetric layout; NUMA0 = 0..11, NUMA1 ="
@@ -520,7 +518,10 @@ msgid ""
"(6,7) and NUMA1 (8..11). This is a direct consequence of global slicing "
"over the ordered cpuset; the remainder distribution does not enforce NUMA"
" boundaries."
-msgstr "在上述对称NUMA布局中 (NUMA0 = 0..7, NUMA1 = 8..16),NPU0保持在NUMA0内,NPU2保持在NUMA1内,但NPU1跨越了NUMA0 (6,7) 和 NUMA1 (8..11)。这是对有序cpuset进行全局切片的直接结果;余数分配不强制NUMA边界。"
+msgstr ""
+"在上述对称NUMA布局中 (NUMA0 = 0..7, NUMA1 = "
+"8..16),NPU0保持在NUMA0内,NPU2保持在NUMA1内,但NPU1跨越了NUMA0 (6,7) 和 NUMA1 "
+"(8..11)。这是对有序cpuset进行全局切片的直接结果;余数分配不强制NUMA边界。"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:134
msgid ""
@@ -539,7 +540,9 @@ msgid ""
"avoid cross‑NUMA pools. A future enhancement should incorporate NUMA node"
" boundaries into the slicing logic so that pools remain within a single "
"NUMA node whenever possible."
-msgstr "使用当前的 `global_slice` 策略,某些CPU/NPU布局无法避免跨NUMA池。未来的增强应将NUMA节点边界纳入切片逻辑,以便池尽可能保持在单个NUMA节点内。"
+msgstr ""
+"使用当前的 `global_slice` "
+"策略,某些CPU/NPU布局无法避免跨NUMA池。未来的增强应将NUMA节点边界纳入切片逻辑,以便池尽可能保持在单个NUMA节点内。"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:140
msgid "Example 4: global_slice with visible subset of NPUs"
@@ -594,7 +597,6 @@ msgid "Example 5: A2/310P topo_affinity with NUMA extension"
msgstr "示例 5: 具有NUMA扩展的 A2/310P topo_affinity"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:163
#, python-brace-format
msgid "npu_affinity = {0: [0..7], 1: [0..7]} (from `npu-smi info -t topo`)"
msgstr "npu_affinity = {0: [0..7], 1: [0..7]} (来自 `npu-smi info -t topo`)"

@@ -745,11 +747,12 @@ msgid ""
"0–31, NUMA1 = CPUs 32–63, and the cpuset is 0–63. With 4 logical NPUs, "
"global slicing yields 16 CPUs per NPU (0–15, 16–31, 32–47, 48–63), so "
"each NPU’s pool stays within a single NUMA node."
-msgstr "示例(对称布局):2个NUMA节点,总共64个CPU。NUMA0 = CPU 0–31,NUMA1 = CPU 32–63,cpuset为0–63。对于4个逻辑NPU,全局切片每个NPU产生16个CPU (0–15, 16–31, 32–47, 48–63),因此每个NPU的池保持在单个NUMA节点内。"
+msgstr ""
+"示例(对称布局):2个NUMA节点,共64个CPU。NUMA0 = CPU 0–31,NUMA1 = CPU 32–63,cpuset为0–63。对于4个逻辑NPU,全局切片为每个NPU分配16个CPU (0–15, 16–31, 32–47, 48–63),因此每个NPU的CPU池都保持在单个NUMA节点内。"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:212
msgid "**Runtime dependencies**:"
-msgstr "**运行时依赖**:"
+msgstr "**运行时依赖项**:"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:213
msgid "Requires npu‑smi and lscpu commands."
@@ -761,13 +764,13 @@ msgstr "IRQ绑定需要对 /proc/irq 的写访问权限。"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:215
msgid "Memory binding requires migratepages; otherwise it is skipped."
-msgstr "内存绑定需要 migratepages;否则将被跳过。"
+msgstr "内存绑定需要 migratepages;否则将跳过此步骤。"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:216
msgid ""
"**IRQ side effects**: irqbalance may be stopped to avoid overriding "
"bindings."
-msgstr "**IRQ副作用**:可能会停止 irqbalance 以避免覆盖绑定。"
+msgstr "**IRQ副作用**:可能会停止 irqbalance 服务以避免覆盖绑定。"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:217
msgid ""
@@ -788,13 +791,15 @@ msgstr "使用标准的 vLLM 日志配置来启用调试日志。当启用调试

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:223
msgid "References"
-msgstr "参考"
+msgstr "参考资料"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:225
msgid ""
"CPU binding implementation: vllm_ascend/cpu_binding.py (`DeviceInfo`, "
"`CpuAlloc`, `bind_cpus`)"
-msgstr "CPU 绑定实现:vllm_ascend/cpu_binding.py (`DeviceInfo`, `CpuAlloc`, `bind_cpus`)"
+msgstr ""
+"CPU 绑定实现:vllm_ascend/cpu_binding.py (`DeviceInfo`, `CpuAlloc`, "
+"`bind_cpus`)"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:226
msgid ""
@@ -807,7 +812,9 @@ msgid ""
"Additional config option: "
"docs/source/user_guide/configuration/additional_config.md "
"(`enable_cpu_binding`)"
-msgstr "附加配置选项:docs/source/user_guide/configuration/additional_config.md (`enable_cpu_binding`)"
+msgstr ""
+"附加配置选项:docs/source/user_guide/configuration/additional_config.md "
+"(`enable_cpu_binding`)"

#: ../../source/developer_guide/Design_Documents/cpu_binding.md:228
msgid "Tests: tests/ut/device_allocator/test_cpu_binding.py"