[v0.18.0][Doc] Translated Doc files 2026-04-22 (#8565)
## Auto-Translation Summary

Translated **43** file(s):

- <code>docs/source/locale/zh_CN/LC_MESSAGES/community/versioning_policy.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/KV_Cache_Pool_Guide.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/cpu_binding.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/disaggregated_prefill.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/eplb_swift_balancer.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/npugraph_ex.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/patch.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/quantization.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/contribution/index.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/contribution/multi_node_test.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_ais_bench.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_evalscope.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_lm_eval.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_opencompass.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/faqs.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/installation.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/long_sequence_context_parallel_multi_node.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_colocated_mooncake_multi_instance.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_multi_node.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_single_node.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/DeepSeek-V3.1.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/GLM4.x.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/GLM5.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/PaddleOCR-VL.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen-VL-Dense.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-235B-A22B.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3.5-397B-A17B.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3_embedding.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/configuration/additional_config.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/Fine_grained_TP.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/batch_invariance.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/context_parallel.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/cpu_binding.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/dynamic_batch.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/eplb_swift_balancer.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/kv_pool.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/layer_sharding.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/netloader.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/npugraph_ex.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/sleep_mode.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/ucm_deployment.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/weight_prefetch.po</code>

---

[Workflow run](https://github.com/vllm-project/vllm-ascend/actions/runs/24767290887)

Signed-off-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
Co-authored-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: vllm-ascend \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-04-15 09:41+0000\n"
+"POT-Creation-Date: 2026-04-22 08:13+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -37,17 +37,17 @@ msgstr "前缀缓存是大语言模型推理中的一项重要特性,可以显
 msgid ""
 "However, the performance gain from prefix caching is highly dependent on "
 "the cache hit rate, while the cache hit rate can be limited if one only "
-"uses HBM for KV cache storage."
-msgstr "然而,前缀缓存带来的性能提升高度依赖于缓存命中率,而如果仅使用 HBM 存储 KV 缓存,缓存命中率会受到限制。"
+"uses on-chip memory for KV cache storage."
+msgstr "然而,前缀缓存带来的性能提升高度依赖于缓存命中率,而如果仅使用片上内存存储 KV 缓存,缓存命中率会受到限制。"

 #: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:9
 msgid ""
 "Hence, KV Cache Pool is proposed to utilize various types of storage "
-"including HBM, DRAM, and SSD, making a pool for KV Cache storage while "
-"making the prefix of requests visible across all nodes, increasing the "
-"cache hit rate for all requests."
+"including on-chip memory, DRAM, and SSD, making a pool for KV Cache "
+"storage while making the prefix of requests visible across all nodes, "
+"increasing the cache hit rate for all requests."
 msgstr ""
-"因此,我们提出了 KV 缓存池,旨在利用包括 HBM、DRAM 和 SSD 在内的多种存储类型,构建一个 KV "
+"因此,我们提出了 KV 缓存池,旨在利用包括片上内存、DRAM 和 SSD 在内的多种存储类型,构建一个 KV "
 "缓存存储池,同时使请求的前缀在所有节点间可见,从而提高所有请求的缓存命中率。"

 #: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:11
@@ -111,9 +111,9 @@ msgstr "工作原理"

 #: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:25
 msgid ""
-"The KV Cache Pool integrates multiple memory tiers (HBM, DRAM, SSD, etc.)"
-" through a connector-based architecture."
-msgstr "KV 缓存池通过基于连接器的架构,整合了多个内存层级(HBM、DRAM、SSD 等)。"
+"The KV Cache Pool integrates multiple memory tiers (on-chip memory, DRAM,"
+" SSD, etc.) through a connector-based architecture."
+msgstr "KV 缓存池通过基于连接器的架构,整合了多个内存层级(片上内存、DRAM、SSD 等)。"

 #: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:27
 msgid ""
@@ -124,25 +124,25 @@ msgstr "每个连接器实现了一个统一的接口,用于根据访问频率

 #: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:29
 msgid ""
-"When combined with vLLM’s Prefix Caching mechanism, the pool enables "
-"efficient caching both locally (in HBM) and globally (via Mooncake), "
-"ensuring that frequently used prefixes remain hot while less frequently "
-"accessed KV data can spill over to lower-cost memory."
+"When combined with vLLM's Prefix Caching mechanism, the pool enables "
+"efficient caching both locally (in on-chip memory) and globally (via "
+"Mooncake), ensuring that frequently used prefixes remain hot while less "
+"frequently accessed KV data can spill over to lower-cost memory."
 msgstr ""
-"当与 vLLM 的前缀缓存机制结合时,该池能够实现本地(HBM 中)和全局(通过 "
+"当与 vLLM 的前缀缓存机制结合时,该池能够实现本地(片上内存中)和全局(通过 "
 "Mooncake)的高效缓存,确保常用前缀保持热状态,而访问频率较低的 KV 数据则可以溢出到成本更低的内存中。"

 #: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:31
-msgid "1. Combining KV Cache Pool with HBM Prefix Caching"
-msgstr "1. 将 KV 缓存池与 HBM 前缀缓存结合"
+msgid "1. Combining KV Cache Pool with on-chip memory Prefix Caching"
+msgstr "1. 将 KV 缓存池与片上内存前缀缓存结合"

 #: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:33
 msgid ""
-"Prefix Caching with HBM is already supported by the vLLM V1 Engine. By "
-"introducing KV Connector V1, users can seamlessly combine HBM-based "
-"Prefix Caching with Mooncake-backed KV Pool."
+"Prefix Caching with on-chip memory is already supported by the vLLM V1 "
+"Engine. By introducing KV Connector V1, users can seamlessly combine on-"
+"chip memory-based Prefix Caching with Mooncake-backed KV Pool."
 msgstr ""
-"vLLM V1 引擎已支持基于 HBM 的前缀缓存。通过引入 KV Connector V1,用户可以无缝地将基于 HBM 的前缀缓存与 "
+"vLLM V1 引擎已支持基于片上内存的前缀缓存。通过引入 KV Connector V1,用户可以无缝地将基于片上内存的前缀缓存与 "
 "Mooncake 支持的 KV 池结合起来。"

 #: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:36
@@ -160,24 +160,25 @@ msgid "**Workflow**:"
 msgstr "**工作流程**:"

 #: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:40
-msgid "The engine first checks for prefix hits in the HBM cache."
-msgstr "引擎首先检查 HBM 缓存中的前缀命中情况。"
+msgid "The engine first checks for prefix hits in the on-chip memory cache."
+msgstr "引擎首先检查片上内存缓存中的前缀命中情况。"

 #: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:42
 msgid ""
-"After getting the number of hit tokens on HBM, it queries the KV Pool via"
-" the connector. If there are additional hits in the KV Pool, we get the "
-"**additional blocks only** from the KV Pool, and get the rest of the "
-"blocks directly from HBM to minimize the data transfer latency."
+"After getting the number of hit tokens on on-chip memory, it queries the "
+"KV Pool via the connector. If there are additional hits in the KV Pool, "
+"we get the **additional blocks only** from the KV Pool, and get the rest "
+"of the blocks directly from on-chip memory to minimize the data transfer "
+"latency."
 msgstr ""
-"获取 HBM 上的命中令牌数量后,引擎通过连接器查询 KV 池。如果在 KV 池中有额外的命中,我们**仅从 KV "
-"池获取额外的块**,其余块则直接从 HBM 获取,以最小化数据传输延迟。"
+"获取片上内存上的命中令牌数量后,引擎通过连接器查询 KV 池。如果在 KV 池中有额外的命中,我们**仅从 KV "
+"池获取额外的块**,其余块则直接从片上内存获取,以最小化数据传输延迟。"

 #: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:44
 msgid ""
-"After the KV Caches in the KV Pool are loaded into HBM, the remaining "
-"process is the same as Prefix Caching in HBM."
-msgstr "将 KV 池中的 KV 缓存加载到 HBM 后,剩余过程与 HBM 中的前缀缓存相同。"
+"After the KV Caches in the KV Pool are loaded into on-chip memory, the "
+"remaining process is the same as Prefix Caching in on-chip memory."
+msgstr "将 KV 池中的 KV 缓存加载到片上内存后,剩余过程与片上内存中的前缀缓存相同。"

 #: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:46
 msgid "2. Combining KV Cache Pool with Mooncake PD Disaggregation"
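The workflow in this hunk (check the on-chip memory cache first, then fetch only the additional blocks from the KV Pool) can be sketched as follows. This is a minimal illustration under stated assumptions: hit counts are expressed in blocks rather than tokens, both hits are contiguous prefixes starting at block 0, and `LoadPlan` and `plan_prefix_load` are hypothetical names, not part of vLLM or vLLM Ascend.

```python
from dataclasses import dataclass


@dataclass
class LoadPlan:
    local_blocks: int    # leading blocks reused directly from on-chip memory
    pool_blocks: range   # block indices fetched from the KV Pool
    compute_blocks: int  # trailing blocks that still need prefill compute


def plan_prefix_load(num_prompt_blocks: int, local_hit_blocks: int,
                     pool_hit_blocks: int) -> LoadPlan:
    """Decide where each prompt block comes from (hypothetical helper).

    Both hit counts are lengths of contiguous prefixes starting at block 0.
    """
    local = min(local_hit_blocks, num_prompt_blocks)
    pool_prefix = min(pool_hit_blocks, num_prompt_blocks)
    # Fetch only the blocks the pool holds beyond the local on-chip hit.
    extra = range(local, max(local, pool_prefix))
    reused = local + len(extra)
    return LoadPlan(local, extra, num_prompt_blocks - reused)


plan = plan_prefix_load(num_prompt_blocks=10, local_hit_blocks=3, pool_hit_blocks=7)
print(plan)
```

With 10 prompt blocks, 3 local hits and a 7-block pool hit, only blocks 3 to 6 cross the connector, and the last 3 blocks are prefilled as usual.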
@@ -202,12 +203,12 @@ msgstr ""
 #: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:52
 msgid ""
 "The key benefit of doing this is that we can keep the gain in performance"
-" by computing less with Prefix Caching from HBM and KV Pool for Prefill "
-"Nodes, while not sacrificing the data transfer efficiency between Prefill"
-" and Decode nodes with P2P KV Connector that transfers KV Caches between "
-"NPU devices directly."
+" by computing less with Prefix Caching from on-chip memory and KV Pool "
+"for Prefill Nodes, while not sacrificing the data transfer efficiency "
+"between Prefill and Decode nodes with P2P KV Connector that transfers KV "
+"Caches between NPU devices directly."
 msgstr ""
-"这样做的主要好处是,我们可以通过为预填充节点使用来自 HBM 和 KV "
+"这样做的主要好处是,我们可以通过为预填充节点使用来自片上内存和 KV "
 "池的前缀缓存来减少计算量,从而保持性能增益,同时又不牺牲预填充节点与解码节点之间的数据传输效率,因为 P2P KV 连接器直接在 NPU "
 "设备间传输 KV 缓存。"

@@ -332,7 +333,8 @@ msgstr "限制"
 msgid ""
 "Currently, MooncakeStore for vLLM-Ascend only supports DRAM as the "
 "storage for KV Cache pool."
-msgstr "目前,vLLM-Ascend 的 MooncakeStore 仅支持 DRAM 作为 KV 缓存池的存储。"
+msgstr ""
+"目前,vLLM-Ascend 的 MooncakeStore 仅支持 DRAM 作为 KV 缓存池的存储介质。"

 #: ../../source/developer_guide/Design_Documents/KV_Cache_Pool_Guide.md:91
 msgid ""
@@ -344,5 +346,5 @@ msgid ""
 "there's no prefix cache hit (or even better, revert only one block and "
 "keep using the Prefix Caches before that)."
 msgstr ""
-"目前,如果我们成功查找到一个键并发现其存在,但在调用 KV 池的 get 函数时获取失败,我们仅输出一条日志表明 get "
-"操作失败并继续执行;因此,该特定请求的准确性可能会受到影响。我们将通过回退请求并假设没有前缀缓存命中来重新计算所有内容(或者更优的方案是,仅回退一个块并继续使用该块之前的前缀缓存)来处理这种情况。"
+"目前,如果我们成功查找到一个键并确认其存在,但在调用 KV 池的 get 函数时获取失败,我们仅输出一条日志表明 get "
+"操作失败并继续执行;因此,该特定请求的准确性可能会受到影响。我们将通过回退该请求并假设没有前缀缓存命中来重新计算所有内容(或者更优的方案是,仅回退一个块并继续使用该块之前的前缀缓存)来处理这种情况。"
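The fallback described in the limitation above (recompute everything, or revert only from the failed block and keep the Prefix Caches before it) could look like the following sketch. It assumes the pool reports a per-block success flag and that a reusable prefix must stay contiguous, so the failed block and every block after it are recomputed; `usable_prefix_blocks` is a hypothetical helper, not an existing vLLM Ascend function.

```python
from typing import Sequence


def usable_prefix_blocks(fetch_ok: Sequence[bool], revert_one_block: bool = True) -> int:
    """Number of leading KV Pool blocks that can still be trusted (hypothetical).

    fetch_ok[i] is True when the get for the i-th pool block succeeded.
    """
    if all(fetch_ok):
        return len(fetch_ok)
    if not revert_one_block:
        # Conservative option: assume no prefix cache hit and recompute everything.
        return 0
    # Finer-grained option: keep every block before the first failure; the
    # failed block and all later ones are recomputed because a prefix must be
    # contiguous.
    return list(fetch_ok).index(False)


print(usable_prefix_blocks([True, True, False, True]))
```

The finer-grained option only pays for the blocks from the failure onward, which is why the document calls it the better variant.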
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: vllm-ascend \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-04-15 09:41+0000\n"
+"POT-Creation-Date: 2026-04-22 08:13+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -193,8 +193,10 @@ msgid "**Build CPU pools**:"
 msgstr "**构建 CPU 池**:"

 #: ../../source/developer_guide/Design_Documents/cpu_binding.md:39
-msgid "Use **global_slice** for A3 devices; **topo_affinity** for A2 and 310P."
-msgstr "对 A3 设备使用 **global_slice**;对 A2 和 310P 使用 **topo_affinity**。"
+msgid ""
+"Use **global_slice** for A3 devices; **topo_affinity** for A2 and Atlas "
+"300 inference products."
+msgstr "对 A3 设备使用 **global_slice**;对 A2 和 Atlas 300 推理产品使用 **topo_affinity**。"

 #: ../../source/developer_guide/Design_Documents/cpu_binding.md:40
 msgid "If topo affinity is missing, fall back to global_slice."
@@ -597,6 +599,7 @@ msgid "Example 5: A2/310P topo_affinity with NUMA extension"
 msgstr "示例 5: 具有NUMA扩展的 A2/310P topo_affinity"

 #: ../../source/developer_guide/Design_Documents/cpu_binding.md:163
+#, python-brace-format
 msgid "npu_affinity = {0: [0..7], 1: [0..7]} (from `npu-smi info -t topo`)"
 msgstr "npu_affinity = {0: [0..7], 1: [0..7]} (来自 `npu-smi info -t topo`)"

@@ -748,7 +751,9 @@ msgid ""
 "global slicing yields 16 CPUs per NPU (0–15, 16–31, 32–47, 48–63), so "
 "each NPU’s pool stays within a single NUMA node."
 msgstr ""
-"示例(对称布局):2个NUMA节点,共64个CPU。NUMA0 = CPU 0–31,NUMA1 = CPU 32–63,cpuset为0–63。对于4个逻辑NPU,全局切片为每个NPU分配16个CPU (0–15, 16–31, 32–47, 48–63),因此每个NPU的CPU池都保持在单个NUMA节点内。"
+"示例(对称布局):2个NUMA节点,共64个CPU。NUMA0 = CPU 0–31,NUMA1 = CPU "
+"32–63,cpuset为0–63。对于4个逻辑NPU,全局切片为每个NPU分配16个CPU (0–15, 16–31, 32–47, "
+"48–63),因此每个NPU的CPU池都保持在单个NUMA节点内。"

 #: ../../source/developer_guide/Design_Documents/cpu_binding.md:212
 msgid "**Runtime dependencies**:"
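The pool-building rules in this file (global slicing for A3, topology affinity for A2 and Atlas 300 inference products, and a fallback to global slicing when topology affinity is missing) can be sketched as below. The device-type strings and function names are hypothetical, and the affinity branch is deliberately simplified: it only intersects each NPU's affine CPUs with the cpuset and does not implement the NUMA extension or de-duplicate CPUs shared by several NPUs.

```python
from typing import Dict, List, Optional


def global_slice(cpuset: List[int], num_npus: int) -> Dict[int, List[int]]:
    """Give each logical NPU an equal, contiguous slice of the allowed CPUs."""
    if num_npus <= 0:
        raise ValueError("num_npus must be positive")
    cpus = sorted(cpuset)
    per_npu = len(cpus) // num_npus
    if per_npu == 0:
        raise ValueError("fewer allowed CPUs than NPUs")
    # Remainder CPUs (len(cpus) % num_npus) are left unassigned in this sketch.
    return {npu: cpus[npu * per_npu:(npu + 1) * per_npu] for npu in range(num_npus)}


def build_cpu_pools(device_type: str, cpuset: List[int], num_npus: int,
                    npu_affinity: Optional[Dict[int, List[int]]] = None) -> Dict[int, List[int]]:
    """Pick the pool-building strategy (device_type values are hypothetical)."""
    if device_type == "A3" or not npu_affinity:
        # A3, or topo affinity missing: fall back to global slicing.
        return global_slice(cpuset, num_npus)
    # A2 / Atlas 300 inference products: keep only affine CPUs inside the cpuset.
    allowed = set(cpuset)
    return {npu: [c for c in cpus if c in allowed] for npu, cpus in npu_affinity.items()}


pools = build_cpu_pools("A3", list(range(64)), 4)
for npu, cpus in pools.items():
    print(npu, cpus[0], cpus[-1])
```

With the symmetric layout from the example (64 CPUs, 4 logical NPUs), the slices are 0–15, 16–31, 32–47 and 48–63, so each pool stays inside one NUMA node.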
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: vllm-ascend \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-04-14 09:08+0000\n"
+"POT-Creation-Date: 2026-04-22 08:13+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -32,9 +32,7 @@ msgid ""
 "This feature addresses the need to optimize the **Time Per Output Token "
 "(TPOT)** and **Time To First Token (TTFT)** in large-scale inference "
 "tasks. The motivation is two-fold:"
-msgstr ""
-"此功能旨在优化大规模推理任务中的**单输出令牌时间 (TPOT)** 和**首令牌时间 "
-"(TTFT)**。其动机主要有两方面:"
+msgstr "此功能旨在优化大规模推理任务中的**单输出令牌时间 (TPOT)** 和**首令牌时间 (TTFT)**。其动机主要有两方面:"

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:7
 msgid ""
@@ -46,19 +44,22 @@ msgid ""
 "to better system performance tuning, particularly for **TTFT** and "
 "**TPOT**."
 msgstr ""
-"**调整 P 节点和 D 节点的并行策略与实例数量** 采用解耦式预填充策略,此功能允许系统灵活调整 P(预填充器)节点和 D(解码器)节点的并行化策略(例如数据并行 (dp)、张量并行 (tp) 和专家并行 (ep))以及实例数量。这有助于实现更好的系统性能调优,特别是针对 **TTFT** 和 **TPOT**。"
+"**调整 P 节点和 D 节点的并行策略与实例数量** 采用解耦式预填充策略,此功能允许系统灵活调整 P(预填充器)节点和 "
+"D(解码器)节点的并行化策略(例如数据并行 (dp)、张量并行 (tp) 和专家并行 "
+"(ep))以及实例数量。这有助于实现更好的系统性能调优,特别是针对 **TTFT** 和 **TPOT**。"

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:10
 msgid ""
 "**Optimizing TPOT** Without the disaggregated-prefill strategy, prefill "
 "tasks are inserted during decoding, which results in inefficiencies and "
 "delays. Disaggregated-prefill solves this by allowing for better control "
-"over the system’s **TPOT**. By managing chunked prefill tasks "
+"over the system's **TPOT**. By managing chunked prefill tasks "
 "effectively, the system avoids the challenge of determining the optimal "
 "chunk size and provides more reliable control over the time taken for "
 "generating output tokens."
 msgstr ""
-"**优化 TPOT** 在没有解耦式预填充策略的情况下,预填充任务会在解码过程中插入,导致效率低下和延迟。解耦式预填充通过允许更好地控制系统 **TPOT** 来解决此问题。通过有效管理分块的预填充任务,系统避免了确定最佳分块大小的挑战,并对生成输出令牌所需时间提供了更可靠的控制。"
+"**优化 TPOT** 在没有解耦式预填充策略的情况下,预填充任务会在解码过程中插入,导致效率低下和延迟。解耦式预填充通过允许更好地控制系统 "
+"**TPOT** 来解决此问题。通过有效管理分块的预填充任务,系统避免了确定最佳分块大小的挑战,并对生成输出令牌所需时间提供了更可靠的控制。"

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:15
 msgid "Usage"
@@ -101,10 +102,11 @@ msgstr "1. 设计思路"
 msgid ""
 "Under the disaggregated-prefill, a global proxy receives external "
 "requests, forwarding prefill to P nodes and decode to D nodes; the KV "
-"cache (key–value cache) is exchanged between P and D nodes via peer-to-"
+"cache (key-value cache) is exchanged between P and D nodes via peer-to-"
 "peer (P2P) communication."
 msgstr ""
-"在解耦式预填充架构下,一个全局代理接收外部请求,将预填充请求转发给 P 节点,将解码请求转发给 D 节点;KV 缓存(键值缓存)通过点对点 (P2P) 通信在 P 节点和 D 节点之间交换。"
+"在解耦式预填充架构下,一个全局代理接收外部请求,将预填充请求转发给 P 节点,将解码请求转发给 D 节点;KV 缓存(键值缓存)通过点对点 "
+"(P2P) 通信在 P 节点和 D 节点之间交换。"

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:33
 msgid "2. Implementation Design"
@@ -116,7 +118,9 @@ msgid ""
 " respectively.  "
 ""
 msgstr ""
-"我们的设计图如下所示,分别展示了拉取和推送方案。 "
+"我们的设计图如下所示,分别展示了拉取和推送方案。 "

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:35
 msgid "alt text"
@@ -128,7 +132,7 @@ msgstr "Mooncake 连接器"

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:41
 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:49
-msgid "The request is sent to the Proxy’s `_handle_completions` endpoint."
+msgid "The request is sent to the Proxy's `_handle_completions` endpoint."
 msgstr "请求被发送到代理的 `_handle_completions` 端点。"

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:42
@@ -137,16 +141,18 @@ msgid ""
 "request, configuring `kv_transfer_params` with `do_remote_decode=True`, "
 "`max_completion_tokens=1`, and `min_tokens=1`."
 msgstr ""
-"代理调用 `select_prefiller` 选择一个 P 节点并转发请求,配置 `kv_transfer_params` 为 `do_remote_decode=True`、`max_completion_tokens=1` 和 `min_tokens=1`。"
+"代理调用 `select_prefiller` 选择一个 P 节点并转发请求,配置 `kv_transfer_params` 为 "
+"`do_remote_decode=True`、`max_completion_tokens=1` 和 `min_tokens=1`。"

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:43
 msgid ""
-"After the P node’s scheduler finishes prefill, `update_from_output` "
-"invokes the schedule connector’s `request_finished` to defer KV cache "
+"After the P node's scheduler finishes prefill, `update_from_output` "
+"invokes the schedule connector's `request_finished` to defer KV cache "
 "release, constructs `kv_transfer_params` with `do_remote_prefill=True`, "
 "and returns to the Proxy."
 msgstr ""
-"P 节点的调度器完成预填充后,`update_from_output` 调用调度连接器的 `request_finished` 以延迟释放 KV 缓存,构建 `kv_transfer_params` 为 `do_remote_prefill=True`,并返回给代理。"
+"P 节点的调度器完成预填充后,`update_from_output` 调用调度连接器的 `request_finished` 以延迟释放 KV "
+"缓存,构建 `kv_transfer_params` 为 `do_remote_prefill=True`,并返回给代理。"

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:44
 msgid ""
@@ -162,7 +168,8 @@ msgid ""
 "P node to release KV cache and proceeds with decoding to return the "
 "result."
 msgstr ""
-"在 D 节点上,调度器将请求标记为 `RequestStatus.WAITING_FOR_REMOTE_KVS`,预分配 KV 缓存,调用 `kv_connector_no_forward` 拉取远程 KV 缓存,然后通知 P 节点释放 KV 缓存并继续解码以返回结果。"
+"在 D 节点上,调度器将请求标记为 `RequestStatus.WAITING_FOR_REMOTE_KVS`,预分配 KV 缓存,调用 "
+"`kv_connector_no_forward` 拉取远程 KV 缓存,然后通知 P 节点释放 KV 缓存并继续解码以返回结果。"

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:47
 msgid "Mooncake Layerwise Connector"
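The Proxy side of the Mooncake Connector flow above (prefill on a P node with `do_remote_decode=True`, then decode on a D node using the `kv_transfer_params` the P node returns) can be sketched as two request builders. This assumes an OpenAI-style JSON body in which `max_completion_tokens` and `min_tokens` sit at the top level next to `kv_transfer_params`; `make_prefill_request` and `make_decode_request` are hypothetical names, and the real proxy may shape the payload differently.

```python
import copy
from typing import Any, Dict


def make_prefill_request(client_request: Dict[str, Any]) -> Dict[str, Any]:
    """Body the Proxy forwards to the P node chosen by select_prefiller."""
    req = copy.deepcopy(client_request)  # keep the client's payload untouched
    req["kv_transfer_params"] = {"do_remote_decode": True}
    # The P node only prefills; one token is enough before handing off to D.
    req["max_completion_tokens"] = 1
    req["min_tokens"] = 1
    return req


def make_decode_request(client_request: Dict[str, Any],
                        prefill_response: Dict[str, Any]) -> Dict[str, Any]:
    """Body forwarded to the D node, reusing the P node's kv_transfer_params."""
    req = copy.deepcopy(client_request)
    # Expected to carry do_remote_prefill=True plus whatever the P node added.
    req["kv_transfer_params"] = prefill_response.get("kv_transfer_params", {})
    return req


client = {"prompt": "hello", "max_completion_tokens": 64}
print(make_prefill_request(client))
```

Copying the client request keeps the original `max_completion_tokens` intact for the decode leg, which is where the real output length matters.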
@@ -174,7 +181,8 @@ msgid ""
 "request, configuring `kv_transfer_params` with `do_remote_prefill=True` "
 "and setting the `metaserver` endpoint."
 msgstr ""
-"代理调用 `select_decoder` 选择一个 D 节点并转发请求,配置 `kv_transfer_params` 为 `do_remote_prefill=True` 并设置 `metaserver` 端点。"
+"代理调用 `select_decoder` 选择一个 D 节点并转发请求,配置 `kv_transfer_params` 为 "
+"`do_remote_prefill=True` 并设置 `metaserver` 端点。"

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:51
 msgid ""
@@ -183,20 +191,24 @@ msgid ""
 "cache, then calls `kv_connector_no_forward` to send a request to the "
 "metaserver and waits for the KV cache transfer to complete."
 msgstr ""
-"在 D 节点上,调度器使用 `kv_transfer_params` 将请求标记为 `RequestStatus.WAITING_FOR_REMOTE_KVS`,预分配 KV 缓存,然后调用 `kv_connector_no_forward` 向元服务器发送请求并等待 KV 缓存传输完成。"
+"在 D 节点上,调度器使用 `kv_transfer_params` 将请求标记为 "
+"`RequestStatus.WAITING_FOR_REMOTE_KVS`,预分配 KV 缓存,然后调用 "
+"`kv_connector_no_forward` 向元服务器发送请求并等待 KV 缓存传输完成。"

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:52
 msgid ""
-"The Proxy’s `metaserver` endpoint receives the request, calls "
+"The Proxy's `metaserver` endpoint receives the request, calls "
 "`select_prefiller` to choose a P node, and forwards it with "
 "`kv_transfer_params` set to `do_remote_decode=True`, "
 "`max_completion_tokens=1`, and `min_tokens=1`."
 msgstr ""
-"代理的 `metaserver` 端点接收请求,调用 `select_prefiller` 选择一个 P 节点,并转发请求,设置 `kv_transfer_params` 为 `do_remote_decode=True`、`max_completion_tokens=1` 和 `min_tokens=1`。"
+"代理的 `metaserver` 端点接收请求,调用 `select_prefiller` 选择一个 P 节点,并转发请求,设置 "
+"`kv_transfer_params` 为 `do_remote_decode=True`、`max_completion_tokens=1` "
+"和 `min_tokens=1`。"

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:53
 msgid ""
-"During processing, the P node’s scheduler pushes KV cache layer-wise; "
+"During processing, the P node's scheduler pushes KV cache layer-wise; "
 "once all layers pushing is complete, it releases the request and notifies"
 " the D node to begin decoding."
 msgstr "在处理过程中,P 节点的调度器逐层推送 KV 缓存;所有层推送完成后,它释放请求并通知 D 节点开始解码。"
@@ -240,10 +252,11 @@ msgstr "4. 规格设计"
 msgid ""
 "This feature is flexible and supports various configurations, including "
 "setups with MLA and GQA models. It is compatible with A2 and A3 hardware "
-"configurations and facilitates scenarios involving both equal and unequal"
-" TP setups across multiple P and D nodes."
+"configurations and facilitates scenarios involving equal TP setups and "
+"certain unequal TP setups across multiple P and D nodes."
 msgstr ""
-"此功能灵活,支持多种配置,包括使用 MLA 和 GQA 模型的设置。它与 A2 和 A3 硬件配置兼容,并支持跨多个 P 节点和 D 节点的相等和不相等 TP 设置场景。"
+"此功能灵活,支持多种配置,包括使用 MLA 和 GQA 模型的设置。它与 A2 和 A3 硬件配置兼容,并支持跨多个 P 节点和 D "
+"节点的相等和不相等 TP 设置场景。"

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md
 msgid "Feature"
@@ -317,7 +330,8 @@ msgid ""
 "supported and whether kv_connector_module_path exists and is loadable. On"
 " transfer failures, emit clear error logs for diagnostics."
 msgstr ""
-"通过检查 kv_connector 类型是否受支持以及 kv_connector_module_path 是否存在且可加载来验证 KV 传输配置。传输失败时,发出清晰的错误日志以供诊断。"
+"通过检查 kv_connector 类型是否受支持以及 kv_connector_module_path 是否存在且可加载来验证 KV "
+"传输配置。传输失败时,发出清晰的错误日志以供诊断。"

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:91
 msgid "2. Port Conflict Detection"
@@ -328,7 +342,9 @@ msgid ""
 "Before startup, perform a port-usage check on configured ports (e.g., "
 "rpc_port, metrics_port, http_port/metaserver) by attempting to bind. If a"
 " port is already in use, fail fast and log an error."
-msgstr "启动前,通过尝试绑定来对配置的端口(例如 rpc_port、metrics_port、http_port/metaserver)进行端口使用情况检查。如果端口已被占用,快速失败并记录错误。"
+msgstr ""
+"启动前,通过尝试绑定来对配置的端口(例如 "
+"rpc_port、metrics_port、http_port/metaserver)进行端口使用情况检查。如果端口已被占用,快速失败并记录错误。"

 #: ../../source/developer_guide/Design_Documents/disaggregated_prefill.md:95
 msgid "3. PD Ratio Validation"
@@ -357,4 +373,6 @@ msgid ""
 "higher TP degree than the D nodes and the P TP count is an integer "
 "multiple of the D TP count are supported (i.e., P_tp > D_tp and P_tp % "
 "D_tp = 0)."
-msgstr "在非对称 TP 配置中,仅支持 P 节点的 TP 度数高于 D 节点且 P 节点的 TP 数量是 D 节点 TP 数量的整数倍的情况(即 P_tp > D_tp 且 P_tp % D_tp = 0)。"
+msgstr ""
+"在非对称 TP 配置中,仅支持 P 节点的 TP 度数高于 D 节点且 P 节点的 TP 数量是 D 节点 TP 数量的整数倍的情况(即 P_tp"
+" > D_tp 且 P_tp % D_tp = 0)。"
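The port conflict detection and PD ratio validation described in this file can be sketched with the standard library alone. `ports_in_use`, `tp_ratio_supported` and `validate_pd_startup` are hypothetical names; the port check follows the description (attempt to bind, fail fast if a port is taken), and the TP rule accepts equal TP, or P_tp > D_tp with P_tp % D_tp == 0.

```python
import socket
from typing import Iterable, List


def ports_in_use(ports: Iterable[int], host: str = "0.0.0.0") -> List[int]:
    """Return the ports that cannot be bound, i.e. are already in use."""
    busy = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                busy.append(port)
    return busy


def tp_ratio_supported(p_tp: int, d_tp: int) -> bool:
    """Equal TP, or P_tp > D_tp with P_tp an integer multiple of D_tp."""
    if p_tp <= 0 or d_tp <= 0:
        return False
    return p_tp == d_tp or (p_tp > d_tp and p_tp % d_tp == 0)


def validate_pd_startup(ports: Iterable[int], p_tp: int, d_tp: int) -> None:
    """Fail fast before startup instead of failing mid-transfer."""
    busy = ports_in_use(ports)
    if busy:
        raise RuntimeError(f"port(s) already in use: {busy}")
    if not tp_ratio_supported(p_tp, d_tp):
        raise ValueError(f"unsupported TP ratio: P_tp={p_tp}, D_tp={d_tp}")
```

Binding and immediately releasing a port is only a point-in-time check; another process could still claim the port before the server starts, so the real bind at startup must handle errors too.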
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: vllm-ascend \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-04-14 09:08+0000\n"
+"POT-Creation-Date: 2026-04-22 08:13+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -39,7 +39,10 @@ msgid ""
 "to place experts of the same group on the same node to reduce inter-node "
 "data traffic, whenever possible."
 msgstr ""
-"在使用专家并行 (EP) 时,不同的专家被分配到不同的 NPU 上。鉴于不同专家的负载可能因当前工作负载而异,保持不同 NPU 之间的负载均衡至关重要。我们采用冗余专家策略,通过复制高负载的专家来实现。然后,我们启发式地将这些复制的专家打包到 NPU 上,以确保它们之间的负载均衡。此外,得益于 MoE 模型中使用的组限制专家路由,我们也尽可能将同一组的专家放置在同一节点上,以减少节点间的数据流量。"
+"在使用专家并行 (EP) 时,不同的专家被分配到不同的 NPU 上。鉴于不同专家的负载可能因当前工作负载而异,保持不同 NPU "
+"之间的负载均衡至关重要。我们采用冗余专家策略,通过复制高负载的专家来实现。然后,我们启发式地将这些复制的专家打包到 NPU "
+"上,以确保它们之间的负载均衡。此外,得益于 MoE "
+"模型中使用的组限制专家路由,我们也尽可能将同一组的专家放置在同一节点上,以减少节点间的数据流量。"

 #: ../../source/developer_guide/Design_Documents/eplb_swift_balancer.md:7
 msgid ""
@@ -50,7 +53,8 @@ msgid ""
 "predicting expert loads is outside the scope of this repository. A common"
 " method is to use a moving average of historical statistics."
 msgstr ""
-"为了方便复现和部署,vLLM Ascend 在 `vllm_ascend/eplb/core/policy` 中支持已部署的 EP 负载均衡算法。该算法根据估计的专家负载计算一个均衡的专家复制和放置计划。请注意,预测专家负载的具体方法不在本仓库的讨论范围内。一种常见的方法是使用历史统计数据的移动平均值。"
+"为了方便复现和部署,vLLM Ascend 在 `vllm_ascend/eplb/core/policy` 中支持已部署的 EP "
+"负载均衡算法。该算法根据估计的专家负载计算一个均衡的专家复制和放置计划。请注意,预测专家负载的具体方法不在本仓库的讨论范围内。一种常见的方法是使用历史统计数据的移动平均值。"

 #: ../../source/developer_guide/Design_Documents/eplb_swift_balancer.md:9
 msgid ""
@@ -214,7 +218,8 @@ msgid ""
 "hierarchical load balancing policy can be used in the prefilling stage "
 "with a smaller expert-parallel size."
 msgstr ""
-"当服务器节点数量能整除专家组数量时,我们使用分层负载均衡策略来利用组限制专家路由。我们首先将专家组均匀地打包到节点上,确保不同节点间的负载均衡。然后,我们在每个节点内复制专家。最后,我们将复制的专家打包到各个 NPU 上,以确保它们之间的负载均衡。分层负载均衡策略可以在预填充阶段使用,此时专家并行规模较小。"
+"当服务器节点数量能整除专家组数量时,我们使用分层负载均衡策略来利用组限制专家路由。我们首先将专家组均匀地打包到节点上,确保不同节点间的负载均衡。然后,我们在每个节点内复制专家。最后,我们将复制的专家打包到各个"
+" NPU 上,以确保它们之间的负载均衡。分层负载均衡策略可以在预填充阶段使用,此时专家并行规模较小。"

 #: ../../source/developer_guide/Design_Documents/eplb_swift_balancer.md:92
 msgid "Global Load Balancing"
@@ -227,7 +232,8 @@ msgid ""
 "experts onto individual NPUs. This policy can be adopted in the decoding "
 "stage with a larger expert-parallel size."
 msgstr ""
-"在其他情况下,我们使用全局负载均衡策略,该策略不考虑专家组,而是在全局范围内复制专家,并将复制的专家打包到各个 NPU 上。此策略可以在解码阶段采用,此时专家并行规模较大。"
+"在其他情况下,我们使用全局负载均衡策略,该策略不考虑专家组,而是在全局范围内复制专家,并将复制的专家打包到各个 NPU "
+"上。此策略可以在解码阶段采用,此时专家并行规模较大。"

 #: ../../source/developer_guide/Design_Documents/eplb_swift_balancer.md:96
 msgid "Add a New EPLB Policy"
@@ -246,8 +252,9 @@ msgid ""
 "parameters `current_expert_table`, `expert_workload` and return types "
 "`newplacement`. For example:"
 msgstr ""
-"继承 `policy_abstract.py` 中的 `EplbPolicy` 抽象类,并重写 `rebalance_experts` 接口,确保输入参数 "
-"`current_expert_table`、`expert_workload` 和返回类型 `newplacement` 保持一致。例如:"
+"继承 `policy_abstract.py` 中的 `EplbPolicy` 抽象类,并重写 `rebalance_experts` "
+"接口,确保输入参数 `current_expert_table`、`expert_workload` 和返回类型 `newplacement` "
+"保持一致。例如:"

 #: ../../source/developer_guide/Design_Documents/eplb_swift_balancer.md:126
 msgid ""
@@ -380,7 +387,8 @@ msgid ""
 "minimum values and be subject to valid value validation. For example, "
 "`expert_heat_collection_interval` must be greater than 0:"
 msgstr ""
-"所有整型输入参数必须明确指定其最大值和最小值,并接受有效值验证。例如,`expert_heat_collection_interval` 必须大于0:"
+"所有整型输入参数必须明确指定其最大值和最小值,并接受有效值验证。例如,`expert_heat_collection_interval` "
+"必须大于0:"

 #: ../../source/developer_guide/Design_Documents/eplb_swift_balancer.md:197
 msgid "File Path"
@@ -419,28 +427,27 @@ msgid ""
"function body, specifying the type of exception captured and the failure "
"handling (e.g., logging exceptions or returning a failure status)."
msgstr ""
"所有方法参数必须指定参数类型和默认值,并且函数必须包含针对默认参数的默认返回值处理。建议使用 `try-except` 块来处理函数体,指定捕获的异常类型和失败处理(例如,记录异常或返回失败状态)。"
"所有方法参数必须指定参数类型和默认值,并且函数必须包含针对默认参数的默认返回值处理。建议使用 `try-except` "
"块来处理函数体,指定捕获的异常类型和失败处理(例如,记录异常或返回失败状态)。"

#: ../../source/developer_guide/Design_Documents/eplb_swift_balancer.md:235
msgid "Consistency"
msgstr "一致性"

#: ../../source/developer_guide/Design_Documents/eplb_swift_balancer.md:237
msgid "Expert Map"
msgstr "专家映射"

#: ../../source/developer_guide/Design_Documents/eplb_swift_balancer.md:239
msgid ""
"The expert map must be globally unique during initialization and update. "
"In a multi-node scenario during initialization, distributed communication"
" should be used to verify the consistency of expert maps across each "
"rank. If they are inconsistent, the user should be notified which ranks "
"have inconsistent maps. During the update process, if only a few layers "
"or the expert table of a certain rank has been changed, the updated "
"expert table must be synchronized with the EPLB's context to ensure "
"global consistency."
"rank. If they are inconsistent, the user should be notified of which "
"ranks have inconsistent maps. During the update process, if only a few "
"layers or the expert table of a certain rank has been changed, the "
"updated expert table must be synchronized with the EPLB's context to "
"ensure global consistency."
msgstr ""
"专家映射在初始化和更新期间必须是全局唯一的。在初始化期间的多节点场景中,应使用分布式通信来验证每个 rank 上专家映射的一致性。如果不一致,应通知用户哪些 rank 的映射不一致。在更新过程中,如果只有少数层或某个 rank 的专家表被更改,则必须将更新后的专家表与 EPLB 的上下文同步,以确保全局一致性。"
"专家映射在初始化和更新期间必须是全局唯一的。在初始化期间的多节点场景中,应使用分布式通信来验证每个 rank "
"上专家映射的一致性。如果不一致,应通知用户哪些 rank 的映射不一致。在更新过程中,如果只有少数层或某个 rank "
"的专家表被更改,则必须将更新后的专家表与 EPLB 的上下文同步,以确保全局一致性。"
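The initialization-time check described in the Expert Map entry above can be sketched as follows. This is a minimal illustration, not EPLB's implementation: it assumes every rank's expert map has already been collected into one Python list (in a real multi-node job that would be a gather over distributed communication, for example `torch.distributed.all_gather_object`), and it treats the most common map as the reference. The map layout and the names `find_inconsistent_ranks` and `check_expert_maps` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical layout: MoE layer index -> global placement
# (one logical expert ID per physical expert slot).
ExpertMap = Dict[int, List[int]]


def find_inconsistent_ranks(gathered_maps: List[ExpertMap]) -> List[int]:
    """Return ranks whose expert map differs from the most common map.

    gathered_maps[r] is the map reported by rank r (a simulated gather).
    """
    if not gathered_maps:
        return []
    # Hashable form that ignores dict insertion order, so maps can be counted.
    canonical = [
        tuple(sorted((layer, tuple(ids)) for layer, ids in m.items()))
        for m in gathered_maps
    ]
    reference, _ = Counter(canonical).most_common(1)[0]
    return [rank for rank, c in enumerate(canonical) if c != reference]


def check_expert_maps(gathered_maps: List[ExpertMap]) -> None:
    """Fail loudly and name the offending ranks."""
    bad_ranks = find_inconsistent_ranks(gathered_maps)
    if bad_ranks:
        raise RuntimeError(f"Expert maps are inconsistent on ranks {bad_ranks}")
```

Returning the rank numbers instead of a bare failure matches the requirement that the user be told which ranks hold inconsistent maps.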
#: ../../source/developer_guide/Design_Documents/eplb_swift_balancer.md:242
msgid "Expert Weight"
@@ -464,4 +471,6 @@ msgid ""
"performance data collection), start the script and add `export "
"EXPERT_MAP_RECORD=\"true\"`."
msgstr ""
"在使用 EPLB 之前,启动脚本并添加 `export DYNAMIC_EPLB=\"true\"`。在执行负载数据收集(或性能数据收集)之前,启动脚本并添加 `export EXPERT_MAP_RECORD=\"true\"`。"
"在使用 EPLB 之前,启动脚本并添加 `export "
"DYNAMIC_EPLB=\"true\"`。在执行负载数据收集(或性能数据收集)之前,启动脚本并添加 `export "
"EXPERT_MAP_RECORD=\"true\"`。"
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -29,21 +29,21 @@ msgstr "工作原理"

#: ../../source/developer_guide/Design_Documents/npugraph_ex.md:5
msgid ""
"This is an optimization based on Fx graphs, which can be considered an "
"This is an optimization based on FX graphs, which can be considered an "
"acceleration solution for the aclgraph mode."
msgstr "这是一种基于 Fx 图的优化,可视为 aclgraph 模式的一种加速方案。"
msgstr "这是一种基于 FX 图的优化,可视为 aclgraph 模式的一种加速方案。"

#: ../../source/developer_guide/Design_Documents/npugraph_ex.md:7
msgid "You can get its code [code](https://gitcode.com/Ascend/torchair)"
msgstr "您可以在 [code](https://gitcode.com/Ascend/torchair) 获取其代码"

#: ../../source/developer_guide/Design_Documents/npugraph_ex.md:9
msgid "Default Fx Graph Optimization"
msgstr "默认 Fx 图优化"
msgid "Default FX Graph Optimization"
msgstr "默认 FX 图优化"

#: ../../source/developer_guide/Design_Documents/npugraph_ex.md:11
msgid "Fx Graph pass"
msgstr "Fx 图处理过程"
msgid "FX Graph pass"
msgstr "FX 图处理过程"

#: ../../source/developer_guide/Design_Documents/npugraph_ex.md:13
msgid ""
@@ -59,11 +59,13 @@ msgid ""
"operators with a form of non-in-place operators + copy operators. "
"npugraph_ex will reverse this process, restoring the in-place operators "
"and reducing memory movement."
msgstr "对于模型的原始输入参数,如果包含原位运算符,Dynamo 的 Functionalize 过程会将其替换为非原位运算符 + 复制运算符的形式。npugraph_ex 将逆转此过程,恢复原位运算符,减少内存移动。"
msgstr ""
"对于模型的原始输入参数,如果包含原位运算符,Dynamo 的 Functionalize 过程会将其替换为非原位运算符 + "
"复制运算符的形式。npugraph_ex 将逆转此过程,恢复原位运算符,减少内存移动。"
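To illustrate the reverse-functionalization idea in the entry above, here is a toy, pure-Python pass over a simplified op list. It is not npugraph_ex's implementation and does not operate on real FX graphs: each op is assumed to be an `(output, op_name, args)` tuple, and the names `add`, `add_`, and `copy_` only mimic PyTorch's out-of-place, in-place, and copy naming. The pass folds `tmp = op(dst, ...)` followed by `copy_(dst, tmp)` back into `op_(dst, ...)` when `tmp` is not read afterwards.

```python
from typing import List, Optional, Tuple

# Toy IR (an assumption for illustration): (output_name, op_name, argument_names)
Op = Tuple[Optional[str], str, Tuple[str, ...]]


def restore_inplace(ops: List[Op]) -> List[Op]:
    """Fold `tmp = op(dst, ...); copy_(dst, tmp)` back into `op_(dst, ...)`."""
    result: List[Op] = []
    i = 0
    while i < len(ops):
        out, name, args = ops[i]
        if i + 1 < len(ops):
            _, next_name, next_args = ops[i + 1]
            # The temporary must not be read by any later op, otherwise
            # writing in place would change what that consumer sees.
            tmp_used_later = any(out in a for _, _, a in ops[i + 2:])
            if (next_name == "copy_" and out is not None and args
                    and next_args == (args[0], out) and not tmp_used_later):
                result.append((None, name + "_", args))
                i += 2  # consume both the out-of-place op and the copy
                continue
        result.append(ops[i])
        i += 1
    return result
```

Only adjacent op/copy pairs are folded here; a real graph pass would also have to reason about aliasing and ordering across the whole graph.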
#: ../../source/developer_guide/Design_Documents/npugraph_ex.md:16
msgid "Fx fusion pass"
msgstr "Fx 融合处理过程"
msgid "FX fusion pass"
msgstr "FX 融合处理过程"

#: ../../source/developer_guide/Design_Documents/npugraph_ex.md:18
msgid ""
@@ -92,7 +94,9 @@ msgid ""
"Users can register a custom graph fusion pass in TorchAir to modify "
"PyTorch FX graphs. The registration relies on the register_replacement "
"API."
msgstr "用户可以在 TorchAir 中注册自定义的图融合处理过程,以修改 PyTorch FX 图。注册依赖于 register_replacement API。"
msgstr ""
"用户可以在 TorchAir 中注册自定义的图融合处理过程,以修改 PyTorch FX 图。注册依赖于 register_replacement"
" API。"

#: ../../source/developer_guide/Design_Documents/npugraph_ex.md:28
msgid "Below is the declaration of this API and a demo of its usage."
@@ -182,7 +186,9 @@ msgid ""
" on the matching result, such as checking whether the fused operators are"
" on the same stream, checking the device type, checking the input shapes,"
" and so on."
msgstr "算子融合后的额外验证函数。该函数的输入参数必须是来自 torch._inductor.pattern_matcher 的 Match 对象,用于对匹配结果进行进一步的自定义检查,例如检查融合后的算子是否在同一流上、检查设备类型、检查输入形状等。"
msgstr ""
"算子融合后的额外验证函数。该函数的输入参数必须是来自 torch._inductor.pattern_matcher 的 Match "
"对象,用于对匹配结果进行进一步的自定义检查,例如检查融合后的算子是否在同一流上、检查设备类型、检查输入形状等。"

#: ../../source/developer_guide/Design_Documents/npugraph_ex.md
msgid "search_fn_pattern"
@@ -195,7 +201,9 @@ msgid ""
"object. After passing this parameter, search_fn will no longer be used to"
" match operator combinations; instead, this parameter will be used "
"directly as the matching rule."
msgstr "通常无需提供自定义模式对象。其定义遵循原生 PyTorch MultiOutputPattern 对象的规则。传入此参数后,将不再使用 search_fn 来匹配算子组合,而是直接使用此参数作为匹配规则。"
msgstr ""
"通常无需提供自定义模式对象。其定义遵循原生 PyTorch MultiOutputPattern 对象的规则。传入此参数后,将不再使用 "
"search_fn 来匹配算子组合,而是直接使用此参数作为匹配规则。"

#: ../../source/developer_guide/Design_Documents/npugraph_ex.md:43
msgid "Usage Example"
@@ -206,7 +214,9 @@ msgid ""
"The default fusion pass in npugraph_ex is also implemented based on this "
"API. You can see more examples of using this API in the vllm-ascend and "
"npugraph_ex code repositories."
msgstr "npugraph_ex 中的默认融合处理过程也是基于此 API 实现的。您可以在 vllm-ascend 和 npugraph_ex 代码仓库中查看更多使用此 API 的示例。"
msgstr ""
"npugraph_ex 中的默认融合处理过程也是基于此 API 实现的。您可以在 vllm-ascend 和 npugraph_ex "
"代码仓库中查看更多使用此 API 的示例。"

#: ../../source/developer_guide/Design_Documents/npugraph_ex.md:99
msgid "DFX"
@@ -217,4 +227,6 @@ msgid ""
"By reusing the TORCH_COMPILE_DEBUG environment variable from the PyTorch "
"community, when TORCH_COMPILE_DEBUG=1 is set, it will output the FX "
"graphs throughout the entire process."
msgstr "通过复用 PyTorch 社区的 TORCH_COMPILE_DEBUG 环境变量,当设置 TORCH_COMPILE_DEBUG=1 时,将输出整个过程中的 FX 图。"
msgstr ""
"通过复用 PyTorch 社区的 TORCH_COMPILE_DEBUG 环境变量,当设置 TORCH_COMPILE_DEBUG=1 "
"时,将输出整个过程中的 FX 图。"
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -29,8 +29,8 @@ msgid ""
"cycle of vLLM and vLLM Ascend and their hardware limitations, we need to "
"patch some code in vLLM to make it compatible with vLLM Ascend."
msgstr ""
"vLLM Ascend 是 vLLM 的一个平台插件。由于 vLLM 和 vLLM Ascend "
"的发布周期不同且存在硬件限制,我们需要对 vLLM 中的部分代码打补丁,以使其兼容 vLLM Ascend。"
"vLLM Ascend 是 vLLM 的一个平台插件。由于 vLLM 和 vLLM Ascend 的发布周期不同且存在硬件限制,我们需要对 "
"vLLM 中的部分代码打补丁,以使其兼容 vLLM Ascend。"

#: ../../source/developer_guide/Design_Documents/patch.md:5
msgid ""
@@ -121,7 +121,8 @@ msgid ""
"initializing the worker process."
msgstr ""
"对于在线和离线模式,vLLM 引擎核心进程在初始化 worker 进程时,会在 "
"`vllm/vllm/worker/worker_base.py::WorkerWrapperBase.init_worker` 处调用 worker 补丁。"
"`vllm/vllm/worker/worker_base.py::WorkerWrapperBase.init_worker` 处调用 "
"worker 补丁。"

#: ../../source/developer_guide/Design_Documents/patch.md:35
msgid "How to write a patch"
@@ -150,6 +151,7 @@ msgid ""
msgstr "确定我们需要修补哪个进程。例如,这里的 `distributed` 属于 vLLM 主进程,因此我们应该修补 `platform`。"

#: ../../source/developer_guide/Design_Documents/patch.md:41
#, python-brace-format
msgid ""
"Create the patch file in the right folder. The file should be named as "
"`patch_{module_name}.py`. The example here is "
@@ -169,7 +171,8 @@ msgid ""
"`vllm_ascend/patch/platform/__init__.py`."
msgstr ""
"在 `__init__.py` 中导入补丁文件。在此示例中,将 `import "
"vllm_ascend.patch.platform.patch_distributed` 添加到 `vllm_ascend/patch/platform/__init__.py` 中。"
"vllm_ascend.patch.platform.patch_distributed` 添加到 "
"`vllm_ascend/patch/platform/__init__.py` 中。"
#: ../../source/developer_guide/Design_Documents/patch.md:55
msgid ""
@@ -183,8 +186,8 @@ msgid ""
"should contain the Unit Test and E2E Test as well. You can find more "
"details in [test guide](../contribution/testing.md)"
msgstr ""
"添加单元测试和端到端测试。vLLM Ascend 中任何新增的代码都应包含单元测试和端到端测试。更多详情请参阅 [测试指南]"
"(../contribution/testing.md)。"
"添加单元测试和端到端测试。vLLM Ascend 中任何新增的代码都应包含单元测试和端到端测试。更多详情请参阅 "
"[测试指南](../contribution/testing.md)。"

#: ../../source/developer_guide/Design_Documents/patch.md:73
msgid "Limitations"
@@ -201,8 +204,9 @@ msgid ""
"`DPEngineCoreProc` entirely."
msgstr ""
"在 V1 引擎中,vLLM 启动三种进程:主进程、EngineCore 进程和 Worker 进程。目前 vLLM Ascend "
"默认只能修补主进程和 Worker 进程中的代码。如果你想修补 EngineCore 进程中运行的代码,你需要在设置阶段完全修补 EngineCore "
"进程。相关完整代码位于 `vllm.v1.engine.core`。请完全重写 `EngineCoreProc` 和 `DPEngineCoreProc`。"
"默认只能修补主进程和 Worker 进程中的代码。如果你想修补 EngineCore 进程中运行的代码,你需要在设置阶段完全修补 "
"EngineCore 进程。相关完整代码位于 `vllm.v1.engine.core`。请完全重写 `EngineCoreProc` 和 "
"`DPEngineCoreProc`。"

#: ../../source/developer_guide/Design_Documents/patch.md:76
msgid ""
@@ -212,10 +216,10 @@ msgid ""
"for v0.9.n in vLLM Ascend would not work as expected, because vLLM Ascend"
" can't distinguish the version of the vLLM you're using. In this case, "
"you can set the environment variable `VLLM_VERSION` to specify the "
"version of the vLLM you're using, and then the patch for v0.10.0 should "
"work."
"version of the vLLM you're using, and then the patch for that version "
"(e.g., v0.9.n) should work."
msgstr ""
"如果你运行的是经过编辑的 vLLM 代码,vLLM 的版本可能会自动更改。例如,如果你基于 v0.9.n 运行编辑后的 vLLM,vLLM "
"的版本可能会变为 v0.9.nxxx。在这种情况下,vLLM Ascend 中针对 v0.9.n 的补丁将无法按预期工作,因为 vLLM Ascend "
"无法区分你正在使用的 vLLM 版本。此时,你可以设置环境变量 `VLLM_VERSION` 来指定你使用的 vLLM 版本,这样针对 v0.10.0 "
"的补丁就应该能正常工作了。"
"的版本可能会变为 v0.9.nxxx。在这种情况下,vLLM Ascend 中针对 v0.9.n 的补丁将无法按预期工作,因为 vLLM "
"Ascend 无法区分你正在使用的 vLLM 版本。此时,你可以设置环境变量 `VLLM_VERSION` 来指定你使用的 vLLM "
"版本,这样针对该版本(例如 v0.9.n)的补丁就应该能正常工作了。"
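A minimal sketch of why `VLLM_VERSION` helps, assuming patches are looked up by an exact version string: an edited checkout reports a modified version, so the lookup misses until the override is set. `resolve_vllm_version`, `select_patches`, the version strings, and the `PATCHES` table are all hypothetical and do not reflect how vLLM Ascend actually stores its patches.

```python
import os
from typing import Dict, List, Mapping, Optional

# Hypothetical table: exact vLLM version -> patch modules to apply.
PATCHES: Dict[str, List[str]] = {"0.9.1": ["patch_distributed"]}


def resolve_vllm_version(detected: str,
                         env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer an explicit VLLM_VERSION over the detected (possibly edited) version."""
    env = os.environ if env is None else env
    return env.get("VLLM_VERSION") or detected


def select_patches(version: str) -> List[str]:
    """Exact-match lookup, so a suffixed version string finds nothing."""
    return PATCHES.get(version, [])


detected = "0.9.1.dev5+gabc123"  # hypothetical version reported by an edited checkout
print(select_patches(resolve_vllm_version(detected, env={})))  # []
print(select_patches(resolve_vllm_version(detected, env={"VLLM_VERSION": "0.9.1"})))  # ['patch_distributed']
```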
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -60,14 +60,21 @@ msgid ""
" `get_quant_method` is called to obtain the quantization method "
"corresponding to each weight part, stored in the `quant_method` "
"attribute."
msgstr "vLLM Ascend 注册了一个自定义的 Ascend 量化方法。通过配置 `--quantization ascend` 参数(或离线时使用 `quantization=\"ascend\"`),即可启用量化功能。在构建 `quant_config` 时,会初始化已注册的 `AscendModelSlimConfig`,并调用 `get_quant_method` 来获取每个权重部分对应的量化方法,存储在 `quant_method` 属性中。"
msgstr ""
"vLLM Ascend 注册了一个自定义的 Ascend 量化方法。通过配置 `--quantization ascend` 参数(或离线时使用 "
"`quantization=\"ascend\"`),即可启用量化功能。在构建 `quant_config` 时,会初始化已注册的 "
"`AscendModelSlimConfig`,并调用 `get_quant_method` 来获取每个权重部分对应的量化方法,存储在 "
"`quant_method` 属性中。"

#: ../../source/developer_guide/Design_Documents/quantization.md:15
msgid ""
"Currently supported quantization methods include `AscendLinearMethod`, "
"`AscendFusedMoEMethod`, `AscendEmbeddingMethod`, and their corresponding "
"non-quantized methods:"
msgstr "当前支持的量化方法包括 `AscendLinearMethod`、`AscendFusedMoEMethod`、`AscendEmbeddingMethod` 及其对应的非量化方法:"
msgstr ""
"当前支持的量化方法包括 "
"`AscendLinearMethod`、`AscendFusedMoEMethod`、`AscendEmbeddingMethod` "
"及其对应的非量化方法:"

#: ../../source/developer_guide/Design_Documents/quantization.md:17
msgid ""
@@ -105,14 +112,18 @@ msgid ""
"conversion, etc.; the `apply` method is used to perform activation "
"quantization and quantized matrix multiplication calculations during the "
"forward process."
msgstr "`create_weights` 方法用于权重初始化;`process_weights_after_loading` 方法用于权重后处理,例如转置、格式转换、数据类型转换等;`apply` 方法用于在前向传播过程中执行激活量化和量化矩阵乘法计算。"
msgstr ""
"`create_weights` 方法用于权重初始化;`process_weights_after_loading` "
"方法用于权重后处理,例如转置、格式转换、数据类型转换等;`apply` 方法用于在前向传播过程中执行激活量化和量化矩阵乘法计算。"
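The `create_weights` / `process_weights_after_loading` / `apply` lifecycle described above can be illustrated with a toy, pure-Python W8A8 linear method. This is a sketch under stated assumptions, not vLLM Ascend code: the class `ToyW8A8LinearMethod` and its dict-based "layer" are hypothetical, weights are quantized symmetrically to int8 per output channel, and activations are quantized dynamically per tensor. Real methods register `torch` parameters on the layer and call NPU kernels.

```python
from typing import Any, Dict, List

Layer = Dict[str, Any]  # hypothetical stand-in for a torch.nn.Module's parameters


class ToyW8A8LinearMethod:
    """Toy W8A8 linear method following create_weights / process / apply."""

    def create_weights(self, out_features: int, in_features: int) -> Layer:
        # Allocate float placeholders that a weight loader would later fill in.
        return {"weight": [[0.0] * in_features for _ in range(out_features)]}

    def process_weights_after_loading(self, layer: Layer) -> None:
        # One-time post-processing: quantize each output channel (row) to int8.
        qweight, w_scales = [], []
        for row in layer["weight"]:
            scale = max(abs(v) for v in row) / 127 or 1.0  # avoid a zero scale
            qweight.append([round(v / scale) for v in row])
            w_scales.append(scale)
        layer["qweight"], layer["w_scale"] = qweight, w_scales

    def apply(self, layer: Layer, x: List[float]) -> List[float]:
        # Forward pass: dynamic per-tensor activation quantization ...
        x_scale = max(abs(v) for v in x) / 127 or 1.0
        qx = [round(v / x_scale) for v in x]
        out = []
        for qrow, w_scale in zip(layer["qweight"], layer["w_scale"]):
            acc = sum(a * b for a, b in zip(qrow, qx))  # ... integer matmul ...
            out.append(acc * w_scale * x_scale)          # ... then dequantize.
        return out
```

Splitting the work this way keeps the per-channel weight quantization out of the forward path: it runs once after loading, while `apply` only pays for activation quantization.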
#: ../../source/developer_guide/Design_Documents/quantization.md:27
msgid ""
"We need to implement the `create_weights`, "
"`process_weights_after_loading`, and `apply` methods for different "
"**layers** (**attention**, **mlp**, **moe**)."
msgstr "我们需要为不同的**层**(**attention**、**mlp**、**moe**)实现 `create_weights`、`process_weights_after_loading` 和 `apply` 方法。"
"**layers** (**attention**, **mlp**, **MoE (Mixture of Experts)**)."
msgstr ""
"我们需要为不同的**层**(**attention**、**mlp**、**MoE (Mixture of Experts)**)实现 "
"`create_weights`、`process_weights_after_loading` 和 `apply` 方法。"

#: ../../source/developer_guide/Design_Documents/quantization.md:29
msgid ""
@@ -120,7 +131,9 @@ msgid ""
" file **quant_model_description.json** needs to be read. This file "
"describes the quantization configuration and parameters for each part of "
"the model weights, for example:"
msgstr "**补充说明**:加载模型时,需要读取量化模型的描述文件 **quant_model_description.json**。该文件描述了模型各部分权重的量化配置和参数,例如:"
msgstr ""
"**补充说明**:加载模型时,需要读取量化模型的描述文件 "
"**quant_model_description.json**。该文件描述了模型各部分权重的量化配置和参数,例如:"

#: ../../source/developer_guide/Design_Documents/quantization.md:49
msgid ""
@@ -138,21 +151,27 @@ msgid ""
"`W4A8_DYNAMIC`), determine supported layers (linear, moe, attention), and"
" design the quantization scheme (static/dynamic, "
"pertensor/perchannel/pergroup)."
msgstr "**步骤 1:算法设计**。定义算法 ID(例如 `W4A8_DYNAMIC`),确定支持的层(linear、moe、attention),并设计量化方案(静态/动态、pertensor/perchannel/pergroup)。"
msgstr ""
"**步骤 1:算法设计**。定义算法 ID(例如 "
"`W4A8_DYNAMIC`),确定支持的层(linear、moe、attention),并设计量化方案(静态/动态、pertensor/perchannel/pergroup)。"

#: ../../source/developer_guide/Design_Documents/quantization.md:54
msgid ""
"**Step 2: Registration**. Use the `@register_scheme` decorator in "
"`vllm_ascend/quantization/methods/registry.py` to register your "
"quantization scheme class."
msgstr "**步骤 2:注册**。在 `vllm_ascend/quantization/methods/registry.py` 中使用 `@register_scheme` 装饰器注册您的量化方案类。"
msgstr ""
"**步骤 2:注册**。在 `vllm_ascend/quantization/methods/registry.py` 中使用 "
"`@register_scheme` 装饰器注册您的量化方案类。"
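Step 2 relies on a class decorator that records the scheme in a registry. The sketch below shows that general pattern only; `_SCHEMES`, `register_scheme`, `get_scheme`, the `(quant_type, layer_type)` key, and `W4A8DynamicLinearScheme` are stand-ins written for illustration, so check `vllm_ascend/quantization/methods/registry.py` for the real decorator's signature.

```python
from typing import Callable, Dict, Tuple, Type

# Hypothetical registry: (algorithm ID, layer type) -> scheme class
_SCHEMES: Dict[Tuple[str, str], Type] = {}


def register_scheme(quant_type: str, layer_type: str) -> Callable[[Type], Type]:
    """Class decorator that records a scheme under (quant_type, layer_type)."""
    def decorator(cls: Type) -> Type:
        key = (quant_type, layer_type)
        if key in _SCHEMES:
            raise ValueError(f"Scheme already registered for {key}")
        _SCHEMES[key] = cls
        return cls  # return the class unchanged so it stays usable directly
    return decorator


def get_scheme(quant_type: str, layer_type: str) -> Type:
    """Look up the scheme that a config's algorithm ID and layer type select."""
    try:
        return _SCHEMES[(quant_type, layer_type)]
    except KeyError:
        raise NotImplementedError(
            f"No scheme for {quant_type} on {layer_type} layers") from None


@register_scheme("W4A8_DYNAMIC", "linear")
class W4A8DynamicLinearScheme:  # hypothetical scheme class
    pass
```

Registering at import time means that adding a new algorithm only requires importing its module; the code that resolves a quantization config never needs an if/else chain over algorithm IDs.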
#: ../../source/developer_guide/Design_Documents/quantization.md:68
msgid ""
"**Step 3: Implementation**. Create an algorithm implementation file, such"
" as `vllm_ascend/quantization/methods/w4a8.py`, and implement the method "
"class and logic."
msgstr "**步骤 3:实现**。创建一个算法实现文件,例如 `vllm_ascend/quantization/methods/w4a8.py`,并实现方法类和逻辑。"
msgstr ""
"**步骤 3:实现**。创建一个算法实现文件,例如 "
"`vllm_ascend/quantization/methods/w4a8.py`,并实现方法类和逻辑。"

#: ../../source/developer_guide/Design_Documents/quantization.md:69
msgid ""
@@ -182,7 +201,11 @@ msgid ""
"`vllm_ascend/quantization/modelslim_config.py` (e.g., `qkv_proj`, "
"`gate_up_proj`, `experts`) to ensure sharding consistency and correct "
"loading."
msgstr "**融合模块映射**:将模型的 `model_type` 添加到 `vllm_ascend/quantization/modelslim_config.py` 中的 `packed_modules_model_mapping`(例如 `qkv_proj`、`gate_up_proj`、`experts`),以确保分片一致性和正确加载。"
msgstr ""
"**融合模块映射**:将模型的 `model_type` 添加到 "
"`vllm_ascend/quantization/modelslim_config.py` 中的 "
"`packed_modules_model_mapping`(例如 "
"`qkv_proj`、`gate_up_proj`、`experts`),以确保分片一致性和正确加载。"

#: ../../source/developer_guide/Design_Documents/quantization.md:96
msgid ""
@@ -339,10 +362,9 @@ msgstr "混合"

#: ../../source/developer_guide/Design_Documents/quantization.md
msgid ""
"PD Colocation Scenario uses dynamic quantization for both P node and D "
"node; PD Disaggregation Scenario uses dynamic quantization for P node and"
" static for D node"
msgstr "PD 共部署场景下,P节点和D节点均使用动态量化;PD 分离部署场景下,P节点使用动态量化,D节点使用静态量化"
"We support two deployment modes: PD Colocation (dynamic quantization for "
"both P and D) and PD Disaggregation (dynamic-quant P and static-quant D)"
msgstr "我们支持两种部署模式:PD 共部署(P和D均使用动态量化)和 PD 分离部署(P使用动态量化,D使用静态量化)"

#: ../../source/developer_guide/Design_Documents/quantization.md:112
msgid ""
@@ -350,10 +372,10 @@ msgid ""
"factors with better performance, while dynamic quantization computes "
"scaling factors on-the-fly for each token/activation tensor with higher "
"precision."
msgstr "**静态与动态:** 静态量化使用预计算的缩放因子,性能更好;而动态量化则为每个 token/激活张量实时计算缩放因子,精度更高。"
msgstr "**静态与动态:** 静态量化使用预计算的缩放因子,性能更优;而动态量化则为每个 token/激活张量实时计算缩放因子,精度更高。"
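A minimal pure-Python sketch of the static/dynamic trade-off described above, assuming symmetric int8 quantization with a max-abs scale. The static path reuses one scale precomputed from (hypothetical) calibration data and saturates values outside that range; the dynamic path spends an extra reduction per token to compute a fresh scale. All function names and values here are illustrative.

```python
from typing import List, Sequence, Tuple

QMAX = 127  # symmetric int8 range [-127, 127]


def max_abs_scale(values: Sequence[float]) -> float:
    return (max(abs(v) for v in values) / QMAX) or 1.0


def quantize(values: Sequence[float], scale: float) -> List[int]:
    # Values beyond the scale's range saturate at +/-QMAX instead of overflowing.
    return [max(-QMAX, min(QMAX, round(v / scale))) for v in values]


def static_quantize(tokens: List[List[float]], calib_scale: float) -> List[List[int]]:
    # Static: one scale precomputed offline, reused for every token (no runtime reduction).
    return [quantize(t, calib_scale) for t in tokens]


def dynamic_quantize(tokens: List[List[float]]) -> List[Tuple[List[int], float]]:
    # Dynamic: a fresh scale per token, computed on the fly from that token's values.
    result = []
    for t in tokens:
        scale = max_abs_scale(t)
        result.append((quantize(t, scale), scale))
    return result


calib_scale = max_abs_scale([0.5, -1.0, 0.8])  # hypothetical calibration data
tokens = [[0.1, -0.2], [3.0, -1.0]]
print(static_quantize(tokens, calib_scale)[1])  # [127, -127]: 3.0 saturates
print(dynamic_quantize(tokens)[1][0])           # [127, -42]
```

Here the second token's `3.0` is clipped under the static scale, while the dynamic path keeps its full range at the cost of one extra reduction per token.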
#: ../../source/developer_guide/Design_Documents/quantization.md:114
msgid ""
"**Granularity:** Refers to the scope of scaling factor computation (e.g.,"
" per-tensor, per-channel, per-group)."
msgstr "**粒度:** 指缩放因子计算的范围(例如,per-tensor、per-channel、per-group)。"
msgstr "**粒度:** 指缩放因子计算的范围(例如,per-tensor、per-channel、per-group)。"
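To make the granularity options concrete, the sketch below computes max-abs scales for a small weight matrix at three granularities: one scale for the whole tensor, one per output channel (row), and one per `group_size` consecutive input elements within each row. The function names, the matrix, and `group_size=2` are illustrative assumptions.

```python
from typing import List

QMAX = 127  # symmetric int8


def _scale(values: List[float]) -> float:
    return (max(abs(v) for v in values) / QMAX) or 1.0


def per_tensor_scales(w: List[List[float]]) -> List[float]:
    # A single scale shared by every element of the tensor.
    return [_scale([v for row in w for v in row])]


def per_channel_scales(w: List[List[float]]) -> List[float]:
    # One scale per output channel (row).
    return [_scale(row) for row in w]


def per_group_scales(w: List[List[float]], group_size: int) -> List[List[float]]:
    # One scale per `group_size` consecutive input elements of every row.
    return [[_scale(row[i:i + group_size]) for i in range(0, len(row), group_size)]
            for row in w]


w = [[1.0, 2.0, 3.0, 4.0],
     [0.5, 0.5, 8.0, 0.1]]
tensor_s = per_tensor_scales(w)       # 1 scale, driven by the 8.0 outlier
channel_s = per_channel_scales(w)     # 2 scales
group_s = per_group_scales(w, 2)      # 2 x 2 scales
```

Finer granularity stores more scales, but each one covers a narrower value range, so a single outlier (the `8.0` here) inflates fewer elements' quantization step.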
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -19,7 +19,7 @@ msgstr ""
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.18.0\n"

#: ../../source/developer_guide/contribution/index.md:108
#: ../../source/developer_guide/contribution/index.md:107
msgid "Index"
msgstr "索引"
@@ -62,120 +62,124 @@ msgid "Run CI locally"
msgstr "本地运行 CI"

#: ../../source/developer_guide/contribution/index.md:37
msgid "After completing \"Run lint\" setup, you can run CI locally:"
msgstr "完成“运行代码检查”设置后,你可以在本地运行 CI:"
msgid ""
"After completing \"Run lint\" setup, you can run CI (Continuous "
"integration) locally:"
msgstr "完成“运行代码检查”设置后,你可以在本地运行 CI(持续集成):"

#: ../../source/developer_guide/contribution/index.md:63
#: ../../source/developer_guide/contribution/index.md:62
msgid "Submit the commit"
msgstr "提交更改"

#: ../../source/developer_guide/contribution/index.md:70
#: ../../source/developer_guide/contribution/index.md:69
msgid "🎉 Congratulations! You have completed the development environment setup."
msgstr "🎉 恭喜!您已完成开发环境的设置。"

#: ../../source/developer_guide/contribution/index.md:72
#: ../../source/developer_guide/contribution/index.md:71
msgid "Testing locally"
msgstr "本地测试"

#: ../../source/developer_guide/contribution/index.md:74
#: ../../source/developer_guide/contribution/index.md:73
msgid ""
"You can refer to [Testing](./testing.md) to set up a testing environment"
" and running tests locally."
msgstr "你可以参考 [测试](./testing.md) 文档来设置测试环境并在本地运行测试。"

#: ../../source/developer_guide/contribution/index.md:76
#: ../../source/developer_guide/contribution/index.md:75
msgid "DCO and Signed-off-by"
msgstr "DCO 与签署确认"

#: ../../source/developer_guide/contribution/index.md:78
#: ../../source/developer_guide/contribution/index.md:77
msgid ""
"When contributing changes to this project, you must agree to the DCO. "
"Commits must include a `Signed-off-by:` header which certifies agreement "
"with the terms of the DCO."
msgstr "向本项目贡献更改时,您必须同意 DCO。提交必须包含 `Signed-off-by:` 标头,以证明您同意 DCO 的条款。"
"with the terms of the DCO (Developer Certificate of Origin)."
msgstr "向本项目贡献更改时,您必须同意 DCO。提交必须包含 `Signed-off-by:` 标头,以证明您同意 DCO(开发者原创证书)的条款。"

#: ../../source/developer_guide/contribution/index.md:80
#: ../../source/developer_guide/contribution/index.md:79
msgid "Using `-s` with `git commit` will automatically add this header."
msgstr "在 `git commit` 命令中使用 `-s` 参数会自动添加此标头。"

#: ../../source/developer_guide/contribution/index.md:82
#: ../../source/developer_guide/contribution/index.md:81
msgid "PR Title and Classification"
msgstr "PR 标题与分类"

#: ../../source/developer_guide/contribution/index.md:84
#: ../../source/developer_guide/contribution/index.md:83
msgid ""
"Only specific types of PRs will be reviewed. The PR title is prefixed "
"appropriately to indicate the type of change. Please use one of the "
"following:"
msgstr "只有特定类型的 PR 会被审核。PR 标题应使用适当的前缀来指明更改类型。请使用以下前缀之一:"

#: ../../source/developer_guide/contribution/index.md:86
#: ../../source/developer_guide/contribution/index.md:85
msgid "`[Attention]` for new features or optimization in attention."
msgstr "`[Attention]` 用于注意力机制的新功能或优化。"

#: ../../source/developer_guide/contribution/index.md:87
#: ../../source/developer_guide/contribution/index.md:86
msgid "`[Communicator]` for new features or optimization in communicators."
msgstr "`[Communicator]` 用于通信器的新功能或优化。"

#: ../../source/developer_guide/contribution/index.md:88
#: ../../source/developer_guide/contribution/index.md:87
msgid "`[ModelRunner]` for new features or optimization in model runner."
msgstr "`[ModelRunner]` 用于模型运行器的新功能或优化。"

#: ../../source/developer_guide/contribution/index.md:89
#: ../../source/developer_guide/contribution/index.md:88
msgid "`[Platform]` for new features or optimization in platform."
msgstr "`[Platform]` 用于平台的新功能或优化。"

#: ../../source/developer_guide/contribution/index.md:90
#: ../../source/developer_guide/contribution/index.md:89
msgid "`[Worker]` for new features or optimization in worker."
msgstr "`[Worker]` 用于工作器的新功能或优化。"

#: ../../source/developer_guide/contribution/index.md:91
#: ../../source/developer_guide/contribution/index.md:90
msgid ""
"`[Core]` for new features or optimization in the core vllm-ascend logic "
"(such as platform, attention, communicators, model runner)"
msgstr "`[Core]` 用于核心 vllm-ascend 逻辑中的新功能或优化(例如平台、注意力机制、通信器、模型运行器)。"

#: ../../source/developer_guide/contribution/index.md:92
#: ../../source/developer_guide/contribution/index.md:91
msgid "`[Kernel]` for changes affecting compute kernels and ops."
msgstr "`[Kernel]` 用于影响计算内核和操作的更改。"

#: ../../source/developer_guide/contribution/index.md:93
#: ../../source/developer_guide/contribution/index.md:92
msgid "`[Bugfix]` for bug fixes."
msgstr "`[Bugfix]` 用于错误修复。"

#: ../../source/developer_guide/contribution/index.md:94
#: ../../source/developer_guide/contribution/index.md:93
msgid "`[Doc]` for documentation fixes and improvements."
msgstr "`[Doc]` 用于文档修复和改进。"

#: ../../source/developer_guide/contribution/index.md:95
#: ../../source/developer_guide/contribution/index.md:94
msgid "`[Test]` for tests (such as unit tests)."
msgstr "`[Test]` 用于测试(例如单元测试)。"

#: ../../source/developer_guide/contribution/index.md:96
#: ../../source/developer_guide/contribution/index.md:95
msgid "`[CI]` for build or continuous integration improvements."
msgstr "`[CI]` 用于构建或持续集成的改进。"

#: ../../source/developer_guide/contribution/index.md:97
#: ../../source/developer_guide/contribution/index.md:96
msgid ""
"`[Misc]` for PRs that do not fit the above categories. Please use this "
"sparingly."
msgstr "`[Misc]` 用于不属于上述类别的 PR。请谨慎使用此标签。"

#: ../../source/developer_guide/contribution/index.md:100
#: ../../source/developer_guide/contribution/index.md:99
msgid ""
"If the PR spans more than one category, please include all relevant "
"prefixes."
msgstr "如果 PR 涉及多个类别,请包含所有相关的前缀。"
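The title convention above is easy to check mechanically. This is a hypothetical helper, not part of vLLM Ascend's CI: it assumes the prefixes are leading `[Tag]` blocks, accepts a title if at least one tag is in the list above (other tags, such as a version tag like `[v0.18.0]`, are tolerated), and returns all tags so a multi-category title keeps every prefix.

```python
import re
from typing import List

# Prefixes listed in the contribution guide above.
ALLOWED_PREFIXES = {
    "Attention", "Communicator", "ModelRunner", "Platform", "Worker", "Core",
    "Kernel", "Bugfix", "Doc", "Test", "CI", "Misc",
}

_LEADING_TAGS = re.compile(r"^((?:\s*\[[^\]]+\])+)")
_TAG = re.compile(r"\[([^\]]+)\]")


def parse_pr_prefixes(title: str) -> List[str]:
    """Return the leading [Tag] prefixes of a PR title, in order."""
    match = _LEADING_TAGS.match(title.strip())
    return _TAG.findall(match.group(1)) if match else []


def has_valid_prefix(title: str) -> bool:
    """True if at least one leading tag is a recognized category."""
    return any(tag in ALLOWED_PREFIXES for tag in parse_pr_prefixes(title))


print(parse_pr_prefixes("[Bugfix][Kernel] Fix overflow"))  # ['Bugfix', 'Kernel']
```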
#: ../../source/developer_guide/contribution/index.md:103
#: ../../source/developer_guide/contribution/index.md:102
msgid "Others"
msgstr "其他"

#: ../../source/developer_guide/contribution/index.md:105
#: ../../source/developer_guide/contribution/index.md:104
msgid ""
"You may find more information about contributing to vLLM Ascend backend "
"plugin on "
"[<u>docs.vllm.ai</u>](https://docs.vllm.ai/en/latest/contributing). If "
"you encounter any problems while contributing, feel free to submit a PR "
"to improve the documentation to help other developers."
msgstr "你可以在 [<u>docs.vllm.ai</u>](https://docs.vllm.ai/en/latest/contributing) 上找到有关为 vLLM Ascend 后端插件做贡献的更多信息。如果在贡献过程中遇到任何问题,欢迎随时提交 PR 来改进文档,以帮助其他开发者。"
msgstr ""
"你可以在 [<u>docs.vllm.ai</u>](https://docs.vllm.ai/en/latest/contributing) "
"上找到有关为 vLLM Ascend 后端插件做贡献的更多信息。如果在贡献过程中遇到任何问题,欢迎随时提交 PR 来改进文档,以帮助其他开发者。"
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -27,7 +27,9 @@ msgstr "多节点测试"
msgid ""
"Multi-Node CI is designed to test distributed scenarios of very large "
"models, eg: disaggregated_prefill multi DP across multi nodes and so on."
msgstr "多节点CI旨在测试超大规模模型的分布式场景,例如:跨多节点的解耦预填充(disaggregated_prefill)、多数据并行(multi DP)等。"
msgstr ""
"多节点CI旨在测试超大规模模型的分布式场景,例如:跨多节点的解耦预填充(disaggregated_prefill)、多数据并行(multi "
"DP)等。"

#: ../../source/developer_guide/contribution/multi_node_test.md:5
msgid "How it works"
@@ -39,7 +41,10 @@ msgid ""
"CI mechanism. It shows how the GitHub action interacts with "
"[lws](https://lws.sigs.k8s.io/docs/overview/) (a kind of kubernetes crd "
"resource)."
msgstr "下图展示了多节点CI机制的基本部署视图。它说明了GitHub Action如何与[lws](https://lws.sigs.k8s.io/docs/overview/)(一种Kubernetes CRD资源)进行交互。"
msgstr ""
"下图展示了多节点CI机制的基本部署视图。它说明了GitHub "
"Action如何与[lws](https://lws.sigs.k8s.io/docs/overview/)(一种Kubernetes "
"CRD资源)进行交互。"

#: ../../source/developer_guide/contribution/multi_node_test.md:9
msgid ""
@@ -62,7 +67,12 @@ msgid ""
"[LWS_WORKER_INDEX](https://lws.sigs.k8s.io/docs/reference/labels-"
"annotations-and-environment-variables/) environment variable, so that "
"multiple nodes can form a distributed cluster to perform tasks."
msgstr "从工作流的角度,我们可以看到最终的测试脚本是如何执行的。关键在于这两个文件:[lws.yaml和run.sh](https://github.com/vllm-project/vllm-ascend/tree/main/tests/e2e/nightly/multi_node/scripts)。前者定义了我们的k8s集群如何被拉起,后者定义了Pod启动时的入口脚本。每个节点根据[LWS_WORKER_INDEX](https://lws.sigs.k8s.io/docs/reference/labels-annotations-and-environment-variables/)环境变量执行不同的逻辑,从而使多个节点能够组成一个分布式集群来执行任务。"
msgstr ""
"从工作流的角度,我们可以看到最终的测试脚本是如何执行的。关键在于这两个文件:[lws.yaml和run.sh](https://github.com"
"/vllm-project/vllm-"
"ascend/tree/main/tests/e2e/nightly/multi_node/scripts)。前者定义了我们的k8s集群如何被拉起,后者定义了Pod启动时的入口脚本。每个节点根据[LWS_WORKER_INDEX](https://lws.sigs.k8s.io/docs/reference"
"/labels-annotations-and-environment-"
"variables/)环境变量执行不同的逻辑,从而使多个节点能够组成一个分布式集群来执行任务。"
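A minimal sketch of how one entry script can branch per node, assuming the LWS convention that the leader pod has `LWS_WORKER_INDEX=0` and the other pods have `1..N-1`. The real entry point is `run.sh`; `decide_role` and the role strings are hypothetical and written in Python only for illustration.

```python
import os
from typing import Mapping, Optional


def decide_role(env: Optional[Mapping[str, str]] = None) -> str:
    """Map the LWS worker index of this pod to a role in the test."""
    env = os.environ if env is None else env
    raw = env.get("LWS_WORKER_INDEX")
    if raw is None:
        raise RuntimeError("LWS_WORKER_INDEX is not set; is this pod managed by LWS?")
    index = int(raw)
    if index < 0:
        raise ValueError(f"invalid LWS worker index: {index}")
    # Assumed convention: index 0 is the leader, all others are workers.
    return "leader" if index == 0 else f"worker-{index}"


print(decide_role({"LWS_WORKER_INDEX": "0"}))  # leader
print(decide_role({"LWS_WORKER_INDEX": "2"}))  # worker-2
```

Because every pod runs the same image and script, the index is the only input that differs, which is what lets a single entry point bring up a whole multi-node cluster.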
#: ../../source/developer_guide/contribution/multi_node_test.md:13
msgid ""
@@ -83,7 +93,10 @@ msgid ""
"to ModelScope's [vllm-ascend](https://www.modelscope.cn/organization"
"/vllm-ascend) organization is welcome. If you do not have permission to "
"upload, please contact @Potabk"
msgstr "如果您需要自定义权重,例如,您为DeepSeek-V3量化了一个w8a8权重,并希望您的权重能在CI上运行,欢迎将权重上传至ModelScope的[vllm-ascend](https://www.modelscope.cn/organization/vllm-ascend)组织。如果您没有上传权限,请联系@Potabk。"
msgstr ""
"如果您需要自定义权重,例如,您为DeepSeek-V3量化了一个w8a8权重,并希望您的权重能在CI上运行,欢迎将权重上传至ModelScope的"
"[vllm-ascend](https://www.modelscope.cn/organization/vllm-"
"ascend)组织。如果您没有上传权限,请联系@Potabk。"

#: ../../source/developer_guide/contribution/multi_node_test.md:21
msgid "Add config yaml"
@@ -100,7 +113,13 @@ msgid ""
" just add \"yamls\" like [DeepSeek-V3.yaml](https://github.com/vllm-"
"project/vllm-"
"ascend/blob/main/tests/e2e/nightly/multi_node/config/DeepSeek-V3.yaml)."
msgstr "如入口脚本[run.sh](https://github.com/vllm-project/vllm-ascend/blob/0bf3f21a987aede366ec4629ad0ffec8e32fe90d/tests/e2e/nightly/multi_node/scripts/run.sh#L106)所示,一个k8s Pod的启动意味着遍历[目录](https://github.com/vllm-project/vllm-ascend/tree/main/tests/e2e/nightly/multi_node/config/)中的所有*.yaml文件,并根据不同的配置读取和执行。因此,我们需要做的就是添加类似[DeepSeek-V3.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/tests/e2e/nightly/multi_node/config/DeepSeek-V3.yaml)的\"yaml\"文件。"
msgstr ""
"如入口脚本[run.sh](https://github.com/vllm-project/vllm-"
"ascend/blob/0bf3f21a987aede366ec4629ad0ffec8e32fe90d/tests/e2e/nightly/multi_node/scripts/run.sh#L106)所示,一个k8s"
" Pod的启动意味着遍历[目录](https://github.com/vllm-project/vllm-"
"ascend/tree/main/tests/e2e/nightly/multi_node/config/)中的所有*.yaml文件,并根据不同的配置读取和执行。因此,我们需要做的就是添加类似[DeepSeek-V3.yaml](https://github.com"
"/vllm-project/vllm-"
"ascend/blob/main/tests/e2e/nightly/multi_node/config/DeepSeek-V3.yaml)的\"yaml\"文件。"

#: ../../source/developer_guide/contribution/multi_node_test.md:25
msgid ""
@@ -121,7 +140,9 @@ msgid ""
"Currently, the multi-node test workflow is defined in the "
"[nightly_test_a3.yaml](https://github.com/vllm-project/vllm-"
"ascend/blob/main/.github/workflows/schedule_nightly_test_a3.yaml)"
msgstr "目前,多节点测试工作流定义在[nightly_test_a3.yaml](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/schedule_nightly_test_a3.yaml)中。"
msgstr ""
"目前,多节点测试工作流定义在[nightly_test_a3.yaml](https://github.com/vllm-project"
"/vllm-ascend/blob/main/.github/workflows/schedule_nightly_test_a3.yaml)中。"

#: ../../source/developer_guide/contribution/multi_node_test.md:110
msgid ""
@@ -146,7 +167,9 @@ msgid ""
"This section assumes that you already have a "
"[Kubernetes](https://kubernetes.io/docs/setup/) NPU cluster environment "
"locally. Then you can easily start our test with one click."
msgstr "本节假设您本地已经有一个[Kubernetes](https://kubernetes.io/docs/setup/) NPU集群环境。然后您可以轻松地一键启动我们的测试。"
msgstr ""
"本节假设您本地已经有一个[Kubernetes](https://kubernetes.io/docs/setup/) "
"NPU集群环境。然后您可以轻松地一键启动我们的测试。"

#: ../../source/developer_guide/contribution/multi_node_test.md:118
msgid "Step 1. Install LWS CRD resources"
@@ -159,7 +182,7 @@ msgid ""
msgstr "参考<https://lws.sigs.k8s.io/docs/installation/>"

#: ../../source/developer_guide/contribution/multi_node_test.md:122
msgid "Step 2. Deploy the following yaml file `lws.yaml` as what you want"
msgid "Step 2. Deploy the following yaml file `lws.yaml` as needed"
msgstr "步骤 2. 按需部署以下yaml文件`lws.yaml`"

#: ../../source/developer_guide/contribution/multi_node_test.md:258
@@ -199,7 +222,10 @@ msgid ""
"ascend/blob/e760aae1df7814073a4180172385505c1ec0fd83/tests/e2e/nightly/multi_node/config/DeepSeek-V3.yaml#L25)"
" after the configure item `num_nodes` , for example: `cluster_hosts: "
"[\"xxx.xxx.xxx.188\", \"xxx.xxx.xxx.212\"]`"
msgstr "在每个集群主机上进行修改,就像[DeepSeek-V3.yaml](https://github.com/vllm-project/vllm-ascend/blob/e760aae1df7814073a4180172385505c1ec0fd83/tests/e2e/nightly/multi_node/config/DeepSeek-V3.yaml#L25)那样,在配置项`num_nodes`之后添加,例如:`cluster_hosts: [\"xxx.xxx.xxx.188\", \"xxx.xxx.xxx.212\"]`"
msgstr ""
"在每个集群主机上进行修改,就像[DeepSeek-V3.yaml](https://github.com/vllm-project/vllm-"
"ascend/blob/e760aae1df7814073a4180172385505c1ec0fd83/tests/e2e/nightly/multi_node/config/DeepSeek-V3.yaml#L25)那样,在配置项`num_nodes`之后添加,例如:`cluster_hosts:"
" [\"xxx.xxx.xxx.188\", \"xxx.xxx.xxx.212\"]`"

#: ../../source/developer_guide/contribution/multi_node_test.md:321
msgid "Step 2. Install develop environment"

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -28,7 +28,9 @@ msgid ""
"This document guides you to conduct accuracy testing using "
"[AISBench](https://gitee.com/aisbench/benchmark/tree/master). AISBench "
"provides accuracy and performance evaluation for many datasets."
msgstr "本文档指导您如何使用 [AISBench](https://gitee.com/aisbench/benchmark/tree/master) 进行精度测试。AISBench 为许多数据集提供了精度和性能评估。"
msgstr ""
"本文档指导您如何使用 [AISBench](https://gitee.com/aisbench/benchmark/tree/master) "
"进行精度测试。AISBench 为许多数据集提供了精度和性能评估。"

#: ../../source/developer_guide/evaluation/using_ais_bench.md:5
msgid "Online Server"
@@ -68,7 +70,9 @@ msgstr "安装 AISBench"
msgid ""
"Refer to [AISBench](https://gitee.com/aisbench/benchmark/tree/master) for"
" details. Install AISBench from source."
msgstr "详情请参考 [AISBench](https://gitee.com/aisbench/benchmark/tree/master)。从源码安装 AISBench。"
msgstr ""
"详情请参考 [AISBench](https://gitee.com/aisbench/benchmark/tree/master)。从源码安装 "
"AISBench。"

#: ../../source/developer_guide/evaluation/using_ais_bench.md:69
msgid "Install extra AISBench dependencies."
@@ -96,7 +100,10 @@ msgid ""
"[Datasets](https://gitee.com/aisbench/benchmark/tree/master/ais_bench/benchmark/configs/datasets)"
" for more datasets. Each dataset has a `README.md` with detailed download"
" and installation instructions."
msgstr "以 `C-Eval` 数据集为例。更多数据集请参考 [Datasets](https://gitee.com/aisbench/benchmark/tree/master/ais_bench/benchmark/configs/datasets)。每个数据集都有一个 `README.md` 文件,包含详细的下载和安装说明。"
msgstr ""
"以 `C-Eval` 数据集为例。更多数据集请参考 "
"[Datasets](https://gitee.com/aisbench/benchmark/tree/master/ais_bench/benchmark/configs/datasets)。每个数据集都有一个"
" `README.md` 文件,包含详细的下载和安装说明。"

#: ../../source/developer_guide/evaluation/using_ais_bench.md:86
msgid "Download dataset and install it to specific path."
@@ -136,7 +143,9 @@ msgid ""
"`benchmark/ais_bench/benchmark/configs/models/vllm_api/vllm_api_general_chat.py`."
" There are several arguments that you should update according to your "
"environment."
msgstr "更新文件 `benchmark/ais_bench/benchmark/configs/models/vllm_api/vllm_api_general_chat.py`。有几个参数需要根据您的环境进行更新。"
msgstr ""
"更新文件 "
"`benchmark/ais_bench/benchmark/configs/models/vllm_api/vllm_api_general_chat.py`。有几个参数需要根据您的环境进行更新。"

#: ../../source/developer_guide/evaluation/using_ais_bench.md:158
msgid ""
@@ -169,9 +178,11 @@ msgstr "`host_ip` 和 `host_port`:更新为您的 vLLM 服务器的 IP 和端
#: ../../source/developer_guide/evaluation/using_ais_bench.md:164
msgid ""
"`max_out_len`: Note `max_out_len` + LLM input length should be less than "
"`max-model-len`(config in your vllm server), `32768` will be suitable for"
"`max_model_len`(config in your vllm server), `32768` will be suitable for"
" most datasets."
msgstr "`max_out_len`:注意 `max_out_len` + LLM 输入长度应小于 `max-model-len`(在您的 vllm 服务器中配置),`32768` 适用于大多数数据集。"
msgstr ""
"`max_out_len`:注意 `max_out_len` + LLM 输入长度应小于 `max_model_len`(在您的 vllm "
"服务器中配置),`32768` 适用于大多数数据集。"

#: ../../source/developer_guide/evaluation/using_ais_bench.md:165
msgid "`batch_size`: Update according to your dataset."
@@ -235,4 +246,7 @@ msgid ""
"You need to manually replace the dataset image paths with absolute paths,"
" changing `/path/to/benchmark/ais_bench/datasets/textvqa/train_images/` "
"to the actual absolute directory where the images are stored:"
msgstr "您需要手动将数据集图像路径替换为绝对路径,将 `/path/to/benchmark/ais_bench/datasets/textvqa/train_images/` 更改为图像存储的实际绝对目录:"
msgstr ""
"您需要手动将数据集图像路径替换为绝对路径,将 "
"`/path/to/benchmark/ais_bench/datasets/textvqa/train_images/` "
"更改为图像存储的实际绝对目录:"
@@ -1,14 +1,8 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -65,8 +59,10 @@ msgid "3. Run GSM8K using EvalScope for accuracy testing"
msgstr "3. 使用 EvalScope 运行 GSM8K 进行精度测试"

#: ../../source/developer_guide/evaluation/using_evalscope.md:68
msgid "You can use `evalscope eval` to run GSM8K for accuracy testing:"
msgstr "你可以使用 `evalscope eval` 运行 GSM8K 进行精度测试:"
msgid ""
"You can use `evalscope eval` to run GSM8K (a grade-school math benchmark "
"dataset) for accuracy testing:"
msgstr "你可以使用 `evalscope eval` 运行 GSM8K(一个小学数学基准数据集)进行精度测试:"

#: ../../source/developer_guide/evaluation/using_evalscope.md:80
#: ../../source/developer_guide/evaluation/using_evalscope.md:117

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -44,9 +44,11 @@ msgid "The vLLM server is started successfully, if you see logs as below:"
msgstr "如果您看到如下日志,则表示 vLLM 服务器已成功启动:"

#: ../../source/developer_guide/evaluation/using_lm_eval.md:46
#: ../../source/developer_guide/evaluation/using_lm_eval.md:175
msgid "2. Run GSM8K using lm-eval for accuracy testing"
msgstr "2. 使用 lm-eval 运行 GSM8K 进行准确率测试"
msgid ""
"2. Run GSM8K using the vLLM server (curl) and then run lm-eval for "
"accuracy testing"
msgstr ""
"2. 使用 vLLM 服务器(curl)运行 GSM8K,然后运行 lm-eval 进行准确率测试"

#: ../../source/developer_guide/evaluation/using_lm_eval.md:48
msgid "You can query the result with input prompts:"
@@ -68,7 +70,10 @@ msgid ""
"may cause lm-eval to download datasets from ModelScope instead of "
"HuggingFace. Setting `USE_MODELSCOPE_HUB=0` disables this behavior so "
"that lm-eval can fetch datasets from HuggingFace correctly."
msgstr "Docker 容器以 `VLLM_USE_MODELSCOPE=True` 启动,这可能导致 lm-eval 从 ModelScope 而非 HuggingFace 下载数据集。设置 `USE_MODELSCOPE_HUB=0` 可禁用此行为,使 lm-eval 能够正确从 HuggingFace 获取数据集。"
msgstr ""
"Docker 容器以 `VLLM_USE_MODELSCOPE=True` 启动,这可能导致 lm-eval 从 ModelScope 而非 "
"HuggingFace 下载数据集。设置 `USE_MODELSCOPE_HUB=0` 可禁用此行为,使 lm-eval 能够正确从 "
"HuggingFace 获取数据集。"

#: ../../source/developer_guide/evaluation/using_lm_eval.md:120
#: ../../source/developer_guide/evaluation/using_lm_eval.md:192
@@ -91,6 +96,10 @@ msgstr "1. 运行 docker 容器"
msgid "You can run docker container on a single NPU:"
msgstr "您可以在单个 NPU 上运行 docker 容器:"

#: ../../source/developer_guide/evaluation/using_lm_eval.md:175
msgid "2. Run GSM8K using lm-eval for accuracy testing"
msgstr "2. 使用 lm-eval 运行 GSM8K 进行准确率测试"

#: ../../source/developer_guide/evaluation/using_lm_eval.md:203
msgid "After 1 to 2 minutes, the output is shown below:"
msgstr "1 到 2 分钟后,输出如下所示:"

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -50,7 +50,9 @@ msgid ""
msgstr "服务器启动后,你可以在新的终端中使用输入提示词来查询模型。"

#: ../../source/developer_guide/evaluation/using_opencompass.md:56
msgid "2. Run C-Eval using OpenCompass for accuracy testing"
msgid ""
"2. Run C-Eval (a Chinese language model evaluation benchmark) using "
"OpenCompass for accuracy testing"
msgstr "2. 使用 OpenCompass 运行 C-Eval 进行准确率测试"

#: ../../source/developer_guide/evaluation/using_opencompass.md:58