[v0.18.0][Doc] Translated Doc files 2026-04-22 (#8565)

## Auto-Translation Summary

Translated **43** file(s):

- <code>docs/source/locale/zh_CN/LC_MESSAGES/community/versioning_policy.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/KV_Cache_Pool_Guide.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/cpu_binding.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/disaggregated_prefill.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/eplb_swift_balancer.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/npugraph_ex.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/patch.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/quantization.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/contribution/index.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/contribution/multi_node_test.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_ais_bench.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_evalscope.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_lm_eval.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_opencompass.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/faqs.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/installation.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/long_sequence_context_parallel_multi_node.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_colocated_mooncake_multi_instance.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_multi_node.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_single_node.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/DeepSeek-V3.1.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/GLM4.x.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/GLM5.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/PaddleOCR-VL.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen-VL-Dense.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-235B-A22B.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3.5-397B-A17B.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3_embedding.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/configuration/additional_config.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/Fine_grained_TP.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/batch_invariance.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/context_parallel.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/cpu_binding.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/dynamic_batch.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/eplb_swift_balancer.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/kv_pool.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/layer_sharding.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/netloader.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/npugraph_ex.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/sleep_mode.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/ucm_deployment.po</code>
- <code>docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/weight_prefetch.po</code>

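These catalogs are standard gettext `.po` files consumed by the Sphinx i18n build of the docs. As an illustrative aside (not part of this change, and assuming the third-party `polib` package is available and the script runs from a vllm-ascend checkout), a catalog's translation coverage can be checked roughly like this:

```python
# Illustrative sketch only -- not part of this commit.
# Assumes the third-party `polib` package (pip install polib) and that the
# script runs from the repository root of a vllm-ascend checkout.
import polib

# One of the catalogs touched by this commit.
PO_PATH = "docs/source/locale/zh_CN/LC_MESSAGES/faqs.po"

po = polib.pofile(PO_PATH)

translated = po.translated_entries()
untranslated = po.untranslated_entries()

print(f"{PO_PATH}: {po.percent_translated()}% translated "
      f"({len(translated)} done, {len(untranslated)} missing)")

# List the first few source strings that still lack a zh_CN translation.
for entry in untranslated[:5]:
    print("-", entry.msgid[:80])
```

The same check applies to any catalog in the list above, since they all follow Sphinx's `locale/zh_CN/LC_MESSAGES/` layout.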
---

[Workflow run](https://github.com/vllm-project/vllm-ascend/actions/runs/24767290887)

Signed-off-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
Co-authored-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
Author: vllm-ascend-ci
Date: 2026-04-23 11:06:05 +08:00 (committed by GitHub)
Parent: 9e31e4f234
Commit: 0c458aa6dc

43 changed files with 1389 additions and 1012 deletions

<code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/long_sequence_context_parallel_multi_node.po</code>

@@ -1,14 +1,7 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2026.
#
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -187,8 +180,8 @@ msgstr "`--tensor-parallel-size` 16 是张量并行TP大小的常见设置
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:305
msgid ""
"`--prefill-context-parallel-size` 2 are common settings for prefill "
"context parallelism (PCP) sizes."
"`--prefill-context-parallel-size` 2 is common setting for prefill context"
" parallelism (PCP) sizes."
msgstr "`--prefill-context-parallel-size` 2 是预填充上下文并行PCP大小的常见设置。"
#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:306

<code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_colocated_mooncake_multi_instance.po</code>

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -40,7 +40,9 @@ msgid ""
"demonstrates how to use vllm-ascend v0.11.0 (with vLLM v0.11.0) on two "
"Atlas 800T A2 nodes to deploy two vLLM instances. Each instance occupies "
"4 NPU cards and uses PD-colocated deployment."
msgstr "本指南以 Qwen2.5-72B-Instruct 模型为例,演示如何在两个 Atlas 800T A2 节点上使用 vllm-ascend v0.11.0(包含 vLLM v0.11.0)部署两个 vLLM 实例。每个实例占用 4 个 NPU 卡,并采用 PD 共置部署。"
msgstr ""
"本指南以 Qwen2.5-72B-Instruct 模型为例,演示如何在两个 Atlas 800T A2 节点上使用 vllm-ascend "
"v0.11.0(包含 vLLM v0.11.0)部署两个 vLLM 实例。每个实例占用 4 个 NPU 卡,并采用 PD 共置部署。"
#: ../../source/tutorials/features/pd_colocated_mooncake_multi_instance.md:14
msgid "Verify Multi-Node Communication Environment"
@@ -128,7 +130,10 @@ msgid ""
"Mooncake is the serving platform for Kimi, a leading LLM service provided"
" by Moonshot AI. Installation and compilation guide: <https://github.com"
"/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries>."
msgstr "Mooncake 是 Kimi 的服务平台Kimi 是由 Moonshot AI 提供的领先 LLM 服务。安装和编译指南:<https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries>。"
msgstr ""
"Mooncake 是 Kimi 的服务平台Kimi 是由 Moonshot AI 提供的领先 LLM "
"服务。安装和编译指南:<https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file"
"#build-and-use-binaries>。"
#: ../../source/tutorials/features/pd_colocated_mooncake_multi_instance.md:121
msgid "First, obtain the Mooncake project using the following command:"
@@ -275,7 +280,10 @@ msgid ""
" cross-node, cross-instance KV Cache. Instance 1 utilizes NPU cards [0-3]"
" on the first Atlas 800T A2 server, while Instance 2 utilizes cards [0-3]"
" on the second server."
msgstr "在节点 1 和节点 2 上分别创建容器,并在每个容器中启动 Qwen2.5-72B-Instruct 模型服务,以测试跨节点、跨实例 KV Cache 的可重用性和性能。实例 1 使用第一个 Atlas 800T A2 服务器上的 NPU 卡 [0-3],而实例 2 使用第二个服务器上的卡 [0-3]。"
msgstr ""
"在节点 1 和节点 2 上分别创建容器,并在每个容器中启动 Qwen2.5-72B-Instruct 模型服务,以测试跨节点、跨实例 KV "
"Cache 的可重用性和性能。实例 1 使用第一个 Atlas 800T A2 服务器上的 NPU 卡 [0-3],而实例 2 "
"使用第二个服务器上的卡 [0-3]。"
#: ../../source/tutorials/features/pd_colocated_mooncake_multi_instance.md:208
msgid "Deploy Instance 1"
@@ -430,9 +438,9 @@ msgstr "步骤 2 的准备工作"
#: ../../source/tutorials/features/pd_colocated_mooncake_multi_instance.md:285
msgid ""
"Before Step 2, send a fully random Dataset B to Instance 1. Due to the "
"unified HBM/DRAM KV Cache with LRU (Least Recently Used) eviction policy,"
" Dataset B's cache evicts Dataset A's cache from HBM, leaving Dataset A's"
" cache only in Node 1's DRAM."
"unified on-chip memory/DRAM KV Cache with LRU (Least Recently Used) "
"eviction policy, Dataset B's cache evicts Dataset A's cache from on-chip "
"memory, leaving Dataset A's cache only in Node 1's DRAM."
msgstr "在步骤2之前向实例1发送一个完全随机的数据集B。由于采用了具有LRU最近最少使用淘汰策略的统一HBM/DRAM KV缓存数据集B的缓存会将数据集A的缓存从HBM中淘汰使得数据集A的缓存仅保留在节点1的DRAM中。"
#: ../../source/tutorials/features/pd_colocated_mooncake_multi_instance.md:290

<code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_multi_node.po</code>

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -40,7 +40,7 @@ msgid ""
"servers to deploy the \"2P1D\" architecture. Assume the IP of the "
"prefiller server is 192.0.0.1 (prefill 1) and 192.0.0.2 (prefill 2), and "
"the decoder servers are 192.0.0.3 (decoder 1) and 192.0.0.4 (decoder 2). "
"On each server, use 8 NPUs 16 chips to deploy one service instance."
"On each server, use 8 NPUs and 16 chips to deploy one service instance."
msgstr ""
"以 Deepseek-r1-w8a8 模型为例,使用 4 台 Atlas 800T A3 服务器部署 \"2P1D\" 架构。假设预填充服务器 "
"IP 为 192.0.0.1(预填充节点 1和 192.0.0.2(预填充节点 2解码服务器 IP 为 192.0.0.3(解码节点 1和"

<code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_single_node.po</code>

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -30,16 +30,17 @@ msgstr "开始使用"
#: ../../source/tutorials/features/pd_disaggregation_mooncake_single_node.md:5
msgid ""
"vLLM-Ascend now supports prefill-decode (PD) disaggregation. This guide "
"takes one-by-one steps to verify these features with constrained "
"resources."
msgstr "vLLM-Ascend 现已支持预填充-解码 (PD) 解耦架构。本指南将逐步引导您在有限资源下验证这些功能。"
"provides step-by-step instructions to verify this features in resource-"
"constrained environments."
msgstr "vLLM-Ascend 现已支持预填充-解码 (PD) 解耦架构。本指南提供逐步说明,帮助您在资源受限的环境中验证这些功能。"
#: ../../source/tutorials/features/pd_disaggregation_mooncake_single_node.md:7
msgid ""
"Using the Qwen2.5-VL-7B-Instruct model as an example, use vLLM-Ascend "
"Using the Qwen2.5-VL-7B-Instruct model as an example, use vllm-ascend "
"v0.11.0rc1 (with vLLM v0.11.0) on 1 Atlas 800T A2 server to deploy the "
"\"1P1D\" architecture. Assume the IP address is 192.0.0.1."
msgstr "以 Qwen2.5-VL-7B-Instruct 模型为例,在 1 台 Atlas 800T A2 服务器上使用 vLLM-Ascend v0.11.0rc1 (包含 vLLM v0.11.0) 部署 \"1P1D\" 架构。假设 IP 地址为 192.0.0.1"
"\"1P1D\" architecture (one Prefiller and one Decoder on the same node). "
"Assume the IP address is 192.0.0.1."
msgstr "以 Qwen2.5-VL-7B-Instruct 模型为例,在 1 台 Atlas 800T A2 服务器上使用 vllm-ascend v0.11.0rc1(包含 vLLM v0.11.0)部署 \"1P1D\" 架构(同一节点上一个预填充器和一个解码器)。假设 IP 地址为 192.0.0.1。"
#: ../../source/tutorials/features/pd_disaggregation_mooncake_single_node.md:9
msgid "Verify Communication Environment"

<code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/DeepSeek-V3.1.po</code>

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -32,32 +32,25 @@ msgid ""
"DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-"
"thinking mode. Compared to the previous version, this upgrade brings "
"improvements in multiple aspects:"
msgstr ""
"DeepSeek-V3.1 是一个支持思考模式和非思考模式的混合模型。与前一版本相比,此"
"次升级在多个方面带来了改进:"
msgstr "DeepSeek-V3.1 是一个支持思考模式和非思考模式的混合模型。与前一版本相比,此次升级在多个方面带来了改进:"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:7
msgid ""
"Hybrid thinking mode: One model supports both thinking mode and non-"
"thinking mode by changing the chat template."
msgstr ""
"混合思考模式:一个模型通过更改聊天模板,同时支持思考模式和非思考模式。"
msgstr "混合思考模式:一个模型通过更改聊天模板,同时支持思考模式和非思考模式。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:9
msgid ""
"Smarter tool calling: Through post-training optimization, the model's "
"performance in tool usage and agent tasks has significantly improved."
msgstr ""
"更智能的工具调用:通过后训练优化,模型在工具使用和智能体任务方面的性能显著提"
"升。"
msgstr "更智能的工具调用:通过后训练优化,模型在工具使用和智能体任务方面的性能显著提升。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:11
msgid ""
"Higher thinking efficiency: DeepSeek-V3.1-Think achieves comparable "
"answer quality to DeepSeek-R1-0528, while responding more quickly."
msgstr ""
"更高的思考效率DeepSeek-V3.1-Think 实现了与 DeepSeek-R1-0528 相当的答案质"
"量,同时响应速度更快。"
msgstr "更高的思考效率DeepSeek-V3.1-Think 实现了与 DeepSeek-R1-0528 相当的答案质量,同时响应速度更快。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:13
msgid "The `DeepSeek-V3.1` model is first supported in `vllm-ascend:v0.9.1rc3`."
@@ -69,9 +62,7 @@ msgid ""
"including supported features, feature configuration, environment "
"preparation, single-node and multi-node deployment, accuracy and "
"performance evaluation."
msgstr ""
"本文档将展示该模型的主要验证步骤,包括支持的特性、特性配置、环境准备、单节点"
"和多节点部署、精度和性能评估。"
msgstr "本文档将展示该模型的主要验证步骤,包括支持的特性、特性配置、环境准备、单节点和多节点部署、精度和性能评估。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:17
msgid "Supported Features"
@@ -90,9 +81,7 @@ msgstr ""
msgid ""
"Refer to [feature guide](../../user_guide/feature_guide/index.md) to get "
"the feature's configuration."
msgstr ""
"请参考 [特性指南](../../user_guide/feature_guide/index.md) 以获取特性的配"
"置。"
msgstr "请参考 [特性指南](../../user_guide/feature_guide/index.md) 以获取特性的配置。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:23
msgid "Environment Preparation"
@@ -107,8 +96,8 @@ msgid ""
"`DeepSeek-V3.1`(BF16 version): [Download model "
"weight](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-V3.1)."
msgstr ""
"`DeepSeek-V3.1`BF16 版本):[下载模型权重](https://www.modelscope.cn/"
"models/deepseek-ai/DeepSeek-V3.1)。"
"`DeepSeek-V3.1`BF16 版本):[下载模型权重](https://www.modelscope.cn/models"
"/deepseek-ai/DeepSeek-V3.1)。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:28
msgid ""
@@ -116,9 +105,9 @@ msgid ""
"[Download model weight](https://www.modelscope.cn/models/Eco-"
"Tech/DeepSeek-V3.1-w8a8-mtp-QuaRot)."
msgstr ""
"`DeepSeek-V3.1-w8a8-mtp-QuaRot`(混合 MTP 量化版本):[下载模型权重]"
"(https://www.modelscope.cn/models/Eco-Tech/DeepSeek-V3.1-w8a8-mtp-"
"QuaRot)。"
"`DeepSeek-V3.1-w8a8-mtp-QuaRot`(混合 MTP "
"量化版本):[下载模型权重](https://www.modelscope.cn/models/Eco-"
"Tech/DeepSeek-V3.1-w8a8-mtp-QuaRot)。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:29
msgid ""
@@ -126,9 +115,9 @@ msgid ""
" [Download model weight](https://www.modelscope.cn/models/Eco-"
"Tech/DeepSeek-V3.1-Terminus-w4a8-mtp-QuaRot)."
msgstr ""
"`DeepSeek-V3.1-Terminus-w4a8-mtp-QuaRot`(混合 MTP 量化版本):[下载模型权"
"重](https://www.modelscope.cn/models/Eco-Tech/DeepSeek-V3.1-Terminus-w4a8-"
"mtp-QuaRot)。"
"`DeepSeek-V3.1-Terminus-w4a8-mtp-QuaRot`(混合 MTP "
"量化版本):[下载模型权重](https://www.modelscope.cn/models/Eco-Tech/DeepSeek-V3.1"
"-Terminus-w4a8-mtp-QuaRot)。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:30
#, python-format
@@ -137,8 +126,7 @@ msgid ""
"[msmodelslim](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/DeepSeek/README.md#deepseek-v31-w8a8-%E6%B7%B7%E5%90%88%E9%87%8F%E5%8C%96-mtp-%E9%87%8F%E5%8C%96)."
" You can use this method to quantize the model."
msgstr ""
"`量化方法`"
"[msmodelslim](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/DeepSeek/README.md#deepseek-v31-w8a8-%E6%B7%B7%E5%90%88%E9%87%8F%E5%8C%96-mtp-%E9%87%8F%E5%8C%96)。"
"`量化方法`[msmodelslim](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/DeepSeek/README.md#deepseek-v31-w8a8-%E6%B7%B7%E5%90%88%E9%87%8F%E5%8C%96-mtp-%E9%87%8F%E5%8C%96)。"
" 您可以使用此方法对模型进行量化。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:32
@@ -157,8 +145,8 @@ msgid ""
"node communication according to [verify multi-node communication "
"environment](../../installation.md#verify-multi-node-communication)."
msgstr ""
"如果您想部署多节点环境,需要根据 [验证多节点通信环境](../../installation."
"md#verify-multi-node-communication) 验证多节点通信。"
"如果您想部署多节点环境,需要根据 [验证多节点通信环境](../../installation.md#verify-multi-node-"
"communication) 验证多节点通信。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:38
msgid "Installation"
@@ -174,8 +162,8 @@ msgid ""
"your node, refer to [using docker](../../installation.md#set-up-using-"
"docker)."
msgstr ""
"根据您的机器类型选择镜像并在节点上启动 docker 镜像,请参考 [使用 docker]"
"(../../installation.md#set-up-using-docker)。"
"根据您的机器类型选择镜像并在节点上启动 docker 镜像,请参考 [使用 docker](../../installation.md#set-"
"up-using-docker)。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:80
msgid ""
@@ -195,9 +183,7 @@ msgstr "单节点部署"
msgid ""
"Quantized model `DeepSeek-V3.1-w8a8-mtp-QuaRot` can be deployed on 1 "
"Atlas 800 A3 (64G × 16)."
msgstr ""
"量化模型 `DeepSeek-V3.1-w8a8-mtp-QuaRot` 可以部署在 1 台 Atlas 800 A3 "
"64G × 16上。"
msgstr "量化模型 `DeepSeek-V3.1-w8a8-mtp-QuaRot` 可以部署在 1 台 Atlas 800 A3 64G × 16上。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:88
msgid "Run the following script to execute online inference."
@@ -215,9 +201,8 @@ msgid ""
" Furthermore, enabling this feature is not recommended in scenarios where"
" PD is separated."
msgstr ""
"设置环境变量 `VLLM_ASCEND_BALANCE_SCHEDULING=1` 启用均衡调度。这可能有助于"
"在 v1 调度器中提高输出吞吐量并降低 TPOT。然而在某些场景下 TTFT 可能会下"
"降。此外,在 PD 分离的场景中不建议启用此功能。"
"设置环境变量 `VLLM_ASCEND_BALANCE_SCHEDULING=1` 启用均衡调度。这可能有助于在 v1 "
"调度器中提高输出吞吐量并降低 TPOT。然而在某些场景下 TTFT 可能会下降。此外,在 PD 分离的场景中不建议启用此功能。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:135
msgid ""
@@ -233,24 +218,20 @@ msgid ""
"`16384` is sufficient, however, for precision testing, please set it at "
"least `35000`."
msgstr ""
"`--max-model-len` 指定最大上下文长度——即单个请求的输入和输出令牌之和。对于输"
"入长度为 3.5K 和输出长度为 1.5K 的性能测试,`16384` 的值就足够了,但是,对于"
"精度测试,请至少将其设置为 `35000`。"
"`--max-model-len` 指定最大上下文长度——即单个请求的输入和输出令牌之和。对于输入长度为 3.5K 和输出长度为 1.5K "
"的性能测试,`16384` 的值就足够了,但是,对于精度测试,请至少将其设置为 `35000`。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:137
msgid ""
"`--no-enable-prefix-caching` indicates that prefix caching is disabled. "
"To enable it, remove this option."
msgstr ""
"`--no-enable-prefix-caching` 表示前缀缓存被禁用。要启用它,请移除此选项。"
msgstr "`--no-enable-prefix-caching` 表示前缀缓存被禁用。要启用它,请移除此选项。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:138
msgid ""
"If you use the w4a8 weight, more memory will be allocated to kvcache, and"
" you can try to increase system throughput to achieve greater throughput."
msgstr ""
"如果使用 w4a8 权重,将分配更多内存给 kvcache您可以尝试增加系统吞吐量以实现"
"更大的吞吐量。"
msgstr "如果使用 w4a8 权重,将分配更多内存给 kvcache您可以尝试增加系统吞吐量以实现更大的吞吐量。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:140
msgid "Multi-node Deployment"
@@ -260,8 +241,7 @@ msgstr "多节点部署"
msgid ""
"`DeepSeek-V3.1-w8a8-mtp-QuaRot`: require at least 2 Atlas 800 A2 (64G × "
"8)."
msgstr ""
"`DeepSeek-V3.1-w8a8-mtp-QuaRot`:需要至少 2 台 Atlas 800 A264G × 8。"
msgstr "`DeepSeek-V3.1-w8a8-mtp-QuaRot`:需要至少 2 台 Atlas 800 A264G × 8"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:144
msgid "Run the following scripts on two nodes respectively."
@@ -284,8 +264,8 @@ msgid ""
"We recommend using Mooncake for deployment: "
"[Mooncake](../features/pd_disaggregation_mooncake_multi_node.md)."
msgstr ""
"我们建议使用 Mooncake 进行部署:[Mooncake](../features/"
"pd_disaggregation_mooncake_multi_node.md)。"
"我们建议使用 Mooncake "
"进行部署:[Mooncake](../features/pd_disaggregation_mooncake_multi_node.md)。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:256
msgid ""
@@ -293,27 +273,27 @@ msgid ""
"nodes) rather than 1P1D (2 nodes), because there is no enough NPU memory "
"to serve high concurrency in 1P1D case."
msgstr ""
"以 Atlas 800 A364G × 16为例我们建议部署 2P1D4 个节点)而不是 1P1D"
"2 个节点),因为在 1P1D 情况下没有足够的 NPU 内存来服务高并发。"
"以 Atlas 800 A364G × 16为例我们建议部署 2P1D4 个节点)而不是 1P1D2 个节点),因为在 1P1D "
"情况下没有足够的 NPU 内存来服务高并发。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:258
msgid ""
"`DeepSeek-V3.1-w8a8-mtp-QuaRot 2P1D Layerwise` require 4 Atlas 800 A3 "
"(64G × 16)."
msgstr ""
"`DeepSeek-V3.1-w8a8-mtp-QuaRot 2P1D Layerwise` 需要 4 台 Atlas 800 A3 "
"64G × 16。"
"`DeepSeek-V3.1-w8a8-mtp-QuaRot 2P1D Layerwise` 需要 4 台 Atlas 800 A3 64G ×"
" 16。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:260
msgid ""
"To run the vllm-ascend `Prefill-Decode Disaggregation` service, you need "
"to deploy a `launch_dp_program.py` script and a `run_dp_template.sh` "
"to deploy a `launch_online_dp.py` script and a `run_dp_template.sh` "
"script on each node and deploy a `proxy.sh` script on prefill master node"
" to forward requests."
msgstr ""
"要运行 vllm-ascend `Prefill-Decode 解耦`服务,您需要在每个节点上部署一个 "
"`launch_dp_program.py` 脚本和一个 `run_dp_template.sh` 脚本,并在 prefill "
"主节点上部署一个 `proxy.sh` 脚本来转发请求。"
"`launch_online_dp.py` 脚本和一个 `run_dp_template.sh` 脚本,并在 prefill 主节点上部署一个 "
"`proxy.sh` 脚本来转发请求。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:262
msgid ""
@@ -321,9 +301,9 @@ msgid ""
"[launch\\_online\\_dp.py](https://github.com/vllm-project/vllm-"
"ascend/blob/main/examples/external_online_dp/launch_online_dp.py)"
msgstr ""
"`launch_online_dp.py` 用于启动外部 dp vllm 服务器。[launch\\_online\\_dp."
"py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/"
"external_online_dp/launch_online_dp.py)"
"`launch_online_dp.py` 用于启动外部 dp vllm "
"服务器。[launch\\_online\\_dp.py](https://github.com/vllm-project/vllm-"
"ascend/blob/main/examples/external_online_dp/launch_online_dp.py)"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:265
msgid "Prefill Node 0 `run_dp_template.sh` script"
@@ -358,8 +338,8 @@ msgid ""
"Prefill-Decode (PD) separation scenario, enable MLAPO only on decode "
"nodes."
msgstr ""
"`VLLM_ASCEND_ENABLE_MLAPO=1`:启用融合算子,这可以显著提高性能但会消耗更多 "
"NPU 内存。在 Prefill-Decode (PD) 分离场景中,仅在 decode 节点上启用 MLAPO。"
"`VLLM_ASCEND_ENABLE_MLAPO=1`:启用融合算子,这可以显著提高性能但会消耗更多 NPU 内存。在 Prefill-"
"Decode (PD) 分离场景中,仅在 decode 节点上启用 MLAPO。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:576
msgid ""
@@ -367,9 +347,7 @@ msgid ""
"Multi-Token Prediction (MTP) is enabled, asynchronous scheduling of "
"operator delivery can be implemented to overlap the operator delivery "
"latency."
msgstr ""
"`--async-scheduling`:启用异步调度功能。当启用多令牌预测 (MTP) 时,可以实现算"
"子交付的异步调度,以重叠算子交付延迟。"
msgstr "`--async-scheduling`:启用异步调度功能。当启用多令牌预测 (MTP) 时,可以实现算子交付的异步调度,以重叠算子交付延迟。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:577
msgid ""
@@ -378,9 +356,8 @@ msgid ""
"it is recommended to set them to the number of frequently occurring "
"requests on the Decode (D) node."
msgstr ""
"`cudagraph_capture_sizes`:推荐值为 `n x (mtp + 1)`。最小值为 `n = 1`,最大"
"值为 `n = max-num-seqs`。对于其他值,建议将其设置为 Decode (D) 节点上频繁出"
"现的请求数量。"
"`cudagraph_capture_sizes`:推荐值为 `n x (mtp + 1)`。最小值为 `n = 1`,最大值为 `n = "
"max-num-seqs`。对于其他值,建议将其设置为 Decode (D) 节点上频繁出现的请求数量。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:578
msgid ""
@@ -390,9 +367,9 @@ msgid ""
"the PD separation scenario, it is recommended to enable this "
"configuration on both prefill and decode nodes simultaneously."
msgstr ""
"`recompute_scheduler_enable: true`:启用重计算调度器。当 decode 节点的键值缓"
"存 (KV Cache) 不足时,请求将被发送到 prefill 节点以重新计算 KV Cache。在 PD "
"分离场景中,建议同时在 prefill 和 decode 节点上启用此配置。"
"`recompute_scheduler_enable: true`:启用重计算调度器。当 decode 节点的键值缓存 (KV Cache) "
"不足时,请求将被发送到 prefill 节点以重新计算 KV Cache。在 PD 分离场景中,建议同时在 prefill 和 decode "
"节点上启用此配置。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:579
msgid ""
@@ -402,8 +379,7 @@ msgid ""
"improved efficiency."
msgstr ""
"`multistream_overlap_shared_expert: true`:当张量并行 (TP) 大小为 1 或 "
"`enable_shared_expert_dp: true` 时,启用额外的流来重叠共享专家的计算过程,以"
"提高效率。"
"`enable_shared_expert_dp: true` 时,启用额外的流来重叠共享专家的计算过程,以提高效率。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:580
msgid ""
@@ -412,9 +388,8 @@ msgid ""
"embedding layer to be greater than 1, which is used to reduce the "
"computational load of each card on the LMHead embedding layer."
msgstr ""
"`lmhead_tensor_parallel_size: 16`:当 decode 节点的张量并行 (TP) 大小为 1 "
"时,此参数允许 LMHead 嵌入层的 TP 大小大于 1用于减少每张卡在 LMHead 嵌入层"
"上的计算负载。"
"`lmhead_tensor_parallel_size: 16`:当 decode 节点的张量并行 (TP) 大小为 1 时,此参数允许 "
"LMHead 嵌入层的 TP 大小大于 1用于减少每张卡在 LMHead 嵌入层上的计算负载。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:582
msgid "run server for each node:"
@@ -431,7 +406,10 @@ msgid ""
"[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-"
"project/vllm-"
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
msgstr "在与预填充服务实例相同的节点上运行一个代理服务器。您可以在仓库的示例中找到代理程序:[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
msgstr ""
"在与预填充服务实例相同的节点上运行一个代理服务器。您可以在仓库的示例中找到代理程序:[load\\_balance\\_proxy\\_server\\_example.py](https://github.com"
"/vllm-project/vllm-"
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:653
msgid "Functional Verification"
@@ -466,7 +444,9 @@ msgid ""
"After execution, you can get the result, here is the result of "
"`DeepSeek-V3.1-w8a8-mtp-QuaRot` in `vllm-ascend:0.11.0rc1` for reference "
"only."
msgstr "执行后,您可以获得结果。以下是 `vllm-ascend:0.11.0rc1` 中 `DeepSeek-V3.1-w8a8-mtp-QuaRot` 的结果,仅供参考。"
msgstr ""
"执行后,您可以获得结果。以下是 `vllm-ascend:0.11.0rc1` 中 `DeepSeek-V3.1-w8a8-mtp-QuaRot`"
" 的结果,仅供参考。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:44
msgid "dataset"
@@ -541,7 +521,10 @@ msgid ""
"Refer to [Using AISBench for performance "
"evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-"
"performance-evaluation) for details."
msgstr "详情请参考[使用 AISBench 进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation)。"
msgstr ""
"详情请参考[使用 AISBench "
"进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-"
"performance-evaluation)。"
#: ../../source/tutorials/models/DeepSeek-V3.1.md:693
msgid "The performance result is:"

<code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/GLM4.x.po</code>

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -74,41 +74,56 @@ msgstr "模型权重"
msgid ""
"`GLM-4.5`(BF16 version): [Download model "
"weight](https://www.modelscope.cn/models/ZhipuAI/GLM-4.5)."
msgstr "`GLM-4.5`BF16 版本):[下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-4.5)。"
msgstr ""
"`GLM-4.5`BF16 "
"版本):[下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-4.5)。"
#: ../../source/tutorials/models/GLM4.x.md:22
msgid ""
"`GLM-4.6`(BF16 version): [Download model "
"weight](https://www.modelscope.cn/models/ZhipuAI/GLM-4.6)."
msgstr "`GLM-4.6`BF16 版本):[下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-4.6)。"
msgstr ""
"`GLM-4.6`BF16 "
"版本):[下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-4.6)。"
#: ../../source/tutorials/models/GLM4.x.md:23
msgid ""
"`GLM-4.7`(BF16 version): [Download model "
"weight](https://www.modelscope.cn/models/ZhipuAI/GLM-4.7)."
msgstr "`GLM-4.7`BF16 版本):[下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-4.7)。"
msgstr ""
"`GLM-4.7`BF16 "
"版本):[下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-4.7)。"
#: ../../source/tutorials/models/GLM4.x.md:24
msgid ""
"`GLM-4.5-w8a8-with-float-mtp`(Quantized version with mtp): [Download "
"model weight](https://modelers.cn/models/Modelers_Park/GLM-4.5-w8a8)."
msgstr "`GLM-4.5-w8a8-with-float-mtp`(带 mtp 的量化版本):[下载模型权重](https://modelers.cn/models/Modelers_Park/GLM-4.5-w8a8)。"
msgstr ""
"`GLM-4.5-w8a8-with-float-mtp`(带 mtp "
"的量化版本):[下载模型权重](https://modelers.cn/models/Modelers_Park/GLM-4.5-w8a8)。"
#: ../../source/tutorials/models/GLM4.x.md:25
msgid ""
"`GLM-4.6-w8a8`(Quantized version without mtp): [Download model "
"weight](https://modelers.cn/models/Modelers_Park/GLM-4.6-w8a8). Because "
"vllm do not support GLM4.6 mtp in October, so we do not provide mtp "
"version. And last month, it supported, you can use the following "
"quantization scheme to add mtp weights to Quantized weights."
msgstr "`GLM-4.6-w8a8`(不带 mtp 的量化版本):[下载模型权重](https://modelers.cn/models/Modelers_Park/GLM-4.6-w8a8)。由于 vllm 在十月份不支持 GLM4.6 的 mtp因此我们不提供 mtp 版本。上个月已支持,您可以使用以下量化方案将 mtp 权重添加到量化权重中。"
"vllm does not support GLM4.6 mtp in October, we do not provide an mtp "
"version. Last month, it was supported; you can use the following "
"quantization scheme to add mtp weights to the quantized weights."
msgstr ""
"`GLM-4.6-w8a8`(不带 mtp "
"的量化版本):[下载模型权重](https://modelers.cn/models/Modelers_Park/GLM-4.6-w8a8)。由于"
" vllm 在十月份不支持 GLM4.6 的 mtp因此我们不提供 mtp 版本。上个月已支持,您可以使用以下量化方案将 mtp "
"权重添加到量化权重中。"
#: ../../source/tutorials/models/GLM4.x.md:26
msgid ""
"`GLM-4.7-w8a8-with-float-mtp`(Quantized version without mtp): [Download "
"model weight](https://modelscope.cn/models/Eco-"
"Tech/GLM-4.7-W8A8-floatmtp)."
msgstr "`GLM-4.7-w8a8-with-float-mtp`(不带 mtp 的量化版本):[下载模型权重](https://modelscope.cn/models/Eco-Tech/GLM-4.7-W8A8-floatmtp)。"
msgstr ""
"`GLM-4.7-w8a8-with-float-mtp`(不带 mtp "
"的量化版本):[下载模型权重](https://modelscope.cn/models/Eco-"
"Tech/GLM-4.7-W8A8-floatmtp)。"
#: ../../source/tutorials/models/GLM4.x.md:27
msgid ""
@@ -136,14 +151,17 @@ msgid "A3 series"
msgstr "A3 系列"
#: ../../source/tutorials/models/GLM4.x.md:42
#: ../../source/tutorials/models/GLM4.x.md:85
msgid "Start the docker image on your each node."
msgstr "在您的每个节点上启动 docker 镜像。"
msgid "Start the docker image on each node."
msgstr "在每个节点上启动 docker 镜像。"
#: ../../source/tutorials/models/GLM4.x.md
msgid "A2 series"
msgstr "A2 系列"
#: ../../source/tutorials/models/GLM4.x.md:85
msgid "Start the docker image on your each node."
msgstr "在每个节点上启动 docker 镜像。"
#: ../../source/tutorials/models/GLM4.x.md:118
msgid ""
"In addition, if you don't want to use the docker image as above, you can "
@@ -180,7 +198,12 @@ msgid ""
"The optimization of the FIA operator will be enabled by default in CANN "
"9.x releases, and manual replacement will no longer be required. Please "
"stay tuned for updates to this document."
msgstr "我们已在 CANN 8.5.1 中优化了 FIA 算子。需要手动替换与 FIA 算子相关的文件。请执行 FIA 算子替换脚本:[A2](../../../../tools/install_flash_infer_attention_score_ops_a2.sh) 和 [A3](../../../../tools/install_flash_infer_attention_score_ops_a3.sh)。FIA 算子的优化将在 CANN 9.x 版本中默认启用,届时将不再需要手动替换。请关注本文档的更新。"
msgstr ""
"我们已在 CANN 8.5.1 中优化了 FIA 算子。需要手动替换与 FIA 算子相关的文件。请执行 FIA "
"算子替换脚本:[A2](../../../../tools/install_flash_infer_attention_score_ops_a2.sh)"
" 和 "
"[A3](../../../../tools/install_flash_infer_attention_score_ops_a3.sh)。FIA"
" 算子的优化将在 CANN 9.x 版本中默认启用,届时将不再需要手动替换。请关注本文档的更新。"
#: ../../source/tutorials/models/GLM4.x.md:132
msgid "Single-node Deployment"
@@ -194,144 +217,155 @@ msgstr "在低延迟场景下,我们推荐单机部署。"
msgid ""
"Quantized model `glm4.7_w8a8_with_float_mtp` can be deployed on 1 Atlas "
"800 A3 (64G × 16) or 1 Atlas 800 A2 (64G × 8)."
msgstr "量化模型 `glm4.7_w8a8_with_float_mtp` 可以部署在 1 台 Atlas 800 A364G × 16或 1 台 Atlas 800 A264G × 8上。"
msgstr ""
"量化模型 `glm4.7_w8a8_with_float_mtp` 可以部署在 1 台 Atlas 800 A364G × 16或 1 台 "
"Atlas 800 A264G × 8上。"
#: ../../source/tutorials/models/GLM4.x.md:137
msgid "Run the following script to execute online inference."
msgstr "运行以下脚本以执行在线推理。"
#: ../../source/tutorials/models/GLM4.x.md:169
#: ../../source/tutorials/models/GLM4.x.md:168
msgid "**Notice:** The parameters are explained as follows:"
msgstr "**注意:** 参数解释如下:"
#: ../../source/tutorials/models/GLM4.x.md:172
#: ../../source/tutorials/models/GLM4.x.md:171
msgid ""
"`--async-scheduling` Asynchronous scheduling is a technique used to "
"optimize inference efficiency. It allows non-blocking task scheduling to "
"improve concurrency and throughput, especially when processing large-"
"scale models."
msgstr "`--async-scheduling` 异步调度是一种用于优化推理效率的技术。它允许非阻塞的任务调度,以提高并发性和吞吐量,特别是在处理大规模模型时。"
msgstr ""
"`--async-scheduling` "
"异步调度是一种用于优化推理效率的技术。它允许非阻塞的任务调度,以提高并发性和吞吐量,特别是在处理大规模模型时。"
#: ../../source/tutorials/models/GLM4.x.md:173
#: ../../source/tutorials/models/GLM4.x.md:172
msgid ""
"`fusion_ops_gmmswigluquant` The performance of the GmmSwigluQuant fusion "
"operator tends to degrade when the total number of NPUs is ≤ 16."
msgstr "`fusion_ops_gmmswigluquant` 当 NPU 总数 ≤ 16 时GmmSwigluQuant 融合算子的性能往往会下降。"
#: ../../source/tutorials/models/GLM4.x.md:175
#: ../../source/tutorials/models/GLM4.x.md:174
msgid "Multi-node Deployment"
msgstr "多节点部署"
#: ../../source/tutorials/models/GLM4.x.md:177
#: ../../source/tutorials/models/GLM4.x.md:176
msgid ""
"Although the former tutorial said \"Not recommended to deploy multi-node "
"on Atlas 800 A2 (64G × 8)\", but if you insist to deploy GLM-4.x model on"
" multi-node like 2 × Atlas 800 A2 (64G × 8), run the following scripts on"
" two nodes respectively."
msgstr "尽管之前的教程提到“不建议在 Atlas 800 A264G × 8上部署多节点”但如果您坚持要在类似 2 × Atlas 800 A264G × 8的多节点上部署 GLM-4.x 模型,请分别在两个节点上运行以下脚本。"
msgstr ""
"尽管之前的教程提到“不建议在 Atlas 800 A264G × 8上部署多节点”但如果您坚持要在类似 2 × Atlas 800 "
"A264G × 8的多节点上部署 GLM-4.x 模型,请分别在两个节点上运行以下脚本。"
#: ../../source/tutorials/models/GLM4.x.md:179
#: ../../source/tutorials/models/GLM4.x.md:178
msgid "**Node 0**"
msgstr "**节点 0**"
#: ../../source/tutorials/models/GLM4.x.md:230
#: ../../source/tutorials/models/GLM4.x.md:228
msgid "**Node 1**"
msgstr "**节点 1**"
#: ../../source/tutorials/models/GLM4.x.md:283
#: ../../source/tutorials/models/GLM4.x.md:280
msgid "Prefill-Decode Disaggregation"
msgstr "Prefill-Decode 解耦部署"
#: ../../source/tutorials/models/GLM4.x.md:285
#: ../../source/tutorials/models/GLM4.x.md:282
msgid ""
"We'd like to show the deployment guide of `GLM4.7` on multi-node "
"environment with 2P1D for better performance."
msgstr "我们将展示 `GLM4.7` 在多节点环境2P1D下的部署指南以获得更好的性能。"
#: ../../source/tutorials/models/GLM4.x.md:287
#: ../../source/tutorials/models/GLM4.x.md:284
msgid "Before you start, please"
msgstr "在开始之前,请"
#: ../../source/tutorials/models/GLM4.x.md:289
#: ../../source/tutorials/models/GLM4.x.md:286
msgid "prepare the script `launch_online_dp.py` on each node:"
msgstr "在每个节点上准备脚本 `launch_online_dp.py`"
#: ../../source/tutorials/models/GLM4.x.md:392
#: ../../source/tutorials/models/GLM4.x.md:389
msgid "prepare the script `run_dp_template.sh` on each node."
msgstr "在每个节点上准备脚本 `run_dp_template.sh`。"
#: ../../source/tutorials/models/GLM4.x.md:394
#: ../../source/tutorials/models/GLM4.x.md:669
#: ../../source/tutorials/models/GLM4.x.md:391
#: ../../source/tutorials/models/GLM4.x.md:664
msgid "Prefill node 0"
msgstr "Prefill 节点 0"
#: ../../source/tutorials/models/GLM4.x.md:460
#: ../../source/tutorials/models/GLM4.x.md:676
#: ../../source/tutorials/models/GLM4.x.md:456
#: ../../source/tutorials/models/GLM4.x.md:671
msgid "Prefill node 1"
msgstr "Prefill 节点 1"
#: ../../source/tutorials/models/GLM4.x.md:525
#: ../../source/tutorials/models/GLM4.x.md:683
#: ../../source/tutorials/models/GLM4.x.md:520
#: ../../source/tutorials/models/GLM4.x.md:678
msgid "Decode node 0"
msgstr "Decode 节点 0"
#: ../../source/tutorials/models/GLM4.x.md:596
#: ../../source/tutorials/models/GLM4.x.md:690
#: ../../source/tutorials/models/GLM4.x.md:591
#: ../../source/tutorials/models/GLM4.x.md:685
msgid "Decode node 1"
msgstr "Decode 节点 1"
#: ../../source/tutorials/models/GLM4.x.md:667
#: ../../source/tutorials/models/GLM4.x.md:662
msgid ""
"Once the preparation is done, you can start the server with the following"
" command on each node:"
msgstr "准备工作完成后,您可以在每个节点上使用以下命令启动服务器:"
#: ../../source/tutorials/models/GLM4.x.md:697
#: ../../source/tutorials/models/GLM4.x.md:692
msgid "Request Forwarding"
msgstr "请求转发"
#: ../../source/tutorials/models/GLM4.x.md:699
#: ../../source/tutorials/models/GLM4.x.md:694
msgid ""
"To set up request forwarding, run the following script on any machine. "
"You can get the proxy program in the repository's examples: "
"[load_balance_proxy_server_example.py](https://github.com/vllm-project"
"/vllm-"
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
msgstr "要设置请求转发,请在任何机器上运行以下脚本。您可以在仓库的示例中找到代理程序:[load_balance_proxy_server_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
msgstr ""
"要设置请求转发,请在任何机器上运行以下脚本。您可以在仓库的示例中找到代理程序:[load_balance_proxy_server_example.py](https://github.com"
"/vllm-project/vllm-"
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
#: ../../source/tutorials/models/GLM4.x.md:728
#: ../../source/tutorials/models/GLM4.x.md:723
msgid "Functional Verification"
msgstr "功能验证"
#: ../../source/tutorials/models/GLM4.x.md:730
#: ../../source/tutorials/models/GLM4.x.md:725
msgid "Once your server is started, you can query the model with input prompts:"
msgstr "服务器启动后,您可以使用输入提示词查询模型:"
#: ../../source/tutorials/models/GLM4.x.md:749
#: ../../source/tutorials/models/GLM4.x.md:744
msgid "Accuracy Evaluation"
msgstr "精度评估"
#: ../../source/tutorials/models/GLM4.x.md:751
#: ../../source/tutorials/models/GLM4.x.md:746
msgid "Here are two accuracy evaluation methods."
msgstr "这里有两种精度评估方法。"
#: ../../source/tutorials/models/GLM4.x.md:753
#: ../../source/tutorials/models/GLM4.x.md:770
#: ../../source/tutorials/models/GLM4.x.md:748
#: ../../source/tutorials/models/GLM4.x.md:765
msgid "Using AISBench"
msgstr "使用 AISBench"
#: ../../source/tutorials/models/GLM4.x.md:755
#: ../../source/tutorials/models/GLM4.x.md:750
msgid ""
"Refer to [Using "
"AISBench](../../developer_guide/evaluation/using_ais_bench.md) for "
"details."
msgstr "详情请参考[使用 AISBench](../../developer_guide/evaluation/using_ais_bench.md)。"
#: ../../source/tutorials/models/GLM4.x.md:757
#: ../../source/tutorials/models/GLM4.x.md:752
msgid ""
"After execution, you can get the result, here is the result of `GLM4.7` "
"in `vllm-ascend:main` (after `vllm-ascend:0.14.0rc1`) for reference only."
msgstr "执行后,您可以获得结果,以下是 `GLM4.7` 在 `vllm-ascend:main``vllm-ascend:0.14.0rc1` 之后)中的结果,仅供参考。"
msgstr ""
"执行后,您可以获得结果,以下是 `GLM4.7` 在 `vllm-ascend:main``vllm-ascend:0.14.0rc1` "
"之后)中的结果,仅供参考。"
#: ../../source/tutorials/models/GLM4.x.md:87
msgid "dataset"
@@ -389,111 +423,111 @@ msgstr "MATH500"
msgid "98.8"
msgstr "98.8"
#: ../../source/tutorials/models/GLM4.x.md:764
#: ../../source/tutorials/models/GLM4.x.md:759
msgid "Using Language Model Evaluation Harness"
msgstr "使用语言模型评估工具"
#: ../../source/tutorials/models/GLM4.x.md:766
#: ../../source/tutorials/models/GLM4.x.md:761
msgid "Not tested yet."
msgstr "尚未测试。"
#: ../../source/tutorials/models/GLM4.x.md:768
#: ../../source/tutorials/models/GLM4.x.md:763
msgid "Performance"
msgstr "性能"
#: ../../source/tutorials/models/GLM4.x.md:772
#: ../../source/tutorials/models/GLM4.x.md:767
msgid ""
"Refer to [Using AISBench for performance "
"evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-"
"performance-evaluation) for details."
msgstr ""
"详情请参考[使用AISBench进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation)。"
"详情请参考[使用AISBench进行性能评估](../../developer_guide/evaluation/using_ais_bench.md"
"#execute-performance-evaluation)。"
#: ../../source/tutorials/models/GLM4.x.md:774
#: ../../source/tutorials/models/GLM4.x.md:769
msgid "Using vLLM Benchmark"
msgstr "使用vLLM基准测试"
#: ../../source/tutorials/models/GLM4.x.md:776
#: ../../source/tutorials/models/GLM4.x.md:771
msgid "Run performance evaluation of `GLM-4.x` as an example."
msgstr "以运行 `GLM-4.x` 的性能评估为例。"
#: ../../source/tutorials/models/GLM4.x.md:778
#: ../../source/tutorials/models/GLM4.x.md:773
msgid ""
"Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/benchmarking/) "
"for more details."
msgstr ""
"更多详情请参考 [vllm基准测试](https://docs.vllm.ai/en/latest/benchmarking/)。"
msgstr "更多详情请参考 [vllm基准测试](https://docs.vllm.ai/en/latest/benchmarking/)。"
#: ../../source/tutorials/models/GLM4.x.md:780
#: ../../source/tutorials/models/GLM4.x.md:775
msgid "There are three `vllm bench` subcommands:"
msgstr "`vllm bench` 包含三个子命令:"
#: ../../source/tutorials/models/GLM4.x.md:782
#: ../../source/tutorials/models/GLM4.x.md:777
msgid "`latency`: Benchmark the latency of a single batch of requests."
msgstr "`latency`:基准测试单批次请求的延迟。"
#: ../../source/tutorials/models/GLM4.x.md:783
#: ../../source/tutorials/models/GLM4.x.md:778
msgid "`serve`: Benchmark the online serving throughput."
msgstr "`serve`:基准测试在线服务吞吐量。"
#: ../../source/tutorials/models/GLM4.x.md:784
#: ../../source/tutorials/models/GLM4.x.md:779
msgid "`throughput`: Benchmark offline inference throughput."
msgstr "`throughput`:基准测试离线推理吞吐量。"
#: ../../source/tutorials/models/GLM4.x.md:786
#: ../../source/tutorials/models/GLM4.x.md:781
msgid "Take the `serve` as an example. Run the code as follows."
msgstr "以 `serve` 为例,运行以下代码。"
#: ../../source/tutorials/models/GLM4.x.md:808
#: ../../source/tutorials/models/GLM4.x.md:803
msgid ""
"After about several minutes, you can get the performance evaluation "
"result."
msgstr "大约几分钟后,您将获得性能评估结果。"
#: ../../source/tutorials/models/GLM4.x.md:810
#: ../../source/tutorials/models/GLM4.x.md:805
msgid "Best Practices"
msgstr "最佳实践"
#: ../../source/tutorials/models/GLM4.x.md:812
#: ../../source/tutorials/models/GLM4.x.md:807
msgid "In this chapter, we recommend best practices for three scenarios:"
msgstr "本章节,我们针对三种场景推荐最佳实践:"
#: ../../source/tutorials/models/GLM4.x.md:814
#: ../../source/tutorials/models/GLM4.x.md:809
msgid ""
"Long-context: For long sequences with low concurrency (≤ 4): set `dp1 "
"tp16`; For long sequences with high concurrency (> 4): set `dp2 tp8`"
msgstr ""
"长上下文:对于低并发(≤ 4的长序列设置 `dp1 tp16`;对于高并发(> 4的长序列设置 `dp2 tp8`"
msgstr "长上下文:对于低并发(≤ 4的长序列设置 `dp1 tp16`;对于高并发(> 4的长序列设置 `dp2 tp8`"
#: ../../source/tutorials/models/GLM4.x.md:815
#: ../../source/tutorials/models/GLM4.x.md:810
msgid ""
"Low-latency: For short sequences with low latency: we recommend setting "
"`dp2 tp8`"
msgstr "低延迟:对于需要低延迟的短序列,我们推荐设置 `dp2 tp8`"
#: ../../source/tutorials/models/GLM4.x.md:816
#: ../../source/tutorials/models/GLM4.x.md:811
msgid ""
"High-throughput: For short sequences with high throughput: we also "
"recommend setting `dp2 tp8`"
msgstr "高吞吐量:对于需要高吞吐量的短序列,我们也推荐设置 `dp2 tp8`"
#: ../../source/tutorials/models/GLM4.x.md:818
#: ../../source/tutorials/models/GLM4.x.md:813
msgid ""
"**Notice:** `max-model-len` and `max-num-seqs` need to be set according "
"to the actual usage scenario. For other settings, please refer to the "
"**[Deployment](#deployment)** chapter."
msgstr ""
"**注意:** `max-model-len` 和 `max-num-seqs` 需要根据实际使用场景进行设置。其他设置请参考 **[部署](#deployment)** 章节。"
"**注意:** `max-model-len` 和 `max-num-seqs` 需要根据实际使用场景进行设置。其他设置请参考 "
"**[部署](#deployment)** 章节。"
#: ../../source/tutorials/models/GLM4.x.md:821
#: ../../source/tutorials/models/GLM4.x.md:816
msgid "FAQ"
msgstr "常见问题"
#: ../../source/tutorials/models/GLM4.x.md:823
#: ../../source/tutorials/models/GLM4.x.md:818
msgid "**Q: Why is the TPOT performance poor in Long-context test?**"
msgstr "**问为什么在长上下文测试中TPOT性能不佳**"
#: ../../source/tutorials/models/GLM4.x.md:825
#: ../../source/tutorials/models/GLM4.x.md:820
msgid ""
"A: Please ensure that the FIA operator replacement script has been "
"executed successfully to complete the replacement of FIA operators. Here "
@@ -501,28 +535,28 @@ msgid ""
"[A2](../../../../tools/install_flash_infer_attention_score_ops_a2.sh) and"
" [A3](../../../../tools/install_flash_infer_attention_score_ops_a3.sh)"
msgstr ""
"答请确保已成功执行FIA算子替换脚本以完成FIA算子的替换。脚本如下"
"[A2](../../../../tools/install_flash_infer_attention_score_ops_a2.sh)"
"[A3](../../../../tools/install_flash_infer_attention_score_ops_a3.sh)"
"答请确保已成功执行FIA算子替换脚本以完成FIA算子的替换。脚本如下[A2](../../../../tools/install_flash_infer_attention_score_ops_a2.sh)"
"[A3](../../../../tools/install_flash_infer_attention_score_ops_a3.sh)"
#: ../../source/tutorials/models/GLM4.x.md:827
#: ../../source/tutorials/models/GLM4.x.md:822
msgid ""
"**Q: Startup fails with HCCL port conflicts (address already bound). What"
" should I do?**"
msgstr "**问启动失败提示HCCL端口冲突地址已被占用。我该怎么办**"
#: ../../source/tutorials/models/GLM4.x.md:829
#: ../../source/tutorials/models/GLM4.x.md:824
msgid "A: Clean up old processes and restart: `pkill -f VLLM*`."
msgstr "答:清理旧进程并重启:`pkill -f VLLM*`。"
#: ../../source/tutorials/models/GLM4.x.md:831
#: ../../source/tutorials/models/GLM4.x.md:826
msgid "**Q: How to handle OOM or unstable startup?**"
msgstr "**问如何处理OOM或启动不稳定的问题**"
#: ../../source/tutorials/models/GLM4.x.md:833
#: ../../source/tutorials/models/GLM4.x.md:828
msgid ""
"A: Reduce `--max-num-seqs` and `--max-model-len` first. If needed, reduce"
" concurrency and load-testing pressure (e.g., `max-concurrency` / `num-"
"prompts`)."
msgstr ""
"答:首先减少 `--max-num-seqs` 和 `--max-model-len`。如有需要,降低并发度和负载测试压力(例如,`max-concurrency` / `num-prompts`)。"
"答:首先减少 `--max-num-seqs` 和 `--max-model-len`。如有需要,降低并发度和负载测试压力(例如,`max-"
"concurrency` / `num-prompts`)。"

<code>docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/GLM5.po</code>

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -30,10 +30,11 @@ msgstr "简介"
#: ../../source/tutorials/models/GLM5.md:5
msgid ""
"[GLM-5](https://huggingface.co/zai-org/GLM-5) use a Mixture-of-Experts "
"(MoE) architecture and targeting at complex systems engineering and long-"
"(MoE) architecture and targets at complex systems engineering and long-"
"horizon agentic tasks."
msgstr ""
"[GLM-5](https://huggingface.co/zai-org/GLM-5) 采用混合专家 (Mixture-of-Experts, MoE) 架构,旨在处理复杂系统工程和长视野智能体任务。"
"[GLM-5](https://huggingface.co/zai-org/GLM-5) 采用混合专家 (Mixture-of-Experts,"
" MoE) 架构,旨在处理复杂系统工程和长视野智能体任务。"
#: ../../source/tutorials/models/GLM5.md:7
msgid ""
@@ -41,7 +42,8 @@ msgid ""
"`vllm-ascend:v0.17.0rc1` and `vllm-ascend:v0.18.0rc1` , the version of "
"transformers need to be upgraded to 5.2.0."
msgstr ""
"`GLM-5` 模型首次在 `vllm-ascend:v0.17.0rc1` 版本中得到支持。在 `vllm-ascend:v0.17.0rc1` 和 `vllm-ascend:v0.18.0rc1` 版本中,需要将 transformers 的版本升级到 5.2.0。"
"`GLM-5` 模型首次在 `vllm-ascend:v0.17.0rc1` 版本中得到支持。在 `vllm-ascend:v0.17.0rc1`"
" 和 `vllm-ascend:v0.18.0rc1` 版本中,需要将 transformers 的版本升级到 5.2.0。"
#: ../../source/tutorials/models/GLM5.md:9
msgid ""
@@ -49,8 +51,7 @@ msgid ""
"including supported features, feature configuration, environment "
"preparation, single-node and multi-node deployment, accuracy and "
"performance evaluation."
msgstr ""
"本文档将展示该模型的主要验证步骤,包括支持的特性、特性配置、环境准备、单节点和多节点部署、精度和性能评估。"
msgstr "本文档将展示该模型的主要验证步骤,包括支持的特性、特性配置、环境准备、单节点和多节点部署、精度和性能评估。"
#: ../../source/tutorials/models/GLM5.md:11
msgid "Supported Features"
@@ -61,15 +62,13 @@ msgid ""
"Refer to [supported "
"features](../../user_guide/support_matrix/supported_models.md) to get the"
" model's supported feature matrix."
msgstr ""
"请参考[支持的特性](../../user_guide/support_matrix/supported_models.md)以获取模型支持的特性矩阵。"
msgstr "请参考[支持的特性](../../user_guide/support_matrix/supported_models.md)以获取模型支持的特性矩阵。"
#: ../../source/tutorials/models/GLM5.md:15
msgid ""
"Refer to [feature guide](../../user_guide/feature_guide/index.md) to get "
"the feature's configuration."
msgstr ""
"请参考[特性指南](../../user_guide/feature_guide/index.md)以获取特性的配置方法。"
msgstr "请参考[特性指南](../../user_guide/feature_guide/index.md)以获取特性的配置方法。"
#: ../../source/tutorials/models/GLM5.md:17
msgid "Environment Preparation"
@@ -84,35 +83,34 @@ msgid ""
"`GLM-5`(BF16 version): [Download model "
"weight](https://www.modelscope.cn/models/ZhipuAI/GLM-5)."
msgstr ""
"`GLM-5` (BF16 版本): [下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-5)。"
"`GLM-5` (BF16 版本): "
"[下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-5)。"
#: ../../source/tutorials/models/GLM5.md:22
msgid ""
"`GLM-5-w4a8`: [Download model weight](https://modelscope.cn/models/Eco-"
"Tech/GLM-5-w4a8)."
msgstr ""
"`GLM-5-w4a8`: [下载模型权重](https://modelscope.cn/models/Eco-Tech/GLM-5-w4a8)。"
msgstr "`GLM-5-w4a8`: [下载模型权重](https://modelscope.cn/models/Eco-Tech/GLM-5-w4a8)。"
#: ../../source/tutorials/models/GLM5.md:23
msgid ""
"`GLM-5-w8a8`: [Download model weight](https://www.modelscope.cn/models"
"/Eco-Tech/GLM-5-w8a8)."
msgstr ""
"`GLM-5-w8a8`: [下载模型权重](https://www.modelscope.cn/models/Eco-Tech/GLM-5-w8a8)。"
"`GLM-5-w8a8`: [下载模型权重](https://www.modelscope.cn/models/Eco-"
"Tech/GLM-5-w8a8)。"
#: ../../source/tutorials/models/GLM5.md:24
msgid ""
"You can use [msmodelslim](https://gitcode.com/Ascend/msmodelslim) to "
"quantify the model naively."
msgstr ""
"您可以使用 [msmodelslim](https://gitcode.com/Ascend/msmodelslim) 对模型进行简单的量化。"
msgstr "您可以使用 [msmodelslim](https://gitcode.com/Ascend/msmodelslim) 对模型进行简单的量化。"
#: ../../source/tutorials/models/GLM5.md:26
msgid ""
"It is recommended to download the model weight to the shared directory of"
" multiple nodes, such as `/root/.cache/`"
msgstr ""
"建议将模型权重下载到多个节点的共享目录中,例如 `/root/.cache/`"
msgstr "建议将模型权重下载到多个节点的共享目录中,例如 `/root/.cache/`"
#: ../../source/tutorials/models/GLM5.md:28
msgid "Installation"
@@ -146,7 +144,8 @@ msgid ""
"Install `vllm-ascend` from source, refer to "
"[installation](https://docs.vllm.ai/projects/ascend/en/latest/installation.html)."
msgstr ""
"从源码安装 `vllm-ascend`,请参考[安装指南](https://docs.vllm.ai/projects/ascend/en/latest/installation.html)。"
"从源码安装 `vllm-"
"ascend`,请参考[安装指南](https://docs.vllm.ai/projects/ascend/en/latest/installation.html)。"
#: ../../source/tutorials/models/GLM5.md:123
msgid ""
@@ -200,7 +199,9 @@ msgid ""
"optimize inference efficiency. It allows non-blocking task scheduling to "
"improve concurrency and throughput, especially when processing large-"
"scale models."
msgstr "`--async-scheduling` 异步调度是一种用于优化推理效率的技术。它允许非阻塞的任务调度,以提高并发性和吞吐量,尤其是在处理大规模模型时。"
msgstr ""
"`--async-scheduling` "
"异步调度是一种用于优化推理效率的技术。它允许非阻塞的任务调度,以提高并发性和吞吐量,尤其是在处理大规模模型时。"
#: ../../source/tutorials/models/GLM5.md:254
msgid "Multi-node Deployment"
@@ -211,7 +212,9 @@ msgid ""
"If you want to deploy multi-node environment, you need to verify multi-"
"node communication according to [verify multi-node communication "
"environment](../../installation.md#verify-multi-node-communication)."
msgstr "如果您想部署多节点环境,需要根据[验证多节点通信环境](../../installation.md#verify-multi-node-communication)来验证多节点通信。"
msgstr ""
"如果您想部署多节点环境,需要根据[验证多节点通信环境](../../installation.md#verify-multi-node-"
"communication)来验证多节点通信。"
#: ../../source/tutorials/models/GLM5.md:265
msgid "`glm-5-bf16`: require at least 2 Atlas 800 A3 (64G × 16)."
@@ -240,7 +243,9 @@ msgid ""
"For bf16 weight, use this script on each node to enable [Multi Token "
"Prediction "
"(MTP)](../../user_guide/feature_guide/Multi_Token_Prediction.md)."
msgstr "对于 bf16 权重,在每个节点上使用此脚本来启用[多令牌预测 (MTP)](../../user_guide/feature_guide/Multi_Token_Prediction.md)。"
msgstr ""
"对于 bf16 权重,在每个节点上使用此脚本来启用[多令牌预测 "
"(MTP)](../../user_guide/feature_guide/Multi_Token_Prediction.md)。"
#: ../../source/tutorials/models/GLM5.md:526
msgid "`glm-5-w8a8`: require 2 Atlas 800 A3 (64G × 16)."
@@ -276,200 +281,221 @@ msgid ""
"deployment, `layer_sharding` is supported only on prefill/P nodes with "
"`kv_role=\"kv_producer\"`; do not enable it on decode/D nodes or "
"`kv_role=\"kv_both\"` nodes."
msgstr "为了在预填充阶段支持 200k 的上下文窗口,需要在每个预填充节点的 `--additional_config` 中添加参数 `\"layer_sharding\": [\"q_b_proj\"]`。在 PD 解耦部署中,`layer_sharding` 仅在 `kv_role=\"kv_producer\"` 的预填充/P 节点上受支持;不要在解码/D 节点或 `kv_role=\"kv_both\"` 的节点上启用它。"
msgstr ""
"为了在预填充阶段支持 200k 的上下文窗口,需要在每个预填充节点的 `--additional_config` 中添加参数 "
"`\"layer_sharding\": [\"q_b_proj\"]`。在 PD 解耦部署中,`layer_sharding` 仅在 "
"`kv_role=\"kv_producer\"` 的预填充/P 节点上受支持;不要在解码/D 节点或 `kv_role=\"kv_both\"`"
" 的节点上启用它。"
#: ../../source/tutorials/models/GLM5.md:747
#: ../../source/tutorials/models/GLM5.md:1233
#: ../../source/tutorials/models/GLM5.md:1231
msgid "Prefill node 0"
msgstr "预填充节点 0"
#: ../../source/tutorials/models/GLM5.md:826
#: ../../source/tutorials/models/GLM5.md:1240
#: ../../source/tutorials/models/GLM5.md:825
#: ../../source/tutorials/models/GLM5.md:1238
msgid "Prefill node 1"
msgstr "预填充节点 1"
#: ../../source/tutorials/models/GLM5.md:906
#: ../../source/tutorials/models/GLM5.md:1247
#: ../../source/tutorials/models/GLM5.md:904
#: ../../source/tutorials/models/GLM5.md:1245
msgid "Decode node 0"
msgstr "解码节点 0"
#: ../../source/tutorials/models/GLM5.md:988
#: ../../source/tutorials/models/GLM5.md:1254
#: ../../source/tutorials/models/GLM5.md:986
#: ../../source/tutorials/models/GLM5.md:1252
msgid "Decode node 1"
msgstr "解码节点 1"
#: ../../source/tutorials/models/GLM5.md:1069
#: ../../source/tutorials/models/GLM5.md:1261
#: ../../source/tutorials/models/GLM5.md:1067
#: ../../source/tutorials/models/GLM5.md:1259
msgid "Decode node 2"
msgstr "解码节点 2"
#: ../../source/tutorials/models/GLM5.md:1150
#: ../../source/tutorials/models/GLM5.md:1268
#: ../../source/tutorials/models/GLM5.md:1148
#: ../../source/tutorials/models/GLM5.md:1266
msgid "Decode node 3"
msgstr "解码节点 3"
#: ../../source/tutorials/models/GLM5.md:1231
#: ../../source/tutorials/models/GLM5.md:1229
msgid ""
"Once the preparation is done, you can start the server with the following"
" command on each node:"
msgstr "准备工作完成后,您可以在每个节点上使用以下命令启动服务器:"
#: ../../source/tutorials/models/GLM5.md:1275
#: ../../source/tutorials/models/GLM5.md:1273
msgid "Request Forwarding"
msgstr "请求转发"
#: ../../source/tutorials/models/GLM5.md:1277
#: ../../source/tutorials/models/GLM5.md:1275
msgid ""
"To set up request forwarding, run the following script on any machine. "
"You can get the proxy program in the repository's examples: "
"[load_balance_proxy_server_example.py](https://github.com/vllm-project"
"/vllm-"
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
msgstr "要设置请求转发,请在任何机器上运行以下脚本。您可以在仓库的示例中找到代理程序:[load_balance_proxy_server_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
msgstr ""
"要设置请求转发,请在任何机器上运行以下脚本。您可以在仓库的示例中找到代理程序:[load_balance_proxy_server_example.py](https://github.com"
"/vllm-project/vllm-"
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
#: ../../source/tutorials/models/GLM5.md:1318
#: ../../source/tutorials/models/GLM5.md:1316
msgid "**Notice:**"
msgstr "**注意:**"
#: ../../source/tutorials/models/GLM5.md:1320
#: ../../source/tutorials/models/GLM5.md:1318
msgid "Some configurations for optimization are shown below:"
msgstr "以下是一些用于优化的配置:"
#: ../../source/tutorials/models/GLM5.md:1322
#: ../../source/tutorials/models/GLM5.md:1320
msgid ""
"`VLLM_ASCEND_ENABLE_FLASHCOMM1`: Enable FlashComm optimization to reduce "
"communication and computation overhead on prefill node. With FlashComm "
"enabled, layer_sharding list cannot include o_proj as an element."
msgstr "`VLLM_ASCEND_ENABLE_FLASHCOMM1`: 启用 FlashComm 优化以减少预填充节点上的通信和计算开销。启用 FlashComm 后layer_sharding 列表不能包含 o_proj 作为元素。"
msgstr ""
"`VLLM_ASCEND_ENABLE_FLASHCOMM1`: 启用 FlashComm 优化以减少预填充节点上的通信和计算开销。启用 "
"FlashComm 后layer_sharding 列表不能包含 o_proj 作为元素。"
#: ../../source/tutorials/models/GLM5.md:1323
#: ../../source/tutorials/models/GLM5.md:1321
msgid ""
"`VLLM_ASCEND_ENABLE_FUSED_MC2`: Enable following fused operators: "
"dispatch_gmm_combine_decode and dispatch_ffn_combine operator."
msgstr "`VLLM_ASCEND_ENABLE_FUSED_MC2`: 启用以下融合算子dispatch_gmm_combine_decode 和 dispatch_ffn_combine 算子。"
"dispatch_gmm_combine_decode and dispatch_ffn_combine operator. and please"
" **note** that this environment variable can only be enabled on decode "
"nodes."
msgstr ""
"`VLLM_ASCEND_ENABLE_FUSED_MC2`: 启用以下融合算子dispatch_gmm_combine_decode 和 "
"dispatch_ffn_combine 算子。并请**注意**,此环境变量仅可在解码节点上启用。"
#: ../../source/tutorials/models/GLM5.md:1324
#: ../../source/tutorials/models/GLM5.md:1322
msgid "`VLLM_ASCEND_ENABLE_MLAPO`: Enable fused operator MlaPreprocessOperation."
msgstr "`VLLM_ASCEND_ENABLE_MLAPO`: 启用融合算子 MlaPreprocessOperation。"
#: ../../source/tutorials/models/GLM5.md:1326
#: ../../source/tutorials/models/GLM5.md:1324
msgid ""
"Please refer to the following python file for further explanation and "
"restrictions of the environment variables above: "
"[envs.py](https://github.com/vllm-project/vllm-"
"ascend/blob/main/vllm_ascend/envs.py)"
msgstr "有关上述环境变量的进一步解释和限制,请参考以下 python 文件:[envs.py](https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/envs.py)"
msgstr ""
"有关上述环境变量的进一步解释和限制,请参考以下 python 文件:[envs.py](https://github.com/vllm-"
"project/vllm-ascend/blob/main/vllm_ascend/envs.py)"
#: ../../source/tutorials/models/GLM5.md:1328
#: ../../source/tutorials/models/GLM5.md:1326
msgid "Functional Verification"
msgstr "功能验证"
#: ../../source/tutorials/models/GLM5.md:1330
#: ../../source/tutorials/models/GLM5.md:1328
msgid "Once your server is started, you can query the model with input prompts:"
msgstr "服务器启动后,您可以使用输入提示词查询模型:"
#: ../../source/tutorials/models/GLM5.md:1343
#: ../../source/tutorials/models/GLM5.md:1341
msgid "Accuracy Evaluation"
msgstr "精度评估"
#: ../../source/tutorials/models/GLM5.md:1345
#: ../../source/tutorials/models/GLM5.md:1343
msgid "Here are two accuracy evaluation methods."
msgstr "以下是两种精度评估方法。"
#: ../../source/tutorials/models/GLM5.md:1347
#: ../../source/tutorials/models/GLM5.md:1359
#: ../../source/tutorials/models/GLM5.md:1345
#: ../../source/tutorials/models/GLM5.md:1357
msgid "Using AISBench"
msgstr "使用AISBench"
#: ../../source/tutorials/models/GLM5.md:1349
#: ../../source/tutorials/models/GLM5.md:1347
msgid ""
"Refer to [Using "
"AISBench](../../developer_guide/evaluation/using_ais_bench.md) for "
"details."
msgstr "详情请参考[使用AISBench](../../developer_guide/evaluation/using_ais_bench.md)。"
#: ../../source/tutorials/models/GLM5.md:1351
#: ../../source/tutorials/models/GLM5.md:1349
msgid "After execution, you can get the result."
msgstr "执行后,您将获得结果。"
#: ../../source/tutorials/models/GLM5.md:1353
#: ../../source/tutorials/models/GLM5.md:1351
msgid "Using Language Model Evaluation Harness"
msgstr "使用Language Model Evaluation Harness"
#: ../../source/tutorials/models/GLM5.md:1355
#: ../../source/tutorials/models/GLM5.md:1353
msgid "Not tested yet."
msgstr "尚未测试。"
#: ../../source/tutorials/models/GLM5.md:1357
#: ../../source/tutorials/models/GLM5.md:1355
msgid "Performance"
msgstr "性能"
#: ../../source/tutorials/models/GLM5.md:1361
#: ../../source/tutorials/models/GLM5.md:1359
msgid ""
"Refer to [Using AISBench for performance "
"evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-"
"performance-evaluation) for details."
msgstr "详情请参考[使用AISBench进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation)。"
msgstr ""
"详情请参考[使用AISBench进行性能评估](../../developer_guide/evaluation/using_ais_bench.md"
"#execute-performance-evaluation)。"
#: ../../source/tutorials/models/GLM5.md:1363
#: ../../source/tutorials/models/GLM5.md:1361
msgid "Using vLLM Benchmark"
msgstr "使用vLLM基准测试"
#: ../../source/tutorials/models/GLM5.md:1365
#: ../../source/tutorials/models/GLM5.md:1363
msgid ""
"Refer to [vllm "
"benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) "
"for more details."
msgstr "更多详情请参考[vllm基准测试](https://docs.vllm.ai/en/latest/contributing/benchmarks.html)。"
#: ../../source/tutorials/models/GLM5.md:1367
#: ../../source/tutorials/models/GLM5.md:1365
msgid "Best Practices"
msgstr "最佳实践"
#: ../../source/tutorials/models/GLM5.md:1369
#: ../../source/tutorials/models/GLM5.md:1367
msgid ""
"In this chapter, we recommend best practices in prefill-decode "
"disaggregation scenario with 1P1D architecture using 4 Atlas 800 A3 (64G "
"× 16):"
msgstr "本章节我们推荐在使用 4 台 Atlas 800 A3(64G × 16)的 1P1D 架构下预填充-解码分离场景的最佳实践:"
#: ../../source/tutorials/models/GLM5.md:1371
#: ../../source/tutorials/models/GLM5.md:1369
msgid ""
"Low-latency: We recommend setting `dp4 tp8` on prefill nodes and `dp4 "
"tp8` on decode nodes for low latency situation."
msgstr "低延迟场景:我们建议在预填充节点上设置`dp4 tp8`,在解码节点上设置`dp4 tp8`。"
#: ../../source/tutorials/models/GLM5.md:1372
#: ../../source/tutorials/models/GLM5.md:1370
msgid ""
"High-throughput: `dp4 tp8` on prefill nodes and `dp8 tp4` on decode nodes"
" is recommended for high throughput situation."
msgstr "高吞吐场景:建议在预填充节点上设置`dp4 tp8`,在解码节点上设置`dp8 tp4`。"
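The two presets above map directly onto the data-parallel and tensor-parallel launch flags. A small sketch of that mapping follows; the model path and everything other than the dp/tp sizes are placeholders.

```python
# Map the recommended presets onto launch flags. Placeholder values only.
PRESETS = {
    "low_latency":     {"prefill": (4, 8), "decode": (4, 8)},  # (dp, tp)
    "high_throughput": {"prefill": (4, 8), "decode": (8, 4)},
}

def serve_args(profile: str, role: str) -> list[str]:
    dp, tp = PRESETS[profile][role]
    return ["vllm", "serve", "/path/to/GLM-5",
            "--data-parallel-size", str(dp),
            "--tensor-parallel-size", str(tp)]

print(serve_args("high_throughput", "decode"))
```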
#: ../../source/tutorials/models/GLM5.md:1374
#: ../../source/tutorials/models/GLM5.md:1372
msgid ""
"**Notice:** `max-model-len` and `max-num-seqs` need to be set according "
"to the actual usage scenario. For other settings, please refer to the "
"**[Deployment](#deployment)** chapter."
msgstr "**注意:** `max-model-len`和`max-num-seqs`需要根据实际使用场景进行设置。其他设置请参考**[部署](#deployment)**章节。"
msgstr ""
"**注意:** `max-model-len`和`max-num-"
"seqs`需要根据实际使用场景进行设置。其他设置请参考**[部署](#deployment)**章节。"
#: ../../source/tutorials/models/GLM5.md:1377
#: ../../source/tutorials/models/GLM5.md:1375
msgid "FAQ"
msgstr "常见问题"
#: ../../source/tutorials/models/GLM5.md:1379
#: ../../source/tutorials/models/GLM5.md:1377
msgid ""
"**Q: How to solve ValueError: Tokenizer class TokenizersBackend does not "
"exist or is not currently imported?**"
msgstr "**问:如何解决 ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported?**"
msgstr ""
"**问:如何解决 ValueError: Tokenizer class TokenizersBackend does not exist or "
"is not currently imported?**"
#: ../../source/tutorials/models/GLM5.md:1381
#: ../../source/tutorials/models/GLM5.md:1379
msgid "A: Please update the version of transformers to 5.2.0"
msgstr "答:请将 transformers 版本更新至 5.2.0"
#: ../../source/tutorials/models/GLM5.md:1383
#: ../../source/tutorials/models/GLM5.md:1381
msgid "**Q: How to enable function calling for GLM-5?**"
msgstr "**问:如何为 GLM-5 启用函数调用功能?**"
#: ../../source/tutorials/models/GLM5.md:1385
#: ../../source/tutorials/models/GLM5.md:1383
msgid "A: Please add following configurations in vLLM startup command"
msgstr "答:请在 vLLM 启动命令中添加以下配置"
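The concrete options the tutorial adds are part of the GLM5 startup command itself and are not reproduced in this catalog. As a hedged illustration of the general shape, vLLM's tool-calling switches and a client request that declares a tool are sketched below; the parser name is a placeholder, not a value verified for GLM-5.

```python
# Server side (for reference only; the parser name is a placeholder):
#   vllm serve /path/to/GLM-5 --enable-auto-tool-choice \
#       --tool-call-parser <parser-for-glm>
# Client side: declare a tool and let the model decide whether to call it.
import json
import urllib.request

payload = {
    "model": "GLM-5",
    "messages": [{"role": "user", "content": "What is the weather in Beijing?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {"type": "object",
                           "properties": {"city": {"type": "string"}},
                           "required": ["city"]},
        },
    }],
}
req = urllib.request.Request(
    "http://127.0.0.1:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"].get("tool_calls"))
```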


@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -35,7 +35,9 @@ msgid ""
"resolution visual encoder with the ERNIE-4.5-0.3B language model to "
"enable accurate element recognition."
msgstr ""
"PaddleOCR-VL 是一款专为文档解析设计的 SOTA 且资源高效的模型。其核心组件是 PaddleOCR-VL-0.9B,一个紧凑而强大的视觉语言模型(VLM),它集成了 NaViT 风格的动态分辨率视觉编码器和 ERNIE-4.5-0.3B 语言模型,以实现精确的元素识别。"
"PaddleOCR-VL 是一款专为文档解析设计的 SOTA 且资源高效的模型。其核心组件是 PaddleOCR-"
"VL-0.9B,一个紧凑而强大的视觉语言模型(VLM),它集成了 NaViT 风格的动态分辨率视觉编码器和 ERNIE-4.5-0.3B "
"语言模型,以实现精确的元素识别。"
#: ../../source/tutorials/models/PaddleOCR-VL.md:7
msgid ""
@@ -44,8 +46,7 @@ msgid ""
"preparation, single-node deployment, and functional verification. It is "
"designed to help users quickly complete model deployment and "
"verification."
msgstr ""
"本文档提供了完整的模型部署和验证的详细工作流程,包括支持的特性、环境准备、单节点部署和功能验证。旨在帮助用户快速完成模型部署和验证。"
msgstr "本文档提供了完整的模型部署和验证的详细工作流程,包括支持的特性、环境准备、单节点部署和功能验证。旨在帮助用户快速完成模型部署和验证。"
#: ../../source/tutorials/models/PaddleOCR-VL.md:9
msgid "Supported Features"
@@ -56,8 +57,7 @@ msgid ""
"Refer to [supported "
"features](../../user_guide/support_matrix/supported_models.md) to get the"
" model's supported feature matrix."
msgstr ""
"请参考[支持的特性](../../user_guide/support_matrix/supported_models.md)以获取模型支持的特性矩阵。"
msgstr "请参考[支持的特性](../../user_guide/support_matrix/supported_models.md)以获取模型支持的特性矩阵。"
#: ../../source/tutorials/models/PaddleOCR-VL.md:13
msgid ""
@@ -78,7 +78,8 @@ msgid ""
"`PaddleOCR-VL-0.9B`: [PaddleOCR-"
"VL-0.9B](https://www.modelscope.cn/models/PaddlePaddle/PaddleOCR-VL)"
msgstr ""
"`PaddleOCR-VL-0.9B`: [PaddleOCR-VL-0.9B](https://www.modelscope.cn/models/PaddlePaddle/PaddleOCR-VL)"
"`PaddleOCR-VL-0.9B`: [PaddleOCR-"
"VL-0.9B](https://www.modelscope.cn/models/PaddlePaddle/PaddleOCR-VL)"
#: ../../source/tutorials/models/PaddleOCR-VL.md:21
msgid ""
@@ -99,13 +100,15 @@ msgid ""
"Select an image based on your machine type and start the docker image on "
"your node, refer to [using docker](../../installation.md#set-up-using-"
"docker)."
msgstr "根据您的机器类型选择镜像并在节点上启动 docker 镜像,请参考[使用 docker](../../installation.md#set-up-using-docker)。"
msgstr ""
"根据您的机器类型选择镜像并在节点上启动 docker 镜像,请参考[使用 docker](../../installation.md#set-"
"up-using-docker)。"
#: ../../source/tutorials/models/PaddleOCR-VL.md:51
msgid ""
"The 310P device is supported from version 0.15.0rc1. You need to select "
"the corresponding image for installation."
msgstr "310P 设备从版本 0.15.0rc1 开始支持。您需要选择对应的镜像进行安装。"
"The Atlas 300 inference products are supported from version 0.15.0rc1. "
"You need to select the corresponding image for installation."
msgstr "Atlas 300 推理产品从版本 0.15.0rc1 开始支持。您需要选择对应的镜像进行安装。"
#: ../../source/tutorials/models/PaddleOCR-VL.md:54
msgid "Deployment"
@@ -122,8 +125,9 @@ msgstr "单 NPU (PaddleOCR-VL)"
#: ../../source/tutorials/models/PaddleOCR-VL.md:60
msgid ""
"PaddleOCR-VL supports single-node single-card deployment on the 910B4 and"
" 310P platform. Follow these steps to start the inference service:"
msgstr "PaddleOCR-VL 支持在 910B4 和 310P 平台上进行单节点单卡部署。请按照以下步骤启动推理服务:"
" Atlas 300 inference products platform. Follow these steps to start the "
"inference service:"
msgstr "PaddleOCR-VL 支持在 910B4 和 Atlas 300 推理产品平台上进行单节点单卡部署。请按照以下步骤启动推理服务:"
#: ../../source/tutorials/models/PaddleOCR-VL.md:62
msgid ""
@@ -144,18 +148,20 @@ msgid "Run the following script to start the vLLM server on single 910B4:"
msgstr "运行以下脚本在单张 910B4 上启动 vLLM 服务器:"
#: ../../source/tutorials/models/PaddleOCR-VL.md
msgid "310P"
msgstr "310P"
msgid "Atlas 300 inference products"
msgstr "Atlas 300 推理产品"
#: ../../source/tutorials/models/PaddleOCR-VL.md:97
msgid "Run the following script to start the vLLM server on single 310P:"
msgstr "运行以下脚本在单张 310P 上启动 vLLM 服务器:"
msgid ""
"Run the following script to start the vLLM server on single Atlas 300 "
"inference products:"
msgstr "运行以下脚本在单张 Atlas 300 推理产品上启动 vLLM 服务器:"
#: ../../source/tutorials/models/PaddleOCR-VL.md:116
msgid ""
"The `--max_model_len` option is added to prevent errors when generating "
"the attention operator mask on the 310P device."
msgstr "添加 `--max_model_len` 选项是为了防止在 310P 设备上生成注意力算子掩码时出错。"
"the attention operator mask on the Atlas 300 inference products."
msgstr "添加 `--max_model_len` 选项是为了防止在 Atlas 300 推理产品上生成注意力算子掩码时出错。"
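Once the single-card server from the steps above is running, a document image can be sent through the OpenAI-compatible endpoint. The sketch below is illustrative only; the port, served model name, prompt wording, and image file are assumptions.

```python
# Send one document image to the PaddleOCR-VL server started above.
# Port, model name, prompt and image path are placeholders.
import base64
import json
import urllib.request

with open("page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "PaddleOCR-VL",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": "Recognize the text on this page."},
        ],
    }],
    "max_tokens": 512,
}
req = urllib.request.Request(
    "http://127.0.0.1:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```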
#: ../../source/tutorials/models/PaddleOCR-VL.md:121
msgid "Multiple NPU (PaddleOCR-VL)"
@@ -204,7 +210,9 @@ msgid ""
"DocLayoutV2 model to fully unleash the capabilities of the PaddleOCR-VL "
"model, making it more consistent with the examples provided by the "
"official PaddlePaddle documentation."
msgstr "在上面的示例中,我们演示了如何使用 vLLM 推理 PaddleOCR-VL-0.9B 模型。通常,我们还需要集成 PP-DocLayoutV2 模型,以充分发挥 PaddleOCR-VL 模型的能力,使其更符合官方 PaddlePaddle 文档提供的示例。"
msgstr ""
"在上面的示例中,我们演示了如何使用 vLLM 推理 PaddleOCR-VL-0.9B 模型。通常,我们还需要集成 PP-DocLayoutV2 "
"模型,以充分发挥 PaddleOCR-VL 模型的能力,使其更符合官方 PaddlePaddle 文档提供的示例。"
#: ../../source/tutorials/models/PaddleOCR-VL.md:205
msgid ""
@@ -230,11 +238,13 @@ msgstr "使用以下命令启动容器:"
#: ../../source/tutorials/models/PaddleOCR-VL.md:235
msgid ""
"Install "
"Install "
"[PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined)"
" and [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)"
" and [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)"
msgstr ""
"安装 [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined) 和 [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)"
"安装 "
"[PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined)"
" 和 [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)"
#: ../../source/tutorials/models/PaddleOCR-VL.md:246
msgid "The OpenCV component may be missing:"
@@ -252,11 +262,14 @@ msgstr "OM 推理"
#: ../../source/tutorials/models/PaddleOCR-VL.md:264
msgid ""
"The 310P device supports only the OM model inference. For details about "
"the process, see the guide provided in "
"The Atlas 300 inference products support only the OM model inference. For"
" details about the process, see the guide provided in "
"[ModelZoo](https://gitcode.com/Ascend/ModelZoo-"
"PyTorch/tree/master/ACL_PyTorch/built-in/ocr/PP-DocLayoutV2)."
msgstr "310P 设备仅支持 OM 模型推理。有关该过程的详细信息,请参阅 [ModelZoo](https://gitcode.com/Ascend/ModelZoo-PyTorch/tree/master/ACL_PyTorch/built-in/ocr/PP-DocLayoutV2) 中提供的指南。"
msgstr ""
"Atlas 300 推理产品仅支持 OM 模型推理。有关该过程的详细信息,请参阅 [ModelZoo](https://gitcode.com/Ascend"
"/ModelZoo-PyTorch/tree/master/ACL_PyTorch/built-in/ocr/PP-DocLayoutV2) "
"中提供的指南。"
#: ../../source/tutorials/models/PaddleOCR-VL.md:268
msgid ""


@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -51,7 +51,8 @@ msgid ""
"demonstration, showcasing the `Qwen3-VL-8B-Instruct` model as an example "
"for single NPU deployment and the `Qwen2.5-VL-32B-Instruct` model as an "
"example for multi-NPU deployment."
msgstr "本教程使用 vLLM-Ascend `v0.11.0rc3-a3` 版本进行演示,以 `Qwen3-VL-8B-Instruct` 模型为例展示单NPU部署,以 `Qwen2.5-VL-32B-Instruct` 模型为例展示多NPU部署。"
msgstr ""
"本教程使用 vLLM-Ascend `v0.11.0rc3-a3` 版本进行演示,以 `Qwen3-VL-8B-Instruct` 模型为例展示单NPU部署,以 `Qwen2.5-VL-32B-Instruct` 模型为例展示多NPU部署。"
#: ../../source/tutorials/models/Qwen-VL-Dense.md:11
msgid "Supported Features"
@@ -86,56 +87,65 @@ msgstr "需要 1 个 Atlas 800I A2 (64G × 8) 节点或 1 个 Atlas 800 A3 (64G
msgid ""
"`Qwen2.5-VL-3B-Instruct`: [Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen2.5-VL-3B-Instruct)"
msgstr "`Qwen2.5-VL-3B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-3B-Instruct)"
msgstr ""
"`Qwen2.5-VL-3B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-3B-Instruct)"
#: ../../source/tutorials/models/Qwen-VL-Dense.md:24
msgid ""
"`Qwen2.5-VL-7B-Instruct`: [Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen2.5-VL-7B-Instruct)"
msgstr "`Qwen2.5-VL-7B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-7B-Instruct)"
msgstr ""
"`Qwen2.5-VL-7B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-7B-Instruct)"
#: ../../source/tutorials/models/Qwen-VL-Dense.md:25
msgid ""
"`Qwen2.5-VL-32B-Instruct`:[Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen2.5-VL-32B-Instruct)"
msgstr "`Qwen2.5-VL-32B-Instruct`:[下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-32B-Instruct)"
msgstr ""
"`Qwen2.5-VL-32B-Instruct`:[下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-32B-Instruct)"
#: ../../source/tutorials/models/Qwen-VL-Dense.md:26
msgid ""
"`Qwen2.5-VL-72B-Instruct`:[Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen2.5-VL-72B-Instruct)"
msgstr "`Qwen2.5-VL-72B-Instruct`:[下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-72B-Instruct)"
msgstr ""
"`Qwen2.5-VL-72B-Instruct`:[下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-72B-Instruct)"
#: ../../source/tutorials/models/Qwen-VL-Dense.md:27
msgid ""
"`Qwen3-VL-2B-Instruct`: [Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen3-VL-2B-Instruct)"
msgstr "`Qwen3-VL-2B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-2B-Instruct)"
msgstr ""
"`Qwen3-VL-2B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-2B-Instruct)"
#: ../../source/tutorials/models/Qwen-VL-Dense.md:28
msgid ""
"`Qwen3-VL-4B-Instruct`: [Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen3-VL-4B-Instruct)"
msgstr "`Qwen3-VL-4B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-4B-Instruct)"
msgstr ""
"`Qwen3-VL-4B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-4B-Instruct)"
#: ../../source/tutorials/models/Qwen-VL-Dense.md:29
msgid ""
"`Qwen3-VL-8B-Instruct`: [Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen3-VL-8B-Instruct)"
msgstr "`Qwen3-VL-8B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-8B-Instruct)"
msgstr ""
"`Qwen3-VL-8B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-8B-Instruct)"
#: ../../source/tutorials/models/Qwen-VL-Dense.md:30
msgid ""
"`Qwen3-VL-32B-Instruct`: [Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen3-VL-32B-Instruct)"
msgstr "`Qwen3-VL-32B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-32B-Instruct)"
msgstr ""
"`Qwen3-VL-32B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-32B-Instruct)"
#: ../../source/tutorials/models/Qwen-VL-Dense.md:32
msgid ""
"A sample Qwen2.5-VL quantization script can be found in the modelslim "
"code repository. [Qwen2.5-VL Quantization Script "
"Example](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/multimodal_vlm/Qwen2.5-VL/README.md)"
msgstr "可以在 modelslim 代码仓库中找到 Qwen2.5-VL 的量化脚本示例。[Qwen2.5-VL 量化脚本示例](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/multimodal_vlm/Qwen2.5-VL/README.md)"
msgstr ""
"可以在 modelslim 代码仓库中找到 Qwen2.5-VL 的量化脚本示例。[Qwen2.5-VL 量化脚本示例](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/multimodal_vlm/Qwen2.5-VL/README.md)"
#: ../../source/tutorials/models/Qwen-VL-Dense.md:34
msgid ""
@@ -172,8 +182,7 @@ msgid ""
"memory. You can find more details "
"[<u>here</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)."
msgstr ""
"`max_split_size_mb` 可防止原生分配器拆分大于此大小(以 MB 为单位)的内存块。这可以减少内存碎片,并可能使一些临界工作负载在内存耗尽前完成。您可以在"
"[<u>此处</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"
"`max_split_size_mb` 可防止原生分配器拆分大于此大小(以 MB 为单位)的内存块。这可以减少内存碎片,并可能使一些临界工作负载在内存耗尽前完成。您可以在[<u>此处</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"
#: ../../source/tutorials/models/Qwen-VL-Dense.md:115
msgid "Deployment"
@@ -217,10 +226,10 @@ msgid ""
"Add `--max_model_len` option to avoid ValueError that the Qwen3-VL-8B-"
"Instruct model's max seq len (256000) is larger than the maximum number "
"of tokens that can be stored in KV cache. This will differ with different"
" NPU series based on the HBM size. Please modify the value according to a"
" suitable value for your NPU series."
" NPU series based on the on-chip memory size. Please modify the value "
"according to a suitable value for your NPU series."
msgstr ""
"添加 `--max_model_len` 选项以避免 ValueError,该错误提示 Qwen3-VL-8B-Instruct 模型的最大序列长度(256000)大于 KV 缓存可存储的最大令牌数。此值因不同 NPU 系列的 HBM 大小而异。请根据您 NPU 系列的合适值修改此值。"
"添加 `--max_model_len` 选项以避免 ValueError,该错误提示 Qwen3-VL-8B-Instruct 模型的最大序列长度(256000)大于 KV 缓存可存储的最大令牌数。此值因不同 NPU 系列的片上内存大小而异。请根据您 NPU 系列的合适值修改此值。"
#: ../../source/tutorials/models/Qwen-VL-Dense.md:335
#: ../../source/tutorials/models/Qwen-VL-Dense.md:422
@@ -253,10 +262,10 @@ msgid ""
"Add `--max_model_len` option to avoid ValueError that the Qwen2.5-VL-32B-"
"Instruct model's max_model_len (128000) is larger than the maximum number"
" of tokens that can be stored in KV cache. This will differ with "
"different NPU series base on the HBM size. Please modify the value "
"according to a suitable value for your NPU series."
"different NPU series base on the on-chip memory size. Please modify the "
"value according to a suitable value for your NPU series."
msgstr ""
"添加 `--max_model_len` 选项以避免 ValueError,该错误提示 Qwen2.5-VL-32B-Instruct 模型的最大模型长度(128000)大于 KV 缓存可存储的最大令牌数。此值因不同 NPU 系列的 HBM 大小而异。请根据您 NPU 系列的合适值修改此值。"
"添加 `--max_model_len` 选项以避免 ValueError,该错误提示 Qwen2.5-VL-32B-Instruct 模型的最大模型长度(128000)大于 KV 缓存可存储的最大令牌数。此值因不同 NPU 系列的片上内存大小而异。请根据您 NPU 系列的合适值修改此值。"
#: ../../source/tutorials/models/Qwen-VL-Dense.md:468
msgid "Accuracy Evaluation"
@@ -292,7 +301,8 @@ msgid ""
"Refer to [Using "
"lm_eval](../../developer_guide/evaluation/using_lm_eval.md) for more "
"details on `lm_eval` installation."
msgstr "有关 `lm_eval` 安装的更多详细信息,请参考[使用 lm_eval](../../developer_guide/evaluation/using_lm_eval.md)。"
msgstr ""
"有关 `lm_eval` 安装的更多详细信息,请参考[使用 lm_eval](../../developer_guide/evaluation/using_lm_eval.md)。"
#: ../../source/tutorials/models/Qwen-VL-Dense.md:492
#: ../../source/tutorials/models/Qwen-VL-Dense.md:523
@@ -315,7 +325,8 @@ msgstr "以 `mmmu_val` 数据集作为测试数据集为例,在离线模式下
msgid ""
"After execution, you can get the result, here is the result of `Qwen2.5"
"-VL-32B-Instruct` in `vllm-ascend:0.11.0rc3` for reference only."
msgstr "执行后,您将获得结果。以下是 `vllm-ascend:0.11.0rc3` 中 `Qwen2.5-VL-32B-Instruct` 的结果,仅供参考。"
msgstr ""
"执行后,您将获得结果。以下是 `vllm-ascend:0.11.0rc3` 中 `Qwen2.5-VL-32B-Instruct` 的结果,仅供参考。"
#: ../../source/tutorials/models/Qwen-VL-Dense.md:543
msgid "Performance"
@@ -357,4 +368,4 @@ msgstr "性能评估必须在在线模式下进行。以 `serve` 为例。按如
msgid ""
"After about several minutes, you can get the performance evaluation "
"result."
msgstr "大约几分钟后,您将获得性能评估结果。"
msgstr "大约几分钟后,您将获得性能评估结果。"


@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -35,7 +35,8 @@ msgid ""
"advancements in reasoning, instruction-following, agent capabilities, and"
" multilingual support."
msgstr ""
"Qwen3 是 Qwen 系列最新一代的大语言模型,提供了一套完整的稠密模型和专家混合模型。基于广泛的训练,Qwen3 在推理、指令遵循、智能体能力和多语言支持方面实现了突破性进展。"
"Qwen3 是 Qwen 系列最新一代的大语言模型,提供了一套完整的稠密模型和专家混合(MoE)模型。基于广泛的训练,Qwen3 "
"在推理、指令遵循、智能体能力和多语言支持方面实现了突破性进展。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:7
msgid ""
@@ -80,7 +81,9 @@ msgid ""
"1 Atlas 800 A2 (64G × 8) node or 2 Atlas 800 A2(32G × 8)nodes. [Download "
"model weight](https://www.modelscope.cn/models/Qwen/Qwen3-235B-A22B)"
msgstr ""
"`Qwen3-235B-A22B`(BF16 版本):需要 1 个 Atlas 800 A3 (64G × 16) 节点、1 个 Atlas 800 A2 (64G × 8) 节点或 2 个 Atlas 800 A2(32G × 8) 节点。[下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3-235B-A22B)"
"`Qwen3-235B-A22B`(BF16 版本):需要 1 个 Atlas 800 A3 (64G × 16) 节点、1 个 Atlas "
"800 A2 (64G × 8) 节点或 2 个 Atlas 800 A2(32G × 8) "
"节点。[下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3-235B-A22B)"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:22
msgid ""
@@ -89,7 +92,10 @@ msgid ""
"8)nodes. [Download model weight](https://modelscope.cn/models/vllm-"
"ascend/Qwen3-235B-A22B-W8A8)"
msgstr ""
"`Qwen3-235B-A22B-w8a8`(量化版本):需要 1 个 Atlas 800 A3 (64G × 16) 节点、1 个 Atlas 800 A2 (64G × 8) 节点或 2 个 Atlas 800 A2(32G × 8) 节点。[下载模型权重](https://modelscope.cn/models/vllm-ascend/Qwen3-235B-A22B-W8A8)"
"`Qwen3-235B-A22B-w8a8`(量化版本):需要 1 个 Atlas 800 A3 (64G × 16) 节点、1 个 Atlas "
"800 A2 (64G × 8) 节点或 2 个 Atlas 800 A2(32G × 8) "
"节点。[下载模型权重](https://modelscope.cn/models/vllm-ascend/Qwen3-235B-A22B-"
"W8A8)"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:24
msgid ""
@@ -106,7 +112,9 @@ msgid ""
"If you want to deploy multi-node environment, you need to verify multi-"
"node communication according to [verify multi-node communication "
"environment](../../installation.md#verify-multi-node-communication)."
msgstr "如果您想部署多节点环境,需要根据[验证多节点通信环境](../../installation.md#verify-multi-node-communication)来验证多节点通信。"
msgstr ""
"如果您想部署多节点环境,需要根据[验证多节点通信环境](../../installation.md#verify-multi-node-"
"communication)来验证多节点通信。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:30
msgid "Installation"
@@ -121,14 +129,18 @@ msgid ""
"For example, using images `quay.io/ascend/vllm-ascend:v0.11.0rc2`(for "
"Atlas 800 A2) and `quay.io/ascend/vllm-ascend:v0.11.0rc2-a3`(for Atlas "
"800 A3)."
msgstr "例如,使用镜像 `quay.io/ascend/vllm-ascend:v0.11.0rc2`(适用于 Atlas 800 A2)和 `quay.io/ascend/vllm-ascend:v0.11.0rc2-a3`(适用于 Atlas 800 A3)"
msgstr ""
"例如,使用镜像 `quay.io/ascend/vllm-ascend:v0.11.0rc2`(适用于 Atlas 800 A2)和 "
"`quay.io/ascend/vllm-ascend:v0.11.0rc2-a3`(适用于 Atlas 800 A3)。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:38
msgid ""
"Select an image based on your machine type and start the docker image on "
"your node, refer to [using docker](../../installation.md#set-up-using-"
"docker)."
msgstr "根据您的机器类型选择镜像并在节点上启动 Docker 容器,请参考[使用 Docker](../../installation.md#set-up-using-docker)。"
msgstr ""
"根据您的机器类型选择镜像并在节点上启动 Docker 容器,请参考[使用 Docker](../../installation.md#set-"
"up-using-docker)。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md
msgid "Build from source"
@@ -142,7 +154,9 @@ msgstr "您可以从源码构建所有组件。"
msgid ""
"Install `vllm-ascend`, refer to [set up using "
"python](../../installation.md#set-up-using-python)."
msgstr "安装 `vllm-ascend`,请参考[使用 Python 设置](../../installation.md#set-up-using-python)。"
msgstr ""
"安装 `vllm-ascend`,请参考[使用 Python 设置](../../installation.md#set-up-using-"
"python)。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:84
msgid ""
@@ -163,7 +177,10 @@ msgid ""
"`Qwen3-235B-A22B` and `Qwen3-235B-A22B-w8a8` can both be deployed on 1 "
"Atlas 800 A3(64G*16), 1 Atlas 800 A2(64G*8). Quantized version need to "
"start with parameter `--quantization ascend`."
msgstr "`Qwen3-235B-A22B` 和 `Qwen3-235B-A22B-w8a8` 都可以部署在 1 个 Atlas 800 A3(64G*16) 或 1 个 Atlas 800 A2(64G*8) 上。量化版本需要使用参数 `--quantization ascend` 启动。"
msgstr ""
"`Qwen3-235B-A22B` 和 `Qwen3-235B-A22B-w8a8` 都可以部署在 1 个 Atlas 800 "
"A3(64G*16) 或 1 个 Atlas 800 A2(64G*8) 上。量化版本需要使用参数 `--quantization ascend`"
" 启动。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:93
msgid "Run the following script to execute online 128k inference."
@@ -181,7 +198,10 @@ msgid ""
"quantization weights to run long seqs (such as 128k context), it is "
"required to use yarn rope-scaling technique."
msgstr ""
"[Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B#processing-long-texts) 原本仅支持 40960 上下文长度(max_position_embeddings)。如果您想使用它及其相关的量化权重来运行长序列(例如 128k 上下文),需要使用 yarn rope-scaling 技术。"
"[Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B#processing-"
"long-texts) 原本仅支持 40960 "
"上下文长度(max_position_embeddings)。如果您想使用它及其相关的量化权重来运行长序列(例如 128k 上下文),需要使用 "
"yarn rope-scaling 技术。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:129
#, python-brace-format
@@ -192,7 +212,8 @@ msgid ""
" \\`."
msgstr ""
"对于 `v0.12.0` 及以上版本的 vLLM,使用参数 `--hf-overrides '{\"rope_parameters\": "
"{\"rope_type\":\"yarn\",\"rope_theta\":1000000,\"factor\":4,\"original_max_position_embeddings\":32768}}' \\`。"
"{\"rope_type\":\"yarn\",\"rope_theta\":1000000,\"factor\":4,\"original_max_position_embeddings\":32768}}'"
" \\`。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:130
#, python-brace-format
@@ -205,7 +226,10 @@ msgid ""
"parameter."
msgstr ""
"对于 `v0.12.0` 以下版本的 vLLM,使用参数 `--rope_scaling "
"'{\"rope_type\":\"yarn\",\"factor\":4,\"original_max_position_embeddings\":32768}' \\`。如果您使用的是像 [Qwen3-235B-A22B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507) 这样原本就支持长上下文的权重,则无需添加此参数。"
"'{\"rope_type\":\"yarn\",\"factor\":4,\"original_max_position_embeddings\":32768}'"
" \\`。如果您使用的是像 [Qwen3-235B-A22B-"
"Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507)"
" 这样原本就支持长上下文的权重,则无需添加此参数。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:133
msgid "The parameters are explained as follows:"
@@ -215,7 +239,9 @@ msgstr "参数解释如下:"
msgid ""
"`--data-parallel-size` 1 and `--tensor-parallel-size` 8 are common "
"settings for data parallelism (DP) and tensor parallelism (TP) sizes."
msgstr "`--data-parallel-size` 1 和 `--tensor-parallel-size` 8 是数据并行DP和张量并行TP大小的常见设置。"
msgstr ""
"`--data-parallel-size` 1 和 `--tensor-parallel-size` 8 "
"是数据并行DP和张量并行TP大小的常见设置。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:136
msgid ""
@@ -233,21 +259,28 @@ msgid ""
"testing performance, it is generally recommended that `--max-num-seqs` * "
"`--data-parallel-size` >= the actual total concurrency."
msgstr ""
"`--max-num-seqs` 表示每个 DP 组允许处理的最大请求数。如果发送到服务的请求数超过此限制,超出的请求将保持在等待状态,不会被调度。请注意,在等待状态所花费的时间也会计入 TTFT 和 TPOT 等指标。因此,在测试性能时,通常建议 `--max-num-seqs` * `--data-parallel-size` >= 实际总并发数。"
"`--max-num-seqs` 表示每个 DP "
"组允许处理的最大请求数。如果发送到服务的请求数超过此限制,超出的请求将保持在等待状态,不会被调度。请注意,在等待状态所花费的时间也会计入 TTFT"
" 和 TPOT 等指标。因此,在测试性能时,通常建议 `--max-num-seqs` * `--data-parallel-size` >= "
"实际总并发数。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:138
msgid ""
"`--max-num-batched-tokens` represents the maximum number of tokens that "
"the model can process in a single step. Currently, vLLM v1 scheduling "
"enables ChunkPrefill/SplitFuse by default, which means:"
msgstr "`--max-num-batched-tokens` 表示模型在单步中可以处理的最大 token 数。目前,vLLM v1 调度默认启用 ChunkPrefill/SplitFuse,这意味着:"
msgstr ""
"`--max-num-batched-tokens` 表示模型在单步中可以处理的最大 token 数。目前,vLLM v1 调度默认启用 "
"ChunkPrefill/SplitFuse,这意味着:"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:139
msgid ""
"(1) If the input length of a request is greater than `--max-num-batched-"
"tokens`, it will be divided into multiple rounds of computation according"
" to `--max-num-batched-tokens`;"
msgstr "(1) 如果一个请求的输入长度大于 `--max-num-batched-tokens`,它将根据 `--max-num-batched-tokens` 被分成多轮计算;"
msgstr ""
"(1) 如果一个请求的输入长度大于 `--max-num-batched-tokens`,它将根据 `--max-num-batched-"
"tokens` 被分成多轮计算;"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:140
msgid ""
@@ -277,14 +310,21 @@ msgid ""
"memory-utilization` too high may lead to OOM (Out of Memory) issues "
"during actual inference. The default value is `0.9`."
msgstr ""
"`--gpu-memory-utilization` 表示 vLLM 将用于实际推理的 HBM 比例。其核心功能是计算可用的 kv_cache 大小。在预热阶段(在 vLLM 中称为 profile run),vLLM 会记录输入大小为 `--max-num-batched-tokens` 的推理过程中的峰值 GPU 内存使用量。然后,可用的 kv_cache 大小计算为:`--gpu-memory-utilization` * HBM 大小 - 峰值 GPU 内存使用量。因此,`--gpu-memory-utilization` 的值越大,可以使用的 kv_cache 就越多。然而,由于预热阶段的 GPU 内存使用量可能与实际推理期间不同(例如,由于 EP 负载不均),将 `--gpu-memory-utilization` 设置得过高可能会导致实际推理期间出现 OOM(内存不足)问题。默认值为 `0.9`。"
"`--gpu-memory-utilization` 表示 vLLM 将用于实际推理的 HBM 比例。其核心功能是计算可用的 kv_cache "
"大小。在预热阶段(在 vLLM 中称为 profile run),vLLM 会记录输入大小为 `--max-num-batched-tokens`"
" 的推理过程中的峰值 GPU 内存使用量。然后,可用的 kv_cache 大小计算为:`--gpu-memory-utilization` * "
"HBM 大小 - 峰值 GPU 内存使用量。因此,`--gpu-memory-utilization` 的值越大,可以使用的 kv_cache "
"就越多。然而,由于预热阶段的 GPU 内存使用量可能与实际推理期间不同(例如,由于 EP 负载不均),将 `--gpu-memory-"
"utilization` 设置得过高可能会导致实际推理期间出现 OOM(内存不足)问题。默认值为 `0.9`。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:143
msgid ""
"`--enable-expert-parallel` indicates that EP is enabled. Note that vLLM "
"does not support a mixed approach of ETP and EP; that is, MoE can either "
"use pure EP or pure TP."
msgstr "`--enable-expert-parallel` 表示启用了 EP。请注意,vLLM 不支持 ETP 和 EP 的混合方法;也就是说,MoE 可以使用纯 EP 或纯 TP。"
msgstr ""
"`--enable-expert-parallel` 表示启用了 EP。请注意,vLLM 不支持 ETP 和 EP 的混合方法;也就是说,MoE "
"可以使用纯 EP 或纯 TP。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:144
msgid ""
@@ -308,7 +348,10 @@ msgid ""
"mainly used to reduce the cost of operator dispatch. Currently, "
"\"FULL_DECODE_ONLY\" is recommended."
msgstr ""
"`--compilation-config` 包含与 aclgraph 图模式相关的配置。最重要的配置是 \"cudagraph_mode\" 和 \"cudagraph_capture_sizes\",其含义如下:\"cudagraph_mode\":表示特定的图模式。目前支持 \"PIECEWISE\" 和 \"FULL_DECODE_ONLY\"。图模式主要用于降低算子调度的开销。目前推荐使用 \"FULL_DECODE_ONLY\"。"
"`--compilation-config` 包含与 aclgraph 图模式相关的配置。最重要的配置是 \"cudagraph_mode\" 和"
" \"cudagraph_capture_sizes\",其含义如下:\"cudagraph_mode\":表示特定的图模式。目前支持 "
"\"PIECEWISE\" 和 \"FULL_DECODE_ONLY\"。图模式主要用于降低算子调度的开销。目前推荐使用 "
"\"FULL_DECODE_ONLY\"。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:148
msgid ""
@@ -319,14 +362,18 @@ msgid ""
"Currently, the default setting is recommended. Only in some scenarios is "
"it necessary to set this separately to achieve optimal performance."
msgstr ""
"\"cudagraph_capture_sizes\":表示不同级别的图模式。默认值为 [1, 2, 4, 8, 16, 24, 32, 40,..., `--max-num-seqs`]。在图模式下,不同级别图的输入是固定的,级别之间的输入会自动填充到下一个级别。目前推荐使用默认设置。只有在某些场景下,才需要单独设置此参数以达到最佳性能。"
"\"cudagraph_capture_sizes\":表示不同级别的图模式。默认值为 [1, 2, 4, 8, 16, 24, 32, "
"40,..., `--max-num-"
"seqs`]。在图模式下,不同级别图的输入是固定的,级别之间的输入会自动填充到下一个级别。目前推荐使用默认设置。只有在某些场景下,才需要单独设置此参数以达到最佳性能。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:149
msgid ""
"`export VLLM_ASCEND_ENABLE_FLASHCOMM1=1` indicates that Flashcomm1 "
"optimization is enabled. Currently, this optimization is only supported "
"for MoE in scenarios where tp_size > 1."
msgstr "`export VLLM_ASCEND_ENABLE_FLASHCOMM1=1` 表示启用了 Flashcomm1 优化。目前,此优化仅在 tp_size > 1 的场景下对 MoE 支持。"
msgstr ""
"`export VLLM_ASCEND_ENABLE_FLASHCOMM1=1` 表示启用了 Flashcomm1 优化。目前,此优化仅在 "
"tp_size > 1 的场景下对 MoE 支持。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:151
msgid "Multi-node Deployment with MP (Recommended)"
@@ -336,7 +383,9 @@ msgstr "使用 MP 进行多节点部署(推荐)"
msgid ""
"Assume you have Atlas 800 A3 (64G*16) nodes (or 2* A2), and want to "
"deploy the `Qwen3-VL-235B-A22B-Instruct` model across multiple nodes."
msgstr "假设您有 Atlas 800 A3 (64G*16) 节点(或 2* A2并希望跨多个节点部署 `Qwen3-VL-235B-A22B-Instruct` 模型。"
msgstr ""
"假设您有 Atlas 800 A3 (64G*16) 节点(或 2* A2并希望跨多个节点部署 `Qwen3-VL-235B-A22B-"
"Instruct` 模型。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:155
msgid "Node 0"
@@ -368,7 +417,9 @@ msgstr "预填充-解码分离"
msgid ""
"refer to [Prefill-Decode Disaggregation Mooncake Verification "
"(Qwen)](../features/pd_disaggregation_mooncake_multi_node.md)"
msgstr "请参阅 [Prefill-Decode 分离部署 Mooncake 验证 (Qwen)](../features/pd_disaggregation_mooncake_multi_node.md)"
msgstr ""
"请参阅 [Prefill-Decode 分离部署 Mooncake 验证 "
"(Qwen)](../features/pd_disaggregation_mooncake_multi_node.md)"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:262
msgid "Functional Verification"
@@ -453,7 +504,10 @@ msgid ""
"Refer to [Using AISBench for performance "
"evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-"
"performance-evaluation) for details."
msgstr "详情请参阅 [使用 AISBench 进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation)。"
msgstr ""
"详情请参阅 [使用 AISBench "
"进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-"
"performance-evaluation)。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:297
msgid "Using vLLM Benchmark"
@@ -542,13 +596,13 @@ msgstr "单节点 A3 (64G*16)"
msgid "Example server scripts:"
msgstr "服务器脚本示例:"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:368
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:597
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:367
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:595
msgid "Benchmark scripts:"
msgstr "基准测试脚本:"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:384
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:613
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:383
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:611
msgid "Reference test results:"
msgstr "参考测试结果:"
@@ -592,48 +646,53 @@ msgstr "48.69"
msgid "2761.72"
msgstr "2761.72"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:390
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:619
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:389
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:617
msgid "Note:"
msgstr "注意:"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:392
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:391
msgid ""
"Setting `export VLLM_ASCEND_ENABLE_FUSED_MC2=1` enables MoE fused "
"operators that reduce time consumption of MoE in both prefill and decode."
" This is an experimental feature which only supports W8A8 quantization on"
" Atlas A3 servers now. If you encounter any problems when using this "
"feature, you can disable it by setting `export "
"VLLM_ASCEND_ENABLE_FUSED_MC2=0` and update issues in vLLM-Ascend "
"community."
msgstr "设置 `export VLLM_ASCEND_ENABLE_FUSED_MC2=1` 可启用 MoE 融合算子,以减少预填充和解码阶段 MoE 的时间消耗。这是一个实验性功能,目前仅支持 Atlas A3 服务器上的 W8A8 量化。如果您在使用此功能时遇到任何问题,可以通过设置 `export VLLM_ASCEND_ENABLE_FUSED_MC2=0` 来禁用它,并在 vLLM-Ascend 社区更新问题。"
"operators that reduce time consumption of MoE in decode. This is an "
"experimental feature which only supports W8A8 quantization on Atlas A3 "
"servers now. If you encounter any problems when using this feature, you "
"can disable it by setting `export VLLM_ASCEND_ENABLE_FUSED_MC2=0` and "
"update issues in vLLM-Ascend community. **Note** that this environment "
"variable can only be enabled on decode nodes."
msgstr ""
"设置 `export VLLM_ASCEND_ENABLE_FUSED_MC2=1` 可启用 MoE 融合算子,以减少解码阶段 MoE "
"的时间消耗。这是一个实验性功能,目前仅支持 Atlas A3 服务器上的 W8A8 量化。如果您在使用此功能时遇到任何问题,可以通过设置 "
"`export VLLM_ASCEND_ENABLE_FUSED_MC2=0` 来禁用它,并在 vLLM-Ascend 社区更新问题。**注意**,此环境变量只能在解码节点上启用。"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:393
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:392
msgid ""
"Here we disable prefix cache because of random datasets. You can enable "
"prefix cache if requests have long common prefix."
msgstr "由于使用随机数据集,此处我们禁用了前缀缓存。如果请求具有较长的公共前缀,您可以启用前缀缓存。"
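For reference, a random-dataset run of the kind these notes describe can be driven with `vllm bench serve`. The sketch below is illustrative; the model path, prompt count, sequence lengths, and concurrency are placeholders, not the settings used for the table above.

```python
# Drive a random-dataset benchmark against a running server.
# All numeric values and the model path are placeholders.
import subprocess

subprocess.run([
    "vllm", "bench", "serve",
    "--model", "/path/to/Qwen3-235B-A22B-W8A8",  # must match the served model
    "--base-url", "http://127.0.0.1:8000",
    "--dataset-name", "random",
    "--random-input-len", "2048",
    "--random-output-len", "1024",
    "--num-prompts", "400",
    "--max-concurrency", "64",
], check=True)
```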
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:395
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:394
msgid "Three Node A3 -- PD disaggregation"
msgstr "三节点 A3 -- PD 分离部署"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:397
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:396
msgid ""
"On three Atlas 800 A3(64G*16) server, we recommend to use one node as one"
" prefill instance and two nodes as one decode instance. Example server "
"scripts: Prefill Node 1"
msgstr "在三台 Atlas 800 A3(64G*16) 服务器上,我们建议使用一个节点作为一个预填充实例,两个节点作为一个解码实例。服务器脚本示例:预填充节点 1"
msgstr ""
"在三台 Atlas 800 A3(64G*16) "
"服务器上,我们建议使用一个节点作为一个预填充实例,两个节点作为一个解码实例。服务器脚本示例:预填充节点 1"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:462
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:460
msgid "Decode Node 1"
msgstr "解码节点 1"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:526
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:524
msgid "Decode Node 2"
msgstr "解码节点 2"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:591
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:589
msgid "PD proxy:"
msgstr "PD 代理:"
@@ -657,9 +716,13 @@ msgstr "52.07"
msgid "8593.44"
msgstr "8593.44"
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:621
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:619
msgid ""
"We recommend to set `export VLLM_ASCEND_ENABLE_FUSED_MC2=2` on this "
"scenario (typically EP32 for Qwen3-235B). This enables a different MoE "
"fusion operator."
msgstr "在此场景下(通常 Qwen3-235B 使用 EP32),我们建议设置 `export VLLM_ASCEND_ENABLE_FUSED_MC2=2`。这将启用一个不同的 MoE 融合算子。"
"fusion operator. **Note** that this environment variable can only be "
"enabled on decode nodes."
msgstr ""
"在此场景下(通常 Qwen3-235B 使用 EP32),我们建议设置 `export "
"VLLM_ASCEND_ENABLE_FUSED_MC2=2`。这将启用一个不同的 MoE 融合算子。"
"**注意**:此环境变量只能在解码节点上启用。"


@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -29,17 +29,15 @@ msgstr "简介"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:5
msgid ""
"Qwen3-Omni is the natively end-to-end multilingual omni-modal foundation "
"models. It processes text, images, audio, and video, and delivers real-"
"Qwen3-Omni is a native end-to-end multilingual omni-modal foundation "
"model. It processes text, images, audio, and video, and delivers real-"
"time streaming responses in both text and natural speech. We introduce "
"several architectural upgrades to improve performance and efficiency. The"
" Thinking model of Qwen3-Omni-30B-A3B, containing the thinker component, "
"equipped with chain-of-thought reasoning, supporting audio, video, and "
"text input, with text output."
" Thinking model of Qwen3-Omni-30B-A3B, which contains the thinker "
"component, is equipped with chain-of-thought reasoning and supports "
"audio, video, and text input, with text output."
msgstr ""
"Qwen3-Omni "
"是原生端到端多语言全模态基础模型。它能处理文本、图像、音频和视频,并以文本和自然语音形式提供实时流式响应。我们引入了多项架构升级以提升性能和效率。Qwen3"
"-Omni-30B-A3B 的 Thinking 模型包含思考器组件,具备思维链推理能力,支持音频、视频和文本输入,输出为文本。"
"Qwen3-Omni 是原生端到端多语言全模态基础模型。它能处理文本、图像、音频和视频,并以文本和自然语音形式提供实时流式响应。我们引入了多项架构升级以提升性能和效率。Qwen3-Omni-30B-A3B 的 Thinking 模型包含思考器组件,具备思维链推理能力,支持音频、视频和文本输入,输出为文本。"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:7
msgid ""
@@ -54,21 +52,19 @@ msgstr "支持的功能"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:11
msgid ""
"Refer to [supported features](https://docs.vllm.ai/projects/ascend/zh-"
"cn/latest/user_guide/support_matrix/supported_models.html) to get the "
"Refer to [supported features](https://docs.vllm.ai/projects/ascend/zh-"
"cn/latest/user_guide/support_matrix/supported_models.html) to get the "
"model's supported feature matrix."
msgstr ""
"请参考 [支持的功能](https://docs.vllm.ai/projects/ascend/zh-"
"cn/latest/user_guide/support_matrix/supported_models.html) 以获取模型支持的功能矩阵。"
"请参考 [支持的功能](https://docs.vllm.ai/projects/ascend/zh-cn/latest/user_guide/support_matrix/supported_models.html) 以获取模型支持的功能矩阵。"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:13
msgid ""
"Refer to [feature guide](https://docs.vllm.ai/projects/ascend/zh-"
"cn/latest/user_guide/feature_guide/index.html) to get the feature's "
"Refer to [feature guide](https://docs.vllm.ai/projects/ascend/zh-"
"cn/latest/user_guide/feature_guide/index.html) to get the feature's "
"configuration."
msgstr ""
"请参考 [功能指南](https://docs.vllm.ai/projects/ascend/zh-"
"cn/latest/user_guide/feature_guide/index.html) 以获取功能的配置信息。"
"请参考 [功能指南](https://docs.vllm.ai/projects/ascend/zh-cn/latest/user_guide/feature_guide/index.html) 以获取功能的配置信息。"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:15
msgid "Environment Preparation"
@@ -83,17 +79,15 @@ msgid ""
"`Qwen3-Omni-30B-A3B-Thinking` requires 2 NPU Cards (64G × 2).[Download "
"model weight](https://modelscope.cn/models/Qwen/Qwen3-Omni-30B-A3B-"
"Thinking) It is recommended to download the model weight to the shared "
"directory of multiple nodes, such as `/root/.cache/`"
"directory of multiple nodes, such as `/root/.cache/`"
msgstr ""
"`Qwen3-Omni-30B-A3B-Thinking` 需要 2 张 NPU 卡 (64G × "
"2)。[下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-Omni-30B-A3B-"
"Thinking)。建议将模型权重下载到多节点的共享目录,例如 `/root/.cache/`。"
"`Qwen3-Omni-30B-A3B-Thinking` 需要 2 张 NPU 卡 (64G × 2)。[下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-Omni-30B-A3B-Thinking)。建议将模型权重下载到多节点的共享目录,例如 `/root/.cache/`。"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:22
msgid "Installation"
msgstr "安装"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:24
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md
msgid "Use docker image"
msgstr "使用 Docker 镜像"
@@ -109,10 +103,9 @@ msgid ""
"your node, refer to [using docker](../../installation.md#set-up-using-"
"docker)."
msgstr ""
"根据您的机器类型选择镜像并在节点上启动 Docker 镜像,请参考 [使用 Docker](../../installation.md#set-"
"up-using-docker)。"
"根据您的机器类型选择镜像并在节点上启动 Docker 镜像,请参考 [使用 Docker](../../installation.md#set-up-using-docker)。"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:32
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md
msgid "Build from source"
msgstr "从源码构建"
@@ -125,8 +118,7 @@ msgid ""
"Install `vllm-ascend`, refer to [set up using "
"python](../../installation.md#set-up-using-python)."
msgstr ""
"安装 `vllm-ascend`,请参考 [使用 Python 设置](../../installation.md#set-up-using-"
"python)。"
"安装 `vllm-ascend`,请参考 [使用 Python 设置](../../installation.md#set-up-using-python)。"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:71
msgid "Please install system dependencies"
@@ -159,8 +151,7 @@ msgid ""
" least 1, and for 32 GB of memory, tensor-parallel-size should be at "
"least 2."
msgstr ""
"运行以下脚本在多 NPU 上启动 vLLM 服务器:对于具有 64 GB NPU 卡内存的 Atlas A2,tensor-parallel-"
"size 应至少为 1;对于 32 GB 内存,tensor-parallel-size 应至少为 2。"
"运行以下脚本在多 NPU 上启动 vLLM 服务器:对于具有 64 GB NPU 卡内存的 Atlas A2,tensor-parallel-size 应至少为 1;对于 32 GB 内存,tensor-parallel-size 应至少为 2。"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:188
msgid "Functional Verification"
@@ -188,8 +179,7 @@ msgid ""
"dataset, and run accuracy evaluation of `Qwen3-Omni-30B-A3B-Thinking` in "
"online mode."
msgstr ""
"以 `gsm8k`、`omnibench`、`bbh` 数据集作为测试数据集为例,在在线模式下运行 `Qwen3-Omni-30B-A3B-"
"Thinking` 的精度评估。"
"以 `gsm8k`、`omnibench`、`bbh` 数据集作为测试数据集为例,在在线模式下运行 `Qwen3-Omni-30B-A3B-Thinking` 的精度评估。"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:239
msgid ""
@@ -197,21 +187,19 @@ msgid ""
"evalscope(<https://docs.vllm.ai/projects/ascend/en/latest/developer_guide/evaluation/using_evalscope.html"
"#install-evalscope-using-pip>) for `evalscope`installation."
msgstr ""
"关于 `evalscope` 的安装,请参考使用 evalscope "
"(<https://docs.vllm.ai/projects/ascend/en/latest/developer_guide/evaluation/using_evalscope.html"
"#install-evalscope-using-pip>)。"
"关于 `evalscope` 的安装,请参考使用 evalscope (<https://docs.vllm.ai/projects/ascend/en/latest/developer_guide/evaluation/using_evalscope.html#install-evalscope-using-pip>)。"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:240
msgid "Run `evalscope` to execute the accuracy evaluation."
msgstr "运行 `evalscope` 以执行精度评估。"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:255
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:296
msgid ""
"After execution, you can get the result, here is the result of `Qwen3"
"-Omni-30B-A3B-Thinking` in vllm-ascend:0.13.0rc1 for reference only."
msgstr ""
"执行后,您可以获得结果。以下是 `Qwen3-Omni-30B-A3B-Thinking` 在 vllm-ascend:0.13.0rc1 "
"中的结果,仅供参考。"
"执行后,您可以获得结果。以下是 `Qwen3-Omni-30B-A3B-Thinking` 在 vllm-ascend:0.13.0rc1 中的结果,仅供参考。"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:269
msgid "Performance"
@@ -228,8 +216,7 @@ msgid ""
"benchmark](https://docs.vllm.ai/en/latest/benchmarking/) for more "
"details."
msgstr ""
"以运行 `Qwen3-Omni-30B-A3B-Thinking` 的性能评估为例。更多详情请参考 vllm 基准测试。更多详情请参考 [vllm"
" 基准测试](https://docs.vllm.ai/en/latest/benchmarking/)。"
"以运行 `Qwen3-Omni-30B-A3B-Thinking` 的性能评估为例。更多详情请参考 vllm 基准测试。更多详情请参考 [vllm 基准测试](https://docs.vllm.ai/en/latest/benchmarking/)。"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:277
msgid "There are three `vllm bench` subcommands:"
@@ -249,12 +236,4 @@ msgstr "`throughput`:对离线推理吞吐量进行基准测试。"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:283
msgid "Take the `serve` as an example. Run the code as follows."
msgstr "以 `serve` 为例。按如下方式运行代码。"
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:296
msgid ""
"After execution, you can get the result, here is the result of `Qwen3"
"-Omni-30B-A3B-Thinking` in vllm-ascend:0.13.0rc1 for reference only."
msgstr ""
"执行后,您可以获得结果。以下是 `Qwen3-Omni-30B-A3B-Thinking` 在 vllm-ascend:0.13.0rc1 "
"中的结果,仅供参考。"
msgstr "以 `serve` 为例。按如下方式运行代码。"


@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -118,7 +118,7 @@ msgstr ""
msgid "Installation"
msgstr "安装"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:34
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md
msgid "Use docker image"
msgstr "使用 Docker 镜像"
@@ -140,7 +140,7 @@ msgstr ""
"根据您的机器类型选择镜像并在节点上启动 Docker 镜像,请参考[使用 Docker](../../installation.md#set-"
"up-using-docker)。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:76
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md
msgid "Build from source"
msgstr "从源码构建"
@@ -185,15 +185,15 @@ msgid ""
"A3(64G*16)."
msgstr "在 1 个 Atlas 800 A3(64G*16) 上运行以下脚本以执行在线 128k 推理。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:133
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:132
msgid "**Notice:**"
msgstr "**注意:**"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:135
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:134
msgid "The parameters are explained as follows:"
msgstr "参数解释如下:"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:137
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:136
msgid ""
"`--data-parallel-size` 1 and `--tensor-parallel-size` 16 are common "
"settings for data parallelism (DP) and tensor parallelism (TP) sizes."
@@ -201,13 +201,13 @@ msgstr ""
"`--data-parallel-size` 1 和 `--tensor-parallel-size` 16 是数据并行 (DP) 和张量并行 "
"(TP) 大小的常见设置。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:138
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:137
msgid ""
"`--max-model-len` represents the context length, which is the maximum "
"value of the input plus output for a single request."
msgstr "`--max-model-len` 表示上下文长度,即单个请求的输入加输出的最大值。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:139
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:138
msgid ""
"`--max-num-seqs` indicates the maximum number of requests that each DP "
"group is allowed to process. If the number of requests sent to the "
@@ -222,7 +222,7 @@ msgstr ""
" 和 TPOT 等指标。因此,在测试性能时,通常建议 `--max-num-seqs` * `--data-parallel-size` >= "
"实际总并发数。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:140
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:139
msgid ""
"`--max-num-batched-tokens` represents the maximum number of tokens that "
"the model can process in a single step. Currently, vLLM v1 scheduling "
@@ -231,7 +231,7 @@ msgstr ""
"`--max-num-batched-tokens` 表示模型单步可以处理的最大 token 数。目前,vLLM v1 调度默认启用 "
"ChunkPrefill/SplitFuse,这意味着:"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:141
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:140
msgid ""
"(1) If the input length of a request is greater than `--max-num-batched-"
"tokens`, it will be divided into multiple rounds of computation according"
@@ -240,20 +240,20 @@ msgstr ""
"(1) 如果请求的输入长度大于 `--max-num-batched-tokens`,它将根据 `--max-num-batched-"
"tokens` 被分成多轮计算;"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:142
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:141
msgid ""
"(2) Decode requests are prioritized for scheduling, and prefill requests "
"are scheduled only if there is available capacity."
msgstr "(2) 解码请求优先调度,只有在有可用容量时才调度预填充请求。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:143
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:142
msgid ""
"Generally, if `--max-num-batched-tokens` is set to a larger value, the "
"overall latency will be lower, but the pressure on GPU memory (activation"
" value usage) will be greater."
msgstr "通常,如果 `--max-num-batched-tokens` 设置得较大,整体延迟会更低,但 GPU 内存(激活值使用)的压力会更大。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:144
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:143
msgid ""
"`--gpu-memory-utilization` represents the proportion of HBM that vLLM "
"will use for actual inference. Its essential function is to calculate the"
@@ -275,7 +275,7 @@ msgstr ""
"就越多。然而,由于预热阶段的 GPU 内存使用量可能与实际推理时不同(例如,由于 EP 负载不均),将 `--gpu-memory-"
"utilization` 设置得过高可能导致实际推理时出现 OOM内存不足问题。默认值为 `0.9`。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:145
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:144
msgid ""
"`--enable-expert-parallel` indicates that EP is enabled. Note that vLLM "
"does not support a mixed approach of ETP and EP; that is, MoE can either "
@@ -284,7 +284,7 @@ msgstr ""
"`--enable-expert-parallel` 表示启用了 EP。请注意,vLLM 不支持 ETP 和 EP 的混合方法;也就是说,MoE "
"要么使用纯 EP,要么使用纯 TP。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:146
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:145
msgid ""
"`--no-enable-prefix-caching` indicates that prefix caching is disabled. "
"To enable it, for mamba-like models Qwen3.5, set `--enable-prefix-"
@@ -298,13 +298,13 @@ msgstr ""
"的实现可能在调度时导致非常大的 block_size。例如,block_size 可能被调整为 2048,这意味着任何短于 2048 "
"的前缀将永远不会被缓存。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:147
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:146
msgid ""
"`--quantization` \"ascend\" indicates that quantization is used. To "
"disable quantization, remove this option."
msgstr "`--quantization` \"ascend\" 表示使用了量化。要禁用量化,请移除此选项。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:148
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:147
msgid ""
"`--compilation-config` contains configurations related to the aclgraph "
"graph mode. The most significant configurations are \"cudagraph_mode\" "
@@ -319,7 +319,7 @@ msgstr ""
"\"PIECEWISE\" 和 \"FULL_DECODE_ONLY\"。图模式主要用于降低算子调度的开销。目前推荐使用 "
"\"FULL_DECODE_ONLY\"。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:150
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:149
msgid ""
"\"cudagraph_capture_sizes\": represents different levels of graph modes. "
"The default value is [1, 2, 4, 8, 16, 24, 32, 40,..., `--max-num-seqs`]. "
@@ -332,123 +332,132 @@ msgstr ""
"40,..., `--max-num-"
"seqs`]。在图模式下,不同级别图的输入是固定的,级别之间的输入会自动填充到下一个级别。目前推荐使用默认设置。只有在某些场景下,才需要单独设置此参数以达到最佳性能。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:152
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:151
msgid "Multi-node Deployment with MP (Recommended)"
msgstr "使用 MP 的多节点部署(推荐)"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:154
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:153
msgid ""
"Assume you have 2 Atlas 800 A2 nodes, and want to deploy the `Qwen3.5"
"-397B-A17B-w8a8-mtp` model across multiple nodes."
msgstr "假设您有 2 个 Atlas 800 A2 节点,并希望跨多个节点部署 `Qwen3.5-397B-A17B-w8a8-mtp` 模型。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:156
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:155
msgid "Node 0"
msgstr "节点 0"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:202
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:201
msgid "Node1"
msgstr "节点 1"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:252
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:251
msgid ""
"If the service starts successfully, the following information will be "
"displayed on node 0:"
msgstr "如果服务启动成功,节点 0 上将显示以下信息:"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:263
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:262
msgid "Multi-node Deployment with Ray"
msgstr "使用 Ray 的多节点部署"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:265
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:264
msgid "refer to [Ray Distributed (Qwen/Qwen3-235B-A22B)](../features/ray.md)."
msgstr "请参考 [Ray 分布式 (Qwen/Qwen3-235B-A22B)](../features/ray.md)。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:267
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:266
msgid "Prefill-Decode Disaggregation"
msgstr "预填充-解码解耦"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:269
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:268
msgid ""
"We recommend using Mooncake for deployment: "
"[Mooncake](../features/pd_disaggregation_mooncake_multi_node.md)."
msgstr "我们推荐使用 Mooncake 进行部署:[Mooncake](../features/pd_disaggregation_mooncake_multi_node.md)。"
msgstr ""
"我们推荐使用 Mooncake "
"进行部署:[Mooncake](../features/pd_disaggregation_mooncake_multi_node.md)。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:271
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:270
msgid ""
"Take Atlas 800 A3 (64G × 16) for example, we recommend to deploy 1P1D (3 "
"nodes) to run Qwen3.5-397B-A17B."
msgstr "以 Atlas 800 A3 (64G × 16) 为例,我们建议部署 1P1D(3 个节点)来运行 Qwen3.5-397B-A17B。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:273
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:272
msgid "`Qwen3.5-397B-A17B-w8a8-mtp 1P1D` require 3 Atlas 800 A3 (64G × 16)."
msgstr "`Qwen3.5-397B-A17B-w8a8-mtp 1P1D` 需要 3 个 Atlas 800 A3 (64G × 16)。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:275
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:274
msgid ""
"To run the vllm-ascend `Prefill-Decode Disaggregation` service, you need "
"to deploy `run_p.sh` 、`run_d0.sh` and `run_d1.sh` script on each node and"
" deploy a `proxy.sh` script on prefill master node to forward requests."
msgstr "要运行 vllm-ascend `Prefill-Decode Disaggregation` 服务,您需要在每个节点上部署 `run_p.sh`、`run_d0.sh` 和 `run_d1.sh` 脚本,并在预填充主节点上部署一个 `proxy.sh` 脚本来转发请求。"
msgstr ""
"要运行 vllm-ascend `Prefill-Decode Disaggregation` 服务,您需要在每个节点上部署 "
"`run_p.sh`、`run_d0.sh` 和 `run_d1.sh` 脚本,并在预填充主节点上部署一个 `proxy.sh` 脚本来转发请求。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:277
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:276
msgid "Prefill Node 0 `run_p.sh` script"
msgstr "预填充节点 0 `run_p.sh` 脚本"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:352
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:350
msgid "Decode Node 0 `run_d0.sh` script"
msgstr "解码节点 0 `run_d0.sh` 脚本"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:432
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:430
msgid "Decode Node 1 `run_d1.sh` script"
msgstr "解码节点 1 `run_d1.sh` 脚本"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:519
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:517
msgid "Run the `proxy.sh` script on the prefill master node"
msgstr "在预填充主节点上运行 `proxy.sh` 脚本"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:521
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:519
msgid ""
"Run a proxy server on the same node with the prefiller service instance. "
"You can get the proxy program in the repository's examples: "
"[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-"
"project/vllm-"
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
msgstr "在与预填充服务实例相同的节点上运行一个代理服务器。您可以在仓库的示例中找到代理程序:[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
msgstr ""
"在与预填充服务实例相同的节点上运行一个代理服务器。您可以在仓库的示例中找到代理程序:[load\\_balance\\_proxy\\_server\\_example.py](https://github.com"
"/vllm-project/vllm-"
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:547
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:545
msgid "Functional Verification"
msgstr "功能验证"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:549
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:547
msgid "Once your server is started, you can query the model with input prompts:"
msgstr "服务器启动后,您可以使用输入提示词查询模型:"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:562
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:560
msgid "Accuracy Evaluation"
msgstr "精度评估"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:564
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:562
msgid "Here are two accuracy evaluation methods."
msgstr "以下是两种精度评估方法。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:566
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:578
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:564
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:576
msgid "Using AISBench"
msgstr "使用 AISBench"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:568
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:566
msgid ""
"Refer to [Using "
"AISBench](../../developer_guide/evaluation/using_ais_bench.md) for "
"details."
msgstr "详情请参阅[使用 AISBench](../../developer_guide/evaluation/using_ais_bench.md)。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:570
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:568
msgid ""
"After execution, you can get the result, here is the result of `Qwen3.5"
"-397B-A17B-w8a8` in `vllm-ascend:v0.17.0rc1` for reference only."
msgstr "执行后,您可以获得结果,以下是 `vllm-ascend:v0.17.0rc1` 中 `Qwen3.5-397B-A17B-w8a8` 的结果,仅供参考。"
msgstr ""
"执行后,您可以获得结果,以下是 `vllm-ascend:v0.17.0rc1` 中 `Qwen3.5-397B-A17B-w8a8` "
"的结果,仅供参考。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:76
msgid "dataset"
@@ -490,54 +499,74 @@ msgstr "生成"
msgid "96.74"
msgstr "96.74"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:576
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:574
msgid "Performance"
msgstr "性能"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:580
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:578
msgid ""
"Refer to [Using AISBench for performance "
"evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-"
"performance-evaluation) for details."
msgstr "详情请参阅[使用 AISBench 进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation)。"
msgstr ""
"详情请参阅[使用 AISBench "
"进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-"
"performance-evaluation)。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:582
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:580
msgid "Using vLLM Benchmark"
msgstr "使用 vLLM Benchmark"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:584
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:582
msgid "Run performance evaluation of `Qwen3.5-397B-A17B-w8a8` as an example."
msgstr "以运行 `Qwen3.5-397B-A17B-w8a8` 的性能评估为例。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:586
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:584
msgid ""
"Refer to [vllm "
"benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) "
"for more details."
msgstr "更多详情请参阅 [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html)。"
msgstr ""
"更多详情请参阅 [vllm "
"benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html)。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:588
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:586
msgid "There are three `vllm bench` subcommands:"
msgstr "`vllm bench` 有三个子命令:"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:590
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:588
msgid "`latency`: Benchmark the latency of a single batch of requests."
msgstr "`latency`:对单批请求的延迟进行基准测试。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:591
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:589
msgid "`serve`: Benchmark the online serving throughput."
msgstr "`serve`:对在线服务吞吐量进行基准测试。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:592
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:590
msgid "`throughput`: Benchmark offline inference throughput."
msgstr "`throughput`:对离线推理吞吐量进行基准测试。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:594
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:592
msgid "Take the `serve` as an example. Run the code as follows."
msgstr "以 `serve` 为例。运行代码如下。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:601
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:599
msgid ""
"After about several minutes, you can get the performance evaluation "
"result."
msgstr "大约几分钟后,您将获得性能评估结果。"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:601
msgid "Qwen3.5-397B-A17B Known issues"
msgstr "Qwen3.5-397B-A17B 已知问题"
#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:603
msgid ""
"Issue1: For single-node deployment scenario, when fused_mc2 is enabled, "
"using multi-DP model deployment may cause garbled or empty outputs after "
"the model triggers recomputation.When tuning performance by adjusting "
"model parallelism, ensure that this fused operator is disabled when DP > "
"1. For PD deployment scenarioD nodes can avoid this problem by enabling "
"the recompute scheduler."
msgstr ""
"问题1在单节点部署场景下当启用 fused_mc2 时,使用多 DP 模型部署可能会导致模型触发重计算后输出乱码或为空。在通过调整模型并行度来调优性能时,请确保当 DP > 1 时禁用此融合算子。对于 PD 部署场景D 节点可以通过启用重计算调度器来避免此问题。"

docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3_embedding.po

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -37,7 +37,9 @@ msgid ""
"model with vLLM Ascend. Note that only 0.9.2rc1 and higher versions of "
"vLLM Ascend support the model."
msgstr ""
"Qwen3 Embedding 模型系列是 Qwen 家族最新的专有模型,专为文本嵌入和排序任务设计。它基于 Qwen3 系列的稠密基础模型提供了多种尺寸0.6B、4B 和 8B的全面文本嵌入和重排序模型。本指南描述了如何使用 vLLM Ascend 运行该模型。请注意,只有 vLLM Ascend 0.9.2rc1 及更高版本支持此模型。"
"Qwen3 Embedding 模型系列是 Qwen 家族最新的专有模型,专为文本嵌入和排序任务设计。它基于 Qwen3 "
"系列的稠密基础模型提供了多种尺寸0.6B、4B 和 8B的全面文本嵌入和重排序模型。本指南描述了如何使用 vLLM Ascend "
"运行该模型。请注意,只有 vLLM Ascend 0.9.2rc1 及更高版本支持此模型。"
#: ../../source/tutorials/models/Qwen3_embedding.md:7
msgid "Supported Features"
@@ -62,19 +64,25 @@ msgstr "模型权重"
msgid ""
"`Qwen3-Embedding-8B` [Download model "
"weight](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-8B)"
msgstr "`Qwen3-Embedding-8B` [下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-8B)"
msgstr ""
"`Qwen3-Embedding-8B` [下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3"
"-Embedding-8B)"
#: ../../source/tutorials/models/Qwen3_embedding.md:16
msgid ""
"`Qwen3-Embedding-4B` [Download model "
"weight](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-4B)"
msgstr "`Qwen3-Embedding-4B` [下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-4B)"
msgstr ""
"`Qwen3-Embedding-4B` [下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3"
"-Embedding-4B)"
#: ../../source/tutorials/models/Qwen3_embedding.md:17
msgid ""
"`Qwen3-Embedding-0.6B` [Download model "
"weight](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-0.6B)"
msgstr "`Qwen3-Embedding-0.6B` [下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-0.6B)"
msgstr ""
"`Qwen3-Embedding-0.6B` "
"[下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-0.6B)"
#: ../../source/tutorials/models/Qwen3_embedding.md:19
msgid ""
@@ -96,7 +104,9 @@ msgstr "您可以使用我们的官方 docker 镜像来运行 `Qwen3-Embedding`
msgid ""
"Start the docker image on your node, refer to [using "
"docker](../../installation.md#set-up-using-docker)."
msgstr "在您的节点上启动 docker 镜像,请参考[使用 docker](../../installation.md#set-up-using-docker)。"
msgstr ""
"在您的节点上启动 docker 镜像,请参考[使用 docker](../../installation.md#set-up-using-"
"docker)。"
#: ../../source/tutorials/models/Qwen3_embedding.md:27
msgid ""
@@ -142,10 +152,12 @@ msgstr "性能"
#: ../../source/tutorials/models/Qwen3_embedding.md:98
msgid ""
"Run performance of `Qwen3-Reranker-8B` as an example. Refer to [vllm "
"Run performance of `Qwen3-Embedding-8B` as an example. Refer to [vllm "
"benchmark](https://docs.vllm.ai/en/latest/contributing/) for more "
"details."
msgstr "以 `Qwen3-Reranker-8B` 的运行性能为例。更多详情请参考 [vllm 基准测试](https://docs.vllm.ai/en/latest/contributing/)。"
msgstr ""
"以 `Qwen3-Embedding-8B` 的运行性能为例。更多详情请参考 [vllm "
"基准测试](https://docs.vllm.ai/en/latest/contributing/)。"
#: ../../source/tutorials/models/Qwen3_embedding.md:101
msgid "Take the `serve` as an example. Run the code as follows."