[Doc]Add Chinese translation for documentation (#1870)

### What this PR does / why we need it?

This PR adds a complete Chinese translation for the documentation using
PO files and the gettext toolchain. The goal is to make the
documentation more accessible to Chinese-speaking users and help the
community grow.
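The PO files in this PR follow the standard gettext layout: each entry pairs a source string (`msgid`) with its Chinese translation (`msgstr`), preceded by a `#:` comment locating the string in the Markdown sources. As a minimal sketch of that mapping (the sample entries below mirror the real ones, but the parser is illustrative only and does not handle multi-line strings or the PO header the way the real gettext tooling does):

```python
# Illustrative sketch: collect msgid -> msgstr pairs from simple,
# single-line PO entries. Real tooling (gettext/babel) should be used
# for actual catalogs; this only shows the structure of the entries.

SAMPLE = '''\
#: ../../tutorials/index.md:1
msgid "Tutorials"
msgstr "教程"

#: ../../tutorials/index.md:3
msgid "Deployment"
msgstr "部署"
'''

def parse_po(text):
    """Map msgid -> msgstr for simple (non-multiline) entries."""
    catalog = {}
    msgid = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('msgid "'):
            msgid = line[len('msgid "'):-1]
        elif line.startswith('msgstr "') and msgid is not None:
            catalog[msgid] = line[len('msgstr "'):-1]
            msgid = None
    return catalog

catalog = parse_po(SAMPLE)
print(catalog["Deployment"])  # → 部署
```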

### Does this PR introduce any user-facing change?

Yes. This PR introduces Chinese documentation, which users can access
alongside the original English documentation. No changes to the core
code or APIs.

### How was this patch tested?

The translated documentation was built locally using the standard
documentation build process (`make html` or `sphinx-build`). I checked
the generated HTML pages to ensure the Chinese content displays
correctly and matches the original structure. No code changes were made,
so no additional code tests are required.
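For reviewers who want to reproduce the build locally, the usual Sphinx gettext workflow looks roughly like the following. Note that the directory layout, locale path, and exact invocations here are assumptions for illustration; the repository's own Makefile targets may differ:

```shell
# Assumed layout: docs source under docs/source, translations under locale/.
# These are the generic sphinx-intl steps, not commands verified against
# this repository's build configuration.

pip install sphinx sphinx-intl

# 1. Extract translatable strings into .pot templates
sphinx-build -b gettext docs/source docs/build/gettext

# 2. Create or update the zh_CN .po files from the templates
sphinx-intl update -p docs/build/gettext -l zh_CN

# 3. Build the Chinese HTML documentation
sphinx-build -b html -D language=zh_CN docs/source docs/build/html/zh_CN
```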

vLLM version: v0.9.2  
vLLM main: vllm-project/vllm@5780121

---

Please review the translation and let me know if any improvements are
needed. I am happy to update the translation based on feedback.

- vLLM version: v0.9.2
- vLLM main:
7ba34b1241

---------

Signed-off-by: aidoczh <aidoczh@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
This commit is contained in:
aidoczh
2025-07-21 11:26:27 +08:00
committed by GitHub
parent 8cfd257992
commit c32eea96b7
52 changed files with 9553 additions and 7 deletions


@@ -0,0 +1,29 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/index.md:3
msgid "Deployment"
msgstr "部署"
#: ../../tutorials/index.md:1
msgid "Tutorials"
msgstr "教程"


@@ -0,0 +1,192 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/multi_node.md:1
msgid "Multi-Node-DP (DeepSeek)"
msgstr "多节点数据并行(DeepSeek)"
#: ../../tutorials/multi_node.md:3
msgid "Getting Start"
msgstr "快速开始"
#: ../../tutorials/multi_node.md:4
msgid ""
"vLLM-Ascend now supports Data Parallel (DP) deployment, enabling model "
"weights to be replicated across multiple NPUs or instances, each processing "
"independent batches of requests. This is particularly useful for scaling "
"throughput across devices while maintaining high resource utilization."
msgstr ""
"vLLM-Ascend 现在支持数据并行(DP)部署,可以在多个 NPU "
"或实例之间复制模型权重,每个实例处理独立的请求批次。这对于在保证高资源利用率的同时,实现跨设备的吞吐量扩展特别有用。"
#: ../../tutorials/multi_node.md:6
msgid ""
"Each DP rank is deployed as a separate “core engine” process which "
"communicates with front-end process(es) via ZMQ sockets. Data Parallel can "
"be combined with Tensor Parallel, in which case each DP engine owns a number"
" of per-NPU worker processes equal to the TP size."
msgstr ""
"每个 DP 进程作为一个单独的“核心引擎”进程部署,并通过 ZMQ 套接字与前端进程通信。数据并行可以与张量并行结合使用,此时每个 DP "
"引擎拥有数量等于 TP 大小的每 NPU 工作进程。"
#: ../../tutorials/multi_node.md:8
msgid ""
"For Mixture-of-Experts (MoE) models — especially advanced architectures like"
" DeepSeek that utilize Multi-head Latent Attention (MLA) — a hybrid "
"parallelism approach is recommended: - Use **Data Parallelism (DP)** for"
" attention layers, which are replicated across devices and handle separate "
"batches. - Use **Expert or Tensor Parallelism (EP/TP)** for expert "
"layers, which are sharded across devices to distribute the computation."
msgstr ""
"对于混合专家(Mixture-of-Experts, MoE)模型——尤其是像 DeepSeek 这样采用多头潜在注意力(Multi-head Latent Attention, MLA)的高级架构——推荐使用混合并行策略:\n"
" - 对于注意力层,使用 **数据并行(Data Parallelism, DP)**,这些层会在各设备间复制,并处理不同的批次。\n"
" - 对于专家层,使用 **专家并行或张量并行(Expert or Tensor Parallelism, EP/TP)**,这些层会在设备间分片,从而分担计算。"
#: ../../tutorials/multi_node.md:12
msgid ""
"This division enables attention layers to be replicated across Data Parallel"
" (DP) ranks, enabling them to process different batches independently. "
"Meanwhile, expert layers are partitioned (sharded) across devices using "
"Expert or Tensor Parallelism(DP*TP), maximizing hardware utilization and "
"efficiency."
msgstr ""
"这种划分使得注意力层能够在数据并行(DP)组内复制,从而能够独立处理不同的批次。同时,专家层通过专家或张量并行(DP*TP)在设备间进行分区(切片),最大化硬件利用率和效率。"
#: ../../tutorials/multi_node.md:14
msgid ""
"In these cases the data parallel ranks are not completely independent, "
"forward passes must be aligned and expert layers across all ranks are "
"required to synchronize during every forward pass, even if there are fewer "
"requests to be processed than DP ranks."
msgstr ""
"在这些情况下,数据并行的各个 rank 不是完全独立的,前向传播必须对齐,并且所有 rank "
"上的专家层在每次前向传播时都需要同步,即使待处理的请求数量少于 DP rank 的数量。"
#: ../../tutorials/multi_node.md:16
msgid ""
"For MoE models, when any requests are in progress in any rank, we must "
"ensure that empty “dummy” forward passes are performed in all ranks which "
"dont currently have any requests scheduled. This is handled via a separate "
"DP `Coordinator` process which communicates with all of the ranks, and a "
"collective operation performed every N steps to determine when all ranks "
"become idle and can be paused. When TP is used in conjunction with DP, "
"expert layers form an EP or TP group of size (DP x TP)."
msgstr ""
"对于 MoE 模型,当任何一个 rank 有请求正在进行时,必须确保所有当前没有请求的 rank 都执行空的“虚拟”前向传播。这是通过一个单独的 DP "
"`Coordinator` 协调器进程来实现的,该进程与所有 rank 通信,并且每隔 N 步执行一次集体操作,以判断所有 rank "
"是否都处于空闲状态并可以暂停。当 TP 与 DP 结合使用时,专家层会组成一个规模为(DP x TP)的 EP 或 TP 组。"
#: ../../tutorials/multi_node.md:18
msgid "Verify Multi-Node Communication Environment"
msgstr "验证多节点通信环境"
#: ../../tutorials/multi_node.md:20
msgid "Physical Layer Requirements:"
msgstr "物理层要求:"
#: ../../tutorials/multi_node.md:22
msgid ""
"The physical machines must be located on the same WLAN, with network "
"connectivity."
msgstr "物理机器必须位于同一个 WLAN 中,并且具有网络连接。"
#: ../../tutorials/multi_node.md:23
msgid ""
"All NPUs are connected with optical modules, and the connection status must "
"be normal."
msgstr "所有 NPU 都通过光模块连接,且连接状态必须正常。"
#: ../../tutorials/multi_node.md:25
msgid "Verification Process:"
msgstr "验证流程:"
#: ../../tutorials/multi_node.md:27
msgid ""
"Execute the following commands on each node in sequence. The results must "
"all be `success` and the status must be `UP`:"
msgstr "在每个节点上依次执行以下命令。所有结果必须为 `success`,且状态必须为 `UP`:"
#: ../../tutorials/multi_node.md:44
msgid "NPU Interconnect Verification:"
msgstr "NPU 互连验证:"
#: ../../tutorials/multi_node.md:45
msgid "1. Get NPU IP Addresses"
msgstr "1. 获取 NPU IP 地址"
#: ../../tutorials/multi_node.md:50
msgid "2. Cross-Node PING Test"
msgstr "2. 跨节点 PING 测试"
#: ../../tutorials/multi_node.md:56
msgid "Run with docker"
msgstr "用 docker 运行"
#: ../../tutorials/multi_node.md:57
msgid ""
"Assume you have two Atlas 800 A2(64G*8) nodes, and want to deploy the "
"`deepseek-v3-w8a8` quantitative model across multi-node."
msgstr "假设你有两台 Atlas 800 A2(64G*8)节点,并且想要在多节点上部署 `deepseek-v3-w8a8` 量化模型。"
#: ../../tutorials/multi_node.md:92
msgid ""
"Before launch the inference server, ensure some environment variables are "
"set for multi node communication"
msgstr "在启动推理服务器之前,确保已经为多节点通信设置了一些环境变量。"
#: ../../tutorials/multi_node.md:95
msgid "Run the following scripts on two nodes respectively"
msgstr "分别在两台节点上运行以下脚本"
#: ../../tutorials/multi_node.md:97
msgid "**node0**"
msgstr "**节点0**"
#: ../../tutorials/multi_node.md:137
msgid "**node1**"
msgstr "**节点1**"
#: ../../tutorials/multi_node.md:176
msgid ""
"The Deployment view looks like: ![alt text](../assets/multi_node_dp.png)"
msgstr "部署视图如下所示:![替代文本](../assets/multi_node_dp.png)"
#: ../../tutorials/multi_node.md:176
msgid "alt text"
msgstr "替代文本"
#: ../../tutorials/multi_node.md:179
msgid ""
"Once your server is started, you can query the model with input prompts:"
msgstr "一旦你的服务器启动,你可以通过输入提示词来查询模型:"
#: ../../tutorials/multi_node.md:192
msgid "Run benchmarks"
msgstr "运行基准测试"
#: ../../tutorials/multi_node.md:193
msgid ""
"For details please refer to [benchmark](https://github.com/vllm-"
"project/vllm-ascend/tree/main/benchmarks)"
msgstr ""
"详细信息请参阅 [benchmark](https://github.com/vllm-project/vllm-"
"ascend/tree/main/benchmarks)"


@@ -0,0 +1,62 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/multi_npu.md:1
msgid "Multi-NPU (QwQ 32B)"
msgstr "多NPU(QwQ 32B)"
#: ../../tutorials/multi_npu.md:3
msgid "Run vllm-ascend on Multi-NPU"
msgstr "在多NPU上运行 vllm-ascend"
#: ../../tutorials/multi_npu.md:5
msgid "Run docker container:"
msgstr "运行 docker 容器:"
#: ../../tutorials/multi_npu.md:30
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/multi_npu.md:40
msgid "Online Inference on Multi-NPU"
msgstr "多NPU的在线推理"
#: ../../tutorials/multi_npu.md:42
msgid "Run the following script to start the vLLM server on Multi-NPU:"
msgstr "运行以下脚本在多NPU上启动 vLLM 服务器:"
#: ../../tutorials/multi_npu.md:48
msgid ""
"Once your server is started, you can query the model with input prompts"
msgstr "一旦服务器启动,就可以通过输入提示词来查询模型。"
#: ../../tutorials/multi_npu.md:63
msgid "Offline Inference on Multi-NPU"
msgstr "多NPU离线推理"
#: ../../tutorials/multi_npu.md:65
msgid "Run the following script to execute offline inference on multi-NPU:"
msgstr "运行以下脚本以在多NPU上执行离线推理"
#: ../../tutorials/multi_npu.md:102
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"


@@ -0,0 +1,86 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/multi_npu_moge.md:1
msgid "Multi-NPU (Pangu Pro MoE)"
msgstr "多NPU(Pangu Pro MoE)"
#: ../../tutorials/multi_npu_moge.md:3
msgid "Run vllm-ascend on Multi-NPU"
msgstr "在多NPU上运行 vllm-ascend"
#: ../../tutorials/multi_npu_moge.md:5
msgid "Run container:"
msgstr "运行容器:"
#: ../../tutorials/multi_npu_moge.md:30
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/multi_npu_moge.md:37
msgid "Download the model:"
msgstr "下载该模型:"
#: ../../tutorials/multi_npu_moge.md:44
msgid "Online Inference on Multi-NPU"
msgstr "多NPU上的在线推理"
#: ../../tutorials/multi_npu_moge.md:46
msgid "Run the following script to start the vLLM server on Multi-NPU:"
msgstr "运行以下脚本在多NPU上启动 vLLM 服务器:"
#: ../../tutorials/multi_npu_moge.md:55
msgid ""
"Once your server is started, you can query the model with input prompts:"
msgstr "一旦你的服务器启动,你可以通过输入提示词来查询模型:"
#: ../../tutorials/multi_npu_moge.md
msgid "v1/completions"
msgstr "v1/completions"
#: ../../tutorials/multi_npu_moge.md
msgid "v1/chat/completions"
msgstr "v1/chat/completions"
#: ../../tutorials/multi_npu_moge.md:96
msgid "If you run this successfully, you can see the info shown below:"
msgstr "如果你成功运行这个,你可以看到如下所示的信息:"
#: ../../tutorials/multi_npu_moge.md:102
msgid "Offline Inference on Multi-NPU"
msgstr "多NPU离线推理"
#: ../../tutorials/multi_npu_moge.md:104
msgid "Run the following script to execute offline inference on multi-NPU:"
msgstr "运行以下脚本以在多NPU上执行离线推理"
#: ../../tutorials/multi_npu_moge.md
msgid "Graph Mode"
msgstr "图模式"
#: ../../tutorials/multi_npu_moge.md
msgid "Eager Mode"
msgstr "即时模式"
#: ../../tutorials/multi_npu_moge.md:230
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"


@@ -0,0 +1,82 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/multi_npu_quantization.md:1
msgid "Multi-NPU (QwQ 32B W8A8)"
msgstr "多NPU(QwQ 32B W8A8)"
#: ../../tutorials/multi_npu_quantization.md:3
msgid "Run docker container"
msgstr "运行 docker 容器"
#: ../../tutorials/multi_npu_quantization.md:5
msgid "w8a8 quantization feature is supported by v0.8.4rc2 or higher"
msgstr "w8a8 量化功能由 v0.8.4rc2 或更高版本支持"
#: ../../tutorials/multi_npu_quantization.md:31
msgid "Install modelslim and convert model"
msgstr "安装 modelslim 并转换模型"
#: ../../tutorials/multi_npu_quantization.md:33
msgid ""
"You can choose to convert the model yourself or use the quantized model we "
"uploaded, see https://www.modelscope.cn/models/vllm-ascend/QwQ-32B-W8A8"
msgstr ""
"你可以选择自己转换模型,或者使用我们上传的量化模型,详见 https://www.modelscope.cn/models/vllm-"
"ascend/QwQ-32B-W8A8"
#: ../../tutorials/multi_npu_quantization.md:56
msgid "Verify the quantized model"
msgstr "验证量化模型"
#: ../../tutorials/multi_npu_quantization.md:57
msgid "The converted model files looks like:"
msgstr "转换后的模型文件如下所示:"
#: ../../tutorials/multi_npu_quantization.md:70
msgid ""
"Run the following script to start the vLLM server with quantized model:"
msgstr "运行以下脚本以启动带有量化模型的 vLLM 服务器:"
#: ../../tutorials/multi_npu_quantization.md:73
msgid ""
"The value \"ascend\" for \"--quantization\" argument will be supported after"
" [a specific PR](https://github.com/vllm-project/vllm-ascend/pull/877) is "
"merged and released, you can cherry-pick this commit for now."
msgstr ""
"在 [特定的PR](https://github.com/vllm-project/vllm-ascend/pull/877) 合并并发布后, \"--"
"quantization\" 参数才会支持 \"ascend\" 这个值;目前你可以先 cherry-pick 该提交。"
#: ../../tutorials/multi_npu_quantization.md:79
msgid ""
"Once your server is started, you can query the model with input prompts"
msgstr "一旦服务器启动,就可以通过输入提示词来查询模型。"
#: ../../tutorials/multi_npu_quantization.md:93
msgid ""
"Run the following script to execute offline inference on multi-NPU with "
"quantized model:"
msgstr "运行以下脚本在多NPU上使用量化模型执行离线推理"
#: ../../tutorials/multi_npu_quantization.md:96
msgid "To enable quantization for ascend, quantization method must be \"ascend\""
msgstr "要为 ascend 启用量化,量化方法必须为“ascend”。"


@@ -0,0 +1,71 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/multi_npu_qwen3_moe.md:1
msgid "Multi-NPU (Qwen3-30B-A3B)"
msgstr "多NPU(Qwen3-30B-A3B)"
#: ../../tutorials/multi_npu_qwen3_moe.md:3
msgid "Run vllm-ascend on Multi-NPU with Qwen3 MoE"
msgstr "在多NPU上运行带有Qwen3 MoE的vllm-ascend"
#: ../../tutorials/multi_npu_qwen3_moe.md:5
msgid "Run docker container:"
msgstr "运行 docker 容器:"
#: ../../tutorials/multi_npu_qwen3_moe.md:30
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/multi_npu_qwen3_moe.md:40
msgid "Online Inference on Multi-NPU"
msgstr "多NPU的在线推理"
#: ../../tutorials/multi_npu_qwen3_moe.md:42
msgid "Run the following script to start the vLLM server on Multi-NPU:"
msgstr "运行以下脚本以在多NPU上启动 vLLM 服务器:"
#: ../../tutorials/multi_npu_qwen3_moe.md:44
msgid ""
"For an Atlas A2 with 64GB of NPU card memory, tensor-parallel-size should be"
" at least 2, and for 32GB of memory, tensor-parallel-size should be at least"
" 4."
msgstr ""
"对于拥有 64GB NPU 卡内存的 Atlas A2,tensor-parallel-size 至少应为 2;对于 32GB 内存的 NPU 卡,tensor-"
"parallel-size 至少应为 4。"
#: ../../tutorials/multi_npu_qwen3_moe.md:50
msgid ""
"Once your server is started, you can query the model with input prompts"
msgstr "一旦服务器启动,就可以通过输入提示词来查询模型。"
#: ../../tutorials/multi_npu_qwen3_moe.md:65
msgid "Offline Inference on Multi-NPU"
msgstr "多NPU离线推理"
#: ../../tutorials/multi_npu_qwen3_moe.md:67
msgid "Run the following script to execute offline inference on multi-NPU:"
msgstr "运行以下脚本以在多NPU上执行离线推理"
#: ../../tutorials/multi_npu_qwen3_moe.md:104
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"


@@ -0,0 +1,110 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/single_node_300i.md:1
msgid "Single Node (Atlas 300I series)"
msgstr "单节点(Atlas 300I 系列)"
#: ../../tutorials/single_node_300i.md:4
msgid ""
"This Atlas 300I series is currently experimental. In future versions, there "
"may be behavioral changes around model coverage, performance improvement."
msgstr "Atlas 300I 系列目前处于实验阶段。在未来的版本中,模型覆盖范围和性能提升方面可能会有行为上的变化。"
#: ../../tutorials/single_node_300i.md:7
msgid "Run vLLM on Altlas 300I series"
msgstr "在 Atlas 300I 系列上运行 vLLM"
#: ../../tutorials/single_node_300i.md:9
msgid "Run docker container:"
msgstr "运行 docker 容器:"
#: ../../tutorials/single_node_300i.md:38
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/single_node_300i.md:48
msgid "Online Inference on NPU"
msgstr "在NPU上进行在线推理"
#: ../../tutorials/single_node_300i.md:50
msgid ""
"Run the following script to start the vLLM server on NPU(Qwen3-0.6B:1 card, "
"Qwen2.5-7B-Instruct:2 cards, Pangu-Pro-MoE-72B: 8 cards):"
msgstr ""
"运行以下脚本,在 NPU 上启动 vLLM 服务器(Qwen3-0.6B:1 张卡,Qwen2.5-7B-Instruct:2 张卡,Pangu-"
"Pro-MoE-72B:8 张卡):"
#: ../../tutorials/single_node_300i.md
msgid "Qwen3-0.6B"
msgstr "Qwen3-0.6B"
#: ../../tutorials/single_node_300i.md:59
#: ../../tutorials/single_node_300i.md:89
#: ../../tutorials/single_node_300i.md:126
msgid "Run the following command to start the vLLM server:"
msgstr "运行以下命令以启动 vLLM 服务器:"
#: ../../tutorials/single_node_300i.md:70
#: ../../tutorials/single_node_300i.md:100
#: ../../tutorials/single_node_300i.md:140
msgid ""
"Once your server is started, you can query the model with input prompts"
msgstr "一旦服务器启动,就可以通过输入提示词来查询模型。"
#: ../../tutorials/single_node_300i.md
msgid "Qwen/Qwen2.5-7B-Instruct"
msgstr "Qwen/Qwen2.5-7B-Instruct"
#: ../../tutorials/single_node_300i.md
msgid "Pangu-Pro-MoE-72B"
msgstr "Pangu-Pro-MoE-72B"
#: ../../tutorials/single_node_300i.md:119
#: ../../tutorials/single_node_300i.md:257
msgid "Download the model:"
msgstr "下载该模型:"
#: ../../tutorials/single_node_300i.md:157
msgid "If you run this script successfully, you can see the results."
msgstr "如果你成功运行此脚本,你就可以看到结果。"
#: ../../tutorials/single_node_300i.md:159
msgid "Offline Inference"
msgstr "离线推理"
#: ../../tutorials/single_node_300i.md:161
msgid ""
"Run the following script (`example.py`) to execute offline inference on NPU:"
msgstr "运行以下脚本(`example.py`)以在 NPU 上执行离线推理:"
#: ../../tutorials/single_node_300i.md
msgid "Qwen2.5-7B-Instruct"
msgstr "Qwen2.5-7B-Instruct"
#: ../../tutorials/single_node_300i.md:320
msgid "Run script:"
msgstr "运行脚本:"
#: ../../tutorials/single_node_300i.md:325
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"


@@ -0,0 +1,107 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/single_npu.md:1
msgid "Single NPU (Qwen3 8B)"
msgstr "单个NPU(Qwen3 8B)"
#: ../../tutorials/single_npu.md:3
msgid "Run vllm-ascend on Single NPU"
msgstr "在单个 NPU 上运行 vllm-ascend"
#: ../../tutorials/single_npu.md:5
msgid "Offline Inference on Single NPU"
msgstr "在单个NPU上进行离线推理"
#: ../../tutorials/single_npu.md:7
msgid "Run docker container:"
msgstr "运行 docker 容器:"
#: ../../tutorials/single_npu.md:29
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/single_npu.md:40
msgid ""
"`max_split_size_mb` prevents the native allocator from splitting blocks "
"larger than this size (in MB). This can reduce fragmentation and may allow "
"some borderline workloads to complete without running out of memory. You can"
" find more details "
"[<u>here</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)."
msgstr ""
"`max_split_size_mb` 防止本地分配器拆分超过此大小(以 MB "
"为单位)的内存块。这可以减少内存碎片,并且可能让一些边缘情况下的工作负载顺利完成而不会耗尽内存。你可以在[<u>这里</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"
#: ../../tutorials/single_npu.md:43
msgid "Run the following script to execute offline inference on a single NPU:"
msgstr "运行以下脚本以在单个 NPU 上执行离线推理:"
#: ../../tutorials/single_npu.md
msgid "Graph Mode"
msgstr "图模式"
#: ../../tutorials/single_npu.md
msgid "Eager Mode"
msgstr "即时模式"
#: ../../tutorials/single_npu.md:98
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"
#: ../../tutorials/single_npu.md:105
msgid "Online Serving on Single NPU"
msgstr "单个 NPU 上的在线服务"
#: ../../tutorials/single_npu.md:107
msgid "Run docker container to start the vLLM server on a single NPU:"
msgstr "运行 docker 容器,在单个 NPU 上启动 vLLM 服务器:"
#: ../../tutorials/single_npu.md:163
msgid ""
"Add `--max_model_len` option to avoid ValueError that the Qwen2.5-7B model's"
" max seq len (32768) is larger than the maximum number of tokens that can be"
" stored in KV cache (26240). This will differ with different NPU series base"
" on the HBM size. Please modify the value according to a suitable value for "
"your NPU series."
msgstr ""
"添加 `--max_model_len` 选项,以避免出现 Qwen2.5-7B 模型的最大序列长度(32768)大于 KV 缓存能存储的最大 "
"token 数(26240)时的 ValueError。不同 NPU 系列由于 HBM 容量不同,该值也会有所不同。请根据您的 NPU "
"系列,修改为合适的数值。"
#: ../../tutorials/single_npu.md:166
msgid "If your service start successfully, you can see the info shown below:"
msgstr "如果你的服务启动成功,你会看到如下所示的信息:"
#: ../../tutorials/single_npu.md:174
msgid ""
"Once your server is started, you can query the model with input prompts:"
msgstr "一旦你的服务器启动,你可以通过输入提示词来查询模型:"
#: ../../tutorials/single_npu.md:187
msgid ""
"If you query the server successfully, you can see the info shown below "
"(client):"
msgstr "如果你成功查询了服务器,你可以看到如下所示的信息(客户端):"
#: ../../tutorials/single_npu.md:193
msgid "Logs of the vllm server:"
msgstr "vllm 服务器的日志:"


@@ -0,0 +1,77 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/single_npu_audio.md:1
msgid "Single NPU (Qwen2-Audio 7B)"
msgstr "单个 NPU(Qwen2-Audio 7B)"
#: ../../tutorials/single_npu_audio.md:3
msgid "Run vllm-ascend on Single NPU"
msgstr "在单个 NPU 上运行 vllm-ascend"
#: ../../tutorials/single_npu_audio.md:5
msgid "Offline Inference on Single NPU"
msgstr "在单个NPU上进行离线推理"
#: ../../tutorials/single_npu_audio.md:7
msgid "Run docker container:"
msgstr "运行 docker 容器:"
#: ../../tutorials/single_npu_audio.md:29
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/single_npu_audio.md:40
msgid ""
"`max_split_size_mb` prevents the native allocator from splitting blocks "
"larger than this size (in MB). This can reduce fragmentation and may allow "
"some borderline workloads to complete without running out of memory. You can"
" find more details "
"[<u>here</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)."
msgstr ""
"`max_split_size_mb` 防止本地分配器拆分超过此大小(以 MB "
"为单位)的内存块。这可以减少内存碎片,并且可能让一些边缘情况下的工作负载顺利完成而不会耗尽内存。你可以在[<u>这里</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"
#: ../../tutorials/single_npu_audio.md:43
msgid "Install packages required for audio processing:"
msgstr "安装音频处理所需的软件包:"
#: ../../tutorials/single_npu_audio.md:50
msgid "Run the following script to execute offline inference on a single NPU:"
msgstr "运行以下脚本以在单个 NPU 上执行离线推理:"
#: ../../tutorials/single_npu_audio.md:114
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"
#: ../../tutorials/single_npu_audio.md:120
msgid "Online Serving on Single NPU"
msgstr "单个 NPU 上的在线服务"
#: ../../tutorials/single_npu_audio.md:122
msgid ""
"Currently, vllm's OpenAI-compatible server doesn't support audio inputs, "
"find more details [<u>here</u>](https://github.com/vllm-"
"project/vllm/issues/19977)."
msgstr ""
"目前vllm 的兼容 OpenAI 的服务器不支持音频输入,更多详情请查看[<u>这里</u>](https://github.com/vllm-"
"project/vllm/issues/19977)。"


@@ -0,0 +1,99 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/single_npu_multimodal.md:1
msgid "Single NPU (Qwen2.5-VL 7B)"
msgstr "单个NPU(Qwen2.5-VL 7B)"
#: ../../tutorials/single_npu_multimodal.md:3
msgid "Run vllm-ascend on Single NPU"
msgstr "在单个 NPU 上运行 vllm-ascend"
#: ../../tutorials/single_npu_multimodal.md:5
msgid "Offline Inference on Single NPU"
msgstr "在单个NPU上进行离线推理"
#: ../../tutorials/single_npu_multimodal.md:7
msgid "Run docker container:"
msgstr "运行 docker 容器:"
#: ../../tutorials/single_npu_multimodal.md:29
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/single_npu_multimodal.md:40
msgid ""
"`max_split_size_mb` prevents the native allocator from splitting blocks "
"larger than this size (in MB). This can reduce fragmentation and may allow "
"some borderline workloads to complete without running out of memory. You can"
" find more details "
"[<u>here</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)."
msgstr ""
"`max_split_size_mb` 防止本地分配器拆分超过此大小(以 MB "
"为单位)的内存块。这可以减少内存碎片,并且可能让一些边缘情况下的工作负载顺利完成而不会耗尽内存。你可以在[<u>这里</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"
#: ../../tutorials/single_npu_multimodal.md:43
msgid "Run the following script to execute offline inference on a single NPU:"
msgstr "运行以下脚本以在单个 NPU 上执行离线推理:"
#: ../../tutorials/single_npu_multimodal.md:109
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"
#: ../../tutorials/single_npu_multimodal.md:121
msgid "Online Serving on Single NPU"
msgstr "单个 NPU 上的在线服务"
#: ../../tutorials/single_npu_multimodal.md:123
msgid "Run docker container to start the vLLM server on a single NPU:"
msgstr "运行 docker 容器,在单个 NPU 上启动 vLLM 服务器:"
#: ../../tutorials/single_npu_multimodal.md:154
msgid ""
"Add `--max_model_len` option to avoid ValueError that the "
"Qwen2.5-VL-7B-Instruct model's max seq len (128000) is larger than the "
"maximum number of tokens that can be stored in KV cache. This will differ "
"with different NPU series base on the HBM size. Please modify the value "
"according to a suitable value for your NPU series."
msgstr ""
"新增 `--max_model_len` 选项,以避免出现 ValueError,即 Qwen2.5-VL-7B-Instruct "
"模型的最大序列长度(128000)大于 KV 缓存可存储的最大 token 数。该数值会根据不同 NPU 系列的 HBM 大小而不同。请根据你的 NPU"
" 系列,将该值设置为合适的数值。"
#: ../../tutorials/single_npu_multimodal.md:157
msgid "If your service start successfully, you can see the info shown below:"
msgstr "如果你的服务启动成功,你会看到如下所示的信息:"
#: ../../tutorials/single_npu_multimodal.md:165
msgid ""
"Once your server is started, you can query the model with input prompts:"
msgstr "一旦你的服务器启动,你可以通过输入提示词来查询模型:"
#: ../../tutorials/single_npu_multimodal.md:182
msgid ""
"If you query the server successfully, you can see the info shown below "
"(client):"
msgstr "如果你成功查询了服务器,你可以看到如下所示的信息(客户端):"
#: ../../tutorials/single_npu_multimodal.md:188
msgid "Logs of the vllm server:"
msgstr "vllm 服务器的日志:"


@@ -0,0 +1,70 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/single_npu_qwen3_embedding.md:1
msgid "Single NPU (Qwen3-Embedding-8B)"
msgstr "单个NPU(Qwen3-Embedding-8B)"
#: ../../tutorials/single_npu_qwen3_embedding.md:3
msgid ""
"The Qwen3 Embedding model series is the latest proprietary model of the Qwen"
" family, specifically designed for text embedding and ranking tasks. "
"Building upon the dense foundational models of the Qwen3 series, it provides"
" a comprehensive range of text embeddings and reranking models in various "
"sizes (0.6B, 4B, and 8B). This guide describes how to run the model with "
"vLLM Ascend. Note that only 0.9.2rc1 and higher versions of vLLM Ascend "
"support the model."
msgstr ""
"Qwen3 Embedding 模型系列是 Qwen 家族最新的专有模型,专为文本嵌入和排序任务设计。在 Qwen3 "
"系列的密集基础模型之上,它提供了多种尺寸(0.6B、4B 和 8B)的文本嵌入与重排序模型。本指南介绍如何使用 vLLM Ascend "
"运行该模型。请注意,只有 vLLM Ascend 0.9.2rc1 及更高版本才支持该模型。"
#: ../../tutorials/single_npu_qwen3_embedding.md:5
msgid "Run docker container"
msgstr "运行 docker 容器"
#: ../../tutorials/single_npu_qwen3_embedding.md:7
msgid ""
"Take Qwen3-Embedding-8B model as an example, first run the docker container "
"with the following command:"
msgstr "以 Qwen3-Embedding-8B 模型为例,首先使用以下命令运行 docker 容器:"
#: ../../tutorials/single_npu_qwen3_embedding.md:29
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/single_npu_qwen3_embedding.md:39
msgid "Online Inference"
msgstr "在线推理"
#: ../../tutorials/single_npu_qwen3_embedding.md:45
msgid ""
"Once your server is started, you can query the model with input prompts"
msgstr "一旦服务器启动,就可以通过输入提示词来查询模型。"
#: ../../tutorials/single_npu_qwen3_embedding.md:56
msgid "Offline Inference"
msgstr "离线推理"
#: ../../tutorials/single_npu_qwen3_embedding.md:92
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"