v0.10.1rc1

This commit is contained in:
2025-09-09 09:40:35 +08:00
parent d6f6ef41fe
commit 9149384e03
432 changed files with 84698 additions and 1 deletions

@@ -0,0 +1,29 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/index.md:3
msgid "Deployment"
msgstr "部署"
#: ../../tutorials/index.md:1
msgid "Tutorials"
msgstr "教程"

@@ -0,0 +1,192 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/multi_node.md:1
msgid "Multi-Node-DP (DeepSeek)"
msgstr "多节点数据并行DeepSeek"
#: ../../tutorials/multi_node.md:3
msgid "Getting Start"
msgstr "快速开始"
#: ../../tutorials/multi_node.md:4
msgid ""
"vLLM-Ascend now supports Data Parallel (DP) deployment, enabling model "
"weights to be replicated across multiple NPUs or instances, each processing "
"independent batches of requests. This is particularly useful for scaling "
"throughput across devices while maintaining high resource utilization."
msgstr ""
"vLLM-Ascend 现在支持数据并行DP部署可以在多个 NPU "
"或实例之间复制模型权重,每个实例处理独立的请求批次。这对于在保证高资源利用率的同时,实现跨设备的吞吐量扩展特别有用。"
#: ../../tutorials/multi_node.md:6
msgid ""
"Each DP rank is deployed as a separate “core engine” process which "
"communicates with front-end process(es) via ZMQ sockets. Data Parallel can "
"be combined with Tensor Parallel, in which case each DP engine owns a number"
" of per-NPU worker processes equal to the TP size."
msgstr ""
"每个 DP 进程作为一个单独的“核心引擎”进程部署,并通过 ZMQ 套接字与前端进程通信。数据并行可以与张量并行结合使用,此时每个 DP "
"引擎拥有数量等于 TP 大小的每 NPU 工作进程。"
#: ../../tutorials/multi_node.md:8
msgid ""
"For Mixture-of-Experts (MoE) models — especially advanced architectures like"
" DeepSeek that utilize Multi-head Latent Attention (MLA) — a hybrid "
"parallelism approach is recommended: - Use **Data Parallelism (DP)** for"
" attention layers, which are replicated across devices and handle separate "
"batches. - Use **Expert or Tensor Parallelism (EP/TP)** for expert "
"layers, which are sharded across devices to distribute the computation."
msgstr ""
"对于混合专家Mixture-of-Experts, MoE模型——尤其是像 DeepSeek 这样采用多头潜在注意力Multi-head Latent Attention, MLA的高级架构——推荐使用混合并行策略\n"
" - 对于注意力层,使用 **数据并行Data Parallelism, DP**,这些层会在各设备间复刻,并处理不同的批次。\n"
" - 对于专家层,使用 **专家并行或张量并行Expert or Tensor Parallelism, EP/TP**,这些层会在设备间分片,从而分担计算。"
#: ../../tutorials/multi_node.md:12
msgid ""
"This division enables attention layers to be replicated across Data Parallel"
" (DP) ranks, enabling them to process different batches independently. "
"Meanwhile, expert layers are partitioned (sharded) across devices using "
"Expert or Tensor Parallelism(DP*TP), maximizing hardware utilization and "
"efficiency."
msgstr ""
"这种划分使得注意力层能够在数据并行DP组内复制从而能够独立处理不同的批次。同时专家层通过专家或张量并行DP*TP在设备间进行分区切片最大化硬件利用率和效率。"
#: ../../tutorials/multi_node.md:14
msgid ""
"In these cases the data parallel ranks are not completely independent, "
"forward passes must be aligned and expert layers across all ranks are "
"required to synchronize during every forward pass, even if there are fewer "
"requests to be processed than DP ranks."
msgstr ""
"在这些情况下,数据并行的各个 rank 不是完全独立的,前向传播必须对齐,并且所有 rank "
"上的专家层在每次前向传播时都需要同步,即使待处理的请求数量少于 DP rank 的数量。"
#: ../../tutorials/multi_node.md:16
msgid ""
"For MoE models, when any requests are in progress in any rank, we must "
"ensure that empty “dummy” forward passes are performed in all ranks which "
"dont currently have any requests scheduled. This is handled via a separate "
"DP `Coordinator` process which communicates with all of the ranks, and a "
"collective operation performed every N steps to determine when all ranks "
"become idle and can be paused. When TP is used in conjunction with DP, "
"expert layers form an EP or TP group of size (DP x TP)."
msgstr ""
"对于 MoE 模型,当任何一个 rank 有请求正在进行时,必须确保所有当前没有请求的 rank 都执行空的“虚拟”前向传播。这是通过一个单独的 DP "
"`Coordinator` 协调器进程来实现的,该进程与所有 rank 通信,并且每隔 N 步执行一次集体操作,以判断所有 rank "
"是否都处于空闲状态并可以暂停。当 TP 与 DP 结合使用时专家层会组成一个规模为DP x TP的 EP 或 TP 组。"
#: ../../tutorials/multi_node.md:18
msgid "Verify Multi-Node Communication Environment"
msgstr "验证多节点通信环境"
#: ../../tutorials/multi_node.md:20
msgid "Physical Layer Requirements:"
msgstr "物理层要求:"
#: ../../tutorials/multi_node.md:22
msgid ""
"The physical machines must be located on the same WLAN, with network "
"connectivity."
msgstr "物理机器必须位于同一个 WLAN 中,并且具有网络连接。"
#: ../../tutorials/multi_node.md:23
msgid ""
"All NPUs are connected with optical modules, and the connection status must "
"be normal."
msgstr "所有 NPU 都通过光模块连接,且连接状态必须正常。"
#: ../../tutorials/multi_node.md:25
msgid "Verification Process:"
msgstr "验证流程:"
#: ../../tutorials/multi_node.md:27
msgid ""
"Execute the following commands on each node in sequence. The results must "
"all be `success` and the status must be `UP`:"
msgstr "在每个节点上依次执行以下命令。所有结果必须为 `success` 且状态必须为 `UP`"
#: ../../tutorials/multi_node.md:44
msgid "NPU Interconnect Verification:"
msgstr "NPU 互连验证:"
#: ../../tutorials/multi_node.md:45
msgid "1. Get NPU IP Addresses"
msgstr "1. 获取 NPU IP 地址"
#: ../../tutorials/multi_node.md:50
msgid "2. Cross-Node PING Test"
msgstr "2. 跨节点PING测试"
#: ../../tutorials/multi_node.md:56
msgid "Run with docker"
msgstr "用 docker 运行"
#: ../../tutorials/multi_node.md:57
msgid ""
"Assume you have two Atlas 800 A2(64G*8) nodes, and want to deploy the "
"`deepseek-v3-w8a8` quantitative model across multi-node."
msgstr "假设你有两台 Atlas 800 A264G*8节点并且想要在多节点上部署 `deepseek-v3-w8a8` 量化模型。"
#: ../../tutorials/multi_node.md:92
msgid ""
"Before launch the inference server, ensure some environment variables are "
"set for multi node communication"
msgstr "在启动推理服务器之前,确保已经为多节点通信设置了一些环境变量。"
#: ../../tutorials/multi_node.md:95
msgid "Run the following scripts on two nodes respectively"
msgstr "分别在两台节点上运行以下脚本"
#: ../../tutorials/multi_node.md:97
msgid "**node0**"
msgstr "**节点0**"
#: ../../tutorials/multi_node.md:137
msgid "**node1**"
msgstr "**节点1**"
#: ../../tutorials/multi_node.md:176
msgid ""
"The Deployment view looks like: ![alt text](../assets/multi_node_dp.png)"
msgstr "部署视图如下所示:![替代文本](../assets/multi_node_dp.png)"
#: ../../tutorials/multi_node.md:176
msgid "alt text"
msgstr "替代文本"
#: ../../tutorials/multi_node.md:179
msgid ""
"Once your server is started, you can query the model with input prompts:"
msgstr "一旦你的服务器启动,你可以通过输入提示词来查询模型:"
#: ../../tutorials/multi_node.md:192
msgid "Run benchmarks"
msgstr "运行基准测试"
#: ../../tutorials/multi_node.md:193
msgid ""
"For details please refer to [benchmark](https://github.com/vllm-"
"project/vllm-ascend/tree/main/benchmarks)"
msgstr ""
"详细信息请参阅 [benchmark](https://github.com/vllm-project/vllm-"
"ascend/tree/main/benchmarks)"
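The multi_node.md strings above state that forward passes across DP ranks must stay aligned and that, when TP is combined with DP, expert layers form a single EP/TP group of size (DP x TP). A minimal rank-arithmetic sketch of that layout — the 2-node x 8-NPU topology and the DP=4, TP=4 split are hypothetical values, not from the docs:

```python
# Illustrative sketch of the DP/TP rank layout described above.
# Assumed topology: 2 nodes x 8 NPUs, dp_size=4, tp_size=4 (hypothetical).

nodes = 2
npus_per_node = 8
dp_size = 4   # data-parallel "core engine" processes
tp_size = 4   # per-NPU worker processes owned by each DP engine

world_size = nodes * npus_per_node
assert dp_size * tp_size == world_size, "DP x TP must cover all NPUs"

# Expert layers synchronize across one EP/TP group of size DP x TP.
ep_group_size = dp_size * tp_size
print(f"world={world_size}, EP/TP group size={ep_group_size}")

# Each DP engine owns tp_size consecutive global worker ranks.
for dp_rank in range(dp_size):
    workers = list(range(dp_rank * tp_size, (dp_rank + 1) * tp_size))
    print(f"DP rank {dp_rank} -> worker ranks {workers}")
```

This also shows why idle ranks need "dummy" forward passes: every one of the `ep_group_size` ranks participates in the expert-layer collective, whether or not it has requests scheduled.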

@@ -0,0 +1,62 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/multi_npu.md:1
msgid "Multi-NPU (QwQ 32B)"
msgstr "多NPUQwQ 32B"
#: ../../tutorials/multi_npu.md:3
msgid "Run vllm-ascend on Multi-NPU"
msgstr "在多NPU上运行 vllm-ascend"
#: ../../tutorials/multi_npu.md:5
msgid "Run docker container:"
msgstr "运行 docker 容器:"
#: ../../tutorials/multi_npu.md:30
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/multi_npu.md:40
msgid "Online Inference on Multi-NPU"
msgstr "多NPU的在线推理"
#: ../../tutorials/multi_npu.md:42
msgid "Run the following script to start the vLLM server on Multi-NPU:"
msgstr "运行以下脚本在多NPU上启动 vLLM 服务器:"
#: ../../tutorials/multi_npu.md:48
msgid ""
"Once your server is started, you can query the model with input prompts"
msgstr "一旦服务器启动,就可以通过输入提示词来查询模型。"
#: ../../tutorials/multi_npu.md:63
msgid "Offline Inference on Multi-NPU"
msgstr "多NPU离线推理"
#: ../../tutorials/multi_npu.md:65
msgid "Run the following script to execute offline inference on multi-NPU:"
msgstr "运行以下脚本以在多NPU上执行离线推理"
#: ../../tutorials/multi_npu.md:102
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"

@@ -0,0 +1,86 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/multi_npu_moge.md:1
msgid "Multi-NPU (Pangu Pro MoE)"
msgstr "多NPUPangu Pro MoE"
#: ../../tutorials/multi_npu_moge.md:3
msgid "Run vllm-ascend on Multi-NPU"
msgstr "在多NPU上运行 vllm-ascend"
#: ../../tutorials/multi_npu_moge.md:5
msgid "Run container:"
msgstr "运行容器:"
#: ../../tutorials/multi_npu_moge.md:30
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/multi_npu_moge.md:37
msgid "Download the model:"
msgstr "下载该模型:"
#: ../../tutorials/multi_npu_moge.md:44
msgid "Online Inference on Multi-NPU"
msgstr "多NPU上的在线推理"
#: ../../tutorials/multi_npu_moge.md:46
msgid "Run the following script to start the vLLM server on Multi-NPU:"
msgstr "运行以下脚本在多NPU上启动 vLLM 服务器:"
#: ../../tutorials/multi_npu_moge.md:55
msgid ""
"Once your server is started, you can query the model with input prompts:"
msgstr "一旦你的服务器启动,你可以通过输入提示词来查询模型:"
#: ../../tutorials/multi_npu_moge.md
msgid "v1/completions"
msgstr "v1/completions"
#: ../../tutorials/multi_npu_moge.md
msgid "v1/chat/completions"
msgstr "v1/chat/completions"
#: ../../tutorials/multi_npu_moge.md:96
msgid "If you run this successfully, you can see the info shown below:"
msgstr "如果你成功运行这个,你可以看到如下所示的信息:"
#: ../../tutorials/multi_npu_moge.md:102
msgid "Offline Inference on Multi-NPU"
msgstr "多NPU离线推理"
#: ../../tutorials/multi_npu_moge.md:104
msgid "Run the following script to execute offline inference on multi-NPU:"
msgstr "运行以下脚本以在多NPU上执行离线推理"
#: ../../tutorials/multi_npu_moge.md
msgid "Graph Mode"
msgstr "图模式"
#: ../../tutorials/multi_npu_moge.md
msgid "Eager Mode"
msgstr "即时模式"
#: ../../tutorials/multi_npu_moge.md:230
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"

@@ -0,0 +1,82 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/multi_npu_quantization.md:1
msgid "Multi-NPU (QwQ 32B W8A8)"
msgstr "多NPUQwQ 32B W8A8"
#: ../../tutorials/multi_npu_quantization.md:3
msgid "Run docker container"
msgstr "运行 docker 容器"
#: ../../tutorials/multi_npu_quantization.md:5
msgid "w8a8 quantization feature is supported by v0.8.4rc2 or higher"
msgstr "w8a8 量化功能由 v0.8.4rc2 或更高版本支持"
#: ../../tutorials/multi_npu_quantization.md:31
msgid "Install modelslim and convert model"
msgstr "安装 modelslim 并转换模型"
#: ../../tutorials/multi_npu_quantization.md:33
msgid ""
"You can choose to convert the model yourself or use the quantized model we "
"uploaded, see https://www.modelscope.cn/models/vllm-ascend/QwQ-32B-W8A8"
msgstr ""
"你可以选择自己转换模型,或者使用我们上传的量化模型,详见 https://www.modelscope.cn/models/vllm-"
"ascend/QwQ-32B-W8A8"
#: ../../tutorials/multi_npu_quantization.md:56
msgid "Verify the quantized model"
msgstr "验证量化模型"
#: ../../tutorials/multi_npu_quantization.md:57
msgid "The converted model files looks like:"
msgstr "转换后的模型文件如下所示:"
#: ../../tutorials/multi_npu_quantization.md:70
msgid ""
"Run the following script to start the vLLM server with quantized model:"
msgstr "运行以下脚本以启动带有量化模型的 vLLM 服务器:"
#: ../../tutorials/multi_npu_quantization.md:73
msgid ""
"The value \"ascend\" for \"--quantization\" argument will be supported after"
" [a specific PR](https://github.com/vllm-project/vllm-ascend/pull/877) is "
"merged and released, you can cherry-pick this commit for now."
msgstr ""
"在[特定 PR](https://github.com/vllm-project/vllm-ascend/pull/877) 合并并发布后,"
"\"--quantization\" 参数将支持 \"ascend\" 值;目前你可以手动 cherry-pick 该提交。"
#: ../../tutorials/multi_npu_quantization.md:79
msgid ""
"Once your server is started, you can query the model with input prompts"
msgstr "一旦服务器启动,就可以通过输入提示词来查询模型。"
#: ../../tutorials/multi_npu_quantization.md:93
msgid ""
"Run the following script to execute offline inference on multi-NPU with "
"quantized model:"
msgstr "运行以下脚本在多NPU上使用量化模型执行离线推理"
#: ../../tutorials/multi_npu_quantization.md:96
msgid "To enable quantization for ascend, quantization method must be \"ascend\""
msgstr "要在ascend上启用量化量化方法必须为“ascend”。"

@@ -0,0 +1,71 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/multi_npu_qwen3_moe.md:1
msgid "Multi-NPU (Qwen3-30B-A3B)"
msgstr "多NPUQwen3-30B-A3B"
#: ../../tutorials/multi_npu_qwen3_moe.md:3
msgid "Run vllm-ascend on Multi-NPU with Qwen3 MoE"
msgstr "在多NPU上运行带有Qwen3 MoE的vllm-ascend"
#: ../../tutorials/multi_npu_qwen3_moe.md:5
msgid "Run docker container:"
msgstr "运行 docker 容器:"
#: ../../tutorials/multi_npu_qwen3_moe.md:30
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/multi_npu_qwen3_moe.md:40
msgid "Online Inference on Multi-NPU"
msgstr "多NPU的在线推理"
#: ../../tutorials/multi_npu_qwen3_moe.md:42
msgid "Run the following script to start the vLLM server on Multi-NPU:"
msgstr "运行以下脚本以在多NPU上启动 vLLM 服务器:"
#: ../../tutorials/multi_npu_qwen3_moe.md:44
msgid ""
"For an Atlas A2 with 64GB of NPU card memory, tensor-parallel-size should be"
" at least 2, and for 32GB of memory, tensor-parallel-size should be at least"
" 4."
msgstr ""
"对于拥有64GB NPU卡内存的Atlas A2tensor-parallel-size 至少应为2对于32GB内存的NPU卡tensor-"
"parallel-size 至少应为4。"
#: ../../tutorials/multi_npu_qwen3_moe.md:50
msgid ""
"Once your server is started, you can query the model with input prompts"
msgstr "一旦服务器启动,就可以通过输入提示词来查询模型。"
#: ../../tutorials/multi_npu_qwen3_moe.md:65
msgid "Offline Inference on Multi-NPU"
msgstr "多NPU离线推理"
#: ../../tutorials/multi_npu_qwen3_moe.md:67
msgid "Run the following script to execute offline inference on multi-NPU:"
msgstr "运行以下脚本以在多NPU上执行离线推理"
#: ../../tutorials/multi_npu_qwen3_moe.md:104
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"

@@ -0,0 +1,110 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/single_node_300i.md:1
msgid "Single Node (Atlas 300I series)"
msgstr "单节点Atlas 300I 系列)"
#: ../../tutorials/single_node_300i.md:4
msgid ""
"This Atlas 300I series is currently experimental. In future versions, there "
"may be behavioral changes around model coverage, performance improvement."
msgstr "Atlas 300I 系列目前处于实验阶段。在未来的版本中,模型覆盖范围和性能提升方面可能会有行为上的变化。"
#: ../../tutorials/single_node_300i.md:7
msgid "Run vLLM on Altlas 300I series"
msgstr "在 Atlas 300I 系列上运行 vLLM"
#: ../../tutorials/single_node_300i.md:9
msgid "Run docker container:"
msgstr "运行 docker 容器:"
#: ../../tutorials/single_node_300i.md:38
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/single_node_300i.md:48
msgid "Online Inference on NPU"
msgstr "在NPU上进行在线推理"
#: ../../tutorials/single_node_300i.md:50
msgid ""
"Run the following script to start the vLLM server on NPU(Qwen3-0.6B:1 card, "
"Qwen2.5-7B-Instruct:2 cards, Pangu-Pro-MoE-72B: 8 cards):"
msgstr ""
"运行以下脚本,在 NPU 上启动 vLLM 服务器Qwen3-0.6B1 张卡Qwen2.5-7B-Instruct2 张卡Pangu-"
"Pro-MoE-72B8 张卡):"
#: ../../tutorials/single_node_300i.md
msgid "Qwen3-0.6B"
msgstr "Qwen3-0.6B"
#: ../../tutorials/single_node_300i.md:59
#: ../../tutorials/single_node_300i.md:89
#: ../../tutorials/single_node_300i.md:126
msgid "Run the following command to start the vLLM server:"
msgstr "运行以下命令以启动 vLLM 服务器:"
#: ../../tutorials/single_node_300i.md:70
#: ../../tutorials/single_node_300i.md:100
#: ../../tutorials/single_node_300i.md:140
msgid ""
"Once your server is started, you can query the model with input prompts"
msgstr "一旦服务器启动,就可以通过输入提示词来查询模型。"
#: ../../tutorials/single_node_300i.md
msgid "Qwen/Qwen2.5-7B-Instruct"
msgstr "Qwen/Qwen2.5-7B-Instruct"
#: ../../tutorials/single_node_300i.md
msgid "Pangu-Pro-MoE-72B"
msgstr "Pangu-Pro-MoE-72B"
#: ../../tutorials/single_node_300i.md:119
#: ../../tutorials/single_node_300i.md:257
msgid "Download the model:"
msgstr "下载该模型:"
#: ../../tutorials/single_node_300i.md:157
msgid "If you run this script successfully, you can see the results."
msgstr "如果你成功运行此脚本,你就可以看到结果。"
#: ../../tutorials/single_node_300i.md:159
msgid "Offline Inference"
msgstr "离线推理"
#: ../../tutorials/single_node_300i.md:161
msgid ""
"Run the following script (`example.py`) to execute offline inference on NPU:"
msgstr "运行以下脚本(`example.py`)以在 NPU 上执行离线推理:"
#: ../../tutorials/single_node_300i.md
msgid "Qwen2.5-7B-Instruct"
msgstr "Qwen/Qwen2.5-7B-Instruct"
#: ../../tutorials/single_node_300i.md:320
msgid "Run script:"
msgstr "运行脚本:"
#: ../../tutorials/single_node_300i.md:325
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"

@@ -0,0 +1,107 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/single_npu.md:1
msgid "Single NPU (Qwen3 8B)"
msgstr "单个NPUQwen3 8B"
#: ../../tutorials/single_npu.md:3
msgid "Run vllm-ascend on Single NPU"
msgstr "在单个 NPU 上运行 vllm-ascend"
#: ../../tutorials/single_npu.md:5
msgid "Offline Inference on Single NPU"
msgstr "在单个NPU上进行离线推理"
#: ../../tutorials/single_npu.md:7
msgid "Run docker container:"
msgstr "运行 docker 容器:"
#: ../../tutorials/single_npu.md:29
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/single_npu.md:40
msgid ""
"`max_split_size_mb` prevents the native allocator from splitting blocks "
"larger than this size (in MB). This can reduce fragmentation and may allow "
"some borderline workloads to complete without running out of memory. You can"
" find more details "
"[<u>here</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)."
msgstr ""
"`max_split_size_mb` 防止本地分配器拆分超过此大小(以 MB "
"为单位)的内存块。这可以减少内存碎片,并且可能让一些边缘情况下的工作负载顺利完成而不会耗尽内存。你可以在[<u>这里</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"
#: ../../tutorials/single_npu.md:43
msgid "Run the following script to execute offline inference on a single NPU:"
msgstr "运行以下脚本以在单个 NPU 上执行离线推理:"
#: ../../tutorials/single_npu.md
msgid "Graph Mode"
msgstr "图模式"
#: ../../tutorials/single_npu.md
msgid "Eager Mode"
msgstr "即时模式"
#: ../../tutorials/single_npu.md:98
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"
#: ../../tutorials/single_npu.md:105
msgid "Online Serving on Single NPU"
msgstr "单个 NPU 上的在线服务"
#: ../../tutorials/single_npu.md:107
msgid "Run docker container to start the vLLM server on a single NPU:"
msgstr "运行 docker 容器,在单个 NPU 上启动 vLLM 服务器:"
#: ../../tutorials/single_npu.md:163
msgid ""
"Add `--max_model_len` option to avoid ValueError that the Qwen2.5-7B model's"
" max seq len (32768) is larger than the maximum number of tokens that can be"
" stored in KV cache (26240). This will differ with different NPU series base"
" on the HBM size. Please modify the value according to a suitable value for "
"your NPU series."
msgstr ""
"添加 `--max_model_len` 选项,以避免出现 Qwen2.5-7B 模型的最大序列长度32768大于 KV 缓存能存储的最大 "
"token 数26240时的 ValueError。不同 NPU 系列由于 HBM 容量不同,该值也会有所不同。请根据您的 NPU "
"系列,修改为合适的数值。"
#: ../../tutorials/single_npu.md:166
msgid "If your service start successfully, you can see the info shown below:"
msgstr "如果你的服务启动成功,你会看到如下所示的信息:"
#: ../../tutorials/single_npu.md:174
msgid ""
"Once your server is started, you can query the model with input prompts:"
msgstr "一旦你的服务器启动,你可以通过输入提示词来查询模型:"
#: ../../tutorials/single_npu.md:187
msgid ""
"If you query the server successfully, you can see the info shown below "
"(client):"
msgstr "如果你成功查询了服务器,你可以看到如下所示的信息(客户端):"
#: ../../tutorials/single_npu.md:193
msgid "Logs of the vllm server:"
msgstr "vllm 服务器的日志:"
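The `max_split_size_mb` note in the single_npu.md strings above refers to an allocator configuration environment variable. A minimal sketch of setting it from Python; the `PYTORCH_NPU_ALLOC_CONF` variable name (mirroring PyTorch's `PYTORCH_CUDA_ALLOC_CONF`) and the 256 MB threshold are assumptions to verify against the linked CANN environment-variable reference:

```python
import os

# Hypothetical variable name and value; confirm against the linked CANN docs.
# It must be set before torch_npu initializes for the allocator to pick it up.
os.environ["PYTORCH_NPU_ALLOC_CONF"] = "max_split_size_mb:256"

print(os.environ["PYTORCH_NPU_ALLOC_CONF"])
```

In the tutorials themselves this is done with a shell `export` in the "Setup environment variables" step, which is equivalent as long as it happens before the server process starts.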

@@ -0,0 +1,77 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/single_npu_audio.md:1
msgid "Single NPU (Qwen2-Audio 7B)"
msgstr "单个 NPUQwen2-Audio 7B"
#: ../../tutorials/single_npu_audio.md:3
msgid "Run vllm-ascend on Single NPU"
msgstr "在单个 NPU 上运行 vllm-ascend"
#: ../../tutorials/single_npu_audio.md:5
msgid "Offline Inference on Single NPU"
msgstr "在单个NPU上进行离线推理"
#: ../../tutorials/single_npu_audio.md:7
msgid "Run docker container:"
msgstr "运行 docker 容器:"
#: ../../tutorials/single_npu_audio.md:29
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/single_npu_audio.md:40
msgid ""
"`max_split_size_mb` prevents the native allocator from splitting blocks "
"larger than this size (in MB). This can reduce fragmentation and may allow "
"some borderline workloads to complete without running out of memory. You can"
" find more details "
"[<u>here</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)."
msgstr ""
"`max_split_size_mb` 防止本地分配器拆分超过此大小(以 MB "
"为单位)的内存块。这可以减少内存碎片,并且可能让一些边缘情况下的工作负载顺利完成而不会耗尽内存。你可以在[<u>这里</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"
#: ../../tutorials/single_npu_audio.md:43
msgid "Install packages required for audio processing:"
msgstr "安装音频处理所需的软件包:"
#: ../../tutorials/single_npu_audio.md:50
msgid "Run the following script to execute offline inference on a single NPU:"
msgstr "运行以下脚本以在单个 NPU 上执行离线推理:"
#: ../../tutorials/single_npu_audio.md:114
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"
#: ../../tutorials/single_npu_audio.md:120
msgid "Online Serving on Single NPU"
msgstr "单个 NPU 上的在线服务"
#: ../../tutorials/single_npu_audio.md:122
msgid ""
"Currently, vllm's OpenAI-compatible server doesn't support audio inputs, "
"find more details [<u>here</u>](https://github.com/vllm-"
"project/vllm/issues/19977)."
msgstr ""
"目前vllm 的兼容 OpenAI 的服务器不支持音频输入,更多详情请查看[<u>这里</u>](https://github.com/vllm-"
"project/vllm/issues/19977)。"

@@ -0,0 +1,99 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/single_npu_multimodal.md:1
msgid "Single NPU (Qwen2.5-VL 7B)"
msgstr "单个NPUQwen2.5-VL 7B"
#: ../../tutorials/single_npu_multimodal.md:3
msgid "Run vllm-ascend on Single NPU"
msgstr "在单个 NPU 上运行 vllm-ascend"
#: ../../tutorials/single_npu_multimodal.md:5
msgid "Offline Inference on Single NPU"
msgstr "在单个NPU上进行离线推理"
#: ../../tutorials/single_npu_multimodal.md:7
msgid "Run docker container:"
msgstr "运行 docker 容器:"
#: ../../tutorials/single_npu_multimodal.md:29
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/single_npu_multimodal.md:40
msgid ""
"`max_split_size_mb` prevents the native allocator from splitting blocks "
"larger than this size (in MB). This can reduce fragmentation and may allow "
"some borderline workloads to complete without running out of memory. You can"
" find more details "
"[<u>here</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)."
msgstr ""
"`max_split_size_mb` 防止本地分配器拆分超过此大小(以 MB "
"为单位)的内存块。这可以减少内存碎片,并且可能让一些边缘情况下的工作负载顺利完成而不会耗尽内存。你可以在[<u>这里</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"
#: ../../tutorials/single_npu_multimodal.md:43
msgid "Run the following script to execute offline inference on a single NPU:"
msgstr "运行以下脚本以在单个 NPU 上执行离线推理:"
#: ../../tutorials/single_npu_multimodal.md:109
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"
#: ../../tutorials/single_npu_multimodal.md:121
msgid "Online Serving on Single NPU"
msgstr "单个 NPU 上的在线服务"
#: ../../tutorials/single_npu_multimodal.md:123
msgid "Run docker container to start the vLLM server on a single NPU:"
msgstr "运行 docker 容器,在单个 NPU 上启动 vLLM 服务器:"
#: ../../tutorials/single_npu_multimodal.md:154
msgid ""
"Add `--max_model_len` option to avoid ValueError that the "
"Qwen2.5-VL-7B-Instruct model's max seq len (128000) is larger than the "
"maximum number of tokens that can be stored in KV cache. This will differ "
"with different NPU series base on the HBM size. Please modify the value "
"according to a suitable value for your NPU series."
msgstr ""
"新增 `--max_model_len` 选项,以避免出现 ValueError即 Qwen2.5-VL-7B-Instruct "
"模型的最大序列长度128000大于 KV 缓存可存储的最大 token 数。该数值会根据不同 NPU 系列的 HBM 大小而不同。请根据你的 NPU"
" 系列,将该值设置为合适的数值。"
#: ../../tutorials/single_npu_multimodal.md:157
msgid "If your service start successfully, you can see the info shown below:"
msgstr "如果你的服务启动成功,你会看到如下所示的信息:"
#: ../../tutorials/single_npu_multimodal.md:165
msgid ""
"Once your server is started, you can query the model with input prompts:"
msgstr "一旦你的服务器启动,你可以通过输入提示词来查询模型:"
#: ../../tutorials/single_npu_multimodal.md:182
msgid ""
"If you query the server successfully, you can see the info shown below "
"(client):"
msgstr "如果你成功查询了服务器,你可以看到如下所示的信息(客户端):"
#: ../../tutorials/single_npu_multimodal.md:188
msgid "Logs of the vllm server:"
msgstr "vllm 服务器的日志:"

@@ -0,0 +1,70 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../tutorials/single_npu_qwen3_embedding.md:1
msgid "Single NPU (Qwen3-Embedding-8B)"
msgstr "单个NPUQwen3-Embedding-8B"
#: ../../tutorials/single_npu_qwen3_embedding.md:3
msgid ""
"The Qwen3 Embedding model series is the latest proprietary model of the Qwen"
" family, specifically designed for text embedding and ranking tasks. "
"Building upon the dense foundational models of the Qwen3 series, it provides"
" a comprehensive range of text embeddings and reranking models in various "
"sizes (0.6B, 4B, and 8B). This guide describes how to run the model with "
"vLLM Ascend. Note that only 0.9.2rc1 and higher versions of vLLM Ascend "
"support the model."
msgstr ""
"Qwen3 Embedding 模型系列是 Qwen 家族最新的专有模型,专为文本嵌入和排序任务设计。在 Qwen3 "
"系列的密集基础模型之上它提供了多种尺寸0.6B、4B 和 8B的文本嵌入与重排序模型。本指南介绍如何使用 vLLM Ascend "
"运行该模型。请注意,只有 vLLM Ascend 0.9.2rc1 及更高版本才支持该模型。"
#: ../../tutorials/single_npu_qwen3_embedding.md:5
msgid "Run docker container"
msgstr "运行 docker 容器"
#: ../../tutorials/single_npu_qwen3_embedding.md:7
msgid ""
"Take Qwen3-Embedding-8B model as an example, first run the docker container "
"with the following command:"
msgstr "以 Qwen3-Embedding-8B 模型为例,首先使用以下命令运行 docker 容器:"
#: ../../tutorials/single_npu_qwen3_embedding.md:29
msgid "Setup environment variables:"
msgstr "设置环境变量:"
#: ../../tutorials/single_npu_qwen3_embedding.md:39
msgid "Online Inference"
msgstr "在线推理"
#: ../../tutorials/single_npu_qwen3_embedding.md:45
msgid ""
"Once your server is started, you can query the model with input prompts"
msgstr "一旦服务器启动,就可以通过输入提示词来查询模型。"
#: ../../tutorials/single_npu_qwen3_embedding.md:56
msgid "Offline Inference"
msgstr "离线推理"
#: ../../tutorials/single_npu_qwen3_embedding.md:92
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"