forked from EngineX-Ascend/enginex-ascend-910-vllm
v0.10.1rc1
29
docs/source/locale/zh_CN/LC_MESSAGES/tutorials/index.po
Normal file
@@ -0,0 +1,29 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"

#: ../../tutorials/index.md:3
msgid "Deployment"
msgstr "部署"

#: ../../tutorials/index.md:1
msgid "Tutorials"
msgstr "教程"
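The catalog above follows the GNU gettext PO format: `#:` location comments, a `msgid` (source string), and a `msgstr` (translation), where long strings are split across adjacent quoted continuation lines that gettext concatenates. As a minimal illustration (a simplified sketch, not the gettext reference parser; it ignores escape sequences and plural forms), the concatenation rule can be shown like this:

```python
def parse_po(text: str) -> dict:
    """Map each msgid to its msgstr, joining quoted continuation lines
    the way gettext concatenates adjacent string fragments."""
    entries = {}
    cur_id, cur_str, mode = [], [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            # A new entry begins: store the previous one first.
            if mode == "msgstr":
                entries["".join(cur_id)] = "".join(cur_str)
            cur_id, cur_str, mode = [line[6:].strip('"')], [], "msgid"
        elif line.startswith("msgstr "):
            cur_str, mode = [line[7:].strip('"')], "msgstr"
        elif line.startswith('"') and mode:
            # Continuation fragment: append to whichever string is open.
            (cur_id if mode == "msgid" else cur_str).append(line.strip('"'))
    if mode == "msgstr":
        entries["".join(cur_id)] = "".join(cur_str)
    return entries

sample = '''#: ../../tutorials/index.md:3
msgid "Deployment"
msgstr "部署"

#: ../../tutorials/index.md:1
msgid "Tutorials"
msgstr "教程"
'''
catalog = parse_po(sample)
print(catalog["Deployment"])  # -> 部署
```

This is why entries below may spread one sentence over several quoted lines: only the concatenated result matters to Sphinx/Babel at build time.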
192
docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_node.po
Normal file
@@ -0,0 +1,192 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"

#: ../../tutorials/multi_node.md:1
msgid "Multi-Node-DP (DeepSeek)"
msgstr "多节点数据并行(DeepSeek)"

#: ../../tutorials/multi_node.md:3
msgid "Getting Start"
msgstr "快速开始"

#: ../../tutorials/multi_node.md:4
msgid ""
"vLLM-Ascend now supports Data Parallel (DP) deployment, enabling model "
"weights to be replicated across multiple NPUs or instances, each processing "
"independent batches of requests. This is particularly useful for scaling "
"throughput across devices while maintaining high resource utilization."
msgstr ""
"vLLM-Ascend 现在支持数据并行(DP)部署,可以在多个 NPU "
"或实例之间复制模型权重,每个实例处理独立的请求批次。这对于在保证高资源利用率的同时,实现跨设备的吞吐量扩展特别有用。"

#: ../../tutorials/multi_node.md:6
msgid ""
"Each DP rank is deployed as a separate “core engine” process which "
"communicates with front-end process(es) via ZMQ sockets. Data Parallel can "
"be combined with Tensor Parallel, in which case each DP engine owns a number"
" of per-NPU worker processes equal to the TP size."
msgstr ""
"每个 DP 进程作为一个单独的“核心引擎”进程部署,并通过 ZMQ 套接字与前端进程通信。数据并行可以与张量并行结合使用,此时每个 DP "
"引擎拥有数量等于 TP 大小的每 NPU 工作进程。"

#: ../../tutorials/multi_node.md:8
msgid ""
"For Mixture-of-Experts (MoE) models — especially advanced architectures like"
" DeepSeek that utilize Multi-head Latent Attention (MLA) — a hybrid "
"parallelism approach is recommended: - Use **Data Parallelism (DP)** for"
" attention layers, which are replicated across devices and handle separate "
"batches. - Use **Expert or Tensor Parallelism (EP/TP)** for expert "
"layers, which are sharded across devices to distribute the computation."
msgstr ""
"对于混合专家(Mixture-of-Experts, MoE)模型——尤其是像 DeepSeek 这样采用多头潜在注意力(Multi-head Latent Attention, MLA)的高级架构——推荐使用混合并行策略:\n"
" - 对于注意力层,使用 **数据并行(Data Parallelism, DP)**,这些层会在各设备间复刻,并处理不同的批次。\n"
" - 对于专家层,使用 **专家并行或张量并行(Expert or Tensor Parallelism, EP/TP)**,这些层会在设备间分片,从而分担计算。"

#: ../../tutorials/multi_node.md:12
msgid ""
"This division enables attention layers to be replicated across Data Parallel"
" (DP) ranks, enabling them to process different batches independently. "
"Meanwhile, expert layers are partitioned (sharded) across devices using "
"Expert or Tensor Parallelism(DP*TP), maximizing hardware utilization and "
"efficiency."
msgstr ""
"这种划分使得注意力层能够在数据并行(DP)组内复制,从而能够独立处理不同的批次。同时,专家层通过专家或张量并行(DP*TP)在设备间进行分区(切片),最大化硬件利用率和效率。"

#: ../../tutorials/multi_node.md:14
msgid ""
"In these cases the data parallel ranks are not completely independent, "
"forward passes must be aligned and expert layers across all ranks are "
"required to synchronize during every forward pass, even if there are fewer "
"requests to be processed than DP ranks."
msgstr ""
"在这些情况下,数据并行的各个 rank 不是完全独立的,前向传播必须对齐,并且所有 rank "
"上的专家层在每次前向传播时都需要同步,即使待处理的请求数量少于 DP rank 的数量。"

#: ../../tutorials/multi_node.md:16
msgid ""
"For MoE models, when any requests are in progress in any rank, we must "
"ensure that empty “dummy” forward passes are performed in all ranks which "
"don’t currently have any requests scheduled. This is handled via a separate "
"DP `Coordinator` process which communicates with all of the ranks, and a "
"collective operation performed every N steps to determine when all ranks "
"become idle and can be paused. When TP is used in conjunction with DP, "
"expert layers form an EP or TP group of size (DP x TP)."
msgstr ""
"对于 MoE 模型,当任何一个 rank 有请求正在进行时,必须确保所有当前没有请求的 rank 都执行空的“虚拟”前向传播。这是通过一个单独的 DP "
"`Coordinator` 协调器进程来实现的,该进程与所有 rank 通信,并且每隔 N 步执行一次集体操作,以判断所有 rank "
"是否都处于空闲状态并可以暂停。当 TP 与 DP 结合使用时,专家层会组成一个规模为(DP x TP)的 EP 或 TP 组。"

#: ../../tutorials/multi_node.md:18
msgid "Verify Multi-Node Communication Environment"
msgstr "验证多节点通信环境"

#: ../../tutorials/multi_node.md:20
msgid "Physical Layer Requirements:"
msgstr "物理层要求:"

#: ../../tutorials/multi_node.md:22
msgid ""
"The physical machines must be located on the same WLAN, with network "
"connectivity."
msgstr "物理机器必须位于同一个 WLAN 中,并且具有网络连接。"

#: ../../tutorials/multi_node.md:23
msgid ""
"All NPUs are connected with optical modules, and the connection status must "
"be normal."
msgstr "所有 NPU 都通过光模块连接,且连接状态必须正常。"

#: ../../tutorials/multi_node.md:25
msgid "Verification Process:"
msgstr "验证流程:"

#: ../../tutorials/multi_node.md:27
msgid ""
"Execute the following commands on each node in sequence. The results must "
"all be `success` and the status must be `UP`:"
msgstr "在每个节点上依次执行以下命令。所有结果必须为 `success` 且状态必须为 `UP`:"

#: ../../tutorials/multi_node.md:44
msgid "NPU Interconnect Verification:"
msgstr "NPU 互连验证:"

#: ../../tutorials/multi_node.md:45
msgid "1. Get NPU IP Addresses"
msgstr "1. 获取 NPU IP 地址"

#: ../../tutorials/multi_node.md:50
msgid "2. Cross-Node PING Test"
msgstr "2. 跨节点 PING 测试"

#: ../../tutorials/multi_node.md:56
msgid "Run with docker"
msgstr "用 docker 运行"

#: ../../tutorials/multi_node.md:57
msgid ""
"Assume you have two Atlas 800 A2(64G*8) nodes, and want to deploy the "
"`deepseek-v3-w8a8` quantitative model across multi-node."
msgstr "假设你有两台 Atlas 800 A2(64G*8)节点,并且想要在多节点上部署 `deepseek-v3-w8a8` 量化模型。"

#: ../../tutorials/multi_node.md:92
msgid ""
"Before launch the inference server, ensure some environment variables are "
"set for multi node communication"
msgstr "在启动推理服务器之前,确保已经为多节点通信设置了一些环境变量。"

#: ../../tutorials/multi_node.md:95
msgid "Run the following scripts on two nodes respectively"
msgstr "分别在两台节点上运行以下脚本"

#: ../../tutorials/multi_node.md:97
msgid "**node0**"
msgstr "**节点0**"

#: ../../tutorials/multi_node.md:137
msgid "**node1**"
msgstr "**节点1**"

#: ../../tutorials/multi_node.md:176
msgid ""
"The Deployment view looks like: "
msgstr "部署视图如下所示:"

#: ../../tutorials/multi_node.md:176
msgid "alt text"
msgstr "替代文本"

#: ../../tutorials/multi_node.md:179
msgid ""
"Once your server is started, you can query the model with input prompts:"
msgstr "一旦你的服务器启动,你可以通过输入提示词来查询模型:"

#: ../../tutorials/multi_node.md:192
msgid "Run benchmarks"
msgstr "运行基准测试"

#: ../../tutorials/multi_node.md:193
msgid ""
"For details please refer to [benchmark](https://github.com/vllm-"
"project/vllm-ascend/tree/main/benchmarks)"
msgstr ""
"详细信息请参阅 [benchmark](https://github.com/vllm-project/vllm-"
"ascend/tree/main/benchmarks)"
62
docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_npu.po
Normal file
@@ -0,0 +1,62 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"

#: ../../tutorials/multi_npu.md:1
msgid "Multi-NPU (QwQ 32B)"
msgstr "多NPU(QwQ 32B)"

#: ../../tutorials/multi_npu.md:3
msgid "Run vllm-ascend on Multi-NPU"
msgstr "在多NPU上运行 vllm-ascend"

#: ../../tutorials/multi_npu.md:5
msgid "Run docker container:"
msgstr "运行 docker 容器:"

#: ../../tutorials/multi_npu.md:30
msgid "Setup environment variables:"
msgstr "设置环境变量:"

#: ../../tutorials/multi_npu.md:40
msgid "Online Inference on Multi-NPU"
msgstr "多NPU的在线推理"

#: ../../tutorials/multi_npu.md:42
msgid "Run the following script to start the vLLM server on Multi-NPU:"
msgstr "运行以下脚本,在多NPU上启动 vLLM 服务器:"

#: ../../tutorials/multi_npu.md:48
msgid ""
"Once your server is started, you can query the model with input prompts"
msgstr "一旦服务器启动,就可以通过输入提示词来查询模型。"

#: ../../tutorials/multi_npu.md:63
msgid "Offline Inference on Multi-NPU"
msgstr "多NPU离线推理"

#: ../../tutorials/multi_npu.md:65
msgid "Run the following script to execute offline inference on multi-NPU:"
msgstr "运行以下脚本以在多NPU上执行离线推理:"

#: ../../tutorials/multi_npu.md:102
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"
@@ -0,0 +1,86 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"

#: ../../tutorials/multi_npu_moge.md:1
msgid "Multi-NPU (Pangu Pro MoE)"
msgstr "多NPU(Pangu Pro MoE)"

#: ../../tutorials/multi_npu_moge.md:3
msgid "Run vllm-ascend on Multi-NPU"
msgstr "在多NPU上运行 vllm-ascend"

#: ../../tutorials/multi_npu_moge.md:5
msgid "Run container:"
msgstr "运行容器:"

#: ../../tutorials/multi_npu_moge.md:30
msgid "Setup environment variables:"
msgstr "设置环境变量:"

#: ../../tutorials/multi_npu_moge.md:37
msgid "Download the model:"
msgstr "下载该模型:"

#: ../../tutorials/multi_npu_moge.md:44
msgid "Online Inference on Multi-NPU"
msgstr "多NPU上的在线推理"

#: ../../tutorials/multi_npu_moge.md:46
msgid "Run the following script to start the vLLM server on Multi-NPU:"
msgstr "运行以下脚本,在多NPU上启动 vLLM 服务器:"

#: ../../tutorials/multi_npu_moge.md:55
msgid ""
"Once your server is started, you can query the model with input prompts:"
msgstr "一旦你的服务器启动,你可以通过输入提示词来查询模型:"

#: ../../tutorials/multi_npu_moge.md
msgid "v1/completions"
msgstr "v1/completions"

#: ../../tutorials/multi_npu_moge.md
msgid "v1/chat/completions"
msgstr "v1/chat/completions"

#: ../../tutorials/multi_npu_moge.md:96
msgid "If you run this successfully, you can see the info shown below:"
msgstr "如果你成功运行,就可以看到如下所示的信息:"

#: ../../tutorials/multi_npu_moge.md:102
msgid "Offline Inference on Multi-NPU"
msgstr "多NPU离线推理"

#: ../../tutorials/multi_npu_moge.md:104
msgid "Run the following script to execute offline inference on multi-NPU:"
msgstr "运行以下脚本以在多NPU上执行离线推理:"

#: ../../tutorials/multi_npu_moge.md
msgid "Graph Mode"
msgstr "图模式"

#: ../../tutorials/multi_npu_moge.md
msgid "Eager Mode"
msgstr "即时模式"

#: ../../tutorials/multi_npu_moge.md:230
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"
@@ -0,0 +1,82 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"

#: ../../tutorials/multi_npu_quantization.md:1
msgid "Multi-NPU (QwQ 32B W8A8)"
msgstr "多NPU(QwQ 32B W8A8)"

#: ../../tutorials/multi_npu_quantization.md:3
msgid "Run docker container"
msgstr "运行 docker 容器"

#: ../../tutorials/multi_npu_quantization.md:5
msgid "w8a8 quantization feature is supported by v0.8.4rc2 or higher"
msgstr "w8a8 量化功能由 v0.8.4rc2 或更高版本支持"

#: ../../tutorials/multi_npu_quantization.md:31
msgid "Install modelslim and convert model"
msgstr "安装 modelslim 并转换模型"

#: ../../tutorials/multi_npu_quantization.md:33
msgid ""
"You can choose to convert the model yourself or use the quantized model we "
"uploaded, see https://www.modelscope.cn/models/vllm-ascend/QwQ-32B-W8A8"
msgstr ""
"你可以选择自己转换模型,或者使用我们上传的量化模型,详见 https://www.modelscope.cn/models/vllm-"
"ascend/QwQ-32B-W8A8"

#: ../../tutorials/multi_npu_quantization.md:56
msgid "Verify the quantized model"
msgstr "验证量化模型"

#: ../../tutorials/multi_npu_quantization.md:57
msgid "The converted model files looks like:"
msgstr "转换后的模型文件如下所示:"

#: ../../tutorials/multi_npu_quantization.md:70
msgid ""
"Run the following script to start the vLLM server with quantized model:"
msgstr "运行以下脚本以启动带有量化模型的 vLLM 服务器:"

#: ../../tutorials/multi_npu_quantization.md:73
msgid ""
"The value \"ascend\" for \"--quantization\" argument will be supported after"
" [a specific PR](https://github.com/vllm-project/vllm-ascend/pull/877) is "
"merged and released, you can cherry-pick this commit for now."
msgstr ""
"在 [特定的PR](https://github.com/vllm-project/vllm-ascend/pull/877) 合并并发布后,\"--"
"quantization\" 参数将支持值 \"ascend\";目前你可以先 cherry-pick 该提交。"

#: ../../tutorials/multi_npu_quantization.md:79
msgid ""
"Once your server is started, you can query the model with input prompts"
msgstr "一旦服务器启动,就可以通过输入提示词来查询模型。"

#: ../../tutorials/multi_npu_quantization.md:93
msgid ""
"Run the following script to execute offline inference on multi-NPU with "
"quantized model:"
msgstr "运行以下脚本,在多NPU上使用量化模型执行离线推理:"

#: ../../tutorials/multi_npu_quantization.md:96
msgid "To enable quantization for ascend, quantization method must be \"ascend\""
msgstr "要在ascend上启用量化,量化方法必须为“ascend”。"
@@ -0,0 +1,71 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"

#: ../../tutorials/multi_npu_qwen3_moe.md:1
msgid "Multi-NPU (Qwen3-30B-A3B)"
msgstr "多NPU(Qwen3-30B-A3B)"

#: ../../tutorials/multi_npu_qwen3_moe.md:3
msgid "Run vllm-ascend on Multi-NPU with Qwen3 MoE"
msgstr "在多NPU上运行带有Qwen3 MoE的vllm-ascend"

#: ../../tutorials/multi_npu_qwen3_moe.md:5
msgid "Run docker container:"
msgstr "运行 docker 容器:"

#: ../../tutorials/multi_npu_qwen3_moe.md:30
msgid "Setup environment variables:"
msgstr "设置环境变量:"

#: ../../tutorials/multi_npu_qwen3_moe.md:40
msgid "Online Inference on Multi-NPU"
msgstr "多NPU的在线推理"

#: ../../tutorials/multi_npu_qwen3_moe.md:42
msgid "Run the following script to start the vLLM server on Multi-NPU:"
msgstr "运行以下脚本以在多NPU上启动 vLLM 服务器:"

#: ../../tutorials/multi_npu_qwen3_moe.md:44
msgid ""
"For an Atlas A2 with 64GB of NPU card memory, tensor-parallel-size should be"
" at least 2, and for 32GB of memory, tensor-parallel-size should be at least"
" 4."
msgstr ""
"对于拥有64GB NPU卡内存的Atlas A2,tensor-parallel-size 至少应为2;对于32GB内存的NPU卡,tensor-"
"parallel-size 至少应为4。"

#: ../../tutorials/multi_npu_qwen3_moe.md:50
msgid ""
"Once your server is started, you can query the model with input prompts"
msgstr "一旦服务器启动,就可以通过输入提示词来查询模型。"

#: ../../tutorials/multi_npu_qwen3_moe.md:65
msgid "Offline Inference on Multi-NPU"
msgstr "多NPU离线推理"

#: ../../tutorials/multi_npu_qwen3_moe.md:67
msgid "Run the following script to execute offline inference on multi-NPU:"
msgstr "运行以下脚本以在多NPU上执行离线推理:"

#: ../../tutorials/multi_npu_qwen3_moe.md:104
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"
@@ -0,0 +1,110 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"

#: ../../tutorials/single_node_300i.md:1
msgid "Single Node (Atlas 300I series)"
msgstr "单节点(Atlas 300I 系列)"

#: ../../tutorials/single_node_300i.md:4
msgid ""
"This Atlas 300I series is currently experimental. In future versions, there "
"may be behavioral changes around model coverage, performance improvement."
msgstr "Atlas 300I 系列目前处于实验阶段。在未来的版本中,模型覆盖范围和性能提升方面可能会有行为上的变化。"

#: ../../tutorials/single_node_300i.md:7
msgid "Run vLLM on Altlas 300I series"
msgstr "在 Atlas 300I 系列上运行 vLLM"

#: ../../tutorials/single_node_300i.md:9
msgid "Run docker container:"
msgstr "运行 docker 容器:"

#: ../../tutorials/single_node_300i.md:38
msgid "Setup environment variables:"
msgstr "设置环境变量:"

#: ../../tutorials/single_node_300i.md:48
msgid "Online Inference on NPU"
msgstr "在NPU上进行在线推理"

#: ../../tutorials/single_node_300i.md:50
msgid ""
"Run the following script to start the vLLM server on NPU(Qwen3-0.6B:1 card, "
"Qwen2.5-7B-Instruct:2 cards, Pangu-Pro-MoE-72B: 8 cards):"
msgstr ""
"运行以下脚本,在 NPU 上启动 vLLM 服务器(Qwen3-0.6B:1 张卡,Qwen2.5-7B-Instruct:2 张卡,Pangu-"
"Pro-MoE-72B:8 张卡):"

#: ../../tutorials/single_node_300i.md
msgid "Qwen3-0.6B"
msgstr "Qwen3-0.6B"

#: ../../tutorials/single_node_300i.md:59
#: ../../tutorials/single_node_300i.md:89
#: ../../tutorials/single_node_300i.md:126
msgid "Run the following command to start the vLLM server:"
msgstr "运行以下命令以启动 vLLM 服务器:"

#: ../../tutorials/single_node_300i.md:70
#: ../../tutorials/single_node_300i.md:100
#: ../../tutorials/single_node_300i.md:140
msgid ""
"Once your server is started, you can query the model with input prompts"
msgstr "一旦服务器启动,就可以通过输入提示词来查询模型。"

#: ../../tutorials/single_node_300i.md
msgid "Qwen/Qwen2.5-7B-Instruct"
msgstr "Qwen/Qwen2.5-7B-Instruct"

#: ../../tutorials/single_node_300i.md
msgid "Pangu-Pro-MoE-72B"
msgstr "Pangu-Pro-MoE-72B"

#: ../../tutorials/single_node_300i.md:119
#: ../../tutorials/single_node_300i.md:257
msgid "Download the model:"
msgstr "下载该模型:"

#: ../../tutorials/single_node_300i.md:157
msgid "If you run this script successfully, you can see the results."
msgstr "如果你成功运行此脚本,你就可以看到结果。"

#: ../../tutorials/single_node_300i.md:159
msgid "Offline Inference"
msgstr "离线推理"

#: ../../tutorials/single_node_300i.md:161
msgid ""
"Run the following script (`example.py`) to execute offline inference on NPU:"
msgstr "运行以下脚本(`example.py`)以在 NPU 上执行离线推理:"

#: ../../tutorials/single_node_300i.md
msgid "Qwen2.5-7B-Instruct"
msgstr "Qwen2.5-7B-Instruct"

#: ../../tutorials/single_node_300i.md:320
msgid "Run script:"
msgstr "运行脚本:"

#: ../../tutorials/single_node_300i.md:325
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"
107
docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_npu.po
Normal file
@@ -0,0 +1,107 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"

#: ../../tutorials/single_npu.md:1
msgid "Single NPU (Qwen3 8B)"
msgstr "单个NPU(Qwen3 8B)"

#: ../../tutorials/single_npu.md:3
msgid "Run vllm-ascend on Single NPU"
msgstr "在单个 NPU 上运行 vllm-ascend"

#: ../../tutorials/single_npu.md:5
msgid "Offline Inference on Single NPU"
msgstr "在单个NPU上进行离线推理"

#: ../../tutorials/single_npu.md:7
msgid "Run docker container:"
msgstr "运行 docker 容器:"

#: ../../tutorials/single_npu.md:29
msgid "Setup environment variables:"
msgstr "设置环境变量:"

#: ../../tutorials/single_npu.md:40
msgid ""
"`max_split_size_mb` prevents the native allocator from splitting blocks "
"larger than this size (in MB). This can reduce fragmentation and may allow "
"some borderline workloads to complete without running out of memory. You can"
" find more details "
"[<u>here</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)."
msgstr ""
"`max_split_size_mb` 防止原生分配器拆分超过此大小(以 MB "
"为单位)的内存块。这可以减少内存碎片,并且可能让一些边缘情况下的工作负载顺利完成而不会耗尽内存。你可以在[<u>这里</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"

#: ../../tutorials/single_npu.md:43
msgid "Run the following script to execute offline inference on a single NPU:"
msgstr "运行以下脚本以在单个 NPU 上执行离线推理:"

#: ../../tutorials/single_npu.md
msgid "Graph Mode"
msgstr "图模式"

#: ../../tutorials/single_npu.md
msgid "Eager Mode"
msgstr "即时模式"

#: ../../tutorials/single_npu.md:98
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"

#: ../../tutorials/single_npu.md:105
msgid "Online Serving on Single NPU"
msgstr "单个 NPU 上的在线服务"

#: ../../tutorials/single_npu.md:107
msgid "Run docker container to start the vLLM server on a single NPU:"
msgstr "运行 docker 容器,在单个 NPU 上启动 vLLM 服务器:"

#: ../../tutorials/single_npu.md:163
msgid ""
"Add `--max_model_len` option to avoid ValueError that the Qwen2.5-7B model's"
" max seq len (32768) is larger than the maximum number of tokens that can be"
" stored in KV cache (26240). This will differ with different NPU series base"
" on the HBM size. Please modify the value according to a suitable value for "
"your NPU series."
msgstr ""
"添加 `--max_model_len` 选项,以避免出现 Qwen2.5-7B 模型的最大序列长度(32768)大于 KV 缓存能存储的最大 "
"token 数(26240)时的 ValueError。不同 NPU 系列由于 HBM 容量不同,该值也会有所不同。请根据您的 NPU "
"系列,修改为合适的数值。"

#: ../../tutorials/single_npu.md:166
msgid "If your service start successfully, you can see the info shown below:"
msgstr "如果你的服务启动成功,你会看到如下所示的信息:"

#: ../../tutorials/single_npu.md:174
msgid ""
"Once your server is started, you can query the model with input prompts:"
msgstr "一旦你的服务器启动,你可以通过输入提示词来查询模型:"

#: ../../tutorials/single_npu.md:187
msgid ""
"If you query the server successfully, you can see the info shown below "
"(client):"
msgstr "如果你成功查询了服务器,你可以看到如下所示的信息(客户端):"

#: ../../tutorials/single_npu.md:193
msgid "Logs of the vllm server:"
msgstr "vllm 服务器的日志:"
@@ -0,0 +1,77 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"

#: ../../tutorials/single_npu_audio.md:1
msgid "Single NPU (Qwen2-Audio 7B)"
msgstr "单个 NPU(Qwen2-Audio 7B)"

#: ../../tutorials/single_npu_audio.md:3
msgid "Run vllm-ascend on Single NPU"
msgstr "在单个 NPU 上运行 vllm-ascend"

#: ../../tutorials/single_npu_audio.md:5
msgid "Offline Inference on Single NPU"
msgstr "在单个NPU上进行离线推理"

#: ../../tutorials/single_npu_audio.md:7
msgid "Run docker container:"
msgstr "运行 docker 容器:"

#: ../../tutorials/single_npu_audio.md:29
msgid "Setup environment variables:"
msgstr "设置环境变量:"

#: ../../tutorials/single_npu_audio.md:40
msgid ""
"`max_split_size_mb` prevents the native allocator from splitting blocks "
"larger than this size (in MB). This can reduce fragmentation and may allow "
"some borderline workloads to complete without running out of memory. You can"
" find more details "
"[<u>here</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)."
msgstr ""
"`max_split_size_mb` 防止原生分配器拆分超过此大小(以 MB "
"为单位)的内存块。这可以减少内存碎片,并且可能让一些边缘情况下的工作负载顺利完成而不会耗尽内存。你可以在[<u>这里</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"

#: ../../tutorials/single_npu_audio.md:43
msgid "Install packages required for audio processing:"
msgstr "安装音频处理所需的软件包:"

#: ../../tutorials/single_npu_audio.md:50
msgid "Run the following script to execute offline inference on a single NPU:"
msgstr "运行以下脚本以在单个 NPU 上执行离线推理:"

#: ../../tutorials/single_npu_audio.md:114
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"

#: ../../tutorials/single_npu_audio.md:120
msgid "Online Serving on Single NPU"
msgstr "单个 NPU 上的在线服务"

#: ../../tutorials/single_npu_audio.md:122
msgid ""
"Currently, vllm's OpenAI-compatible server doesn't support audio inputs, "
"find more details [<u>here</u>](https://github.com/vllm-"
"project/vllm/issues/19977)."
msgstr ""
"目前,vllm 的兼容 OpenAI 的服务器不支持音频输入,更多详情请查看[<u>这里</u>](https://github.com/vllm-"
"project/vllm/issues/19977)。"
@@ -0,0 +1,99 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"

#: ../../tutorials/single_npu_multimodal.md:1
msgid "Single NPU (Qwen2.5-VL 7B)"
msgstr "单个NPU(Qwen2.5-VL 7B)"

#: ../../tutorials/single_npu_multimodal.md:3
msgid "Run vllm-ascend on Single NPU"
msgstr "在单个 NPU 上运行 vllm-ascend"

#: ../../tutorials/single_npu_multimodal.md:5
msgid "Offline Inference on Single NPU"
msgstr "在单个NPU上进行离线推理"

#: ../../tutorials/single_npu_multimodal.md:7
msgid "Run docker container:"
msgstr "运行 docker 容器:"

#: ../../tutorials/single_npu_multimodal.md:29
msgid "Setup environment variables:"
msgstr "设置环境变量:"

#: ../../tutorials/single_npu_multimodal.md:40
msgid ""
"`max_split_size_mb` prevents the native allocator from splitting blocks "
"larger than this size (in MB). This can reduce fragmentation and may allow "
"some borderline workloads to complete without running out of memory. You can"
" find more details "
"[<u>here</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)."
msgstr ""
"`max_split_size_mb` 防止本地分配器拆分超过此大小(以 MB "
"为单位)的内存块。这可以减少内存碎片,并且可能让一些边缘情况下的工作负载顺利完成而不会耗尽内存。你可以在[<u>这里</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"

#: ../../tutorials/single_npu_multimodal.md:43
msgid "Run the following script to execute offline inference on a single NPU:"
msgstr "运行以下脚本以在单个 NPU 上执行离线推理:"

#: ../../tutorials/single_npu_multimodal.md:109
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"

#: ../../tutorials/single_npu_multimodal.md:121
msgid "Online Serving on Single NPU"
msgstr "单个 NPU 上的在线服务"

#: ../../tutorials/single_npu_multimodal.md:123
msgid "Run docker container to start the vLLM server on a single NPU:"
msgstr "运行 docker 容器,在单个 NPU 上启动 vLLM 服务器:"

#: ../../tutorials/single_npu_multimodal.md:154
msgid ""
"Add `--max_model_len` option to avoid ValueError that the "
"Qwen2.5-VL-7B-Instruct model's max seq len (128000) is larger than the "
"maximum number of tokens that can be stored in KV cache. This will differ "
"with different NPU series base on the HBM size. Please modify the value "
"according to a suitable value for your NPU series."
msgstr ""
"新增 `--max_model_len` 选项,以避免出现 ValueError,即 Qwen2.5-VL-7B-Instruct "
"模型的最大序列长度(128000)大于 KV 缓存可存储的最大 token 数。该数值会根据不同 NPU 系列的 HBM 大小而不同。请根据你的 NPU"
" 系列,将该值设置为合适的数值。"

#: ../../tutorials/single_npu_multimodal.md:157
msgid "If your service start successfully, you can see the info shown below:"
msgstr "如果你的服务启动成功,你会看到如下所示的信息:"

#: ../../tutorials/single_npu_multimodal.md:165
msgid ""
"Once your server is started, you can query the model with input prompts:"
msgstr "一旦你的服务器启动,你可以通过输入提示词来查询模型:"

#: ../../tutorials/single_npu_multimodal.md:182
msgid ""
"If you query the server successfully, you can see the info shown below "
"(client):"
msgstr "如果你成功查询了服务器,你可以看到如下所示的信息(客户端):"

#: ../../tutorials/single_npu_multimodal.md:188
msgid "Logs of the vllm server:"
msgstr "vllm 服务器的日志:"
@@ -0,0 +1,70 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-07-18 09:01+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"

#: ../../tutorials/single_npu_qwen3_embedding.md:1
msgid "Single NPU (Qwen3-Embedding-8B)"
msgstr "单个NPU(Qwen3-Embedding-8B)"

#: ../../tutorials/single_npu_qwen3_embedding.md:3
msgid ""
"The Qwen3 Embedding model series is the latest proprietary model of the Qwen"
" family, specifically designed for text embedding and ranking tasks. "
"Building upon the dense foundational models of the Qwen3 series, it provides"
" a comprehensive range of text embeddings and reranking models in various "
"sizes (0.6B, 4B, and 8B). This guide describes how to run the model with "
"vLLM Ascend. Note that only 0.9.2rc1 and higher versions of vLLM Ascend "
"support the model."
msgstr ""
"Qwen3 Embedding 模型系列是 Qwen 家族最新的专有模型,专为文本嵌入和排序任务设计。在 Qwen3 "
"系列的密集基础模型之上,它提供了多种尺寸(0.6B、4B 和 8B)的文本嵌入与重排序模型。本指南介绍如何使用 vLLM Ascend "
"运行该模型。请注意,只有 vLLM Ascend 0.9.2rc1 及更高版本才支持该模型。"

#: ../../tutorials/single_npu_qwen3_embedding.md:5
msgid "Run docker container"
msgstr "运行 docker 容器"

#: ../../tutorials/single_npu_qwen3_embedding.md:7
msgid ""
"Take Qwen3-Embedding-8B model as an example, first run the docker container "
"with the following command:"
msgstr "以 Qwen3-Embedding-8B 模型为例,首先使用以下命令运行 docker 容器:"

#: ../../tutorials/single_npu_qwen3_embedding.md:29
msgid "Setup environment variables:"
msgstr "设置环境变量:"

#: ../../tutorials/single_npu_qwen3_embedding.md:39
msgid "Online Inference"
msgstr "在线推理"

#: ../../tutorials/single_npu_qwen3_embedding.md:45
msgid ""
"Once your server is started, you can query the model with input prompts"
msgstr "一旦服务器启动,就可以通过输入提示词来查询模型。"

#: ../../tutorials/single_npu_qwen3_embedding.md:56
msgid "Offline Inference"
msgstr "离线推理"

#: ../../tutorials/single_npu_qwen3_embedding.md:92
msgid "If you run this script successfully, you can see the info shown below:"
msgstr "如果你成功运行此脚本,你可以看到如下所示的信息:"