[v0.18.0][Doc] Translated Doc files 2026-04-22 (#8565)
## Auto-Translation Summary

Translated **43** file(s):

- `docs/source/locale/zh_CN/LC_MESSAGES/community/versioning_policy.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/KV_Cache_Pool_Guide.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/cpu_binding.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/disaggregated_prefill.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/eplb_swift_balancer.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/npugraph_ex.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/patch.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/Design_Documents/quantization.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/contribution/index.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/contribution/multi_node_test.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_ais_bench.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_evalscope.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_lm_eval.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_opencompass.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/faqs.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/installation.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/long_sequence_context_parallel_multi_node.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_colocated_mooncake_multi_instance.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_multi_node.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/features/pd_disaggregation_mooncake_single_node.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/DeepSeek-V3.1.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/GLM4.x.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/GLM5.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/PaddleOCR-VL.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen-VL-Dense.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-235B-A22B.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3.5-397B-A17B.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3_embedding.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/configuration/additional_config.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/Fine_grained_TP.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/batch_invariance.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/context_parallel.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/cpu_binding.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/dynamic_batch.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/eplb_swift_balancer.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/kv_pool.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/layer_sharding.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/netloader.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/npugraph_ex.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/sleep_mode.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/ucm_deployment.po`
- `docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/weight_prefetch.po`

---

[Workflow run](https://github.com/vllm-project/vllm-ascend/actions/runs/24767290887)

Signed-off-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
Co-authored-by: vllm-ascend-ci <vllm-ascend-ci@users.noreply.github.com>
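The diff below shows only the regenerated `.po` source catalogs. For context, Sphinx does not read `.po` files directly; it loads compiled binary `.mo` files from the same `LC_MESSAGES` tree at build time. A minimal sketch of inspecting and compiling one of the updated catalogs, assuming the third-party `polib` package (which is not part of this repository's tooling):

```python
# Hypothetical helper, not part of the vllm-ascend workflow: inspect one of the
# updated catalogs and compile it to the .mo file Sphinx loads at build time.
import polib

po_path = "docs/source/locale/zh_CN/LC_MESSAGES/faqs.po"
po = polib.pofile(po_path)  # parse the gettext catalog listed above

print(f"{len(po)} entries, {po.percent_translated()}% translated")
po.save_as_mofile(po_path[:-2] + "mo")  # writes faqs.mo next to faqs.po
```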
@@ -1,14 +1,7 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2026.
#
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -187,8 +180,8 @@ msgstr "`--tensor-parallel-size` 16 是张量并行(TP)大小的常见设置

#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:305
msgid ""
"`--prefill-context-parallel-size` 2 are common settings for prefill "
"context parallelism (PCP) sizes."
"`--prefill-context-parallel-size` 2 is common setting for prefill context"
" parallelism (PCP) sizes."
msgstr "`--prefill-context-parallel-size` 2 是预填充上下文并行(PCP)大小的常见设置。"

#: ../../source/tutorials/features/long_sequence_context_parallel_multi_node.md:306

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -40,7 +40,9 @@ msgid ""
"demonstrates how to use vllm-ascend v0.11.0 (with vLLM v0.11.0) on two "
"Atlas 800T A2 nodes to deploy two vLLM instances. Each instance occupies "
"4 NPU cards and uses PD-colocated deployment."
msgstr "本指南以 Qwen2.5-72B-Instruct 模型为例,演示如何在两个 Atlas 800T A2 节点上使用 vllm-ascend v0.11.0(包含 vLLM v0.11.0)部署两个 vLLM 实例。每个实例占用 4 个 NPU 卡,并采用 PD 共置部署。"
msgstr ""
"本指南以 Qwen2.5-72B-Instruct 模型为例,演示如何在两个 Atlas 800T A2 节点上使用 vllm-ascend "
"v0.11.0(包含 vLLM v0.11.0)部署两个 vLLM 实例。每个实例占用 4 个 NPU 卡,并采用 PD 共置部署。"

#: ../../source/tutorials/features/pd_colocated_mooncake_multi_instance.md:14
msgid "Verify Multi-Node Communication Environment"
@@ -128,7 +130,10 @@ msgid ""
"Mooncake is the serving platform for Kimi, a leading LLM service provided"
" by Moonshot AI. Installation and compilation guide: <https://github.com"
"/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries>."
msgstr "Mooncake 是 Kimi 的服务平台,Kimi 是由 Moonshot AI 提供的领先 LLM 服务。安装和编译指南:<https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file#build-and-use-binaries>。"
msgstr ""
"Mooncake 是 Kimi 的服务平台,Kimi 是由 Moonshot AI 提供的领先 LLM "
"服务。安装和编译指南:<https://github.com/kvcache-ai/Mooncake?tab=readme-ov-file"
"#build-and-use-binaries>。"

#: ../../source/tutorials/features/pd_colocated_mooncake_multi_instance.md:121
msgid "First, obtain the Mooncake project using the following command:"
@@ -275,7 +280,10 @@ msgid ""
" cross-node, cross-instance KV Cache. Instance 1 utilizes NPU cards [0-3]"
" on the first Atlas 800T A2 server, while Instance 2 utilizes cards [0-3]"
" on the second server."
msgstr "在节点 1 和节点 2 上分别创建容器,并在每个容器中启动 Qwen2.5-72B-Instruct 模型服务,以测试跨节点、跨实例 KV Cache 的可重用性和性能。实例 1 使用第一个 Atlas 800T A2 服务器上的 NPU 卡 [0-3],而实例 2 使用第二个服务器上的卡 [0-3]。"
msgstr ""
"在节点 1 和节点 2 上分别创建容器,并在每个容器中启动 Qwen2.5-72B-Instruct 模型服务,以测试跨节点、跨实例 KV "
"Cache 的可重用性和性能。实例 1 使用第一个 Atlas 800T A2 服务器上的 NPU 卡 [0-3],而实例 2 "
"使用第二个服务器上的卡 [0-3]。"

#: ../../source/tutorials/features/pd_colocated_mooncake_multi_instance.md:208
msgid "Deploy Instance 1"
@@ -430,9 +438,9 @@ msgstr "步骤 2 的准备工作"
#: ../../source/tutorials/features/pd_colocated_mooncake_multi_instance.md:285
msgid ""
"Before Step 2, send a fully random Dataset B to Instance 1. Due to the "
"unified HBM/DRAM KV Cache with LRU (Least Recently Used) eviction policy,"
" Dataset B's cache evicts Dataset A's cache from HBM, leaving Dataset A's"
" cache only in Node 1's DRAM."
"unified on-chip memory/DRAM KV Cache with LRU (Least Recently Used) "
"eviction policy, Dataset B's cache evicts Dataset A's cache from on-chip "
"memory, leaving Dataset A's cache only in Node 1's DRAM."
msgstr "在步骤2之前,向实例1发送一个完全随机的数据集B。由于采用了具有LRU(最近最少使用)淘汰策略的统一HBM/DRAM KV缓存,数据集B的缓存会将数据集A的缓存从HBM中淘汰,使得数据集A的缓存仅保留在节点1的DRAM中。"

#: ../../source/tutorials/features/pd_colocated_mooncake_multi_instance.md:290

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -40,7 +40,7 @@ msgid ""
"servers to deploy the \"2P1D\" architecture. Assume the IP of the "
"prefiller server is 192.0.0.1 (prefill 1) and 192.0.0.2 (prefill 2), and "
"the decoder servers are 192.0.0.3 (decoder 1) and 192.0.0.4 (decoder 2). "
"On each server, use 8 NPUs 16 chips to deploy one service instance."
"On each server, use 8 NPUs and 16 chips to deploy one service instance."
msgstr ""
"以 Deepseek-r1-w8a8 模型为例,使用 4 台 Atlas 800T A3 服务器部署 \"2P1D\" 架构。假设预填充服务器 "
"IP 为 192.0.0.1(预填充节点 1)和 192.0.0.2(预填充节点 2),解码服务器 IP 为 192.0.0.3(解码节点 1)和"

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -30,16 +30,17 @@ msgstr "开始使用"
#: ../../source/tutorials/features/pd_disaggregation_mooncake_single_node.md:5
msgid ""
"vLLM-Ascend now supports prefill-decode (PD) disaggregation. This guide "
"takes one-by-one steps to verify these features with constrained "
"resources."
msgstr "vLLM-Ascend 现已支持预填充-解码 (PD) 解耦架构。本指南将逐步引导您在有限资源下验证这些功能。"
"provides step-by-step instructions to verify this features in resource-"
"constrained environments."
msgstr "vLLM-Ascend 现已支持预填充-解码 (PD) 解耦架构。本指南提供逐步说明,帮助您在资源受限的环境中验证这些功能。"

#: ../../source/tutorials/features/pd_disaggregation_mooncake_single_node.md:7
msgid ""
"Using the Qwen2.5-VL-7B-Instruct model as an example, use vLLM-Ascend "
"Using the Qwen2.5-VL-7B-Instruct model as an example, use vllm-ascend "
"v0.11.0rc1 (with vLLM v0.11.0) on 1 Atlas 800T A2 server to deploy the "
"\"1P1D\" architecture. Assume the IP address is 192.0.0.1."
msgstr "以 Qwen2.5-VL-7B-Instruct 模型为例,在 1 台 Atlas 800T A2 服务器上使用 vLLM-Ascend v0.11.0rc1 (包含 vLLM v0.11.0) 部署 \"1P1D\" 架构。假设 IP 地址为 192.0.0.1。"
"\"1P1D\" architecture (one Prefiller and one Decoder on the same node). "
"Assume the IP address is 192.0.0.1."
msgstr "以 Qwen2.5-VL-7B-Instruct 模型为例,在 1 台 Atlas 800T A2 服务器上使用 vllm-ascend v0.11.0rc1(包含 vLLM v0.11.0)部署 \"1P1D\" 架构(同一节点上一个预填充器和一个解码器)。假设 IP 地址为 192.0.0.1。"

#: ../../source/tutorials/features/pd_disaggregation_mooncake_single_node.md:9
msgid "Verify Communication Environment"

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -32,32 +32,25 @@ msgid ""
"DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-"
"thinking mode. Compared to the previous version, this upgrade brings "
"improvements in multiple aspects:"
msgstr ""
"DeepSeek-V3.1 是一个支持思考模式和非思考模式的混合模型。与前一版本相比,此"
"次升级在多个方面带来了改进:"
msgstr "DeepSeek-V3.1 是一个支持思考模式和非思考模式的混合模型。与前一版本相比,此次升级在多个方面带来了改进:"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:7
msgid ""
"Hybrid thinking mode: One model supports both thinking mode and non-"
"thinking mode by changing the chat template."
msgstr ""
"混合思考模式:一个模型通过更改聊天模板,同时支持思考模式和非思考模式。"
msgstr "混合思考模式:一个模型通过更改聊天模板,同时支持思考模式和非思考模式。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:9
msgid ""
"Smarter tool calling: Through post-training optimization, the model's "
"performance in tool usage and agent tasks has significantly improved."
msgstr ""
"更智能的工具调用:通过后训练优化,模型在工具使用和智能体任务方面的性能显著提"
"升。"
msgstr "更智能的工具调用:通过后训练优化,模型在工具使用和智能体任务方面的性能显著提升。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:11
msgid ""
"Higher thinking efficiency: DeepSeek-V3.1-Think achieves comparable "
"answer quality to DeepSeek-R1-0528, while responding more quickly."
msgstr ""
"更高的思考效率:DeepSeek-V3.1-Think 实现了与 DeepSeek-R1-0528 相当的答案质"
"量,同时响应速度更快。"
msgstr "更高的思考效率:DeepSeek-V3.1-Think 实现了与 DeepSeek-R1-0528 相当的答案质量,同时响应速度更快。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:13
msgid "The `DeepSeek-V3.1` model is first supported in `vllm-ascend:v0.9.1rc3`."
@@ -69,9 +62,7 @@ msgid ""
"including supported features, feature configuration, environment "
"preparation, single-node and multi-node deployment, accuracy and "
"performance evaluation."
msgstr ""
"本文档将展示该模型的主要验证步骤,包括支持的特性、特性配置、环境准备、单节点"
"和多节点部署、精度和性能评估。"
msgstr "本文档将展示该模型的主要验证步骤,包括支持的特性、特性配置、环境准备、单节点和多节点部署、精度和性能评估。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:17
msgid "Supported Features"
@@ -90,9 +81,7 @@ msgstr ""
msgid ""
"Refer to [feature guide](../../user_guide/feature_guide/index.md) to get "
"the feature's configuration."
msgstr ""
"请参考 [特性指南](../../user_guide/feature_guide/index.md) 以获取特性的配"
"置。"
msgstr "请参考 [特性指南](../../user_guide/feature_guide/index.md) 以获取特性的配置。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:23
msgid "Environment Preparation"
@@ -107,8 +96,8 @@ msgid ""
"`DeepSeek-V3.1`(BF16 version): [Download model "
"weight](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-V3.1)."
msgstr ""
"`DeepSeek-V3.1`(BF16 版本):[下载模型权重](https://www.modelscope.cn/"
"models/deepseek-ai/DeepSeek-V3.1)。"
"`DeepSeek-V3.1`(BF16 版本):[下载模型权重](https://www.modelscope.cn/models"
"/deepseek-ai/DeepSeek-V3.1)。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:28
msgid ""
@@ -116,9 +105,9 @@ msgid ""
"[Download model weight](https://www.modelscope.cn/models/Eco-"
"Tech/DeepSeek-V3.1-w8a8-mtp-QuaRot)."
msgstr ""
"`DeepSeek-V3.1-w8a8-mtp-QuaRot`(混合 MTP 量化版本):[下载模型权重]"
"(https://www.modelscope.cn/models/Eco-Tech/DeepSeek-V3.1-w8a8-mtp-"
"QuaRot)。"
"`DeepSeek-V3.1-w8a8-mtp-QuaRot`(混合 MTP "
"量化版本):[下载模型权重](https://www.modelscope.cn/models/Eco-"
"Tech/DeepSeek-V3.1-w8a8-mtp-QuaRot)。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:29
msgid ""
@@ -126,9 +115,9 @@ msgid ""
" [Download model weight](https://www.modelscope.cn/models/Eco-"
"Tech/DeepSeek-V3.1-Terminus-w4a8-mtp-QuaRot)."
msgstr ""
"`DeepSeek-V3.1-Terminus-w4a8-mtp-QuaRot`(混合 MTP 量化版本):[下载模型权"
"重](https://www.modelscope.cn/models/Eco-Tech/DeepSeek-V3.1-Terminus-w4a8-"
"mtp-QuaRot)。"
"`DeepSeek-V3.1-Terminus-w4a8-mtp-QuaRot`(混合 MTP "
"量化版本):[下载模型权重](https://www.modelscope.cn/models/Eco-Tech/DeepSeek-V3.1"
"-Terminus-w4a8-mtp-QuaRot)。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:30
#, python-format
@@ -137,8 +126,7 @@ msgid ""
"[msmodelslim](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/DeepSeek/README.md#deepseek-v31-w8a8-%E6%B7%B7%E5%90%88%E9%87%8F%E5%8C%96-mtp-%E9%87%8F%E5%8C%96)."
" You can use this method to quantize the model."
msgstr ""
"`量化方法`:"
"[msmodelslim](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/DeepSeek/README.md#deepseek-v31-w8a8-%E6%B7%B7%E5%90%88%E9%87%8F%E5%8C%96-mtp-%E9%87%8F%E5%8C%96)。"
"`量化方法`:[msmodelslim](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/DeepSeek/README.md#deepseek-v31-w8a8-%E6%B7%B7%E5%90%88%E9%87%8F%E5%8C%96-mtp-%E9%87%8F%E5%8C%96)。"
" 您可以使用此方法对模型进行量化。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:32
@@ -157,8 +145,8 @@ msgid ""
"node communication according to [verify multi-node communication "
"environment](../../installation.md#verify-multi-node-communication)."
msgstr ""
"如果您想部署多节点环境,需要根据 [验证多节点通信环境](../../installation."
"md#verify-multi-node-communication) 验证多节点通信。"
"如果您想部署多节点环境,需要根据 [验证多节点通信环境](../../installation.md#verify-multi-node-"
"communication) 验证多节点通信。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:38
msgid "Installation"
@@ -174,8 +162,8 @@ msgid ""
"your node, refer to [using docker](../../installation.md#set-up-using-"
"docker)."
msgstr ""
"根据您的机器类型选择镜像并在节点上启动 docker 镜像,请参考 [使用 docker]"
"(../../installation.md#set-up-using-docker)。"
"根据您的机器类型选择镜像并在节点上启动 docker 镜像,请参考 [使用 docker](../../installation.md#set-"
"up-using-docker)。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:80
msgid ""
@@ -195,9 +183,7 @@ msgstr "单节点部署"
msgid ""
"Quantized model `DeepSeek-V3.1-w8a8-mtp-QuaRot` can be deployed on 1 "
"Atlas 800 A3 (64G × 16)."
msgstr ""
"量化模型 `DeepSeek-V3.1-w8a8-mtp-QuaRot` 可以部署在 1 台 Atlas 800 A3 "
"(64G × 16)上。"
msgstr "量化模型 `DeepSeek-V3.1-w8a8-mtp-QuaRot` 可以部署在 1 台 Atlas 800 A3 (64G × 16)上。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:88
msgid "Run the following script to execute online inference."
@@ -215,9 +201,8 @@ msgid ""
" Furthermore, enabling this feature is not recommended in scenarios where"
" PD is separated."
msgstr ""
"设置环境变量 `VLLM_ASCEND_BALANCE_SCHEDULING=1` 启用均衡调度。这可能有助于"
"在 v1 调度器中提高输出吞吐量并降低 TPOT。然而,在某些场景下 TTFT 可能会下"
"降。此外,在 PD 分离的场景中不建议启用此功能。"
"设置环境变量 `VLLM_ASCEND_BALANCE_SCHEDULING=1` 启用均衡调度。这可能有助于在 v1 "
"调度器中提高输出吞吐量并降低 TPOT。然而,在某些场景下 TTFT 可能会下降。此外,在 PD 分离的场景中不建议启用此功能。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:135
msgid ""
@@ -233,24 +218,20 @@ msgid ""
"`16384` is sufficient, however, for precision testing, please set it at "
"least `35000`."
msgstr ""
"`--max-model-len` 指定最大上下文长度——即单个请求的输入和输出令牌之和。对于输"
"入长度为 3.5K 和输出长度为 1.5K 的性能测试,`16384` 的值就足够了,但是,对于"
"精度测试,请至少将其设置为 `35000`。"
"`--max-model-len` 指定最大上下文长度——即单个请求的输入和输出令牌之和。对于输入长度为 3.5K 和输出长度为 1.5K "
"的性能测试,`16384` 的值就足够了,但是,对于精度测试,请至少将其设置为 `35000`。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:137
msgid ""
"`--no-enable-prefix-caching` indicates that prefix caching is disabled. "
"To enable it, remove this option."
msgstr ""
"`--no-enable-prefix-caching` 表示前缀缓存被禁用。要启用它,请移除此选项。"
msgstr "`--no-enable-prefix-caching` 表示前缀缓存被禁用。要启用它,请移除此选项。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:138
msgid ""
"If you use the w4a8 weight, more memory will be allocated to kvcache, and"
" you can try to increase system throughput to achieve greater throughput."
msgstr ""
"如果使用 w4a8 权重,将分配更多内存给 kvcache,您可以尝试增加系统吞吐量以实现"
"更大的吞吐量。"
msgstr "如果使用 w4a8 权重,将分配更多内存给 kvcache,您可以尝试增加系统吞吐量以实现更大的吞吐量。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:140
msgid "Multi-node Deployment"
@@ -260,8 +241,7 @@ msgstr "多节点部署"
msgid ""
"`DeepSeek-V3.1-w8a8-mtp-QuaRot`: require at least 2 Atlas 800 A2 (64G × "
"8)."
msgstr ""
"`DeepSeek-V3.1-w8a8-mtp-QuaRot`:需要至少 2 台 Atlas 800 A2(64G × 8)。"
msgstr "`DeepSeek-V3.1-w8a8-mtp-QuaRot`:需要至少 2 台 Atlas 800 A2(64G × 8)。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:144
msgid "Run the following scripts on two nodes respectively."
@@ -284,8 +264,8 @@ msgid ""
"We recommend using Mooncake for deployment: "
"[Mooncake](../features/pd_disaggregation_mooncake_multi_node.md)."
msgstr ""
"我们建议使用 Mooncake 进行部署:[Mooncake](../features/"
"pd_disaggregation_mooncake_multi_node.md)。"
"我们建议使用 Mooncake "
"进行部署:[Mooncake](../features/pd_disaggregation_mooncake_multi_node.md)。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:256
msgid ""
@@ -293,27 +273,27 @@ msgid ""
"nodes) rather than 1P1D (2 nodes), because there is no enough NPU memory "
"to serve high concurrency in 1P1D case."
msgstr ""
"以 Atlas 800 A3(64G × 16)为例,我们建议部署 2P1D(4 个节点)而不是 1P1D"
"(2 个节点),因为在 1P1D 情况下没有足够的 NPU 内存来服务高并发。"
"以 Atlas 800 A3(64G × 16)为例,我们建议部署 2P1D(4 个节点)而不是 1P1D(2 个节点),因为在 1P1D "
"情况下没有足够的 NPU 内存来服务高并发。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:258
msgid ""
"`DeepSeek-V3.1-w8a8-mtp-QuaRot 2P1D Layerwise` require 4 Atlas 800 A3 "
"(64G × 16)."
msgstr ""
"`DeepSeek-V3.1-w8a8-mtp-QuaRot 2P1D Layerwise` 需要 4 台 Atlas 800 A3 "
"(64G × 16)。"
"`DeepSeek-V3.1-w8a8-mtp-QuaRot 2P1D Layerwise` 需要 4 台 Atlas 800 A3 (64G ×"
" 16)。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:260
msgid ""
"To run the vllm-ascend `Prefill-Decode Disaggregation` service, you need "
"to deploy a `launch_dp_program.py` script and a `run_dp_template.sh` "
"to deploy a `launch_online_dp.py` script and a `run_dp_template.sh` "
"script on each node and deploy a `proxy.sh` script on prefill master node"
" to forward requests."
msgstr ""
"要运行 vllm-ascend `Prefill-Decode 解耦`服务,您需要在每个节点上部署一个 "
"`launch_dp_program.py` 脚本和一个 `run_dp_template.sh` 脚本,并在 prefill "
"主节点上部署一个 `proxy.sh` 脚本来转发请求。"
"`launch_online_dp.py` 脚本和一个 `run_dp_template.sh` 脚本,并在 prefill 主节点上部署一个 "
"`proxy.sh` 脚本来转发请求。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:262
msgid ""
@@ -321,9 +301,9 @@ msgid ""
"[launch\\_online\\_dp.py](https://github.com/vllm-project/vllm-"
"ascend/blob/main/examples/external_online_dp/launch_online_dp.py)"
msgstr ""
"`launch_online_dp.py` 用于启动外部 dp vllm 服务器。[launch\\_online\\_dp."
"py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/"
"external_online_dp/launch_online_dp.py)"
"`launch_online_dp.py` 用于启动外部 dp vllm "
"服务器。[launch\\_online\\_dp.py](https://github.com/vllm-project/vllm-"
"ascend/blob/main/examples/external_online_dp/launch_online_dp.py)"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:265
msgid "Prefill Node 0 `run_dp_template.sh` script"
@@ -358,8 +338,8 @@ msgid ""
"Prefill-Decode (PD) separation scenario, enable MLAPO only on decode "
"nodes."
msgstr ""
"`VLLM_ASCEND_ENABLE_MLAPO=1`:启用融合算子,这可以显著提高性能但会消耗更多 "
"NPU 内存。在 Prefill-Decode (PD) 分离场景中,仅在 decode 节点上启用 MLAPO。"
"`VLLM_ASCEND_ENABLE_MLAPO=1`:启用融合算子,这可以显著提高性能但会消耗更多 NPU 内存。在 Prefill-"
"Decode (PD) 分离场景中,仅在 decode 节点上启用 MLAPO。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:576
msgid ""
@@ -367,9 +347,7 @@ msgid ""
"Multi-Token Prediction (MTP) is enabled, asynchronous scheduling of "
"operator delivery can be implemented to overlap the operator delivery "
"latency."
msgstr ""
"`--async-scheduling`:启用异步调度功能。当启用多令牌预测 (MTP) 时,可以实现算"
"子交付的异步调度,以重叠算子交付延迟。"
msgstr "`--async-scheduling`:启用异步调度功能。当启用多令牌预测 (MTP) 时,可以实现算子交付的异步调度,以重叠算子交付延迟。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:577
msgid ""
@@ -378,9 +356,8 @@ msgid ""
"it is recommended to set them to the number of frequently occurring "
"requests on the Decode (D) node."
msgstr ""
"`cudagraph_capture_sizes`:推荐值为 `n x (mtp + 1)`。最小值为 `n = 1`,最大"
"值为 `n = max-num-seqs`。对于其他值,建议将其设置为 Decode (D) 节点上频繁出"
"现的请求数量。"
"`cudagraph_capture_sizes`:推荐值为 `n x (mtp + 1)`。最小值为 `n = 1`,最大值为 `n = "
"max-num-seqs`。对于其他值,建议将其设置为 Decode (D) 节点上频繁出现的请求数量。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:578
msgid ""
@@ -390,9 +367,9 @@ msgid ""
"the PD separation scenario, it is recommended to enable this "
"configuration on both prefill and decode nodes simultaneously."
msgstr ""
"`recompute_scheduler_enable: true`:启用重计算调度器。当 decode 节点的键值缓"
"存 (KV Cache) 不足时,请求将被发送到 prefill 节点以重新计算 KV Cache。在 PD "
"分离场景中,建议同时在 prefill 和 decode 节点上启用此配置。"
"`recompute_scheduler_enable: true`:启用重计算调度器。当 decode 节点的键值缓存 (KV Cache) "
"不足时,请求将被发送到 prefill 节点以重新计算 KV Cache。在 PD 分离场景中,建议同时在 prefill 和 decode "
"节点上启用此配置。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:579
msgid ""
@@ -402,8 +379,7 @@ msgid ""
"improved efficiency."
msgstr ""
"`multistream_overlap_shared_expert: true`:当张量并行 (TP) 大小为 1 或 "
"`enable_shared_expert_dp: true` 时,启用额外的流来重叠共享专家的计算过程,以"
"提高效率。"
"`enable_shared_expert_dp: true` 时,启用额外的流来重叠共享专家的计算过程,以提高效率。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:580
msgid ""
@@ -412,9 +388,8 @@ msgid ""
"embedding layer to be greater than 1, which is used to reduce the "
"computational load of each card on the LMHead embedding layer."
msgstr ""
"`lmhead_tensor_parallel_size: 16`:当 decode 节点的张量并行 (TP) 大小为 1 "
"时,此参数允许 LMHead 嵌入层的 TP 大小大于 1,用于减少每张卡在 LMHead 嵌入层"
"上的计算负载。"
"`lmhead_tensor_parallel_size: 16`:当 decode 节点的张量并行 (TP) 大小为 1 时,此参数允许 "
"LMHead 嵌入层的 TP 大小大于 1,用于减少每张卡在 LMHead 嵌入层上的计算负载。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:582
msgid "run server for each node:"
@@ -431,7 +406,10 @@ msgid ""
"[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-"
"project/vllm-"
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
msgstr "在与预填充服务实例相同的节点上运行一个代理服务器。您可以在仓库的示例中找到代理程序:[load\\_balance\\_proxy\\_server\\_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
msgstr ""
"在与预填充服务实例相同的节点上运行一个代理服务器。您可以在仓库的示例中找到代理程序:[load\\_balance\\_proxy\\_server\\_example.py](https://github.com"
"/vllm-project/vllm-"
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:653
msgid "Functional Verification"
@@ -466,7 +444,9 @@ msgid ""
"After execution, you can get the result, here is the result of "
"`DeepSeek-V3.1-w8a8-mtp-QuaRot` in `vllm-ascend:0.11.0rc1` for reference "
"only."
msgstr "执行后,您可以获得结果。以下是 `vllm-ascend:0.11.0rc1` 中 `DeepSeek-V3.1-w8a8-mtp-QuaRot` 的结果,仅供参考。"
msgstr ""
"执行后,您可以获得结果。以下是 `vllm-ascend:0.11.0rc1` 中 `DeepSeek-V3.1-w8a8-mtp-QuaRot`"
" 的结果,仅供参考。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:44
msgid "dataset"
@@ -541,7 +521,10 @@ msgid ""
"Refer to [Using AISBench for performance "
"evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-"
"performance-evaluation) for details."
msgstr "详情请参考[使用 AISBench 进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation)。"
msgstr ""
"详情请参考[使用 AISBench "
"进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-"
"performance-evaluation)。"

#: ../../source/tutorials/models/DeepSeek-V3.1.md:693
msgid "The performance result is:"

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -74,41 +74,56 @@ msgstr "模型权重"
msgid ""
"`GLM-4.5`(BF16 version): [Download model "
"weight](https://www.modelscope.cn/models/ZhipuAI/GLM-4.5)."
msgstr "`GLM-4.5`(BF16 版本):[下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-4.5)。"
msgstr ""
"`GLM-4.5`(BF16 "
"版本):[下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-4.5)。"

#: ../../source/tutorials/models/GLM4.x.md:22
msgid ""
"`GLM-4.6`(BF16 version): [Download model "
"weight](https://www.modelscope.cn/models/ZhipuAI/GLM-4.6)."
msgstr "`GLM-4.6`(BF16 版本):[下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-4.6)。"
msgstr ""
"`GLM-4.6`(BF16 "
"版本):[下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-4.6)。"

#: ../../source/tutorials/models/GLM4.x.md:23
msgid ""
"`GLM-4.7`(BF16 version): [Download model "
"weight](https://www.modelscope.cn/models/ZhipuAI/GLM-4.7)."
msgstr "`GLM-4.7`(BF16 版本):[下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-4.7)。"
msgstr ""
"`GLM-4.7`(BF16 "
"版本):[下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-4.7)。"

#: ../../source/tutorials/models/GLM4.x.md:24
msgid ""
"`GLM-4.5-w8a8-with-float-mtp`(Quantized version with mtp): [Download "
"model weight](https://modelers.cn/models/Modelers_Park/GLM-4.5-w8a8)."
msgstr "`GLM-4.5-w8a8-with-float-mtp`(带 mtp 的量化版本):[下载模型权重](https://modelers.cn/models/Modelers_Park/GLM-4.5-w8a8)。"
msgstr ""
"`GLM-4.5-w8a8-with-float-mtp`(带 mtp "
"的量化版本):[下载模型权重](https://modelers.cn/models/Modelers_Park/GLM-4.5-w8a8)。"

#: ../../source/tutorials/models/GLM4.x.md:25
msgid ""
"`GLM-4.6-w8a8`(Quantized version without mtp): [Download model "
"weight](https://modelers.cn/models/Modelers_Park/GLM-4.6-w8a8). Because "
"vllm do not support GLM4.6 mtp in October, so we do not provide mtp "
"version. And last month, it supported, you can use the following "
"quantization scheme to add mtp weights to Quantized weights."
msgstr "`GLM-4.6-w8a8`(不带 mtp 的量化版本):[下载模型权重](https://modelers.cn/models/Modelers_Park/GLM-4.6-w8a8)。由于 vllm 在十月份不支持 GLM4.6 的 mtp,因此我们不提供 mtp 版本。上个月已支持,您可以使用以下量化方案将 mtp 权重添加到量化权重中。"
"vllm does not support GLM4.6 mtp in October, we do not provide an mtp "
"version. Last month, it was supported; you can use the following "
"quantization scheme to add mtp weights to the quantized weights."
msgstr ""
"`GLM-4.6-w8a8`(不带 mtp "
"的量化版本):[下载模型权重](https://modelers.cn/models/Modelers_Park/GLM-4.6-w8a8)。由于"
" vllm 在十月份不支持 GLM4.6 的 mtp,因此我们不提供 mtp 版本。上个月已支持,您可以使用以下量化方案将 mtp "
"权重添加到量化权重中。"

#: ../../source/tutorials/models/GLM4.x.md:26
msgid ""
"`GLM-4.7-w8a8-with-float-mtp`(Quantized version without mtp): [Download "
"model weight](https://modelscope.cn/models/Eco-"
"Tech/GLM-4.7-W8A8-floatmtp)."
msgstr "`GLM-4.7-w8a8-with-float-mtp`(不带 mtp 的量化版本):[下载模型权重](https://modelscope.cn/models/Eco-Tech/GLM-4.7-W8A8-floatmtp)。"
msgstr ""
"`GLM-4.7-w8a8-with-float-mtp`(不带 mtp "
"的量化版本):[下载模型权重](https://modelscope.cn/models/Eco-"
"Tech/GLM-4.7-W8A8-floatmtp)。"

#: ../../source/tutorials/models/GLM4.x.md:27
msgid ""
@@ -136,14 +151,17 @@ msgid "A3 series"
msgstr "A3 系列"

#: ../../source/tutorials/models/GLM4.x.md:42
#: ../../source/tutorials/models/GLM4.x.md:85
msgid "Start the docker image on your each node."
msgstr "在您的每个节点上启动 docker 镜像。"
msgid "Start the docker image on each node."
msgstr "在每个节点上启动 docker 镜像。"

#: ../../source/tutorials/models/GLM4.x.md
msgid "A2 series"
msgstr "A2 系列"

#: ../../source/tutorials/models/GLM4.x.md:85
msgid "Start the docker image on your each node."
msgstr "在每个节点上启动 docker 镜像。"

#: ../../source/tutorials/models/GLM4.x.md:118
msgid ""
"In addition, if you don't want to use the docker image as above, you can "
@@ -180,7 +198,12 @@ msgid ""
"The optimization of the FIA operator will be enabled by default in CANN "
"9.x releases, and manual replacement will no longer be required. Please "
"stay tuned for updates to this document."
msgstr "我们已在 CANN 8.5.1 中优化了 FIA 算子。需要手动替换与 FIA 算子相关的文件。请执行 FIA 算子替换脚本:[A2](../../../../tools/install_flash_infer_attention_score_ops_a2.sh) 和 [A3](../../../../tools/install_flash_infer_attention_score_ops_a3.sh)。FIA 算子的优化将在 CANN 9.x 版本中默认启用,届时将不再需要手动替换。请关注本文档的更新。"
msgstr ""
"我们已在 CANN 8.5.1 中优化了 FIA 算子。需要手动替换与 FIA 算子相关的文件。请执行 FIA "
"算子替换脚本:[A2](../../../../tools/install_flash_infer_attention_score_ops_a2.sh)"
" 和 "
"[A3](../../../../tools/install_flash_infer_attention_score_ops_a3.sh)。FIA"
" 算子的优化将在 CANN 9.x 版本中默认启用,届时将不再需要手动替换。请关注本文档的更新。"

#: ../../source/tutorials/models/GLM4.x.md:132
msgid "Single-node Deployment"
@@ -194,144 +217,155 @@ msgstr "在低延迟场景下,我们推荐单机部署。"
msgid ""
"Quantized model `glm4.7_w8a8_with_float_mtp` can be deployed on 1 Atlas "
"800 A3 (64G × 16) or 1 Atlas 800 A2 (64G × 8)."
msgstr "量化模型 `glm4.7_w8a8_with_float_mtp` 可以部署在 1 台 Atlas 800 A3(64G × 16)或 1 台 Atlas 800 A2(64G × 8)上。"
msgstr ""
"量化模型 `glm4.7_w8a8_with_float_mtp` 可以部署在 1 台 Atlas 800 A3(64G × 16)或 1 台 "
"Atlas 800 A2(64G × 8)上。"

#: ../../source/tutorials/models/GLM4.x.md:137
msgid "Run the following script to execute online inference."
msgstr "运行以下脚本以执行在线推理。"

#: ../../source/tutorials/models/GLM4.x.md:169
#: ../../source/tutorials/models/GLM4.x.md:168
msgid "**Notice:** The parameters are explained as follows:"
msgstr "**注意:** 参数解释如下:"

#: ../../source/tutorials/models/GLM4.x.md:172
#: ../../source/tutorials/models/GLM4.x.md:171
msgid ""
"`--async-scheduling` Asynchronous scheduling is a technique used to "
"optimize inference efficiency. It allows non-blocking task scheduling to "
"improve concurrency and throughput, especially when processing large-"
"scale models."
msgstr "`--async-scheduling` 异步调度是一种用于优化推理效率的技术。它允许非阻塞的任务调度,以提高并发性和吞吐量,特别是在处理大规模模型时。"
msgstr ""
"`--async-scheduling` "
"异步调度是一种用于优化推理效率的技术。它允许非阻塞的任务调度,以提高并发性和吞吐量,特别是在处理大规模模型时。"

#: ../../source/tutorials/models/GLM4.x.md:173
#: ../../source/tutorials/models/GLM4.x.md:172
msgid ""
"`fusion_ops_gmmswigluquant` The performance of the GmmSwigluQuant fusion "
"operator tends to degrade when the total number of NPUs is ≤ 16."
msgstr "`fusion_ops_gmmswigluquant` 当 NPU 总数 ≤ 16 时,GmmSwigluQuant 融合算子的性能往往会下降。"

#: ../../source/tutorials/models/GLM4.x.md:175
#: ../../source/tutorials/models/GLM4.x.md:174
msgid "Multi-node Deployment"
msgstr "多节点部署"

#: ../../source/tutorials/models/GLM4.x.md:177
#: ../../source/tutorials/models/GLM4.x.md:176
msgid ""
"Although the former tutorial said \"Not recommended to deploy multi-node "
"on Atlas 800 A2 (64G × 8)\", but if you insist to deploy GLM-4.x model on"
" multi-node like 2 × Atlas 800 A2 (64G × 8), run the following scripts on"
" two nodes respectively."
msgstr "尽管之前的教程提到“不建议在 Atlas 800 A2(64G × 8)上部署多节点”,但如果您坚持要在类似 2 × Atlas 800 A2(64G × 8)的多节点上部署 GLM-4.x 模型,请分别在两个节点上运行以下脚本。"
msgstr ""
"尽管之前的教程提到“不建议在 Atlas 800 A2(64G × 8)上部署多节点”,但如果您坚持要在类似 2 × Atlas 800 "
"A2(64G × 8)的多节点上部署 GLM-4.x 模型,请分别在两个节点上运行以下脚本。"

#: ../../source/tutorials/models/GLM4.x.md:179
#: ../../source/tutorials/models/GLM4.x.md:178
msgid "**Node 0**"
msgstr "**节点 0**"

#: ../../source/tutorials/models/GLM4.x.md:230
#: ../../source/tutorials/models/GLM4.x.md:228
msgid "**Node 1**"
msgstr "**节点 1**"

#: ../../source/tutorials/models/GLM4.x.md:283
#: ../../source/tutorials/models/GLM4.x.md:280
msgid "Prefill-Decode Disaggregation"
msgstr "Prefill-Decode 解耦部署"

#: ../../source/tutorials/models/GLM4.x.md:285
#: ../../source/tutorials/models/GLM4.x.md:282
msgid ""
"We'd like to show the deployment guide of `GLM4.7` on multi-node "
"environment with 2P1D for better performance."
msgstr "我们将展示 `GLM4.7` 在多节点环境(2P1D)下的部署指南,以获得更好的性能。"

#: ../../source/tutorials/models/GLM4.x.md:287
#: ../../source/tutorials/models/GLM4.x.md:284
msgid "Before you start, please"
msgstr "在开始之前,请"

#: ../../source/tutorials/models/GLM4.x.md:289
#: ../../source/tutorials/models/GLM4.x.md:286
msgid "prepare the script `launch_online_dp.py` on each node:"
msgstr "在每个节点上准备脚本 `launch_online_dp.py`:"

#: ../../source/tutorials/models/GLM4.x.md:392
#: ../../source/tutorials/models/GLM4.x.md:389
msgid "prepare the script `run_dp_template.sh` on each node."
msgstr "在每个节点上准备脚本 `run_dp_template.sh`。"

#: ../../source/tutorials/models/GLM4.x.md:394
#: ../../source/tutorials/models/GLM4.x.md:669
#: ../../source/tutorials/models/GLM4.x.md:391
#: ../../source/tutorials/models/GLM4.x.md:664
msgid "Prefill node 0"
msgstr "Prefill 节点 0"

#: ../../source/tutorials/models/GLM4.x.md:460
#: ../../source/tutorials/models/GLM4.x.md:676
#: ../../source/tutorials/models/GLM4.x.md:456
#: ../../source/tutorials/models/GLM4.x.md:671
msgid "Prefill node 1"
msgstr "Prefill 节点 1"

#: ../../source/tutorials/models/GLM4.x.md:525
#: ../../source/tutorials/models/GLM4.x.md:683
#: ../../source/tutorials/models/GLM4.x.md:520
#: ../../source/tutorials/models/GLM4.x.md:678
msgid "Decode node 0"
msgstr "Decode 节点 0"

#: ../../source/tutorials/models/GLM4.x.md:596
#: ../../source/tutorials/models/GLM4.x.md:690
#: ../../source/tutorials/models/GLM4.x.md:591
#: ../../source/tutorials/models/GLM4.x.md:685
msgid "Decode node 1"
msgstr "Decode 节点 1"

#: ../../source/tutorials/models/GLM4.x.md:667
#: ../../source/tutorials/models/GLM4.x.md:662
msgid ""
"Once the preparation is done, you can start the server with the following"
" command on each node:"
msgstr "准备工作完成后,您可以在每个节点上使用以下命令启动服务器:"

#: ../../source/tutorials/models/GLM4.x.md:697
#: ../../source/tutorials/models/GLM4.x.md:692
msgid "Request Forwarding"
msgstr "请求转发"

#: ../../source/tutorials/models/GLM4.x.md:699
#: ../../source/tutorials/models/GLM4.x.md:694
msgid ""
"To set up request forwarding, run the following script on any machine. "
"You can get the proxy program in the repository's examples: "
"[load_balance_proxy_server_example.py](https://github.com/vllm-project"
"/vllm-"
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
msgstr "要设置请求转发,请在任何机器上运行以下脚本。您可以在仓库的示例中找到代理程序:[load_balance_proxy_server_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
msgstr ""
"要设置请求转发,请在任何机器上运行以下脚本。您可以在仓库的示例中找到代理程序:[load_balance_proxy_server_example.py](https://github.com"
"/vllm-project/vllm-"
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"

#: ../../source/tutorials/models/GLM4.x.md:728
#: ../../source/tutorials/models/GLM4.x.md:723
msgid "Functional Verification"
msgstr "功能验证"

#: ../../source/tutorials/models/GLM4.x.md:730
#: ../../source/tutorials/models/GLM4.x.md:725
msgid "Once your server is started, you can query the model with input prompts:"
msgstr "服务器启动后,您可以使用输入提示词查询模型:"

#: ../../source/tutorials/models/GLM4.x.md:749
#: ../../source/tutorials/models/GLM4.x.md:744
msgid "Accuracy Evaluation"
msgstr "精度评估"

#: ../../source/tutorials/models/GLM4.x.md:751
#: ../../source/tutorials/models/GLM4.x.md:746
msgid "Here are two accuracy evaluation methods."
msgstr "这里有两种精度评估方法。"

#: ../../source/tutorials/models/GLM4.x.md:753
#: ../../source/tutorials/models/GLM4.x.md:770
#: ../../source/tutorials/models/GLM4.x.md:748
#: ../../source/tutorials/models/GLM4.x.md:765
msgid "Using AISBench"
msgstr "使用 AISBench"

#: ../../source/tutorials/models/GLM4.x.md:755
#: ../../source/tutorials/models/GLM4.x.md:750
msgid ""
"Refer to [Using "
"AISBench](../../developer_guide/evaluation/using_ais_bench.md) for "
"details."
msgstr "详情请参考[使用 AISBench](../../developer_guide/evaluation/using_ais_bench.md)。"

#: ../../source/tutorials/models/GLM4.x.md:757
#: ../../source/tutorials/models/GLM4.x.md:752
msgid ""
"After execution, you can get the result, here is the result of `GLM4.7` "
"in `vllm-ascend:main` (after `vllm-ascend:0.14.0rc1`) for reference only."
msgstr "执行后,您可以获得结果,以下是 `GLM4.7` 在 `vllm-ascend:main`(`vllm-ascend:0.14.0rc1` 之后)中的结果,仅供参考。"
msgstr ""
"执行后,您可以获得结果,以下是 `GLM4.7` 在 `vllm-ascend:main`(`vllm-ascend:0.14.0rc1` "
"之后)中的结果,仅供参考。"

#: ../../source/tutorials/models/GLM4.x.md:87
msgid "dataset"
@@ -389,111 +423,111 @@ msgstr "MATH500"
msgid "98.8"
msgstr "98.8"

#: ../../source/tutorials/models/GLM4.x.md:764
#: ../../source/tutorials/models/GLM4.x.md:759
msgid "Using Language Model Evaluation Harness"
msgstr "使用语言模型评估工具"

#: ../../source/tutorials/models/GLM4.x.md:766
#: ../../source/tutorials/models/GLM4.x.md:761
msgid "Not tested yet."
msgstr "尚未测试。"

#: ../../source/tutorials/models/GLM4.x.md:768
#: ../../source/tutorials/models/GLM4.x.md:763
msgid "Performance"
msgstr "性能"

#: ../../source/tutorials/models/GLM4.x.md:772
#: ../../source/tutorials/models/GLM4.x.md:767
msgid ""
"Refer to [Using AISBench for performance "
"evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-"
"performance-evaluation) for details."
msgstr ""
"详情请参考[使用AISBench进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation)。"
"详情请参考[使用AISBench进行性能评估](../../developer_guide/evaluation/using_ais_bench.md"
"#execute-performance-evaluation)。"

#: ../../source/tutorials/models/GLM4.x.md:774
#: ../../source/tutorials/models/GLM4.x.md:769
msgid "Using vLLM Benchmark"
msgstr "使用vLLM基准测试"

#: ../../source/tutorials/models/GLM4.x.md:776
#: ../../source/tutorials/models/GLM4.x.md:771
msgid "Run performance evaluation of `GLM-4.x` as an example."
msgstr "以运行 `GLM-4.x` 的性能评估为例。"

#: ../../source/tutorials/models/GLM4.x.md:778
#: ../../source/tutorials/models/GLM4.x.md:773
msgid ""
"Refer to [vllm benchmark](https://docs.vllm.ai/en/latest/benchmarking/) "
"for more details."
msgstr ""
"更多详情请参考 [vllm基准测试](https://docs.vllm.ai/en/latest/benchmarking/)。"
msgstr "更多详情请参考 [vllm基准测试](https://docs.vllm.ai/en/latest/benchmarking/)。"

#: ../../source/tutorials/models/GLM4.x.md:780
#: ../../source/tutorials/models/GLM4.x.md:775
msgid "There are three `vllm bench` subcommands:"
msgstr "`vllm bench` 包含三个子命令:"

#: ../../source/tutorials/models/GLM4.x.md:782
#: ../../source/tutorials/models/GLM4.x.md:777
msgid "`latency`: Benchmark the latency of a single batch of requests."
msgstr "`latency`:基准测试单批次请求的延迟。"

#: ../../source/tutorials/models/GLM4.x.md:783
#: ../../source/tutorials/models/GLM4.x.md:778
msgid "`serve`: Benchmark the online serving throughput."
msgstr "`serve`:基准测试在线服务吞吐量。"

#: ../../source/tutorials/models/GLM4.x.md:784
#: ../../source/tutorials/models/GLM4.x.md:779
msgid "`throughput`: Benchmark offline inference throughput."
msgstr "`throughput`:基准测试离线推理吞吐量。"

#: ../../source/tutorials/models/GLM4.x.md:786
#: ../../source/tutorials/models/GLM4.x.md:781
msgid "Take the `serve` as an example. Run the code as follows."
msgstr "以 `serve` 为例,运行以下代码。"

#: ../../source/tutorials/models/GLM4.x.md:808
#: ../../source/tutorials/models/GLM4.x.md:803
msgid ""
"After about several minutes, you can get the performance evaluation "
"result."
msgstr "大约几分钟后,您将获得性能评估结果。"

#: ../../source/tutorials/models/GLM4.x.md:810
#: ../../source/tutorials/models/GLM4.x.md:805
msgid "Best Practices"
msgstr "最佳实践"

#: ../../source/tutorials/models/GLM4.x.md:812
#: ../../source/tutorials/models/GLM4.x.md:807
msgid "In this chapter, we recommend best practices for three scenarios:"
msgstr "本章节,我们针对三种场景推荐最佳实践:"

#: ../../source/tutorials/models/GLM4.x.md:814
#: ../../source/tutorials/models/GLM4.x.md:809
msgid ""
"Long-context: For long sequences with low concurrency (≤ 4): set `dp1 "
"tp16`; For long sequences with high concurrency (> 4): set `dp2 tp8`"
msgstr ""
"长上下文:对于低并发(≤ 4)的长序列,设置 `dp1 tp16`;对于高并发(> 4)的长序列,设置 `dp2 tp8`"
msgstr "长上下文:对于低并发(≤ 4)的长序列,设置 `dp1 tp16`;对于高并发(> 4)的长序列,设置 `dp2 tp8`"

#: ../../source/tutorials/models/GLM4.x.md:815
#: ../../source/tutorials/models/GLM4.x.md:810
msgid ""
"Low-latency: For short sequences with low latency: we recommend setting "
"`dp2 tp8`"
msgstr "低延迟:对于需要低延迟的短序列,我们推荐设置 `dp2 tp8`"

#: ../../source/tutorials/models/GLM4.x.md:816
#: ../../source/tutorials/models/GLM4.x.md:811
msgid ""
"High-throughput: For short sequences with high throughput: we also "
"recommend setting `dp2 tp8`"
msgstr "高吞吐量:对于需要高吞吐量的短序列,我们也推荐设置 `dp2 tp8`"

#: ../../source/tutorials/models/GLM4.x.md:818
#: ../../source/tutorials/models/GLM4.x.md:813
msgid ""
"**Notice:** `max-model-len` and `max-num-seqs` need to be set according "
"to the actual usage scenario. For other settings, please refer to the "
"**[Deployment](#deployment)** chapter."
msgstr ""
"**注意:** `max-model-len` 和 `max-num-seqs` 需要根据实际使用场景进行设置。其他设置请参考 **[部署](#deployment)** 章节。"
"**注意:** `max-model-len` 和 `max-num-seqs` 需要根据实际使用场景进行设置。其他设置请参考 "
"**[部署](#deployment)** 章节。"

#: ../../source/tutorials/models/GLM4.x.md:821
#: ../../source/tutorials/models/GLM4.x.md:816
msgid "FAQ"
msgstr "常见问题"

#: ../../source/tutorials/models/GLM4.x.md:823
#: ../../source/tutorials/models/GLM4.x.md:818
msgid "**Q: Why is the TPOT performance poor in Long-context test?**"
msgstr "**问:为什么在长上下文测试中TPOT性能不佳?**"

#: ../../source/tutorials/models/GLM4.x.md:825
#: ../../source/tutorials/models/GLM4.x.md:820
msgid ""
"A: Please ensure that the FIA operator replacement script has been "
"executed successfully to complete the replacement of FIA operators. Here "
@@ -501,28 +535,28 @@ msgid ""
"[A2](../../../../tools/install_flash_infer_attention_score_ops_a2.sh) and"
" [A3](../../../../tools/install_flash_infer_attention_score_ops_a3.sh)"
msgstr ""
"答:请确保已成功执行FIA算子替换脚本以完成FIA算子的替换。脚本如下:"
"[A2](../../../../tools/install_flash_infer_attention_score_ops_a2.sh) 和 "
"[A3](../../../../tools/install_flash_infer_attention_score_ops_a3.sh)"
"答:请确保已成功执行FIA算子替换脚本以完成FIA算子的替换。脚本如下:[A2](../../../../tools/install_flash_infer_attention_score_ops_a2.sh)"
" 和 [A3](../../../../tools/install_flash_infer_attention_score_ops_a3.sh)"

#: ../../source/tutorials/models/GLM4.x.md:827
#: ../../source/tutorials/models/GLM4.x.md:822
msgid ""
"**Q: Startup fails with HCCL port conflicts (address already bound). What"
" should I do?**"
msgstr "**问:启动失败,提示HCCL端口冲突(地址已被占用)。我该怎么办?**"

#: ../../source/tutorials/models/GLM4.x.md:829
#: ../../source/tutorials/models/GLM4.x.md:824
msgid "A: Clean up old processes and restart: `pkill -f VLLM*`."
msgstr "答:清理旧进程并重启:`pkill -f VLLM*`。"

#: ../../source/tutorials/models/GLM4.x.md:831
#: ../../source/tutorials/models/GLM4.x.md:826
msgid "**Q: How to handle OOM or unstable startup?**"
msgstr "**问:如何处理OOM或启动不稳定的问题?**"

#: ../../source/tutorials/models/GLM4.x.md:833
#: ../../source/tutorials/models/GLM4.x.md:828
msgid ""
"A: Reduce `--max-num-seqs` and `--max-model-len` first. If needed, reduce"
" concurrency and load-testing pressure (e.g., `max-concurrency` / `num-"
"prompts`)."
msgstr ""
"答:首先减少 `--max-num-seqs` 和 `--max-model-len`。如有需要,降低并发度和负载测试压力(例如,`max-concurrency` / `num-prompts`)。"
"答:首先减少 `--max-num-seqs` 和 `--max-model-len`。如有需要,降低并发度和负载测试压力(例如,`max-"
"concurrency` / `num-prompts`)。"

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -30,10 +30,11 @@ msgstr "简介"
#: ../../source/tutorials/models/GLM5.md:5
msgid ""
"[GLM-5](https://huggingface.co/zai-org/GLM-5) use a Mixture-of-Experts "
"(MoE) architecture and targeting at complex systems engineering and long-"
"(MoE) architecture and targets at complex systems engineering and long-"
"horizon agentic tasks."
msgstr ""
"[GLM-5](https://huggingface.co/zai-org/GLM-5) 采用混合专家 (Mixture-of-Experts, MoE) 架构,旨在处理复杂系统工程和长视野智能体任务。"
"[GLM-5](https://huggingface.co/zai-org/GLM-5) 采用混合专家 (Mixture-of-Experts,"
" MoE) 架构,旨在处理复杂系统工程和长视野智能体任务。"

#: ../../source/tutorials/models/GLM5.md:7
msgid ""
@@ -41,7 +42,8 @@ msgid ""
"`vllm-ascend:v0.17.0rc1` and `vllm-ascend:v0.18.0rc1` , the version of "
"transformers need to be upgraded to 5.2.0."
msgstr ""
"`GLM-5` 模型首次在 `vllm-ascend:v0.17.0rc1` 版本中得到支持。在 `vllm-ascend:v0.17.0rc1` 和 `vllm-ascend:v0.18.0rc1` 版本中,需要将 transformers 的版本升级到 5.2.0。"
"`GLM-5` 模型首次在 `vllm-ascend:v0.17.0rc1` 版本中得到支持。在 `vllm-ascend:v0.17.0rc1`"
" 和 `vllm-ascend:v0.18.0rc1` 版本中,需要将 transformers 的版本升级到 5.2.0。"

#: ../../source/tutorials/models/GLM5.md:9
msgid ""
@@ -49,8 +51,7 @@ msgid ""
"including supported features, feature configuration, environment "
"preparation, single-node and multi-node deployment, accuracy and "
"performance evaluation."
msgstr ""
"本文档将展示该模型的主要验证步骤,包括支持的特性、特性配置、环境准备、单节点和多节点部署、精度和性能评估。"
msgstr "本文档将展示该模型的主要验证步骤,包括支持的特性、特性配置、环境准备、单节点和多节点部署、精度和性能评估。"

#: ../../source/tutorials/models/GLM5.md:11
msgid "Supported Features"
@@ -61,15 +62,13 @@ msgid ""
"Refer to [supported "
"features](../../user_guide/support_matrix/supported_models.md) to get the"
" model's supported feature matrix."
msgstr ""
"请参考[支持的特性](../../user_guide/support_matrix/supported_models.md)以获取模型支持的特性矩阵。"
msgstr "请参考[支持的特性](../../user_guide/support_matrix/supported_models.md)以获取模型支持的特性矩阵。"

#: ../../source/tutorials/models/GLM5.md:15
msgid ""
"Refer to [feature guide](../../user_guide/feature_guide/index.md) to get "
"the feature's configuration."
msgstr ""
"请参考[特性指南](../../user_guide/feature_guide/index.md)以获取特性的配置方法。"
msgstr "请参考[特性指南](../../user_guide/feature_guide/index.md)以获取特性的配置方法。"

#: ../../source/tutorials/models/GLM5.md:17
msgid "Environment Preparation"
@@ -84,35 +83,34 @@ msgid ""
"`GLM-5`(BF16 version): [Download model "
"weight](https://www.modelscope.cn/models/ZhipuAI/GLM-5)."
msgstr ""
"`GLM-5` (BF16 版本): [下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-5)。"
"`GLM-5` (BF16 版本): "
"[下载模型权重](https://www.modelscope.cn/models/ZhipuAI/GLM-5)。"

#: ../../source/tutorials/models/GLM5.md:22
msgid ""
"`GLM-5-w4a8`: [Download model weight](https://modelscope.cn/models/Eco-"
"Tech/GLM-5-w4a8)."
msgstr ""
"`GLM-5-w4a8`: [下载模型权重](https://modelscope.cn/models/Eco-Tech/GLM-5-w4a8)。"
msgstr "`GLM-5-w4a8`: [下载模型权重](https://modelscope.cn/models/Eco-Tech/GLM-5-w4a8)。"

#: ../../source/tutorials/models/GLM5.md:23
msgid ""
"`GLM-5-w8a8`: [Download model weight](https://www.modelscope.cn/models"
"/Eco-Tech/GLM-5-w8a8)."
msgstr ""
"`GLM-5-w8a8`: [下载模型权重](https://www.modelscope.cn/models/Eco-Tech/GLM-5-w8a8)。"
"`GLM-5-w8a8`: [下载模型权重](https://www.modelscope.cn/models/Eco-"
"Tech/GLM-5-w8a8)。"

#: ../../source/tutorials/models/GLM5.md:24
msgid ""
"You can use [msmodelslim](https://gitcode.com/Ascend/msmodelslim) to "
"quantify the model naively."
msgstr ""
"您可以使用 [msmodelslim](https://gitcode.com/Ascend/msmodelslim) 对模型进行简单的量化。"
msgstr "您可以使用 [msmodelslim](https://gitcode.com/Ascend/msmodelslim) 对模型进行简单的量化。"

#: ../../source/tutorials/models/GLM5.md:26
msgid ""
"It is recommended to download the model weight to the shared directory of"
" multiple nodes, such as `/root/.cache/`"
msgstr ""
"建议将模型权重下载到多个节点的共享目录中,例如 `/root/.cache/`"
msgstr "建议将模型权重下载到多个节点的共享目录中,例如 `/root/.cache/`"

#: ../../source/tutorials/models/GLM5.md:28
msgid "Installation"
@@ -146,7 +144,8 @@ msgid ""
"Install `vllm-ascend` from source, refer to "
"[installation](https://docs.vllm.ai/projects/ascend/en/latest/installation.html)."
msgstr ""
"从源码安装 `vllm-ascend`,请参考[安装指南](https://docs.vllm.ai/projects/ascend/en/latest/installation.html)。"
"从源码安装 `vllm-"
"ascend`,请参考[安装指南](https://docs.vllm.ai/projects/ascend/en/latest/installation.html)。"

#: ../../source/tutorials/models/GLM5.md:123
msgid ""
@@ -200,7 +199,9 @@ msgid ""
"optimize inference efficiency. It allows non-blocking task scheduling to "
"improve concurrency and throughput, especially when processing large-"
"scale models."
msgstr "`--async-scheduling` 异步调度是一种用于优化推理效率的技术。它允许非阻塞的任务调度,以提高并发性和吞吐量,尤其是在处理大规模模型时。"
msgstr ""
"`--async-scheduling` "
"异步调度是一种用于优化推理效率的技术。它允许非阻塞的任务调度,以提高并发性和吞吐量,尤其是在处理大规模模型时。"

#: ../../source/tutorials/models/GLM5.md:254
msgid "Multi-node Deployment"
@@ -211,7 +212,9 @@ msgid ""
"If you want to deploy multi-node environment, you need to verify multi-"
"node communication according to [verify multi-node communication "
"environment](../../installation.md#verify-multi-node-communication)."
msgstr "如果您想部署多节点环境,需要根据[验证多节点通信环境](../../installation.md#verify-multi-node-communication)来验证多节点通信。"
msgstr ""
"如果您想部署多节点环境,需要根据[验证多节点通信环境](../../installation.md#verify-multi-node-"
"communication)来验证多节点通信。"

#: ../../source/tutorials/models/GLM5.md:265
msgid "`glm-5-bf16`: require at least 2 Atlas 800 A3 (64G × 16)."
@@ -240,7 +243,9 @@ msgid ""
"For bf16 weight, use this script on each node to enable [Multi Token "
"Prediction "
"(MTP)](../../user_guide/feature_guide/Multi_Token_Prediction.md)."
msgstr "对于 bf16 权重,在每个节点上使用此脚本来启用[多令牌预测 (MTP)](../../user_guide/feature_guide/Multi_Token_Prediction.md)。"
msgstr ""
"对于 bf16 权重,在每个节点上使用此脚本来启用[多令牌预测 "
"(MTP)](../../user_guide/feature_guide/Multi_Token_Prediction.md)。"

#: ../../source/tutorials/models/GLM5.md:526
msgid "`glm-5-w8a8`: require 2 Atlas 800 A3 (64G × 16)."
@@ -276,200 +281,221 @@ msgid ""
"deployment, `layer_sharding` is supported only on prefill/P nodes with "
"`kv_role=\"kv_producer\"`; do not enable it on decode/D nodes or "
"`kv_role=\"kv_both\"` nodes."
msgstr "为了在预填充阶段支持 200k 的上下文窗口,需要在每个预填充节点的 `--additional_config` 中添加参数 `\"layer_sharding\": [\"q_b_proj\"]`。在 PD 解耦部署中,`layer_sharding` 仅在 `kv_role=\"kv_producer\"` 的预填充/P 节点上受支持;不要在解码/D 节点或 `kv_role=\"kv_both\"` 的节点上启用它。"
msgstr ""
"为了在预填充阶段支持 200k 的上下文窗口,需要在每个预填充节点的 `--additional_config` 中添加参数 "
"`\"layer_sharding\": [\"q_b_proj\"]`。在 PD 解耦部署中,`layer_sharding` 仅在 "
"`kv_role=\"kv_producer\"` 的预填充/P 节点上受支持;不要在解码/D 节点或 `kv_role=\"kv_both\"`"
" 的节点上启用它。"

#: ../../source/tutorials/models/GLM5.md:747
#: ../../source/tutorials/models/GLM5.md:1233
#: ../../source/tutorials/models/GLM5.md:1231
msgid "Prefill node 0"
msgstr "预填充节点 0"

#: ../../source/tutorials/models/GLM5.md:826
#: ../../source/tutorials/models/GLM5.md:1240
#: ../../source/tutorials/models/GLM5.md:825
#: ../../source/tutorials/models/GLM5.md:1238
msgid "Prefill node 1"
msgstr "预填充节点 1"

#: ../../source/tutorials/models/GLM5.md:906
#: ../../source/tutorials/models/GLM5.md:1247
#: ../../source/tutorials/models/GLM5.md:904
#: ../../source/tutorials/models/GLM5.md:1245
msgid "Decode node 0"
msgstr "解码节点 0"

#: ../../source/tutorials/models/GLM5.md:988
#: ../../source/tutorials/models/GLM5.md:1254
#: ../../source/tutorials/models/GLM5.md:986
#: ../../source/tutorials/models/GLM5.md:1252
msgid "Decode node 1"
msgstr "解码节点 1"

#: ../../source/tutorials/models/GLM5.md:1069
#: ../../source/tutorials/models/GLM5.md:1261
||||
#: ../../source/tutorials/models/GLM5.md:1261
|
||||
#: ../../source/tutorials/models/GLM5.md:1067
|
||||
#: ../../source/tutorials/models/GLM5.md:1259
|
||||
msgid "Decode node 2"
|
||||
msgstr "解码节点 2"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1150
|
||||
#: ../../source/tutorials/models/GLM5.md:1268
|
||||
#: ../../source/tutorials/models/GLM5.md:1148
|
||||
#: ../../source/tutorials/models/GLM5.md:1266
|
||||
msgid "Decode node 3"
|
||||
msgstr "解码节点 3"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1231
|
||||
#: ../../source/tutorials/models/GLM5.md:1229
|
||||
msgid ""
|
||||
"Once the preparation is done, you can start the server with the following"
|
||||
" command on each node:"
|
||||
msgstr "准备工作完成后,您可以在每个节点上使用以下命令启动服务器:"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1275
|
||||
#: ../../source/tutorials/models/GLM5.md:1273
|
||||
msgid "Request Forwarding"
|
||||
msgstr "请求转发"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1277
|
||||
#: ../../source/tutorials/models/GLM5.md:1275
|
||||
msgid ""
|
||||
"To set up request forwarding, run the following script on any machine. "
|
||||
"You can get the proxy program in the repository's examples: "
|
||||
"[load_balance_proxy_server_example.py](https://github.com/vllm-project"
|
||||
"/vllm-"
|
||||
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
msgstr "要设置请求转发,请在任何机器上运行以下脚本。您可以在仓库的示例中找到代理程序:[load_balance_proxy_server_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
msgstr ""
|
||||
"要设置请求转发,请在任何机器上运行以下脚本。您可以在仓库的示例中找到代理程序:[load_balance_proxy_server_example.py](https://github.com"
|
||||
"/vllm-project/vllm-"
|
||||
"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1318
|
||||
#: ../../source/tutorials/models/GLM5.md:1316
|
||||
msgid "**Notice:**"
|
||||
msgstr "**注意:**"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1320
|
||||
#: ../../source/tutorials/models/GLM5.md:1318
|
||||
msgid "Some configurations for optimization are shown below:"
|
||||
msgstr "以下是一些用于优化的配置:"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1322
|
||||
#: ../../source/tutorials/models/GLM5.md:1320
|
||||
msgid ""
|
||||
"`VLLM_ASCEND_ENABLE_FLASHCOMM1`: Enable FlashComm optimization to reduce "
|
||||
"communication and computation overhead on prefill node. With FlashComm "
|
||||
"enabled, layer_sharding list cannot include o_proj as an element."
|
||||
msgstr "`VLLM_ASCEND_ENABLE_FLASHCOMM1`: 启用 FlashComm 优化以减少预填充节点上的通信和计算开销。启用 FlashComm 后,layer_sharding 列表不能包含 o_proj 作为元素。"
|
||||
msgstr ""
|
||||
"`VLLM_ASCEND_ENABLE_FLASHCOMM1`: 启用 FlashComm 优化以减少预填充节点上的通信和计算开销。启用 "
|
||||
"FlashComm 后,layer_sharding 列表不能包含 o_proj 作为元素。"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1323
|
||||
#: ../../source/tutorials/models/GLM5.md:1321
|
||||
msgid ""
|
||||
"`VLLM_ASCEND_ENABLE_FUSED_MC2`: Enable following fused operators: "
|
||||
"dispatch_gmm_combine_decode and dispatch_ffn_combine operator."
|
||||
msgstr "`VLLM_ASCEND_ENABLE_FUSED_MC2`: 启用以下融合算子:dispatch_gmm_combine_decode 和 dispatch_ffn_combine 算子。"
|
||||
"dispatch_gmm_combine_decode and dispatch_ffn_combine operator. and please"
|
||||
" **note** that this environment variable can only be enabled on decode "
|
||||
"nodes."
|
||||
msgstr ""
|
||||
"`VLLM_ASCEND_ENABLE_FUSED_MC2`: 启用以下融合算子:dispatch_gmm_combine_decode 和 "
|
||||
"dispatch_ffn_combine 算子。并请**注意**,此环境变量仅可在解码节点上启用。"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1324
|
||||
#: ../../source/tutorials/models/GLM5.md:1322
|
||||
msgid "`VLLM_ASCEND_ENABLE_MLAPO`: Enable fused operator MlaPreprocessOperation."
|
||||
msgstr "`VLLM_ASCEND_ENABLE_MLAPO`: 启用融合算子 MlaPreprocessOperation。"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1326
|
||||
#: ../../source/tutorials/models/GLM5.md:1324
|
||||
msgid ""
|
||||
"Please refer to the following python file for further explanation and "
|
||||
"restrictions of the environment variables above: "
|
||||
"[envs.py](https://github.com/vllm-project/vllm-"
|
||||
"ascend/blob/main/vllm_ascend/envs.py)"
|
||||
msgstr "有关上述环境变量的进一步解释和限制,请参考以下 python 文件:[envs.py](https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/envs.py)"
|
||||
msgstr ""
|
||||
"有关上述环境变量的进一步解释和限制,请参考以下 python 文件:[envs.py](https://github.com/vllm-"
|
||||
"project/vllm-ascend/blob/main/vllm_ascend/envs.py)"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1328
|
||||
#: ../../source/tutorials/models/GLM5.md:1326
|
||||
msgid "Functional Verification"
|
||||
msgstr "功能验证"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1330
|
||||
#: ../../source/tutorials/models/GLM5.md:1328
|
||||
msgid "Once your server is started, you can query the model with input prompts:"
|
||||
msgstr "服务器启动后,您可以使用输入提示词查询模型:"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1343
|
||||
#: ../../source/tutorials/models/GLM5.md:1341
|
||||
msgid "Accuracy Evaluation"
|
||||
msgstr "精度评估"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1345
|
||||
#: ../../source/tutorials/models/GLM5.md:1343
|
||||
msgid "Here are two accuracy evaluation methods."
|
||||
msgstr "以下是两种精度评估方法。"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1347
|
||||
#: ../../source/tutorials/models/GLM5.md:1359
|
||||
#: ../../source/tutorials/models/GLM5.md:1345
|
||||
#: ../../source/tutorials/models/GLM5.md:1357
|
||||
msgid "Using AISBench"
|
||||
msgstr "使用AISBench"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1349
|
||||
#: ../../source/tutorials/models/GLM5.md:1347
|
||||
msgid ""
|
||||
"Refer to [Using "
|
||||
"AISBench](../../developer_guide/evaluation/using_ais_bench.md) for "
|
||||
"details."
|
||||
msgstr "详情请参考[使用AISBench](../../developer_guide/evaluation/using_ais_bench.md)。"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1351
|
||||
#: ../../source/tutorials/models/GLM5.md:1349
|
||||
msgid "After execution, you can get the result."
|
||||
msgstr "执行后,您将获得结果。"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1353
|
||||
#: ../../source/tutorials/models/GLM5.md:1351
|
||||
msgid "Using Language Model Evaluation Harness"
|
||||
msgstr "使用Language Model Evaluation Harness"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1355
|
||||
#: ../../source/tutorials/models/GLM5.md:1353
|
||||
msgid "Not tested yet."
|
||||
msgstr "尚未测试。"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1357
|
||||
#: ../../source/tutorials/models/GLM5.md:1355
|
||||
msgid "Performance"
|
||||
msgstr "性能"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1361
|
||||
#: ../../source/tutorials/models/GLM5.md:1359
|
||||
msgid ""
|
||||
"Refer to [Using AISBench for performance "
|
||||
"evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-"
|
||||
"performance-evaluation) for details."
|
||||
msgstr "详情请参考[使用AISBench进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation)。"
|
||||
msgstr ""
|
||||
"详情请参考[使用AISBench进行性能评估](../../developer_guide/evaluation/using_ais_bench.md"
|
||||
"#execute-performance-evaluation)。"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1363
|
||||
#: ../../source/tutorials/models/GLM5.md:1361
|
||||
msgid "Using vLLM Benchmark"
|
||||
msgstr "使用vLLM基准测试"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1365
|
||||
#: ../../source/tutorials/models/GLM5.md:1363
|
||||
msgid ""
|
||||
"Refer to [vllm "
|
||||
"benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) "
|
||||
"for more details."
|
||||
msgstr "更多详情请参考[vllm基准测试](https://docs.vllm.ai/en/latest/contributing/benchmarks.html)。"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1367
|
||||
#: ../../source/tutorials/models/GLM5.md:1365
|
||||
msgid "Best Practices"
|
||||
msgstr "最佳实践"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1369
|
||||
#: ../../source/tutorials/models/GLM5.md:1367
|
||||
msgid ""
|
||||
"In this chapter, we recommend best practices in prefill-decode "
|
||||
"disaggregation scenario with 1P1D architecture using 4 Atlas 800 A3 (64G "
|
||||
"× 16):"
|
||||
msgstr "本章节,我们推荐在使用4台Atlas 800 A3(64G × 16)的1P1D架构下,预填充-解码分离场景的最佳实践:"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1371
|
||||
#: ../../source/tutorials/models/GLM5.md:1369
|
||||
msgid ""
|
||||
"Low-latency: We recommend setting `dp4 tp8` on prefill nodes and `dp4 "
|
||||
"tp8` on decode nodes for low latency situation."
|
||||
msgstr "低延迟场景:对于低延迟场景,我们建议在预填充节点上设置`dp4 tp8`,在解码节点上设置`dp4 tp8`。"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1372
|
||||
#: ../../source/tutorials/models/GLM5.md:1370
|
||||
msgid ""
|
||||
"High-throughput: `dp4 tp8` on prefill nodes and `dp8 tp4` on decode nodes"
|
||||
" is recommended for high throughput situation."
|
||||
msgstr "高吞吐场景:对于高吞吐场景,建议在预填充节点上设置`dp4 tp8`,在解码节点上设置`dp8 tp4`。"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1374
|
||||
#: ../../source/tutorials/models/GLM5.md:1372
|
||||
msgid ""
|
||||
"**Notice:** `max-model-len` and `max-num-seqs` need to be set according "
|
||||
"to the actual usage scenario. For other settings, please refer to the "
|
||||
"**[Deployment](#deployment)** chapter."
|
||||
msgstr "**注意:** `max-model-len`和`max-num-seqs`需要根据实际使用场景进行设置。其他设置请参考**[部署](#deployment)**章节。"
|
||||
msgstr ""
|
||||
"**注意:** `max-model-len`和`max-num-"
|
||||
"seqs`需要根据实际使用场景进行设置。其他设置请参考**[部署](#deployment)**章节。"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1377
|
||||
#: ../../source/tutorials/models/GLM5.md:1375
|
||||
msgid "FAQ"
|
||||
msgstr "常见问题"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1379
|
||||
#: ../../source/tutorials/models/GLM5.md:1377
|
||||
msgid ""
|
||||
"**Q: How to solve ValueError: Tokenizer class TokenizersBackend does not "
|
||||
"exist or is not currently imported?**"
|
||||
msgstr "**问:如何解决ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported?**"
|
||||
msgstr ""
|
||||
"**问:如何解决ValueError: Tokenizer class TokenizersBackend does not exist or "
|
||||
"is not currently imported?**"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1381
|
||||
#: ../../source/tutorials/models/GLM5.md:1379
|
||||
msgid "A: Please update the version of transformers to 5.2.0"
|
||||
msgstr "答:请将transformers版本更新至5.2.0"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1383
|
||||
#: ../../source/tutorials/models/GLM5.md:1381
|
||||
msgid "**Q: How to enable function calling for GLM-5?**"
|
||||
msgstr "**问:如何为GLM-5启用函数调用功能?**"
|
||||
|
||||
#: ../../source/tutorials/models/GLM5.md:1385
|
||||
#: ../../source/tutorials/models/GLM5.md:1383
|
||||
msgid "A: Please add following configurations in vLLM startup command"
|
||||
msgstr "答:请在vLLM启动命令中添加以下配置"
|
||||
|
||||
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -35,7 +35,9 @@ msgid ""
"resolution visual encoder with the ERNIE-4.5-0.3B language model to "
"enable accurate element recognition."
msgstr ""
"PaddleOCR-VL 是一款专为文档解析设计的 SOTA 且资源高效的模型。其核心组件是 PaddleOCR-VL-0.9B,一个紧凑而强大的视觉语言模型(VLM),它集成了 NaViT 风格的动态分辨率视觉编码器和 ERNIE-4.5-0.3B 语言模型,以实现精确的元素识别。"
"PaddleOCR-VL 是一款专为文档解析设计的 SOTA 且资源高效的模型。其核心组件是 PaddleOCR-"
"VL-0.9B,一个紧凑而强大的视觉语言模型(VLM),它集成了 NaViT 风格的动态分辨率视觉编码器和 ERNIE-4.5-0.3B "
"语言模型,以实现精确的元素识别。"

#: ../../source/tutorials/models/PaddleOCR-VL.md:7
msgid ""
@@ -44,8 +46,7 @@ msgid ""
"preparation, single-node deployment, and functional verification. It is "
"designed to help users quickly complete model deployment and "
"verification."
msgstr ""
"本文档提供了完整的模型部署和验证的详细工作流程,包括支持的特性、环境准备、单节点部署和功能验证。旨在帮助用户快速完成模型部署和验证。"
msgstr "本文档提供了完整的模型部署和验证的详细工作流程,包括支持的特性、环境准备、单节点部署和功能验证。旨在帮助用户快速完成模型部署和验证。"

#: ../../source/tutorials/models/PaddleOCR-VL.md:9
msgid "Supported Features"
@@ -56,8 +57,7 @@ msgid ""
"Refer to [supported "
"features](../../user_guide/support_matrix/supported_models.md) to get the"
" model's supported feature matrix."
msgstr ""
"请参考[支持的特性](../../user_guide/support_matrix/supported_models.md)以获取模型支持的特性矩阵。"
msgstr "请参考[支持的特性](../../user_guide/support_matrix/supported_models.md)以获取模型支持的特性矩阵。"

#: ../../source/tutorials/models/PaddleOCR-VL.md:13
msgid ""
@@ -78,7 +78,8 @@ msgid ""
"`PaddleOCR-VL-0.9B`: [PaddleOCR-"
"VL-0.9B](https://www.modelscope.cn/models/PaddlePaddle/PaddleOCR-VL)"
msgstr ""
"`PaddleOCR-VL-0.9B`: [PaddleOCR-VL-0.9B](https://www.modelscope.cn/models/PaddlePaddle/PaddleOCR-VL)"
"`PaddleOCR-VL-0.9B`: [PaddleOCR-"
"VL-0.9B](https://www.modelscope.cn/models/PaddlePaddle/PaddleOCR-VL)"

#: ../../source/tutorials/models/PaddleOCR-VL.md:21
msgid ""
@@ -99,13 +100,15 @@ msgid ""
"Select an image based on your machine type and start the docker image on "
"your node, refer to [using docker](../../installation.md#set-up-using-"
"docker)."
msgstr "根据您的机器类型选择镜像并在节点上启动 docker 镜像,请参考[使用 docker](../../installation.md#set-up-using-docker)。"
msgstr ""
"根据您的机器类型选择镜像并在节点上启动 docker 镜像,请参考[使用 docker](../../installation.md#set-"
"up-using-docker)。"

#: ../../source/tutorials/models/PaddleOCR-VL.md:51
msgid ""
"The 310P device is supported from version 0.15.0rc1. You need to select "
"the corresponding image for installation."
msgstr "310P 设备从版本 0.15.0rc1 开始支持。您需要选择对应的镜像进行安装。"
"The Atlas 300 inference products are supported from version 0.15.0rc1. "
"You need to select the corresponding image for installation."
msgstr "Atlas 300 推理产品从版本 0.15.0rc1 开始支持。您需要选择对应的镜像进行安装。"

#: ../../source/tutorials/models/PaddleOCR-VL.md:54
msgid "Deployment"
@@ -122,8 +125,9 @@ msgstr "单 NPU (PaddleOCR-VL)"
#: ../../source/tutorials/models/PaddleOCR-VL.md:60
msgid ""
"PaddleOCR-VL supports single-node single-card deployment on the 910B4 and"
" 310P platform. Follow these steps to start the inference service:"
msgstr "PaddleOCR-VL 支持在 910B4 和 310P 平台上进行单节点单卡部署。请按照以下步骤启动推理服务:"
" Atlas 300 inference products platform. Follow these steps to start the "
"inference service:"
msgstr "PaddleOCR-VL 支持在 910B4 和 Atlas 300 推理产品平台上进行单节点单卡部署。请按照以下步骤启动推理服务:"

#: ../../source/tutorials/models/PaddleOCR-VL.md:62
msgid ""
@@ -144,18 +148,20 @@ msgid "Run the following script to start the vLLM server on single 910B4:"
msgstr "运行以下脚本在单张 910B4 上启动 vLLM 服务器:"

#: ../../source/tutorials/models/PaddleOCR-VL.md
msgid "310P"
msgstr "310P"
msgid "Atlas 300 inference products"
msgstr "Atlas 300 推理产品"

#: ../../source/tutorials/models/PaddleOCR-VL.md:97
msgid "Run the following script to start the vLLM server on single 310P:"
msgstr "运行以下脚本在单张 310P 上启动 vLLM 服务器:"
msgid ""
"Run the following script to start the vLLM server on single Atlas 300 "
"inference products:"
msgstr "运行以下脚本在单张 Atlas 300 推理产品上启动 vLLM 服务器:"

#: ../../source/tutorials/models/PaddleOCR-VL.md:116
msgid ""
"The `--max_model_len` option is added to prevent errors when generating "
"the attention operator mask on the 310P device."
msgstr "添加 `--max_model_len` 选项是为了防止在 310P 设备上生成注意力算子掩码时出错。"
"the attention operator mask on the Atlas 300 inference products."
msgstr "添加 `--max_model_len` 选项是为了防止在 Atlas 300 推理产品上生成注意力算子掩码时出错。"

#: ../../source/tutorials/models/PaddleOCR-VL.md:121
msgid "Multiple NPU (PaddleOCR-VL)"
@@ -204,7 +210,9 @@ msgid ""
"DocLayoutV2 model to fully unleash the capabilities of the PaddleOCR-VL "
"model, making it more consistent with the examples provided by the "
"official PaddlePaddle documentation."
msgstr "在上面的示例中,我们演示了如何使用 vLLM 推理 PaddleOCR-VL-0.9B 模型。通常,我们还需要集成 PP-DocLayoutV2 模型,以充分发挥 PaddleOCR-VL 模型的能力,使其更符合官方 PaddlePaddle 文档提供的示例。"
msgstr ""
"在上面的示例中,我们演示了如何使用 vLLM 推理 PaddleOCR-VL-0.9B 模型。通常,我们还需要集成 PP-DocLayoutV2 "
"模型,以充分发挥 PaddleOCR-VL 模型的能力,使其更符合官方 PaddlePaddle 文档提供的示例。"

#: ../../source/tutorials/models/PaddleOCR-VL.md:205
msgid ""
@@ -230,11 +238,13 @@ msgstr "使用以下命令启动容器:"

#: ../../source/tutorials/models/PaddleOCR-VL.md:235
msgid ""
"Install "
"Install "
"[PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined)"
" and [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)"
" and [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)"
msgstr ""
"安装 [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined) 和 [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)"
"安装 "
"[PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined)"
" 和 [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)"

#: ../../source/tutorials/models/PaddleOCR-VL.md:246
msgid "The OpenCV component may be missing:"
@@ -252,11 +262,14 @@ msgstr "OM 推理"

#: ../../source/tutorials/models/PaddleOCR-VL.md:264
msgid ""
"The 310P device supports only the OM model inference. For details about "
"the process, see the guide provided in "
"The Atlas 300 inference products support only the OM model inference. For"
" details about the process, see the guide provided in "
"[ModelZoo](https://gitcode.com/Ascend/ModelZoo-"
"PyTorch/tree/master/ACL_PyTorch/built-in/ocr/PP-DocLayoutV2)."
msgstr "310P 设备仅支持 OM 模型推理。有关该过程的详细信息,请参阅 [ModelZoo](https://gitcode.com/Ascend/ModelZoo-PyTorch/tree/master/ACL_PyTorch/built-in/ocr/PP-DocLayoutV2) 中提供的指南。"
msgstr ""
"Atlas 300 推理产品仅支持 OM 模型推理。有关该过程的详细信息,请参阅 [ModelZoo](https://gitcode.com/Ascend"
"/ModelZoo-PyTorch/tree/master/ACL_PyTorch/built-in/ocr/PP-DocLayoutV2) "
"中提供的指南。"

#: ../../source/tutorials/models/PaddleOCR-VL.md:268
msgid ""

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -51,7 +51,8 @@ msgid ""
"demonstration, showcasing the `Qwen3-VL-8B-Instruct` model as an example "
"for single NPU deployment and the `Qwen2.5-VL-32B-Instruct` model as an "
"example for multi-NPU deployment."
msgstr "本教程使用 vLLM-Ascend `v0.11.0rc3-a3` 版本进行演示,以 `Qwen3-VL-8B-Instruct` 模型为例展示单NPU部署,以 `Qwen2.5-VL-32B-Instruct` 模型为例展示多NPU部署。"
msgstr ""
"本教程使用 vLLM-Ascend `v0.11.0rc3-a3` 版本进行演示,以 `Qwen3-VL-8B-Instruct` 模型为例展示单NPU部署,以 `Qwen2.5-VL-32B-Instruct` 模型为例展示多NPU部署。"

#: ../../source/tutorials/models/Qwen-VL-Dense.md:11
msgid "Supported Features"
@@ -86,56 +87,65 @@ msgstr "需要 1 个 Atlas 800I A2 (64G × 8) 节点或 1 个 Atlas 800 A3 (64G
msgid ""
"`Qwen2.5-VL-3B-Instruct`: [Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen2.5-VL-3B-Instruct)"
msgstr "`Qwen2.5-VL-3B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-3B-Instruct)"
msgstr ""
"`Qwen2.5-VL-3B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-3B-Instruct)"

#: ../../source/tutorials/models/Qwen-VL-Dense.md:24
msgid ""
"`Qwen2.5-VL-7B-Instruct`: [Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen2.5-VL-7B-Instruct)"
msgstr "`Qwen2.5-VL-7B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-7B-Instruct)"
msgstr ""
"`Qwen2.5-VL-7B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-7B-Instruct)"

#: ../../source/tutorials/models/Qwen-VL-Dense.md:25
msgid ""
"`Qwen2.5-VL-32B-Instruct`:[Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen2.5-VL-32B-Instruct)"
msgstr "`Qwen2.5-VL-32B-Instruct`:[下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-32B-Instruct)"
msgstr ""
"`Qwen2.5-VL-32B-Instruct`:[下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-32B-Instruct)"

#: ../../source/tutorials/models/Qwen-VL-Dense.md:26
msgid ""
"`Qwen2.5-VL-72B-Instruct`:[Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen2.5-VL-72B-Instruct)"
msgstr "`Qwen2.5-VL-72B-Instruct`:[下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-72B-Instruct)"
msgstr ""
"`Qwen2.5-VL-72B-Instruct`:[下载模型权重](https://modelscope.cn/models/Qwen/Qwen2.5-VL-72B-Instruct)"

#: ../../source/tutorials/models/Qwen-VL-Dense.md:27
msgid ""
"`Qwen3-VL-2B-Instruct`: [Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen3-VL-2B-Instruct)"
msgstr "`Qwen3-VL-2B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-2B-Instruct)"
msgstr ""
"`Qwen3-VL-2B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-2B-Instruct)"

#: ../../source/tutorials/models/Qwen-VL-Dense.md:28
msgid ""
"`Qwen3-VL-4B-Instruct`: [Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen3-VL-4B-Instruct)"
msgstr "`Qwen3-VL-4B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-4B-Instruct)"
msgstr ""
"`Qwen3-VL-4B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-4B-Instruct)"

#: ../../source/tutorials/models/Qwen-VL-Dense.md:29
msgid ""
"`Qwen3-VL-8B-Instruct`: [Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen3-VL-8B-Instruct)"
msgstr "`Qwen3-VL-8B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-8B-Instruct)"
msgstr ""
"`Qwen3-VL-8B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-8B-Instruct)"

#: ../../source/tutorials/models/Qwen-VL-Dense.md:30
msgid ""
"`Qwen3-VL-32B-Instruct`: [Download model "
"weight](https://modelscope.cn/models/Qwen/Qwen3-VL-32B-Instruct)"
msgstr "`Qwen3-VL-32B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-32B-Instruct)"
msgstr ""
"`Qwen3-VL-32B-Instruct`: [下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-VL-32B-Instruct)"

#: ../../source/tutorials/models/Qwen-VL-Dense.md:32
msgid ""
"A sample Qwen2.5-VL quantization script can be found in the modelslim "
"code repository. [Qwen2.5-VL Quantization Script "
"Example](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/multimodal_vlm/Qwen2.5-VL/README.md)"
msgstr "可以在 modelslim 代码仓库中找到 Qwen2.5-VL 的量化脚本示例。[Qwen2.5-VL 量化脚本示例](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/multimodal_vlm/Qwen2.5-VL/README.md)"
msgstr ""
"可以在 modelslim 代码仓库中找到 Qwen2.5-VL 的量化脚本示例。[Qwen2.5-VL 量化脚本示例](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/multimodal_vlm/Qwen2.5-VL/README.md)"

#: ../../source/tutorials/models/Qwen-VL-Dense.md:34
msgid ""
@@ -172,8 +182,7 @@ msgid ""
"memory. You can find more details "
"[<u>here</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)."
msgstr ""
"`max_split_size_mb` 可防止原生分配器拆分大于此大小(以 MB 为单位)的内存块。这可以减少内存碎片,并可能使一些临界工作负载在内存耗尽前完成。您可以在"
"[<u>此处</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"
"`max_split_size_mb` 可防止原生分配器拆分大于此大小(以 MB 为单位)的内存块。这可以减少内存碎片,并可能使一些临界工作负载在内存耗尽前完成。您可以在[<u>此处</u>](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"

#: ../../source/tutorials/models/Qwen-VL-Dense.md:115
msgid "Deployment"
@@ -217,10 +226,10 @@ msgid ""
"Add `--max_model_len` option to avoid ValueError that the Qwen3-VL-8B-"
"Instruct model's max seq len (256000) is larger than the maximum number "
"of tokens that can be stored in KV cache. This will differ with different"
" NPU series based on the HBM size. Please modify the value according to a"
" suitable value for your NPU series."
" NPU series based on the on-chip memory size. Please modify the value "
"according to a suitable value for your NPU series."
msgstr ""
"添加 `--max_model_len` 选项以避免 ValueError,该错误提示 Qwen3-VL-8B-Instruct 模型的最大序列长度(256000)大于 KV 缓存可存储的最大令牌数。此值因不同 NPU 系列的 HBM 大小而异。请根据您 NPU 系列的合适值修改此值。"
"添加 `--max_model_len` 选项以避免 ValueError,该错误提示 Qwen3-VL-8B-Instruct 模型的最大序列长度(256000)大于 KV 缓存可存储的最大令牌数。此值因不同 NPU 系列的片上内存大小而异。请根据您 NPU 系列的合适值修改此值。"

#: ../../source/tutorials/models/Qwen-VL-Dense.md:335
#: ../../source/tutorials/models/Qwen-VL-Dense.md:422
@@ -253,10 +262,10 @@ msgid ""
"Add `--max_model_len` option to avoid ValueError that the Qwen2.5-VL-32B-"
"Instruct model's max_model_len (128000) is larger than the maximum number"
" of tokens that can be stored in KV cache. This will differ with "
"different NPU series base on the HBM size. Please modify the value "
"according to a suitable value for your NPU series."
"different NPU series base on the on-chip memory size. Please modify the "
"value according to a suitable value for your NPU series."
msgstr ""
"添加 `--max_model_len` 选项以避免 ValueError,该错误提示 Qwen2.5-VL-32B-Instruct 模型的最大模型长度(128000)大于 KV 缓存可存储的最大令牌数。此值因不同 NPU 系列的 HBM 大小而异。请根据您 NPU 系列的合适值修改此值。"
"添加 `--max_model_len` 选项以避免 ValueError,该错误提示 Qwen2.5-VL-32B-Instruct 模型的最大模型长度(128000)大于 KV 缓存可存储的最大令牌数。此值因不同 NPU 系列的片上内存大小而异。请根据您 NPU 系列的合适值修改此值。"

#: ../../source/tutorials/models/Qwen-VL-Dense.md:468
msgid "Accuracy Evaluation"
@@ -292,7 +301,8 @@ msgid ""
"Refer to [Using "
"lm_eval](../../developer_guide/evaluation/using_lm_eval.md) for more "
"details on `lm_eval` installation."
msgstr "有关 `lm_eval` 安装的更多详细信息,请参考[使用 lm_eval](../../developer_guide/evaluation/using_lm_eval.md)。"
msgstr ""
"有关 `lm_eval` 安装的更多详细信息,请参考[使用 lm_eval](../../developer_guide/evaluation/using_lm_eval.md)。"

#: ../../source/tutorials/models/Qwen-VL-Dense.md:492
#: ../../source/tutorials/models/Qwen-VL-Dense.md:523
@@ -315,7 +325,8 @@ msgstr "以 `mmmu_val` 数据集作为测试数据集为例,在离线模式下
msgid ""
"After execution, you can get the result, here is the result of `Qwen2.5"
"-VL-32B-Instruct` in `vllm-ascend:0.11.0rc3` for reference only."
msgstr "执行后,您将获得结果。以下是 `vllm-ascend:0.11.0rc3` 中 `Qwen2.5-VL-32B-Instruct` 的结果,仅供参考。"
msgstr ""
"执行后,您将获得结果。以下是 `vllm-ascend:0.11.0rc3` 中 `Qwen2.5-VL-32B-Instruct` 的结果,仅供参考。"

#: ../../source/tutorials/models/Qwen-VL-Dense.md:543
msgid "Performance"
@@ -357,4 +368,4 @@ msgstr "性能评估必须在在线模式下进行。以 `serve` 为例。按如
msgid ""
"After about several minutes, you can get the performance evaluation "
"result."
msgstr "大约几分钟后,您将获得性能评估结果。"
msgstr "大约几分钟后,您将获得性能评估结果。"

@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2026-04-14 09:08+0000\n"
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -35,7 +35,8 @@ msgid ""
"advancements in reasoning, instruction-following, agent capabilities, and"
" multilingual support."
msgstr ""
"Qwen3 是 Qwen 系列最新一代的大语言模型,提供了一套完整的稠密模型和专家混合模型。基于广泛的训练,Qwen3 在推理、指令遵循、智能体能力和多语言支持方面实现了突破性进展。"
"Qwen3 是 Qwen 系列最新一代的大语言模型,提供了一套完整的稠密模型和专家混合(MoE)模型。基于广泛的训练,Qwen3 "
"在推理、指令遵循、智能体能力和多语言支持方面实现了突破性进展。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:7
msgid ""
@@ -80,7 +81,9 @@ msgid ""
"1 Atlas 800 A2 (64G × 8) node or 2 Atlas 800 A2(32G × 8)nodes. [Download "
"model weight](https://www.modelscope.cn/models/Qwen/Qwen3-235B-A22B)"
msgstr ""
"`Qwen3-235B-A22B`(BF16 版本):需要 1 个 Atlas 800 A3 (64G × 16) 节点、1 个 Atlas 800 A2 (64G × 8) 节点或 2 个 Atlas 800 A2(32G × 8) 节点。[下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3-235B-A22B)"
"`Qwen3-235B-A22B`(BF16 版本):需要 1 个 Atlas 800 A3 (64G × 16) 节点、1 个 Atlas "
"800 A2 (64G × 8) 节点或 2 个 Atlas 800 A2(32G × 8) "
"节点。[下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3-235B-A22B)"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:22
msgid ""
@@ -89,7 +92,10 @@ msgid ""
"8)nodes. [Download model weight](https://modelscope.cn/models/vllm-"
"ascend/Qwen3-235B-A22B-W8A8)"
msgstr ""
"`Qwen3-235B-A22B-w8a8`(量化版本):需要 1 个 Atlas 800 A3 (64G × 16) 节点、1 个 Atlas 800 A2 (64G × 8) 节点或 2 个 Atlas 800 A2(32G × 8) 节点。[下载模型权重](https://modelscope.cn/models/vllm-ascend/Qwen3-235B-A22B-W8A8)"
"`Qwen3-235B-A22B-w8a8`(量化版本):需要 1 个 Atlas 800 A3 (64G × 16) 节点、1 个 Atlas "
"800 A2 (64G × 8) 节点或 2 个 Atlas 800 A2(32G × 8) "
"节点。[下载模型权重](https://modelscope.cn/models/vllm-ascend/Qwen3-235B-A22B-"
"W8A8)"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:24
msgid ""
@@ -106,7 +112,9 @@ msgid ""
"If you want to deploy multi-node environment, you need to verify multi-"
"node communication according to [verify multi-node communication "
"environment](../../installation.md#verify-multi-node-communication)."
msgstr "如果您想部署多节点环境,需要根据[验证多节点通信环境](../../installation.md#verify-multi-node-communication)来验证多节点通信。"
msgstr ""
"如果您想部署多节点环境,需要根据[验证多节点通信环境](../../installation.md#verify-multi-node-"
"communication)来验证多节点通信。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:30
msgid "Installation"
@@ -121,14 +129,18 @@ msgid ""
"For example, using images `quay.io/ascend/vllm-ascend:v0.11.0rc2`(for "
"Atlas 800 A2) and `quay.io/ascend/vllm-ascend:v0.11.0rc2-a3`(for Atlas "
"800 A3)."
msgstr "例如,使用镜像 `quay.io/ascend/vllm-ascend:v0.11.0rc2`(适用于 Atlas 800 A2)和 `quay.io/ascend/vllm-ascend:v0.11.0rc2-a3`(适用于 Atlas 800 A3)。"
msgstr ""
"例如,使用镜像 `quay.io/ascend/vllm-ascend:v0.11.0rc2`(适用于 Atlas 800 A2)和 "
"`quay.io/ascend/vllm-ascend:v0.11.0rc2-a3`(适用于 Atlas 800 A3)。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:38
msgid ""
"Select an image based on your machine type and start the docker image on "
"your node, refer to [using docker](../../installation.md#set-up-using-"
"docker)."
msgstr "根据您的机器类型选择镜像并在节点上启动 Docker 容器,请参考[使用 Docker](../../installation.md#set-up-using-docker)。"
msgstr ""
"根据您的机器类型选择镜像并在节点上启动 Docker 容器,请参考[使用 Docker](../../installation.md#set-"
"up-using-docker)。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md
msgid "Build from source"
@@ -142,7 +154,9 @@ msgstr "您可以从源码构建所有组件。"
msgid ""
"Install `vllm-ascend`, refer to [set up using "
"python](../../installation.md#set-up-using-python)."
msgstr "安装 `vllm-ascend`,请参考[使用 Python 设置](../../installation.md#set-up-using-python)。"
msgstr ""
"安装 `vllm-ascend`,请参考[使用 Python 设置](../../installation.md#set-up-using-"
"python)。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:84
msgid ""
@@ -163,7 +177,10 @@ msgid ""
"`Qwen3-235B-A22B` and `Qwen3-235B-A22B-w8a8` can both be deployed on 1 "
"Atlas 800 A3(64G*16), 1 Atlas 800 A2(64G*8). Quantized version need to "
"start with parameter `--quantization ascend`."
msgstr "`Qwen3-235B-A22B` 和 `Qwen3-235B-A22B-w8a8` 都可以部署在 1 个 Atlas 800 A3(64G*16) 或 1 个 Atlas 800 A2(64G*8) 上。量化版本需要使用参数 `--quantization ascend` 启动。"
msgstr ""
"`Qwen3-235B-A22B` 和 `Qwen3-235B-A22B-w8a8` 都可以部署在 1 个 Atlas 800 "
"A3(64G*16) 或 1 个 Atlas 800 A2(64G*8) 上。量化版本需要使用参数 `--quantization ascend`"
" 启动。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:93
msgid "Run the following script to execute online 128k inference."
@@ -181,7 +198,10 @@ msgid ""
"quantization weights to run long seqs (such as 128k context), it is "
"required to use yarn rope-scaling technique."
msgstr ""
"[Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B#processing-long-texts) 原本仅支持 40960 上下文长度(max_position_embeddings)。如果您想使用它及其相关的量化权重来运行长序列(例如 128k 上下文),需要使用 yarn rope-scaling 技术。"
"[Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B#processing-"
"long-texts) 原本仅支持 40960 "
"上下文长度(max_position_embeddings)。如果您想使用它及其相关的量化权重来运行长序列(例如 128k 上下文),需要使用 "
"yarn rope-scaling 技术。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:129
#, python-brace-format
@@ -192,7 +212,8 @@ msgid ""
" \\`."
msgstr ""
"对于 `v0.12.0` 及以上版本的 vLLM,使用参数:`--hf-overrides '{\"rope_parameters\": "
"{\"rope_type\":\"yarn\",\"rope_theta\":1000000,\"factor\":4,\"original_max_position_embeddings\":32768}}' \\`。"
"{\"rope_type\":\"yarn\",\"rope_theta\":1000000,\"factor\":4,\"original_max_position_embeddings\":32768}}'"
" \\`。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:130
#, python-brace-format
@@ -205,7 +226,10 @@ msgid ""
"parameter."
msgstr ""
"对于 `v0.12.0` 以下版本的 vLLM,使用参数:`--rope_scaling "
"'{\"rope_type\":\"yarn\",\"factor\":4,\"original_max_position_embeddings\":32768}' \\`。如果您使用的是像 [Qwen3-235B-A22B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507) 这样原本就支持长上下文的权重,则无需添加此参数。"
"'{\"rope_type\":\"yarn\",\"factor\":4,\"original_max_position_embeddings\":32768}'"
" \\`。如果您使用的是像 [Qwen3-235B-A22B-"
"Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507)"
" 这样原本就支持长上下文的权重,则无需添加此参数。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:133
msgid "The parameters are explained as follows:"
@@ -215,7 +239,9 @@ msgstr "参数解释如下:"
msgid ""
"`--data-parallel-size` 1 and `--tensor-parallel-size` 8 are common "
"settings for data parallelism (DP) and tensor parallelism (TP) sizes."
msgstr "`--data-parallel-size` 1 和 `--tensor-parallel-size` 8 是数据并行(DP)和张量并行(TP)大小的常见设置。"
msgstr ""
"`--data-parallel-size` 1 和 `--tensor-parallel-size` 8 "
"是数据并行(DP)和张量并行(TP)大小的常见设置。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:136
msgid ""
@@ -233,21 +259,28 @@ msgid ""
"testing performance, it is generally recommended that `--max-num-seqs` * "
"`--data-parallel-size` >= the actual total concurrency."
msgstr ""
"`--max-num-seqs` 表示每个 DP 组允许处理的最大请求数。如果发送到服务的请求数超过此限制,超出的请求将保持在等待状态,不会被调度。请注意,在等待状态所花费的时间也会计入 TTFT 和 TPOT 等指标。因此,在测试性能时,通常建议 `--max-num-seqs` * `--data-parallel-size` >= 实际总并发数。"
"`--max-num-seqs` 表示每个 DP "
"组允许处理的最大请求数。如果发送到服务的请求数超过此限制,超出的请求将保持在等待状态,不会被调度。请注意,在等待状态所花费的时间也会计入 TTFT"
" 和 TPOT 等指标。因此,在测试性能时,通常建议 `--max-num-seqs` * `--data-parallel-size` >= "
"实际总并发数。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:138
msgid ""
"`--max-num-batched-tokens` represents the maximum number of tokens that "
"the model can process in a single step. Currently, vLLM v1 scheduling "
"enables ChunkPrefill/SplitFuse by default, which means:"
msgstr "`--max-num-batched-tokens` 表示模型在单步中可以处理的最大 token 数。目前,vLLM v1 调度默认启用 ChunkPrefill/SplitFuse,这意味着:"
msgstr ""
"`--max-num-batched-tokens` 表示模型在单步中可以处理的最大 token 数。目前,vLLM v1 调度默认启用 "
"ChunkPrefill/SplitFuse,这意味着:"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:139
msgid ""
"(1) If the input length of a request is greater than `--max-num-batched-"
"tokens`, it will be divided into multiple rounds of computation according"
" to `--max-num-batched-tokens`;"
msgstr "(1) 如果一个请求的输入长度大于 `--max-num-batched-tokens`,它将根据 `--max-num-batched-tokens` 被分成多轮计算;"
msgstr ""
"(1) 如果一个请求的输入长度大于 `--max-num-batched-tokens`,它将根据 `--max-num-batched-"
"tokens` 被分成多轮计算;"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:140
msgid ""
@@ -277,14 +310,21 @@ msgid ""
"memory-utilization` too high may lead to OOM (Out of Memory) issues "
"during actual inference. The default value is `0.9`."
msgstr ""
"`--gpu-memory-utilization` 表示 vLLM 将用于实际推理的 HBM 比例。其核心功能是计算可用的 kv_cache 大小。在预热阶段(在 vLLM 中称为 profile run),vLLM 会记录输入大小为 `--max-num-batched-tokens` 的推理过程中的峰值 GPU 内存使用量。然后,可用的 kv_cache 大小计算为:`--gpu-memory-utilization` * HBM 大小 - 峰值 GPU 内存使用量。因此,`--gpu-memory-utilization` 的值越大,可以使用的 kv_cache 就越多。然而,由于预热阶段的 GPU 内存使用量可能与实际推理期间不同(例如,由于 EP 负载不均),将 `--gpu-memory-utilization` 设置得过高可能会导致实际推理期间出现 OOM(内存不足)问题。默认值为 `0.9`。"
"`--gpu-memory-utilization` 表示 vLLM 将用于实际推理的 HBM 比例。其核心功能是计算可用的 kv_cache "
"大小。在预热阶段(在 vLLM 中称为 profile run),vLLM 会记录输入大小为 `--max-num-batched-tokens`"
" 的推理过程中的峰值 GPU 内存使用量。然后,可用的 kv_cache 大小计算为:`--gpu-memory-utilization` * "
"HBM 大小 - 峰值 GPU 内存使用量。因此,`--gpu-memory-utilization` 的值越大,可以使用的 kv_cache "
"就越多。然而,由于预热阶段的 GPU 内存使用量可能与实际推理期间不同(例如,由于 EP 负载不均),将 `--gpu-memory-"
"utilization` 设置得过高可能会导致实际推理期间出现 OOM(内存不足)问题。默认值为 `0.9`。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:143
msgid ""
"`--enable-expert-parallel` indicates that EP is enabled. Note that vLLM "
"does not support a mixed approach of ETP and EP; that is, MoE can either "
"use pure EP or pure TP."
msgstr "`--enable-expert-parallel` 表示启用了 EP。请注意,vLLM 不支持 ETP 和 EP 的混合方法;也就是说,MoE 可以使用纯 EP 或纯 TP。"
msgstr ""
"`--enable-expert-parallel` 表示启用了 EP。请注意,vLLM 不支持 ETP 和 EP 的混合方法;也就是说,MoE "
"可以使用纯 EP 或纯 TP。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:144
msgid ""
@@ -308,7 +348,10 @@ msgid ""
"mainly used to reduce the cost of operator dispatch. Currently, "
"\"FULL_DECODE_ONLY\" is recommended."
msgstr ""
"`--compilation-config` 包含与 aclgraph 图模式相关的配置。最重要的配置是 \"cudagraph_mode\" 和 \"cudagraph_capture_sizes\",其含义如下:\"cudagraph_mode\":表示特定的图模式。目前支持 \"PIECEWISE\" 和 \"FULL_DECODE_ONLY\"。图模式主要用于降低算子调度的开销。目前推荐使用 \"FULL_DECODE_ONLY\"。"
"`--compilation-config` 包含与 aclgraph 图模式相关的配置。最重要的配置是 \"cudagraph_mode\" 和"
" \"cudagraph_capture_sizes\",其含义如下:\"cudagraph_mode\":表示特定的图模式。目前支持 "
"\"PIECEWISE\" 和 \"FULL_DECODE_ONLY\"。图模式主要用于降低算子调度的开销。目前推荐使用 "
"\"FULL_DECODE_ONLY\"。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:148
msgid ""
@@ -319,14 +362,18 @@ msgid ""
"Currently, the default setting is recommended. Only in some scenarios is "
"it necessary to set this separately to achieve optimal performance."
msgstr ""
"\"cudagraph_capture_sizes\":表示不同级别的图模式。默认值为 [1, 2, 4, 8, 16, 24, 32, 40,..., `--max-num-seqs`]。在图模式下,不同级别图的输入是固定的,级别之间的输入会自动填充到下一个级别。目前推荐使用默认设置。只有在某些场景下,才需要单独设置此参数以达到最佳性能。"
"\"cudagraph_capture_sizes\":表示不同级别的图模式。默认值为 [1, 2, 4, 8, 16, 24, 32, "
"40,..., `--max-num-"
"seqs`]。在图模式下,不同级别图的输入是固定的,级别之间的输入会自动填充到下一个级别。目前推荐使用默认设置。只有在某些场景下,才需要单独设置此参数以达到最佳性能。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:149
msgid ""
"`export VLLM_ASCEND_ENABLE_FLASHCOMM1=1` indicates that Flashcomm1 "
"optimization is enabled. Currently, this optimization is only supported "
"for MoE in scenarios where tp_size > 1."
msgstr "`export VLLM_ASCEND_ENABLE_FLASHCOMM1=1` 表示启用了 Flashcomm1 优化。目前,此优化仅在 tp_size > 1 的场景下对 MoE 支持。"
msgstr ""
"`export VLLM_ASCEND_ENABLE_FLASHCOMM1=1` 表示启用了 Flashcomm1 优化。目前,此优化仅在 "
"tp_size > 1 的场景下对 MoE 支持。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:151
msgid "Multi-node Deployment with MP (Recommended)"
@@ -336,7 +383,9 @@ msgstr "使用 MP 进行多节点部署(推荐)"
msgid ""
"Assume you have Atlas 800 A3 (64G*16) nodes (or 2* A2), and want to "
"deploy the `Qwen3-VL-235B-A22B-Instruct` model across multiple nodes."
msgstr "假设您有 Atlas 800 A3 (64G*16) 节点(或 2* A2),并希望跨多个节点部署 `Qwen3-VL-235B-A22B-Instruct` 模型。"
msgstr ""
"假设您有 Atlas 800 A3 (64G*16) 节点(或 2* A2),并希望跨多个节点部署 `Qwen3-VL-235B-A22B-"
"Instruct` 模型。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:155
msgid "Node 0"
@@ -368,7 +417,9 @@ msgstr "预填充-解码分离"
msgid ""
"refer to [Prefill-Decode Disaggregation Mooncake Verification "
"(Qwen)](../features/pd_disaggregation_mooncake_multi_node.md)"
msgstr "请参阅 [Prefill-Decode 分离部署 Mooncake 验证 (Qwen)](../features/pd_disaggregation_mooncake_multi_node.md)"
msgstr ""
"请参阅 [Prefill-Decode 分离部署 Mooncake 验证 "
"(Qwen)](../features/pd_disaggregation_mooncake_multi_node.md)"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:262
msgid "Functional Verification"
@@ -453,7 +504,10 @@ msgid ""
"Refer to [Using AISBench for performance "
"evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-"
"performance-evaluation) for details."
msgstr "详情请参阅 [使用 AISBench 进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation)。"
msgstr ""
"详情请参阅 [使用 AISBench "
"进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-"
"performance-evaluation)。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:297
msgid "Using vLLM Benchmark"
@@ -542,13 +596,13 @@ msgstr "单节点 A3 (64G*16)"
msgid "Example server scripts:"
msgstr "服务器脚本示例:"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:368
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:597
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:367
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:595
msgid "Benchmark scripts:"
msgstr "基准测试脚本:"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:384
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:613
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:383
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:611
msgid "Reference test results:"
msgstr "参考测试结果:"

@@ -592,48 +646,53 @@ msgstr "48.69"
msgid "2761.72"
msgstr "2761.72"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:390
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:619
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:389
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:617
msgid "Note:"
msgstr "注意:"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:392
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:391
msgid ""
"Setting `export VLLM_ASCEND_ENABLE_FUSED_MC2=1` enables MoE fused "
"operators that reduce time consumption of MoE in both prefill and decode."
" This is an experimental feature which only supports W8A8 quantization on"
" Atlas A3 servers now. If you encounter any problems when using this "
"feature, you can disable it by setting `export "
"VLLM_ASCEND_ENABLE_FUSED_MC2=0` and update issues in vLLM-Ascend "
"community."
msgstr "设置 `export VLLM_ASCEND_ENABLE_FUSED_MC2=1` 可启用 MoE 融合算子,以减少预填充和解码阶段 MoE 的时间消耗。这是一个实验性功能,目前仅支持 Atlas A3 服务器上的 W8A8 量化。如果您在使用此功能时遇到任何问题,可以通过设置 `export VLLM_ASCEND_ENABLE_FUSED_MC2=0` 来禁用它,并在 vLLM-Ascend 社区更新问题。"
"operators that reduce time consumption of MoE in decode. This is an "
"experimental feature which only supports W8A8 quantization on Atlas A3 "
"servers now. If you encounter any problems when using this feature, you "
"can disable it by setting `export VLLM_ASCEND_ENABLE_FUSED_MC2=0` and "
"update issues in vLLM-Ascend community. **Note** that this environment "
"variable can only be enabled on decode nodes."
msgstr ""
"设置 `export VLLM_ASCEND_ENABLE_FUSED_MC2=1` 可启用 MoE 融合算子,以减少解码阶段 MoE "
"的时间消耗。这是一个实验性功能,目前仅支持 Atlas A3 服务器上的 W8A8 量化。如果您在使用此功能时遇到任何问题,可以通过设置 "
"`export VLLM_ASCEND_ENABLE_FUSED_MC2=0` 来禁用它,并在 vLLM-Ascend 社区更新问题。**注意**,此环境变量只能在解码节点上启用。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:393
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:392
msgid ""
"Here we disable prefix cache because of random datasets. You can enable "
"prefix cache if requests have long common prefix."
msgstr "由于使用随机数据集,此处我们禁用了前缀缓存。如果请求具有较长的公共前缀,您可以启用前缀缓存。"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:395
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:394
msgid "Three Node A3 -- PD disaggregation"
msgstr "三节点 A3 -- PD 分离部署"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:397
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:396
msgid ""
"On three Atlas 800 A3(64G*16) server, we recommend to use one node as one"
" prefill instance and two nodes as one decode instance. Example server "
"scripts: Prefill Node 1"
msgstr "在三台 Atlas 800 A3(64G*16) 服务器上,我们建议使用一个节点作为一个预填充实例,两个节点作为一个解码实例。服务器脚本示例:预填充节点 1"
msgstr ""
"在三台 Atlas 800 A3(64G*16) "
"服务器上,我们建议使用一个节点作为一个预填充实例,两个节点作为一个解码实例。服务器脚本示例:预填充节点 1"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:462
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:460
msgid "Decode Node 1"
msgstr "解码节点 1"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:526
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:524
msgid "Decode Node 2"
msgstr "解码节点 2"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:591
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:589
msgid "PD proxy:"
msgstr "PD 代理:"

@@ -657,9 +716,13 @@ msgstr "52.07"
msgid "8593.44"
msgstr "8593.44"

#: ../../source/tutorials/models/Qwen3-235B-A22B.md:621
#: ../../source/tutorials/models/Qwen3-235B-A22B.md:619
msgid ""
"We recommend to set `export VLLM_ASCEND_ENABLE_FUSED_MC2=2` on this "
"scenario (typically EP32 for Qwen3-235B). This enables a different MoE "
"fusion operator."
msgstr "在此场景下(通常 Qwen3-235B 使用 EP32),我们建议设置 `export VLLM_ASCEND_ENABLE_FUSED_MC2=2`。这将启用一个不同的 MoE 融合算子。"
"fusion operator. **Note** that this environment variable can only be "
"enabled on decode nodes."
msgstr ""
"在此场景下(通常 Qwen3-235B 使用 EP32),我们建议设置 `export "
"VLLM_ASCEND_ENABLE_FUSED_MC2=2`。这将启用一个不同的 MoE 融合算子。"
"**注意**:此环境变量只能在解码节点上启用。"

@@ -8,7 +8,7 @@ msgid ""
|
||||
msgstr ""
|
||||
"Project-Id-Version: vllm-ascend \n"
|
||||
"Report-Msgid-Bugs-To: \n"
|
||||
"POT-Creation-Date: 2026-04-15 09:41+0000\n"
|
||||
"POT-Creation-Date: 2026-04-22 08:13+0000\n"
|
||||
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
|
||||
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
|
||||
"Language: zh_CN\n"
|
||||
@@ -29,17 +29,15 @@ msgstr "简介"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:5
|
||||
msgid ""
|
||||
"Qwen3-Omni is the natively end-to-end multilingual omni-modal foundation "
|
||||
"models. It processes text, images, audio, and video, and delivers real-"
|
||||
"Qwen3-Omni is a native end-to-end multilingual omni-modal foundation "
|
||||
"model. It processes text, images, audio, and video, and delivers real-"
|
||||
"time streaming responses in both text and natural speech. We introduce "
|
||||
"several architectural upgrades to improve performance and efficiency. The"
|
||||
" Thinking model of Qwen3-Omni-30B-A3B, containing the thinker component, "
|
||||
"equipped with chain-of-thought reasoning, supporting audio, video, and "
|
||||
"text input, with text output."
|
||||
" Thinking model of Qwen3-Omni-30B-A3B, which contains the thinker "
|
||||
"component, is equipped with chain-of-thought reasoning and supports "
|
||||
"audio, video, and text input, with text output."
|
||||
msgstr ""
|
||||
"Qwen3-Omni "
|
||||
"是原生端到端多语言全模态基础模型。它能处理文本、图像、音频和视频,并以文本和自然语音形式提供实时流式响应。我们引入了多项架构升级以提升性能和效率。Qwen3"
|
||||
"-Omni-30B-A3B 的 Thinking 模型包含思考器组件,具备思维链推理能力,支持音频、视频和文本输入,输出为文本。"
|
||||
"Qwen3-Omni 是原生端到端多语言全模态基础模型。它能处理文本、图像、音频和视频,并以文本和自然语音形式提供实时流式响应。我们引入了多项架构升级以提升性能和效率。Qwen3-Omni-30B-A3B 的 Thinking 模型包含思考器组件,具备思维链推理能力,支持音频、视频和文本输入,输出为文本。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:7
|
||||
msgid ""
|
||||
@@ -54,21 +52,19 @@ msgstr "支持的功能"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:11
|
||||
msgid ""
|
||||
"Refer to [supported features](https://docs.vllm.ai/projects/ascend/zh-"
|
||||
"cn/latest/user_guide/support_matrix/supported_models.html) to get the "
|
||||
"Refer to [supported features](https://docs.vllm.ai/projects/ascend/zh-"
|
||||
"cn/latest/user_guide/support_matrix/supported_models.html) to get the "
|
||||
"model's supported feature matrix."
|
||||
msgstr ""
|
||||
"请参考 [支持的功能](https://docs.vllm.ai/projects/ascend/zh-"
|
||||
"cn/latest/user_guide/support_matrix/supported_models.html) 以获取模型支持的功能矩阵。"
|
||||
"请参考 [支持的功能](https://docs.vllm.ai/projects/ascend/zh-cn/latest/user_guide/support_matrix/supported_models.html) 以获取模型支持的功能矩阵。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:13
|
||||
msgid ""
|
||||
"Refer to [feature guide](https://docs.vllm.ai/projects/ascend/zh-"
|
||||
"cn/latest/user_guide/feature_guide/index.html) to get the feature's "
|
||||
"Refer to [feature guide](https://docs.vllm.ai/projects/ascend/zh-"
|
||||
"cn/latest/user_guide/feature_guide/index.html) to get the feature's "
|
||||
"configuration."
|
||||
msgstr ""
|
||||
"请参考 [功能指南](https://docs.vllm.ai/projects/ascend/zh-"
|
||||
"cn/latest/user_guide/feature_guide/index.html) 以获取功能的配置信息。"
|
||||
"请参考 [功能指南](https://docs.vllm.ai/projects/ascend/zh-cn/latest/user_guide/feature_guide/index.html) 以获取功能的配置信息。"
|
||||
|
||||
#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:15
|
||||
msgid "Environment Preparation"
|
||||
@@ -83,17 +79,15 @@ msgid ""
|
||||
"`Qwen3-Omni-30B-A3B-Thinking` requires 2 NPU Cards (64G × 2).[Download "
|
||||
"model weight](https://modelscope.cn/models/Qwen/Qwen3-Omni-30B-A3B-"
|
||||
"Thinking) It is recommended to download the model weight to the shared "
|
||||
"directory of multiple nodes, such as `/root/.cache/`"
|
||||
"directory of multiple nodes, such as `/root/.cache/`"
|
||||
msgstr ""
|
||||
"`Qwen3-Omni-30B-A3B-Thinking` 需要 2 张 NPU 卡 (64G × "
|
||||
"2)。[下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-Omni-30B-A3B-"
|
||||
"Thinking)。建议将模型权重下载到多节点的共享目录,例如 `/root/.cache/`。"
|
||||
"`Qwen3-Omni-30B-A3B-Thinking` 需要 2 张 NPU 卡 (64G × 2)。[下载模型权重](https://modelscope.cn/models/Qwen/Qwen3-Omni-30B-A3B-Thinking)。建议将模型权重下载到多节点的共享目录,例如 `/root/.cache/`。"
|
||||
|
||||
 #: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:22
 msgid "Installation"
 msgstr "安装"

 #: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:24
 #: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md
 msgid "Use docker image"
 msgstr "使用 Docker 镜像"

@@ -109,10 +103,9 @@ msgid ""
 "your node, refer to [using docker](../../installation.md#set-up-using-"
 "docker)."
 msgstr ""
-"根据您的机器类型选择镜像并在节点上启动 Docker 镜像,请参考 [使用 Docker](../../installation.md#set-"
-"up-using-docker)。"
+"根据您的机器类型选择镜像并在节点上启动 Docker 镜像,请参考 [使用 Docker](../../installation.md#set-up-using-docker)。"

 #: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:32
 #: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md
 msgid "Build from source"
 msgstr "从源码构建"

@@ -125,8 +118,7 @@ msgid ""
 "Install `vllm-ascend`, refer to [set up using "
 "python](../../installation.md#set-up-using-python)."
 msgstr ""
-"安装 `vllm-ascend`,请参考 [使用 Python 设置](../../installation.md#set-up-using-"
-"python)。"
+"安装 `vllm-ascend`,请参考 [使用 Python 设置](../../installation.md#set-up-using-python)。"

 #: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:71
 msgid "Please install system dependencies"
@@ -159,8 +151,7 @@ msgid ""
 " least 1, and for 32 GB of memory, tensor-parallel-size should be at "
 "least 2."
 msgstr ""
-"运行以下脚本在多 NPU 上启动 vLLM 服务器:对于具有 64 GB NPU 卡内存的 Atlas A2,tensor-parallel-"
-"size 应至少为 1;对于 32 GB 内存,tensor-parallel-size 应至少为 2。"
+"运行以下脚本在多 NPU 上启动 vLLM 服务器:对于具有 64 GB NPU 卡内存的 Atlas A2,tensor-parallel-size 应至少为 1;对于 32 GB 内存,tensor-parallel-size 应至少为 2。"

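The sizing rule in the entry above boils down to a launch line like the following sketch (the model path and `--max-model-len` value are illustrative assumptions, not taken from the tutorial's actual script):

```bash
# Sketch: serve Qwen3-Omni-30B-A3B-Thinking on 2 x 64 GB NPU cards.
# --tensor-parallel-size 2 follows the "at least 1 per 64 GB card, at least
# 2 per 32 GB card" rule quoted above; --max-model-len is an assumption.
vllm serve /root/.cache/Qwen3-Omni-30B-A3B-Thinking \
  --tensor-parallel-size 2 \
  --max-model-len 32768
```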
 #: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:188
 msgid "Functional Verification"
@@ -188,8 +179,7 @@ msgid ""
 "dataset, and run accuracy evaluation of `Qwen3-Omni-30B-A3B-Thinking` in "
 "online mode."
 msgstr ""
-"以 `gsm8k`、`omnibench`、`bbh` 数据集作为测试数据集为例,在在线模式下运行 `Qwen3-Omni-30B-A3B-"
-"Thinking` 的精度评估。"
+"以 `gsm8k`、`omnibench`、`bbh` 数据集作为测试数据集为例,在在线模式下运行 `Qwen3-Omni-30B-A3B-Thinking` 的精度评估。"

 #: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:239
 msgid ""
@@ -197,21 +187,19 @@
 "evalscope(<https://docs.vllm.ai/projects/ascend/en/latest/developer_guide/evaluation/using_evalscope.html"
 "#install-evalscope-using-pip>) for `evalscope`installation."
 msgstr ""
-"关于 `evalscope` 的安装,请参考使用 evalscope "
-"(<https://docs.vllm.ai/projects/ascend/en/latest/developer_guide/evaluation/using_evalscope.html"
-"#install-evalscope-using-pip>)。"
+"关于 `evalscope` 的安装,请参考使用 evalscope (<https://docs.vllm.ai/projects/ascend/en/latest/developer_guide/evaluation/using_evalscope.html#install-evalscope-using-pip>)。"

 #: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:240
 msgid "Run `evalscope` to execute the accuracy evaluation."
 msgstr "运行 `evalscope` 以执行精度评估。"

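The two entries above describe installing `evalscope` via pip and running it against the online server. A hedged sketch of that flow follows; the `evalscope eval` flag names are assumptions based on typical usage, so verify them against the linked guide:

```bash
# Sketch only: flag names are assumptions; check `evalscope eval --help`
# and the linked installation guide before running.
pip install evalscope
evalscope eval \
  --model Qwen3-Omni-30B-A3B-Thinking \
  --api-url http://localhost:8000/v1 \
  --eval-type service \
  --datasets gsm8k
```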
 #: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:255
 #: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:296
 msgid ""
 "After execution, you can get the result, here is the result of `Qwen3"
 "-Omni-30B-A3B-Thinking` in vllm-ascend:0.13.0rc1 for reference only."
 msgstr ""
-"执行后,您可以获得结果。以下是 `Qwen3-Omni-30B-A3B-Thinking` 在 vllm-ascend:0.13.0rc1 "
-"中的结果,仅供参考。"
+"执行后,您可以获得结果。以下是 `Qwen3-Omni-30B-A3B-Thinking` 在 vllm-ascend:0.13.0rc1 中的结果,仅供参考。"

 #: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:269
 msgid "Performance"
@@ -228,8 +216,7 @@ msgid ""
 "benchmark](https://docs.vllm.ai/en/latest/benchmarking/) for more "
 "details."
 msgstr ""
-"以运行 `Qwen3-Omni-30B-A3B-Thinking` 的性能评估为例。更多详情请参考 vllm 基准测试。更多详情请参考 [vllm"
-" 基准测试](https://docs.vllm.ai/en/latest/benchmarking/)。"
+"以运行 `Qwen3-Omni-30B-A3B-Thinking` 的性能评估为例。更多详情请参考 [vllm 基准测试](https://docs.vllm.ai/en/latest/benchmarking/)。"

 #: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:277
 msgid "There are three `vllm bench` subcommands:"
@@ -249,12 +236,4 @@ msgstr "`throughput`:对离线推理吞吐量进行基准测试。"

 #: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:283
 msgid "Take the `serve` as an example. Run the code as follows."
 msgstr "以 `serve` 为例。按如下方式运行代码。"
-
-#: ../../source/tutorials/models/Qwen3-Omni-30B-A3B-Thinking.md:296
-msgid ""
-"After execution, you can get the result, here is the result of `Qwen3"
-"-Omni-30B-A3B-Thinking` in vllm-ascend:0.13.0rc1 for reference only."
-msgstr ""
-"执行后,您可以获得结果。以下是 `Qwen3-Omni-30B-A3B-Thinking` 在 vllm-ascend:0.13.0rc1 "
-"中的结果,仅供参考。"
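For the `vllm bench serve` example the Qwen3-Omni entries point at, a minimal sketch (dataset choice, token lengths, and request count are illustrative assumptions):

```bash
# Sketch: benchmark online serving throughput against the running server.
# Dataset name, lengths, and prompt count below are assumptions.
vllm bench serve \
  --model /root/.cache/Qwen3-Omni-30B-A3B-Thinking \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 512 \
  --num-prompts 200
```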
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3.5-397B-A17B.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3.5-397B-A17B.po
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: vllm-ascend \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-04-15 09:41+0000\n"
+"POT-Creation-Date: 2026-04-22 08:13+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -118,7 +118,7 @@ msgstr ""
 msgid "Installation"
 msgstr "安装"

 #: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:34
 #: ../../source/tutorials/models/Qwen3.5-397B-A17B.md
 msgid "Use docker image"
 msgstr "使用 Docker 镜像"

@@ -140,7 +140,7 @@ msgstr ""
 "根据您的机器类型选择镜像并在节点上启动 Docker 镜像,请参考[使用 Docker](../../installation.md#set-"
 "up-using-docker)。"

 #: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:76
 #: ../../source/tutorials/models/Qwen3.5-397B-A17B.md
 msgid "Build from source"
 msgstr "从源码构建"

@@ -185,15 +185,15 @@ msgid ""
 "A3(64G*16)."
 msgstr "在 1 个 Atlas 800 A3(64G*16) 上运行以下脚本以执行在线 128k 推理。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:133
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:132
 msgid "**Notice:**"
 msgstr "**注意:**"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:135
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:134
 msgid "The parameters are explained as follows:"
 msgstr "参数解释如下:"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:137
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:136
 msgid ""
 "`--data-parallel-size` 1 and `--tensor-parallel-size` 16 are common "
 "settings for data parallelism (DP) and tensor parallelism (TP) sizes."
@@ -201,13 +201,13 @@ msgstr ""
 "`--data-parallel-size` 1 和 `--tensor-parallel-size` 16 是数据并行 (DP) 和张量并行 "
 "(TP) 大小的常见设置。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:138
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:137
 msgid ""
 "`--max-model-len` represents the context length, which is the maximum "
 "value of the input plus output for a single request."
 msgstr "`--max-model-len` 表示上下文长度,即单个请求的输入加输出的最大值。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:139
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:138
 msgid ""
 "`--max-num-seqs` indicates the maximum number of requests that each DP "
 "group is allowed to process. If the number of requests sent to the "
@@ -222,7 +222,7 @@ msgstr ""
 " 和 TPOT 等指标。因此,在测试性能时,通常建议 `--max-num-seqs` * `--data-parallel-size` >= "
 "实际总并发数。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:140
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:139
 msgid ""
 "`--max-num-batched-tokens` represents the maximum number of tokens that "
 "the model can process in a single step. Currently, vLLM v1 scheduling "
@@ -231,7 +231,7 @@ msgstr ""
 "`--max-num-batched-tokens` 表示模型单步可以处理的最大 token 数。目前,vLLM v1 调度默认启用 "
 "ChunkPrefill/SplitFuse,这意味着:"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:141
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:140
 msgid ""
 "(1) If the input length of a request is greater than `--max-num-batched-"
 "tokens`, it will be divided into multiple rounds of computation according"
@@ -240,20 +240,20 @@ msgstr ""
 "(1) 如果请求的输入长度大于 `--max-num-batched-tokens`,它将根据 `--max-num-batched-"
 "tokens` 被分成多轮计算;"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:142
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:141
 msgid ""
 "(2) Decode requests are prioritized for scheduling, and prefill requests "
 "are scheduled only if there is available capacity."
 msgstr "(2) 解码请求优先调度,只有在有可用容量时才调度预填充请求。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:143
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:142
 msgid ""
 "Generally, if `--max-num-batched-tokens` is set to a larger value, the "
 "overall latency will be lower, but the pressure on GPU memory (activation"
 " value usage) will be greater."
 msgstr "通常,如果 `--max-num-batched-tokens` 设置得较大,整体延迟会更低,但 GPU 内存(激活值使用)的压力会更大。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:144
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:143
 msgid ""
 "`--gpu-memory-utilization` represents the proportion of HBM that vLLM "
 "will use for actual inference. Its essential function is to calculate the"
@@ -275,7 +275,7 @@ msgstr ""
 "就越多。然而,由于预热阶段的 GPU 内存使用量可能与实际推理时不同(例如,由于 EP 负载不均),将 `--gpu-memory-"
 "utilization` 设置得过高可能导致实际推理时出现 OOM(内存不足)问题。默认值为 `0.9`。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:145
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:144
 msgid ""
 "`--enable-expert-parallel` indicates that EP is enabled. Note that vLLM "
 "does not support a mixed approach of ETP and EP; that is, MoE can either "
@@ -284,7 +284,7 @@ msgstr ""
 "`--enable-expert-parallel` 表示启用了 EP。请注意,vLLM 不支持 ETP 和 EP 的混合方法;也就是说,MoE "
 "要么使用纯 EP,要么使用纯 TP。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:146
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:145
 msgid ""
 "`--no-enable-prefix-caching` indicates that prefix caching is disabled. "
 "To enable it, for mamba-like models Qwen3.5, set `--enable-prefix-"
@@ -298,13 +298,13 @@ msgstr ""
 "的实现可能在调度时导致非常大的 block_size。例如,block_size 可能被调整为 2048,这意味着任何短于 2048 "
 "的前缀将永远不会被缓存。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:147
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:146
 msgid ""
 "`--quantization` \"ascend\" indicates that quantization is used. To "
 "disable quantization, remove this option."
 msgstr "`--quantization` \"ascend\" 表示使用了量化。要禁用量化,请移除此选项。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:148
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:147
 msgid ""
 "`--compilation-config` contains configurations related to the aclgraph "
 "graph mode. The most significant configurations are \"cudagraph_mode\" "
@@ -319,7 +319,7 @@ msgstr ""
 "\"PIECEWISE\" 和 \"FULL_DECODE_ONLY\"。图模式主要用于降低算子调度的开销。目前推荐使用 "
 "\"FULL_DECODE_ONLY\"。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:150
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:149
 msgid ""
 "\"cudagraph_capture_sizes\": represents different levels of graph modes. "
 "The default value is [1, 2, 4, 8, 16, 24, 32, 40,..., `--max-num-seqs`]. "
@@ -332,123 +332,132 @@ msgstr ""
 "40,..., `--max-num-"
 "seqs`]。在图模式下,不同级别图的输入是固定的,级别之间的输入会自动填充到下一个级别。目前推荐使用默认设置。只有在某些场景下,才需要单独设置此参数以达到最佳性能。"

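The parameter entries above read as a checklist for the tutorial's launch script. A sketch assembling exactly the flags they explain (the model path and all numeric values are illustrative assumptions, not the tutorial's actual settings):

```bash
# Sketch: single-node launch using the flags explained in the entries above;
# the path and numbers are assumptions, tune them per that guidance.
vllm serve /root/.cache/Qwen3.5-397B-A17B-w8a8 \
  --data-parallel-size 1 \
  --tensor-parallel-size 16 \
  --max-model-len 131072 \
  --max-num-seqs 32 \
  --max-num-batched-tokens 8192 \
  --gpu-memory-utilization 0.9 \
  --enable-expert-parallel \
  --no-enable-prefix-caching \
  --quantization ascend \
  --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}'
```

Note the rule from the `--max-num-seqs` entry: with these values, `--max-num-seqs` 32 times `--data-parallel-size` 1 caps in-flight requests at 32, so keep test concurrency at or below that product when benchmarking.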
-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:152
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:151
 msgid "Multi-node Deployment with MP (Recommended)"
 msgstr "使用 MP 的多节点部署(推荐)"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:154
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:153
 msgid ""
 "Assume you have 2 Atlas 800 A2 nodes, and want to deploy the `Qwen3.5"
 "-397B-A17B-w8a8-mtp` model across multiple nodes."
 msgstr "假设您有 2 个 Atlas 800 A2 节点,并希望跨多个节点部署 `Qwen3.5-397B-A17B-w8a8-mtp` 模型。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:156
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:155
 msgid "Node 0"
 msgstr "节点 0"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:202
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:201
 msgid "Node1"
 msgstr "节点 1"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:252
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:251
 msgid ""
 "If the service starts successfully, the following information will be "
 "displayed on node 0:"
 msgstr "如果服务启动成功,节点 0 上将显示以下信息:"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:263
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:262
 msgid "Multi-node Deployment with Ray"
 msgstr "使用 Ray 的多节点部署"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:265
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:264
 msgid "refer to [Ray Distributed (Qwen/Qwen3-235B-A22B)](../features/ray.md)."
 msgstr "请参考 [Ray 分布式 (Qwen/Qwen3-235B-A22B)](../features/ray.md)。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:267
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:266
 msgid "Prefill-Decode Disaggregation"
 msgstr "预填充-解码解耦"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:269
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:268
 msgid ""
 "We recommend using Mooncake for deployment: "
 "[Mooncake](../features/pd_disaggregation_mooncake_multi_node.md)."
-msgstr "我们推荐使用 Mooncake 进行部署:[Mooncake](../features/pd_disaggregation_mooncake_multi_node.md)。"
+msgstr ""
+"我们推荐使用 Mooncake "
+"进行部署:[Mooncake](../features/pd_disaggregation_mooncake_multi_node.md)。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:271
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:270
 msgid ""
 "Take Atlas 800 A3 (64G × 16) for example, we recommend to deploy 1P1D (3 "
 "nodes) to run Qwen3.5-397B-A17B."
 msgstr "以 Atlas 800 A3 (64G × 16) 为例,我们建议部署 1P1D(3 个节点)来运行 Qwen3.5-397B-A17B。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:273
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:272
 msgid "`Qwen3.5-397B-A17B-w8a8-mtp 1P1D` require 3 Atlas 800 A3 (64G × 16)."
 msgstr "`Qwen3.5-397B-A17B-w8a8-mtp 1P1D` 需要 3 个 Atlas 800 A3 (64G × 16)。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:275
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:274
 msgid ""
 "To run the vllm-ascend `Prefill-Decode Disaggregation` service, you need "
 "to deploy `run_p.sh` 、`run_d0.sh` and `run_d1.sh` script on each node and"
 " deploy a `proxy.sh` script on prefill master node to forward requests."
-msgstr "要运行 vllm-ascend `Prefill-Decode Disaggregation` 服务,您需要在每个节点上部署 `run_p.sh`、`run_d0.sh` 和 `run_d1.sh` 脚本,并在预填充主节点上部署一个 `proxy.sh` 脚本来转发请求。"
+msgstr ""
+"要运行 vllm-ascend `Prefill-Decode Disaggregation` 服务,您需要在每个节点上部署 "
+"`run_p.sh`、`run_d0.sh` 和 `run_d1.sh` 脚本,并在预填充主节点上部署一个 `proxy.sh` 脚本来转发请求。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:277
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:276
 msgid "Prefill Node 0 `run_p.sh` script"
 msgstr "预填充节点 0 `run_p.sh` 脚本"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:352
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:350
 msgid "Decode Node 0 `run_d0.sh` script"
 msgstr "解码节点 0 `run_d0.sh` 脚本"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:432
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:430
 msgid "Decode Node 1 `run_d1.sh` script"
 msgstr "解码节点 1 `run_d1.sh` 脚本"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:519
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:517
 msgid "Run the `proxy.sh` script on the prefill master node"
 msgstr "在预填充主节点上运行 `proxy.sh` 脚本"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:521
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:519
 msgid ""
 "Run a proxy server on the same node with the prefiller service instance. "
 "You can get the proxy program in the repository's examples: "
 "[load\_balance\_proxy\_server\_example.py](https://github.com/vllm-"
 "project/vllm-"
 "ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
-msgstr "在与预填充服务实例相同的节点上运行一个代理服务器。您可以在仓库的示例中找到代理程序:[load\_balance\_proxy\_server\_example.py](https://github.com/vllm-project/vllm-ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"
+msgstr ""
+"在与预填充服务实例相同的节点上运行一个代理服务器。您可以在仓库的示例中找到代理程序:[load\_balance\_proxy\_server\_example.py](https://github.com"
+"/vllm-project/vllm-"
+"ascend/blob/main/examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py)"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:547
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:545
 msgid "Functional Verification"
 msgstr "功能验证"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:549
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:547
 msgid "Once your server is started, you can query the model with input prompts:"
 msgstr "服务器启动后,您可以使用输入提示词查询模型:"

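For the "query the model with input prompts" step above, a minimal sketch against vLLM's OpenAI-compatible endpoint (host, port, and served model name are assumptions):

```bash
# Sketch: host/port and the served model name are illustrative assumptions.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3.5-397B-A17B-w8a8-mtp",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 64
      }'
```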
-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:562
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:560
 msgid "Accuracy Evaluation"
 msgstr "精度评估"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:564
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:562
 msgid "Here are two accuracy evaluation methods."
 msgstr "以下是两种精度评估方法。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:566
-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:578
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:564
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:576
 msgid "Using AISBench"
 msgstr "使用 AISBench"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:568
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:566
 msgid ""
 "Refer to [Using "
 "AISBench](../../developer_guide/evaluation/using_ais_bench.md) for "
 "details."
 msgstr "详情请参阅[使用 AISBench](../../developer_guide/evaluation/using_ais_bench.md)。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:570
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:568
 msgid ""
 "After execution, you can get the result, here is the result of `Qwen3.5"
 "-397B-A17B-w8a8` in `vllm-ascend:v0.17.0rc1` for reference only."
-msgstr "执行后,您可以获得结果,以下是 `vllm-ascend:v0.17.0rc1` 中 `Qwen3.5-397B-A17B-w8a8` 的结果,仅供参考。"
+msgstr ""
+"执行后,您可以获得结果,以下是 `vllm-ascend:v0.17.0rc1` 中 `Qwen3.5-397B-A17B-w8a8` "
+"的结果,仅供参考。"

 #: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:76
 msgid "dataset"
@@ -490,54 +499,74 @@ msgstr "生成"
 msgid "96.74"
 msgstr "96.74"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:576
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:574
 msgid "Performance"
 msgstr "性能"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:580
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:578
 msgid ""
 "Refer to [Using AISBench for performance "
 "evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-"
 "performance-evaluation) for details."
-msgstr "详情请参阅[使用 AISBench 进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation)。"
+msgstr ""
+"详情请参阅[使用 AISBench "
+"进行性能评估](../../developer_guide/evaluation/using_ais_bench.md#execute-"
+"performance-evaluation)。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:582
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:580
 msgid "Using vLLM Benchmark"
 msgstr "使用 vLLM Benchmark"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:584
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:582
 msgid "Run performance evaluation of `Qwen3.5-397B-A17B-w8a8` as an example."
 msgstr "以运行 `Qwen3.5-397B-A17B-w8a8` 的性能评估为例。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:586
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:584
 msgid ""
 "Refer to [vllm "
 "benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html) "
 "for more details."
-msgstr "更多详情请参阅 [vllm benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html)。"
+msgstr ""
+"更多详情请参阅 [vllm "
+"benchmark](https://docs.vllm.ai/en/latest/contributing/benchmarks.html)。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:588
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:586
 msgid "There are three `vllm bench` subcommands:"
 msgstr "`vllm bench` 有三个子命令:"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:590
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:588
 msgid "`latency`: Benchmark the latency of a single batch of requests."
 msgstr "`latency`:对单批请求的延迟进行基准测试。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:591
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:589
 msgid "`serve`: Benchmark the online serving throughput."
 msgstr "`serve`:对在线服务吞吐量进行基准测试。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:592
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:590
 msgid "`throughput`: Benchmark offline inference throughput."
 msgstr "`throughput`:对离线推理吞吐量进行基准测试。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:594
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:592
 msgid "Take the `serve` as an example. Run the code as follows."
 msgstr "以 `serve` 为例。运行代码如下。"

-#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:601
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:599
 msgid ""
 "After about several minutes, you can get the performance evaluation "
 "result."
 msgstr "大约几分钟后,您将获得性能评估结果。"

+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:601
+msgid "Qwen3.5-397B-A17B Known issues"
+msgstr "Qwen3.5-397B-A17B 已知问题"
+
+#: ../../source/tutorials/models/Qwen3.5-397B-A17B.md:603
+msgid ""
+"Issue1: For single-node deployment scenario, when fused_mc2 is enabled, "
+"using multi-DP model deployment may cause garbled or empty outputs after "
+"the model triggers recomputation.When tuning performance by adjusting "
+"model parallelism, ensure that this fused operator is disabled when DP > "
+"1. For PD deployment scenario,D nodes can avoid this problem by enabling "
+"the recompute scheduler."
+msgstr ""
+"问题1:在单节点部署场景下,当启用 fused_mc2 时,使用多 DP 模型部署可能会导致模型触发重计算后输出乱码或为空。在通过调整模型并行度来调优性能时,请确保当 DP > 1 时禁用此融合算子。对于 PD 部署场景,D 节点可以通过启用重计算调度器来避免此问题。"

--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3_embedding.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/models/Qwen3_embedding.po
@@ -8,7 +8,7 @@ msgid ""
 msgstr ""
 "Project-Id-Version: vllm-ascend \n"
 "Report-Msgid-Bugs-To: \n"
-"POT-Creation-Date: 2026-04-14 09:08+0000\n"
+"POT-Creation-Date: 2026-04-22 08:13+0000\n"
 "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
 "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
 "Language: zh_CN\n"
@@ -37,7 +37,9 @@ msgid ""
 "model with vLLM Ascend. Note that only 0.9.2rc1 and higher versions of "
 "vLLM Ascend support the model."
 msgstr ""
-"Qwen3 Embedding 模型系列是 Qwen 家族最新的专有模型,专为文本嵌入和排序任务设计。它基于 Qwen3 系列的稠密基础模型,提供了多种尺寸(0.6B、4B 和 8B)的全面文本嵌入和重排序模型。本指南描述了如何使用 vLLM Ascend 运行该模型。请注意,只有 vLLM Ascend 0.9.2rc1 及更高版本支持此模型。"
+"Qwen3 Embedding 模型系列是 Qwen 家族最新的专有模型,专为文本嵌入和排序任务设计。它基于 Qwen3 "
+"系列的稠密基础模型,提供了多种尺寸(0.6B、4B 和 8B)的全面文本嵌入和重排序模型。本指南描述了如何使用 vLLM Ascend "
+"运行该模型。请注意,只有 vLLM Ascend 0.9.2rc1 及更高版本支持此模型。"

 #: ../../source/tutorials/models/Qwen3_embedding.md:7
 msgid "Supported Features"
@@ -62,19 +64,25 @@ msgstr "模型权重"
 msgid ""
 "`Qwen3-Embedding-8B` [Download model "
 "weight](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-8B)"
-msgstr "`Qwen3-Embedding-8B` [下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-8B)"
+msgstr ""
+"`Qwen3-Embedding-8B` [下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3"
+"-Embedding-8B)"

 #: ../../source/tutorials/models/Qwen3_embedding.md:16
 msgid ""
 "`Qwen3-Embedding-4B` [Download model "
 "weight](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-4B)"
-msgstr "`Qwen3-Embedding-4B` [下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-4B)"
+msgstr ""
+"`Qwen3-Embedding-4B` [下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3"
+"-Embedding-4B)"

 #: ../../source/tutorials/models/Qwen3_embedding.md:17
 msgid ""
 "`Qwen3-Embedding-0.6B` [Download model "
 "weight](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-0.6B)"
-msgstr "`Qwen3-Embedding-0.6B` [下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-0.6B)"
+msgstr ""
+"`Qwen3-Embedding-0.6B` "
+"[下载模型权重](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-0.6B)"

 #: ../../source/tutorials/models/Qwen3_embedding.md:19
 msgid ""
@@ -96,7 +104,9 @@ msgstr "您可以使用我们的官方 docker 镜像来运行 `Qwen3-Embedding`"
 msgid ""
 "Start the docker image on your node, refer to [using "
 "docker](../../installation.md#set-up-using-docker)."
-msgstr "在您的节点上启动 docker 镜像,请参考[使用 docker](../../installation.md#set-up-using-docker)。"
+msgstr ""
+"在您的节点上启动 docker 镜像,请参考[使用 docker](../../installation.md#set-up-using-"
+"docker)。"

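Once the container is up and a `Qwen3-Embedding` model is served, the standard OpenAI-compatible embeddings endpoint applies; a hedged sketch (host, port, and served model name are assumptions):

```bash
# Sketch: request an embedding from the running server; host/port and
# model name are illustrative assumptions.
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3-Embedding-8B",
        "input": "vLLM Ascend supports Qwen3 Embedding models."
      }'
```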
 #: ../../source/tutorials/models/Qwen3_embedding.md:27
 msgid ""
@@ -142,10 +152,12 @@ msgstr "性能"

 #: ../../source/tutorials/models/Qwen3_embedding.md:98
 msgid ""
-"Run performance of `Qwen3-Reranker-8B` as an example. Refer to [vllm "
+"Run performance of `Qwen3-Embedding-8B` as an example. Refer to [vllm "
 "benchmark](https://docs.vllm.ai/en/latest/contributing/) for more "
 "details."
-msgstr "以 `Qwen3-Reranker-8B` 的运行性能为例。更多详情请参考 [vllm 基准测试](https://docs.vllm.ai/en/latest/contributing/)。"
+msgstr ""
+"以 `Qwen3-Embedding-8B` 的运行性能为例。更多详情请参考 [vllm "
+"基准测试](https://docs.vllm.ai/en/latest/contributing/)。"

 #: ../../source/tutorials/models/Qwen3_embedding.md:101
 msgid "Take the `serve` as an example. Run the code as follows."