Commit the vllm 0.11.0 development branch

2025-12-10 17:51:24 +08:00
parent deab7dd0b6
commit 7c22d621fb
175 changed files with 31856 additions and 8683 deletions
--- a/.DS_Store
+++ b/.DS_Store
--- a/.gitignore
+++ b/.gitignore
@@ -50,4 +50,5 @@ coverage.xml
 *.mo

 # Sphinx documentation
-docs/_build
+/docs/_build/
+
--- a/.readthedocs.yaml
+++ b/.readthedocs.yaml
@@ -1,16 +0,0 @@
-version: 2
-
-build:
-  os: ubuntu-22.04
-  tools:
-    python: "3.12"
-
-sphinx:
-  configuration: docs/source/conf.py
-  fail_on_warning: false 
-
-formats: []
-
-python:
-  install:
-    - requirements: docs/requirements-docs.txt
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,20 +1,20 @@
 Changelog
-===# Change Chinese to English comments
-The following records all changes worth noting in the project, formatted based on [Keep a Changelog].
+===
+以下记录了项目中所有值得关注的变更内容，其格式基于[Keep a Changelog]。

-This project version follows [Semantic Versioning] and [PEP-440].
+本项目版本遵守[Semantic Versioning]和[PEP-440]。

 [Unreleased]
 ---
 ### Added
- This records new content added
+- 这里记录新添加的内容
 ### Changed
- This records changed content
+- 这里记录变更的内容

 0.1.0 - 2025-08-12
 ---
 ### Added
- Create project
+- 创建项目


 [Unreleased]: http://icode.baidu.com/repos/baidu/hac-aiacc/vllm-kunlun/merge/0.1.0...master
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -1,201 +0,0 @@
-                                 Apache License
-                           Version 2.0, January 2004
-                        http://www.apache.org/licenses/
-
-   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
-
-   1. Definitions.
-
-      "License" shall mean the terms and conditions for use, reproduction,
-      and distribution as defined by Sections 1 through 9 of this document.
-
-      "Licensor" shall mean the copyright owner or entity authorized by
-      the copyright owner that is granting the License.
-
-      "Legal Entity" shall mean the union of the acting entity and all
-      other entities that control, are controlled by, or are under common
-      control with that entity. For the purposes of this definition,
-      "control" means (i) the power, direct or indirect, to cause the
-      direction or management of such entity, whether by contract or
-      otherwise, or (ii) ownership of fifty percent (50%) or more of the
-      outstanding shares, or (iii) beneficial ownership of such entity.
-
-      "You" (or "Your") shall mean an individual or Legal Entity
-      exercising permissions granted by this License.
-
-      "Source" form shall mean the preferred form for making modifications,
-      including but not limited to software source code, documentation
-      source, and configuration files.
-
-      "Object" form shall mean any form resulting from mechanical
-      transformation or translation of a Source form, including but
-      not limited to compiled object code, generated documentation,
-      and conversions to other media types.
-
-      "Work" shall mean the work of authorship, whether in Source or
-      Object form, made available under the License, as indicated by a
-      copyright notice that is included in or attached to the work
-      (an example is provided in the Appendix below).
-
-      "Derivative Works" shall mean any work, whether in Source or Object
-      form, that is based on (or derived from) the Work and for which the
-      editorial revisions, annotations, elaborations, or other modifications
-      represent, as a whole, an original work of authorship. For the purposes
-      of this License, Derivative Works shall not include works that remain
-      separable from, or merely link (or bind by name) to the interfaces of,
-      the Work and Derivative Works thereof.
-
-      "Contribution" shall mean any work of authorship, including
-      the original version of the Work and any modifications or additions
-      to that Work or Derivative Works thereof, that is intentionally
-      submitted to Licensor for inclusion in the Work by the copyright owner
-      or by an individual or Legal Entity authorized to submit on behalf of
-      the copyright owner. For the purposes of this definition, "submitted"
-      means any form of electronic, verbal, or written communication sent
-      to the Licensor or its representatives, including but not limited to
-      communication on electronic mailing lists, source code control systems,
-      and issue tracking systems that are managed by, or on behalf of, the
-      Licensor for the purpose of discussing and improving the Work, but
-      excluding communication that is conspicuously marked or otherwise
-      designated in writing by the copyright owner as "Not a Contribution."
-
-      "Contributor" shall mean Licensor and any individual or Legal Entity
-      on behalf of whom a Contribution has been received by Licensor and
-      subsequently incorporated within the Work.
-
-   2. Grant of Copyright License. Subject to the terms and conditions of
-      this License, each Contributor hereby grants to You a perpetual,
-      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-      copyright license to reproduce, prepare Derivative Works of,
-      publicly display, publicly perform, sublicense, and distribute the
-      Work and such Derivative Works in Source or Object form.
-
-   3. Grant of Patent License. Subject to the terms and conditions of
-      this License, each Contributor hereby grants to You a perpetual,
-      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-      (except as stated in this section) patent license to make, have made,
-      use, offer to sell, sell, import, and otherwise transfer the Work,
-      where such license applies only to those patent claims licensable
-      by such Contributor that are necessarily infringed by their
-      Contribution(s) alone or by combination of their Contribution(s)
-      with the Work to which such Contribution(s) was submitted. If You
-      institute patent litigation against any entity (including a
-      cross-claim or counterclaim in a lawsuit) alleging that the Work
-      or a Contribution incorporated within the Work constitutes direct
-      or contributory patent infringement, then any patent licenses
-      granted to You under this License for that Work shall terminate
-      as of the date such litigation is filed.
-
-   4. Redistribution. You may reproduce and distribute copies of the
-      Work or Derivative Works thereof in any medium, with or without
-      modifications, and in Source or Object form, provided that You
-      meet the following conditions:
-
-      (a) You must give any other recipients of the Work or
-          Derivative Works a copy of this License; and
-
-      (b) You must cause any modified files to carry prominent notices
-          stating that You changed the files; and
-
-      (c) You must retain, in the Source form of any Derivative Works
-          that You distribute, all copyright, patent, trademark, and
-          attribution notices from the Source form of the Work,
-          excluding those notices that do not pertain to any part of
-          the Derivative Works; and
-
-      (d) If the Work includes a "NOTICE" text file as part of its
-          distribution, then any Derivative Works that You distribute must
-          include a readable copy of the attribution notices contained
-          within such NOTICE file, excluding those notices that do not
-          pertain to any part of the Derivative Works, in at least one
-          of the following places: within a NOTICE text file distributed
-          as part of the Derivative Works; within the Source form or
-          documentation, if provided along with the Derivative Works; or,
-          within a display generated by the Derivative Works, if and
-          wherever such third-party notices normally appear. The contents
-          of the NOTICE file are for informational purposes only and
-          do not modify the License. You may add Your own attribution
-          notices within Derivative Works that You distribute, alongside
-          or as an addendum to the NOTICE text from the Work, provided
-          that such additional attribution notices cannot be construed
-          as modifying the License.
-
-      You may add Your own copyright statement to Your modifications and
-      may provide additional or different license terms and conditions
-      for use, reproduction, or distribution of Your modifications, or
-      for any such Derivative Works as a whole, provided Your use,
-      reproduction, and distribution of the Work otherwise complies with
-      the conditions stated in this License.
-
-   5. Submission of Contributions. Unless You explicitly state otherwise,
-      any Contribution intentionally submitted for inclusion in the Work
-      by You to the Licensor shall be under the terms and conditions of
-      this License, without any additional terms or conditions.
-      Notwithstanding the above, nothing herein shall supersede or modify
-      the terms of any separate license agreement you may have executed
-      with Licensor regarding such Contributions.
-
-   6. Trademarks. This License does not grant permission to use the trade
-      names, trademarks, service marks, or product names of the Licensor,
-      except as required for reasonable and customary use in describing the
-      origin of the Work and reproducing the content of the NOTICE file.
-
-   7. Disclaimer of Warranty. Unless required by applicable law or
-      agreed to in writing, Licensor provides the Work (and each
-      Contributor provides its Contributions) on an "AS IS" BASIS,
-      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
-      implied, including, without limitation, any warranties or conditions
-      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
-      PARTICULAR PURPOSE. You are solely responsible for determining the
-      appropriateness of using or redistributing the Work and assume any
-      risks associated with Your exercise of permissions under this License.
-
-   8. Limitation of Liability. In no event and under no legal theory,
-      whether in tort (including negligence), contract, or otherwise,
-      unless required by applicable law (such as deliberate and grossly
-      negligent acts) or agreed to in writing, shall any Contributor be
-      liable to You for damages, including any direct, indirect, special,
-      incidental, or consequential damages of any character arising as a
-      result of this License or out of the use or inability to use the
-      Work (including but not limited to damages for loss of goodwill,
-      work stoppage, computer failure or malfunction, or any and all
-      other commercial damages or losses), even if such Contributor
-      has been advised of the possibility of such damages.
-
-   9. Accepting Warranty or Additional Liability. While redistributing
-      the Work or Derivative Works thereof, You may choose to offer,
-      and charge a fee for, acceptance of support, warranty, indemnity,
-      or other liability obligations and/or rights consistent with this
-      License. However, in accepting such obligations, You may act only
-      on Your own behalf and on Your sole responsibility, not on behalf
-      of any other Contributor, and only if You agree to indemnify,
-      defend, and hold each Contributor harmless for any liability
-      incurred by, or claims asserted against, such Contributor by reason
-      of your accepting any such warranty or additional liability.
-
-   END OF TERMS AND CONDITIONS
-
-   APPENDIX: How to apply the Apache License to your work.
-
-      To apply the Apache License to your work, attach the following
-      boilerplate notice, with the fields enclosed by brackets "[]"
-      replaced with your own identifying information. (Don't include
-      the brackets!)  The text should be enclosed in the appropriate
-      comment syntax for the file format. We also recommend that a
-      file or class name and description of purpose be included on the
-      same "printed page" as the copyright notice for easier
-      identification within third-party archives.
-
-   Copyright [yyyy] [name of copyright owner]
-
-   Licensed under the Apache License, Version 2.0 (the "License");
-   you may not use this file except in compliance with the License.
-   You may obtain a copy of the License at
-
-       http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.
--- a/README.md
+++ b/README.md
@@ -1,14 +1,18 @@
 ![vLLM Kunlun Logo](vllm_kunlun/patches/vLLM_Kunlun.jpg)

 <p align="center">
-  <a href="https://vllm-kunlun.readthedocs.io/en/latest/"><b>Documentation</b></a> |
-  <a href="https://join.slack.com/t/vllm-kunlun/shared_invite/zt-3iinb8u5z-FcqZKbNNdMJ_32fHmipzvw"><b>slack</b></a> |
+  <a href="./docs/_build/html/documentation.html"><b>Documentation</b></a> |
+  <a href=""><b>Users Forum</b></a> |
+  <a href="join.slack.com/t/vllm-kunlun/shared_invite/zt-3iinb8u5z-FcqZKbNNdMJ_32fHmipzvwjoin.slack.com/t/vllm-kunlun/shared_invite/zt-3iinb8u5z-FcqZKbNNdMJ_32fHmipzvw"><b>slack</b></a> |
 </p>

 ---

 ## Latest News🔥
- [2025/12] Initial release of vLLM Kunlun
+- [2025/11] 
+- [2025/11] 
+- [2025/11] 
+- [2025/11] Initial release of vLLM Kunlun

 ---

@@ -30,28 +34,107 @@ By utilizing the vLLM Kunlun plugin, popular open-source models, including Trans

 ---
 ## Supported Models
+<style>
+  table {
+    width: 100%;
+    border-collapse: collapse;
+    background: white;
+    margin: 20px 0;
+    box-shadow: 0 2px 8px rgba(0, 0, 0, 0.08);
+    border-radius: 8px;
+    overflow: hidden;
+  }
+  
+  th {
+    background: linear-gradient(135deg, #0E7DC6 0%, #0A5BA8 100%);
+    color: white;
+    padding: 14px 12px;
+    text-align: left;
+    font-weight: 600;
+    font-size: 13px;
+    letter-spacing: 0.5px;
+    border: none;
+  }
+  
+  td {
+    padding: 12px;
+    border-bottom: 1px solid #e8e8e8;
+    font-size: 13px;
+    color: #333;
+  }
+  
+  tr:last-child td {
+    border-bottom: none;
+  }
+  
+  tbody tr {
+    transition: background-color 0.2s ease;
+  }
+  
+  tbody tr:hover {
+    background-color: #f5faff;
+  }
+  
+  tbody tr:nth-child(even) {
+    background-color: #fafbfc;
+  }
+  
+  tbody tr:nth-child(even):hover {
+    background-color: #f0f7fc;
+  }
+  
+  .status-support {
+    color: #22863a;
+    font-weight: 600;
+    font-size: 14px;
+  }
+  
+  .status-progress {
+    color: #f6a909;
+    font-weight: 600;
+    font-size: 14px;
+  }
+  
+  .status-coming {
+    color: #999;
+    font-size: 12px;
+    background-color: #f5f5f5;
+    padding: 2px 6px;
+    border-radius: 3px;
+    display: inline-block;
+  }
+  
+  .model-name {
+    font-weight: 500;
+    color: #1e40af;
+  }
+
+  h3 {
+    color: #1e40af;
+    font-size: 16px;
+    margin-top: 30px;
+    margin-bottom: 15px;
+    font-weight: 600;
+  }
+
+  h3:first-of-type {
+    margin-top: 0;
+  }
+</style>

 <h3>Generative Models</h3>
 <table>
  <thead>
    <tr>
-      <th width="23%">Model</th>
+      <th width="20%">Model</th>
      <th width="12%">Support</th>
      <th width="15%">Quantization</th>
      <th width="10%">LoRA</th>
      <th width="20%">Piecewise Kunlun Graph</th>
-      <th width="20%">Note</th>
+      <th width="23%">Note</th>
    </tr>
  </thead>
  <tbody>
-    <tr>
-      <td class="model-name">Qwen2/2.5</td>
-      <td class="status-support">✅</td>
-      <td></td>
-      <td class="status-support">✅</td>
-      <td class="status-support">✅</td>
-      <td></td>
-    </tr>
    <tr>
      <td class="model-name">Qwen3</td>
      <td class="status-support">✅</td>
@@ -61,7 +144,7 @@ By utilizing the vLLM Kunlun plugin, popular open-source models, including Trans
      <td></td>
    </tr>
    <tr>
-      <td class="model-name">Qwen3-Moe/Coder</td>
+      <td class="model-name">Qwen3-Moe</td>
      <td class="status-support">✅</td>
      <td class="status-support">✅</td>
      <td class="status-support">✅</td>
@@ -69,53 +152,13 @@ By utilizing the vLLM Kunlun plugin, popular open-source models, including Trans
      <td></td>
    </tr>
    <tr>
-      <td class="model-name">QwQ-32B</td>
-      <td class="status-support">✅</td>
-      <td></td>
-      <td></td>
-      <td class="status-support">✅</td>
-      <td></td>
-    </tr>
-    <tr>
-      <td class="model-name">LLama2/3/3.1</td>
-      <td class="status-support">✅</td>
-      <td></td>
-      <td></td>
-      <td class="status-support">✅</td>
-      <td></td>
-    </tr>
-    <tr>
-      <td class="model-name">GLM-4.5/Air</td>
+      <td class="model-name">Qwen3-Next</td>
      <td class="status-support">✅</td>
      <td class="status-support">✅</td>
      <td class="status-support">✅</td>
      <td class="status-support">✅</td>
      <td></td>
    </tr>
-    <tr>
-      <td class="model-name">Qwen3next</td>
-      <td class="status-progress">⚠️</td>
-      <td></td>
-      <td></td>
-      <td></td>
-      <td><span class="status-coming">comming soon</span></td>
-    </tr>
-    <tr>
-      <td class="model-name">Gpt oss</td>
-      <td class="status-progress">⚠️</td>
-      <td></td>
-      <td></td>
-      <td></td>
-      <td><span class="status-coming">comming soon</span></td>
-    </tr>
-    <tr>
-      <td class="model-name">Deepseek v3/3.2</td>
-      <td class="status-progress">⚠️</td>
-      <td></td>
-      <td></td>
-      <td></td>
-      <td><span class="status-coming">comming soon</span></td>
-    </tr>
  </tbody>
 </table>

@@ -133,61 +176,13 @@ By utilizing the vLLM Kunlun plugin, popular open-source models, including Trans
  </thead>
  <tbody>
    <tr>
-      <td class="model-name">Qianfan-VL</td>
+      <td class="model-name">Qwen3-VL</td>
      <td class="status-support">✅</td>
      <td></td>
      <td></td>
      <td class="status-support">✅</td>
      <td></td>
    </tr>
-    <tr>
-      <td class="model-name">Qwen2.5VL</td>
-      <td class="status-support">✅</td>
-      <td></td>
-      <td></td>
-      <td class="status-support">✅</td>
-      <td></td>
-    </tr>
-    <tr>
-      <td class="model-name">InternVL2.5/3/3.5</td>
-      <td class="status-support">✅</td>
-      <td></td>
-      <td></td>
-      <td class="status-support">✅</td>
-      <td></td>
-    </tr>
-    <tr>
-      <td class="model-name">InternVL3.5</td>
-      <td class="status-support">✅</td>
-      <td></td>
-      <td></td>
-      <td class="status-support">✅</td>
-      <td></td>
-    </tr>
-    <tr>
-      <td class="model-name">InternS1</td>
-      <td class="status-support">✅</td>
-      <td></td>
-      <td></td>
-      <td class="status-support">✅</td>
-      <td></td>
-    </tr>
-    <tr>
-      <td class="model-name">Qwen2.5 omini</td>
-      <td class="status-progress">⚠️</td>
-      <td></td>
-      <td></td>
-      <td></td>
-      <td><span class="status-coming">comming soon</span></td>
-    </tr>
-    <tr>
-      <td class="model-name">Qwen3vl</td>
-      <td class="status-progress">⚠️</td>
-      <td></td>
-      <td></td>
-      <td></td>
-      <td><span class="status-coming">comming soon</span></td>
-    </tr>
  </tbody>
 </table>

@@ -207,17 +202,17 @@ Please use the following recommended versions to get started quickly:

 | Version | Release type | Doc |
 |----------|---------------|-----|
-| v0.10.1.1 | Latest stable version | [QuickStart](https://vllm-kunlun.readthedocs.io/en/latest/quick_start.html) and [Installation](https://vllm-kunlun.readthedocs.io/en/latest/installation.html) for more details |
+| v0.11.0 | Latest stable version | [QuickStart](./docs/_build/html/quick_start.html) and [Installation](./docs/_build/html/installation.html) for more details |

 ---

 ## Contributing

-See [CONTRIBUTING](https://vllm-kunlun.readthedocs.io/en/latest/developer_guide/contribution/index.html) for more details, which is a step-by-step guide to help you set up the development environment, build, and test.
+See [CONTRIBUTING]() for more details, which is a step-by-step guide to help you set up the development environment, build, and test.

 We welcome and value any contributions and collaborations:
- Open an [Issue](https://github.com/baidu/vLLM-Kunlun/issues) if you find a bug or have a feature request
+- Open an [Issue]() if you find a bug or have a feature request

 ## License

-Apache License 2.0, as found in the [LICENSE](https://github.com/baidu/vLLM-Kunlun/blob/main/LICENSE.txt) file.
+Apache License 2.0, as found in the [LICENSE](./LICENSE) file.
--- a/ci.yml
+++ b/ci.yml
@@ -0,0 +1,19 @@
+Global:
+    version: "2.0"
+    group_email: hac@baidu.com
+Default:
+    profile:
+        - build
+Profiles:
+    - profile:
+      name: build
+      mode: AGENT
+      environment:
+        image: DECK_STD_CENTOS7
+        tools:
+            - python: 3.10.10
+      build:
+        command: sh build.sh
+      excludeTools: []
+      artifacts:
+        release: true
--- a/dockerfile/Dockerfile_vision
+++ b/dockerfile/Dockerfile_vision
@@ -0,0 +1,8 @@
+ARG BASE_IMAGE=iregistry.baidu-int.com/hac_test/aiak-inference-llm:xpu_dev_202508030_v1
+FROM ${BASE_IMAGE}
+
+COPY vllm-kunlun /workspace/vllm-kunlun
+
+RUN bash /workspace/vllm-kunlun/dockerfile/install.sh
+
+WORKDIR /workspace
--- a/dockerfile/install.sh
+++ b/dockerfile/install.sh
@@ -0,0 +1,34 @@
+#!/bin/bash
+
+set -exuo pipefail
+
+source /root/miniconda/etc/profile.d/conda.sh
+conda activate python310_torch25_cuda
+echo 'conda activate python310_torch25_cuda' >> ~/.bashrc
+echo 'source /workspace/vllm-kunlun/setup_env.sh' >> ~/.bashrc
+
+# Install community vLLM
+cd /workspace/vllm-kunlun
+pip uninstall vllm -y
+pip uninstall vllm-kunlun -y
+pip install vllm==0.11.0 --no-build-isolation --no-deps --index-url https://pip.baidu-int.com/simple/
+
+#
+pip install -r /workspace/vllm-kunlun/requirements.txt
+
+# Install vllm-kunlun
+python setup.py build
+python setup.py install
+cp vllm_kunlun/patches/eval_frame.py /root/miniconda/envs/python310_torch25_cuda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py
+
+# Install the custom Kl3 torch build (01130)
+wget -O xpytorch-cp310-torch251-ubuntu2004-x64.run https://baidu-kunlun-public.su.bcebos.com/v1/baidu-kunlun-share/1130/xpytorch-cp310-torch251-ubuntu2004-x64.run?authorization=bce-auth-v1%2FALTAKypXxBzU7gg4Mk4K4c6OYR%2F2025-12-02T05%3A01%3A27Z%2F-1%2Fhost%2Ff3cf499234f82303891aed2bcb0628918e379a21e841a3fac6bd94afef491ff7
+bash xpytorch-cp310-torch251-ubuntu2004-x64.run
+rm xpytorch-cp310-torch251-ubuntu2004-x64.run
+# Install the custom Klx3 operator library (01130)
+pip uninstall xtorch_ops -y
+pip install "https://baidu-kunlun-public.su.bcebos.com/v1/baidu-kunlun-share/1130/xtorch_ops-0.1.2209%2B6752ad20-cp310-cp310-linux_x86_64.whl?authorization=bce-auth-v1%2FALTAKypXxBzU7gg4Mk4K4c6OYR%2F2025-12-05T06%3A18%3A00Z%2F-1%2Fhost%2F14936c2b7e7c557c1400e4c467c79f7a9217374a7aa4a046711ac4d948f460cd"
+# Install the custom klx3 triton
+pip install "https://cce-ai-models.bj.bcebos.com/v1/vllm-kunlun-0.11.0/triton-3.0.0%2Bb2cde523-cp310-cp310-linux_x86_64.whl?authorization=bce-auth-v1%2FALTAKxPW2jzoJUuFZmI19s3yry%2F2025-11-05T02%3A47%3A29Z%2F-1%2Fhost%2Fd8c95dbd06187a3140ca3e681e00c6941c30e14bb1d4112a0c8bc3c93e5c9c3f"
+# Install the custom AIAK operator library
+pip install "https://cce-ai-models.bj.bcebos.com/v1/chenyili/xspeedgate_ops-0.0.0-cp310-cp310-linux_x86_64.whl?authorization=bce-auth-v1%2FALTAKxPW2jzoJUuFZmI19s3yry%2F2025-12-05T06%3A37%3A39Z%2F-1%2Fhost%2F1002777dadd2afe4c1f047cbf0d94244d5b1f03295cd8f7a2802b92a13cd5035"
--- a/docs/.DS_Store
+++ b/docs/.DS_Store
--- a/docs/README.md
+++ b/docs/README.md
@@ -5,53 +5,52 @@
 uv venv myenv --python 3.12 --seed
 source myenv/bin/activate

-
- # Step 1: Enter the docs directory
+# 步骤1：进入docs目录
 cd docs

-# Step 2: Install dependencies (using uv)
+# 步骤2：安装依赖（使用uv）
 uv pip install -r requirements-docs.txt

-# Install sphinx-autobuild (if not in requirements file)
+# 安装 sphinx-autobuild（如果没在 requirements 文件里）
 uv pip install sphinx-autobuild

-# Run from the docs directory:
+# 从 docs 目录运行：
 sphinx-autobuild ./source ./_build/html --port 8000

-# Step 1: Clean up old files
+# 步骤1：清理旧文件
 make clean

-# Step 2: Build HTML
+# 步骤2：构建HTML
 make html

-# Step 3: Local preview
+# 步骤3：本地预览
 python -m http.server -d _build/html/

-Browser access: http://localhost:8000
+浏览器访问：http://localhost:8000

 🌍 Internationalization
-Internationalization translation process (taking Chinese as an example)
+国际化翻译流程（以中文为例）

-# Step 1: Extract translatable text (generate .pot)
+# 步骤1：提取可翻译文本（生成 .pot）
 sphinx-build -b gettext source _build/gettext

-# Step 2: Generate/update Chinese .po file
+# 步骤2：生成/更新中文 .po 文件
 sphinx-intl update -p _build/gettext -l zh_CN

-# Step 3: Manually translate .po file
-# Use a text editor to open source/locale/zh_CN/LC_MESSAGES/*.po
-# Fill in the Chinese translation in msgstr ""
+# 步骤3：人工翻译 .po 文件
+# 用文本编辑器打开 source/locale/zh_CN/LC_MESSAGES/*.po
+# 在 msgstr "" 里填入中文翻译

-# Step 4: Compile and build Chinese documentation
+# 步骤4：编译并构建中文文档
 make intl

-# Step 5: View the effect
+# 步骤5：查看效果
 python -m http.server -d _build/html


-Browser access:
+浏览器访问：

-English version: http://localhost:8000
-Chinese version: http://localhost:8000/zh-cn
+英文版： http://localhost:8000
+中文版： http://localhost:8000/zh-cn

 ```
--- a/docs/envs.py
+++ b/docs/envs.py
@@ -47,15 +47,18 @@ env_variables: Dict[str, Callable[[], Any]] = {
    # The C compiler used for compiling the package. If not set, the default
    # value is None, which means the system default C compiler will be used.
    "C_COMPILER": lambda: os.getenv("C_COMPILER", None),
-
-    "SOC_VERSION": lambda: os.getenv("SOC_VERSION", "KUNLUNP800"),
+    # The version of the Kunlun chip. If not set, the default value is
+    # KUNLUN910B1(Available for A2 and A3 series). It's used for package building.
+    # Please make sure that the version is correct.
+    "SOC_VERSION": lambda: os.getenv("SOC_VERSION", "KUNLUN910B1"),
    # If set, vllm-kunlun will print verbose logs during compilation
    "VERBOSE": lambda: bool(int(os.getenv("VERBOSE", "0"))),
+    # The home path for CANN toolkit. If not set, the default value is
    # /usr/local/Kunlun/kunlun-toolkit/latest
    "KUNLUN_HOME_PATH": lambda: os.getenv("KUNLUN_HOME_PATH", None),
-    # The path for XCCL library, it's used by pyxccl communicator backend. If
-    # not set, the default value is libxccl.so。
-    "XCCL_SO_PATH": lambda: os.environ.get("XCCL_SO_PATH", None),
+    # The path for HCCL library, it's used by pyhccl communicator backend. If
+    # not set, the default value is libhccl.so。
+    "HCCL_SO_PATH": lambda: os.environ.get("HCCL_SO_PATH", None),
    # The version of vllm is installed. This value is used for developers who
    # installed vllm from source locally. In this case, the version of vllm is
    # usually changed. For example, if the version of vllm is "0.9.0", but when
@@ -116,6 +119,7 @@ env_variables: Dict[str, Callable[[], Any]] = {
    # and the mla_pa will be the default path of deepseek decode path.
    "VLLM_KUNLUN_MLA_PA": lambda: int(os.getenv("VLLM_KUNLUN_MLA_PA", 0)),
    # Whether to enable MatmulAllReduce fusion kernel when tensor parallel is enabled.
+    # this feature is supported in A2, and eager mode will get better performance.
    "VLLM_KUNLUN_ENABLE_MATMUL_ALLREDUCE": lambda: bool(
        int(os.getenv("VLLM_KUNLUN_ENABLE_MATMUL_ALLREDUCE", "0"))
    ),
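Review note on the `docs/envs.py` hunks above: the table maps each variable name to a zero-argument lambda, so a value is read from the environment when the lambda is called, not when the module is imported. The following is a minimal sketch of that lookup pattern, assuming a two-entry subset of the table with the defaults shown in the diff. The helper name `read_env` is hypothetical and does not appear in the commit.

```python
import os
from typing import Any, Callable, Dict

# Hypothetical two-entry subset of the table; defaults copied from the hunk above.
env_variables: Dict[str, Callable[[], Any]] = {
    "SOC_VERSION": lambda: os.getenv("SOC_VERSION", "KUNLUN910B1"),
    "VERBOSE": lambda: bool(int(os.getenv("VERBOSE", "0"))),
}


def read_env(name: str) -> Any:
    """Evaluate the lambda registered for `name` at call time (hypothetical helper)."""
    if name not in env_variables:
        raise KeyError(f"unknown environment variable: {name}")
    return env_variables[name]()
```

Because each lookup calls the lambda again, a change to `os.environ` made after import is still picked up on the next `read_env` call.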
--- a/docs/source/community/contributors.md
+++ b/docs/source/community/contributors.md
@@ -35,5 +35,4 @@
 |   Yijin Qiao   |
 |  Chenchao Hu   |
 |  Weijie Hong   |
-|   Song Jiang   |
-|   Hongwei Ma   |
+|   Song Jiang   |
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -65,17 +65,19 @@ myst_substitutions = {
    # the branch of vllm, used in vllm clone
    # - main branch: 'main'
    # - vX.Y.Z branch: 'vX.Y.Z'
-    "vllm_version": "0.10.1.1",
+    "vllm_version": "v0.11.0rc3",
    # the branch of vllm-kunlun, used in vllm-kunlun clone and image tag
    # - main branch: 'main'
    # - vX.Y.Z branch: latest vllm-kunlun release tag
-    "vllm_kunlun_version": "0.10.1.1",
+    "vllm_kunlun_version": "v0.11.0rc0",
    # the newest release version of vllm-kunlun and matched vLLM, used in pip install.
    # This value should be updated when cut down release.
-    "pip_vllm_kunlun_version": "0.10.1.1",
-    "pip_vllm_version": "0.10.1.1",
+    "pip_vllm_kunlun_version": "0.11.0rc0",
+    "pip_vllm_version": "0.11.0",
+    # CANN image tag
+    "cann_image_tag": "8.3.rc1-910b-ubuntu22.04-py3.11",
    # vllm version in ci
-    "ci_vllm_version": "0.10.1.1",
+    "ci_vllm_version": "v0.11.0",
 }

 # For cross-file header anchors
@@ -102,6 +104,7 @@ exclude_patterns = [
    ".venv",
    "README.md",
    "user_guide/release.template.md",
+    # TODO(yikun): Remove this after zh supported
    "**/*.zh.md",
 ]

@@ -115,7 +118,7 @@ html_theme = "sphinx_book_theme"
 html_logo = "logos/vllm-kunlun-logo-text-light.png"
 html_theme_options = {
    "path_to_docs": "docs/source",
-    "repository_url": "https://github.com/baidu/vLLM-Kunlun",
+    "repository_url": "https://github.com/xxxxx/vllm-kunlun",
    "use_repository_button": True,
    "use_edit_page_button": True,
 }
--- a/docs/source/developer_guide/contribution/contributing.md
+++ b/docs/source/developer_guide/contribution/contributing.md
@@ -0,0 +1,83 @@
+# Contributing
+
+## Building and Testing
+It's recommended to set up a local development environment to build vllm-kunlun and run tests
+before you submit a PR.
+
+#### Run models locally
+
+After completing the Run lint setup shown in the quickstart, you can run your changes locally:
+
+```{code-block} bash
+   :substitutions:
+
+python -m vllm.entrypoints.openai.api_server \
+      --host 0.0.0.0 \
+      --port 8356 \
+      --model your_modified_models \
+      --gpu-memory-utilization 0.9 \
+      --trust-remote-code \
+      --max-model-len 32768 \
+      --tensor-parallel-size 1 \
+      --dtype float16 \
+      --max_num_seqs 128 \
+      --max_num_batched_tokens 32768 \
+      --block-size 128 \
+      --no-enable-prefix-caching \
+      --no-enable-chunked-prefill \
+      --distributed-executor-backend mp \
+      --served-model-name your_modified_models \
+      --compilation-config '{"splitting_ops": ["vllm.unified_attention", 
+                                                "vllm.unified_attention_with_output",
+                                                "vllm.unified_attention_with_output_kunlun",
+                                                "vllm.mamba_mixer2", 
+                                                "vllm.mamba_mixer", 
+                                                "vllm.short_conv", 
+                                                "vllm.linear_attention", 
+                                                "vllm.plamo2_mamba_mixer", 
+                                                "vllm.gdn_attention", 
+                                                "vllm.sparse_attn_indexer"]}'
+```
+Please save a screenshot of your service running successfully, and attach an accuracy report.
+
+#### Submit the commit
+
+```bash
+# Commit changed files using `-s`
+git commit -sm "your commit info"
+```
+
+🎉 Congratulations! You have completed the development environment setup.
+
+
+## PR Title and Classification
+
+Only specific types of PRs will be reviewed. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:
+
+- `[Attention]` for new features or optimization in attention.
+- `[Communicator]` for new features or optimization in communicators.
+- `[ModelRunner]` for new features or optimization in model runner.
+- `[Platform]` for new features or optimization in platform.
+- `[Worker]` for new features or optimization in worker.
+- `[Core]` for new features or optimization in the core vllm-kunlun logic (such as platform, attention, communicators, model runner).
+- `[Kernel]` for changes affecting compute kernels and ops.
+- `[Bugfix]` for bug fixes.
+- `[Doc]` for documentation fixes and improvements.
+- `[Test]` for tests (such as unit tests).
+- `[CI]` for build or continuous integration improvements.
+- `[Misc]` for PRs that do not fit the above categories. Please use this sparingly.
+
+:::{note}
+If the PR spans more than one category, please include all relevant prefixes.
+:::
+
+## Others
+
+If you run into problems while contributing, join our Slack group to discuss them with us, and feel free to submit a PR that improves this documentation for other developers.
+
+:::{toctree}
+:caption: Index
+:maxdepth: 1
+testing
+multi_node_test
+:::
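The prefix rule from the "PR Title and Classification" section can be checked mechanically. The following is a minimal sketch, not part of vllm-kunlun's tooling: the helper name `check_pr_title` and the regular expressions are assumptions made for illustration, and the prefix set is copied from the list in the added document.

```python
import re

# Prefixes copied from the "PR Title and Classification" list.
ALLOWED_PREFIXES = {
    "Attention", "Communicator", "ModelRunner", "Platform", "Worker",
    "Core", "Kernel", "Bugfix", "Doc", "Test", "CI", "Misc",
}

# One or more leading "[Tag]" groups, e.g. "[Kernel][Bugfix] Fix ...".
_PREFIX_RE = re.compile(r"^(?:\[[^\[\]]+\]\s*)+")
_TAG_RE = re.compile(r"\[([^\[\]]+)\]")


def check_pr_title(title: str) -> list[str]:
    """Return the prefixes found in the title; raise ValueError if invalid.

    Hypothetical helper, for illustration only.
    """
    match = _PREFIX_RE.match(title)
    if match is None:
        raise ValueError(f"missing [Prefix] in PR title: {title!r}")
    tags = _TAG_RE.findall(match.group(0))
    unknown = [tag for tag in tags if tag not in ALLOWED_PREFIXES]
    if unknown:
        raise ValueError(f"unknown prefix(es): {unknown}")
    if not title[match.end():].strip():
        raise ValueError("PR title has no description after the prefix")
    return tags


if __name__ == "__main__":
    # A PR spanning two categories carries both prefixes.
    print(check_pr_title("[Kernel][Bugfix] Fix rotary embedding stride"))
```

A check like this could run in a local pre-push hook; whether the project enforces the rule automatically is not stated in the document.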
--- a/docs/source/developer_guide/contribution/index.md
+++ b/docs/source/developer_guide/contribution/index.md
@@ -1,70 +1,5 @@
 # Contributing

 ## Building and Testing
-It's recommended to set up a local development environment to build vllm-kunlun and run tests
-before you submit a PR.

-#### Run models locally
-
-After completing the lint setup shown in the quickstart, you can run your changes locally:
-
-```{code-block} bash
-   :substitutions:
-
-python -m vllm.entrypoints.openai.api_server \
-      --host 0.0.0.0 \
-      --port 8356 \
-      --model /your_modified_models\
-      --trust-remote-code \
-      --tensor-parallel-size 1 \
-      --no-enable-prefix-caching \
-      --no-enable-chunked-prefill \
-      --distributed-executor-backend mp \
-      --served-model-name your_modified_models \
-      --compilation-config '{"splitting_ops": ["vllm.unified_attention_with_output_kunlun",
-            "vllm.unified_attention", "vllm.unified_attention_with_output",
-            "vllm.mamba_mixer2"]}' \
-```
-Please save a screenshot of your service running successfully, and attach an accuracy report.
-
-#### Submit the commit
-
-```bash
-# Commit changed files using `-s`
-git commit -sm "your commit info"
-```
-
-🎉 Congratulations! You have completed the development environment setup.
-
-
-## PR Title and Classification
-
-Only specific types of PRs will be reviewed. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:
-
- `[Attention]` for new features or optimization in attention.
- `[Communicator]` for new features or optimization in communicators.
- `[ModelRunner]` for new features or optimization in model runner.
- `[Platform]` for new features or optimization in platform.
- `[Worker]` for new features or optimization in worker.
- `[Core]` for new features or optimization  in the core vllm-kunlun logic (such as platform, attention, communicators, model runner)
- `[Kernel]` for changes affecting compute kernels and ops.
- `[Bugfix]` for bug fixes.
- `[Doc]` for documentation fixes and improvements.
- `[Test]` for tests (such as unit tests).
- `[CI]` for build or continuous integration improvements.
- `[Misc]` for PRs that do not fit the above categories. Please use this sparingly.
-
-:::{note}
-If the PR spans more than one category, please include all relevant prefixes.
-:::
-
-## Others
-
-If you find any problem when contributing, you can join our slack group to talk with us and then feel free to submit a PR to improve the doc to help other developers. 
-
-:::{toctree}
-:caption: Index
-:maxdepth: 1
-testing
-multi_node_test
-:::
+Coming soon...
--- a/docs/source/developer_guide/evaluation/accuracy/accuracy_server.md
+++ b/docs/source/developer_guide/evaluation/accuracy/accuracy_server.md
@@ -88,24 +88,20 @@ if not os.path.exists(output_dir):  # Step 4: Check if the directory exists
 # dump the mixed data to a jsonl file
 dump_jsonl_data(mixed_data, output_path)  # Step 6: Securely write to the file
 ```
-
 Dataset composition visualization:
-
 ```
 ┌───────────────────────────────────────┐
 │       VL-Test (1000 samples)          │
 ├─────────────────┬─────────────────────┤
 │   PureText      │      Vision         │
-│   (333 samples) │    (667 samples)    │
+│   (333 samples) │    (667 samples)    │
 ├─────────────────┼─────────────────────┤
 │ • mmlu_pro      │ • math_vista        │
 │ • ifeval        │ • mmmu_pro          │
 │ • gsm8k         │                     │
 └─────────────────┴─────────────────────┘
 ```
-
 #### 3.Test
-
 ```python
 from dotenv import dotenv_values

@@ -138,14 +134,13 @@ task_cfg = TaskConfig(

 run_task(task_cfg=task_cfg)
 ```
-
 Parameter Tuning Guide:

-| Parameter         | Current value | Effect                                   | Adjustment suggestions                                   |
-| ----------------- | ------------- | ---------------------------------------- | -------------------------------------------------------- |
-| `temperature`     | 0.6           | Control output diversity                 | Math problems ↓ 0.3 / Creative writing ↑ 0.9             |
-| `top_p`           | 0.95          | Filtering low-probability tokens         | Reduce "nonsense"                                        |
-| `eval_batch_size` | 5             | Number of requests processed in parallel | With sufficient video memory, it can be increased to 10. |
+| Parameter        | Current value | Effect  | Adjustment suggestions                |
+| ----------------- | ------ | --------------- | ----------------------- |
+| `temperature`     | 0.6    | Control output diversity  | Math problems ↓ 0.3 / Creative writing ↑ 0.9 |
+| `top_p`           | 0.95   | Filtering low-probability tokens | Reduce "nonsense"         |
+| `eval_batch_size` | 5      | Number of requests processed in parallel  | With sufficient video memory, it can be increased to 10.         |
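
To make the `temperature` and `top_p` rows of the table concrete, here is a toy sketch of what the two knobs do to a next-token distribution. It is a plain-Python illustration of temperature scaling and nucleus (top-p) filtering in general, not the sampler that vLLM or EvalScope actually runs; the function names and the example logits are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize the surviving tokens so they sum to 1 again.
    return {i: probs[i] / mass for i in kept}


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four tokens
    for t in (0.3, 0.6, 0.9):
        probs = softmax_with_temperature(logits, t)
        kept = top_p_filter(probs, top_p=0.95)
        print(f"temperature={t}: top-1 prob={max(probs):.3f}, "
              f"tokens kept by top_p=0.95: {sorted(kept)}")
```

Lower temperatures concentrate probability on the top token, which is why the table suggests a smaller value for math problems and a larger one for creative writing.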

 Run the test:

@@ -172,12 +167,11 @@ python accuracy.py 2>&1 | tee "$LOG_FILE"
 # ========================================
 EXIT_CODE=${PIPESTATUS[0]}
 if [ $EXIT_CODE -eq 0 ]; then
-    echo "✅ Evaluation completed! Log saved to: $LOG_FILE"
+    echo "✅ Evaluation completed! Log saved to: $LOG_FILE"
 else
-    echo "❌ Evaluation failed! Exit code: $EXIT_CODE Please check the log: $LOG_FILE"
+    echo "❌ Evaluation failed! Exit code: $EXIT_CODE Please check the log: $LOG_FILE"
 fi
 ```
-
 #### 4.Common problem fixes

 ##### 4.1 NLTK resource missing fix
@@ -187,7 +181,6 @@ Resource punkt_tab not found.
 ```

 Solution:
-
 ```python
 import nltk
 import os
@@ -200,13 +193,13 @@ os.makedirs(download_dir, exist_ok=True)
 nltk.data.path.append(download_dir)

 # Step 3: Download necessary resources
-print("🔽 Start downloading punkt_tab resource...")
+print("🔽 Start downloading punkt_tab resource...")
 try:
    nltk.download("punkt_tab", download_dir=download_dir)
-    print("✅ Download successful!")
+    print("✅ Download successful!")
 except Exception as e:
-    print(f"❌ Download failed: {e}")
-    print("💡 Alternative: Download manually from GitHub")
+    print(f"❌ Download failed: {e}")
+    print("💡 Alternative: Download manually from GitHub")
    print(
        "   URL: https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt_tab.zip"
    )
--- a/docs/source/developer_guide/performance/performance_benchmark/benchmark_kernel.md
+++ b/docs/source/developer_guide/performance/performance_benchmark/benchmark_kernel.md
@@ -34,9 +34,9 @@ The fork pattern is used to track the entire time period from the start to the e
 /xxxx/xxxx/xprofiler -r500 --xpu=0 python test.py
 ```

- --r: Sets the trace time resolution in nanoseconds (ns). The default is 100. If an "out of space error" occurs, try increasing the -r value to 500.
+* -r: Sets the trace time resolution in nanoseconds (ns). The default is 100. If an "out of space" error occurs, try increasing the -r value to 500.

- --xpu: Specifies the acquisition device ID, supporting multi-card configuration. --xpu=all enables all cards; the default is card 0.
+* --xpu: Specifies the acquisition device ID, supporting multi-card configuration. --xpu=all enables all cards; the default is card 0.

 More parameters can be found in the command-line parameters section later.

@@ -58,7 +58,7 @@ A temporary .sock file will be generated in the execution directory. The path ne

 ```bash
 export XPU_ENABLE_PROFILER_TRACING=1
-export XPU_TRACING_OUTPUT_NAME=<xprofiler execution directory>/xprofiler.sock
+export XPU_TRACING_OUTPUT_NAME=<xprofiler execution directory>/xprofiler.sock
 # Start your own program
 python xxx.py
 ```
@@ -99,7 +99,7 @@ xprofiler.sock
 ```python
 export XPU_ENABLE_PROFILER_TRACING=1
 # Here, the path to the .sock file from step 2 is used for assignment.
-export XPU_TRACING_OUTPUT_NAME=<xprofiler execution directory>/xprofiler.sock
+export XPU_TRACING_OUTPUT_NAME=<xprofiler execution directory>/xprofiler.sock
 # Start your own program
 python xxx.py
 ```
@@ -108,21 +108,21 @@ Note: If you want to specify a particular card to run on, you must import the XP

 ##### More parameters

-| parameters                 | Example                                 | default value | describe                                                                                                                                                                                           |
-| -------------------------- | --------------------------------------- | ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| -b or --buffer-size        | -b=512                                  | 256           | Specifies the size of the trace buffer in MB. This is generally not required. However, if there are many trace signals, the buffer size can be increased appropriately to avoid OOS (Out of Size). |
-| -x or --xpu                | -x=0--xpu=0                             | 0             | Set the card number to be tracked; multiple cards or all cards can be set.                                                                                                                         |
-| -t or --time               | -t=10                                   | off           | Enable time mode, in seconds, to capture information over a specified period.                                                                                                                      |
-| -d or --deamonize          | -r500                                   | 0             | Enable daemon mode to retrieve events in the background.                                                                                                                                           |
-| -r or --export-profile     | -e ./trace_output-e ./output/trace.json | ./            | Record the trace results to a document or folder. If this parameter is not specified, a default xprofiler.trace.json file will be generated in the execution directory.                            |
-| -S or --settings           | -S xprofiler.trace.json                 | off           | xprofiler reads a JSON file containing the events that need to be traced. If this parameter is not configured, xprofiler enables `--profile-api-trace` and `--sse-trace` by default.               |
-| -A or --profiler-api-trace | -A                                      | on            | Get driver events.                                                                                                                                                                                 |
-| -s or --sse-trace          | -s                                      | on            | Get all SSE events.                                                                                                                                                                                |
-| -C or --cluster-trace      | -C                                      | off           | Retrieve all cluster events.                                                                                                                                                                       |
-| -n or --sdnn-trace         | -n                                      | off           | Get all SDNN events.                                                                                                                                                                               |
-| -c or --sdnn-cluster-trace | -c                                      | off           | Retrieve all SDNN cluster events.                                                                                                                                                                  |
-| -E or --cache-trace        | -E                                      | off           | Get bandwidth statistics events.                                                                                                                                                                   |
-| -u or --debug              | -u44:open log，debug level-u0:close log | 33            | Debug the interface and enable driver event/device event logging.。                                                                                                                                |
+| Parameter                  | Example                                     | Default | Description |
+| -------------------------- | ------------------------------------------- | ------- | ----------- |
+| -b or --buffer-size        | -b=512                                      | 256     | Specifies the size of the trace buffer in MB. This is generally not required. However, if there are many trace signals, the buffer size can be increased appropriately to avoid out-of-space (OOS) errors. |
+| -x or --xpu                | -x=0 or --xpu=0                             | 0       | Set the card number to be tracked; multiple cards or all cards can be set. |
+| -t or --time               | -t=10                                       | off     | Enable time mode, in seconds, to capture information over a specified period. |
+| -d or --deamonize          | -r500                                       | 0       | Enable daemon mode to retrieve events in the background. |
+| -r or --export-profile     | -e ./trace_output or -e ./output/trace.json | ./      | Record the trace results to a file or folder. If this parameter is not specified, a default xprofiler.trace.json file will be generated in the execution directory. |
+| -S or --settings           | -S xprofiler.trace.json                     | off     | xprofiler reads a JSON file containing the events that need to be traced. If this parameter is not configured, xprofiler enables `--profile-api-trace` and `--sse-trace` by default. |
+| -A or --profiler-api-trace | -A                                          | on      | Get driver events. |
+| -s or --sse-trace          | -s                                          | on      | Get all SSE events. |
+| -C or --cluster-trace      | -C                                          | off     | Retrieve all cluster events. |
+| -n or --sdnn-trace         | -n                                          | off     | Get all SDNN events. |
+| -c or --sdnn-cluster-trace | -c                                          | off     | Retrieve all SDNN cluster events. |
+| -E or --cache-trace        | -E                                          | off     | Get bandwidth statistics events. |
+| -u or --debug              | -u44: open log, debug level; -u0: close log | 33      | Debug the interface and enable driver event/device event logging. |

 #### 3.View Results

@@ -144,4 +144,4 @@ Search directly, or visit[Perfetto UI](https://ui.perfetto.dev/#!/viewer?local_c

 With various performance data available, analysis and optimization can then be performed based on the results.

-(Further details to be added later)
+(Further details to be added later)
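The steps in this file that export `XPU_ENABLE_PROFILER_TRACING` and `XPU_TRACING_OUTPUT_NAME` before starting the traced program can also be scripted. A minimal sketch of the same launch from Python follows. The helper name `profiling_env` and the example socket directory are hypothetical; only the two variable names and the `xprofiler.sock` file name come from the document.

```python
import os
import subprocess
import sys


def profiling_env(sock_dir, base_env=None):
    """Return a copy of the environment with xprofiler tracing enabled.

    sock_dir is the directory where xprofiler was started, i.e. where it
    created xprofiler.sock.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["XPU_ENABLE_PROFILER_TRACING"] = "1"
    env["XPU_TRACING_OUTPUT_NAME"] = os.path.join(sock_dir, "xprofiler.sock")
    return env


if __name__ == "__main__":
    # "/tmp/xprofiler_run" is a placeholder for the xprofiler execution directory.
    env = profiling_env("/tmp/xprofiler_run")
    # Stand-in for "python xxx.py": the child only echoes what it inherited.
    child = "import os; print(os.environ['XPU_TRACING_OUTPUT_NAME'])"
    subprocess.run([sys.executable, "-c", child], env=env, check=True)
```

Passing the variables through `env=` keeps them scoped to the traced process instead of the whole shell session.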
--- a/docs/source/developer_guide/performance/performance_benchmark/benchmark_server.md
+++ b/docs/source/developer_guide/performance/performance_benchmark/benchmark_server.md
@@ -11,26 +11,30 @@ You can directly use vLLM's CLI benchmark. For more details, please refer to[vLL
 Server startup script reference

 ```bash
-USE_ORI_ROPE=1 VLLM_USE_V1=1 python -m vllm.entrypoints.openai.api_server \
+python -m vllm.entrypoints.openai.api_server \
      --host 0.0.0.0 \
-      --port xxxx \
-      --model /xxxx/xxxx/model\
+      --port 8000 \
+      --model /xxxx/xxxx/model \
      --gpu-memory-utilization 0.9 \
      --trust-remote-code \
      --max-model-len 32768 \
      --tensor-parallel-size 1 \
      --dtype float16 \
-      --max_num_seqs 128 \
-      --max_num_batched_tokens 32768 \
-      --max-seq-len-to-capture 32768 \
-      --block-size 128 \
      --no-enable-prefix-caching \
      --no-enable-chunked-prefill \
      --distributed-executor-backend mp \
      --served-model-name modelname \
-      --compilation-config '{"splitting_ops": ["vllm.unified_attention_with_output_kunlun",
-            "vllm.unified_attention", "vllm.unified_attention_with_output",
-            "vllm.mamba_mixer2"]}' \
+      --compilation-config '{"splitting_ops": ["vllm.unified_attention", 
+                                                "vllm.unified_attention_with_output",
+                                                "vllm.unified_attention_with_output_kunlun",
+                                                "vllm.mamba_mixer2", 
+                                                "vllm.mamba_mixer", 
+                                                "vllm.short_conv", 
+                                                "vllm.linear_attention", 
+                                                "vllm.plamo2_mamba_mixer", 
+                                                "vllm.gdn_attention", 
+                                                "vllm.sparse_attn_indexer"]}'
+
 ```

 ##### 1.2Execute test
@@ -124,26 +128,30 @@ The following demonstrates the performance test of the Qwen3-8B in a single-card
 The first step is to start the server. The example script is shown below.

 ```bash
-USE_ORI_ROPE=1 VLLM_USE_V1=1 python -m vllm.entrypoints.openai.api_server \
+python -m vllm.entrypoints.openai.api_server \
      --host 0.0.0.0 \
-      --port xxxx \
-      --model /xxxx/xxxx/Qwen3-8B\
+      --port 8000 \
+      --model /models/Qwen3-8B \
      --gpu-memory-utilization 0.9 \
      --trust-remote-code \
      --max-model-len 32768 \
      --tensor-parallel-size 1 \
      --dtype float16 \
-      --max_num_seqs 128 \
-      --max_num_batched_tokens 32768 \
-      --max-seq-len-to-capture 32768 \
-      --block-size 128 \
      --no-enable-prefix-caching \
      --no-enable-chunked-prefill \
      --distributed-executor-backend mp \
-      --served-model-name Qwen3-8B \
-      --compilation-config '{"splitting_ops": ["vllm.unified_attention_with_output_kunlun",
-            "vllm.unified_attention", "vllm.unified_attention_with_output",
-            "vllm.mamba_mixer2"]}' \
+      --served-model-name Qwen3-8B-Instruct \
+      --compilation-config '{"splitting_ops": ["vllm.unified_attention", 
+                                                "vllm.unified_attention_with_output",
+                                                "vllm.unified_attention_with_output_kunlun",
+                                                "vllm.mamba_mixer2", 
+                                                "vllm.mamba_mixer", 
+                                                "vllm.short_conv", 
+                                                "vllm.linear_attention", 
+                                                "vllm.plamo2_mamba_mixer", 
+                                                "vllm.gdn_attention", 
+                                                "vllm.sparse_attn_indexer"]}'
+
 ```
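
The multi-line JSON passed to `--compilation-config` in the scripts above is easy to break with a stray quote or comma. As a sketch, assuming you prefer to generate the argument rather than type it, the snippet below builds the same `splitting_ops` list with `json.dumps` and quotes it for the shell with `shlex.quote`. The variable and function names are illustrative; the op names are copied from the script.

```python
import json
import shlex

# Op names copied from the --compilation-config value in the launch scripts.
SPLITTING_OPS = [
    "vllm.unified_attention",
    "vllm.unified_attention_with_output",
    "vllm.unified_attention_with_output_kunlun",
    "vllm.mamba_mixer2",
    "vllm.mamba_mixer",
    "vllm.short_conv",
    "vllm.linear_attention",
    "vllm.plamo2_mamba_mixer",
    "vllm.gdn_attention",
    "vllm.sparse_attn_indexer",
]


def compilation_config_arg(ops=SPLITTING_OPS):
    """Return a shell-safe '--compilation-config <json>' fragment."""
    payload = json.dumps({"splitting_ops": list(ops)})
    return "--compilation-config " + shlex.quote(payload)


if __name__ == "__main__":
    # Paste the printed fragment at the end of the api_server command line.
    print(compilation_config_arg())
```

Because the JSON is produced by a serializer, the fragment stays valid when ops are added or removed.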

 ##### 2.2 Start EvalScope
--- a/docs/source/developer_guide/performance/performance_benchmark/index.md
+++ b/docs/source/developer_guide/performance/performance_benchmark/index.md
@@ -7,5 +7,4 @@ This document details the performance testing methods for vllm-kunlun and the an
 :maxdepth: 1
 benchmark_server
 benchmark_kernel
-profiling
 :::
--- a/docs/source/developer_guide/performance/performance_benchmark/profiling.md
+++ b/docs/source/developer_guide/performance/performance_benchmark/profiling.md
@@ -1,418 +0,0 @@
-## Profiling
-
-
-
-### 🔧 Action Plan (Three Phases)
-#### Phase 1️⃣: Multi-Device Log Redirection Configuration
-##### Background
-By default, kernel logs from all 8 XPU devices are interleaved and emitted to [stdout], resulting in:
- It becomes impossible to distinguish which log originates from which device.
- Timestamps become interleaved, making it difficult to analyze the temporal relationships.
- Single-device bottlenecks are masked by global aggregation.
-
-##### Solution
-During model initialization, create separate log files for each device.
-##### Code Explanation (embedded in qwen2.py)
-```python
-import os  # ← Ensure this is imported at the top of the file
-from vllm.distributed import get_tensor_model_parallel_rank  # ← Import function to get the tensor model parallel rank
-
-class Qwen2Model(nn.Module):
-
-    def __init__(self,
-                 *,
-                 vllm_config: VllmConfig,
-                 prefix: str = "",
-                 decoder_layer_type: type[nn.Module] = Qwen2DecoderLayer):
-        super().__init__()
-
-        # ========== [Expert Solution] Kunlun XPU Multi-Device Log Redirection ==========
-        try:
-            # Step 1: Get the current XPU device's rank (0~7)
-            rank = get_tensor_model_parallel_rank()
-            
-            # Step 2: Create log directory (works with your get_kernel_time_ex.py)
-            log_dir = "./xpu_logs"
-            os.makedirs(log_dir, exist_ok=True)
-            
-            # Step 3: Generate a separate log file for each device
-            log_file = os.path.join(log_dir, f"rank_{rank}.log")
-            
-            # Step 4: Core operation – redirect file descriptors
-            # os.O_TRUNC: Clear previous logs on each run to avoid mixing outputs
-            fd = os.open(log_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o664)
-            os.dup2(fd, 1)  # Redirect stdout → rank_X.log
-            os.dup2(fd, 2)  # Redirect stderr → rank_X.log
-            os.close(fd)     # Close original file descriptor; redirection persists
-            
-            # Optional: print a confirmation message (will go into rank_X.log)
-            print(f"[Qwen2Model Init] Rank {rank} log redirected to {log_file}")
-            
-        except Exception as e:
-            # Fallback mechanism: failure to redirect logs does not affect model loading
-            print(f"[WARNING] Failed to redirect log for rank: {e}", flush=True)
-        # ========== End of log redirection code ==========
-
-```
-##### ⚠️ Common Issues
-**Q1**: Why not use Python's `logging` module?
-**A**: The XPU runtime kernel logs are emitted from the C++ layer and cannot be captured by Python's `logging` module. Redirection via low-level file descriptors is required.
-**Q2**: Will logs be lost if the model fails to load?
-**A**: The `try-except` block ensures that if log redirection fails, it falls back to the default behavior without affecting model startup.
-
-#### Phase 2️⃣: Profiling Environment Activation
-##### 🚀 vLLM Launch
-```bash
-unset XPU_DUMMY_EVENT
-export XPU_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
-export XPU_USE_MOE_SORTED_THRES=1
-export XFT_USE_FAST_SWIGLU=1
-export XMLIR_CUDNN_ENABLED=1
-export XPU_USE_DEFAULT_CTX=1
-export XMLIR_FORCE_USE_XPU_GRAPH=1
-export XPU_USE_FAST_SWIGLU=1
-export VLLM_HOST_IP=$(hostname -i)
-echo "VLLM_HOST_IP: $VLLM_HOST_IP"
-
-export XMLIR_ENABLE_MOCK_TORCH_COMPILE=false
-
-export XPUAPI_DEBUG=0x1              # Enable kernel performance logging
-export XPURT_DISPATCH_MODE=PROFILING # Activate profiling mode
-
-USE_ORI_ROPE=1 VLLM_USE_V1=1 python -m vllm.entrypoints.openai.api_server \
-      --host 0.0.0.0 \
-      --port 8000 \
-      --model /models/Qwen2.5-72B-Instruct \
-      --gpu-memory-utilization 0.9 \
-      --trust-remote-code \
-      --max-model-len 32768 \
-      --tensor-parallel-size 8 \
-      --dtype float16 \
-      --max_num_seqs 512 \
-      --max_num_batched_tokens 32768 \
-      --max-seq-len-to-capture 32768 \
-      --block-size 128 \
-      --no-enable-prefix-caching \
-      --no-enable-chunked-prefill \
-      --distributed-executor-backend mp \
-      --served-model-name Qwen2.5-72B-Instruct \
-      --compilation-config '{"splitting_ops": ["vllm.unified_attention_with_output_kunlun",
-            "vllm.unified_attention", "vllm.unified_attention_with_output",
-            "vllm.mamba_mixer2"]}' 2>&1 | tee output_p800.log
-
-```
-
-
-##### 🚀 Client Load Testing
-```bash
-#!/bin/bash
-
-# Define test combinations array (concurrency x input length x output length)
-TEST_COMBINATIONS=(
-    "8x1024x1024" # Medium-low concurrency
-)
-
-# Create result directory
-RESULT_DIR="bench_$(date +%Y%m%d_%H%M)"
-mkdir -p $RESULT_DIR
-
-# Summary results file
-SUMMARY_FILE="$RESULT_DIR/summary_results.csv"
-echo "num_prompts,input_len,output_len,throughput,latency_mean,latency_p50,latency_p90,latency_p99" >$SUMMARY_FILE
-
-# Progress counter
-TOTAL_TESTS=${#TEST_COMBINATIONS[@]}
-CURRENT_TEST=0
-
-# Loop through different test combinations
-for COMBINATION in "${TEST_COMBINATIONS[@]}"; do
-    # Parse combination parameters
-    NUM_PROMPTS=$(echo $COMBINATION | cut -d'x' -f1)
-    INPUT_LEN=$(echo $COMBINATION | cut -d'x' -f2)
-    OUTPUT_LEN=$(echo $COMBINATION | cut -d'x' -f3)
-
-    # Update progress
-    CURRENT_TEST=$((CURRENT_TEST + 1))
-
-    echo "=========================================================="
-    echo "Test progress: $CURRENT_TEST/$TOTAL_TESTS ($(printf "%.1f" $(echo "$CURRENT_TEST/$TOTAL_TESTS*100" | bc -l))%)"
-    echo "Current test configuration: concurrency=$NUM_PROMPTS, input length=$INPUT_LEN, output length=$OUTPUT_LEN"
-    echo "=========================================================="
-
-    OUTPUT_FILE="$RESULT_DIR/p800_${NUM_PROMPTS}_${INPUT_LEN}_${OUTPUT_LEN}.log"
-
-    # Run benchmark
-    python3 -m vllm.entrypoints.cli.main bench serve \
-        --host 127.0.0.1 \
-        --port 8000 \
-        --backend vllm \
-        --model Qwen2.5-72B-Instruct \
-        --dataset-name random \
-        --num-prompts $NUM_PROMPTS \
-        --random-input-len $INPUT_LEN \
-        --random-output-len $OUTPUT_LEN \
-        --tokenizer /ssd1/models/Qwen2.5-72B-Instruct \
-        --ignore-eos 2>&1 | tee $OUTPUT_FILE
-
-    # Wait 15 seconds to let the service recover
-    echo "Waiting 15 seconds before the next round..."
-    sleep 15
-
-    # Extract key performance metrics from output and append to summary file
-    THROUGHPUT=$(grep "Throughput" $OUTPUT_FILE | awk '{print $2}')
-    LATENCY_MEAN=$(grep "Mean latency" $OUTPUT_FILE | awk '{print $3}')
-    LATENCY_P50=$(grep "p50 latency" $OUTPUT_FILE | awk '{print $3}')
-    LATENCY_P90=$(grep "p90 latency" $OUTPUT_FILE | awk '{print $3}')
-    LATENCY_P99=$(grep "p99 latency" $OUTPUT_FILE | awk '{print $3}')
-
-    echo "$NUM_PROMPTS,$INPUT_LEN,$OUTPUT_LEN,$THROUGHPUT,$LATENCY_MEAN,$LATENCY_P50,$LATENCY_P90,$LATENCY_P99" >>$SUMMARY_FILE
-done
-
-# Output summary report
-echo "=========================================================="
-echo "Benchmark completed! Results saved in: $RESULT_DIR"
-echo "=========================================================="
-
-
-```
-
-#### Phase 3️⃣: Log Analysis and Bottleneck Identification
-```lua
-xpu_logs/
-├─ rank_0.log
-├─ rank_1.log
-├─ rank_2.log
-├─ rank_3.log
-├─ rank_4.log
-├─ rank_5.log
-├─ rank_6.log
-└─ rank_7.log
-
-```
-##### 🔍 Script Workflow (op_log.py)
-**Input**: Raw kernel logs (sample format)
-```
-[XPURT_PROF] void xblas_xpu3::fc_cdnn_infer<float16,...> 123456 ns
-[XPURT_PROF] void kl3_all_reduce<float16> 987654 ns
-```
-**Processing logic**
-:::::{tab-set}
-::::{tab-item} op_log.py 
-
-
-```python
-"""
-A better version of 'get_op_time.py', get more level dump and support kl3.
- 
-Usage: python3 get_kernel_time_ex.py --help
-"""
- 
-import os
-import sys
-import re
- 
-unit_factors = [0.9, 1.3, 1.45] # kunlun1, kunlun2, kunlun3
-patterns = [r"\[XPURT_PROF\] (\S+)\s+\S+\s+(\S+) ns", r"\[XPURT_PROF\] (\S+)\s+(\S+)\s+\S+ ns"]  # raw strings avoid invalid-escape warnings
-tab_space_num = int(4)
- 
-def get_total_time(res):
-    total_time = 0.0
-    for i in res.values():
-        total_time += i
-    return  total_time
- 
-def print_info_op(res, cnt, unit, op):
-    total_time = get_total_time(res)
-    total_cnt = 0
-    # print detailed op time
-    lis=sorted(res.items(), key=lambda d:d[1], reverse=True)
-    if sys.version_info.major == 2:
-        import commands
-        for i in range(len(lis)):
-            (status, cmd_output) = commands.getstatusoutput("c++filt {}".format(lis[i][0]))
-            if status == 0:
-                formt_type = (cmd_output.split('('))[0]
-            total_cnt += cnt[lis[i][0]]
-    elif sys.version_info.major == 3:
-        import subprocess
-        for i in range(len(lis)):
-            (status, cmd_output) = subprocess.getstatusoutput("c++filt {}".format(lis[i][0]))
-            if status == 0:
-                formt_type = (cmd_output.split('('))[0]
-            total_cnt += cnt[lis[i][0]]
-    print(f"{op} {total_time / unit} {total_cnt}")
- 
-def print_info_kernel(res, cnt, unit):
-    total_time = get_total_time(res)
-    total_cnt = 0
-    print("Total time(ms) is {}".format(total_time / unit))
-    # print detailed op time
-    lis=sorted(res.items(), key=lambda d:d[1], reverse=True)
-    if sys.version_info.major == 2:
-        print("{:<90}{:<10}{:<15}{:<15}".format("Op type", "count", "time(ms)", "%"))
-        import commands
-        for i in range(len(lis)):
-            (status, cmd_output) = commands.getstatusoutput("c++filt {}".format(lis[i][0]))
-            if status == 0:
-                formt_type = (cmd_output.split('('))[0]
-            print("{:<90}{:<10}{:<15}{:<15.5}".format(formt_type, cnt[lis[i][0]], lis[i][1] / unit, \
-                lis[i][1] / total_time * 100))
-            total_cnt += cnt[lis[i][0]]
-    elif sys.version_info.major == 3:
-        print("{:<90}{:<10}{:<20}{:<20}".format("Op type", "count", "time(ms)", "%"))
-        import subprocess
-        for i in range(len(lis)):
-            (status, cmd_output) = subprocess.getstatusoutput("c++filt {}".format(lis[i][0]))
-            if status == 0:
-                formt_type = (cmd_output.split('('))[0]
-            print("{:<150}{:<10}{:<25}{:<20.5}".format(formt_type, cnt[lis[i][0]], lis[i][1] / unit, \
-                lis[i][1] / total_time * 100))
-            total_cnt += cnt[lis[i][0]]
- 
-    print("Total count is {}".format(total_cnt))
- 
-def count_head_spaces(s: str) -> int:
-   
-    count = 0
-    for char in s:
-        if char == ' ':
-            count += 1
-        else:
-            break
-    return count
- 
-def process_line(lines, pattern1, unit_factor, dump_level):
-    """ process a line in a file with profiling info
- 
-    Args:
-        unit_factor: A factor differentiated by KUNLUN1 and KUNLUN2
- 
-    """
-    res = {}
-    cnt = {}
-    op = "init_op"
-    unit = unit_factor * 1000 * 1000 # ns -> ms
-    wait_next_one = False
-    for i in range(len(lines)):
-        cur_line = lines[i]
-        if "gtest_" in cur_line:
-            cur_level = count_head_spaces(cur_line) / tab_space_num
-            if cur_level == dump_level:
-                wait_next_one = False
-                print_info_op(res, cnt, unit, op)
-                # clear buf
-                res = {}
-                cnt = {}
-                op = cur_line.lstrip().rstrip()
-            elif cur_level < dump_level:
-                wait_next_one = True
-                # skip recording kernel time until the next one
-                continue
-        if wait_next_one:
-            # skip recording kernel time
-            continue
-        match = re.match(pattern1, lines[i])
-        if match:
-            op_type = match.group(1)
-            op_time = match.group(2)
-            if op_type in res:
-                res[op_type] += float(op_time)
-                cnt[op_type] += 1
-            else:
-                res[op_type] = float(op_time)
-                cnt[op_type] = 1
- 
-    # get left total time
-    if dump_level == -1:
-        print_info_kernel(res, cnt, unit)
-    else:
-        print_info_op(res, cnt, unit, op)
-    return res
- 
-def process_file(file_name, pattern2, unit_factor, dump_level = -1):
-    """ Process a file line by line
- 
-    Iteratively process each line in the target file.
- 
-    """
- 
-    with open(file_name, "r") as f:
-        lines = f.readlines()
-        f1_res_list = process_line(lines, pattern2, unit_factor, dump_level)
- 
-if __name__ == '__main__':
-    import argparse
- 
-
-    parser = argparse.ArgumentParser()
- 
-
-    group = parser.add_mutually_exclusive_group()
-    group.add_argument('-xpu1', action='store_true', help='select xpu1')
-    group.add_argument('-xpu2', action='store_true', help='select xpu2')
-    group.add_argument('-xpu3', action='store_true', help='select xpu3')
-    parser.add_argument('--level', type=int, default=-1, help='dump indentation level (default: -1)')
-
-    parser.add_argument('filename', help='name of the file to process')
- 
-
-    args = parser.parse_args()
- 
-
-    filename = args.filename
-    xpu_version = 0
-    if args.xpu2:
-        xpu_version = 1
-    if args.xpu3:
-        xpu_version = 2
-    dump_level = args.level
-    print(f'Filename: {filename}')
-    print(f'-xpu option: {xpu_version}')
-    print(f'--level option: {dump_level}')
- 
-    unit_factor = unit_factors[xpu_version]
-    pattern_idx = 0
-    if xpu_version > 0:
-        pattern_idx = 1
-    process_file(filename, patterns[pattern_idx], unit_factor, dump_level)
- 
-```
-
-::::
-
-::::{tab-item} op_log.sh
-
-
-
-```bash
-
-for i in {0..7}; do
-    python op_log.py -xpu3 xpu_logs/rank_${i}.log > analysis_rank${i}.log
-    echo "Rank ${i} analysis complete"
-done
-
-
-for i in {0..7}; do
-    echo "=== Rank $i ===" 
-    head -n 6 analysis_rank${i}.log | tail -n 5
-done
-```
-::::
-:::::
-##### 📈 Output Example (analysis_rank0.log)
-```
-Filename: xpu_logs/rank_0.log
-xpu option: 2
--level option: -1
-Total time(ms) is 53742.29571862069
-Op type                                                                                   count     time(ms)            %                   
-void xblas_xpu3::fc_cdnn_infer<float16, float16, float16, float16, float, float, float, float, 1>                                                     661569    22736.262780689656       42.306              
-void kl3_all_reduce<float16>                                                                                                                          176134    14782.525712413793       27.506              
-void kl3_all_reduce_butterfly<float16>                                                                                                                164864    4197.28395862069         7.81           
-```
-##### 🚨 Troubleshooting Guide
-|Symptom|Cause|Solution|
-|-|-|-|
-|`xpu_logs` directory is empty|XPUAPI_DEBUG not enabled|Verify that the environment variable is correctly set|
-|All 8 log files have identical content|Multi-process backend not activated|Ensure `--distributed-executor-backend mp` is specified|
-|Throughput drops >15%|Profiling overhead too high|Enable profiling only during analysis; disable in production|
--- a/docs/source/faqs.md
+++ b/docs/source/faqs.md
@@ -2,7 +2,7 @@

 ## Version Specific FAQs

- [[v0.10.1.1] FAQ & Feedback]
+- [[v0.11.0] FAQ & Feedback]

 ## General FAQs

@@ -20,12 +20,13 @@ We will support the kunlun4 M100 platform in early 2026.

 ### 2. How to get our docker containers?

-**base**:`docker pull wjie520/vllm_kunlun:v0.0.1`.
+**base**:`docker pull iregistry.baidu-int.com/xmlir/xmlir_ubuntu_2004_x86_64:v0.32`.

+**full**:`docker pull wjie520/vllm_kunlun:v0.0.1`.

 ### 3. How does vllm-kunlun work with vLLM?

-vllm-kunlun is a hardware plugin for vLLM. The vllm-kunlun version matches the vLLM version: if you use vllm 0.10.1.1, use vllm-kunlun 0.10.1.1 as well. On the main branch, we keep `vllm-kunlun` and `vllm` compatible at every commit.
+vllm-kunlun is a hardware plugin for vLLM. The vllm-kunlun version matches the vLLM version: if you use vllm 0.11.0, use vllm-kunlun 0.11.0 as well. On the main branch, we keep `vllm-kunlun` and `vllm` compatible at every commit.


 ### 4. How to handle the out-of-memory issue?
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -16,9 +16,9 @@

 <p style="text-align:center">
 <script async defer src="https://buttons.github.io/buttons.js"></script>
-<a class="github-button" href="https://github.com/baidu/vLLM-Kunlun" data-show-count="true" data-size="large" aria-label="Star">Star</a>
-<a class="github-button" href="https://github.com/baidu/vLLM-Kunlun/subscription" data-icon="octicon-eye" data-size="large" aria-label="Watch">Watch</a>
-<a class="github-button" href="https://github.com/baidu/vLLM-Kunlun/fork" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork">Fork</a>
+<a class="github-button" href="https://github.com/vllm-project/vllm" data-show-count="true" data-size="large" aria-label="Star">Star</a>
+<a class="github-button" href="https://github.com/vllm-project/vllm/subscription" data-icon="octicon-eye" data-size="large" aria-label="Watch">Watch</a>
+<a class="github-button" href="https://github.com/vllm-project/vllm/fork" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork">Fork</a>
 </p>
 :::

--- a/docs/source/installation.md
+++ b/docs/source/installation.md
@@ -11,7 +11,7 @@ This document describes how to install vllm-kunlun manually.
  - vLLM (same version as vllm-kunlun)

 ## Setup environment using container
-We provide a clean, minimal base image: `wjie520/vllm_kunlun:v0.0.1`. You can pull it using the `docker pull` command.
+We provide a clean, minimal base image: `iregistry.baidu-int.com/xmlir/xmlir_ubuntu_2004_x86_64:v0.32`. You can pull it using the `docker pull` command.
 ### Container startup script

 :::::{tab-set}
@@ -31,7 +31,7 @@ if [ $XPU_NUM -gt 0 ]; then
    done
    DOCKER_DEVICE_CONFIG="${DOCKER_DEVICE_CONFIG} --device=/dev/xpuctrl:/dev/xpuctrl"
 fi
-export build_image="wjie520/vllm_kunlun:v0.0.1"
+export build_image="iregistry.baidu-int.com/xmlir/xmlir_ubuntu_2004_x86_64:v0.32"
 docker run -itd ${DOCKER_DEVICE_CONFIG} \
    --net=host \
    --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
@@ -46,16 +46,16 @@ docker run -itd ${DOCKER_DEVICE_CONFIG} \
 ::::
 :::::
 ## Install vLLM-kunlun
-### Install vLLM 0.10.1.1
+### Install vLLM 0.11.0
 ```
 conda activate python310_torch25_cuda

-pip install vllm==0.10.1.1 --no-build-isolation --no-deps 
+pip install vllm==0.11.0 
 ```
 ### Build and Install
 Navigate to the vllm-kunlun directory and build the package:
 ```
-git clone https://github.com/baidu/vLLM-Kunlun # TODO: replace with Github Url to install vllm-kunlun
+git clone xxxx # TODO: replace with Github Url to install vllm-kunlun

 cd vllm-kunlun

@@ -71,28 +71,33 @@ Copy the eval_frame.py patch:
 ```
 cp vllm_kunlun/patches/eval_frame.py /root/miniconda/envs/python310_torch25_cuda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py
 ```
-## Update xpytorch
+## Install the KL3-customized build of PyTorch
 ```
-wget https://klx-sdk-release-public.su.bcebos.com/kunlun2aiak_output/0830/xpytorch-cp310-torch251-ubuntu2004-x64.run
-
-bash xpytorch-cp310-torch251-ubuntu2004-x64.run
+wget https://klx-sdk-release-public.su.bcebos.com/xpytorch/release/3.3.2.7/xpytorch-cp310-torch251-ubuntu2004-x64.run && bash xpytorch-cp310-torch251-ubuntu2004-x64.run
 ```

 ## Install custom ops
 ```
-pip install \
-https://xtorch_ops
-
-pip install \
-https://xspeedgate_ops-0.0.0-cp310-cp310-linux_x86_64.whl
+pip uninstall xtorch_ops -y && pip install \
+"https://baidu-kunlun-public.su.bcebos.com/v1/baidu-kunlun-share/xtorch_ops-0.1.2028%2B1baf1b15-cp310-cp310-linux_x86_64.whl?authorization=bce-auth-v1%2FALTAKypXxBzU7gg4Mk4K4c6OYR%2F2025-10-31T10%3A38%3A24Z%2F-1%2Fhost%2Faa1969b70a4a97c407d69614a5d5a3e26ea07286d13f0a2ab8daccc288152903"
 ```

+## Install the KLX3 custom Triton build
+```
+pip install \
+"https://cce-ai-models.bj.bcebos.com/v1/vllm-kunlun-0.11.0/triton-3.0.0%2Bb2cde523-cp310-cp310-linux_x86_64.whl?authorization=bce-auth-v1%2FALTAKxPW2jzoJUuFZmI19s3yry%2F2025-11-05T02%3A47%3A29Z%2F-1%2Fhost%2Fd8c95dbd06187a3140ca3e681e00c6941c30e14bb1d4112a0c8bc3c93e5c9c3f"
+```
+## Install the AIAK custom ops library
+```
+pip install \
+"https://cce-ai-models.bj.bcebos.com/v1/chenyili/xspeedgate_ops-0.0.0-cp310-cp310-linux_x86_64.whl?authorization=bce-auth-v1%2FALTAKxPW2jzoJUuFZmI19s3yry%2F2025-11-18T01%3A56%3A21Z%2F-1%2Fhost%2F28b57cbc5dc62ac1bf946e74146b3ea4952d2ffff448617f0303980dcaf6cb49"
+```
 ## Quick Start

 ### Set up the environment

 ```
-chmod +x /workspace/vllm-kunlun/setup_env.sh && source /workspace/vllm-kunlun/setup_env.sh
+chmod +x /workspace/baidu/hac-aiacc/vllm-kunlun/setup_env.sh && source /workspace/baidu/hac-aiacc/vllm-kunlun/setup_env.sh
 ```

 ### Run the server
@@ -107,7 +112,7 @@ chmod +x /workspace/vllm-kunlun/setup_env.sh && source /workspace/vllm-kunlun/se
 python -m vllm.entrypoints.openai.api_server \
      --host 0.0.0.0 \
      --port 8356 \
-      --model /models/Qwen3-8B\
+      --model models/Qwen3-VL-30B-A3B-Instruct \
      --gpu-memory-utilization 0.9 \
      --trust-remote-code \
      --max-model-len 32768 \
@@ -115,15 +120,22 @@ python -m vllm.entrypoints.openai.api_server \
      --dtype float16 \
      --max_num_seqs 128 \
      --max_num_batched_tokens 32768 \
-      --max-seq-len-to-capture 32768 \
      --block-size 128 \
      --no-enable-prefix-caching \
      --no-enable-chunked-prefill \
      --distributed-executor-backend mp \
-      --served-model-name Qwen3-8B \
-      --compilation-config '{"splitting_ops": ["vllm.unified_attention_with_output_kunlun",
-            "vllm.unified_attention", "vllm.unified_attention_with_output",
-            "vllm.mamba_mixer2"]}' \
+      --served-model-name Qwen3-VL-30B-A3B-Instruct \
+      --compilation-config '{"splitting_ops": ["vllm.unified_attention", 
+                                                "vllm.unified_attention_with_output",
+                                                "vllm.unified_attention_with_output_kunlun",
+                                                "vllm.mamba_mixer2", 
+                                                "vllm.mamba_mixer", 
+                                                "vllm.short_conv", 
+                                                "vllm.linear_attention", 
+                                                "vllm.plamo2_mamba_mixer", 
+                                                "vllm.gdn_attention", 
+                                                "vllm.sparse_attn_indexer"]}'
+
 ```
 ::::
 :::::
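Review note on the `--compilation-config` argument in the server command above: hand-quoting multi-line JSON in a shell is error-prone. The following is a minimal sketch, assuming Python is available in the container, that serializes the same `splitting_ops` list and prints it as a single shell-safe argument. The helper name `build_compilation_config` is hypothetical and is not part of vLLM or vllm-kunlun; only the op names come from the command above.

```python
import json
import shlex

# Op names copied from the server command above.
SPLITTING_OPS = [
    "vllm.unified_attention",
    "vllm.unified_attention_with_output",
    "vllm.unified_attention_with_output_kunlun",
    "vllm.mamba_mixer2",
    "vllm.mamba_mixer",
    "vllm.short_conv",
    "vllm.linear_attention",
    "vllm.plamo2_mamba_mixer",
    "vllm.gdn_attention",
    "vllm.sparse_attn_indexer",
]


def build_compilation_config(ops):
    """Hypothetical helper: serialize the op list as compact, single-line JSON."""
    return json.dumps({"splitting_ops": list(ops)}, separators=(",", ":"))


if __name__ == "__main__":
    config = build_compilation_config(SPLITTING_OPS)
    # shlex.quote wraps the JSON so the shell passes it as one argument.
    print("--compilation-config " + shlex.quote(config))
```

The printed text can be pasted as the final argument of the server command, which avoids continuation backslashes inside the JSON altogether.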
--- a/docs/source/locale/zh_CN/LC_MESSAGES/community/contributors.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/community/contributors.po
--- a/docs/source/locale/zh_CN/LC_MESSAGES/community/governance.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/community/governance.po
@@ -0,0 +1,228 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/community/governance.md:1
+msgid "Governance"
+msgstr "治理"
+
+#: ../../source/community/governance.md:3
+msgid "Mission"
+msgstr "使命"
+
+#~ msgid ""
+#~ "As a vital component of vLLM, the"
+#~ " vLLM Kunlun project is dedicated to"
+#~ " providing an easy, fast, and cheap"
+#~ " LLM Serving for Everyone on Kunlun"
+#~ " XPU, and to actively contribute to"
+#~ " the enrichment of vLLM."
+#~ msgstr ""
+#~ "作为 vLLM 的重要组成部分，vLLM Kunlun 项目致力于为所有人在 "
+#~ "Kunlun XPU 上提供简单、快速且低成本的大语言模型服务，并积极促进 vLLM "
+#~ "的丰富发展。"
+
+#~ msgid "Principles"
+#~ msgstr "原则"
+
+#~ msgid ""
+#~ "vLLM Kunlun follows the vLLM community's"
+#~ " code of conduct：[vLLM - CODE OF "
+#~ "CONDUCT](https://github.com/vllm-"
+#~ "project/vllm/blob/main/CODE_OF_CONDUCT.md)"
+#~ msgstr ""
+#~ "vLLM Kunlun 遵循 vLLM 社区的行为准则：[vLLM - "
+#~ "行为准则](https://github.com/vllm-"
+#~ "project/vllm/blob/main/CODE_OF_CONDUCT.md)"
+
+#~ msgid "Governance - Mechanics"
+#~ msgstr "治理 - 机制"
+
+#~ msgid ""
+#~ "vLLM Kunlun is an open-source "
+#~ "project under the vLLM community, where"
+#~ " the authority to appoint roles is"
+#~ " ultimately determined by the vLLM "
+#~ "community. It adopts a hierarchical "
+#~ "technical governance structure."
+#~ msgstr "vLLM Kunlun 是 vLLM 社区下的一个开源项目，其角色任命权最终由 vLLM 社区决定。它采用分层的技术治理结构。"
+
+#~ msgid "Contributor:"
+#~ msgstr "贡献者："
+
+#~ msgid ""
+#~ "**Responsibility:** Help new contributors on"
+#~ " boarding, handle and respond to "
+#~ "community questions, review RFCs, code"
+#~ msgstr "**职责：** 帮助新贡献者加入，处理和回复社区问题，审查RFC和代码"
+
+#~ msgid ""
+#~ "**Requirements:** Complete at least 1 "
+#~ "contribution. Contributor is someone who "
+#~ "consistently and actively participates in "
+#~ "a project, included but not limited "
+#~ "to issue/review/commits/community involvement."
+#~ msgstr "**要求：** 完成至少1次贡献。贡献者是指持续且积极参与项目的人，包括但不限于问题、评审、提交和社区参与。"
+
+#~ msgid ""
+#~ "Contributors will be empowered [vllm-"
+#~ "project/vllm-kunlun](https://github.com/vllm-project"
+#~ "/vllm-kunlun) Github repo `Triage` "
+#~ "permissions (`Can read and clone this"
+#~ " repository. Can also manage issues "
+#~ "and pull requests`) to help community"
+#~ " developers collaborate more efficiently."
+#~ msgstr ""
+#~ "贡献者将被赋予 [vllm-project/vllm-"
+#~ "kunlun](https://github.com/vllm-project/vllm-kunlun) "
+#~ "Github 仓库的 `Triage` "
+#~ "权限（`可读取和克隆此仓库。还可以管理问题和拉取请求`），以帮助社区开发者更加高效地协作。"
+
+#~ msgid "Maintainer:"
+#~ msgstr "维护者："
+
+#~ msgid ""
+#~ "**Responsibility:** Develop the project's "
+#~ "vision and mission. Maintainers are "
+#~ "responsible for driving the technical "
+#~ "direction of the entire project and "
+#~ "ensuring its overall success, possessing "
+#~ "code merge permissions. They formulate "
+#~ "the roadmap, review contributions from "
+#~ "community members, continuously contribute "
+#~ "code, and actively engage in community"
+#~ " activities (such as regular "
+#~ "meetings/events)."
+#~ msgstr ""
+#~ "**责任：** "
+#~ "制定项目的愿景和使命。维护者负责引领整个项目的技术方向并确保其整体成功，拥有代码合并权限。他们制定路线图，审核社区成员的贡献，持续贡献代码，并积极参与社区活动（如定期会议/活动）。"
+
+#~ msgid ""
+#~ "**Requirements:** Deep understanding of ‌vLLM‌"
+#~ " and ‌vLLM Kunlun‌ codebases, with a"
+#~ " commitment to sustained code "
+#~ "contributions. Competency in ‌design/development/PR"
+#~ " review workflows‌."
+#~ msgstr ""
+#~ "**要求：** 深入理解 ‌vLLM‌ 和 ‌vLLM Kunlun‌ "
+#~ "代码库，并承诺持续贡献代码。具备 ‌设计/开发/PR 审核流程‌ 的能力。"
+
+#~ msgid ""
+#~ "**Review Quality‌:** Actively participate in"
+#~ " community code reviews, ensuring high-"
+#~ "quality code integration."
+#~ msgstr "**评审质量：** 积极参与社区代码评审，确保高质量的代码集成。"
+
+#~ msgid ""
+#~ "**Quality Contribution‌:** Successfully develop "
+#~ "and deliver at least one major "
+#~ "feature while maintaining consistent high-"
+#~ "quality contributions."
+#~ msgstr "**质量贡献‌：** 成功开发并交付至少一个主要功能，同时持续保持高质量的贡献。"
+
+#~ msgid ""
+#~ "**Community Involvement‌:** Actively address "
+#~ "issues, respond to forum inquiries, "
+#~ "participate in discussions, and engage "
+#~ "in community-driven tasks."
+#~ msgstr "**社区参与：** 积极解决问题，回复论坛询问，参与讨论，并参与社区驱动的任务。"
+
+#~ msgid ""
+#~ "Requires approval from existing Maintainers."
+#~ " The vLLM community has the final "
+#~ "decision-making authority."
+#~ msgstr "需要现有维护者的批准。vLLM社区拥有最终决策权。"
+
+#~ msgid ""
+#~ "Maintainer will be empowered [vllm-"
+#~ "project/vllm-kunlun](https://github.com/vllm-project"
+#~ "/vllm-kunlun) Github repo write permissions"
+#~ " (`Can read, clone, and push to "
+#~ "this repository. Can also manage issues"
+#~ " and pull requests`)."
+#~ msgstr ""
+#~ "维护者将被授予 [vllm-project/vllm-"
+#~ "kunlun](https://github.com/vllm-project/vllm-kunlun) "
+#~ "Github 仓库的写入权限（`可以读取、克隆和推送到此仓库。还可以管理问题和拉取请求`）。"
+
+#~ msgid "Nominating and Removing Maintainers"
+#~ msgstr "提名和移除维护者"
+
+#~ msgid "The Principles"
+#~ msgstr "原则"
+
+#~ msgid ""
+#~ "Membership in vLLM Kunlun is given "
+#~ "to individuals on merit basis after "
+#~ "they demonstrated strong expertise of "
+#~ "the vLLM / vLLM Kunlun through "
+#~ "contributions, reviews and discussions."
+#~ msgstr ""
+#~ "vLLM Kunlun 的成员资格是基于个人能力授予的，只有在通过贡献、评审和讨论展示出对 vLLM"
+#~ " / vLLM Kunlun 的深厚专业知识后，才可获得。"
+
+#~ msgid ""
+#~ "For membership in the maintainer group"
+#~ " the individual has to demonstrate "
+#~ "strong and continued alignment with the"
+#~ " overall vLLM / vLLM Kunlun "
+#~ "principles."
+#~ msgstr "要成为维护者组成员，个人必须表现出与 vLLM / vLLM Kunlun 总体原则的高度一致并持续支持。"
+
+#~ msgid ""
+#~ "Light criteria of moving module "
+#~ "maintenance to ‘emeritus’ status if they"
+#~ " don’t actively participate over long "
+#~ "periods of time."
+#~ msgstr "如果模块维护人员在长时间内没有积极参与，可根据较宽松的标准将其维护状态转为“荣誉”状态。"
+
+#~ msgid "The membership is for an individual, not a company."
+#~ msgstr "该会员资格属于个人，而非公司。"
+
+#~ msgid "Nomination and Removal"
+#~ msgstr "提名与罢免"
+
+#~ msgid ""
+#~ "Nomination: Anyone can nominate someone "
+#~ "to become a maintainer (include self-"
+#~ "nominate). All existing maintainers are "
+#~ "responsible for evaluating the nomination. "
+#~ "The nominator should provide nominee's "
+#~ "info around the strength of the "
+#~ "candidate to be a maintainer, include"
+#~ " but not limited to review quality,"
+#~ " quality contribution, community involvement."
+#~ msgstr "提名：任何人都可以提名他人成为维护者（包括自荐）。所有现有维护者都有责任评估提名。提名人应提供被提名人成为维护者的相关优势信息，包括但不限于评审质量、优质贡献、社区参与等。"
+
+#~ msgid ""
+#~ "Removal: Anyone can nominate a person"
+#~ " to be removed from maintainer "
+#~ "position (include self-nominate). All "
+#~ "existing maintainers are responsible for "
+#~ "evaluating the nomination. The nominator "
+#~ "should provide nominee's info, include "
+#~ "but not limited to lack of "
+#~ "activity, conflict with the overall "
+#~ "direction and other information that "
+#~ "makes them unfit to be a "
+#~ "maintainer."
+#~ msgstr "移除：任何人都可以提名某人被移出维护者职位（包括自荐）。所有现有维护者都有责任评估该提名。提名者应提供被提名人的相关信息，包括但不限于缺乏活动、与整体方向冲突以及使其不适合作为维护者的其他信息。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/community/user_stories/index.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/community/user_stories/index.po
@@ -0,0 +1,120 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/community/user_stories/index.md:1
+#, fuzzy
+msgid "User stories"
+msgstr "用户故事"
+
+#~ msgid "More details"
+#~ msgstr "更多细节"
+
+#~ msgid ""
+#~ "Read case studies on how users and"
+#~ " developers solves real, everyday problems"
+#~ " with vLLM Kunlun"
+#~ msgstr "阅读案例研究，了解用户和开发者如何使用 vLLM Kunlun 解决实际日常问题。"
+
+#~ msgid ""
+#~ "[LLaMA-Factory](./llamafactory.md) is an "
+#~ "easy-to-use and efficient platform "
+#~ "for training and fine-tuning large "
+#~ "language models, it supports vLLM Kunlun"
+#~ " to speed up inference since "
+#~ "[LLaMA-Factory#7739](https://github.com/hiyouga/LLaMA-"
+#~ "Factory/pull/7739), gain 2x performance "
+#~ "enhancement of inference."
+#~ msgstr ""
+#~ "[LLaMA-Factory](./llamafactory.md) "
+#~ "是一个易于使用且高效的大语言模型训练与微调平台，自 [LLaMA-"
+#~ "Factory#7739](https://github.com/hiyouga/LLaMA-"
+#~ "Factory/pull/7739) 起支持 vLLM Kunlun 加速推理，推理性能提升"
+#~ " 2 倍。"
+
+#~ msgid ""
+#~ "[Huggingface/trl](https://github.com/huggingface/trl) is a"
+#~ " cutting-edge library designed for "
+#~ "post-training foundation models using "
+#~ "advanced techniques like SFT, PPO and"
+#~ " DPO, it uses vLLM Kunlun since "
+#~ "[v0.17.0](https://github.com/huggingface/trl/releases/tag/v0.17.0) "
+#~ "to support RLHF on Kunlun XPU."
+#~ msgstr ""
+#~ "[Huggingface/trl](https://github.com/huggingface/trl) "
+#~ "是一个前沿的库，专为使用 SFT、PPO 和 DPO "
+#~ "等先进技术对基础模型进行后训练而设计。从 "
+#~ "[v0.17.0](https://github.com/huggingface/trl/releases/tag/v0.17.0) "
+#~ "版本开始，该库利用 vLLM Kunlun 来支持在 Kunlun XPU"
+#~ " 上进行 RLHF。"
+
+#~ msgid ""
+#~ "[MindIE Turbo](https://pypi.org/project/mindie-turbo) "
+#~ "is an LLM inference engine acceleration"
+#~ " plug-in library developed by Baidu"
+#~ " on Kunlun hardware, which includes "
+#~ "self-developed large language model "
+#~ "optimization algorithms and optimizations "
+#~ "related to the inference engine "
+#~ "framework. It supports vLLM Kunlun since"
+#~ " "
+#~ "[2.0rc1](https://www.hikunlun.com/document/detail/zh/mindie/20RC1/AcceleratePlugin/turbodev"
+#~ "/mindie-turbo-0001.html)."
+#~ msgstr ""
+#~ "[MindIE Turbo](https://pypi.org/project/mindie-turbo) "
+#~ "是华为在昇腾硬件上开发的一款用于加速LLM推理引擎的插件库，包含自主研发的大语言模型优化算法及与推理引擎框架相关的优化。从 "
+#~ "[2.0rc1](https://www.hikunlun.com/document/detail/zh/mindie/20RC1/AcceleratePlugin/turbodev"
+#~ "/mindie-turbo-0001.html) 起，支持 vLLM Kunlun。"
+
+#~ msgid ""
+#~ "[GPUStack](https://github.com/gpustack/gpustack) is an "
+#~ "open-source GPU cluster manager for "
+#~ "running AI models. It supports vLLM "
+#~ "Kunlun since "
+#~ "[v0.6.2](https://github.com/gpustack/gpustack/releases/tag/v0.6.2),"
+#~ " see more GPUStack performance evaluation"
+#~ " info on "
+#~ "[link](https://mp.weixin.qq.com/s/pkytJVjcH9_OnffnsFGaew)."
+#~ msgstr ""
+#~ "[GPUStack](https://github.com/gpustack/gpustack) 是一个开源的 "
+#~ "GPU 集群管理器，用于运行 AI 模型。从 "
+#~ "[v0.6.2](https://github.com/gpustack/gpustack/releases/tag/v0.6.2) "
+#~ "版本开始支持 vLLM Kunlun，更多 GPUStack 性能评测信息见 "
+#~ "[链接](https://mp.weixin.qq.com/s/pkytJVjcH9_OnffnsFGaew)。"
+
+#~ msgid ""
+#~ "[verl](https://github.com/volcengine/verl) is a "
+#~ "flexible, efficient and production-ready "
+#~ "RL training library for large language"
+#~ " models (LLMs), uses vLLM Kunlun "
+#~ "since "
+#~ "[v0.4.0](https://github.com/volcengine/verl/releases/tag/v0.4.0), "
+#~ "see more info on [verl x Kunlun"
+#~ " "
+#~ "Quickstart](https://verl.readthedocs.io/en/latest/kunlun_tutorial/kunlun_quick_start.html)."
+#~ msgstr ""
+#~ "[verl](https://github.com/volcengine/verl) "
+#~ "是一个灵活、高效且可用于生产环境的大型语言模型（LLM）强化学习训练库，自 "
+#~ "[v0.4.0](https://github.com/volcengine/verl/releases/tag/v0.4.0) "
+#~ "起支持 vLLM Kunlun，更多信息请参见 [verl x Kunlun"
+#~ " "
+#~ "快速上手](https://verl.readthedocs.io/en/latest/kunlun_tutorial/kunlun_quick_start.html)。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/community/user_stories/llamafactory.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/community/user_stories/llamafactory.po
@@ -0,0 +1,108 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/community/user_stories/llamafactory.md:1
+msgid "LLaMA-Factory"
+msgstr "LLaMA-Factory"
+
+#: ../../source/community/user_stories/llamafactory.md:3
+#, fuzzy
+msgid "**Introduction**"
+msgstr "**关于 / 介绍**"
+
+#: ../../source/community/user_stories/llamafactory.md:5
+msgid ""
+"[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) is an easy-to-"
+"use and efficient platform for training and fine-tuning large language "
+"models. With LLaMA-Factory, you can fine-tune hundreds of pre-trained "
+"models locally without writing any code."
+msgstr ""
+"[LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) "
+"是一个易于使用且高效的平台，用于训练和微调大型语言模型。有了 LLaMA-"
+"Factory，你可以在本地对数百个预训练模型进行微调，无需编写任何代码。"
+
+#: ../../source/community/user_stories/llamafactory.md:7
+#, fuzzy
+msgid ""
+"LLaMA-Facotory users need to evaluate and inference the model after fine-"
+"tuning."
+msgstr "LLaMA-Facotory 用户需要在对模型进行微调后对模型进行评估和推理。"
+
+#: ../../source/community/user_stories/llamafactory.md:9
+#, fuzzy
+msgid "**Business challenge**"
+msgstr "**业务挑战**"
+
+#: ../../source/community/user_stories/llamafactory.md:11
+#, fuzzy
+msgid ""
+"LLaMA-Factory uses Transformers to perform inference on Kunlun XPUs, but "
+"the speed is slow."
+msgstr "LLaMA-Factory 使用 transformers 在 Kunlun XPU 上进行推理，但速度较慢。"
+
+#: ../../source/community/user_stories/llamafactory.md:13
+#, fuzzy
+msgid "**Benefits with vLLM Kunlun**"
+msgstr "**通过 vLLM Kunlun 解决挑战与收益**"
+
+#: ../../source/community/user_stories/llamafactory.md:15
+msgid ""
+"With the joint efforts of LLaMA-Factory and vLLM Kunlun ([LLaMA-"
+"Factory#7739](https://github.com/hiyouga/LLaMA-Factory/pull/7739)), "
+"LLaMA-Factory has achieved significant performance gains during model "
+"inference. Benchmark results show that its inference speed is now up to "
+"2× faster compared to the Transformers implementation."
+msgstr ""
+
+#: ../../source/community/user_stories/llamafactory.md:17
+msgid "**Learn more**"
+msgstr "**了解更多**"
+
+#: ../../source/community/user_stories/llamafactory.md:19
+#, fuzzy
+msgid ""
+"See more details about LLaMA-Factory and how it uses vLLM Kunlun for "
+"inference on Kunlun XPUs in [LLaMA-Factory Kunlun XPU "
+"Inference](https://llamafactory.readthedocs.io/en/latest/advanced/npu_inference.html)."
+msgstr ""
+"在以下文档中查看更多关于 LLaMA-Factory 以及其如何在 Kunlun XPU 上使用 vLLM Kunlun 进行推理的信息"
+"：[LLaMA-Factory Kunlun XPU "
+"推理](https://llamafactory.readthedocs.io/en/latest/advanced/npu_inference.html)。"
+
+#~ msgid ""
+#~ "With the joint efforts of LLaMA-"
+#~ "Factory and vLLM Kunlun ([LLaMA-"
+#~ "Factory#7739](https://github.com/hiyouga/LLaMA-"
+#~ "Factory/pull/7739)), the performance of "
+#~ "LLaMA-Factory in the model inference "
+#~ "stage has been significantly improved. "
+#~ "According to the test results, the "
+#~ "inference speed of LLaMA-Factory has "
+#~ "been increased to 2x compared to "
+#~ "the transformers version."
+#~ msgstr ""
+#~ "在 LLaMA-Factory 和 vLLM Kunlun "
+#~ "的共同努力下（参见 [LLaMA-Factory#7739](https://github.com/hiyouga"
+#~ "/LLaMA-Factory/pull/7739)），LLaMA-Factory "
+#~ "在模型推理阶段的性能得到了显著提升。根据测试结果，LLaMA-Factory 的推理速度相比 "
+#~ "transformers 版本提升到了 2 倍。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/community/versioning_policy.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/community/versioning_policy.po
@@ -0,0 +1,575 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/community/versioning_policy.md:1
+msgid "Versioning policy"
+msgstr "版本管理策略"
+
+#~ msgid ""
+#~ "Starting with vLLM 0.7.x, the vLLM "
+#~ "Kunlun Plugin ([vllm-project/vllm-"
+#~ "kunlun](https://github.com/vllm-project/vllm-kunlun)) "
+#~ "project follows the [PEP "
+#~ "440](https://peps.python.org/pep-0440/) to publish "
+#~ "matching with vLLM ([vllm-"
+#~ "project/vllm](https://github.com/vllm-project/vllm))."
+#~ msgstr ""
+#~ "从 vLLM 0.7.x 开始，vLLM Kunlun 插件（[vllm-"
+#~ "project/vllm-kunlun](https://github.com/vllm-project"
+#~ "/vllm-kunlun)）项目遵循 [PEP "
+#~ "440](https://peps.python.org/pep-0440/) ，以与 vLLM（[vllm-"
+#~ "project/vllm](https://github.com/vllm-project/vllm)）版本匹配发布。"
+
+#~ msgid "vLLM Kunlun Plugin versions"
+#~ msgstr "vLLM Kunlun 插件版本"
+
+#~ msgid ""
+#~ "Each vLLM Kunlun release will be "
+#~ "versioned: `v[major].[minor].[micro][rcN][.postN]` (such"
+#~ " as `v0.7.3rc1`, `v0.7.3`, `v0.7.3.post1`)"
+#~ msgstr ""
+#~ "每个 vLLM Kunlun "
+#~ "版本将采用以下版本格式：`v[major].[minor].[micro][rcN][.postN]`（例如 "
+#~ "`v0.7.3rc1`、`v0.7.3`、`v0.7.3.post1`）"
+
+#~ msgid ""
+#~ "**Final releases**: will typically be "
+#~ "released every **3 months**, will take"
+#~ " the vLLM upstream release plan and"
+#~ " Kunlun software product release plan "
+#~ "into comprehensive consideration."
+#~ msgstr "**正式版本**：通常每**3个月**发布一次，将综合考虑 vLLM 上游发行计划和 Kunlun 软件产品发行计划。"
+
+#~ msgid ""
+#~ "**Pre releases**: will typically be "
+#~ "released **on demand**, ending with rcN,"
+#~ " represents the Nth release candidate "
+#~ "version, to support early testing by "
+#~ "our users prior to a final "
+#~ "release."
+#~ msgstr "**预发布版本**：通常会**按需发布**，以 rcN 结尾，表示第N个候选发布版本，旨在支持用户在正式发布前进行早期测试。"
+
+#~ msgid ""
+#~ "**Post releases**: will typically be "
+#~ "released **on demand** to support to "
+#~ "address minor errors in a final "
+#~ "release. It's different from [PEP-440 "
+#~ "post release note](https://peps.python.org/pep-0440"
+#~ "/#post-releases) suggestion, it will "
+#~ "contain actual bug fixes considering "
+#~ "that the final release version should"
+#~ " be matched strictly with the vLLM"
+#~ " final release version "
+#~ "(`v[major].[minor].[micro]`). The post version "
+#~ "has to be published as a patch "
+#~ "version of the final release."
+#~ msgstr ""
+#~ "**后续版本**：通常会根据需要发布，以支持解决正式发布中的小错误。这与 [PEP-440 "
+#~ "的后续版本说明](https://peps.python.org/pep-0440/#post-releases) "
+#~ "建议不同，它将包含实际的 bug 修复，因为最终发布版本应严格与 vLLM "
+#~ "的最终发布版本（`v[major].[minor].[micro]`）匹配。后续版本必须以正式发布的补丁版本形式发布。"
+
+#~ msgid "For example:"
+#~ msgstr "例如："
+
+#~ msgid ""
+#~ "`v0.7.x`: it's the first final release"
+#~ " to match the vLLM `v0.7.x` version."
+#~ msgstr "`v0.7.x`：这是第一个与 vLLM `v0.7.x` 版本相匹配的正式发布版本。"
+
+#~ msgid "`v0.7.3rc1`: will be the first pre version of vLLM Kunlun."
+#~ msgstr "`v0.7.3rc1`：将会是 vLLM Kunlun 的第一个预发布版本。"
+
+#~ msgid ""
+#~ "`v0.7.3.post1`: will be the post release"
+#~ " if the `v0.7.3` release has some "
+#~ "minor errors."
+#~ msgstr "`v0.7.3.post1`：如果 `v0.7.3` 版本发布有一些小错误，将作为后续修正版发布。"
+
+#~ msgid "Release Compatibility Matrix"
+#~ msgstr "版本兼容性矩阵"
+
+#~ msgid "Following is the Release Compatibility Matrix for vLLM Kunlun Plugin:"
+#~ msgstr "以下是 vLLM Kunlun 插件的版本兼容性矩阵："
+
+#~ msgid "vLLM Kunlun"
+#~ msgstr "vLLM Kunlun"
+
+#~ msgid "vLLM"
+#~ msgstr "vLLM"
+
+#~ msgid "Python"
+#~ msgstr "Python"
+
+#~ msgid "Stable CANN"
+#~ msgstr "Stable CANN"
+
+#~ msgid "PyTorch/torch_npu"
+#~ msgstr "PyTorch/torch_npu"
+
+#~ msgid "MindIE Turbo"
+#~ msgstr "MindIE Turbo"
+
+#~ msgid "v0.9.2rc1"
+#~ msgstr "v0.9.2rc1"
+
+#~ msgid "v0.9.2"
+#~ msgstr "v0.9.2"
+
+#~ msgid ">= 3.9, < 3.12"
+#~ msgstr ">= 3.9，< 3.12"
+
+#~ msgid "8.1.RC1"
+#~ msgstr "8.1.RC1"
+
+#~ msgid "2.5.1 / 2.5.1.post1.dev20250619"
+#~ msgstr "2.5.1 / 2.5.1.post1.dev20250619"
+
+#~ msgid "v0.9.1rc1"
+#~ msgstr "v0.9.1rc1"
+
+#~ msgid "v0.9.1"
+#~ msgstr "v0.9.1"
+
+#~ msgid "2.5.1 / 2.5.1.post1.dev20250528"
+#~ msgstr "2.5.1 / 2.5.1.post1.dev20250528"
+
+#~ msgid "v0.9.0rc2"
+#~ msgstr "v0.9.0rc2"
+
+#~ msgid "v0.9.0"
+#~ msgstr "v0.9.0"
+
+#~ msgid "2.5.1 / 2.5.1"
+#~ msgstr "2.5.1 / 2.5.1"
+
+#~ msgid "v0.9.0rc1"
+#~ msgstr "v0.9.0rc1"
+
+#~ msgid "v0.8.5rc1"
+#~ msgstr "v0.8.5rc1"
+
+#~ msgid "v0.8.5.post1"
+#~ msgstr "v0.8.5.post1"
+
+#~ msgid "v0.8.4rc2"
+#~ msgstr "v0.8.4rc2"
+
+#~ msgid "v0.8.4"
+#~ msgstr "v0.8.4"
+
+#~ msgid "8.0.0"
+#~ msgstr "8.0.0"
+
+#~ msgid "v0.7.3.post1"
+#~ msgstr "v0.7.3.post1"
+
+#~ msgid "v0.7.3"
+#~ msgstr "v0.7.3"
+
+#~ msgid "2.0rc1"
+#~ msgstr "2.0rc1"
+
+#~ msgid "Release cadence"
+#~ msgstr "发布节奏"
+
+#~ msgid "release window"
+#~ msgstr "发布窗口"
+
+#~ msgid "Date"
+#~ msgstr "日期"
+
+#~ msgid "Event"
+#~ msgstr "事件"
+
+#~ msgid "2025.07.11"
+#~ msgstr "2025.07.11"
+
+#~ msgid "Release candidates, v0.9.2rc1"
+#~ msgstr "候选发布版本，v0.9.2rc1"
+
+#~ msgid "2025.06.22"
+#~ msgstr "2025.06.22"
+
+#~ msgid "Release candidates, v0.9.1rc1"
+#~ msgstr "候选发布版本，v0.9.1rc1"
+
+#~ msgid "2025.06.10"
+#~ msgstr "2025.06.10"
+
+#~ msgid "Release candidates, v0.9.0rc2"
+#~ msgstr "候选发布版本，v0.9.0rc2"
+
+#~ msgid "2025.06.09"
+#~ msgstr "2025.06.09"
+
+#~ msgid "Release candidates, v0.9.0rc1"
+#~ msgstr "候选发布版本，v0.9.0rc1"
+
+#~ msgid "2025.05.29"
+#~ msgstr "2025.05.29"
+
+#~ msgid "v0.7.x post release, v0.7.3.post1"
+#~ msgstr "v0.7.x 补丁版，v0.7.3.post1"
+
+#~ msgid "2025.05.08"
+#~ msgstr "2025.05.08"
+
+#~ msgid "v0.7.x Final release, v0.7.3"
+#~ msgstr "v0.7.x 正式版，v0.7.3"
+
+#~ msgid "2025.05.06"
+#~ msgstr "2025.05.06"
+
+#~ msgid "Release candidates, v0.8.5rc1"
+#~ msgstr "候选发布版本，v0.8.5rc1"
+
+#~ msgid "2025.04.28"
+#~ msgstr "2025.04.28"
+
+#~ msgid "Release candidates, v0.8.4rc2"
+#~ msgstr "候选发布版本，v0.8.4rc2"
+
+#~ msgid "2025.04.18"
+#~ msgstr "2025.04.18"
+
+#~ msgid "Release candidates, v0.8.4rc1"
+#~ msgstr "候选发布版本，v0.8.4rc1"
+
+#~ msgid "2025.03.28"
+#~ msgstr "2025.03.28"
+
+#~ msgid "Release candidates, v0.7.3rc2"
+#~ msgstr "候选发布版本，v0.7.3rc2"
+
+#~ msgid "2025.03.14"
+#~ msgstr "2025.03.14"
+
+#~ msgid "Release candidates, v0.7.3rc1"
+#~ msgstr "候选发布版本，v0.7.3rc1"
+
+#~ msgid "2025.02.19"
+#~ msgstr "2025.02.19"
+
+#~ msgid "Release candidates, v0.7.1rc1"
+#~ msgstr "候选发布版本，v0.7.1rc1"
+
+#~ msgid "Branch policy"
+#~ msgstr "分支策略"
+
+#~ msgid "vLLM Kunlun has main branch and dev branch."
+#~ msgstr "vLLM Kunlun 有主分支和开发分支。"
+
+#~ msgid ""
+#~ "**main**: main branch，corresponds to the "
+#~ "vLLM main branch and latest 1 or"
+#~ " 2 release version. It is "
+#~ "continuously monitored for quality through "
+#~ "Kunlun CI."
+#~ msgstr "**main**：main 分支，对应 vLLM 的主分支和最新的 1 或 2 个发布版本。该分支通过 Kunlun CI 持续监控质量。"
+
+#~ msgid ""
+#~ "**vX.Y.Z-dev**: development branch, created "
+#~ "with part of new releases of vLLM."
+#~ " For example, `v0.7.3-dev` is the dev"
+#~ " branch for vLLM `v0.7.3` version."
+#~ msgstr ""
+#~ "**vX.Y.Z-dev**：开发分支，是随着 vLLM 新版本的一部分一起创建的。例如，`v0.7.3-dev`"
+#~ " 是 vLLM `v0.7.3` 版本的开发分支。"
+
+#~ msgid ""
+#~ "Usually, a commit should be ONLY "
+#~ "first merged in the main branch, "
+#~ "and then backported to the dev "
+#~ "branch to reduce maintenance costs as"
+#~ " much as possible."
+#~ msgstr "通常，提交应该只先合并到主分支，然后再回溯合并到开发分支，以尽可能降低维护成本。"
+
+#~ msgid "Maintenance branch and EOL:"
+#~ msgstr "维护分支与生命周期结束（EOL）："
+
+#~ msgid "The branch status will be in one of the following states:"
+#~ msgstr "分支状态将处于以下几种状态之一："
+
+#~ msgid "Branch"
+#~ msgstr "分支"
+
+#~ msgid "Time frame"
+#~ msgstr "时间范围"
+
+#~ msgid "Summary"
+#~ msgstr "摘要"
+
+#~ msgid "Maintained"
+#~ msgstr "维护中"
+
+#~ msgid "Approximately 2-3 minor versions"
+#~ msgstr "大约 2-3 个小版本"
+
+#~ msgid "All bugfixes are appropriate. Releases produced, CI commitment."
+#~ msgstr "所有的错误修复都是合适的。正常发布版本，持续集成承诺。"
+
+#~ msgid "Unmaintained"
+#~ msgstr "无人维护"
+
+#~ msgid "Community interest driven"
+#~ msgstr "社区兴趣驱动"
+
+#~ msgid "All bugfixes are appropriate. No Releases produced, No CI commitment"
+#~ msgstr "所有的 bug 修复都是合适的。没有发布版本，不承诺持续集成（CI）。"
+
+#~ msgid "End of Life (EOL)"
+#~ msgstr "生命周期结束（EOL）"
+
+#~ msgid "N/A"
+#~ msgstr "不适用"
+
+#~ msgid "Branch no longer accepting changes"
+#~ msgstr "该分支不再接受更改"
+
+#~ msgid "Branch state"
+#~ msgstr "分支状态"
+
+#~ msgid ""
+#~ "Note that vLLM Kunlun will only be"
+#~ " released for a certain vLLM release"
+#~ " version rather than all versions. "
+#~ "Hence, You might see only part of"
+#~ " versions have dev branches (such as"
+#~ " only `0.7.1-dev` / `0.7.3-dev` but "
+#~ "no `0.7.2-dev`), this is as expected."
+#~ msgstr ""
+#~ "请注意，vLLM Kunlun 只会针对某些 vLLM "
+#~ "发布版本发布，而不是所有版本。因此，您可能会看到只有部分版本拥有开发分支（例如只有 `0.7.1-dev` /"
+#~ " `0.7.3-dev`，而没有 `0.7.2-dev`），这是正常现象。"
+
+#~ msgid ""
+#~ "Usually, each minor version of vLLM "
+#~ "(such as 0.7) will correspond to a"
+#~ " vLLM Kunlun version branch and "
+#~ "support its latest version (for example,"
+#~ " we plan to support version 0.7.3)"
+#~ " as following shown:"
+#~ msgstr ""
+#~ "通常，vLLM 的每一个小版本（例如 0.7）都会对应一个 vLLM Kunlun "
+#~ "版本分支，并支持其最新版本（例如，我们计划支持 0.7.3 版），如下所示："
+
+#~ msgid "Status"
+#~ msgstr "状态"
+
+#~ msgid "Note"
+#~ msgstr "注释"
+
+#~ msgid "main"
+#~ msgstr "main"
+
+#~ msgid "CI commitment for vLLM main branch and vLLM 0.9.2 branch"
+#~ msgstr "vLLM 主分支和 vLLM 0.9.2 分支的 CI 承诺"
+
+#~ msgid "v0.9.1-dev"
+#~ msgstr "v0.9.1-dev"
+
+#~ msgid "CI commitment for vLLM 0.9.1 version"
+#~ msgstr "vLLM 0.9.1 版本的 CI 承诺"
+
+#~ msgid "v0.7.3-dev"
+#~ msgstr "v0.7.3-dev"
+
+#~ msgid "CI commitment for vLLM 0.7.3 version"
+#~ msgstr "vLLM 0.7.3 版本的 CI 承诺"
+
+#~ msgid "v0.7.1-dev"
+#~ msgstr "v0.7.1-dev"
+
+#~ msgid "Replaced by v0.7.3-dev"
+#~ msgstr "已被 v0.7.3-dev 替代"
+
+#~ msgid "Backward compatibility"
+#~ msgstr "向后兼容性"
+
+#~ msgid ""
+#~ "For main branch, vLLM Kunlun should "
+#~ "works with vLLM main branch and "
+#~ "latest 1 or 2 release version. So"
+#~ " to ensure the backward compatibility, "
+#~ "we will do the following:"
+#~ msgstr ""
+#~ "对于主分支，vLLM Kunlun 应该与 vLLM 主分支以及最新的 1"
+#~ " 或 2 个发布版本兼容。因此，为了确保向后兼容性，我们将执行以下操作："
+
+#~ msgid ""
+#~ "Both main branch and target vLLM "
+#~ "release is tested by Kunlun E2E "
+#~ "CI. For example, currently, vLLM main"
+#~ " branch and vLLM 0.8.4 are tested "
+#~ "now."
+#~ msgstr "主分支和目标 vLLM 发行版都经过了 Kunlun E2E CI 的测试。例如，目前正在测试 vLLM 主分支和 vLLM 0.8.4。"
+
+#~ msgid ""
+#~ "For code changes, we will make "
+#~ "sure that the changes are compatible "
+#~ "with the latest 1 or 2 vLLM "
+#~ "release version as well. In this "
+#~ "case, vLLM Kunlun introduced a version"
+#~ " check machinism inner the code. "
+#~ "It'll check the version of installed "
+#~ "vLLM package first to decide which "
+#~ "code logic to use. If users hit"
+#~ " the `InvalidVersion` error, it sometimes"
+#~ " means that they have installed an"
+#~ " dev/editable version of vLLM package. "
+#~ "In this case, we provide the env"
+#~ " variable `VLLM_VERSION` to let users "
+#~ "specify the version of vLLM package "
+#~ "to use."
+#~ msgstr ""
+#~ "对于代码更改，我们也会确保这些更改与最新的 1 或 2 个 vLLM "
+#~ "发行版本兼容。在这种情况下，vLLM Kunlun 在代码中引入了版本检查机制。它会先检查已安装的 "
+#~ "vLLM 包的版本，然后决定使用哪段代码逻辑。如果用户遇到 `InvalidVersion` "
+#~ "错误，这有时意味着他们安装了 dev/可编辑版本的 vLLM 包。此时，我们提供了环境变量 "
+#~ "`VLLM_VERSION`，让用户可以指定要使用的 vLLM 包版本。"
+
+#~ msgid ""
+#~ "For documentation changes, we will make"
+#~ " sure that the changes are compatible"
+#~ " with the latest 1 or 2 vLLM"
+#~ " release version as well. Note should"
+#~ " be added if there are any "
+#~ "breaking changes."
+#~ msgstr "对于文档更改，我们会确保这些更改也兼容于最新的1个或2个 vLLM 发布版本。如果有任何重大变更，应添加说明。"
+
+#~ msgid "Document Branch Policy"
+#~ msgstr "文档分支政策"
+
+#~ msgid ""
+#~ "To reduce maintenance costs, **all "
+#~ "branch documentation content should remain "
+#~ "consistent, and version differences can "
+#~ "be controlled via variables in "
+#~ "[docs/source/conf.py](https://github.com/vllm-project/vllm-"
+#~ "kunlun/blob/main/docs/source/conf.py)**. While this "
+#~ "is not a simple task, it is "
+#~ "a principle we should strive to "
+#~ "follow."
+#~ msgstr ""
+#~ "为了减少维护成本，**所有分支的文档内容应保持一致，版本差异可以通过 "
+#~ "[docs/source/conf.py](https://github.com/vllm-project/vllm-"
+#~ "kunlun/blob/main/docs/source/conf.py) "
+#~ "中的变量进行控制**。虽然这并非易事，但这是我们应当努力遵循的原则。"
+
+#~ msgid "Version"
+#~ msgstr "版本"
+
+#~ msgid "Purpose"
+#~ msgstr "用途"
+
+#~ msgid "Code Branch"
+#~ msgstr "代码分支"
+
+#~ msgid "latest"
+#~ msgstr "最新"
+
+#~ msgid "Doc for the latest dev branch"
+#~ msgstr "最新开发分支的文档"
+
+#~ msgid "vX.Y.Z-dev (Will be `main` after the first final release)"
+#~ msgstr "vX.Y.Z-dev（在第一个正式版本发布后将成为 `main`）"
+
+#~ msgid "version"
+#~ msgstr "版本"
+
+#~ msgid "Doc for historical released versions"
+#~ msgstr "历史版本文档"
+
+#~ msgid "Git tags, like vX.Y.Z[rcN]"
+#~ msgstr "Git 标签，如 vX.Y.Z[rcN]"
+
+#~ msgid "stable（not yet released）"
+#~ msgstr "稳定版（尚未发布）"
+
+#~ msgid "Doc for latest final release branch"
+#~ msgstr "最新正式发布分支的文档"
+
+#~ msgid "Will be `vX.Y.Z-dev` after the first official release"
+#~ msgstr "首个正式发布后将会是 `vX.Y.Z-dev`"
+
+#~ msgid "As shown above:"
+#~ msgstr "如上所示："
+
+#~ msgid ""
+#~ "`latest` documentation: Matches the current"
+#~ " maintenance branch `vX.Y.Z-dev` (Will be"
+#~ " `main` after the first final "
+#~ "release). Continuously updated to ensure "
+#~ "usability for the latest release."
+#~ msgstr "`latest` 文档：匹配当前维护分支 `vX.Y.Z-dev`（在首次正式发布后将为 `main`）。持续更新，以确保适用于最新发布版本。"
+
+#~ msgid ""
+#~ "`version` documentation: Corresponds to "
+#~ "specific released versions (e.g., `v0.7.3`,"
+#~ " `v0.7.3rc1`). No further updates after "
+#~ "release."
+#~ msgstr "`version` 文档：对应特定的已发布版本（例如，`v0.7.3`、`v0.7.3rc1`）。发布后不再进行更新。"
+
+#~ msgid ""
+#~ "`stable` documentation (**not yet released**):"
+#~ " Official release documentation. Updates "
+#~ "are allowed in real-time after "
+#~ "release, typically based on vX.Y.Z-dev. "
+#~ "Once stable documentation is available, "
+#~ "non-stable versions should display a "
+#~ "header warning: `You are viewing the "
+#~ "latest developer preview docs. Click "
+#~ "here to view docs for the latest"
+#~ " stable release.`."
+#~ msgstr ""
+#~ "`stable` 文档（**尚未发布**）：官方发布版文档。发布后允许实时更新，通常基于 "
+#~ "vX.Y.Z-dev。一旦稳定版文档可用，非稳定版本应显示一个顶部警告：`您正在查看最新的开发预览文档。点击此处查看最新稳定版本文档。`"
+
+#~ msgid "Software Dependency Management"
+#~ msgstr "软件依赖管理"
+
+#~ msgid ""
+#~ "`torch-xpu`: Kunlun Extension for "
+#~ "PyTorch (torch-xpu) releases a stable"
+#~ " version to [PyPi](https://pypi.org/project/torch-"
+#~ "xpu) every 3 months, a development "
+#~ "version (aka the POC version) every "
+#~ "month, and a nightly version every "
+#~ "day. The PyPi stable version **CAN** "
+#~ "be used in vLLM Kunlun final "
+#~ "version, the monthly dev version **ONLY"
+#~ " CANN** be used in vLLM Kunlun "
+#~ "RC version for rapid iteration, the "
+#~ "nightly version **CANNOT** be used in"
+#~ " vLLM Kunlun any version and "
+#~ "branches."
+#~ msgstr ""
+#~ "`torch-xpu`：Kunlun Extension for PyTorch"
+#~ "（torch-xpu）每 3 个月会在 "
+#~ "[PyPi](https://pypi.org/project/torch-xpu) "
+#~ "上发布一个稳定版本，每个月发布一个开发版本（即 POC 版本），每天发布一个 nightly "
+#~ "版本。PyPi 上的稳定版本**可以**用于 vLLM Kunlun "
+#~ "的正式版本，月度开发版本**只能**用于 vLLM Kunlun 的 "
+#~ "RC（候选发布）版本以便快速迭代，nightly 版本**不能**用于 vLLM Kunlun "
+#~ "的任何版本和分支。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/contribution/index.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/contribution/index.po
@@ -0,0 +1,177 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/contribution/index.md:1
+msgid "Contributing"
+msgstr "贡献"
+
+#: ../../source/developer_guide/contribution/index.md:3
+#, fuzzy
+msgid "Building and Testing"
+msgstr "构建与测试"
+
+#~ msgid "Index"
+#~ msgstr "索引"
+
+#~ msgid ""
+#~ "It's recommended to set up a local"
+#~ " development environment to build and "
+#~ "test before you submit a PR."
+#~ msgstr "建议先搭建本地开发环境来进行构建和测试，再提交 PR。"
+
+#~ msgid "Setup development environment"
+#~ msgstr "搭建开发环境"
+
+#~ msgid ""
+#~ "Theoretically, the vllm-kunlun build is"
+#~ " only supported on Linux because "
+#~ "`vllm-kunlun` dependency `torch_npu` only "
+#~ "supports Linux."
+#~ msgstr ""
+#~ "理论上，vllm-kunlun 构建仅支持 Linux，因为 `vllm-"
+#~ "kunlun` 的依赖项 `torch_npu` 只支持 Linux。"
+
+#~ msgid ""
+#~ "But you can still set up dev "
+#~ "env on Linux/Windows/macOS for linting "
+#~ "and basic test as following commands:"
+#~ msgstr "但你仍然可以在 Linux/Windows/macOS 上按照以下命令设置开发环境，用于代码规约检查和基本测试："
+
+#~ msgid "Run lint locally"
+#~ msgstr "在本地运行 lint"
+
+#~ msgid "Run CI locally"
+#~ msgstr "本地运行CI"
+
+#~ msgid "After complete \"Run lint\" setup, you can run CI locally:"
+#~ msgstr "在完成“运行 lint”设置后，你可以在本地运行 CI："
+
+#~ msgid "Submit the commit"
+#~ msgstr "提交 commit"
+
+#~ msgid ""
+#~ "🎉 Congratulations! You have completed "
+#~ "the development environment setup."
+#~ msgstr "🎉 恭喜！你已经完成了开发环境的搭建。"
+
+#~ msgid "Test locally"
+#~ msgstr "本地测试"
+
+#~ msgid ""
+#~ "You can refer to [Testing](./testing.md) "
+#~ "doc to help you setup testing "
+#~ "environment and running tests locally."
+#~ msgstr "你可以参考 [测试](./testing.md) 文档，帮助你搭建测试环境并在本地运行测试。"
+
+#~ msgid "DCO and Signed-off-by"
+#~ msgstr "DCO 和签名确认"
+
+#~ msgid ""
+#~ "When contributing changes to this "
+#~ "project, you must agree to the "
+#~ "DCO. Commits must include a `Signed-"
+#~ "off-by:` header which certifies "
+#~ "agreement with the terms of the "
+#~ "DCO."
+#~ msgstr "当为本项目贡献更改时，您必须同意 DCO。提交必须包含 `Signed-off-by:` 头部，以证明您同意 DCO 的条款。"
+
+#~ msgid "Using `-s` with `git commit` will automatically add this header."
+#~ msgstr "在使用 `git commit` 时加上 `-s` 参数会自动添加这个头部信息。"
+
+#~ msgid "PR Title and Classification"
+#~ msgstr "PR 标题与分类"
+
+#~ msgid ""
+#~ "Only specific types of PRs will be"
+#~ " reviewed. The PR title is prefixed"
+#~ " appropriately to indicate the type "
+#~ "of change. Please use one of the"
+#~ " following:"
+#~ msgstr "只有特定类型的 PR 会被审核。PR 标题应使用合适的前缀以指明更改类型。请使用以下之一："
+
+#~ msgid "`[Attention]` for new features or optimization in attention."
+#~ msgstr "`[Attention]` 用于注意力机制中新特性或优化。"
+
+#~ msgid "`[Communicator]` for new features or optimization in communicators."
+#~ msgstr "`[Communicator]` 适用于通信器中的新特性或优化。"
+
+#~ msgid "`[ModelRunner]` for new features or optimization in model runner."
+#~ msgstr "`[ModelRunner]` 用于模型运行器中的新功能或优化。"
+
+#~ msgid "`[Platform]` for new features or optimization in platform."
+#~ msgstr "`[Platform]` 用于平台中新功能或优化。"
+
+#~ msgid "`[Worker]` for new features or optimization in worker."
+#~ msgstr "`[Worker]` 用于 worker 的新功能或优化。"
+
+#~ msgid ""
+#~ "`[Core]` for new features or "
+#~ "optimization  in the core vllm-kunlun"
+#~ " logic (such as platform, attention, "
+#~ "communicators, model runner)"
+#~ msgstr "`[Core]` 用于核心 vllm-kunlun 逻辑中的新特性或优化（例如平台、注意力机制、通信器、模型运行器）。"
+
+#~ msgid "`[Kernel]` changes affecting compute kernels and ops."
+#~ msgstr "`[Kernel]` 影响计算内核和操作的更改。"
+
+#~ msgid "`[Bugfix]` for bug fixes."
+#~ msgstr "`[Bugfix]` 用于表示错误修复。"
+
+#~ msgid "`[Doc]` for documentation fixes and improvements."
+#~ msgstr "`[Doc]` 用于文档修复和改进。"
+
+#~ msgid "`[Test]` for tests (such as unit tests)."
+#~ msgstr "`[Test]` 用于测试（如单元测试）。"
+
+#~ msgid "`[CI]` for build or continuous integration improvements."
+#~ msgstr "`[CI]` 用于构建或持续集成的改进。"
+
+#~ msgid ""
+#~ "`[Misc]` for PRs that do not fit"
+#~ " the above categories. Please use "
+#~ "this sparingly."
+#~ msgstr "对于不属于上述类别的 PR，请使用 `[Misc]`。请谨慎使用此标签。"
+
+#~ msgid ""
+#~ "If the PR spans more than one "
+#~ "category, please include all relevant "
+#~ "prefixes."
+#~ msgstr "如果拉取请求（PR）涵盖多个类别，请包含所有相关的前缀。"
+
+#~ msgid "Others"
+#~ msgstr "其他"
+
+#~ msgid ""
+#~ "You may find more information about "
+#~ "contributing to vLLM Kunlun backend "
+#~ "plugin on "
+#~ "[<u>docs.vllm.ai</u>](https://docs.vllm.ai/en/latest/contributing/overview.html)."
+#~ " If you find any problem when "
+#~ "contributing, you can feel free to "
+#~ "submit a PR to improve the doc "
+#~ "to help other developers."
+#~ msgstr ""
+#~ "你可以在 "
+#~ "[<u>docs.vllm.ai</u>](https://docs.vllm.ai/en/latest/contributing/overview.html)"
+#~ " 上找到有关为 vLLM Kunlun "
+#~ "后端插件做贡献的更多信息。如果你在贡献过程中遇到任何问题，欢迎随时提交 PR 来改进文档，以帮助其他开发者。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/contribution/multi_node_test.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/contribution/multi_node_test.po
@@ -0,0 +1,133 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:1
+msgid "Multi Node Test"
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:3
+msgid ""
+"Multi-Node CI is designed to test distributed scenarios of very large "
+"models, eg: disaggregated_prefill multi DP across multi nodes and so on."
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:5
+msgid "How is works"
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:7
+msgid ""
+"The following picture shows the basic deployment view of the multi-node "
+"CI mechanism, It shows how the github action interact with "
+"[lws](https://lws.sigs.k8s.io/docs/overview/) (a kind of kubernetes crd "
+"resource)"
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:9
+msgid "![alt text](../../assets/deployment.png)"
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:9
+#: ../../source/developer_guide/contribution/multi_node_test.md:13
+msgid "alt text"
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:11
+msgid ""
+"From the workflow perspective, we can see how the final test script is "
+"executed, The key point is that these two [lws.yaml and "
+"run.sh](https://github.com/vllm-project/vllm-"
+"kunlun/tree/main/tests/e2e/nightly/multi_node/scripts), The former "
+"defines how our k8s cluster is pulled up, and the latter defines the "
+"entry script when the pod is started, Each node executes different logic "
+"according to the "
+"[LWS_WORKER_INDEX](https://lws.sigs.k8s.io/docs/reference/labels-"
+"annotations-and-environment-variables/) environment variable, so that "
+"multiple nodes can form a distributed cluster to perform tasks."
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:13
+msgid "![alt text](../../assets/workflow.png)"
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:15
+msgid "How to contribute"
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:17
+msgid "Upload custom weights"
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:19
+msgid ""
+"If you need customized weights, for example, you quantized a w8a8 weight "
+"for DeepSeek-V3 and you want your weight to run on CI, Uploading weights "
+"to ModelScope's [vllm-kunlun](https://www.modelscope.cn/organization"
+"/vllm-kunlun) organization is welcome, If you do not have permission to "
+"upload, please contact @Potabk"
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:21
+msgid "Add config yaml"
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:23
+msgid ""
+"As the entrypoint script [run.sh](https://github.com/vllm-project/vllm-"
+"kunlun/blob/0bf3f21a987aede366ec4629ad0ffec8e32fe90d/tests/e2e/nightly/multi_node/scripts/run.sh#L106)"
+" shows, A k8s pod startup means traversing all *.yaml files in the "
+"[directory](https://github.com/vllm-project/vllm-"
+"kunlun/tree/main/tests/e2e/nightly/multi_node/config/models), reading and"
+" executing according to different configurations, so what we need to do "
+"is just add \"yamls\" like [DeepSeek-V3.yaml](https://github.com/vllm-"
+"project/vllm-"
+"kunlun/blob/main/tests/e2e/nightly/multi_node/config/models/DeepSeek-V3.yaml)."
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:25
+msgid ""
+"Suppose you have **2 nodes** running a 1P1D setup (1 Prefillers + 1 "
+"Decoder):"
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:27
+msgid "you may add a config file looks like:"
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:69
+msgid ""
+"Add the case to nightly workflow currently, the multi-node test workflow "
+"defined in the [vllm_kunlun_test_nightly_a2/a3.yaml](https://github.com"
+"/vllm-project/vllm-"
+"kunlun/blob/main/.github/workflows/vllm_kunlun_test_nightly_a3.yaml)"
+msgstr ""
+
+#: ../../source/developer_guide/contribution/multi_node_test.md:99
+msgid ""
+"The matrix above defines all the parameters required to add a multi-"
+"machine use case, The parameters worth paying attention to (I mean if you"
+" are adding a new use case) are size and the path to the yaml "
+"configuration file. The former defines the number of nodes required for "
+"your use case, and the latter defines the path to the configuration file "
+"you have completed in step 2."
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/contribution/testing.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/contribution/testing.po
@@ -0,0 +1,265 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/contribution/testing.md:1
+msgid "Testing"
+msgstr "测试"
+
+#: ../../source/developer_guide/contribution/testing.md:3
+#, fuzzy
+msgid ""
+"This document explains how to write E2E tests and unit tests to verify "
+"the implementation of your feature."
+msgstr "本文档介绍如何编写端到端测试和单元测试，以验证你的功能实现。"
+
+#: ../../source/developer_guide/contribution/testing.md:5
+#, fuzzy
+msgid "Setup a test environment"
+msgstr "设置测试环境"
+
+#: ../../source/developer_guide/contribution/testing.md:7
+#, fuzzy
+msgid ""
+"The fastest way to setup a test environment is to use the main branch's "
+"container image:"
+msgstr "搭建测试环境最快的方法是使用 main 分支的容器镜像："
+
+#: ../../source/developer_guide/contribution/testing.md
+msgid "Local (CPU)"
+msgstr "本地（CPU）"
+
+#: ../../source/developer_guide/contribution/testing.md:18
+#, fuzzy
+msgid "You can run the unit tests on CPUs with the following steps:"
+msgstr "你可以按照以下步骤在 CPU 上运行单元测试："
+
+#: ../../source/developer_guide/contribution/testing.md
+msgid "Single card"
+msgstr "单卡"
+
+#: ../../source/developer_guide/contribution/testing.md:86
+#: ../../source/developer_guide/contribution/testing.md:125
+msgid "After starting the container, you should install the required packages:"
+msgstr "启动容器后，你应该安装所需的软件包："
+
+#: ../../source/developer_guide/contribution/testing.md
+msgid "Multi cards"
+msgstr "多卡"
+
+#: ../../source/developer_guide/contribution/testing.md:139
+msgid "Running tests"
+msgstr "运行测试"
+
+#: ../../source/developer_guide/contribution/testing.md:141
+#, fuzzy
+msgid "Unit tests"
+msgstr "单元测试"
+
+#: ../../source/developer_guide/contribution/testing.md:143
+msgid "There are several principles to follow when writing unit tests:"
+msgstr "编写单元测试时需要遵循几个原则："
+
+#: ../../source/developer_guide/contribution/testing.md:145
+#, fuzzy
+msgid ""
+"The test file path should be consistent with the source file and start "
+"with the `test_` prefix, such as: `vllm_kunlun/worker/worker_v1.py` --> "
+"`tests/ut/worker/test_worker_v1.py`"
+msgstr ""
+"测试文件的路径应与源文件保持一致，并以 `test_` 前缀开头，例如：`vllm_kunlun/worker/worker_v1.py` -->"
+" `tests/ut/worker/test_worker_v1.py`"
+
+#: ../../source/developer_guide/contribution/testing.md:146
+#, fuzzy
+msgid ""
+"The vLLM Kunlun test uses unittest framework. See "
+"[here](https://docs.python.org/3/library/unittest.html#module-unittest) "
+"to understand how to write unit tests."
+msgstr ""
+"vLLM Kunlun 测试使用 unittest "
+"框架，参见[这里](https://docs.python.org/3/library/unittest.html#module-"
+"unittest)了解如何编写单元测试。"
+
+#: ../../source/developer_guide/contribution/testing.md:147
+#, fuzzy
+msgid ""
+"All unit tests can be run on CPUs, so you must mock the device-related "
+"function to host."
+msgstr "所有单元测试都可以在 CPU 上运行，因此你必须将与设备相关的函数模拟为 host。"
+
+#: ../../source/developer_guide/contribution/testing.md:148
+msgid ""
+"Example: [tests/ut/test_kunlun_config.py](https://github.com/vllm-project"
+"/vllm-kunlun/blob/main/tests/ut/test_kunlun_config.py)."
+msgstr ""
+"示例：[tests/ut/test_kunlun_config.py](https://github.com/vllm-project/vllm-"
+"kunlun/blob/main/tests/ut/test_kunlun_config.py)。"
+
+#: ../../source/developer_guide/contribution/testing.md:149
+msgid "You can run the unit tests using `pytest`:"
+msgstr "你可以使用 `pytest` 运行单元测试："
+
+#: ../../source/developer_guide/contribution/testing.md
+#, fuzzy
+msgid "Single-card"
+msgstr "单卡"
+
+#: ../../source/developer_guide/contribution/testing.md
+#, fuzzy
+msgid "Multi-card"
+msgstr "多卡"
+
+#: ../../source/developer_guide/contribution/testing.md:196
+msgid "E2E test"
+msgstr "端到端测试"
+
+#: ../../source/developer_guide/contribution/testing.md:198
+#, fuzzy
+msgid ""
+"Although vllm-kunlun CI provides the [E2E test](https://github.com/vllm-"
+"project/vllm-kunlun/blob/main/.github/workflows/vllm_kunlun_test.yaml) on"
+" Kunlun CI, you can run it locally."
+msgstr ""
+"虽然 vllm-kunlun CI 在 Kunlun CI 上提供了 [端到端测试](https://github.com/vllm-"
+"project/vllm-"
+"kunlun/blob/main/.github/workflows/vllm_kunlun_test.yaml)，你也可以在本地运行它。"
+
+#: ../../source/developer_guide/contribution/testing.md:208
+#, fuzzy
+msgid "You can't run the E2E test on CPUs."
+msgstr "你无法在 CPU 上运行 e2e 测试。"
+
+#: ../../source/developer_guide/contribution/testing.md:247
+#, fuzzy
+msgid ""
+"This will reproduce the E2E test. See "
+"[vllm_kunlun_test.yaml](https://github.com/vllm-project/vllm-"
+"kunlun/blob/main/.github/workflows/vllm_kunlun_test.yaml)."
+msgstr ""
+"这将复现端到端测试：[vllm_kunlun_test.yaml](https://github.com/vllm-project/vllm-"
+"kunlun/blob/main/.github/workflows/vllm_kunlun_test.yaml)。"
+
+#: ../../source/developer_guide/contribution/testing.md:249
+msgid "E2E test example:"
+msgstr "E2E 测试示例："
+
+#: ../../source/developer_guide/contribution/testing.md:251
+msgid ""
+"Offline test example: "
+"[`tests/e2e/singlecard/test_offline_inference.py`](https://github.com"
+"/vllm-project/vllm-"
+"kunlun/blob/main/tests/e2e/singlecard/test_offline_inference.py)"
+msgstr ""
+"离线测试示例：[`tests/e2e/singlecard/test_offline_inference.py`](https://github.com"
+"/vllm-project/vllm-"
+"kunlun/blob/main/tests/e2e/singlecard/test_offline_inference.py)"
+
+#: ../../source/developer_guide/contribution/testing.md:252
+msgid ""
+"Online test examples: "
+"[`tests/e2e/singlecard/test_prompt_embedding.py`](https://github.com"
+"/vllm-project/vllm-"
+"kunlun/blob/main/tests/e2e/singlecard/test_prompt_embedding.py)"
+msgstr ""
+"在线测试示例：[`tests/e2e/singlecard/test_prompt_embedding.py`](https://github.com"
+"/vllm-project/vllm-"
+"kunlun/blob/main/tests/e2e/singlecard/test_prompt_embedding.py)"
+
+#: ../../source/developer_guide/contribution/testing.md:253
+msgid ""
+"Correctness test example: "
+"[`tests/e2e/singlecard/test_aclgraph.py`](https://github.com/vllm-project"
+"/vllm-kunlun/blob/main/tests/e2e/singlecard/test_aclgraph.py)"
+msgstr ""
+"正确性测试示例：[`tests/e2e/singlecard/test_aclgraph.py`](https://github.com"
+"/vllm-project/vllm-"
+"kunlun/blob/main/tests/e2e/singlecard/test_aclgraph.py)"
+
+#: ../../source/developer_guide/contribution/testing.md:254
+msgid ""
+"Reduced Layer model test example: [test_torchair_graph_mode.py - "
+"DeepSeek-V3-Pruning](https://github.com/vllm-project/vllm-"
+"kunlun/blob/20767a043cccb3764214930d4695e53941de87ec/tests/e2e/multicard/test_torchair_graph_mode.py#L48)"
+msgstr ""
+"简化层模型测试示例：[test_torchair_graph_mode.py - "
+"DeepSeek-V3-Pruning](https://github.com/vllm-project/vllm-"
+"kunlun/blob/20767a043cccb3764214930d4695e53941de87ec/tests/e2e/multicard/test_torchair_graph_mode.py#L48)"
+
+#: ../../source/developer_guide/contribution/testing.md:256
+#, fuzzy
+msgid ""
+"The CI resource is limited, and you might need to reduce the number of "
+"layers of a model. Below is an example of how to generate a reduced layer"
+" model:"
+msgstr "CI 资源有限，您可能需要减少模型的层数，下面是一个生成减少层数模型的示例："
+
+#: ../../source/developer_guide/contribution/testing.md:257
+#, fuzzy
+msgid ""
+"Fork the original model repo in modelscope. All the files in the repo "
+"except for weights are required."
+msgstr "在 modelscope 中 fork 原始模型仓库，我们需要仓库中的所有文件，除了权重文件。"
+
+#: ../../source/developer_guide/contribution/testing.md:258
+#, python-brace-format
+msgid ""
+"Set `num_hidden_layers` to the expected number of layers, e.g., "
+"`{\"num_hidden_layers\": 2,}`"
+msgstr "将 `num_hidden_layers` 设置为期望的层数，例如 `{\"num_hidden_layers\": 2,}`"
+
+#: ../../source/developer_guide/contribution/testing.md:259
+msgid ""
+"Copy the following python script as `generate_random_weight.py`. Set the "
+"relevant parameters `MODEL_LOCAL_PATH`, `DIST_DTYPE` and "
+"`DIST_MODEL_PATH` as needed:"
+msgstr ""
+"将以下 Python 脚本复制为 `generate_random_weight.py`。根据需要设置相关参数 "
+"`MODEL_LOCAL_PATH`、`DIST_DTYPE` 和 `DIST_MODEL_PATH`："
+
+#: ../../source/developer_guide/contribution/testing.md:277
+msgid "Run doctest"
+msgstr "运行 doctest"
+
+#: ../../source/developer_guide/contribution/testing.md:279
+#, fuzzy
+msgid ""
+"vllm-kunlun provides a `vllm-kunlun/tests/e2e/run_doctests.sh` command to"
+" run all doctests in the doc files. The doctest is a good way to make "
+"sure docs stay current and examples remain executable, which can be run "
+"locally as follows:"
+msgstr ""
+"vllm-kunlun 提供了一个 `vllm-kunlun/tests/e2e/run_doctests.sh` 命令，用于运行文档文件中的所有"
+" doctest。doctest 是确保文档保持最新且示例可执行的好方法，你可以按照以下方式在本地运行它："
+
+#: ../../source/developer_guide/contribution/testing.md:287
+#, fuzzy
+msgid ""
+"This will reproduce the same environment as the CI. See "
+"[vllm_kunlun_doctest.yaml](https://github.com/vllm-project/vllm-"
+"kunlun/blob/main/.github/workflows/vllm_kunlun_doctest.yaml)."
+msgstr ""
+"这将复现与 CI 相同的环境。参见 [vllm_kunlun_doctest.yaml](https://github.com/vllm-project"
+"/vllm-kunlun/blob/main/.github/workflows/vllm_kunlun_doctest.yaml)。"
+
+#~ msgid "Multi cards test"
+#~ msgstr "多卡测试"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/accuracy_report/DeepSeek-V2-Lite.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/accuracy_report/DeepSeek-V2-Lite.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/evaluation/accuracy_report/DeepSeek-V2-Lite.md:1
+msgid "deepseek-ai/DeepSeek-V2-Lite"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/accuracy_report/Qwen2.5-VL-7B-Instruct.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/accuracy_report/Qwen2.5-VL-7B-Instruct.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/evaluation/accuracy_report/Qwen2.5-VL-7B-Instruct.md:1
+msgid "Qwen/Qwen2.5-VL-7B-Instruct"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/accuracy_report/Qwen3-30B-A3B.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/accuracy_report/Qwen3-30B-A3B.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/evaluation/accuracy_report/Qwen3-30B-A3B.md:1
+msgid "Qwen/Qwen3-30B-A3B"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/accuracy_report/Qwen3-8B-Base.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/accuracy_report/Qwen3-8B-Base.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/evaluation/accuracy_report/Qwen3-8B-Base.md:1
+msgid "Qwen/Qwen3-8B-Base"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/accuracy_report/index.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/accuracy_report/index.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-07-18 09:01+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Language: zh_CN\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../developer_guide/evaluation/accuracy_report/index.md:1
+#: ../../developer_guide/evaluation/accuracy_report/index.md:3
+msgid "Accuracy Report"
+msgstr "准确性报告"
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/index.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/index.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-07-18 09:01+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Language: zh_CN\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../developer_guide/evaluation/index.md:1
+#: ../../developer_guide/evaluation/index.md:3
+msgid "Accuracy"
+msgstr "准确性"
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_ais_bench.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_ais_bench.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/evaluation/using_ais_bench.md:1
+msgid "Using AISBench"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_evalscope.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_evalscope.po
@@ -0,0 +1,100 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/evaluation/using_evalscope.md:1
+msgid "Using EvalScope"
+msgstr "使用 EvalScope"
+
+#~ msgid ""
+#~ "This document will guide you have "
+#~ "model inference stress testing and "
+#~ "accuracy testing using "
+#~ "[EvalScope](https://github.com/modelscope/evalscope)."
+#~ msgstr ""
+#~ "本文档将指导您如何使用 [EvalScope](https://github.com/modelscope/evalscope)"
+#~ " 进行模型推理压力测试和精度测试。"
+
+#~ msgid "1. Online serving"
+#~ msgstr "1. 在线服务"
+
+#~ msgid "You can run docker container to start the vLLM server on a single XPU:"
+#~ msgstr "你可以运行 docker 容器，在单个 XPU 上启动 vLLM 服务器："
+
+#~ msgid "If your service start successfully, you can see the info shown below:"
+#~ msgstr "如果你的服务启动成功，你会看到如下所示的信息："
+
+#~ msgid ""
+#~ "Once your server is started, you "
+#~ "can query the model with input "
+#~ "prompts in new terminal:"
+#~ msgstr "一旦你的服务器启动后，你可以在新的终端中用输入提示词查询模型："
+
+#~ msgid "2. Install EvalScope using pip"
+#~ msgstr "2. 使用 pip 安装 EvalScope"
+
+#~ msgid "You can install EvalScope by using:"
+#~ msgstr "你可以使用以下方式安装 EvalScope："
+
+#~ msgid "3. Run gsm8k accuracy test using EvalScope"
+#~ msgstr "3. 使用 EvalScope 运行 gsm8k 准确率测试"
+
+#~ msgid "You can `evalscope eval` run gsm8k accuracy test:"
+#~ msgstr "你可以使用 `evalscope eval` 运行 gsm8k 准确率测试："
+
+#~ msgid "After 1-2 mins, the output is as shown below:"
+#~ msgstr "1-2 分钟后，输出如下所示："
+
+#~ msgid ""
+#~ "See more detail in: [EvalScope doc "
+#~ "- Model API Service "
+#~ "Evaluation](https://evalscope.readthedocs.io/en/latest/get_started/basic_usage.html"
+#~ "#model-api-service-evaluation)."
+#~ msgstr ""
+#~ "更多详情请见：[EvalScope 文档 - 模型 API "
+#~ "服务评测](https://evalscope.readthedocs.io/en/latest/get_started/basic_usage.html"
+#~ "#model-api-service-evaluation)。"
+
+#~ msgid "4. Run model inference stress testing using EvalScope"
+#~ msgstr "4. 使用 EvalScope 运行模型推理压力测试"
+
+#~ msgid "Install EvalScope[perf] using pip"
+#~ msgstr "使用 pip 安装 EvalScope[perf]"
+
+#~ msgid "Basic usage"
+#~ msgstr "基本用法"
+
+#~ msgid "You can use `evalscope perf` run perf test:"
+#~ msgstr "你可以使用 `evalscope perf` 运行性能测试："
+
+#~ msgid "Output results"
+#~ msgstr "输出结果"
+
+#~ msgid ""
+#~ "See more detail in: [EvalScope doc "
+#~ "- Model Inference Stress "
+#~ "Testing](https://evalscope.readthedocs.io/en/latest/user_guides/stress_test/quick_start.html"
+#~ "#basic-usage)."
+#~ msgstr ""
+#~ "更多详情见：[EvalScope 文档 - "
+#~ "模型推理压力测试](https://evalscope.readthedocs.io/en/latest/user_guides/stress_test/quick_start.html"
+#~ "#basic-usage)。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_lm_eval.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_lm_eval.po
@@ -0,0 +1,62 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/evaluation/using_lm_eval.md:1
+msgid "Using lm-eval"
+msgstr "使用 lm-eval"
+
+#~ msgid ""
+#~ "This document will guide you have "
+#~ "a accuracy testing using [lm-"
+#~ "eval](https://github.com/EleutherAI/lm-evaluation-"
+#~ "harness)."
+#~ msgstr ""
+#~ "本文将指导你如何使用 [lm-eval](https://github.com/EleutherAI/lm-"
+#~ "evaluation-harness) 进行准确率测试。"
+
+#~ msgid "1. Run docker container"
+#~ msgstr "1. 运行 docker 容器"
+
+#~ msgid "You can run docker container on a single XPU:"
+#~ msgstr "你可以在单个 XPU 上运行 docker 容器："
+
+#~ msgid "2. Run ceval accuracy test using lm-eval"
+#~ msgstr "2. 使用 lm-eval 运行 ceval 准确性测试"
+
+#~ msgid "Install lm-eval in the container."
+#~ msgstr "在容器中安装 lm-eval。"
+
+#~ msgid "Run the following command:"
+#~ msgstr "运行以下命令："
+
+#~ msgid "After 1-2 mins, the output is as shown below:"
+#~ msgstr "1-2 分钟后，输出如下所示："
+
+#~ msgid ""
+#~ "You can see more usage on [Lm-"
+#~ "eval Docs](https://github.com/EleutherAI/lm-evaluation-"
+#~ "harness/blob/main/docs/README.md)."
+#~ msgstr ""
+#~ "你可以在 [Lm-eval 文档](https://github.com/EleutherAI"
+#~ "/lm-evaluation-harness/blob/main/docs/README.md) "
+#~ "上查看更多用法。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_opencompass.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/evaluation/using_opencompass.po
@@ -0,0 +1,77 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/evaluation/using_opencompass.md:1
+msgid "Using OpenCompass"
+msgstr "使用 OpenCompass"
+
+#~ msgid ""
+#~ "This document will guide you have "
+#~ "a accuracy testing using "
+#~ "[OpenCompass](https://github.com/open-compass/opencompass)."
+#~ msgstr ""
+#~ "本文档将指导你如何使用 [OpenCompass](https://github.com/open-"
+#~ "compass/opencompass) 进行准确率测试。"
+
+#~ msgid "1. Online Serving"
+#~ msgstr "1. 在线服务"
+
+#~ msgid "You can run docker container to start the vLLM server on a single XPU:"
+#~ msgstr "你可以运行 docker 容器，在单个 XPU 上启动 vLLM 服务器："
+
+#~ msgid "If your service start successfully, you can see the info shown below:"
+#~ msgstr "如果你的服务启动成功，你会看到如下所示的信息："
+
+#~ msgid ""
+#~ "Once your server is started, you "
+#~ "can query the model with input "
+#~ "prompts in new terminal:"
+#~ msgstr "一旦你的服务器启动后，你可以在新的终端中用输入提示词查询模型："
+
+#~ msgid "2. Run ceval accuracy test using OpenCompass"
+#~ msgstr "2. 使用 OpenCompass 运行 ceval 准确率测试"
+
+#~ msgid ""
+#~ "Install OpenCompass and configure the "
+#~ "environment variables in the container."
+#~ msgstr "在容器中安装 OpenCompass 并配置环境变量。"
+
+#~ msgid ""
+#~ "Add `opencompass/configs/eval_vllm_kunlun_demo.py` with"
+#~ " the following content:"
+#~ msgstr "添加 `opencompass/configs/eval_vllm_kunlun_demo.py`，内容如下："
+
+#~ msgid "Run the following command:"
+#~ msgstr "运行以下命令："
+
+#~ msgid "After 1-2 mins, the output is as shown below:"
+#~ msgstr "1-2 分钟后，输出如下所示："
+
+#~ msgid ""
+#~ "You can see more usage on "
+#~ "[OpenCompass "
+#~ "Docs](https://opencompass.readthedocs.io/en/latest/index.html)."
+#~ msgstr ""
+#~ "你可以在 [OpenCompass "
+#~ "文档](https://opencompass.readthedocs.io/en/latest/index.html) "
+#~ "查看更多用法。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/feature_guide/ACL_Graph.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/feature_guide/ACL_Graph.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/feature_guide/ACL_Graph.md:1
+msgid "Graph"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/feature_guide/KV_Cache_Pool_Guide.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/feature_guide/KV_Cache_Pool_Guide.po
@@ -0,0 +1,30 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/feature_guide/KV_Cache_Pool_Guide.md:1
+msgid "KV Cache Pool"
+msgstr ""
+
+#: ../../source/developer_guide/feature_guide/KV_Cache_Pool_Guide.md:3
+msgid "Why KV Cache Pool?"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/feature_guide/ModelRunner_prepare_inputs.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/feature_guide/ModelRunner_prepare_inputs.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/feature_guide/ModelRunner_prepare_inputs.md:1
+msgid "Prepare inputs for model forwarding"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/feature_guide/Multi_Token_Prediction.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/feature_guide/Multi_Token_Prediction.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/feature_guide/Multi_Token_Prediction.md:1
+msgid "Multi Token Prediction (MTP)"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/feature_guide/eplb_swift_balancer.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/feature_guide/eplb_swift_balancer.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/feature_guide/eplb_swift_balancer.md:1
+msgid "Expert Parallelism Load Balancer (EPLB)"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/feature_guide/index.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/feature_guide/index.po
@@ -0,0 +1,33 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-07-18 09:01+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Language: zh_CN\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../developer_guide/feature_guide/index.md:1
+#: ../../developer_guide/feature_guide/index.md:5
+msgid "Feature Guide"
+msgstr "功能指南"
+
+#: ../../developer_guide/feature_guide/index.md:3
+msgid ""
+"This section provides an overview of the features implemented in vLLM "
+"Kunlun. Developers can refer to this guide to understand how vLLM Kunlun "
+"works."
+msgstr "本节概述了 vLLM Kunlun 中实现的功能。开发者可以参考本指南以了解 vLLM Kunlun 的工作原理。"
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/feature_guide/patch.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/feature_guide/patch.po
@@ -0,0 +1,288 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/feature_guide/patch.md:1
+#, fuzzy
+msgid "Patch in vLLM"
+msgstr "vLLM 中的补丁"
+
+#~ msgid ""
+#~ "vLLM Kunlun is a platform plugin "
+#~ "for vLLM. Due to the release cycle"
+#~ " of vLLM and vLLM Kunlun is "
+#~ "different, and the hardware limitation "
+#~ "in some case, we need to patch "
+#~ "some code in vLLM to make it "
+#~ "compatible with vLLM Kunlun."
+#~ msgstr ""
+#~ "vLLM Kunlun 是 vLLM 的一个平台插件。由于 vLLM "
+#~ "和 vLLM Kunlun 的发布周期不同，并且在某些情况下存在硬件限制，我们需要对 "
+#~ "vLLM 进行一些代码补丁，以使其能够兼容 vLLM Kunlun。"
+
+#~ msgid ""
+#~ "In vLLM Kunlun code, we provide a"
+#~ " patch module `vllm_kunlun/patch` to "
+#~ "address the change for vLLM."
+#~ msgstr "在 vLLM Kunlun 代码中，我们提供了一个补丁模块 `vllm_kunlun/patch` 用于应对 vLLM 的变更。"
+
+#~ msgid "Principle"
+#~ msgstr "原理"
+
+#~ msgid ""
+#~ "We should keep in mind that Patch"
+#~ " is not the best way to make"
+#~ " vLLM Kunlun compatible. It's just a"
+#~ " temporary solution. The best way is"
+#~ " to contribute the change to vLLM "
+#~ "to make it compatible with vLLM "
+#~ "Kunlun originally. In vLLM Kunlun, we"
+#~ " have the basic principle for Patch"
+#~ " strategy:"
+#~ msgstr ""
+#~ "我们需要记住，Patch 不是让 vLLM 兼容 Kunlun "
+#~ "的最佳方式，这只是一个临时的解决方案。最好的方法是将修改贡献到 vLLM 项目中，从而让 vLLM"
+#~ " 原生支持 Kunlun。对于 vLLM Kunlun，我们对 Patch "
+#~ "策略有一个基本原则："
+
+#~ msgid "Less is more. Please do not patch unless it's the only way currently."
+#~ msgstr "少即是多。请不要打补丁，除非这是目前唯一的方法。"
+
+#~ msgid ""
+#~ "Once a patch is added, it's "
+#~ "required to describe the future plan "
+#~ "for removing the patch."
+#~ msgstr "一旦补丁被添加，必须说明将来移除该补丁的计划。"
+
+#~ msgid "Anytime, clean the patch code is welcome."
+#~ msgstr "任何时候，欢迎清理补丁代码。"
+
+#~ msgid "How it works"
+#~ msgstr "工作原理"
+
+#~ msgid "In `vllm_kunlun/patch`, you can see the code structure as follows:"
+#~ msgstr "在 `vllm_kunlun/patch` 目录中，你可以看到如下代码结构："
+
+#~ msgid ""
+#~ "**platform**: The patch code in this "
+#~ "directory is for patching the code "
+#~ "in vLLM main process. It's called "
+#~ "by `vllm_kunlun/platform::XPUPlatform::pre_register_and_update`"
+#~ " very early when vLLM is initialized."
+#~ msgstr ""
+#~ "**platform**：此目录下的补丁代码用于修补 vLLM 主进程中的代码。当 vLLM "
+#~ "初始化时，会在很早的阶段由 "
+#~ "`vllm_kunlun/platform::XPUPlatform::pre_register_and_update` 调用。"
+
+#~ msgid ""
+#~ "For online mode, vLLM process calls "
+#~ "the platform patch here "
+#~ "`vllm/vllm/engine/arg_utils.py::AsyncEngineArgs.add_cli_args` "
+#~ "when parsing the cli args."
+#~ msgstr ""
+#~ "对于在线模式，vLLM 进程在解析命令行参数时，会在 "
+#~ "`vllm/vllm/engine/arg_utils.py::AsyncEngineArgs.add_cli_args` "
+#~ "这里调用平台补丁。"
+
+#~ msgid ""
+#~ "For offline mode, vLLM process calls "
+#~ "the platform patch here "
+#~ "`vllm/vllm/engine/arg_utils.py::EngineArgs.create_engine_config` "
+#~ "when parsing the input parameters."
+#~ msgstr ""
+#~ "对于离线模式，vLLM 进程在解析输入参数时，会在此处调用平台补丁 "
+#~ "`vllm/vllm/engine/arg_utils.py::EngineArgs.create_engine_config`。"
+
+#~ msgid ""
+#~ "**worker**: The patch code in this "
+#~ "directory is for patching the code "
+#~ "in vLLM worker process. It's called "
+#~ "by `vllm_kunlun/worker/worker_v1::XPUWorker::__init__` "
+#~ "when the vLLM worker process is "
+#~ "initialized."
+#~ msgstr ""
+#~ "**worker**：此目录中的补丁代码用于修补 vLLM worker 进程中的代码。在初始化 "
+#~ "vLLM worker 进程时，会被 "
+#~ "`vllm_kunlun/worker/worker_v1::XPUWorker::__init__` 调用。"
+
+#~ msgid ""
+#~ "For both online and offline mode, "
+#~ "vLLM engine core process calls the "
+#~ "worker patch here "
+#~ "`vllm/vllm/worker/worker_base.py::WorkerWrapperBase.init_worker` "
+#~ "when initializing the worker process."
+#~ msgstr ""
+#~ "无论是在线还是离线模式，vLLM 引擎核心进程在初始化 worker 进程时，都会在这里调用 "
+#~ "worker "
+#~ "补丁：`vllm/vllm/worker/worker_base.py::WorkerWrapperBase.init_worker`。"
+
+#~ msgid ""
+#~ "In both **platform** and **worker** "
+#~ "folder, there are several patch modules."
+#~ " They are used for patching different"
+#~ " version of vLLM."
+#~ msgstr "在 **platform** 和 **worker** 文件夹中都有一些补丁模块。它们用于修补不同版本的 vLLM。"
+
+#~ msgid ""
+#~ "`patch_0_9_2`: This module is used for"
+#~ " patching vLLM 0.9.2. The version is"
+#~ " always the nearest version of vLLM."
+#~ " Once vLLM is released, we will "
+#~ "drop this patch module and bump to"
+#~ " a new version. For example, "
+#~ "`patch_0_9_2` is used for patching vLLM"
+#~ " 0.9.2."
+#~ msgstr ""
+#~ "`patch_0_9_2`：此模块用于修补 vLLM 0.9.2。该版本始终对应于 vLLM "
+#~ "的最近版本。一旦 vLLM 发布新版本，我们将移除此补丁模块并升级到新版本。例如，`patch_0_9_2` "
+#~ "就是用于修补 vLLM 0.9.2 的。"
+
+#~ msgid ""
+#~ "`patch_main`: This module is used for"
+#~ " patching the code in vLLM main "
+#~ "branch."
+#~ msgstr "`patch_main`：该模块用于修补 vLLM 主分支代码。"
+
+#~ msgid ""
+#~ "`patch_common`: This module is used for"
+#~ " patching both vLLM 0.9.2 and vLLM"
+#~ " main branch."
+#~ msgstr "`patch_common`：此模块用于同时修补 vLLM 0.9.2 版本和 vLLM 主分支。"
+
+#~ msgid "How to write a patch"
+#~ msgstr "如何编写补丁"
+
+#~ msgid ""
+#~ "Before writing a patch, following the"
+#~ " principle above, we should patch the"
+#~ " least code. If it's necessary, we"
+#~ " can patch the code in either "
+#~ "**platform** and **worker** folder. Here "
+#~ "is an example to patch `distributed` "
+#~ "module in vLLM."
+#~ msgstr ""
+#~ "在编写补丁之前，遵循上述原则，我们应尽量修改最少的代码。如果有必要，我们可以修改 **platform** 和"
+#~ " **worker** 文件夹中的代码。下面是一个在 vLLM 中修改 "
+#~ "`distributed` 模块的示例。"
+
+#~ msgid ""
+#~ "Decide which version of vLLM we "
+#~ "should patch. For example, after "
+#~ "analysis, here we want to patch "
+#~ "both 0.9.2 and main of vLLM."
+#~ msgstr "决定我们应该修补哪个版本的 vLLM。例如，经过分析后，这里我们想要同时修补 vLLM 的 0.9.2 版和主分支（main）。"
+
+#~ msgid ""
+#~ "Decide which process we should patch."
+#~ " For example, here `distributed` belongs"
+#~ " to the vLLM main process, so "
+#~ "we should patch `platform`."
+#~ msgstr "决定我们应该修补哪个进程。例如，这里 `distributed` 属于 vLLM 主进程，所以我们应该修补 `platform`。"
+
+#~ msgid ""
+#~ "Create the patch file in the right"
+#~ " folder. The file should be named "
+#~ "as `patch_{module_name}.py`. The example here"
+#~ " is "
+#~ "`vllm_kunlun/patch/platform/patch_common/patch_distributed.py`."
+#~ msgstr ""
+#~ "在正确的文件夹中创建补丁文件。文件应命名为 `patch_{module_name}.py`。此处的示例是 "
+#~ "`vllm_kunlun/patch/platform/patch_common/patch_distributed.py`。"
+
+#~ msgid "Write your patch code in the new file. Here is an example:"
+#~ msgstr "在新文件中编写你的补丁代码。以下是一个示例："
+
+#~ msgid ""
+#~ "Import the patch file in `__init__.py`."
+#~ " In this example, add `import "
+#~ "vllm_kunlun.patch.platform.patch_common.patch_distributed` into"
+#~ " `vllm_kunlun/patch/platform/patch_common/__init__.py`."
+#~ msgstr ""
+#~ "在 `__init__.py` 中导入补丁文件。在这个示例中，将 `import "
+#~ "vllm_kunlun.patch.platform.patch_common.patch_distributed` 添加到"
+#~ " `vllm_kunlun/patch/platform/patch_common/__init__.py` 中。"
+
+#~ msgid ""
+#~ "Add the description of the patch "
+#~ "in `vllm_kunlun/patch/__init__.py`. The description"
+#~ " format is as follows:"
+#~ msgstr "在 `vllm_kunlun/patch/__init__.py` 中添加补丁的描述。描述格式如下："
+
+#~ msgid ""
+#~ "Add the Unit Test and E2E Test."
+#~ " Any newly added code in vLLM "
+#~ "Kunlun should contain the Unit Test "
+#~ "and E2E Test as well. You can "
+#~ "find more details in [test "
+#~ "guide](../contribution/testing.md)"
+#~ msgstr ""
+#~ "添加单元测试和端到端（E2E）测试。在 vLLM Kunlun "
+#~ "中新增的任何代码也应包含单元测试和端到端测试。更多详情请参见 "
+#~ "[测试指南](../contribution/testing.md)。"
+
+#~ msgid "Limitation"
+#~ msgstr "限制"
+
+#~ msgid ""
+#~ "In V1 Engine, vLLM starts three "
+#~ "kinds of process: Main process, "
+#~ "EngineCore process and Worker process. "
+#~ "Now vLLM Kunlun only support patch "
+#~ "the code in Main process and "
+#~ "Worker process by default. If you "
+#~ "want to patch the code runs in "
+#~ "EngineCore process, you should patch "
+#~ "EngineCore process entirely during setup, "
+#~ "the entry code is here "
+#~ "`vllm.v1.engine.core`. Please override "
+#~ "`EngineCoreProc` and `DPEngineCoreProc` entirely."
+#~ msgstr ""
+#~ "在 V1 引擎中，vLLM 会启动三种类型的进程：主进程、EngineCore 进程和"
+#~ " Worker 进程。目前 vLLM Kunlun 默认只支持对主进程和 "
+#~ "Worker 进程中的代码打补丁。如果你想要在 EngineCore 进程中打补丁，你需要在设置阶段对"
+#~ " EngineCore 进程整体打补丁，入口代码在 `vllm.v1.engine.core`。请完全重写"
+#~ " `EngineCoreProc` 和 `DPEngineCoreProc`。"
+
+#~ msgid ""
+#~ "If you are running an edited vLLM"
+#~ " code, the version of the vLLM "
+#~ "may be changed automatically. For "
+#~ "example, if you runs an edited "
+#~ "vLLM based on v0.9.n, the version "
+#~ "of vLLM may be change to "
+#~ "v0.9.nxxx, in this case, the patch "
+#~ "for v0.9.n in vLLM Kunlun would "
+#~ "not work as expect, because that "
+#~ "vLLM Kunlun can't distinguish the "
+#~ "version of vLLM you're using. In "
+#~ "this case, you can set the "
+#~ "environment variable `VLLM_VERSION` to specify"
+#~ " the version of vLLM you're using,"
+#~ " then the patch for v0.9.2 should "
+#~ "work."
+#~ msgstr ""
+#~ "如果你运行的是经过编辑的 vLLM 代码，vLLM 的版本可能会被自动更改。例如，如果你基于 "
+#~ "v0.9.n 运行了编辑后的 vLLM，vLLM 的版本可能会变为 "
+#~ "v0.9.nxxx，在这种情况下，vLLM Kunlun 的 v0.9.n "
+#~ "补丁将无法正常工作，因为 vLLM Kunlun 无法区分你所使用的 vLLM "
+#~ "版本。这时，你可以设置环境变量 `VLLM_VERSION` 来指定你所使用的 vLLM "
+#~ "版本，这样对 v0.9.2 的补丁就应该可以正常工作。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/modeling/adding_a_new_model.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/modeling/adding_a_new_model.po
@@ -0,0 +1,333 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-07-18 09:01+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Language: zh_CN\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:1
+msgid "Adding a New Model"
+msgstr "添加新模型"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:3
+msgid ""
+"This guide demonstrates how to integrate a novel or customized model into "
+"vllm-kunlun. For foundational concepts, it is highly recommended to refer to"
+" [vllm official doc: Adding a New "
+"Model](https://docs.vllm.ai/en/stable/contributing/model/) first."
+msgstr ""
+"本指南演示如何将新颖或自定义的模型集成到 vllm-kunlun 中。对于基础概念，强烈建议先参考 [vllm "
+"官方文档：添加新模型](https://docs.vllm.ai/en/stable/contributing/model/)。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:6
+msgid "Step 1: Implementing Models with `torch` and `torch_npu`"
+msgstr "第 1 步：使用 `torch` 和 `torch_npu` 实现模型"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:8
+msgid ""
+"This section provides instructions for implementing new models compatible "
+"with vllm and vllm-kunlun."
+msgstr "本节提供了实现与 vllm 和 vllm-kunlun 兼容的新模型的相关说明。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:10
+msgid "**Before starting:**"
+msgstr "**开始之前：**"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:12
+msgid ""
+"Verify whether your model already exists in vllm's "
+"[models](https://github.com/vllm-"
+"project/vllm/tree/main/vllm/model_executor/models) directory."
+msgstr ""
+"请确认你的模型是否已经存在于 vllm 的 [models](https://github.com/vllm-"
+"project/vllm/tree/main/vllm/model_executor/models) 目录中。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:13
+msgid ""
+"Use existing models' implementation as templates to accelerate your "
+"development."
+msgstr "使用已有模型的实现作为模板以加速您的开发。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:15
+msgid "Method 1: Implementing New Models from Scratch"
+msgstr "方法一：从零开始实现新模型"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:17
+msgid ""
+"Follow vllm's [OPT model "
+"adaptation](https://docs.vllm.ai/en/stable/contributing/model/basic.html) "
+"example for guidance."
+msgstr ""
+"请参考 vllm 的 [OPT "
+"模型适配](https://docs.vllm.ai/en/stable/contributing/model/basic.html) 示例进行操作。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:19
+msgid "**Key implementation requirements:**"
+msgstr "**关键实现要求：**"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:21
+msgid "Place model files in `vllm_kunlun/models/` directory."
+msgstr "请将模型文件放在 `vllm_kunlun/models/` 目录下。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:23
+msgid ""
+"Standard module structure for decoder-only LLMs (please checkout vllm's "
+"implementations for other kinds of model):"
+msgstr "仅解码器（decoder-only）LLM 的标准模块结构（其他类型的模型请参考 vllm 中的实现）："
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:25
+msgid "`*ModelForCausalLM` (top-level wrapper)"
+msgstr "`*ModelForCausalLM`（顶层包装器）"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:26
+msgid "`*Model` (main architecture)"
+msgstr "`*Model`（主架构）"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:27
+msgid "`*DecoderLayer` (transformer block)"
+msgstr "`*DecoderLayer`（transformer 块）"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:28
+msgid "`*Attention` and `*MLP` (specific computation unit)"
+msgstr "`*Attention` 和 `*MLP`（特定计算单元）"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:31
+msgid "`*` denotes your model's unique identifier."
+msgstr "`*` 表示你的模型的唯一标识符。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:34
+msgid "Critical Implementation Details:"
+msgstr "关键实现细节："
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:36
+msgid "All modules must include a `prefix` argument in `__init__()`."
+msgstr "所有模块在 `__init__()` 方法中都必须包含一个 `prefix` 参数。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:38
+msgid "**Required interfaces:**"
+msgstr "**必需的接口：**"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:30
+msgid "Module Type"
+msgstr "模块类型"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:30
+msgid "Required Methods"
+msgstr "必需的方法"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:30
+msgid "`*ModelForCausalLM`"
+msgstr "`*ModelForCausalLM`"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:30
+msgid "`get_input_embeddings`, `compute_logits`, `load_weights`"
+msgstr "`get_input_embeddings`，`compute_logits`，`load_weights`"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:30
+msgid "`*Model`"
+msgstr "`*Model`"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:30
+msgid "`get_input_embeddings`, `load_weights`"
+msgstr "`get_input_embeddings`，`load_weights`"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:45
+msgid "Attention Backend Integration:"
+msgstr "注意力后端集成："
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:47
+msgid ""
+"Importing attention via `from vllm.attention import Attention` can "
+"automatically leverage the attention backend routing of vllm-kunlun (see: "
+"`get_attn_backend_cls()` in `vllm_kunlun/platform.py`)."
+msgstr ""
+"通过 `from vllm.attention import Attention` 导入 attention 可以自动利用 vllm-kunlun "
+"的注意力后端路由（详见：`vllm_kunlun/platform.py` 中的 `get_attn_backend_cls()`）。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:49
+msgid "Tensor Parallelism:"
+msgstr "张量并行："
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:51
+msgid ""
+"Use vllm's parallel layers (`ColumnParallelLinear`, "
+"`VocabParallelEmbedding`, etc.) to implement models supporting tensor "
+"parallelism. Note that Kunlun-specific customizations are implemented in "
+"`vllm_kunlun/ops/` directory (RMSNorm, VocabParallelEmbedding, etc.)."
+msgstr ""
+"使用 vllm 的并行层（如 `ColumnParallelLinear`、`VocabParallelEmbedding` "
+"等）来实现支持张量并行的模型。需要注意的是，Kunlun 特有的自定义实现（如 RMSNorm、VocabParallelEmbedding 等）位于 "
+"`vllm_kunlun/ops/` 目录下。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:53
+msgid ""
+"**Reference Implementation Template** (assumed path: "
+"`vllm_kunlun/models/custom_model.py`):"
+msgstr "**参考实现模板**（假定路径：`vllm_kunlun/models/custom_model.py`）："
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:135
+msgid "Method 2: Customizing Existing vLLM Models"
+msgstr "方法二：自定义已有的 vLLM 模型"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:137
+msgid ""
+"For most use cases, extending existing implementations is preferable. We "
+"demonstrate an example to inherit from base classes and implement a custom "
+"deepseek model below (assumed path: `vllm_kunlun/models/deepseek_v2.py`)."
+msgstr ""
+"对于大多数使用场景，建议扩展已有的实现。我们在下面演示了一个示例，通过继承基类并实现一个自定义的 deepseek "
+"模型（假定路径：`vllm_kunlun/models/deepseek_v2.py`）。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:175
+msgid ""
+"For a complete implementation reference, see: "
+"`vllm_kunlun/models/deepseek_v2.py`."
+msgstr "完整的实现参考请见：`vllm_kunlun/models/deepseek_v2.py`。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:178
+msgid "Step 2: Registering Custom Models using ModelRegistry Plugins in vLLM"
+msgstr "第 2 步：使用 vLLM 中的 ModelRegistry 插件注册自定义模型"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:180
+msgid ""
+"vllm provides a plugin mechanism for registering externally implemented "
+"models without modifying its codebase."
+msgstr "vllm 提供了一种插件机制，可用于注册外部实现的模型，而无需修改其代码库。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:182
+msgid ""
+"To integrate your implemented model from `vllm_kunlun/models/` directory:"
+msgstr "要集成你在 `vllm_kunlun/models/` 目录下实现的模型："
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:184
+msgid ""
+"Import your model implementation in `vllm_kunlun/models/__init__.py` using "
+"relative imports."
+msgstr "使用相对导入在 `vllm_kunlun/models/__init__.py` 中导入你的模型实现。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:185
+msgid ""
+"Register the model wrapper class via `vllm.ModelRegistry.register_model()` "
+"function."
+msgstr "通过 `vllm.ModelRegistry.register_model()` 函数注册模型包装类。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:187
+msgid ""
+"**Reference Registration Template** (an example of registering new models in"
+" `vllm_kunlun/models/__init__.py`):"
+msgstr "**参考注册模板**（在 `vllm_kunlun/models/__init__.py` 注册新模型的示例）："
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:210
+msgid ""
+"The first argument of `vllm.ModelRegistry.register_model()` indicates the "
+"unique architecture identifier which must match `architectures` in "
+"`config.json` of the model."
+msgstr ""
+"`vllm.ModelRegistry.register_model()` 的第一个参数表示唯一的架构标识符，这个标识符必须与模型的 "
+"`config.json` 文件中的 `architectures` 匹配。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:221
+msgid "Step 3: Verification"
+msgstr "第 3 步：验证"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:223
+msgid "Case 1: Overriding Existing vLLM Model Architecture"
+msgstr "案例 1：覆盖已有的 vLLM 模型架构"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:225
+msgid ""
+"If you're registering a customized model architecture based on vllm's "
+"existing implementation (overriding vllm's original class), when executing "
+"vllm offline/online inference (using any model), you'll observe warning logs"
+" similar to the following output from "
+"`vllm/models_executor/models/registry.py`."
+msgstr ""
+"如果你基于 vllm 的现有实现注册了一个自定义的模型架构（覆盖了 vllm 的原始类），在执行 vllm "
+"的离线/在线推理（无论使用哪个模型）时，你会看到类似如下、来自 `vllm/model_executor/models/registry.py` "
+"的警告日志。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:231
+msgid "Case 2: Registering New Model Architecture"
+msgstr "案例 2：注册新模型架构"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:233
+msgid ""
+"If you're registering a novel model architecture not present in vllm "
+"(creating a completely new class), current logs won't provide explicit "
+"confirmation by default. It's recommended to add the following logging "
+"statement at the end of the `register_model` method in "
+"`vllm/models_executor/models/registry.py`."
+msgstr ""
+"如果你注册了 vllm 中不存在的新模型架构（创建一个全新的类），当前日志默认不会提供明确的确认信息。建议在 "
+"`vllm/model_executor/models/registry.py` 文件中的 `register_model` "
+"方法末尾添加如下日志语句。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:239
+msgid ""
+"After adding this line, you will see confirmation logs shown below when "
+"running vllm offline/online inference (using any model)."
+msgstr "添加这一行之后，当你运行 vllm 离线/在线推理（使用任何模型）时，将会看到如下确认日志。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:245
+msgid ""
+"This log output confirms your novel model architecture has been successfully"
+" registered in vllm."
+msgstr "该日志输出确认了你的新模型架构已成功在 vllm 中注册。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:247
+msgid "Step 4: Testing"
+msgstr "第 4 步：测试"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:249
+msgid ""
+"After adding a new model, we should do basic functional test (offline/online"
+" inference), accuracy test and performance benchmark for the model."
+msgstr "在添加新模型后，我们应对该模型进行基本功能测试（离线/在线推理）、精度测试和性能基准测试。"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:251
+msgid "Find more details at:"
+msgstr "更多详情请见："
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:253
+msgid ""
+"[Accuracy test guide](https://vllm-"
+"kunlun.readthedocs.io/en/latest/developer_guide/evaluation/index.html)"
+msgstr ""
+"[精度测试指南](https://vllm-"
+"kunlun.readthedocs.io/en/latest/developer_guide/evaluation/index.html)"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:254
+msgid ""
+"[Performance benchmark guide](https://vllm-"
+"kunlun.readthedocs.io/en/latest/developer_guide/performance/performance_benchmark.html)"
+msgstr ""
+"[性能基准指南](https://vllm-"
+"kunlun.readthedocs.io/en/latest/developer_guide/performance/performance_benchmark.html)"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:256
+msgid "Step 5: Updating Supported Models Doc"
+msgstr "第 5 步：更新支持的模型文档"
+
+#: ../../developer_guide/modeling/adding_a_new_model.md:258
+msgid ""
+"At last, if all the steps above are completed, you should add the new model "
+"into our [Supported Models](https://vllm-"
+"kunlun.readthedocs.io/en/latest/user_guide/supported_models.html) doc."
+msgstr ""
+"最后，如果以上所有步骤都已完成，你应该将新模型添加到我们的[支持的模型](https://vllm-"
+"kunlun.readthedocs.io/en/latest/user_guide/supported_models.html)文档中。"
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/modeling/adding_a_new_multimodal_model.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/modeling/adding_a_new_multimodal_model.po
@@ -0,0 +1,29 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-07-18 09:01+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Language: zh_CN\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../developer_guide/modeling/adding_a_new_multimodal_model.md:1
+msgid "Adding a New Multi-Modal Model"
+msgstr "添加新的多模态模型"
+
+#: ../../developer_guide/modeling/adding_a_new_multimodal_model.md:3
+msgid "**_Comming soon ..._**"
+msgstr "**_敬请期待 ..._**"
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/modeling/index.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/modeling/index.po
@@ -0,0 +1,32 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-07-18 09:01+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Language: zh_CN\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../developer_guide/modeling/index.md:1
+#: ../../developer_guide/modeling/index.md:5
+msgid "Modeling"
+msgstr "新模型"
+
+#: ../../developer_guide/modeling/index.md:3
+msgid ""
+"This section provides tutorials of how to implement and register a new model"
+" into vllm-kunlun."
+msgstr "本节提供了如何在 vllm-kunlun 中实现并注册新模型的教程。"
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/index.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/index.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-07-18 09:01+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Language: zh_CN\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../developer_guide/performance/index.md:1
+#: ../../developer_guide/performance/index.md:3
+msgid "Performance"
+msgstr "性能"
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/optimization_and_tuning.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/optimization_and_tuning.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/performance/optimization_and_tuning.md:1
+msgid "Optimization and Tuning"
+msgstr "优化与调优"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/performance_benchmark.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/performance_benchmark.po
@@ -0,0 +1,92 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/performance/performance_benchmark.md:1
+msgid "Performance Benchmark"
+msgstr "性能基准"
+
+#~ msgid ""
+#~ "This document details the benchmark "
+#~ "methodology for vllm-kunlun, aimed at"
+#~ " evaluating the performance under a "
+#~ "variety of workloads. To maintain "
+#~ "alignment with vLLM, we use the "
+#~ "[benchmark](https://github.com/vllm-"
+#~ "project/vllm/tree/main/benchmarks) script provided "
+#~ "by the vllm project."
+#~ msgstr ""
+#~ "本文档详细说明了 vllm-kunlun 的基准测试方法，旨在评估其在多种工作负载下的性能。为了与"
+#~ " vLLM 保持一致，我们使用 vllm 项目提供的 "
+#~ "[benchmark](https://github.com/vllm-"
+#~ "project/vllm/tree/main/benchmarks) 脚本。"
+
+#~ msgid ""
+#~ "**Benchmark Coverage**: We measure offline "
+#~ "e2e latency and throughput, and "
+#~ "fixed-QPS online serving benchmarks, for"
+#~ " more details see [vllm-kunlun "
+#~ "benchmark scripts](https://github.com/vllm-project"
+#~ "/vllm-kunlun/tree/main/benchmarks)."
+#~ msgstr ""
+#~ "**基准测试覆盖范围**：我们测量离线端到端延迟和吞吐量，以及固定 QPS 的在线服务基准测试。更多详情请参见"
+#~ " [vllm-kunlun 基准测试脚本](https://github.com/vllm-"
+#~ "project/vllm-kunlun/tree/main/benchmarks)。"
+
+#~ msgid "1. Run docker container"
+#~ msgstr "1. 运行 docker 容器"
+
+#~ msgid "2. Install dependencies"
+#~ msgstr "2. 安装依赖项"
+
+#~ msgid "3. (Optional)Prepare model weights"
+#~ msgstr "3.（可选）准备模型权重"
+
+#~ msgid ""
+#~ "For faster running speed, we recommend"
+#~ " downloading the model in advance："
+#~ msgstr "为了更快的运行速度，建议提前下载模型："
+
+#~ msgid ""
+#~ "You can also replace all model "
+#~ "paths in the [json](https://github.com/vllm-"
+#~ "project/vllm-kunlun/tree/main/benchmarks/tests) files "
+#~ "with your local paths:"
+#~ msgstr ""
+#~ "你也可以将 [json](https://github.com/vllm-project/vllm-"
+#~ "kunlun/tree/main/benchmarks/tests) 文件中的所有模型路径替换为你的本地路径："
+
+#~ msgid "4. Run benchmark script"
+#~ msgstr "4. 运行基准测试脚本"
+
+#~ msgid "Run benchmark script:"
+#~ msgstr "运行基准测试脚本："
+
+#~ msgid "After about 10 mins, the output is as shown below:"
+#~ msgstr "大约 10 分钟后，输出如下所示："
+
+#~ msgid ""
+#~ "The result json files are generated "
+#~ "into the path `benchmark/results` These "
+#~ "files contain detailed benchmarking results"
+#~ " for further analysis."
+#~ msgstr "结果 json 文件会生成到路径 `benchmark/results`。这些文件包含了用于进一步分析的详细基准测试结果。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/profile_execute_duration.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/developer_guide/performance/profile_execute_duration.po
@@ -0,0 +1,86 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/developer_guide/performance/profile_execute_duration.md:1
+msgid "Profile Execute Duration"
+msgstr "执行时长分析"
+
+#~ msgid ""
+#~ "The execution duration of each stage "
+#~ "(including pre/post-processing, model forward,"
+#~ " etc.) usually needs to be captured"
+#~ " during a complete inference process. "
+#~ "Typically, this is done by using "
+#~ "`torch.xpu.synchronize()` and obtaining CPU "
+#~ "timestamps, which increases the performance"
+#~ " overhead of host/device synchronization."
+#~ msgstr ""
+#~ "在完整的推理过程中，通常需要记录每个阶段（包括前/后处理、模型前向等）的执行时长。一般通过使用 "
+#~ "`torch.xpu.synchronize()` 并获取 CPU "
+#~ "时间戳来实现，这会增加主机/设备同步的性能开销。"
+
+#~ msgid ""
+#~ "**To reduce the performance overhead, we"
+#~ " add this feature, using the XPU "
+#~ "event timestamp mechanism to observe the"
+#~ " device execution time asynchronously.**"
+#~ msgstr "**为了减少性能开销，我们添加了此功能，使用 XPU 事件时间戳机制异步观测设备的执行时间。**"
+
+#~ msgid "Usage"
+#~ msgstr "用法"
+
+#~ msgid ""
+#~ "Use the environment variable "
+#~ "`VLLM_KUNLUN_MODEL_EXECUTE_TIME_OBSERVE` to enable "
+#~ "this feature."
+#~ msgstr "使用环境变量 `VLLM_KUNLUN_MODEL_EXECUTE_TIME_OBSERVE` 来启用此功能。"
+
+#~ msgid ""
+#~ "Use the non-blocking API "
+#~ "`ProfileExecuteDuration().capture_async` to set "
+#~ "observation points asynchronously when you "
+#~ "need to observe the execution duration."
+#~ msgstr ""
+#~ "当你需要观察执行时长时，可以使用非阻塞 API "
+#~ "`ProfileExecuteDuration().capture_async` 异步设置观察点。"
+
+#~ msgid ""
+#~ "Use the blocking API "
+#~ "`ProfileExecuteDuration().pop_captured_sync` at an "
+#~ "appropriate time to get and print "
+#~ "the execution durations of all observed"
+#~ " stages."
+#~ msgstr ""
+#~ "在适当的时机使用阻塞式 API "
+#~ "`ProfileExecuteDuration().pop_captured_sync` 获取并打印所有已观察到阶段的执行时长。"
+
+#~ msgid ""
+#~ "**We have instrumented the key inference"
+#~ " stages (including pre-processing, model"
+#~ " forward pass, etc.) for execute "
+#~ "duration profiling. Execute the script "
+#~ "as follows:**"
+#~ msgstr "**我们已经在关键的推理阶段（包括预处理、模型前向传递等）添加了埋点，用于执行时长分析。请按如下方式执行脚本：**"
+
+#~ msgid "Example Output"
+#~ msgstr "示例输出"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/faqs.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/faqs.po
@@ -0,0 +1,507 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/faqs.md:1
+msgid "FAQs"
+msgstr "常见问题"
+
+#: ../../source/faqs.md:3
+msgid "Version Specific FAQs"
+msgstr "特定版本常见问题"
+
+#~ msgid ""
+#~ "[[v0.7.3.post1] FAQ & Feedback](https://github.com"
+#~ "/vllm-project/vllm-kunlun/issues/1007)"
+#~ msgstr ""
+#~ "[[v0.7.3.post1] 常见问题与反馈](https://github.com/vllm-project"
+#~ "/vllm-kunlun/issues/1007)"
+
+#~ msgid ""
+#~ "[[v0.9.2rc1] FAQ & Feedback](https://github.com"
+#~ "/vllm-project/vllm-kunlun/issues/1742)"
+#~ msgstr ""
+#~ "[[v0.9.2rc1] 常见问题与反馈](https://github.com/vllm-project"
+#~ "/vllm-kunlun/issues/1742)"
+
+#~ msgid "General FAQs"
+#~ msgstr "常见问题解答"
+
+#~ msgid "1. What devices are currently supported?"
+#~ msgstr "1. 目前支持哪些设备？"
+
+#~ msgid ""
+#~ "Currently, **ONLY** Atlas A2 series(Kunlun-"
+#~ "cann-kernels-910b) and Atlas 300I"
+#~ "(Kunlun-cann-kernels-310p) series are "
+#~ "supported:"
+#~ msgstr ""
+#~ "目前，**仅**支持 Atlas A2 系列（Kunlun-cann-"
+#~ "kernels-910b）和 Atlas 300I（Kunlun-cann-"
+#~ "kernels-310p）系列："
+
+#~ msgid ""
+#~ "Atlas A2 Training series (Atlas 800T "
+#~ "A2, Atlas 900 A2 PoD, Atlas 200T"
+#~ " A2 Box16, Atlas 300T A2)"
+#~ msgstr ""
+#~ "Atlas A2 训练系列（Atlas 800T A2，Atlas 900"
+#~ " A2 PoD，Atlas 200T A2 Box16，Atlas "
+#~ "300T A2）"
+
+#~ msgid "Atlas 800I A2 Inference series (Atlas 800I A2)"
+#~ msgstr "Atlas 800I A2 推理系列（Atlas 800I A2）"
+
+#~ msgid "Atlas 300I Inference series (Atlas 300I Duo)"
+#~ msgstr "Atlas 300I 推理系列（Atlas 300I Duo）"
+
+#~ msgid "Below series are NOT supported yet:"
+#~ msgstr "以下系列目前尚不受支持："
+
+#~ msgid "Atlas 200I A2 (Kunlun-cann-kernels-310b) unplanned yet"
+#~ msgstr "Atlas 200I A2（Kunlun-cann-kernels-310b）尚未计划"
+
+#~ msgid "Kunlun 910, Kunlun 910 Pro B (Kunlun-cann-kernels-910) unplanned yet"
+#~ msgstr "Kunlun 910，Kunlun 910 Pro B（Kunlun-cann-kernels-910）尚未计划"
+
+#~ msgid ""
+#~ "From a technical view, vllm-kunlun "
+#~ "support would be possible if the "
+#~ "torch-xpu is supported. Otherwise, we "
+#~ "have to implement it by using "
+#~ "custom ops. We are also welcome to"
+#~ " join us to improve together."
+#~ msgstr ""
+#~ "从技术角度来看，如果支持 torch-xpu，则可以支持 vllm-"
+#~ "kunlun。否则，我们需要通过自定义算子来实现。我们也欢迎大家一起加入，共同改进。"
+
+#~ msgid "2. How to get our docker containers?"
+#~ msgstr "2. 如何获取我们的 docker 容器？"
+
+#~ msgid ""
+#~ "You can get our containers at "
+#~ "`Quay.io`, e.g., [<u>vllm-"
+#~ "kunlun</u>](https://quay.io/repository/kunlun/vllm-"
+#~ "kunlun?tab=tags) and "
+#~ "[<u>cann</u>](https://quay.io/repository/kunlun/cann?tab=tags)."
+#~ msgstr ""
+#~ "你可以在 `Quay.io` 获取我们的容器，例如，[<u>vllm-"
+#~ "kunlun</u>](https://quay.io/repository/kunlun/vllm-"
+#~ "kunlun?tab=tags) 和 "
+#~ "[<u>cann</u>](https://quay.io/repository/kunlun/cann?tab=tags)。"
+
+#~ msgid ""
+#~ "If you are in China, you can "
+#~ "use `daocloud` to accelerate your "
+#~ "downloading:"
+#~ msgstr "如果你在中国，可以使用 `daocloud` 来加速下载："
+
+#~ msgid "3. What models does vllm-kunlun supports?"
+#~ msgstr "3. vllm-kunlun 支持哪些模型？"
+
+#~ msgid ""
+#~ "Find more details [<u>here</u>](https://vllm-"
+#~ "kunlun.readthedocs.io/en/latest/user_guide/support_matrix/supported_models.html)."
+#~ msgstr ""
+#~ "在[<u>此处</u>](https://vllm-"
+#~ "kunlun.readthedocs.io/en/latest/user_guide/support_matrix/supported_models.html)查看更多详细信息。"
+
+#~ msgid "4. How to get in touch with our community?"
+#~ msgstr "4. 如何与我们的社区取得联系？"
+
+#~ msgid ""
+#~ "There are many channels that you "
+#~ "can communicate with our community "
+#~ "developers / users:"
+#~ msgstr "你可以通过多种渠道与我们的社区开发者/用户进行交流："
+
+#~ msgid ""
+#~ "Submit a GitHub [<u>issue</u>](https://github.com"
+#~ "/vllm-project/vllm-kunlun/issues?page=1)."
+#~ msgstr ""
+#~ "提交一个 GitHub [<u>issue</u>](https://github.com/vllm-"
+#~ "project/vllm-kunlun/issues?page=1)。"
+
+#~ msgid ""
+#~ "Join our [<u>weekly "
+#~ "meeting</u>](https://docs.google.com/document/d/1hCSzRTMZhIB8vRq1_qOOjx4c9uYUxvdQvDsMV2JcSrw/edit?tab=t.0#heading=h.911qu8j8h35z)"
+#~ " and share your ideas."
+#~ msgstr "加入我们的[<u>每周会议</u>](https://docs.google.com/document/d/1hCSzRTMZhIB8vRq1_qOOjx4c9uYUxvdQvDsMV2JcSrw/edit?tab=t.0#heading=h.911qu8j8h35z)，并分享你的想法。"
+
+#~ msgid ""
+#~ "Join our [<u>WeChat</u>](https://github.com/vllm-"
+#~ "project/vllm-kunlun/issues/227) group and ask"
+#~ " your quenstions."
+#~ msgstr ""
+#~ "加入我们的 [<u>微信群</u>](https://github.com/vllm-project"
+#~ "/vllm-kunlun/issues/227) 并提问你的问题。"
+
+#~ msgid ""
+#~ "Join our kunlun channel in [<u>vLLM "
+#~ "forums</u>](https://discuss.vllm.ai/c/hardware-support/vllm-"
+#~ "kunlun-support/6) and publish your "
+#~ "topics."
+#~ msgstr ""
+#~ "加入我们在 [<u>vLLM 论坛</u>](https://discuss.vllm.ai/c"
+#~ "/hardware-support/vllm-kunlun-support/6) 的 "
+#~ "kunlun 频道并发布你的话题。"
+
+#~ msgid "5. What features does vllm-kunlun V1 supports?"
+#~ msgstr "5. vllm-kunlun V1 支持哪些功能？"
+
+#~ msgid ""
+#~ "Find more details [<u>here</u>](https://vllm-"
+#~ "kunlun.readthedocs.io/en/latest/user_guide/support_matrix/supported_features.html)."
+#~ msgstr ""
+#~ "在[<u>这里</u>](https://vllm-"
+#~ "kunlun.readthedocs.io/en/latest/user_guide/support_matrix/supported_features.html)找到更多详细信息。"
+
+#~ msgid ""
+#~ "6. How to solve the problem of "
+#~ "\"Failed to infer device type\" or "
+#~ "\"libatb.so: cannot open shared object "
+#~ "file\"?"
+#~ msgstr "6. 如何解决 \"Failed to infer device type\" 或 \"libatb.so: cannot open shared object file\" 问题？"
+
+#~ msgid ""
+#~ "Basically, the reason is that the "
+#~ "XPU environment is not configured "
+#~ "correctly. You can:"
+#~ msgstr "基本上，原因是 XPU 环境没有正确配置。你可以："
+
+#~ msgid ""
+#~ "try `source /usr/local/Kunlun/nnal/atb/set_env.sh` "
+#~ "to enable NNAL package."
+#~ msgstr "尝试运行 `source /usr/local/Kunlun/nnal/atb/set_env.sh` 以启用 NNAL 包。"
+
+#~ msgid ""
+#~ "try `source /usr/local/Kunlun/kunlun-"
+#~ "toolkit/set_env.sh` to enable CANN package."
+#~ msgstr "尝试运行 `source /usr/local/Kunlun/kunlun-toolkit/set_env.sh` 以启用 CANN 包。"
+
+#~ msgid "try `xpu-smi info` to check whether the XPU is working."
+#~ msgstr "尝试运行 `xpu-smi info` 来检查 XPU 是否正常工作。"
+
+#~ msgid ""
+#~ "If all above steps are not "
+#~ "working, you can try the following "
+#~ "code with python to check whether "
+#~ "there is any error:"
+#~ msgstr "如果以上所有步骤都无效，你可以尝试使用以下 python 代码来检查是否有错误："
+
+#~ msgid "If all above steps are not working, feel free to submit a GitHub issue."
+#~ msgstr "如果以上所有步骤都无法解决问题，欢迎提交一个 GitHub issue。"
+
+#~ msgid "7. How does vllm-kunlun perform?"
+#~ msgstr "7. vllm-kunlun 的性能如何？"
+
+#~ msgid ""
+#~ "Currently, only some models are "
+#~ "improved. Such as `Qwen2.5 VL`, `Qwen3`,"
+#~ " `Deepseek  V3`. Others are not good"
+#~ " enough. From 0.9.0rc2, Qwen and "
+#~ "Deepseek works with graph mode to "
+#~ "play a good performance. What's more,"
+#~ " you can install `mindie-turbo` with"
+#~ " `vllm-kunlun v0.7.3` to speed up "
+#~ "the inference as well."
+#~ msgstr ""
+#~ "目前，只有部分模型得到了改进，比如 `Qwen2.5 VL`、`Qwen3` 和 "
+#~ "`Deepseek V3`。其他模型的效果还不够理想。从 0.9.0rc2 开始，Qwen "
+#~ "和 Deepseek 已经支持图模式，以获得更好的性能。此外，你还可以在 `vllm-"
+#~ "kunlun v0.7.3` 上安装 `mindie-turbo`，进一步加速推理。"
+
+#~ msgid "8. How vllm-kunlun work with vllm?"
+#~ msgstr "8. vllm-kunlun 如何与 vllm 协同工作？"
+
+#~ msgid ""
+#~ "vllm-kunlun is a plugin for vllm."
+#~ " Basically, the version of vllm-"
+#~ "kunlun is the same as the version"
+#~ " of vllm. For example, if you "
+#~ "use vllm 0.7.3, you should use "
+#~ "vllm-kunlun 0.7.3 as well. For main"
+#~ " branch, we will make sure `vllm-"
+#~ "kunlun` and `vllm` are compatible by "
+#~ "each commit."
+#~ msgstr ""
+#~ "vllm-kunlun 是 vllm 的一个插件。基本上，vllm-kunlun"
+#~ " 的版本与 vllm 的版本是相同的。例如，如果你使用 vllm "
+#~ "0.7.3，你也应该使用 vllm-kunlun 0.7.3。对于主分支，我们会确保每次提交都让 "
+#~ "`vllm-kunlun` 和 `vllm` 保持兼容。"
+
+#~ msgid "9. Does vllm-kunlun support Prefill Disaggregation feature?"
+#~ msgstr "9. vllm-kunlun 支持 Prefill Disaggregation 功能吗？"
+
+#~ msgid ""
+#~ "Currently, only 1P1D is supported on "
+#~ "V0 Engine. For V1 Engine or NPND"
+#~ " support, We will make it stable "
+#~ "and supported by vllm-kunlun in "
+#~ "the future."
+#~ msgstr "目前，V0引擎只支持1P1D。对于V1引擎或NPND的支持，我们将在未来使其稳定并由vllm-kunlun支持。"
+
+#~ msgid "10. Does vllm-kunlun support quantization method?"
+#~ msgstr "10. vllm-kunlun 支持量化方法吗？"
+
+#~ msgid ""
+#~ "Currently, w8a8 quantization is already "
+#~ "supported by vllm-kunlun originally on"
+#~ " v0.8.4rc2 or higher, If you're using"
+#~ " vllm 0.7.3 version, w8a8 quantization "
+#~ "is supported with the integration of"
+#~ " vllm-kunlun and mindie-turbo, please"
+#~ " use `pip install vllm-kunlun[mindie-"
+#~ "turbo]`."
+#~ msgstr ""
+#~ "目前，w8a8 量化已在 v0.8.4rc2 或更高版本的 vllm-"
+#~ "kunlun 中原生支持。如果你使用的是 vllm 0.7.3 版本，集成了 "
+#~ "vllm-kunlun 和 mindie-turbo 后也支持 w8a8"
+#~ " 量化，请使用 `pip install vllm-kunlun[mindie-"
+#~ "turbo]`。"
+
+#~ msgid "11. How to run w8a8 DeepSeek model?"
+#~ msgstr "11. 如何运行 w8a8 DeepSeek 模型？"
+
+#~ msgid ""
+#~ "Please follow the [inferencing "
+#~ "tutorial](https://vllm-"
+#~ "kunlun.readthedocs.io/en/latest/tutorials/multi_node.html) and"
+#~ " replace model to DeepSeek."
+#~ msgstr ""
+#~ "请按照[inferencing 教程](https://vllm-"
+#~ "kunlun.readthedocs.io/en/latest/tutorials/multi_node.html)进行操作，并将模型更换为"
+#~ " DeepSeek。"
+
+#~ msgid ""
+#~ "12. There is no output in log "
+#~ "when loading models using vllm-kunlun,"
+#~ " How to solve it?"
+#~ msgstr "12. 使用 vllm-kunlun 加载模型时日志没有输出，如何解决？"
+
+#~ msgid ""
+#~ "If you're using vllm 0.7.3 version, "
+#~ "this is a known progress bar "
+#~ "display issue in VLLM, which has "
+#~ "been resolved in [this PR](https://github.com"
+#~ "/vllm-project/vllm/pull/12428), please cherry-"
+#~ "pick it locally by yourself. Otherwise,"
+#~ " please file an issue."
+#~ msgstr ""
+#~ "如果你正在使用 vllm 0.7.3 版本，这是 VLLM "
+#~ "已知的进度条显示问题，已在 [此 PR](https://github.com/vllm-"
+#~ "project/vllm/pull/12428) 中解决，请自行在本地进行 cherry-"
+#~ "pick。否则，请提交一个 issue。"
+
+#~ msgid "13. How vllm-kunlun is tested"
+#~ msgstr "13. 如何测试 vllm-kunlun"
+
+#~ msgid ""
+#~ "vllm-kunlun is tested by functional "
+#~ "test, performance test and accuracy "
+#~ "test."
+#~ msgstr "vllm-kunlun 经过功能测试、性能测试和精度测试。"
+
+#~ msgid ""
+#~ "**Functional test**: we added CI, "
+#~ "includes portion of vllm's native unit"
+#~ " tests and vllm-kunlun's own unit "
+#~ "tests. In vllm-kunlun's tests, we test "
+#~ "basic functionality, popular model availability "
+#~ "and [supported features](https://vllm-"
+#~ "kunlun.readthedocs.io/en/latest/user_guide/support_matrix/supported_features.html)"
+#~ " via e2e test"
+#~ msgstr ""
+#~ "**功能测试**：我们添加了CI，包含了vllm原生单元测试的一部分以及vllm-kunlun自己的单元测试。在vllm-"
+#~ "kunlun的测试中，我们通过e2e测试验证了基本功能、主流模型可用性和[支持的特性](https://vllm-"
+#~ "kunlun.readthedocs.io/en/latest/user_guide/support_matrix/supported_features.html)。"
+
+#~ msgid ""
+#~ "**Performance test**: we provide "
+#~ "[benchmark](https://github.com/vllm-project/vllm-"
+#~ "kunlun/tree/main/benchmarks) tools for end-"
+#~ "to-end performance benchmarking, which can "
+#~ "easily be re-run locally. We'll "
+#~ "publish a perf website to show the"
+#~ " performance test results for each "
+#~ "pull request"
+#~ msgstr ""
+#~ "**性能测试**：我们提供了用于端到端性能基准测试的[基准测试](https://github.com/vllm-project"
+#~ "/vllm-"
+#~ "kunlun/tree/main/benchmarks)工具，可以方便地在本地重新运行。我们将发布一个性能网站，用于展示每个拉取请求的性能测试结果。"
+
+#~ msgid "**Accuracy test**: we're working on adding accuracy test to CI as well."
+#~ msgstr "**准确性测试**：我们也在努力将准确性测试添加到CI中。"
+
+#~ msgid ""
+#~ "Finally, for each release, we'll publish"
+#~ " the performance test and accuracy "
+#~ "test report in the future."
+#~ msgstr "最后，未来每个版本发布时，我们都会公开性能测试和准确性测试报告。"
+
+#~ msgid "14. How to fix the error \"InvalidVersion\" when using vllm-kunlun?"
+#~ msgstr "14. 使用 vllm-kunlun 时如何解决 “InvalidVersion” 错误？"
+
+#~ msgid ""
+#~ "It's usually because you have installed"
+#~ " an dev/editable version of vLLM "
+#~ "package. In this case, we provide "
+#~ "the env variable `VLLM_VERSION` to let"
+#~ " users specify the version of vLLM"
+#~ " package to use. Please set the "
+#~ "env variable `VLLM_VERSION` to the "
+#~ "version of vLLM package you have "
+#~ "installed. The format of `VLLM_VERSION` "
+#~ "should be `X.Y.Z`."
+#~ msgstr ""
+#~ "这通常是因为你安装了开发版或可编辑版本的 vLLM 包。在这种情况下，我们提供了环境变量 "
+#~ "`VLLM_VERSION`，以便用户指定要使用的 vLLM 包版本。请将环境变量 "
+#~ "`VLLM_VERSION` 设置为你已安装的 vLLM 包的版本。`VLLM_VERSION` "
+#~ "的格式应为 `X.Y.Z`。"
+
+#~ msgid "15. How to handle Out Of Memory?"
+#~ msgstr "15. 如何处理内存溢出？"
+
+#~ msgid ""
+#~ "OOM errors typically occur when the "
+#~ "model exceeds the memory capacity of "
+#~ "a single XPU. For general guidance, "
+#~ "you can refer to [vLLM's OOM "
+#~ "troubleshooting "
+#~ "documentation](https://docs.vllm.ai/en/latest/getting_started/troubleshooting.html"
+#~ "#out-of-memory)."
+#~ msgstr ""
+#~ "当模型超出单个 XPU 的内存容量时，通常会发生 OOM（内存溢出）错误。一般性的指导可以参考 "
+#~ "[vLLM 的 OOM "
+#~ "故障排除文档](https://docs.vllm.ai/en/latest/getting_started/troubleshooting.html"
+#~ "#out-of-memory)。"
+
+#~ msgid ""
+#~ "In scenarios where XPUs have limited "
+#~ "HBM (High Bandwidth Memory) capacity, "
+#~ "dynamic memory allocation/deallocation during "
+#~ "inference can exacerbate memory fragmentation,"
+#~ " leading to OOM. To address this:"
+#~ msgstr ""
+#~ "在 XPU 的 "
+#~ "HBM（高带宽内存）容量有限的场景下，推理过程中动态内存分配和释放会加剧内存碎片，从而导致 "
+#~ "OOM（内存溢出）。为了解决这个问题："
+
+#~ msgid ""
+#~ "**Adjust `--gpu-memory-utilization`**: If "
+#~ "unspecified, will use the default value"
+#~ " of `0.9`. You can decrease this "
+#~ "param to reserve more memory to "
+#~ "reduce fragmentation risks. See more "
+#~ "note in: [vLLM - Inference and "
+#~ "Serving - Engine "
+#~ "Arguments](https://docs.vllm.ai/en/latest/serving/engine_args.html#vllm.engine"
+#~ ".arg_utils-_engine_args_parser-cacheconfig)."
+#~ msgstr ""
+#~ "**调整 `--gpu-memory-utilization`**：如果未指定，将使用默认值 "
+#~ "`0.9`。你可以降低此参数来预留更多内存，从而降低内存碎片风险。参见更多说明：[vLLM - 推理与服务 "
+#~ "- "
+#~ "引擎参数](https://docs.vllm.ai/en/latest/serving/engine_args.html#vllm.engine"
+#~ ".arg_utils-_engine_args_parser-cacheconfig)。"
+
+#~ msgid ""
+#~ "**Configure `PYTORCH_XPU_ALLOC_CONF`**: Set this "
+#~ "environment variable to optimize XPU "
+#~ "memory management. For example, you can"
+#~ " `export PYTORCH_XPU_ALLOC_CONF=expandable_segments:True` "
+#~ "to enable virtual memory feature to "
+#~ "mitigate memory fragmentation caused by "
+#~ "frequent dynamic memory size adjustments "
+#~ "during runtime, see more note in: "
+#~ "[PYTORCH_XPU_ALLOC_CONF](https://www.hikunlun.com/document/detail/zh/Pytorch/700/comref/Envvariables/Envir_012.html)."
+#~ msgstr ""
+#~ "**配置 `PYTORCH_XPU_ALLOC_CONF`**：设置此环境变量以优化XPU内存管理。例如，你可以通过 "
+#~ "`export PYTORCH_XPU_ALLOC_CONF=expandable_segments:True` "
+#~ "来启用虚拟内存功能，以缓解运行时频繁动态调整内存大小导致的内存碎片问题，更多说明参见：[PYTORCH_XPU_ALLOC_CONF](https://www.hikunlun.com/document/detail/zh/Pytorch/700/comref/Envvariables/Envir_012.html)。"
+
+#~ msgid "16. Failed to enable XPU graph mode when running DeepSeek?"
+#~ msgstr "16. 运行 DeepSeek 时无法启用 XPU 图模式？"
+
+#~ msgid ""
+#~ "You may encounter the following error"
+#~ " if running DeepSeek with XPU graph"
+#~ " mode enabled. The allowed number of"
+#~ " queries per kv when enabling both"
+#~ " MLA and Graph mode only support "
+#~ "{32, 64, 128}, **Thus this is not"
+#~ " supported for DeepSeek-V2-Lite**, as it"
+#~ " only has 16 attention heads. The "
+#~ "XPU graph mode support on "
+#~ "DeepSeek-V2-Lite will be done in the "
+#~ "future."
+#~ msgstr ""
+#~ "如果在启用XPU图模式（Graph "
+#~ "mode）运行DeepSeek时，您可能会遇到以下错误。当同时启用MLA和图模式时，每个kv允许的查询数只支持{32, 64,"
+#~ " "
+#~ "128}，**因此这不支持DeepSeek-V2-Lite**，因为它只有16个注意力头。未来会增加对DeepSeek-V2-Lite在XPU图模式下的支持。"
+
+#~ msgid ""
+#~ "And if you're using DeepSeek-V3 or "
+#~ "DeepSeek-R1, please make sure after the"
+#~ " tensor parallel split, num_heads / "
+#~ "num_kv_heads in {32, 64, 128}."
+#~ msgstr ""
+#~ "如果你正在使用 DeepSeek-V3 或 "
+#~ "DeepSeek-R1，请确保在张量并行切分后，num_heads / num_kv_heads 的值为"
+#~ " {32, 64, 128} 中的一个。"
+
+#~ msgid ""
+#~ "17. Failed to reinstall vllm-kunlun "
+#~ "from source after uninstalling vllm-"
+#~ "kunlun?"
+#~ msgstr "17. 卸载 vllm-kunlun 后无法从源码重新安装 vllm-kunlun？"
+
+#~ msgid ""
+#~ "You may encounter the problem of C"
+#~ " compilation failure when reinstalling "
+#~ "vllm-kunlun from source using pip. If"
+#~ " the installation fails, it is "
+#~ "recommended to use `python setup.py "
+#~ "install` to install, or use `python "
+#~ "setup.py clean` to clear the cache."
+#~ msgstr ""
+#~ "当你使用 pip 从源码重新安装 vllm-kunlun 时，可能会遇到 "
+#~ "C 编译失败的问题。如果安装失败，建议使用 `python setup.py "
+#~ "install` 进行安装，或者使用 `python setup.py clean` "
+#~ "清除缓存。"
+
+#~ msgid "18. How to generate deterministic results when using vllm-kunlun?"
+#~ msgstr "18. 使用 vllm-kunlun 时如何生成确定性结果？"
+
+#~ msgid "There are several factors that affect output certainty:"
+#~ msgstr "有几个因素会影响输出的确定性："
+
+#~ msgid ""
+#~ "Sampler Method: using **Greedy sample** "
+#~ "by setting `temperature=0` in "
+#~ "`SamplingParams`, e.g.:"
+#~ msgstr ""
+#~ "采样方法：通过在 `SamplingParams` 中设置 `temperature=0` "
+#~ "来使用 **贪婪采样（Greedy sample）**，例如："
+
+#~ msgid "Set the following environment variables:"
+#~ msgstr "设置以下环境参数："
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/index.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/index.po
@@ -0,0 +1,78 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 17:48+0800\n"
+"PO-Revision-Date: 2025-07-18 10:05+0800\n"
+"Last-Translator: \n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/index.md:34
+msgid "Getting Started"
+msgstr "快速开始"
+
+#: ../../source/index.md:44
+msgid "User Guide"
+msgstr "用户指南"
+
+#: ../../source/index.md:54
+msgid "Developer Guide"
+msgstr "开发者指南"
+
+#: ../../source/index.md:64
+msgid "Community"
+msgstr "社区"
+
+#: ../../source/index.md:1
+msgid "Welcome to vLLM Kunlun Plugin"
+msgstr "欢迎使用 vLLM Kunlun 插件"
+
+#: ../../source/index.md:3
+msgid "vLLM"
+msgstr "vLLM"
+
+#: ../../source/index.md:25
+msgid ""
+"vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin "
+"designed to seamlessly run vLLM on the Kunlun XPU. It is the recommended "
+"approach for integrating the Kunlun backend within the vLLM community, "
+"adhering to the principles outlined in the [[RFC]: Hardware "
+"pluggable](https://github.com/vllm-project/vllm/issues/11162). This "
+"plugin provides a hardware-pluggable interface that decouples the "
+"integration of the Kunlun XPU with vLLM."
+msgstr "vLLM Kunlun（vllm-kunlun）是一个由社区维护的硬件插件，旨在无缝地在昆仑 XPU 上运行 vLLM。它是将昆仑后端集成到 vLLM 社区的推荐方法，遵循 [[RFC]：硬件可插拔](https://github.com/vllm-project/vllm/issues/11162) 中提出的原则，提供了一个硬件可插拔接口，实现了昆仑 XPU 与 vLLM 集成的解耦。"
+
+
+#: ../../source/index.md:27
+msgid ""
+"By utilizing the vLLM Kunlun plugin, popular open-source models, "
+"including Transformer-like, Mixture-of-Expert, Embedding, and Multi-modal"
+" LLMs, can run effortlessly on the Kunlun XPU."
+msgstr ""
+"通过使用 vLLM Kunlun 插件，流行的开源模型，包括 Transformer 类、混合专家、嵌入、多模态大模型等，都可以在 Kunlun"
+" XPU 上无缝运行。"
+
+#: ../../source/index.md:31
+msgid "Documentation"
+msgstr "文档"
+
+#~ msgid ""
+#~ "vLLM Kunlun plugin (vllm-kunlun) is "
+#~ "a community maintained hardware plugin "
+#~ "for running vLLM on the Kunlun "
+#~ "XPU."
+#~ msgstr "vLLM Kunlun 插件（vllm-kunlun）是一个由社区维护的硬件插件，用于在 Kunlun XPU 上运行 vLLM。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/installation.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/installation.po
@@ -0,0 +1,260 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: 2025-07-18 10:09+0800\n"
+"Last-Translator: \n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/installation.md:1
+msgid "Installation"
+msgstr "安装"
+
+#~ msgid "This document describes how to install vllm-kunlun manually."
+#~ msgstr "本文档介绍如何手动安装 vllm-kunlun。"
+
+#~ msgid "Requirements"
+#~ msgstr "要求"
+
+#~ msgid "OS: Linux"
+#~ msgstr "操作系统：Linux"
+
+#~ msgid "Python: >= 3.9, < 3.12"
+#~ msgstr "Python：>= 3.9，< 3.12"
+
+#~ msgid "A hardware with Kunlun XPU. It's usually the Atlas 800 A2 series."
+#~ msgstr "配备有 Kunlun XPU 的硬件，通常是Atlas 800 A2系列。"
+
+#~ msgid "Software:"
+#~ msgstr "软件："
+
+#~ msgid "Software"
+#~ msgstr "软件"
+
+#~ msgid "Supported version"
+#~ msgstr "支持的版本"
+
+#~ msgid "Note"
+#~ msgstr "注释"
+
+#~ msgid "CANN"
+#~ msgstr "CANN"
+
+#~ msgid ">= 8.1.RC1"
+#~ msgstr ">= 8.1.RC1"
+
+#~ msgid "Required for vllm-kunlun and torch-xpu"
+#~ msgstr "vllm-kunlun 和 torch-xpu 必需"
+
+#~ msgid "torch-xpu"
+#~ msgstr "torch-xpu"
+
+#~ msgid ">= 2.5.1.post1.dev20250619"
+#~ msgstr ">= 2.5.1.post1.dev20250619"
+
+#~ msgid ""
+#~ "Required for vllm-kunlun, No need "
+#~ "to install manually, it will be "
+#~ "auto installed in below steps"
+#~ msgstr "vllm-kunlun 必需，无需手动安装，后续步骤会自动安装。"
+
+#~ msgid "torch"
+#~ msgstr "torch"
+
+#~ msgid ">= 2.5.1"
+#~ msgstr ">= 2.5.1"
+
+#~ msgid "Required for torch-xpu and vllm"
+#~ msgstr "torch-xpu 和 vllm 所需"
+
+#~ msgid "You have 2 way to install:"
+#~ msgstr "你有两种安装方式："
+
+#~ msgid ""
+#~ "**Using pip**: first prepare env "
+#~ "manually or via CANN image, then "
+#~ "install `vllm-kunlun` using pip."
+#~ msgstr "**使用 pip**：首先手动准备环境或通过 CANN 镜像准备环境，然后使用 pip 安装 `vllm-kunlun`。"
+
+#~ msgid ""
+#~ "**Using docker**: use the `vllm-kunlun`"
+#~ " pre-built docker image directly."
+#~ msgstr "**使用 docker**：直接使用 `vllm-kunlun` 预构建的 docker 镜像。"
+
+#~ msgid "Configure a new environment"
+#~ msgstr "配置一个新环境"
+
+#~ msgid ""
+#~ "Before installing, you need to make "
+#~ "sure firmware/driver and CANN are "
+#~ "installed correctly, refer to "
+#~ "[link](https://kunlun.github.io/docs/sources/kunlun/quick_install.html)"
+#~ " for more details."
+#~ msgstr ""
+#~ "在安装之前，您需要确保固件/驱动和 CANN 已正确安装，更多详情请参考 "
+#~ "[链接](https://kunlun.github.io/docs/sources/kunlun/quick_install.html)。"
+
+#~ msgid "Configure hardware environment"
+#~ msgstr "配置硬件环境"
+
+#~ msgid ""
+#~ "To verify that the Kunlun XPU "
+#~ "firmware and driver were correctly "
+#~ "installed, run:"
+#~ msgstr "要验证 Kunlun XPU 固件和驱动程序是否正确安装，请运行："
+
+#~ msgid ""
+#~ "Refer to [Kunlun Environment Setup "
+#~ "Guide](https://kunlun.github.io/docs/sources/kunlun/quick_install.html)"
+#~ " for more details."
+#~ msgstr "更多详情请参考[Kunlun环境搭建指南](https://kunlun.github.io/docs/sources/kunlun/quick_install.html)。"
+
+#~ msgid "Configure software environment"
+#~ msgstr "配置软件环境"
+
+#~ msgid "Before using pip"
+#~ msgstr "在使用 pip 之前"
+
+#~ msgid ""
+#~ "The easiest way to prepare your "
+#~ "software environment is using CANN image"
+#~ " directly:"
+#~ msgstr "最简单的方式是直接使用 CANN 镜像来准备您的软件环境："
+
+#~ msgid "Click here to see \"Install CANN manually\""
+#~ msgstr "点击此处查看“手动安装 CANN”"
+
+#~ msgid "You can also install CANN manually:"
+#~ msgstr "你也可以手动安装 CANN："
+
+#~ msgid "Before using docker"
+#~ msgstr "在使用 docker 之前"
+
+#~ msgid ""
+#~ "No more extra step if you are "
+#~ "using `vllm-kunlun` prebuilt docker "
+#~ "image."
+#~ msgstr "如果你使用 `vllm-kunlun` 预构建的 docker 镜像，就无需额外的步骤。"
+
+#~ msgid "Once it's done, you can start to set up `vllm` and `vllm-kunlun`."
+#~ msgstr "完成后，你可以开始配置 `vllm` 和 `vllm-kunlun`。"
+
+#~ msgid "Setup vllm and vllm-kunlun"
+#~ msgstr "安装 vllm 和 vllm-kunlun"
+
+#~ msgid "Using pip"
+#~ msgstr "使用 pip"
+
+#~ msgid "First install system dependencies and config pip mirror:"
+#~ msgstr "首先安装系统依赖并配置 pip 镜像："
+
+#~ msgid ""
+#~ "**[Optional]** Then config the extra-"
+#~ "index of `pip` if you are working"
+#~ " on a x86 machine or using "
+#~ "torch-xpu dev version:"
+#~ msgstr "**[可选]** 如果你在 x86 机器上工作或使用 torch-xpu 开发版，请配置 `pip` 的额外索引："
+
+#~ msgid "Then you can install `vllm` and `vllm-kunlun` from **pre-built wheel**:"
+#~ msgstr "然后你可以从**预编译的 wheel 包**安装 `vllm` 和 `vllm-kunlun`："
+
+#~ msgid "Click here to see \"Build from source code\""
+#~ msgstr "点击此处查看“从源代码构建”"
+
+#~ msgid "or build from **source code**:"
+#~ msgstr "或者从**源代码**构建："
+
+#~ msgid ""
+#~ "vllm-kunlun will build custom ops "
+#~ "by default. If you don't want to"
+#~ " build it, set `COMPILE_CUSTOM_KERNELS=0` "
+#~ "environment to disable it."
+#~ msgstr ""
+#~ "vllm-kunlun 默认会编译自定义算子。如果你不想编译它，可以设置环境变量 "
+#~ "`COMPILE_CUSTOM_KERNELS=0` 来禁用。"
+
+#~ msgid ""
+#~ "If you are building from v0.7.3-dev "
+#~ "and intend to use sleep mode "
+#~ "feature, you should set "
+#~ "`COMPILE_CUSTOM_KERNELS=1` manually. To build "
+#~ "custom ops, gcc/g++ higher than 8 "
+#~ "and c++ 17 or higher is required."
+#~ " If you're using `pip install -e "
+#~ ".` and encounter a torch-xpu "
+#~ "version conflict, please install with "
+#~ "`pip install --no-build-isolation -e "
+#~ ".` to build on system env. If "
+#~ "you encounter other problems during "
+#~ "compiling, it is probably because "
+#~ "unexpected compiler is being used, you"
+#~ " may export `CXX_COMPILER` and `C_COMPILER`"
+#~ " in env to specify your g++ and"
+#~ " gcc locations before compiling."
+#~ msgstr ""
+#~ "如果你是从 v0.7.3-dev 版本开始构建，并且打算使用休眠模式功能，你需要手动设置 "
+#~ "`COMPILE_CUSTOM_KERNELS=1`。构建自定义算子时，要求 gcc/g++ 版本高于 "
+#~ "8 且支持 c++ 17 或更高标准。如果你正在使用 `pip "
+#~ "install -e .` 并且出现了 torch-xpu "
+#~ "版本冲突，请使用 `pip install --no-build-"
+#~ "isolation -e .` "
+#~ "在系统环境下进行安装。如果在编译过程中遇到其它问题，可能是因为使用了非预期的编译器，你可以在编译前通过环境变量导出 "
+#~ "`CXX_COMPILER` 和 `C_COMPILER`，以指定你的 g++ 和 "
+#~ "gcc 路径。"
+
+#~ msgid "Using docker"
+#~ msgstr "使用 docker"
+
+#~ msgid "You can just pull the **prebuilt image** and run it with bash."
+#~ msgstr "你可以直接拉取**预构建镜像**并用 bash 运行它。"
+
+#~ msgid "Click here to see \"Build from Dockerfile\""
+#~ msgstr "点击这里查看“从 Dockerfile 构建”"
+
+#~ msgid "or build IMAGE from **source code**:"
+#~ msgstr "或从**源代码**构建 IMAGE："
+
+#~ msgid ""
+#~ "The default workdir is `/workspace`, "
+#~ "vLLM and vLLM Kunlun code are "
+#~ "placed in `/vllm-workspace` and "
+#~ "installed in [development "
+#~ "mode](https://setuptools.pypa.io/en/latest/userguide/development_mode.html)(`pip"
+#~ " install -e`) to help developer "
+#~ "immediately take place changes without "
+#~ "requiring a new installation."
+#~ msgstr ""
+#~ "默认的工作目录是 `/workspace`，vLLM 和 vLLM Kunlun "
+#~ "代码被放置在 `/vllm-"
+#~ "workspace`，并以[开发模式](https://setuptools.pypa.io/en/latest/userguide/development_mode.html)（`pip"
+#~ " install -e`）安装，以便开发者能够即时生效更改，而无需重新安装。"
+
+#~ msgid "Extra information"
+#~ msgstr "额外信息"
+
+#~ msgid "Verify installation"
+#~ msgstr "验证安装"
+
+#~ msgid "Create and run a simple inference test. The `example.py` can be like:"
+#~ msgstr "创建并运行一个简单的推理测试。`example.py` 可以如下："
+
+#~ msgid "Then run:"
+#~ msgstr "然后运行："
+
+#~ msgid "The output will be like:"
+#~ msgstr "输出将会像这样："
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/quick_start.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/quick_start.po
@@ -0,0 +1,139 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: 2025-07-18 10:09+0800\n"
+"Last-Translator: \n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/quick_start.md:1
+msgid "Quickstart"
+msgstr "快速入门"
+
+#: ../../source/quick_start.md:3
+msgid "Prerequisites"
+msgstr "先决条件"
+
+#: ../../source/quick_start.md:5
+msgid "Supported Devices"
+msgstr "支持的设备"
+
+#~ msgid ""
+#~ "Atlas A2 Training series (Atlas 800T "
+#~ "A2, Atlas 900 A2 PoD, Atlas 200T"
+#~ " A2 Box16, Atlas 300T A2)"
+#~ msgstr ""
+#~ "Atlas A2 训练系列（Atlas 800T A2，Atlas 900"
+#~ " A2 PoD，Atlas 200T A2 Box16，Atlas "
+#~ "300T A2）"
+
+#~ msgid "Atlas 800I A2 Inference series (Atlas 800I A2)"
+#~ msgstr "Atlas 800I A2 推理系列（Atlas 800I A2）"
+
+#~ msgid "Setup environment using container"
+#~ msgstr "使用容器设置环境"
+
+#~ msgid "Ubuntu"
+#~ msgstr "Ubuntu"
+
+#~ msgid "openEuler"
+#~ msgstr "openEuler"
+
+#~ msgid ""
+#~ "The default workdir is `/workspace`, "
+#~ "vLLM and vLLM Kunlun code are "
+#~ "placed in `/vllm-workspace` and "
+#~ "installed in [development "
+#~ "mode](https://setuptools.pypa.io/en/latest/userguide/development_mode.html)(`pip"
+#~ " install -e`) to help developer "
+#~ "immediately take place changes without "
+#~ "requiring a new installation."
+#~ msgstr ""
+#~ "默认的工作目录是 `/workspace`，vLLM 和 vLLM Kunlun "
+#~ "代码被放置在 `/vllm-"
+#~ "workspace`，并以[开发模式](https://setuptools.pypa.io/en/latest/userguide/development_mode.html)（`pip"
+#~ " install -e`）安装，以便开发者能够即时生效更改，而无需重新安装。"
+
+#~ msgid "Usage"
+#~ msgstr "用法"
+
+#~ msgid "You can use Modelscope mirror to speed up download:"
+#~ msgstr "你可以使用 Modelscope 镜像来加速下载："
+
+#~ msgid "There are two ways to start vLLM on Kunlun XPU:"
+#~ msgstr "在 Kunlun XPU 上启动 vLLM 有两种方式："
+
+#~ msgid "Offline Batched Inference"
+#~ msgstr "离线批量推理"
+
+#~ msgid ""
+#~ "With vLLM installed, you can start "
+#~ "generating texts for list of input "
+#~ "prompts (i.e. offline batch inferencing)."
+#~ msgstr "安装了 vLLM 后，您可以开始为一系列输入提示生成文本（即离线批量推理）。"
+
+#~ msgid ""
+#~ "Try to run below Python script "
+#~ "directly or use `python3` shell to "
+#~ "generate texts:"
+#~ msgstr "尝试直接运行下面的 Python 脚本，或者使用 `python3` 交互式命令行来生成文本："
+
+#~ msgid "OpenAI Completions API"
+#~ msgstr "OpenAI Completions API"
+
+#~ msgid ""
+#~ "vLLM can also be deployed as a "
+#~ "server that implements the OpenAI API"
+#~ " protocol. Run the following command "
+#~ "to start the vLLM server with the"
+#~ " [Qwen/Qwen2.5-0.5B-"
+#~ "Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) "
+#~ "model:"
+#~ msgstr ""
+#~ "vLLM 也可以作为实现 OpenAI API 协议的服务器进行部署。运行以下命令，使用"
+#~ " [Qwen/Qwen2.5-0.5B-"
+#~ "Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) "
+#~ "模型启动 vLLM 服务器："
+
+#~ msgid "If you see log as below:"
+#~ msgstr "如果你看到如下日志："
+
+#~ msgid "Congratulations, you have successfully started the vLLM server!"
+#~ msgstr "恭喜，你已经成功启动了 vLLM 服务器！"
+
+#~ msgid "You can query the list the models:"
+#~ msgstr "你可以查询模型列表："
+
+#~ msgid "You can also query the model with input prompts:"
+#~ msgstr "你也可以通过输入提示来查询模型："
+
+#~ msgid ""
+#~ "vLLM is serving as background process,"
+#~ " you can use `kill -2 $VLLM_PID` "
+#~ "to stop the background process "
+#~ "gracefully, it's equal to `Ctrl-C` to"
+#~ " stop foreground vLLM process:"
+#~ msgstr ""
+#~ "vLLM 正作为后台进程运行，你可以使用 `kill -2 $VLLM_PID` "
+#~ "来优雅地停止后台进程，这等同于使用 `Ctrl-C` 停止前台 vLLM 进程："
+
+#~ msgid "You will see output as below:"
+#~ msgstr "你将会看到如下输出："
+
+#~ msgid "Finally, you can exit container by using `ctrl-D`."
+#~ msgstr "最后，你可以通过按 `ctrl-D` 退出容器。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/DeepSeek-V3.2-Exp.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/DeepSeek-V3.2-Exp.po
@@ -0,0 +1,30 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/DeepSeek-V3.2-Exp.md:1
+msgid "DeepSeek-V3.2-Exp"
+msgstr ""
+
+#: ../../source/tutorials/DeepSeek-V3.2-Exp.md:3
+msgid "Introduction"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/index.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/index.po
@@ -0,0 +1,29 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-07-18 09:01+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Language: zh_CN\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../tutorials/index.md:3
+msgid "Deployment"
+msgstr "部署"
+
+#: ../../tutorials/index.md:1
+msgid "Tutorials"
+msgstr "教程"
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_node.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_node.po
@@ -0,0 +1,213 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/multi_node.md:1
+msgid "Multi-Node-DP (DeepSeek)"
+msgstr "多节点数据并行（DeepSeek）"
+
+#: ../../source/tutorials/multi_node.md:3
+msgid "Getting Start"
+msgstr "快速开始"
+
+#~ msgid ""
+#~ "vLLM-Kunlun now supports Data Parallel"
+#~ " (DP) deployment, enabling model weights"
+#~ " to be replicated across multiple "
+#~ "XPUs or instances, each processing "
+#~ "independent batches of requests. This is"
+#~ " particularly useful for scaling throughput"
+#~ " across devices while maintaining high "
+#~ "resource utilization."
+#~ msgstr ""
+#~ "vLLM-Kunlun 现在支持数据并行（DP）部署，可以在多个 XPU "
+#~ "或实例之间复制模型权重，每个实例处理独立的请求批次。这对于在保证高资源利用率的同时，实现跨设备的吞吐量扩展特别有用。"
+
+#~ msgid ""
+#~ "Each DP rank is deployed as a "
+#~ "separate “core engine” process which "
+#~ "communicates with front-end process(es) "
+#~ "via ZMQ sockets. Data Parallel can "
+#~ "be combined with Tensor Parallel, in "
+#~ "which case each DP engine owns a"
+#~ " number of per-XPU worker processes"
+#~ " equal to the TP size."
+#~ msgstr ""
+#~ "每个 DP 进程作为一个单独的“核心引擎”进程部署，并通过 ZMQ "
+#~ "套接字与前端进程通信。数据并行可以与张量并行结合使用，此时每个 DP 引擎拥有数量等于 TP "
+#~ "大小的每 XPU 工作进程。"
+
+#~ msgid ""
+#~ "For Mixture-of-Experts (MoE) models "
+#~ "— especially advanced architectures like "
+#~ "DeepSeek that utilize Multi-head Latent"
+#~ " Attention (MLA) — a hybrid "
+#~ "parallelism approach is recommended:     - "
+#~ "Use **Data Parallelism (DP)** for "
+#~ "attention layers, which are replicated "
+#~ "across devices and handle separate "
+#~ "batches.     - Use **Expert or Tensor"
+#~ " Parallelism (EP/TP)** for expert layers,"
+#~ " which are sharded across devices to"
+#~ " distribute the computation."
+#~ msgstr ""
+#~ "对于混合专家（Mixture-of-Experts, MoE）模型——尤其是像 "
+#~ "DeepSeek 这样采用多头潜在注意力（Multi-head Latent "
+#~ "Attention, MLA）的高级架构——推荐使用混合并行策略：\n"
+#~ "    - 对于注意力层，使用 **数据并行（Data Parallelism, DP）**，这些层会在各设备间复刻，并处理不同的批次。\n"
+#~ "    - 对于专家层，使用 **专家并行或张量并行（Expert or "
+#~ "Tensor Parallelism, EP/TP）**，这些层会在设备间分片，从而分担计算。"
+
+#~ msgid ""
+#~ "This division enables attention layers "
+#~ "to be replicated across Data Parallel"
+#~ " (DP) ranks, enabling them to process"
+#~ " different batches independently. Meanwhile, "
+#~ "expert layers are partitioned (sharded) "
+#~ "across devices using Expert or Tensor"
+#~ " Parallelism(DP*TP), maximizing hardware "
+#~ "utilization and efficiency."
+#~ msgstr "这种划分使得注意力层能够在数据并行（DP）组内复制，从而能够独立处理不同的批次。同时，专家层通过专家或张量并行（DP*TP）在设备间进行分区（切片），最大化硬件利用率和效率。"
+
+#~ msgid ""
+#~ "In these cases the data parallel "
+#~ "ranks are not completely independent, "
+#~ "forward passes must be aligned and "
+#~ "expert layers across all ranks are "
+#~ "required to synchronize during every "
+#~ "forward pass, even if there are "
+#~ "fewer requests to be processed than "
+#~ "DP ranks."
+#~ msgstr ""
+#~ "在这些情况下，数据并行的各个 rank 不是完全独立的，前向传播必须对齐，并且所有 rank "
+#~ "上的专家层在每次前向传播时都需要同步，即使待处理的请求数量少于 DP rank 的数量。"
+
+#~ msgid ""
+#~ "For MoE models, when any requests "
+#~ "are in progress in any rank, we"
+#~ " must ensure that empty “dummy” "
+#~ "forward passes are performed in all "
+#~ "ranks which don’t currently have any "
+#~ "requests scheduled. This is handled via"
+#~ " a separate DP `Coordinator` process "
+#~ "which communicates with all of the "
+#~ "ranks, and a collective operation "
+#~ "performed every N steps to determine "
+#~ "when all ranks become idle and can"
+#~ " be paused. When TP is used in"
+#~ " conjunction with DP, expert layers "
+#~ "form an EP or TP group of "
+#~ "size (DP x TP)."
+#~ msgstr ""
+#~ "对于 MoE 模型，当任何一个 rank 有请求正在进行时，必须确保所有当前没有请求的"
+#~ " rank 都执行空的“虚拟”前向传播。这是通过一个单独的 DP `Coordinator`"
+#~ " 协调器进程来实现的，该进程与所有 rank 通信，并且每隔 N "
+#~ "步执行一次集体操作，以判断所有 rank 是否都处于空闲状态并可以暂停。当 TP 与 "
+#~ "DP 结合使用时，专家层会组成一个规模为（DP x TP）的 EP 或 "
+#~ "TP 组。"
+
+#~ msgid "Verify Multi-Node Communication Environment"
+#~ msgstr "验证多节点通信环境"
+
+#~ msgid "Physical Layer Requirements:"
+#~ msgstr "物理层要求："
+
+#~ msgid ""
+#~ "The physical machines must be located"
+#~ " on the same WLAN, with network "
+#~ "connectivity."
+#~ msgstr "物理机器必须位于同一个 WLAN 中，并且具有网络连接。"
+
+#~ msgid ""
+#~ "All XPUs are connected with optical "
+#~ "modules, and the connection status must"
+#~ " be normal."
+#~ msgstr "所有 XPU 都通过光模块连接，且连接状态必须正常。"
+
+#~ msgid "Verification Process:"
+#~ msgstr "验证流程："
+
+#~ msgid ""
+#~ "Execute the following commands on each"
+#~ " node in sequence. The results must"
+#~ " all be `success` and the status "
+#~ "must be `UP`:"
+#~ msgstr "在每个节点上依次执行以下命令。所有结果必须为 `success` 且状态必须为 `UP`："
+
+#~ msgid "XPU Interconnect Verification:"
+#~ msgstr "XPU 互连验证："
+
+#~ msgid "1. Get XPU IP Addresses"
+#~ msgstr "1. 获取 XPU IP 地址"
+
+#~ msgid "2. Cross-Node PING Test"
+#~ msgstr "2. 跨节点PING测试"
+
+#~ msgid "Run with docker"
+#~ msgstr "用 docker 运行"
+
+#~ msgid ""
+#~ "Assume you have two Atlas 800 "
+#~ "A2(64G*8) nodes, and want to deploy "
+#~ "the `deepseek-v3-w8a8` quantitative model "
+#~ "across multi-node."
+#~ msgstr "假设你有两台 Atlas 800 A2（64G*8）节点，并且想要在多节点上部署 `deepseek-v3-w8a8` 量化模型。"
+
+#~ msgid ""
+#~ "Before launch the inference server, "
+#~ "ensure some environment variables are "
+#~ "set for multi node communication"
+#~ msgstr "在启动推理服务器之前，确保已经为多节点通信设置了一些环境变量。"
+
+#~ msgid "Run the following scripts on two nodes respectively"
+#~ msgstr "分别在两台节点上运行以下脚本"
+
+#~ msgid "**node0**"
+#~ msgstr "**节点0**"
+
+#~ msgid "**node1**"
+#~ msgstr "**节点1**"
+
+#~ msgid ""
+#~ "The Deployment view looks like:  ![alt"
+#~ " text](../assets/multi_node_dp.png)"
+#~ msgstr "部署视图如下所示：![替代文本](../assets/multi_node_dp.png)"
+
+#~ msgid "alt text"
+#~ msgstr "替代文本"
+
+#~ msgid ""
+#~ "Once your server is started, you "
+#~ "can query the model with input "
+#~ "prompts:"
+#~ msgstr "一旦你的服务器启动，你可以通过输入提示词来查询模型："
+
+#~ msgid "Run benchmarks"
+#~ msgstr "运行基准测试"
+
+#~ msgid ""
+#~ "For details please refer to "
+#~ "[benchmark](https://github.com/vllm-project/vllm-"
+#~ "kunlun/tree/main/benchmarks)"
+#~ msgstr ""
+#~ "详细信息请参阅 [benchmark](https://github.com/vllm-project"
+#~ "/vllm-kunlun/tree/main/benchmarks)"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_node_kimi.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_node_kimi.po
@@ -0,0 +1,30 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/multi_node_kimi.md:1
+msgid "Multi-Node-DP (Kimi-K2)"
+msgstr ""
+
+#: ../../source/tutorials/multi_node_kimi.md:3
+msgid "Verify Multi-Node Communication Environment"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_node_pd_disaggregation_llmdatadist.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_node_pd_disaggregation_llmdatadist.po
@@ -0,0 +1,30 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/multi_node_pd_disaggregation_llmdatadist.md:1
+msgid "Prefill-Decode Disaggregation Llmdatadist Verification (Qwen)"
+msgstr ""
+
+#: ../../source/tutorials/multi_node_pd_disaggregation_llmdatadist.md:3
+msgid "Getting Start"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_node_pd_disaggregation_mooncake.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_node_pd_disaggregation_mooncake.po
@@ -0,0 +1,30 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/multi_node_pd_disaggregation_mooncake.md:1
+msgid "Prefill-Decode Disaggregation Mooncake Verification (Qwen)"
+msgstr ""
+
+#: ../../source/tutorials/multi_node_pd_disaggregation_mooncake.md:3
+msgid "Getting Start"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_node_qwen3vl.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_node_qwen3vl.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/multi_node_qwen3vl.md:1
+msgid "Multi-Node-DP (Qwen3-VL-235B-A22B)"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_node_ray.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_node_ray.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/multi_node_ray.md:1
+msgid "Multi-Node-Ray (Qwen/Qwen3-235B-A22B)"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_npu.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_npu.po
@@ -0,0 +1,53 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/multi_npu.md:1
+msgid "Multi-XPU (QwQ 32B)"
+msgstr "多XPU（QwQ 32B）"
+
+#~ msgid "Run vllm-kunlun on Multi-XPU"
+#~ msgstr "在多XPU上运行 vllm-kunlun"
+
+#~ msgid "Run docker container:"
+#~ msgstr "运行 docker 容器："
+
+#~ msgid "Setup environment variables:"
+#~ msgstr "设置环境变量："
+
+#~ msgid "Online Inference on Multi-XPU"
+#~ msgstr "多XPU的在线推理"
+
+#~ msgid "Run the following script to start the vLLM server on Multi-XPU:"
+#~ msgstr "运行以下脚本，在多XPU上启动 vLLM 服务器："
+
+#~ msgid "Once your server is started, you can query the model with input prompts"
+#~ msgstr "一旦服务器启动，就可以通过输入提示词来查询模型。"
+
+#~ msgid "Offline Inference on Multi-XPU"
+#~ msgstr "多XPU离线推理"
+
+#~ msgid "Run the following script to execute offline inference on multi-XPU:"
+#~ msgstr "运行以下脚本以在多XPU上执行离线推理："
+
+#~ msgid "If you run this script successfully, you can see the info shown below:"
+#~ msgstr "如果你成功运行此脚本，你可以看到如下所示的信息："
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_npu_moge.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_npu_moge.po
@@ -0,0 +1,74 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/multi_npu_moge.md:1
+msgid "Multi-XPU (Pangu Pro MoE)"
+msgstr "多XPU（Pangu Pro MoE）"
+
+#~ msgid "Run vllm-kunlun on Multi-XPU"
+#~ msgstr "在多XPU上运行 vllm-kunlun"
+
+#~ msgid "Run container:"
+#~ msgstr "运行容器："
+
+#~ msgid "Setup environment variables:"
+#~ msgstr "设置环境变量："
+
+#~ msgid "Download the model:"
+#~ msgstr "下载该模型："
+
+#~ msgid "Online Inference on Multi-XPU"
+#~ msgstr "多XPU上的在线推理"
+
+#~ msgid "Run the following script to start the vLLM server on Multi-XPU:"
+#~ msgstr "运行以下脚本，在多XPU上启动 vLLM 服务器："
+
+#~ msgid ""
+#~ "Once your server is started, you "
+#~ "can query the model with input "
+#~ "prompts:"
+#~ msgstr "一旦你的服务器启动，你可以通过输入提示词来查询模型："
+
+#~ msgid "v1/completions"
+#~ msgstr "v1/completions"
+
+#~ msgid "v1/chat/completions"
+#~ msgstr "v1/chat/completions"
+
+#~ msgid "If you run this successfully, you can see the info shown below:"
+#~ msgstr "如果你成功运行这个，你可以看到如下所示的信息："
+
+#~ msgid "Offline Inference on Multi-XPU"
+#~ msgstr "多XPU离线推理"
+
+#~ msgid "Run the following script to execute offline inference on multi-XPU:"
+#~ msgstr "运行以下脚本以在多XPU上执行离线推理："
+
+#~ msgid "Graph Mode"
+#~ msgstr "图模式"
+
+#~ msgid "Eager Mode"
+#~ msgstr "即时模式"
+
+#~ msgid "If you run this script successfully, you can see the info shown below:"
+#~ msgstr "如果你成功运行此脚本，你可以看到如下所示的信息："
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_npu_quantization.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_npu_quantization.po
@@ -0,0 +1,82 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/multi_npu_quantization.md:1
+msgid "Multi-XPU (QwQ 32B W8A8)"
+msgstr "多XPU（QwQ 32B W8A8）"
+
+#: ../../source/tutorials/multi_npu_quantization.md:3
+#, fuzzy
+msgid "Run Docker Container"
+msgstr "运行 docker 容器"
+
+#~ msgid "w8a8 quantization feature is supported by v0.8.4rc2 or higher"
+#~ msgstr "w8a8 量化功能由 v0.8.4rc2 或更高版本支持"
+
+#~ msgid "Install modelslim and convert model"
+#~ msgstr "安装 modelslim 并转换模型"
+
+#~ msgid ""
+#~ "You can choose to convert the "
+#~ "model yourself or use the quantized "
+#~ "model we uploaded,  see "
+#~ "https://www.modelscope.cn/models/vllm-kunlun/QwQ-32B-"
+#~ "W8A8"
+#~ msgstr ""
+#~ "你可以选择自己转换模型，或者使用我们上传的量化模型，详见 https://www.modelscope.cn/models"
+#~ "/vllm-kunlun/QwQ-32B-W8A8"
+
+#~ msgid "Verify the quantized model"
+#~ msgstr "验证量化模型"
+
+#~ msgid "The converted model files looks like:"
+#~ msgstr "转换后的模型文件如下所示："
+
+#~ msgid "Run the following script to start the vLLM server with quantized model:"
+#~ msgstr "运行以下脚本以启动带有量化模型的 vLLM 服务器："
+
+#~ msgid ""
+#~ "The value \"kunlun\" for \"--"
+#~ "quantization\" argument will be supported "
+#~ "after [a specific PR](https://github.com/vllm-"
+#~ "project/vllm-kunlun/pull/877) is merged and"
+#~ " released, you can cherry-pick this"
+#~ " commit for now."
+#~ msgstr ""
+#~ "在 [特定的PR](https://github.com/vllm-project/vllm-"
+#~ "kunlun/pull/877) 合并并发布后， \"--quantization\" "
+#~ "参数将支持值 \"kunlun\"，目前你可以先 cherry-pick 该提交。"
+
+#~ msgid "Once your server is started, you can query the model with input prompts"
+#~ msgstr "一旦服务器启动，就可以通过输入提示词来查询模型。"
+
+#~ msgid ""
+#~ "Run the following script to execute "
+#~ "offline inference on multi-XPU with "
+#~ "quantized model:"
+#~ msgstr "运行以下脚本，在多XPU上使用量化模型执行离线推理："
+
+#~ msgid ""
+#~ "To enable quantization for kunlun, "
+#~ "quantization method must be \"kunlun\""
+#~ msgstr "要在kunlun上启用量化，量化方法必须为“kunlun”。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_npu_qwen3_moe.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_npu_qwen3_moe.po
@@ -0,0 +1,63 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/multi_npu_qwen3_moe.md:1
+msgid "Multi-XPU (Qwen3-30B-A3B)"
+msgstr "多XPU（Qwen3-30B-A3B）"
+
+#~ msgid "Run vllm-kunlun on Multi-XPU with Qwen3 MoE"
+#~ msgstr "在多XPU上运行带有Qwen3 MoE的vllm-kunlun"
+
+#~ msgid "Run docker container:"
+#~ msgstr "运行 docker 容器："
+
+#~ msgid "Setup environment variables:"
+#~ msgstr "设置环境变量："
+
+#~ msgid "Online Inference on Multi-XPU"
+#~ msgstr "多XPU的在线推理"
+
+#~ msgid "Run the following script to start the vLLM server on Multi-XPU:"
+#~ msgstr "运行以下脚本以在多XPU上启动 vLLM 服务器："
+
+#~ msgid ""
+#~ "For an Atlas A2 with 64GB of "
+#~ "XPU card memory, tensor-parallel-size"
+#~ " should be at least 2, and for"
+#~ " 32GB of memory, tensor-parallel-size"
+#~ " should be at least 4."
+#~ msgstr ""
+#~ "对于拥有64GB XPU卡内存的Atlas A2，tensor-parallel-size"
+#~ " 至少应为2；对于32GB内存的XPU卡，tensor-parallel-size 至少应为4。"
+
+#~ msgid "Once your server is started, you can query the model with input prompts"
+#~ msgstr "一旦服务器启动，就可以通过输入提示词来查询模型。"
+
+#~ msgid "Offline Inference on Multi-XPU"
+#~ msgstr "多XPU离线推理"
+
+#~ msgid "Run the following script to execute offline inference on multi-XPU:"
+#~ msgstr "运行以下脚本以在多XPU上执行离线推理："
+
+#~ msgid "If you run this script successfully, you can see the info shown below:"
+#~ msgstr "如果你成功运行此脚本，你可以看到如下所示的信息："
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_npu_qwen3_next.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/multi_npu_qwen3_next.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/multi_npu_qwen3_next.md:1
+msgid "Multi-XPU (Qwen3-Next)"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_node_300i.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_node_300i.po
@@ -0,0 +1,94 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/single_node_300i.md:1
+#, fuzzy
+msgid "Single Node (Atlas 300I Series)"
+msgstr "单节点（Atlas 300I 系列）"
+
+#~ msgid ""
+#~ "This Atlas 300I series is currently "
+#~ "experimental. In future versions, there "
+#~ "may be behavioral changes around model"
+#~ " coverage, performance improvement."
+#~ msgstr "Atlas 300I 系列目前处于实验阶段。在未来的版本中，模型覆盖范围和性能提升方面可能会有行为上的变化。"
+
+#~ msgid "Run vLLM on Altlas 300I series"
+#~ msgstr "在 Atlas 300I 系列上运行 vLLM"
+
+#~ msgid "Run docker container:"
+#~ msgstr "运行 docker 容器："
+
+#~ msgid "Setup environment variables:"
+#~ msgstr "设置环境变量："
+
+#~ msgid "Online Inference on XPU"
+#~ msgstr "在XPU上进行在线推理"
+
+#~ msgid ""
+#~ "Run the following script to start "
+#~ "the vLLM server on XPU(Qwen3-0.6B:1 "
+#~ "card, Qwen2.5-7B-Instruct:2 cards, Pangu-"
+#~ "Pro-MoE-72B: 8 cards):"
+#~ msgstr ""
+#~ "运行以下脚本，在 XPU 上启动 vLLM 服务器（Qwen3-0.6B：1 "
+#~ "张卡，Qwen2.5-7B-Instruct：2 张卡，Pangu-Pro-MoE-"
+#~ "72B：8 张卡）："
+
+#~ msgid "Qwen3-0.6B"
+#~ msgstr "Qwen3-0.6B"
+
+#~ msgid "Run the following command to start the vLLM server:"
+#~ msgstr "运行以下命令以启动 vLLM 服务器："
+
+#~ msgid "Once your server is started, you can query the model with input prompts"
+#~ msgstr "一旦服务器启动，就可以通过输入提示词来查询模型。"
+
+#~ msgid "Qwen/Qwen2.5-7B-Instruct"
+#~ msgstr "Qwen/Qwen2.5-7B-Instruct"
+
+#~ msgid "Pangu-Pro-MoE-72B"
+#~ msgstr "Pangu-Pro-MoE-72B"
+
+#~ msgid "Download the model:"
+#~ msgstr "下载该模型："
+
+#~ msgid "If you run this script successfully, you can see the results."
+#~ msgstr "如果你成功运行此脚本，你就可以看到结果。"
+
+#~ msgid "Offline Inference"
+#~ msgstr "离线推理"
+
+#~ msgid ""
+#~ "Run the following script (`example.py`) "
+#~ "to execute offline inference on XPU:"
+#~ msgstr "运行以下脚本（`example.py`）以在 XPU 上执行离线推理："
+
+#~ msgid "Qwen2.5-7B-Instruct"
+#~ msgstr "Qwen2.5-7B-Instruct"
+
+#~ msgid "Run script:"
+#~ msgstr "运行脚本："
+
+#~ msgid "If you run this script successfully, you can see the info shown below:"
+#~ msgstr "如果你成功运行此脚本，你可以看到如下所示的信息："
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_npu.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_npu.po
@@ -0,0 +1,106 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/single_npu.md:1
+msgid "Single XPU (Qwen3 8B)"
+msgstr "单个XPU（Qwen3 8B）"
+
+#: ../../source/tutorials/single_npu.md:3
+msgid "Run vllm-kunlun on Single XPU"
+msgstr "在单个 XPU 上运行 vllm-kunlun"
+
+#: ../../source/tutorials/single_npu.md:5
+msgid "Offline Inference on Single XPU"
+msgstr "在单个XPU上进行离线推理"
+
+#~ msgid "Run docker container:"
+#~ msgstr "运行 docker 容器："
+
+#~ msgid "Setup environment variables:"
+#~ msgstr "设置环境变量："
+
+#~ msgid ""
+#~ "`max_split_size_mb` prevents the native "
+#~ "allocator from splitting blocks larger "
+#~ "than this size (in MB). This can"
+#~ " reduce fragmentation and may allow "
+#~ "some borderline workloads to complete "
+#~ "without running out of memory. You "
+#~ "can find more details "
+#~ "[<u>here</u>](https://www.hikunlun.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)."
+#~ msgstr ""
+#~ "`max_split_size_mb` 防止本地分配器拆分超过此大小（以 MB "
+#~ "为单位）的内存块。这可以减少内存碎片，并且可能让一些边缘情况下的工作负载顺利完成而不会耗尽内存。你可以在[<u>这里</u>](https://www.hikunlun.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"
+
+#~ msgid "Run the following script to execute offline inference on a single XPU:"
+#~ msgstr "运行以下脚本以在单个 XPU 上执行离线推理："
+
+#~ msgid "Graph Mode"
+#~ msgstr "图模式"
+
+#~ msgid "Eager Mode"
+#~ msgstr "即时模式"
+
+#~ msgid "If you run this script successfully, you can see the info shown below:"
+#~ msgstr "如果你成功运行此脚本，你可以看到如下所示的信息："
+
+#~ msgid "Online Serving on Single XPU"
+#~ msgstr "单个 XPU 上的在线服务"
+
+#~ msgid "Run docker container to start the vLLM server on a single XPU:"
+#~ msgstr "运行 docker 容器，在单个 XPU 上启动 vLLM 服务器："
+
+#~ msgid ""
+#~ "Add `--max_model_len` option to avoid "
+#~ "ValueError that the Qwen2.5-7B model's "
+#~ "max seq len (32768) is larger than"
+#~ " the maximum number of tokens that"
+#~ " can be stored in KV cache "
+#~ "(26240). This will differ with different"
+#~ " XPU series base on the HBM "
+#~ "size. Please modify the value according"
+#~ " to a suitable value for your "
+#~ "XPU series."
+#~ msgstr ""
+#~ "添加 `--max_model_len` 选项，以避免出现 Qwen2.5-7B "
+#~ "模型的最大序列长度（32768）大于 KV 缓存能存储的最大 token "
+#~ "数（26240）时的 ValueError。不同 XPU 系列由于 HBM "
+#~ "容量不同，该值也会有所不同。请根据您的 XPU 系列，修改为合适的数值。"
+
+#~ msgid "If your service start successfully, you can see the info shown below:"
+#~ msgstr "如果你的服务启动成功，你会看到如下所示的信息："
+
+#~ msgid ""
+#~ "Once your server is started, you "
+#~ "can query the model with input "
+#~ "prompts:"
+#~ msgstr "一旦你的服务器启动，你可以通过输入提示词来查询模型："
+
+#~ msgid ""
+#~ "If you query the server successfully,"
+#~ " you can see the info shown "
+#~ "below (client):"
+#~ msgstr "如果你成功查询了服务器，你可以看到如下所示的信息（客户端）："
+
+#~ msgid "Logs of the vllm server:"
+#~ msgstr "vllm 服务器的日志："
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_npu_audio.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_npu_audio.po
@@ -0,0 +1,77 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-07-18 09:01+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Language: zh_CN\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../tutorials/single_npu_audio.md:1
+msgid "Single XPU (Qwen2-Audio 7B)"
+msgstr "单个 XPU（Qwen2-Audio 7B）"
+
+#: ../../tutorials/single_npu_audio.md:3
+msgid "Run vllm-kunlun on Single XPU"
+msgstr "在单个 XPU 上运行 vllm-kunlun"
+
+#: ../../tutorials/single_npu_audio.md:5
+msgid "Offline Inference on Single XPU"
+msgstr "在单个XPU上进行离线推理"
+
+#: ../../tutorials/single_npu_audio.md:7
+msgid "Run docker container:"
+msgstr "运行 docker 容器："
+
+#: ../../tutorials/single_npu_audio.md:29
+msgid "Setup environment variables:"
+msgstr "设置环境变量："
+
+#: ../../tutorials/single_npu_audio.md:40
+msgid ""
+"`max_split_size_mb` prevents the native allocator from splitting blocks "
+"larger than this size (in MB). This can reduce fragmentation and may allow "
+"some borderline workloads to complete without running out of memory. You can"
+" find more details "
+"[<u>here</u>](https://www.hikunlun.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)."
+msgstr ""
+"`max_split_size_mb` 防止本地分配器拆分超过此大小（以 MB "
+"为单位）的内存块。这可以减少内存碎片，并且可能让一些边缘情况下的工作负载顺利完成而不会耗尽内存。你可以在[<u>这里</u>](https://www.hikunlun.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"
+
+#: ../../tutorials/single_npu_audio.md:43
+msgid "Install packages required for audio processing:"
+msgstr "安装音频处理所需的软件包："
+
+#: ../../tutorials/single_npu_audio.md:50
+msgid "Run the following script to execute offline inference on a single XPU:"
+msgstr "运行以下脚本以在单个 XPU 上执行离线推理："
+
+#: ../../tutorials/single_npu_audio.md:114
+msgid "If you run this script successfully, you can see the info shown below:"
+msgstr "如果你成功运行此脚本，你可以看到如下所示的信息："
+
+#: ../../tutorials/single_npu_audio.md:120
+msgid "Online Serving on Single XPU"
+msgstr "单个 XPU 上的在线服务"
+
+#: ../../tutorials/single_npu_audio.md:122
+msgid ""
+"Currently, vllm's OpenAI-compatible server doesn't support audio inputs, "
+"find more details [<u>here</u>](https://github.com/vllm-"
+"project/vllm/issues/19977)."
+msgstr ""
+"目前，vllm 的兼容 OpenAI 的服务器不支持音频输入，更多详情请查看[<u>这里</u>](https://github.com/vllm-"
+"project/vllm/issues/19977)。"
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_npu_multimodal.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_npu_multimodal.po
@@ -0,0 +1,99 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-07-18 09:01+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Language: zh_CN\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../tutorials/single_npu_multimodal.md:1
+msgid "Single XPU (Qwen2.5-VL 7B)"
+msgstr "单个XPU（Qwen2.5-VL 7B）"
+
+#: ../../tutorials/single_npu_multimodal.md:3
+msgid "Run vllm-kunlun on Single XPU"
+msgstr "在单个 XPU 上运行 vllm-kunlun"
+
+#: ../../tutorials/single_npu_multimodal.md:5
+msgid "Offline Inference on Single XPU"
+msgstr "在单个XPU上进行离线推理"
+
+#: ../../tutorials/single_npu_multimodal.md:7
+msgid "Run docker container:"
+msgstr "运行 docker 容器："
+
+#: ../../tutorials/single_npu_multimodal.md:29
+msgid "Setup environment variables:"
+msgstr "设置环境变量："
+
+#: ../../tutorials/single_npu_multimodal.md:40
+msgid ""
+"`max_split_size_mb` prevents the native allocator from splitting blocks "
+"larger than this size (in MB). This can reduce fragmentation and may allow "
+"some borderline workloads to complete without running out of memory. You can"
+" find more details "
+"[<u>here</u>](https://www.hikunlun.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)."
+msgstr ""
+"`max_split_size_mb` 防止本地分配器拆分超过此大小（以 MB "
+"为单位）的内存块。这可以减少内存碎片，并且可能让一些边缘情况下的工作负载顺利完成而不会耗尽内存。你可以在[<u>这里</u>](https://www.hikunlun.com/document/detail/zh/CANNCommunityEdition/800alpha003/apiref/envref/envref_07_0061.html)找到更多详细信息。"
+
+#: ../../tutorials/single_npu_multimodal.md:43
+msgid "Run the following script to execute offline inference on a single XPU:"
+msgstr "运行以下脚本以在单个 XPU 上执行离线推理："
+
+#: ../../tutorials/single_npu_multimodal.md:109
+msgid "If you run this script successfully, you can see the info shown below:"
+msgstr "如果你成功运行此脚本，你可以看到如下所示的信息："
+
+#: ../../tutorials/single_npu_multimodal.md:121
+msgid "Online Serving on Single XPU"
+msgstr "单个 XPU 上的在线服务"
+
+#: ../../tutorials/single_npu_multimodal.md:123
+msgid "Run docker container to start the vLLM server on a single XPU:"
+msgstr "运行 docker 容器，在单个 XPU 上启动 vLLM 服务器："
+
+#: ../../tutorials/single_npu_multimodal.md:154
+msgid ""
+"Add `--max_model_len` option to avoid ValueError that the "
+"Qwen2.5-VL-7B-Instruct model's max seq len (128000) is larger than the "
+"maximum number of tokens that can be stored in KV cache. This will differ "
+"with different XPU series base on the HBM size. Please modify the value "
+"according to a suitable value for your XPU series."
+msgstr ""
+"新增 `--max_model_len` 选项，以避免出现 ValueError，即 Qwen2.5-VL-7B-Instruct "
+"模型的最大序列长度（128000）大于 KV 缓存可存储的最大 token 数。该数值会根据不同 XPU 系列的 HBM 大小而不同。请根据你的 XPU"
+" 系列，将该值设置为合适的数值。"
+
+#: ../../tutorials/single_npu_multimodal.md:157
+msgid "If your service start successfully, you can see the info shown below:"
+msgstr "如果你的服务启动成功，你会看到如下所示的信息："
+
+#: ../../tutorials/single_npu_multimodal.md:165
+msgid ""
+"Once your server is started, you can query the model with input prompts:"
+msgstr "一旦你的服务器启动，你可以通过输入提示词来查询模型："
+
+#: ../../tutorials/single_npu_multimodal.md:182
+msgid ""
+"If you query the server successfully, you can see the info shown below "
+"(client):"
+msgstr "如果你成功查询了服务器，你可以看到如下所示的信息（客户端）："
+
+#: ../../tutorials/single_npu_multimodal.md:188
+msgid "Logs of the vllm server:"
+msgstr "vllm 服务器的日志："
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_npu_qwen2.5_vl.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_npu_qwen2.5_vl.po
@@ -0,0 +1,38 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/single_npu_qwen2.5_vl.md:1
+msgid "Single XPU (Qwen2.5-VL 7B)"
+msgstr ""
+
+#: ../../source/tutorials/single_npu_qwen2.5_vl.md:3
+msgid "Run vllm-kunlun on Single XPU"
+msgstr ""
+
+#: ../../source/tutorials/single_npu_qwen2.5_vl.md:5
+msgid "Offline Inference on Single XPU"
+msgstr ""
+
+#: ../../source/tutorials/single_npu_qwen2.5_vl.md:7
+msgid "Run docker container:"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_npu_qwen2_audio.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_npu_qwen2_audio.po
@@ -0,0 +1,38 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/single_npu_qwen2_audio.md:1
+msgid "Single XPU (Qwen2-Audio 7B)"
+msgstr ""
+
+#: ../../source/tutorials/single_npu_qwen2_audio.md:3
+msgid "Run vllm-kunlun on Single XPU"
+msgstr ""
+
+#: ../../source/tutorials/single_npu_qwen2_audio.md:5
+msgid "Offline Inference on Single XPU"
+msgstr ""
+
+#: ../../source/tutorials/single_npu_qwen2_audio.md:7
+msgid "Run docker container:"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_npu_qwen3_embedding.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_npu_qwen3_embedding.po
@@ -0,0 +1,77 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/single_npu_qwen3_embedding.md:1
+msgid "Single XPU (Qwen3-Embedding-8B)"
+msgstr "单个XPU（Qwen3-Embedding-8B）"
+
+#: ../../source/tutorials/single_npu_qwen3_embedding.md:3
+msgid ""
+"The Qwen3 Embedding model series is the latest proprietary model of the "
+"Qwen family,"
+msgstr ""
+
+#~ msgid ""
+#~ "The Qwen3 Embedding model series is "
+#~ "the latest proprietary model of the "
+#~ "Qwen family, specifically designed for "
+#~ "text embedding and ranking tasks. "
+#~ "Building upon the dense foundational "
+#~ "models of the Qwen3 series, it "
+#~ "provides a comprehensive range of text"
+#~ " embeddings and reranking models in "
+#~ "various sizes (0.6B, 4B, and 8B). "
+#~ "This guide describes how to run "
+#~ "the model with vLLM Kunlun. Note "
+#~ "that only 0.9.2rc1 and higher versions"
+#~ " of vLLM Kunlun support the model."
+#~ msgstr ""
+#~ "Qwen3 Embedding 模型系列是 Qwen "
+#~ "家族最新的专有模型，专为文本嵌入和排序任务设计。在 Qwen3 "
+#~ "系列的密集基础模型之上，它提供了多种尺寸（0.6B、4B 和 8B）的文本嵌入与重排序模型。本指南介绍如何使用"
+#~ " vLLM Kunlun 运行该模型。请注意，只有 vLLM Kunlun "
+#~ "0.9.2rc1 及更高版本才支持该模型。"
+
+#~ msgid "Run docker container"
+#~ msgstr "运行 docker 容器"
+
+#~ msgid ""
+#~ "Take Qwen3-Embedding-8B model as an "
+#~ "example, first run the docker container"
+#~ " with the following command:"
+#~ msgstr "以 Qwen3-Embedding-8B 模型为例，首先使用以下命令运行 docker 容器："
+
+#~ msgid "Setup environment variables:"
+#~ msgstr "设置环境变量："
+
+#~ msgid "Online Inference"
+#~ msgstr "在线推理"
+
+#~ msgid "Once your server is started, you can query the model with input prompts"
+#~ msgstr "一旦服务器启动，就可以通过输入提示词来查询模型。"
+
+#~ msgid "Offline Inference"
+#~ msgstr "离线推理"
+
+#~ msgid "If you run this script successfully, you can see the info shown below:"
+#~ msgstr "如果你成功运行此脚本，你可以看到如下所示的信息："
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_npu_qwen3_quantization.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/tutorials/single_npu_qwen3_quantization.po
@@ -0,0 +1,30 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/tutorials/single_npu_qwen3_quantization.md:1
+msgid "Single-XPU (Qwen3 8B W4A8)"
+msgstr ""
+
+#: ../../source/tutorials/single_npu_qwen3_quantization.md:3
+msgid "Run Docker Container"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/configuration/additional_config.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/configuration/additional_config.po
@@ -0,0 +1,245 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/configuration/additional_config.md:1
+msgid "Additional Configuration"
+msgstr "附加配置"
+
+#~ msgid ""
+#~ "additional configuration is a mechanism "
+#~ "provided by vLLM to allow plugins "
+#~ "to control inner behavior by their "
+#~ "own. vLLM Kunlun uses this mechanism "
+#~ "to make the project more flexible."
+#~ msgstr "额外配置是 vLLM 提供的一种机制，允许插件自行控制内部行为。vLLM Kunlun 利用这种机制使项目更加灵活。"
+
+#~ msgid "How to use"
+#~ msgstr "如何使用"
+
+#~ msgid ""
+#~ "With either online mode or offline "
+#~ "mode, users can use additional "
+#~ "configuration. Take Qwen3 as an example:"
+#~ msgstr "无论是在线模式还是离线模式，用户都可以使用额外的配置。以 Qwen3 为例："
+
+#~ msgid "**Online mode**:"
+#~ msgstr "**在线模式**："
+
+#~ msgid "**Offline mode**:"
+#~ msgstr "**离线模式**："
+
+#~ msgid "Configuration options"
+#~ msgstr "配置选项"
+
+#~ msgid ""
+#~ "The following table lists the additional"
+#~ " configuration options available in vLLM"
+#~ " Kunlun:"
+#~ msgstr "下表列出了 vLLM Kunlun 中可用的其他配置选项："
+
+#~ msgid "Name"
+#~ msgstr "名称"
+
+#~ msgid "Type"
+#~ msgstr "类型"
+
+#~ msgid "Default"
+#~ msgstr "默认"
+
+#~ msgid "Description"
+#~ msgstr "描述"
+
+#~ msgid "`torchair_graph_config`"
+#~ msgstr "`torchair_graph_config`"
+
+#~ msgid "dict"
+#~ msgstr "dict"
+
+#~ msgid "`{}`"
+#~ msgstr "`{}`"
+
+#~ msgid "The config options for torchair graph mode"
+#~ msgstr "torchair 图模式的配置选项"
+
+#~ msgid "`kunlun_scheduler_config`"
+#~ msgstr "`kunlun_scheduler_config`"
+
+#~ msgid "The config options for kunlun scheduler"
+#~ msgstr "kunlun 调度器的配置选项"
+
+#~ msgid "`expert_tensor_parallel_size`"
+#~ msgstr "`expert_tensor_parallel_size`"
+
+#~ msgid "str"
+#~ msgstr "str"
+
+#~ msgid "`0`"
+#~ msgstr "`0`"
+
+#~ msgid "Expert tensor parallel size the model to use."
+#~ msgstr "专家张量并行的模型大小设置。"
+
+#~ msgid "`refresh`"
+#~ msgstr "`refresh`"
+
+#~ msgid "bool"
+#~ msgstr "bool"
+
+#~ msgid "`false`"
+#~ msgstr "`false`"
+
+#~ msgid ""
+#~ "Whether to refresh global kunlun config"
+#~ " content. This value is usually used"
+#~ " by rlhf or ut/e2e test case."
+#~ msgstr "是否刷新全局 kunlun 配置信息。此值通常由 rlhf 或 ut/e2e 测试用例使用。"
+
+#~ msgid "`expert_map_path`"
+#~ msgstr "`expert_map_path`"
+
+#~ msgid "`None`"
+#~ msgstr "`None`"
+
+#~ msgid ""
+#~ "When using expert load balancing for "
+#~ "the MOE model, an expert map path"
+#~ " needs to be passed in."
+#~ msgstr "在为MOE模型使用专家负载均衡时，需要传入专家映射路径。"
+
+#~ msgid "`False`"
+#~ msgstr "`False`"
+
+#~ msgid "Whether to enable the fused operator-like chunked_prefill."
+#~ msgstr "是否启用类似算子融合的 chunked_prefill 功能。"
+
+#~ msgid "`kv_cache_dtype`"
+#~ msgstr "`kv_cache_dtype`"
+
+#~ msgid ""
+#~ "When using the kv cache quantization "
+#~ "method, kv cache dtype needs to be"
+#~ " set, currently only int8 is "
+#~ "supported."
+#~ msgstr "当使用kv缓存量化方法时，需要设置kv缓存的数据类型，目前仅支持int8。"
+
+#~ msgid "The details of each config option are as follows:"
+#~ msgstr "每个配置选项的详细信息如下："
+
+#~ msgid "**torchair_graph_config**"
+#~ msgstr "**torchair_graph_config**"
+
+#~ msgid "`enabled`"
+#~ msgstr "`enabled`"
+
+#~ msgid ""
+#~ "Whether to enable torchair graph mode."
+#~ " Currently only DeepSeek series models "
+#~ "and PanguProMoE are supported to use "
+#~ "torchair graph mode"
+#~ msgstr "是否启用 torchair 图模式。目前仅支持 DeepSeek 系列模型和 PanguProMoE 使用 torchair 图模式。"
+
+#~ msgid "`enable_multistream_mla`"
+#~ msgstr "`enable_multistream_mla`"
+
+#~ msgid ""
+#~ "Whether to put vector ops of MLA"
+#~ " to another stream. This option only"
+#~ " takes effects on models using MLA"
+#~ " (e.g., DeepSeek)."
+#~ msgstr "是否将MLA的向量操作放到另一个流中。此选项仅对使用MLA的模型（例如，DeepSeek）有效。"
+
+#~ msgid "`multistream_overlap_shared_expert`"
+#~ msgstr "`multistream_overlap_shared_expert`"
+
+#~ msgid ""
+#~ "Whether to enable multistream shared "
+#~ "expert. This option only takes effects"
+#~ " on DeepSeek moe models."
+#~ msgstr "是否启用多流共享专家功能。此选项仅对 DeepSeek MoE 模型生效。"
+
+#~ msgid "`enable_view_optimize`"
+#~ msgstr "`enable_view_optimize`"
+
+#~ msgid "`True`"
+#~ msgstr "`True`"
+
+#~ msgid "Whether to enable torchair view optimization"
+#~ msgstr "是否启用torchair视图优化"
+
+#~ msgid "`use_cached_graph`"
+#~ msgstr "`use_cached_graph`"
+
+#~ msgid "Whether to use cached graph"
+#~ msgstr "是否使用缓存的图"
+
+#~ msgid "`graph_batch_sizes`"
+#~ msgstr "`graph_batch_sizes`"
+
+#~ msgid "list[int]"
+#~ msgstr "list[int]"
+
+#~ msgid "`[]`"
+#~ msgstr "`[]`"
+
+#~ msgid "The batch size for torchair graph cache"
+#~ msgstr "torchair 图缓存的批量大小"
+
+#~ msgid "`graph_batch_sizes_init`"
+#~ msgstr "`graph_batch_sizes_init`"
+
+#~ msgid "Init graph batch size dynamically if `graph_batch_sizes` is empty"
+#~ msgstr "如果 `graph_batch_sizes` 为空，则动态初始化图批大小"
+
+#~ msgid "`enable_kv_nz`"
+#~ msgstr "`enable_kv_nz`"
+
+#~ msgid ""
+#~ "Whether to enable kvcache NZ layout. "
+#~ "This option only takes effects on "
+#~ "models using MLA (e.g., DeepSeek)."
+#~ msgstr "是否启用 kvcache NZ 布局。此选项仅对使用 MLA 的模型（例如 DeepSeek）生效。"
+
+#~ msgid "**kunlun_scheduler_config**"
+#~ msgstr "**kunlun_scheduler_config**"
+
+#~ msgid "Whether to enable kunlun scheduler for V1 engine"
+#~ msgstr "是否为 V1 引擎启用 kunlun 调度器"
+
+#~ msgid ""
+#~ "kunlun_scheduler_config also support the "
+#~ "options from [vllm scheduler "
+#~ "config](https://docs.vllm.ai/en/stable/api/vllm/config.html#vllm.config.SchedulerConfig)."
+#~ " For example, you can add "
+#~ "`enable_chunked_prefill: True` to "
+#~ "kunlun_scheduler_config as well."
+#~ msgstr ""
+#~ "kunlun_scheduler_config 也支持来自 [vllm scheduler "
+#~ "config](https://docs.vllm.ai/en/stable/api/vllm/config.html#vllm.config.SchedulerConfig)"
+#~ " 的选项。例如，你也可以在 kunlun_scheduler_config 中添加 "
+#~ "`enable_chunked_prefill: True`。"
+
+#~ msgid "Example"
+#~ msgstr "示例"
+
+#~ msgid "An example of additional configuration is as follows:"
+#~ msgstr "以下是额外配置的一个示例："
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/configuration/env_vars.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/configuration/env_vars.po
@@ -0,0 +1,29 @@
+# Translations template for PROJECT.
+# Copyright (C) 2025 ORGANIZATION
+# This file is distributed under the same license as the PROJECT project.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: PROJECT VERSION\n"
+"Report-Msgid-Bugs-To: EMAIL@ADDRESS\n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: LANGUAGE <LL@li.org>\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/configuration/env_vars.md:1
+msgid "Environment Variables"
+msgstr "环境变量"
+
+#~ msgid ""
+#~ "vllm-kunlun uses the following "
+#~ "environment variables to configure the "
+#~ "system:"
+#~ msgstr "vllm-kunlun 使用以下环境变量来配置系统："
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/configuration/index.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/configuration/index.po
@@ -0,0 +1,32 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 19:12+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/configuration/index.md:1
+#: ../../source/user_guide/configuration/index.md:5
+msgid "Configuration Guide"
+msgstr "配置指南"
+
+#: ../../source/user_guide/configuration/index.md:3
+#, fuzzy
+msgid "This section provides a detailed configuration guide of vLLM Kunlun."
+msgstr "本节提供了 vLLM Kunlun 的详细配置指南。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/dynamic_batch.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/dynamic_batch.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/feature_guide/dynamic_batch.md:1
+msgid "Dynamic Batch"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/eplb_swift_balancer.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/eplb_swift_balancer.po
@@ -0,0 +1,30 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/feature_guide/eplb_swift_balancer.md:1
+msgid "Expert Load Balance (EPLB)"
+msgstr ""
+
+#: ../../source/user_guide/feature_guide/eplb_swift_balancer.md:3
+msgid "Overview"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/graph_mode.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/graph_mode.po
@@ -0,0 +1,126 @@
+# Translations template for PROJECT.
+# Copyright (C) 2025 ORGANIZATION
+# This file is distributed under the same license as the PROJECT project.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: PROJECT VERSION\n"
+"Report-Msgid-Bugs-To: EMAIL@ADDRESS\n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: LANGUAGE <LL@li.org>\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/feature_guide/graph_mode.md:1
+msgid "Graph Mode Guide"
+msgstr "图模式指南"
+
+#~ msgid ""
+#~ "This feature is currently experimental. "
+#~ "In future versions, there may be "
+#~ "behavioral changes around configuration, "
+#~ "coverage, performance improvement."
+#~ msgstr "此功能目前为实验性功能。在未来的版本中，配置、覆盖率和性能改进等方面的行为可能会有变化。"
+
+#~ msgid ""
+#~ "This guide provides instructions for "
+#~ "using Kunlun Graph Mode with vLLM "
+#~ "Kunlun. Please note that graph mode "
+#~ "is only available on V1 Engine. "
+#~ "And only Qwen, DeepSeek series models"
+#~ " are well tested from 0.9.0rc1. We'll"
+#~ " make it stable and generalize in "
+#~ "the next release."
+#~ msgstr ""
+#~ "本指南提供了在 vLLM Kunlun 上使用 Kunlun "
+#~ "图模式的操作说明。请注意，图模式仅在 V1 引擎上可用，并且从 0.9.0rc1 起，仅对"
+#~ " Qwen、DeepSeek 系列模型进行了充分测试。我们将在下一个版本中使其更加稳定和通用。"
+
+#~ msgid "Getting Started"
+#~ msgstr "快速入门"
+
+#~ msgid ""
+#~ "From v0.9.1rc1 with V1 Engine, vLLM "
+#~ "Kunlun will run models in graph "
+#~ "mode by default to keep the same"
+#~ " behavior with vLLM. If you hit "
+#~ "any issues, please feel free to "
+#~ "open an issue on GitHub and "
+#~ "fallback to eager mode temporarily by"
+#~ " set `enforce_eager=True` when initializing "
+#~ "the model."
+#~ msgstr ""
+#~ "从 v0.9.1rc1 版本起，使用 V1 引擎时，vLLM Kunlun"
+#~ " 默认将在图模式下运行模型，以保持与 vLLM 同样的行为。如果遇到任何问题，欢迎在 GitHub"
+#~ " 上提交 issue，并在初始化模型时通过设置 `enforce_eager=True` "
+#~ "临时切换回 eager 模式。"
+
+#~ msgid "There are two kinds for graph mode supported by vLLM Kunlun:"
+#~ msgstr "vLLM Kunlun 支持两种图模式："
+
+#~ msgid ""
+#~ "**ACLGraph**: This is the default graph"
+#~ " mode supported by vLLM Kunlun. In"
+#~ " v0.9.1rc1, only Qwen series models "
+#~ "are well tested."
+#~ msgstr ""
+#~ "**ACLGraph**：这是 vLLM Kunlun 支持的默认图模式。在 "
+#~ "v0.9.1rc1 版本中，只有 Qwen 系列模型得到了充分测试。"
+
+#~ msgid ""
+#~ "**TorchAirGraph**: This is the GE graph"
+#~ " mode. In v0.9.1rc1, only DeepSeek "
+#~ "series models are supported."
+#~ msgstr "**TorchAirGraph**：这是GE图模式。在v0.9.1rc1版本中，仅支持DeepSeek系列模型。"
+
+#~ msgid "Using ACLGraph"
+#~ msgstr "使用 ACLGraph"
+
+#~ msgid ""
+#~ "ACLGraph is enabled by default. Take "
+#~ "Qwen series models as an example, "
+#~ "just set to use V1 Engine is "
+#~ "enough."
+#~ msgstr "ACLGraph 默认启用。以 Qwen 系列模型为例，只需设置为使用 V1 引擎即可。"
+
+#~ msgid "offline example:"
+#~ msgstr "离线示例："
+
+#~ msgid "online example:"
+#~ msgstr "在线示例："
+
+#~ msgid "Using TorchAirGraph"
+#~ msgstr "使用 TorchAirGraph"
+
+#~ msgid ""
+#~ "If you want to run DeepSeek series"
+#~ " models with graph mode, you should"
+#~ " use "
+#~ "[TorchAirGraph](https://www.hikunlun.com/document/detail/zh/Pytorch/700/modthirdparty/torchairuseguide/torchair_0002.html)."
+#~ " In this case, additional config is"
+#~ " required."
+#~ msgstr ""
+#~ "如果你想通过图模式运行 DeepSeek 系列模型，你应该使用 "
+#~ "[TorchAirGraph](https://www.hikunlun.com/document/detail/zh/Pytorch/700/modthirdparty/torchairuseguide/torchair_0002.html)。在这种情况下，需要额外的配置。"
+
+#~ msgid ""
+#~ "You can find more detail about "
+#~ "additional config "
+#~ "[here](../configuration/additional_config.md)."
+#~ msgstr "你可以在[这里](../configuration/additional_config.md)找到关于附加配置的更多详细信息。"
+
+#~ msgid "Fallback to Eager Mode"
+#~ msgstr "回退到 Eager 模式"
+
+#~ msgid ""
+#~ "If both `ACLGraph` and `TorchAirGraph` "
+#~ "fail to run, you should fallback "
+#~ "to eager mode."
+#~ msgstr "如果 `ACLGraph` 和 `TorchAirGraph` 都无法运行，你应该退回到 eager 模式。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/index.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/index.po
@@ -0,0 +1,32 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 19:12+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/feature_guide/index.md:1
+#: ../../source/user_guide/feature_guide/index.md:5
+msgid "Feature Guide"
+msgstr "功能指南"
+
+#: ../../source/user_guide/feature_guide/index.md:3
+#, fuzzy
+msgid "This section provides a detailed usage guide of vLLM Kunlun features."
+msgstr "本节提供了 vLLM Kunlun 功能的详细使用指南。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/kv_pool_mooncake.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/kv_pool_mooncake.po
@@ -0,0 +1,30 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/feature_guide/kv_pool_mooncake.md:1
+msgid "Mooncacke Store Deployment Guide"
+msgstr ""
+
+#: ../../source/user_guide/feature_guide/kv_pool_mooncake.md:3
+msgid "Environmental Dependencies"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/lora.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/lora.po
@@ -0,0 +1,68 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/feature_guide/lora.md:1
+msgid "LoRA Adapters Guide"
+msgstr "LoRA 适配器指南"
+
+#: ../../source/user_guide/feature_guide/lora.md:3
+msgid "Overview"
+msgstr ""
+
+#~ msgid ""
+#~ "Like vLLM, vllm-kunlun supports LoRA "
+#~ "as well. The usage and more "
+#~ "details can be found in [vLLM "
+#~ "official "
+#~ "document](https://docs.vllm.ai/en/latest/features/lora.html)."
+#~ msgstr ""
+#~ "与 vLLM 类似，vllm-kunlun 也支持 "
+#~ "LoRA。用法及更多详情可参见 [vLLM "
+#~ "官方文档](https://docs.vllm.ai/en/latest/features/lora.html)。"
+
+#~ msgid ""
+#~ "You can also refer to "
+#~ "[this](https://docs.vllm.ai/en/latest/models/supported_models.html"
+#~ "#list-of-text-only-language-models) "
+#~ "to find which models support LoRA "
+#~ "in vLLM."
+#~ msgstr ""
+#~ "你也可以参考[这个链接](https://docs.vllm.ai/en/latest/models/supported_models.html"
+#~ "#list-of-text-only-language-models)来查找哪些模型在"
+#~ " vLLM 中支持 LoRA。"
+
+#~ msgid "Tips"
+#~ msgstr "提示"
+
+#~ msgid ""
+#~ "If you fail to run vllm-kunlun "
+#~ "with LoRA, you may follow [this "
+#~ "instruction](https://vllm-"
+#~ "kunlun.readthedocs.io/en/latest/user_guide/feature_guide/graph_mode.html"
+#~ "#fallback-to-eager-mode) to disable "
+#~ "graph mode and try again."
+#~ msgstr ""
+#~ "如果你在使用 LoRA 运行 vllm-kunlun "
+#~ "时失败，可以按照[此说明](https://vllm-"
+#~ "kunlun.readthedocs.io/en/latest/user_guide/feature_guide/graph_mode.html"
+#~ "#fallback-to-eager-mode)禁用图模式后再重试。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/netloader.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/netloader.po
@@ -0,0 +1,26 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: vllm-kunlun \n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/feature_guide/netloader.md:1
+msgid "Netloader Guide"
+msgstr ""
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/quantization.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/quantization.po
@@ -0,0 +1,198 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/feature_guide/quantization.md:1
+msgid "Quantization Guide"
+msgstr "量化指南"
+
+#~ msgid ""
+#~ "Model quantization is a technique that"
+#~ " reduces the size and computational "
+#~ "requirements of a model by lowering "
+#~ "the data precision of the weights "
+#~ "and activation values in the model, "
+#~ "thereby saving the memory and improving"
+#~ " the inference speed."
+#~ msgstr "模型量化是一种通过降低模型中权重和激活值的数据精度，从而减少模型大小和计算需求的技术，这样可以节省内存并提高推理速度。"
+
+#~ msgid ""
+#~ "Since 0.9.0rc2 version, quantization feature"
+#~ " is experimentally supported in vLLM "
+#~ "Kunlun. Users can enable quantization "
+#~ "feature by specifying `--quantization kunlun`."
+#~ " Currently, only Qwen, DeepSeek series "
+#~ "models are well tested. We’ll support"
+#~ " more quantization algorithm and models "
+#~ "in the future."
+#~ msgstr ""
+#~ "自 0.9.0rc2 版本起，vLLM Kunlun 实验性地支持量化特性。用户可以通过指定"
+#~ " `--quantization kunlun` 启用量化功能。目前，只有 "
+#~ "Qwen、DeepSeek 系列模型经过了充分测试。未来我们将支持更多的量化算法和模型。"
+
+#~ msgid "Install modelslim"
+#~ msgstr "安装 modelslim"
+
+#~ msgid ""
+#~ "To quantize a model, users should "
+#~ "install "
+#~ "[ModelSlim](https://gitcode.com/Kunlun/msit/blob/master/msmodelslim/README.md)"
+#~ " which is the Kunlun compression and"
+#~ " acceleration tool. It is an "
+#~ "affinity-based compression tool designed "
+#~ "for acceleration, using compression as "
+#~ "its core technology and built upon "
+#~ "the Kunlun platform."
+#~ msgstr "要对模型进行量化，用户应安装[ModelSlim](https://gitcode.com/Kunlun/msit/blob/master/msmodelslim/README.md)，这是 Kunlun 的压缩与加速工具。它是一种基于亲和性的压缩工具，专为加速设计，以压缩为核心技术，并基于 Kunlun 平台构建。"
+
+#~ msgid ""
+#~ "Currently, only the specific tag "
+#~ "[modelslim-"
+#~ "VLLM-8.1.RC1.b020_001](https://gitcode.com/Kunlun/msit/blob"
+#~ "/modelslim-VLLM-8.1.RC1.b020_001/msmodelslim/README.md) of"
+#~ " modelslim works with vLLM Kunlun. "
+#~ "Please do not install other version "
+#~ "until modelslim master version is "
+#~ "available for vLLM Kunlun in the "
+#~ "future."
+#~ msgstr ""
+#~ "目前，只有 modelslim 的特定标签 [modelslim-"
+#~ "VLLM-8.1.RC1.b020_001](https://gitcode.com/Kunlun/msit/blob"
+#~ "/modelslim-VLLM-8.1.RC1.b020_001/msmodelslim/README.md) 支持"
+#~ " vLLM Kunlun。在未来 modelslim 的主版本支持 vLLM "
+#~ "Kunlun 之前，请不要安装其他版本。"
+
+#~ msgid "Install modelslim:"
+#~ msgstr "安装 modelslim："
+
+#~ msgid "Quantize model"
+#~ msgstr "量化模型"
+
+#~ msgid ""
+#~ "Take [DeepSeek-V2-Lite](https://modelscope.cn/models"
+#~ "/deepseek-ai/DeepSeek-V2-Lite) as an example, "
+#~ "you just need to download the "
+#~ "model, and then execute the convert "
+#~ "command. The command is shown below. "
+#~ "More info can be found in "
+#~ "modelslim doc [deepseek w8a8 dynamic "
+#~ "quantization docs](https://gitcode.com/Kunlun/msit/blob"
+#~ "/modelslim-"
+#~ "VLLM-8.1.RC1.b020_001/msmodelslim/example/DeepSeek/README.md#deepseek-v2-w8a8-dynamic%E9%87%8F%E5%8C%96)."
+#~ msgstr ""
+#~ "以 [DeepSeek-V2-Lite](https://modelscope.cn/models/deepseek-"
+#~ "ai/DeepSeek-V2-Lite) 为例，你只需要下载模型，然后执行转换命令。命令如下所示。更多信息可参考 "
+#~ "modelslim 文档 [deepseek w8a8 "
+#~ "动态量化文档](https://gitcode.com/Kunlun/msit/blob/modelslim-"
+#~ "VLLM-8.1.RC1.b020_001/msmodelslim/example/DeepSeek/README.md#deepseek-v2-w8a8-dynamic%E9%87%8F%E5%8C%96)。"
+
+#~ msgid ""
+#~ "You can also download the quantized "
+#~ "model that we uploaded. Please note "
+#~ "that these weights should be used "
+#~ "for test only. For example, "
+#~ "https://www.modelscope.cn/models/vllm-kunlun/DeepSeek-V2"
+#~ "-Lite-W8A8"
+#~ msgstr ""
+#~ "你也可以下载我们上传的量化模型。请注意，这些权重仅应用于测试。例如：https://www.modelscope.cn/models"
+#~ "/vllm-kunlun/DeepSeek-V2-Lite-W8A8"
+
+#~ msgid "Once convert action is done, there are two important files generated."
+#~ msgstr "转换操作完成后，会生成两个重要的文件。"
+
+#~ msgid ""
+#~ "[config.json](https://www.modelscope.cn/models/vllm-"
+#~ "kunlun/DeepSeek-V2-Lite-"
+#~ "W8A8/file/view/master/config.json?status=1). Please make"
+#~ " sure that there is no "
+#~ "`quantization_config` field in it."
+#~ msgstr ""
+#~ "[config.json](https://www.modelscope.cn/models/vllm-"
+#~ "kunlun/DeepSeek-V2-Lite-"
+#~ "W8A8/file/view/master/config.json?status=1)。请确保其中没有 "
+#~ "`quantization_config` 字段。"
+
+#~ msgid ""
+#~ "[quant_model_description.json](https://www.modelscope.cn/models"
+#~ "/vllm-kunlun/DeepSeek-V2-Lite-"
+#~ "W8A8/file/view/master/quant_model_description.json?status=1). "
+#~ "All the converted weights info are "
+#~ "recorded in this file."
+#~ msgstr ""
+#~ "[quant_model_description.json](https://www.modelscope.cn/models"
+#~ "/vllm-kunlun/DeepSeek-V2-Lite-"
+#~ "W8A8/file/view/master/quant_model_description.json?status=1)。所有被转换的权重信息都记录在该文件中。"
+
+#~ msgid "Here is the full converted model files:"
+#~ msgstr "以下是完整转换后的模型文件："
+
+#~ msgid "Run the model"
+#~ msgstr "运行模型"
+
+#~ msgid ""
+#~ "Now, you can run the quantized "
+#~ "models with vLLM Kunlun. Here is "
+#~ "the example for online and offline "
+#~ "inference."
+#~ msgstr "现在，你可以使用 vLLM Kunlun 运行量化模型。下面是在线和离线推理的示例。"
+
+#~ msgid "Offline inference"
+#~ msgstr "离线推理"
+
+#~ msgid "Online inference"
+#~ msgstr "在线推理"
+
+#~ msgid "FAQs"
+#~ msgstr "常见问题解答"
+
+#~ msgid ""
+#~ "1. How to solve the KeyError: "
+#~ "'xxx.layers.0.self_attn.q_proj.weight' problem?"
+#~ msgstr "1. 如何解决 KeyError: 'xxx.layers.0.self_attn.q_proj.weight' 问题？"
+
+#~ msgid ""
+#~ "First, make sure you specify `kunlun`"
+#~ " quantization method. Second, check if "
+#~ "your model is converted by this "
+#~ "`modelslim-VLLM-8.1.RC1.b020_001` modelslim version."
+#~ " Finally, if it still doesn't work,"
+#~ " please submit a issue, maybe some"
+#~ " new models need to be adapted."
+#~ msgstr ""
+#~ "首先，请确保你指定了 `kunlun` 量化方法。其次，检查你的模型是否由 `modelslim-"
+#~ "VLLM-8.1.RC1.b020_001` 这个 modelslim "
+#~ "版本转换。如果仍然无法使用，请提交一个 issue，可能有一些新模型需要适配。"
+
+#~ msgid ""
+#~ "2. How to solve the error \"Could"
+#~ " not locate the configuration_deepseek.py\"?"
+#~ msgstr "2. 如何解决“无法找到 configuration_deepseek.py”错误？"
+
+#~ msgid ""
+#~ "Please convert DeepSeek series models "
+#~ "using `modelslim-VLLM-8.1.RC1.b020_001` modelslim,"
+#~ " this version has fixed the missing"
+#~ " configuration_deepseek.py error."
+#~ msgstr ""
+#~ "请使用 `modelslim-VLLM-8.1.RC1.b020_001` 的 "
+#~ "modelslim 转换 DeepSeek 系列模型，该版本已修复缺少 "
+#~ "configuration_deepseek.py 的错误。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/sleep_mode.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/sleep_mode.po
@@ -0,0 +1,165 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/feature_guide/sleep_mode.md:1
+msgid "Sleep Mode Guide"
+msgstr "睡眠模式指南"
+
+#: ../../source/user_guide/feature_guide/sleep_mode.md:3
+msgid "Overview"
+msgstr "概述"
+
+#~ msgid ""
+#~ "Sleep Mode is an API designed to"
+#~ " offload model weights and discard KV"
+#~ " cache from XPU memory. This "
+#~ "functionality is essential for reinforcement"
+#~ " learning (RL) post-training workloads, "
+#~ "particularly in online algorithms such "
+#~ "as PPO, GRPO, or DPO. During "
+#~ "training, the policy model typically "
+#~ "performs auto-regressive generation using "
+#~ "inference engines like vLLM, followed by"
+#~ " forward and backward passes for "
+#~ "optimization."
+#~ msgstr ""
+#~ "Sleep Mode 是一个用于卸载模型权重并清除 XPU 内存中 KV "
+#~ "缓存的 API。此功能对于强化学习（RL）后训练任务尤其重要，特别是在 PPO、GRPO 或 "
+#~ "DPO 等在线算法中。在训练过程中，策略模型通常会使用像 vLLM "
+#~ "这样的推理引擎进行自回归生成，然后进行前向和反向传播以进行优化。"
+
+#~ msgid ""
+#~ "Since the generation and training phases"
+#~ " may employ different model parallelism "
+#~ "strategies, it becomes crucial to free"
+#~ " KV cache and even offload model "
+#~ "parameters stored within vLLM during "
+#~ "training. This ensures efficient memory "
+#~ "utilization and avoids resource contention "
+#~ "on the XPU."
+#~ msgstr ""
+#~ "由于生成和训练阶段可能采用不同的模型并行策略，因此在训练过程中及时释放 KV 缓存，甚至卸载存储在 "
+#~ "vLLM 内的模型参数变得至关重要。这可以确保内存的高效利用，并避免 XPU 上的资源争用。"
+
+#~ msgid "Getting started"
+#~ msgstr "快速上手"
+
+#~ msgid ""
+#~ "With `enable_sleep_mode=True`, the way we "
+#~ "manage memory(malloc, free) in vllm will"
+#~ " under a specific memory pool, during"
+#~ " loading model and initialize kv_caches,"
+#~ " we tag the memory as a map:"
+#~ " `{\"weight\": data, \"kv_cache\": data}`."
+#~ msgstr ""
+#~ "当 `enable_sleep_mode=True` 时，我们在 vllm "
+#~ "中管理内存（malloc, free）的方式会在一个特定的内存池下进行，在加载模型和初始化 kv_caches"
+#~ " 期间，我们会将内存打上标签，组织成一个映射：`{\"weight\": data, "
+#~ "\"kv_cache\": data}`。"
+
+#~ msgid ""
+#~ "The engine(v0/v1) supports two sleep "
+#~ "levels to manage memory during idle "
+#~ "periods:"
+#~ msgstr "该引擎（v0/v1）支持两种睡眠等级，以在空闲期间管理内存："
+
+#~ msgid "Level 1 Sleep"
+#~ msgstr "一级睡眠"
+
+#~ msgid "Action: Offloads model weights and discards the KV cache."
+#~ msgstr "操作：卸载模型权重并清除KV缓存。"
+
+#~ msgid "Memory: Model weights are moved to CPU memory; KV cache is forgotten."
+#~ msgstr "内存：模型权重被移动到CPU内存；KV缓存被清除。"
+
+#~ msgid "Use Case: Suitable when reusing the same model later."
+#~ msgstr "用例：适用于之后需要重复使用同一个模型的情况。"
+
+#~ msgid ""
+#~ "Note: Ensure sufficient CPU memory is"
+#~ " available to hold the model weights."
+#~ msgstr "注意：请确保有足够的CPU内存来存储模型权重。"
+
+#~ msgid "Level 2 Sleep"
+#~ msgstr "二级睡眠"
+
+#~ msgid "Action: Discards both model weights and KV cache."
+#~ msgstr "操作：同时丢弃模型权重和KV缓存。"
+
+#~ msgid ""
+#~ "Memory: The content of both the "
+#~ "model weights and kv cache is "
+#~ "forgotten."
+#~ msgstr "内存：模型权重和kv缓存的内容都会被遗忘。"
+
+#~ msgid ""
+#~ "Use Case: Ideal when switching to "
+#~ "a different model or updating the "
+#~ "current one."
+#~ msgstr "用例：当切换到不同的模型或更新当前模型时非常理想。"
+
+#~ msgid ""
+#~ "Since this feature uses the low-"
+#~ "level API "
+#~ "[KunlunCL](https://www.hikunlun.com/document/detail/zh/CANNCommunityEdition/82RC1alpha002/API/appdevgapi/appdevgapi_07_0000.html),"
+#~ " in order to use sleep mode, "
+#~ "you should follow the [installation "
+#~ "guide](https://vllm-"
+#~ "kunlun.readthedocs.io/en/latest/installation.html) and "
+#~ "building from source, if you are "
+#~ "using v0.7.3, remember to set `export"
+#~ " COMPILE_CUSTOM_KERNELS=1`, for the latest "
+#~ "version(v0.9.x+), the environment variable "
+#~ "`COMPILE_CUSTOM_KERNELS` will be set 1 "
+#~ "by default while building from source."
+#~ msgstr ""
+#~ "由于此功能使用了底层 API "
+#~ "[KunlunCL](https://www.hikunlun.com/document/detail/zh/CANNCommunityEdition/82RC1alpha002/API/appdevgapi/appdevgapi_07_0000.html)，为了使用休眠模式，你应按照[安装指南](https"
+#~ "://vllm-"
+#~ "kunlun.readthedocs.io/en/latest/installation.html)进行操作，并从源码编译。如果你使用的是"
+#~ " v0.7.3，请记得设置 `export COMPILE_CUSTOM_KERNELS=1` "
+#~ "；对于最新版本（v0.9.x+），在从源码编译时环境变量 `COMPILE_CUSTOM_KERNELS` "
+#~ "默认会被设置为 1。"
+
+#~ msgid "Usage"
+#~ msgstr "用法"
+
+#~ msgid "The following is a simple example of how to use sleep mode."
+#~ msgstr "以下是如何使用睡眠模式的一个简单示例。"
+
+#~ msgid "offline inference:"
+#~ msgstr "离线推理："
+
+#~ msgid "online serving:"
+#~ msgstr "在线服务："
+
+#~ msgid ""
+#~ "Considering there may be a risk of"
+#~ " malicious access, please make sure "
+#~ "you are under a dev-mode, and "
+#~ "explicit specify the develop env: "
+#~ "`VLLM_SERVER_DEV_MODE` to expose these "
+#~ "endpoints(sleep/wake up)."
+#~ msgstr ""
+#~ "鉴于可能存在恶意访问的风险，请确保您处于开发模式，并明确指定开发环境：`VLLM_SERVER_DEV_MODE`，以便开放这些端点（sleep/wake"
+#~ " up）。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/structured_output.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/feature_guide/structured_output.po
@@ -0,0 +1,235 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/feature_guide/structured_output.md:1
+msgid "Structured Output Guide"
+msgstr "结构化输出指南"
+
+#: ../../source/user_guide/feature_guide/structured_output.md:3
+msgid "Overview"
+msgstr "概述"
+
+#: ../../source/user_guide/feature_guide/structured_output.md:5
+#, fuzzy
+msgid "What is structured output?"
+msgstr "什么是结构化输出？"
+
+#~ msgid ""
+#~ "LLMs can be unpredictable when you "
+#~ "need output in specific formats. Think"
+#~ " of asking a model to generate "
+#~ "JSON - without guidance, it might "
+#~ "produce valid text that breaks JSON "
+#~ "specification. **Structured Output (also "
+#~ "called Guided Decoding)** enables LLMs "
+#~ "to generate outputs that follow a "
+#~ "desired structure while preserving the "
+#~ "non-deterministic nature of the system."
+#~ msgstr ""
+#~ "当你需要特定格式输出时，大型语言模型（LLMs）可能表现出不可预测性。比如让模型生成 "
+#~ "JSON，如果没有指导，模型可能会生成有效的文本，但这些文本却不符合 JSON "
+#~ "规范。**结构化输出（也称为引导解码）** 能让大型语言模型生成符合预期结构的输出，同时保留系统的非确定性特性。"
+
+#~ msgid ""
+#~ "In simple terms, structured decoding "
+#~ "gives LLMs a “template” to follow. "
+#~ "Users provide a schema that “influences”"
+#~ " the model’s output, ensuring compliance"
+#~ " with the desired structure."
+#~ msgstr "简单来说，结构化解码为LLM提供了一个“模板”来遵循。用户提供一个模式来“影响”模型的输出，从而确保输出符合期望的结构。"
+
+#~ msgid "![structured decoding](./images/structured_output_1.png)"
+#~ msgstr "![结构化解码](./images/structured_output_1.png)"
+
+#~ msgid "structured decoding"
+#~ msgstr "结构化解码"
+
+#~ msgid "Structured Output in vllm-kunlun"
+#~ msgstr "vllm-kunlun 中的结构化输出"
+
+#~ msgid ""
+#~ "Currently, vllm-kunlun supports **xgrammar**"
+#~ " and **guidance** backend for structured"
+#~ " output with vllm v1 engine."
+#~ msgstr "目前，vllm-kunlun 支持 vllm v1 引擎的结构化输出，后端包括 **xgrammar** 和 **guidance**。"
+
+#~ msgid ""
+#~ "XGrammar introduces a new technique that"
+#~ " batch constrained decoding via pushdown"
+#~ " automaton (PDA). You can think of"
+#~ " a PDA as a “collection of "
+#~ "FSMs, and each FSM represents a "
+#~ "context-free grammar (CFG).” One "
+#~ "significant advantage of PDA is its "
+#~ "recursive nature, allowing us to execute"
+#~ " multiple state transitions. They also "
+#~ "include additional optimisation (for those "
+#~ "who are interested) to reduce grammar"
+#~ " compilation overhead. Besides, you can "
+#~ "also find more details about guidance"
+#~ " by yourself."
+#~ msgstr ""
+#~ "XGrammar 引入了一种通过下推自动机（PDA）进行批量约束解码的新技术。你可以把 PDA "
+#~ "理解为“有限状态机（FSM）的集合，每个 FSM 代表一个上下文无关文法（CFG）。” PDA "
+#~ "的一个重要优点是其递归特性，使我们能够执行多次状态转移。此外，PDA "
+#~ "还包含了额外的优化（供感兴趣的用户参考），以减少语法编译的开销。除此之外，你还可以自行查阅更多关于 guidance 后端的信息。"
+
+#~ msgid "How to Use Structured Output?"
+#~ msgstr "如何使用结构化输出？"
+
+#~ msgid "Online Inference"
+#~ msgstr "在线推理"
+
+#~ msgid ""
+#~ "You can also generate structured outputs"
+#~ " using the OpenAI's Completions and "
+#~ "Chat API. The following parameters are"
+#~ " supported, which must be added as"
+#~ " extra parameters:"
+#~ msgstr "你也可以使用 OpenAI 的 Completions 和 Chat API 生成结构化输出。支持以下参数，这些参数必须作为额外参数添加："
+
+#~ msgid "`guided_choice`: the output will be exactly one of the choices."
+#~ msgstr "`guided_choice`：输出将会是其中一个选项。"
+
+#~ msgid "`guided_regex`: the output will follow the regex pattern."
+#~ msgstr "`guided_regex`：输出将遵循正则表达式模式。"
+
+#~ msgid "`guided_json`: the output will follow the JSON schema."
+#~ msgstr "`guided_json`：输出将遵循 JSON 架构。"
+
+#~ msgid "`guided_grammar`: the output will follow the context free grammar."
+#~ msgstr "`guided_grammar`：输出将遵循上下文无关文法。"
+
+#~ msgid ""
+#~ "Structured outputs are supported by "
+#~ "default in the OpenAI-Compatible Server."
+#~ " You can choose to specify the "
+#~ "backend to use by setting the "
+#~ "`--guided-decoding-backend` flag to vllm"
+#~ " serve. The default backend is "
+#~ "`auto`, which will try to choose "
+#~ "an appropriate backend based on the "
+#~ "details of the request. You may "
+#~ "also choose a specific backend, along"
+#~ " with some options."
+#~ msgstr ""
+#~ "OpenAI 兼容服务器默认支持结构化输出。你可以通过设置 `--guided-decoding-"
+#~ "backend` 标志为 vllm serve 来指定要使用的后端。默认后端为 "
+#~ "`auto`，它会根据请求的详细信息尝试选择合适的后端。你也可以选择特定的后端，并设置一些选项。"
+
+#~ msgid ""
+#~ "Now let´s see an example for each"
+#~ " of the cases, starting with the "
+#~ "guided_choice, as it´s the easiest one:"
+#~ msgstr "现在让我们来看每种情况的示例，首先是 guided_choice，因为它是最简单的："
+
+#~ msgid ""
+#~ "The next example shows how to use"
+#~ " the guided_regex. The idea is to "
+#~ "generate an email address, given a "
+#~ "simple regex template:"
+#~ msgstr "下一个例子展示了如何使用 guided_regex。其思路是基于一个简单的正则表达式模板生成一个电子邮件地址："
+
+#~ msgid ""
+#~ "One of the most relevant features "
+#~ "in structured text generation is the "
+#~ "option to generate a valid JSON "
+#~ "with pre-defined fields and formats. "
+#~ "For this we can use the "
+#~ "guided_json parameter in two different "
+#~ "ways:"
+#~ msgstr ""
+#~ "在结构化文本生成中，最相关的特性之一是能够生成具有预定义字段和格式的有效 JSON。为此，我们可以通过两种不同的方式使用 "
+#~ "guided_json 参数："
+
+#~ msgid "Using a JSON Schema."
+#~ msgstr "使用 JSON 架构。"
+
+#~ msgid "Defining a Pydantic model and then extracting the JSON Schema from it."
+#~ msgstr "定义一个 Pydantic 模型，然后从中提取 JSON Schema。"
+
+#~ msgid ""
+#~ "The next example shows how to use"
+#~ " the guided_json parameter with a "
+#~ "Pydantic model:"
+#~ msgstr "下一个示例展示了如何将 guided_json 参数与 Pydantic 模型一起使用："
+
+#~ msgid ""
+#~ "Finally we have the guided_grammar "
+#~ "option, which is probably the most "
+#~ "difficult to use, but it´s really "
+#~ "powerful. It allows us to define "
+#~ "complete languages like SQL queries. It"
+#~ " works by using a context free "
+#~ "EBNF grammar. As an example, we "
+#~ "can use to define a specific "
+#~ "format of simplified SQL queries:"
+#~ msgstr ""
+#~ "最后，我们有 guided_grammar 选项，这可能是最难使用的，但它非常强大。它允许我们定义完整的语言，比如"
+#~ " SQL 查询。它通过使用上下文无关的 EBNF 语法来实现。例如，我们可以用它来定义一种简化"
+#~ " SQL 查询的特定格式："
+
+#~ msgid ""
+#~ "Find more examples [here](https://github.com"
+#~ "/vllm-"
+#~ "project/vllm/blob/main/examples/offline_inference/structured_outputs.py)."
+#~ msgstr ""
+#~ "在[这里](https://github.com/vllm-"
+#~ "project/vllm/blob/main/examples/offline_inference/structured_outputs.py)可以找到更多示例。"
+
+#~ msgid "Offline Inference"
+#~ msgstr "离线推理"
+
+#~ msgid ""
+#~ "To use Structured Output, we'll need "
+#~ "to configure the guided decoding using"
+#~ " the class `GuidedDecodingParams` inside "
+#~ "`SamplingParams`. The main available options"
+#~ " inside `GuidedDecodingParams` are:"
+#~ msgstr ""
+#~ "要使用结构化输出，我们需要在 `SamplingParams` 内通过 "
+#~ "`GuidedDecodingParams` 类配置引导解码。`GuidedDecodingParams` "
+#~ "中主要可用的选项有："
+
+#~ msgid "json"
+#~ msgstr "json"
+
+#~ msgid "regex"
+#~ msgstr "regex"
+
+#~ msgid "choice"
+#~ msgstr "choice"
+
+#~ msgid "grammar"
+#~ msgstr "grammar"
+
+#~ msgid "One example for the usage of the choice parameter is shown below:"
+#~ msgstr "choice 参数用法的一个示例如下："
+
+#~ msgid ""
+#~ "Find more examples of other usages "
+#~ "[here](https://github.com/vllm-"
+#~ "project/vllm/blob/main/examples/offline_inference/structured_outputs.py)."
+#~ msgstr ""
+#~ "查看更多其他用法的示例 [在这里](https://github.com/vllm-"
+#~ "project/vllm/blob/main/examples/offline_inference/structured_outputs.py)。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/release_notes.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/release_notes.po
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/support_matrix/index.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/support_matrix/index.po
@@ -0,0 +1,33 @@
+# Translations template for PROJECT.
+# Copyright (C) 2025 ORGANIZATION
+# This file is distributed under the same license as the PROJECT project.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version: PROJECT VERSION\n"
+"Report-Msgid-Bugs-To: EMAIL@ADDRESS\n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language-Team: LANGUAGE <LL@li.org>\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/support_matrix/index.md:5
+msgid "Support Matrix"
+msgstr "支持矩阵"
+
+#: ../../source/user_guide/support_matrix/index.md:1
+#, fuzzy
+msgid "Features and Models"
+msgstr "特性与模型"
+
+#: ../../source/user_guide/support_matrix/index.md:3
+#, fuzzy
+msgid "This section provides a detailed matrix supported by vLLM Kunlun."
+msgstr "本节提供了 vLLM Kunlun 的详细支持矩阵。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/support_matrix/supported_features.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/support_matrix/supported_features.po
@@ -0,0 +1,221 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/support_matrix/supported_features.md:1
+msgid "Supported Features"
+msgstr ""
+
+#: ../../source/user_guide/support_matrix/supported_features.md:3
+msgid "The feature support principle of vLLM"
+msgstr ""
+
+#~ msgid "Feature Support"
+#~ msgstr "功能支持"
+
+#~ msgid ""
+#~ "The feature support principle of vLLM"
+#~ " Kunlun is: **aligned with the "
+#~ "vLLM**. We are also actively "
+#~ "collaborating with the community to "
+#~ "accelerate support."
+#~ msgstr "vLLM Kunlun 的特性支持原则是：**与 vLLM 保持一致**。我们也在积极与社区合作，加快支持进度。"
+
+#~ msgid ""
+#~ "You can check the [support status "
+#~ "of vLLM V1 Engine][v1_user_guide]. Below "
+#~ "is the feature support status of "
+#~ "vLLM Kunlun:"
+#~ msgstr "你可以查看 [vLLM V1 引擎的支持状态][v1_user_guide]。下面是 vLLM Kunlun 的功能支持情况："
+
+#~ msgid "Feature"
+#~ msgstr "特性"
+
+#~ msgid "vLLM V0 Engine"
+#~ msgstr "vLLM V0 引擎"
+
+#~ msgid "vLLM V1 Engine"
+#~ msgstr "vLLM V1 引擎"
+
+#~ msgid "Next Step"
+#~ msgstr "下一步"
+
+#~ msgid "Chunked Prefill"
+#~ msgstr "分块预填充"
+
+#~ msgid "🟢 Functional"
+#~ msgstr "🟢 功能性"
+
+#~ msgid "Functional, see detail note: [Chunked Prefill][cp]"
+#~ msgstr "功能性，详见说明：[分块预填充][cp]"
+
+#~ msgid "Automatic Prefix Caching"
+#~ msgstr "自动前缀缓存"
+
+#~ msgid "Functional, see detail note: [vllm-kunlun#732][apc]"
+#~ msgstr "可用，请参见详细说明：[vllm-kunlun#732][apc]"
+
+#~ msgid "LoRA"
+#~ msgstr "LoRA"
+
+#~ msgid "[vllm-kunlun#396][multilora], [vllm-kunlun#893][v1 multilora]"
+#~ msgstr "[vllm-kunlun#396][multilora]，[vllm-kunlun#893][v1 multilora]"
+
+#~ msgid "Prompt adapter"
+#~ msgstr "提示适配器"
+
+#~ msgid "🔴 No plan"
+#~ msgstr "🔴 无计划"
+
+#~ msgid "This feature has been deprecated by vllm."
+#~ msgstr "此功能已被 vllm 弃用。"
+
+#~ msgid "Speculative decoding"
+#~ msgstr "猜测式解码"
+
+#~ msgid "Basic support"
+#~ msgstr "基础支持"
+
+#~ msgid "Pooling"
+#~ msgstr "池化"
+
+#~ msgid "🟡 Planned"
+#~ msgstr "🟡 计划中"
+
+#~ msgid "CI needed and adapting more models; V1 support rely on vLLM support."
+#~ msgstr "需要持续集成（CI）并适配更多模型；V1 的支持依赖于 vLLM 的支持。"
+
+#~ msgid "Enc-dec"
+#~ msgstr "Enc-dec（编码-解码）"
+
+#~ msgid "🔴 NO plan"
+#~ msgstr "🔴 没有计划"
+
+#~ msgid "Plan in 2025.06.30"
+#~ msgstr "2025.06.30 的计划"
+
+#~ msgid "Multi Modality"
+#~ msgstr "多模态"
+
+#~ msgid "[Tutorial][multimodal], optimizing and adapting more models"
+#~ msgstr "[教程][multimodal]，优化和适配更多模型"
+
+#~ msgid "LogProbs"
+#~ msgstr "LogProbs"
+
+#~ msgid "CI needed"
+#~ msgstr "需要持续集成（CI）"
+
+#~ msgid "Prompt logProbs"
+#~ msgstr "提示 logProbs"
+
+#~ msgid "Async output"
+#~ msgstr "异步输出"
+
+#~ msgid "Multi step scheduler"
+#~ msgstr "多步调度器"
+
+#~ msgid "🔴 Deprecated"
+#~ msgstr "🔴 已弃用"
+
+#~ msgid "[vllm#8779][v1_rfc], replaced by [vLLM V1 Scheduler][v1_scheduler]"
+#~ msgstr "[vllm#8779][v1_rfc]，已被 [vLLM V1 调度器][v1_scheduler] 替代"
+
+#~ msgid "Best of"
+#~ msgstr "精选"
+
+#~ msgid "[vllm#13361][best_of], CI needed"
+#~ msgstr "[vllm#13361][best_of]，需要持续集成（CI）"
+
+#~ msgid "Beam search"
+#~ msgstr "束搜索"
+
+#~ msgid "Guided Decoding"
+#~ msgstr "引导解码"
+
+#~ msgid "[vllm-kunlun#177][guided_decoding]"
+#~ msgstr "[vllm-kunlun#177][guided_decoding]"
+
+#~ msgid "Tensor Parallel"
+#~ msgstr "张量并行"
+
+#~ msgid "Pipeline Parallel"
+#~ msgstr "流水线并行"
+
+#~ msgid "Expert Parallel"
+#~ msgstr "专家并行"
+
+#~ msgid "CI needed; No plan on V0 support"
+#~ msgstr "需要持续集成；没有支持V0的计划"
+
+#~ msgid "Data Parallel"
+#~ msgstr "数据并行"
+
+#~ msgid "CI needed;  No plan on V0 support"
+#~ msgstr "需要 CI；暂无 V0 支持计划"
+
+#~ msgid "Prefill Decode Disaggregation"
+#~ msgstr "预填充 解码 拆分"
+
+#~ msgid "1P1D available, working on xPyD and V1 support."
+#~ msgstr "1P1D 已可用，正在开发 xPyD 和 V1 支持。"
+
+#~ msgid "Quantization"
+#~ msgstr "量化"
+
+#~ msgid "W8A8 available, CI needed; working on more quantization method support"
+#~ msgstr "W8A8 已可用，需要持续集成（CI）；正在开发对更多量化方法的支持。"
+
+#~ msgid "Graph Mode"
+#~ msgstr "图模式"
+
+#~ msgid "🔵 Experimental"
+#~ msgstr "🔵 实验性"
+
+#~ msgid "Experimental, see detail note: [vllm-kunlun#767][graph_mode]"
+#~ msgstr "实验性功能，详见说明：[vllm-kunlun#767][graph_mode]"
+
+#~ msgid "Sleep Mode"
+#~ msgstr "睡眠模式"
+
+#~ msgid "level=1 available, CI needed, working on V1 support"
+#~ msgstr "level=1 可用，需要CI，正在开发 V1 支持"
+
+#~ msgid "🟢 Functional: Fully operational, with ongoing optimizations."
+#~ msgstr "🟢 功能性：完全可用，正在持续优化中。"
+
+#~ msgid ""
+#~ "🔵 Experimental: Experimental support, "
+#~ "interfaces and functions may change."
+#~ msgstr "🔵 实验性：实验性支持，接口和功能可能会发生变化。"
+
+#~ msgid "🚧 WIP: Under active development, will be supported soon."
+#~ msgstr "🚧 WIP：正在积极开发中，很快将会支持。"
+
+#~ msgid ""
+#~ "🟡 Planned: Scheduled for future "
+#~ "implementation (some may have open "
+#~ "PRs/RFCs)."
+#~ msgstr "🟡 计划中：已安排将来实现（其中一些可能已有开放的PR/RFC）。"
+
+#~ msgid "🔴 NO plan / Deprecated: No plan for V0 or deprecated by vLLM v1."
+#~ msgstr "🔴 没有计划 / 已弃用：V0 没有计划或已被 vLLM v1 弃用。"
+
--- a/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/support_matrix/supported_models.po
+++ b/docs/source/locale/zh_CN/LC_MESSAGES/user_guide/support_matrix/supported_models.po
@@ -0,0 +1,168 @@
+# SOME DESCRIPTIVE TITLE.
+# Copyright (C) 2025, vllm-kunlun team
+# This file is distributed under the same license as the vllm-kunlun
+# package.
+# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
+#
+#, fuzzy
+msgid ""
+msgstr ""
+"Project-Id-Version:  vllm-kunlun\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-11-10 16:59+0800\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
+"Language: zh_CN\n"
+"Language-Team: zh_CN <LL@li.org>\n"
+"Plural-Forms: nplurals=1; plural=0;\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Babel 2.17.0\n"
+
+#: ../../source/user_guide/support_matrix/supported_models.md:1
+#, fuzzy
+msgid "Supported Models"
+msgstr "支持的模型"
+
+#~ msgid "Model Support"
+#~ msgstr "模型支持"
+
+#~ msgid "Text-only Language Models"
+#~ msgstr "纯文本语言模型"
+
+#~ msgid "Generative Models"
+#~ msgstr "生成模型"
+
+#~ msgid "Model"
+#~ msgstr "模型"
+
+#~ msgid "Note"
+#~ msgstr "注释"
+
+#~ msgid "DeepSeek v3"
+#~ msgstr "DeepSeek v3"
+
+#~ msgid "✅"
+#~ msgstr "✅"
+
+#~ msgid "DeepSeek R1"
+#~ msgstr "DeepSeek R1"
+
+#~ msgid "DeepSeek Distill (Qwen/LLama)"
+#~ msgstr "DeepSeek 精炼（Qwen/LLama）"
+
+#~ msgid "Qwen3"
+#~ msgstr "Qwen3"
+
+#~ msgid "Qwen3-Moe"
+#~ msgstr "Qwen3-Moe"
+
+#~ msgid "Qwen2.5"
+#~ msgstr "Qwen2.5"
+
+#~ msgid "QwQ-32B"
+#~ msgstr "QwQ-32B"
+
+#~ msgid "LLama3.1/3.2"
+#~ msgstr "LLama3.1/3.2"
+
+#~ msgid "Internlm"
+#~ msgstr "Internlm"
+
+#~ msgid "Baichuan"
+#~ msgstr "百川"
+
+#~ msgid "Phi-4-mini"
+#~ msgstr "Phi-4-mini"
+
+#~ msgid "MiniCPM"
+#~ msgstr "MiniCPM"
+
+#~ msgid "MiniCPM3"
+#~ msgstr "MiniCPM3"
+
+#~ msgid "LLama4"
+#~ msgstr "LLama4"
+
+#~ msgid "Mistral"
+#~ msgstr "Mistral"
+
+#~ msgid "Need test"
+#~ msgstr "需要测试"
+
+#~ msgid "DeepSeek v2.5"
+#~ msgstr "DeepSeek v2.5"
+
+#~ msgid "Gemma-2"
+#~ msgstr "Gemma-2"
+
+#~ msgid "Mllama"
+#~ msgstr "Mllama"
+
+#~ msgid "Gemma-3"
+#~ msgstr "Gemma-3"
+
+#~ msgid "❌"
+#~ msgstr "❌"
+
+#~ msgid "[#496](https://github.com/vllm-project/vllm-kunlun/issues/496)"
+#~ msgstr "[#496](https://github.com/vllm-project/vllm-kunlun/issues/496)"
+
+#~ msgid "ChatGLM"
+#~ msgstr "ChatGLM"
+
+#~ msgid "[#554](https://github.com/vllm-project/vllm-kunlun/issues/554)"
+#~ msgstr "[#554](https://github.com/vllm-project/vllm-kunlun/issues/554)"
+
+#~ msgid "Pooling Models"
+#~ msgstr "池化模型"
+
+#~ msgid "XLM-RoBERTa-based"
+#~ msgstr "基于XLM-RoBERTa"
+
+#~ msgid "Molmo"
+#~ msgstr "Molmo"
+
+#~ msgid "Multimodal Language Models"
+#~ msgstr "多模态语言模型"
+
+#~ msgid "Qwen2-VL"
+#~ msgstr "Qwen2-VL"
+
+#~ msgid "Qwen2.5-VL"
+#~ msgstr "Qwen2.5-VL"
+
+#~ msgid "LLaVA 1.5"
+#~ msgstr "LLaVA 1.5"
+
+#~ msgid "LLaVA 1.6"
+#~ msgstr "LLaVA 1.6"
+
+#~ msgid "[#553](https://github.com/vllm-project/vllm-kunlun/issues/553)"
+#~ msgstr "[#553](https://github.com/vllm-project/vllm-kunlun/issues/553)"
+
+#~ msgid "InternVL2"
+#~ msgstr "InternVL2"
+
+#~ msgid "InternVL2.5"
+#~ msgstr "InternVL2.5"
+
+#~ msgid "Qwen2-Audio"
+#~ msgstr "Qwen2-Audio"
+
+#~ msgid "LLaVA-Next"
+#~ msgstr "LLaVA-Next"
+
+#~ msgid "LLaVA-Next-Video"
+#~ msgstr "LLaVA-Next-Video"
+
+#~ msgid "Phi-3-Vison/Phi-3.5-Vison"
+#~ msgstr "Phi-3-Vison/Phi-3.5-Vison"
+
+#~ msgid "GLM-4v"
+#~ msgstr "GLM-4v"
+
+#~ msgid "Ultravox"
+#~ msgstr "Ultravox"
+
--- a/docs/source/quick_start.md
+++ b/docs/source/quick_start.md
@@ -1,9 +1,7 @@
 # Quickstart

 ## Prerequisites
-
 ### Supported Devices
-
 - Kunlun3 P800

 ## Setup environment using container
@@ -22,7 +20,7 @@ if [ $XPU_NUM -gt 0 ]; then
    done
    DOCKER_DEVICE_CONFIG="${DOCKER_DEVICE_CONFIG} --device=/dev/xpuctrl:/dev/xpuctrl"
 fi
-export build_image="wjie520/vllm_kunlun:v0.0.1"
+export build_image="xxxxx"
 docker run -itd ${DOCKER_DEVICE_CONFIG} \
    --net=host \
    --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
@@ -34,12 +32,10 @@ docker run -itd ${DOCKER_DEVICE_CONFIG} \
    -w /workspace \
    "$build_image" /bin/bash
 ```
-
 ::::
 :::::

 Start docker:
-
 ```bash
 #start
 bash ./rundocker.sh <container_name>
@@ -48,18 +44,16 @@ docker exec -it <container_name> bash
 ```

 The default working directory is `/workspace`. With the fully provisioned environment image we provide, you can quickly start developing and running tasks within this directory.
-
 ## Set up system environment
-
 ```
-#Set environment
+#Set environment 
 chmod +x /workspace/vllm-kunlun/setup_env.sh && source /workspace/vllm-kunlun/setup_env.sh
 ```
-
 ## Usage

 You can start the service quickly using the script below.

+
 :::::{tab-set}
 ::::{tab-item} Offline Batched Inference

@@ -74,49 +68,65 @@ import os
 from vllm import LLM, SamplingParams

 def main():
-    model_path = "/models/Qwen3-8B"

-    llm_params = {
-        "model": model_path,
-        "tensor_parallel_size": 1,
-        "trust_remote_code": True,
-        "dtype": "float16",
-        "enable_chunked_prefill": False,
-        "distributed_executor_backend": "mp",
-    }
+    model_path = "models/Qwen3-VL-30B-A3B-Instruct"

-    llm = LLM(**llm_params)
+    llm = LLM(
+        model=model_path,
+        tokenizer=model_path,
+        tensor_parallel_size=1,
+        trust_remote_code=True,
+        dtype="float16",
+        distributed_executor_backend="mp",
+        max_model_len=32768,
+        gpu_memory_utilization=0.9,
+        block_size=128,
+        max_num_seqs=128,
+        max_num_batched_tokens=32768,
+        enable_prefix_caching=False,
+        enable_chunked_prefill=False,
+        served_model_name="Qwen3-VL",
+        compilation_config={
+            "splitting_ops": [
+                "vllm.unified_attention",
+                "vllm.unified_attention_with_output",
+                "vllm.unified_attention_with_output_kunlun",
+                "vllm.mamba_mixer2",
+                "vllm.mamba_mixer",
+                "vllm.short_conv",
+                "vllm.linear_attention",
+                "vllm.plamo2_mamba_mixer",
+                "vllm.gdn_attention",
+                "vllm.sparse_attn_indexer",
+            ]
+        },
+    )

+    # === test chat ===
    messages = [
        {
            "role": "user",
-            "content": [
-                {
-                    "type": "text",
-                    "text": "What is your name?"
-                }
-            ]
+            "content": [{"type": "text", "text": "Hello, what can you do?"}]
        }
    ]

-    sampling_params = SamplingParams(
+    sampling = SamplingParams(
        max_tokens=200,
-        temperature=1.0,
+        temperature=0.8,
        top_k=50,
        top_p=1.0,
-        stop_token_ids=[181896]
    )

-    outputs = llm.chat(messages, sampling_params=sampling_params)
+    print("Starting inference...")
+    outputs = llm.chat(messages, sampling_params=sampling)
+
+    print("Model output:\n")
+    print(outputs[0].outputs[0].text)

-    response = outputs[0].outputs[0].text
-    print("=" * 50)
-    print("Input content:", messages)
-    print("Model response:\n", response)
-    print("=" * 50)

 if __name__ == "__main__":
    main()
+
 ```

 ::::
@@ -125,7 +135,7 @@ if __name__ == "__main__":

 vLLM can also be deployed as a server that implements the OpenAI API protocol. Run
 the following command to start the vLLM server with the
-[Qwen3-8B]model:
+[Qwen3-VL-30B-A3B-Instruct] model:

 <!-- tests/e2e/doctest/001-quickstart-test.sh should be considered updating as well -->

@@ -133,7 +143,7 @@ the following command to start the vLLM server with the
 python -m vllm.entrypoints.openai.api_server \
      --host 0.0.0.0 \
      --port 8356 \
-      --model /models/Qwen3-8B\
+      --model models/Qwen3-VL-30B-A3B-Instruct \
      --gpu-memory-utilization 0.9 \
      --trust-remote-code \
      --max-model-len 32768 \
@@ -141,15 +151,21 @@ python -m vllm.entrypoints.openai.api_server \
      --dtype float16 \
      --max_num_seqs 128 \
      --max_num_batched_tokens 32768 \
-      --max-seq-len-to-capture 32768 \
      --block-size 128 \
      --no-enable-prefix-caching \
      --no-enable-chunked-prefill \
      --distributed-executor-backend mp \
-      --served-model-name Qwen3-8B \
-      --compilation-config '{"splitting_ops": ["vllm.unified_attention_with_output_kunlun",
-            "vllm.unified_attention", "vllm.unified_attention_with_output",
-            "vllm.mamba_mixer2"]}' \
+      --served-model-name Qwen3-VL-30B-A3B-Instruct \
+      --compilation-config '{"splitting_ops": ["vllm.unified_attention", 
+                                                "vllm.unified_attention_with_output",
+                                                "vllm.unified_attention_with_output_kunlun",
+                                                "vllm.mamba_mixer2", 
+                                                "vllm.mamba_mixer", 
+                                                "vllm.short_conv", 
+                                                "vllm.linear_attention", 
+                                                "vllm.plamo2_mamba_mixer", 
+                                                "vllm.gdn_attention", 
+                                                "vllm.sparse_attn_indexer"]}'
 ```

 If you see a log as below:
@@ -166,12 +182,14 @@ Congratulations, you have successfully started the vLLM server!
 You can query the model with input prompts:

 ```bash
-curl http://localhost:8356/v1/completions \
+curl http://localhost:8356/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
-        "model": "Qwen3-8B",
-        "prompt": "What is your name?",
-        "max_tokens": 7,
+        "model": "Qwen3-VL-30B-A3B-Instruct",
+        "messages": [
+          {"role": "user", "content": "What is your name?"}
+        ],
+        "max_tokens": 200,
        "temperature": 0
      }'

@@ -197,4 +215,4 @@ INFO:     Application shutdown complete.

 Finally, you can exit the container by using `ctrl-D`.
 ::::
-:::::
+:::::
--- a/docs/source/tutorials/multi_xpu_GLM-4.5.md
+++ b/docs/source/tutorials/multi_xpu_GLM-4.5.md
@@ -17,10 +17,9 @@ docker run -itd \
        -v /usr/local/bin/:/usr/local/bin/ \
        -v /lib/x86_64-linux-gnu/libxpunvidia-ml.so.1:/lib/x86_64-linux-gnu/libxpunvidia-ml.so.1 \
        iregistry.baidu-int.com/hac_test/aiak-inference-llm:xpu_dev_20251113_221821 bash
-
+        
 docker exec -it glm-vllm-01011 /bin/bash
 ```
-
 ### Offline Inference on multi XPU

 Start the server in a container:
@@ -31,7 +30,7 @@ import os
 from vllm import LLM, SamplingParams

 def main():
-
+    
    model_path = "/data/GLM-4.5"

    llm_params = {
@@ -51,7 +50,7 @@ def main():
            "content": [
                {
                    "type": "text",
-                    "text": "Hello, who are you?"
+                    "text": "你好，请问你是谁?"
                }
            ]
        }
@@ -69,8 +68,8 @@ def main():

    response = outputs[0].outputs[0].text
    print("=" * 50)
-    print("Input content:", messages)
-    print("Model response:\n", response)
+    print("输入内容:", messages)
+    print("模型回复:\n", response)
    print("=" * 50)

 if __name__ == "__main__":
@@ -84,10 +83,12 @@ If you run this script successfully, you can see the info shown below:

 ```bash
 ==================================================
-Input content: [{'role': 'user', 'content': [{'type': 'text', 'text': 'Hello, who are you?'}]}]
-Model response:
+输入内容: [{'role': 'user', 'content': [{'type': 'text', 'text': '你好，请问你是谁?'}]}]
+模型回复:
 <think>
-Well, the user asked a rather direct question about identity. This question seems simple, but there could be several underlying intentions—perhaps they are testing my reliability for the first time, or they simply want to confirm the identity of the conversational partner. From the common positioning of AI assistants, the user has provided a clear and flat way to define identity while leaving room for potential follow-up questions.\n\nThe user used "you" instead of "your", which leans towards a more informal tone, so the response style can be a bit more relaxed. However, since this is the initial response, it is better to maintain a moderate level of professionalism. Mentioning
+嗯，用户问了一个相当身份的直接问题。这个问题看似简单，但背后可能
+有几种可能性意—ta或许初次测试我的可靠性，或者单纯想确认对话方。从AI助手的常见定位，用户给出清晰平的方式明确身份，同时为后续可能
+的留出生进行的空间。\n\n用户用“你”这个“您”，语气更倾向非正式交流，所以回复风格可以轻松些。不过既然是初次回复，保持适度的专业性比较好稳妥。提到
 ==================================================
 ```

@@ -113,9 +114,8 @@ python -m vllm.entrypoints.openai.api_server \
      --no-enable-chunked-prefill \
      --distributed-executor-backend mp \
      --served-model-name GLM-4.5 \
-      --compilation-config '{"splitting_ops": ["vllm.unified_attention_with_output_kunlun", "vllm.unified_attention", "vllm.unified_attention_with_output", "vllm.mamba_mixer2"]}'  > log_glm_plugin.txt 2>&1 &
+      --compilation-config '{"splitting_ops": ["vllm.unified_attention_with_output_kunlun", "vllm.unified_attention", "vllm.unified_attention_with_output", "vllm.mamba_mixer2"]}'  > log_glm_plugin.txt 2>&1 & 
 ```
-
 If your service start successfully, you can see the info shown below:

 ```bash
@@ -132,7 +132,7 @@ curl http://localhost:8989/v1/chat/completions \
  -d '{
    "model": "GLM-4.5",
    "messages": [
-      {"role": "user", "content": "Hello, who are you?"}
+      {"role": "user", "content": "你好，请问你是谁?"}
    ],
    "max_tokens": 100,
    "temperature": 0.7
@@ -142,7 +142,7 @@ curl http://localhost:8989/v1/chat/completions \
 If you query the server successfully, you can see the info shown below (client):

 ```bash
-{"id":"chatcmpl-6af7318de7394bc4ae569e6324a162fa","object":"chat.completion","created":1763101638,"model":"GLM-4.5","choices":[{"index":0,"message":{"role":"assistant","content":"\n<think>The user asked, \"Hello, who are you?\" This is a question about my identity. First, I need to confirm the user's intent. They might be using this service for the first time or have never interacted with similar AI assistants before, so they want to know my background and capabilities.\n\nNext, I should ensure my answer is clear and friendly, focusing on key points: who I am, who developed me, and what I can do. I should avoid technical jargon and keep the response conversational so it's easy to understand.\n\nAdditionally, the user may have potential needs, such as wanting to know what I am capable of.","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":11,"total_tokens":111,"completion_tokens":100,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_tr
+{"id":"chatcmpl-6af7318de7394bc4ae569e6324a162fa","object":"chat.completion","created":1763101638,"model":"GLM-4.5","choices":[{"index":0,"message":{"role":"assistant","content":"\n<think>用户问“你好，请问你是谁？”，这是一个应该是个了解我的身份。首先，我需要确认用户的需求是什么。可能他们是第一次使用这个服务，或者之前没有接触过类似的AI助手，所以想确认我的背景和能力。 \n\n接下来，我要确保回答清晰明了，同时友好关键点：我是谁，由谁开发，能做什么。需要避免使用专业术语，保持口语化，让不同容易理解。 \n\n然后，用户可能有潜在的需求，比如想了解我能","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":11,"total_tokens":111,"completion_tokens":100,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_tr
 ```
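
The same request can also be sent from Python. The following is a minimal sketch rather than part of the original tutorial: it assumes the server from the curl example is reachable at `http://localhost:8989` and serves the model as `GLM-4.5`. The helper names `build_payload`, `extract_reply`, and `query` are hypothetical, and the `Content-Type` header is an assumption about what the server expects for a JSON body. Only the standard library is used.

```python
import json
import urllib.request

# Assumed from the curl example: host, port, and endpoint path.
BASE_URL = "http://localhost:8989/v1/chat/completions"


def build_payload(prompt, model="GLM-4.5", max_tokens=100, temperature=0.7):
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def extract_reply(response):
    """Return (text, finish_reason) for the first choice of a chat.completion response."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


def query(prompt, url=BASE_URL, timeout=60):
    """POST the payload and decode the JSON response (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    text, reason = extract_reply(query("Hello, who are you?"))
    print(text)
    # A finish_reason of "length" means the reply stopped at max_tokens.
    if reason == "length":
        print("[truncated: max_tokens reached]")
```

In the sample response above, `finish_reason` is `length` and `completion_tokens` equals `max_tokens` (100), so the reply is cut off mid-sentence; raising `max_tokens` in `build_payload` would allow a longer answer.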

 Logs of the vllm server:
@@ -150,4 +150,4 @@ Logs of the vllm server:
 ```bash
 (APIServer pid=54567) INFO:     127.0.0.1:60338 - "POST /v1/completions HTTP/1.1" 200 OK
 (APIServer pid=54567) INFO 11-13 14:35:48 [loggers.py:123] Engine 000: Avg prompt throughput: 0.5 tokens/s, Avg generation throughput: 0.7 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
-```
+```