[Doc] fix the nit in docs (#6826)

Refresh the docs and fix a few nits.

- vLLM version: v0.15.0
- vLLM main: 83b47f67b1

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Author: wangxiyuan
Date: 2026-02-27 11:50:27 +08:00
Committed by: GitHub
Parent: 981d803cb7
Commit: a95c0b8b82
30 changed files with 145 additions and 118 deletions

@@ -2,15 +2,15 @@
## Introduction
[GLM-5](https://huggingface.co/zai-org/GLM-5)use a Mixture-of-Experts (MoE) architecture and targeting at complex systems engineering and long-horizon agentic tasks.
[GLM-5](https://huggingface.co/zai-org/GLM-5) uses a Mixture-of-Experts (MoE) architecture and targets complex systems engineering and long-horizon agentic tasks.
This document will show the main verification steps of the model, including supported features, feature configuration, environment preparation, single-node and multi-node deployment, and accuracy and performance evaluation.
## Supported Features
Refer to [supported features](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_models.html)to get the model's supported feature matrix.
Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
Refer to [feature guide](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_features.html) to get the feature's configuration.
Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.
## Environment Preparation
@@ -241,7 +241,7 @@ The parameters are explained as follows:
### Multi-node Deployment
If you want to deploy multi-node environment, you need to verify multi-node communication according to [verify multi-node communication environment](https://docs.vllm.ai/projects/ascend/en/latest/installation.html#verify-multi-node-communication).
If you want to deploy a multi-node environment, you need to verify multi-node communication first; refer to [verify multi-node communication environment](../../installation.md#verify-multi-node-communication).
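As a minimal pre-check sketch (assuming the `hccn_tool` utility shipped with the Ascend driver is available on each node), you can confirm that the NPU network ports are up and have reachable IPs before starting the multi-node deployment:

```bash
# Sketch only: inspect NPU NIC status and IPs on each node (16 devices assumed).
for i in {0..15}; do hccn_tool -i $i -link -g; done        # link status (UP/DOWN)
for i in {0..15}; do hccn_tool -i $i -net_health -g; done  # network health check
for i in {0..15}; do hccn_tool -i $i -ip -g; done          # device IPs used for cross-node traffic
```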
:::::{tab-set}
:sync-group: install
@@ -450,7 +450,7 @@ vllm serve /root/.cache/modelscope/hub/models/vllm-ascend/GLM-5-w4a8 \
::::
:::::
- For bf16 weight, use this script on each node to enable [Multi Token Prediction (MTP)](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/feature_guide/Multi_Token_Prediction.html).
- For the bf16 weight, use this script on each node to enable [Multi Token Prediction (MTP)](../../user_guide/feature_guide/Multi_Token_Prediction.md).
```shell
python adjust_weight.py "path_of_bf16_weight"
```
@@ -518,7 +518,7 @@ Here are two accuracy evaluation methods.
### Using AISBench
1. Refer to [Using AISBench](https://docs.vllm.ai/projects/ascend/en/latest/developer_guide/evaluation/using_ais_bench.html) for details.
1. Refer to [Using AISBench](../../developer_guide/evaluation/using_ais_bench.md) for details.
2. After execution, you can view the evaluation results.
@@ -530,7 +530,7 @@ Not test yet.
### Using AISBench
Refer to [Using AISBench for performance evaluation](https://docs.vllm.ai/projects/ascend/en/latest/developer_guide/evaluation/using_ais_bench.html#execute-performance-evaluation) for details.
Refer to [Using AISBench for performance evaluation](../../developer_guide/evaluation/using_ais_bench.md#execute-performance-evaluation) for details.
### Using vLLM Benchmark
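As an illustrative sketch (not part of the verified steps), the bundled `vllm bench serve` client can be pointed at the already-running server; the token lengths and prompt count below are placeholder values, not tuned settings.

```bash
# Sketch only: random-dataset benchmark against the local server (placeholder values).
vllm bench serve \
  --model /root/.cache/modelscope/hub/models/vllm-ascend/GLM-5-w4a8 \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 512 \
  --num-prompts 200
```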

@@ -1,5 +1,31 @@
# Kimi-K2-Thinking
## Introduction
Kimi-K2-Thinking is a large-scale Mixture-of-Experts (MoE) model developed by Moonshot AI. It features a hybrid thinking architecture that excels in complex reasoning and problem-solving tasks.
This document will show the main verification steps of the model, including supported features, environment preparation, single-node deployment, and functional verification.
## Supported Features
Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.
## Environment Preparation
### Model Weight
- `Kimi-K2-Thinking` (bfloat16): requires 1 Atlas 800 A3 (64G × 16) node. [Download model weight](https://huggingface.co/moonshotai/Kimi-K2-Thinking).
It is recommended to download the model weight to a shared directory, such as `/mnt/sfs_turbo/.cache/`.
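For instance, a download into that shared directory might look like the following sketch; the target path is only an example and assumes the `huggingface-cli` tool is installed.

```bash
# Example only: fetch the weights into the shared cache directory.
pip install -U "huggingface_hub[cli]"
huggingface-cli download moonshotai/Kimi-K2-Thinking \
  --local-dir /mnt/sfs_turbo/.cache/Kimi-K2-Thinking
```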
### Installation
You can use our official docker image to run `Kimi-K2-Thinking` directly.
Select an image based on your machine type and start it on your node; refer to [using docker](../../installation.md#set-up-using-docker).
## Run with Docker
```{code-block} bash
@@ -90,7 +116,7 @@ For an Atlas 800 A3 (64G*16) node, tensor-parallel-size should be at least 16.
vllm serve Kimi-K2-Thinking \
--served-model-name kimi-k2-thinking \
--tensor-parallel-size 16 \
--enable_expert_parallel \
--enable-expert-parallel \
--trust-remote-code \
--no-enable-prefix-caching
```
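Once the server is started, a quick sanity check is to list the models exposed by the OpenAI-compatible API (the default `localhost:8000` endpoint is assumed here):

```bash
# Sketch: the response should list the configured served model name, kimi-k2-thinking.
curl http://localhost:8000/v1/models
```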

@@ -8,9 +8,9 @@ This document provides a detailed workflow for the complete deployment and verif
## Supported Features
Refer to [supported features](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/support_matrix/supported_models.html) to get the model's supported feature matrix.
Refer to [supported features](../../user_guide/support_matrix/supported_models.md) to get the model's supported feature matrix.
Refer to [feature guide](https://docs.vllm.ai/projects/ascend/en/latest/user_guide/feature_guide/index.html) to get the feature's configuration.
Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the feature's configuration.
## Environment Preparation

@@ -18,7 +18,7 @@ Refer to [feature guide](../../user_guide/feature_guide/index.md) to get the fea
### Model Weight
- `Qwen2.5-7B-Instruct`(BF16 version): require 1 910B4 cards(32G × 1) [Qwen2.5-7B-Instruct](https://modelscope.cn/models/Qwen/Qwen2.5-7B-Instruct)
- `Qwen2.5-7B-Instruct` (BF16 version): requires 1 Atlas 910B4 (32G × 1) card. [Download model weight](https://modelscope.cn/models/Qwen/Qwen2.5-7B-Instruct)
It is recommended to download the model weights to a local directory (e.g., `./Qwen2.5-7B-Instruct/`) for quick access during deployment.
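As an illustrative sketch, the weights can be pulled from ModelScope into that directory; the exact CLI options assume a recent `modelscope` package.

```bash
# Example only: download the weights into a local directory via ModelScope.
pip install -U modelscope
modelscope download --model Qwen/Qwen2.5-7B-Instruct --local_dir ./Qwen2.5-7B-Instruct
```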

@@ -48,7 +48,7 @@ Run the following script to start the vLLM server on Multi-NPU:
For an Atlas A2 with 64 GB of NPU card memory, tensor-parallel-size should be at least 2, and for 32 GB of memory, tensor-parallel-size should be at least 4.
```bash
vllm serve Qwen/Qwen3-30B-A3B --tensor-parallel-size 4 --enable_expert_parallel
vllm serve Qwen/Qwen3-30B-A3B --tensor-parallel-size 4 --enable-expert-parallel
```
Once your server is started, you can query the model with input prompts.
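For example, a minimal request to the OpenAI-compatible completions endpoint looks like the sketch below (default `localhost:8000` port assumed).

```bash
# Minimal query sketch against the OpenAI-compatible API (default port assumed).
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-30B-A3B",
        "prompt": "The future of AI is",
        "max_tokens": 64,
        "temperature": 0
      }'
```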