[v0.11.0][Doc] Update doc (#3852)
### What this PR does / why we need it?

Update doc

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
@@ -1,13 +1,13 @@
-# Single Node (Atlas 300I series)
+# Single Node (Atlas 300I Series)
 ```{note}
-1. This Atlas 300I series is currently experimental. In future versions, there may be behavioral changes around model coverage, performance improvement.
-2. Currently, the 310I series only supports eager mode and the data type is float16.
-3. There are some known issues for running vLLM on 310p series, you can refer to vllm-ascend [<u>#3316</u>](https://github.com/vllm-project/vllm-ascend/issues/3316),
-[<u>#2795</u>](https://github.com/vllm-project/vllm-ascend/issues/2795), you can use v0.10.0rc1 version first.
+1. This Atlas 300I series is currently experimental. In future versions, there may be behavioral changes related to model coverage and performance improvement.
+2. Currently, the 310I series only supports eager mode and the float16 data type.
+3. There are some known issues for running vLLM on 310p series, you can refer to vllm-ascend [<u>#3316</u>](https://github.com/vllm-project/vllm-ascend/issues/3316) and
+[<u>#2795</u>](https://github.com/vllm-project/vllm-ascend/issues/2795). You can use v0.10.0rc1 version first.
 ```
-## Run vLLM on Altlas 300I series
+## Run vLLM on Atlas 300I Series
 Run docker container:
@@ -38,7 +38,7 @@ docker run --rm \
 -it $IMAGE bash
 ```
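The hunk above shows only the tail of the `docker run` command. As a hedged sketch of what the full command in this kind of doc typically looks like (the image tag and the device/driver mounts below are assumptions, not taken from this diff):

```bash
# Assumed image tag; substitute the release that matches your setup.
export IMAGE=quay.io/ascend/vllm-ascend:v0.11.0rc0
# Expose one NPU plus the Ascend management devices and mount the host
# driver into the container (device list and mounts are assumptions).
docker run --rm \
    --name vllm-ascend \
    --device /dev/davinci0 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -p 8000:8000 \
    -it $IMAGE bash
```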
-Setup environment variables:
+Set up environment variables:
 ```bash
 # Load model from ModelScope to speed up download
@@ -50,7 +50,7 @@ export PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256
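Only the `PYTORCH_NPU_ALLOC_CONF` export is visible in the hunk header above. A hedged sketch of the full block, assuming the ModelScope toggle is the standard `VLLM_USE_MODELSCOPE` setting:

```bash
# Load model from ModelScope to speed up download (assumed to pair with the
# comment shown above; VLLM_USE_MODELSCOPE is a standard vLLM setting).
export VLLM_USE_MODELSCOPE=True
# Cap allocator block size to reduce NPU memory fragmentation.
export PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256
```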
 ### Online Inference on NPU
-Run the following script to start the vLLM server on NPU(Qwen3-0.6B:1 card, Qwen2.5-7B-Instruct:2 cards, Pangu-Pro-MoE-72B: 8 cards):
+Run the following script to start the vLLM server on NPU (Qwen3-0.6B:1 card, Qwen2.5-7B-Instruct:2 cards, Pangu-Pro-MoE-72B: 8 cards):
 :::::{tab-set}
 :sync-group: inference
@@ -170,7 +170,7 @@ vllm serve /home/pangu-pro-moe-mode/ \
 ```
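The hunk above shows only the tail of one tab (the Pangu-Pro-MoE-72B command). As a hedged sketch of the single-card case, assuming standard `vllm serve` flags plus the eager-mode/float16 constraints from the note at the top of this page:

```bash
# Minimal single-card sketch for Qwen3-0.6B on a 300I-series NPU.
# --enforce-eager and --dtype float16 follow the note's stated constraints;
# the exact flags in the upstream doc may differ.
vllm serve Qwen/Qwen3-0.6B \
    --tensor-parallel-size 1 \
    --dtype float16 \
    --enforce-eager
```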
-Once your server is started, you can query the model with input prompts
+Once your server is started, you can query the model with input prompts.
 ```bash
 export question="你是谁?"
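The query block is truncated by this excerpt. A hedged sketch of how such a prompt is usually sent to vLLM's OpenAI-compatible server (the endpoint and payload shape are standard vLLM; the model name below is an assumption matching the serve sketch earlier):

```bash
# "你是谁?" means "Who are you?". Send it to the completions endpoint;
# the model field must match whatever `vllm serve` actually loaded.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen3-0.6B",
        "prompt": "'"${question}"'",
        "max_tokens": 64,
        "temperature": 0
    }'
```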