[v0.11.0][Doc] Update doc (#3852)
### What this PR does / why we need it?

Update doc.

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
@@ -1,8 +1,8 @@
 # Single-NPU (Qwen3 8B W4A8)
 
-## Run docker container
+## Run Docker Container
 
 :::{note}
-w4a8 quantization feature is supported by v0.9.1rc2 or higher
+The W4A8 quantization feature is supported by v0.9.1rc2 and later.
 :::
 
 ```{code-block} bash
@@ -25,7 +25,7 @@ docker run --rm \
   -it $IMAGE bash
 ```
 
-## Install modelslim and convert model
+## Install modelslim and Convert Model
 
 :::{note}
 You can choose to convert the model yourself or use the quantized model we uploaded;
 see https://www.modelscope.cn/models/vllm-ascend/Qwen3-8B-W4A8
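
If you would rather skip the conversion step, the pre-quantized checkpoint from the ModelScope repo linked above can be fetched programmatically. A minimal sketch, assuming the `modelscope` Python package is installed; `snapshot_download` returns the local path of the downloaded checkpoint, which can then be passed to vLLM in place of the converted model directory:

```python
# Sketch: fetch the pre-quantized Qwen3-8B W4A8 weights from ModelScope
# instead of converting them yourself. snapshot_download returns the
# local directory of the checkpoint.
from modelscope import snapshot_download

model_dir = snapshot_download("vllm-ascend/Qwen3-8B-W4A8")
print(f"Quantized model downloaded to: {model_dir}")
```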
@@ -65,8 +65,8 @@ python quant_qwen.py \
     --w_method HQQ
 ```
 
-## Verify the quantized model
-The converted model files looks like:
+## Verify the Quantized Model
+The converted model files look like:
 
 ```bash
 .
@@ -84,13 +84,13 @@ The converted model files looks like:
 `-- tokenizer_config.json
 ```
 
-Run the following script to start the vLLM server with quantized model:
+Run the following script to start the vLLM server with the quantized model:
 
 ```bash
 vllm serve /home/models/Qwen3-8B-w4a8 --served-model-name "qwen3-8b-w4a8" --max-model-len 4096 --quantization ascend
 ```
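
The server takes a moment to load the model. A small sketch that waits for readiness by polling the `/health` endpoint exposed by vLLM's OpenAI-compatible server, assuming the default port used above:

```python
# Sketch: poll vLLM's /health endpoint until the server reports ready.
import time
import urllib.request

while True:
    try:
        urllib.request.urlopen("http://localhost:8000/health", timeout=2)
        print("Server is up.")
        break
    except OSError:
        # Connection refused / not ready yet; retry shortly.
        time.sleep(1)
```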
 
-Once your server is started, you can query the model with input prompts
+Once your server is started, you can query the model with input prompts.
 
 ```bash
 curl http://localhost:8000/v1/completions \
@@ -105,10 +105,10 @@ curl http://localhost:8000/v1/completions \
 }'
 ```
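
The same completion request can be issued from Python. A sketch using the `requests` library; since the body of the curl call above is truncated in this diff, the prompt and sampling parameters here are illustrative placeholders:

```python
# Sketch: query the OpenAI-compatible completions endpoint from Python.
# The prompt and sampling parameters are illustrative placeholders.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "qwen3-8b-w4a8",  # matches --served-model-name above
        "prompt": "The future of AI is",
        "max_tokens": 64,
        "temperature": 0.6,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```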
 
-Run the following script to execute offline inference on Single-NPU with quantized model:
+Run the following script to execute offline inference on single-NPU with the quantized model:
 
 :::{note}
-To enable quantization for ascend, quantization method must be "ascend"
+To enable quantization for Ascend, the quantization method must be "ascend".
 :::
 
 ```python
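# The original snippet is truncated at this point in the diff; what follows
# is a minimal offline-inference sketch, assuming vLLM's standard LLM entry
# point and the quantized model path used earlier in this doc.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is"]
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=32)

# Per the note above, quantization must be set to "ascend".
llm = LLM(
    model="/home/models/Qwen3-8B-w4a8",
    max_model_len=4096,
    quantization="ascend",
)

for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```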