[Doc][v0.18.0] Fix documentation formatting and improve code examples (#8701)

### What this PR does / why we need it?
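This PR fixes documentation formatting issues and improves code examples
across the project. For example, the example commands now use the hyphenated
`--max-model-len` flag instead of `--max_model_len`.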

Signed-off-by: MrZ20 <2609716663@qq.com>
SILONG ZENG
2026-04-28 09:01:25 +08:00
committed by GitHub
parent 9a0b786f2b
commit 2e2aaa2fae
38 changed files with 205 additions and 188 deletions

@@ -96,7 +96,7 @@ Run the following steps to start the vLLM service on NPU for the Qwen3 Dense ser
--served_model_name qwen --dtype float16 \
--additional-config '{"ascend_compilation_config": {"fuse_norm_quant": false}}' \
--compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY", "cudagraph_capture_sizes": [1,2,4,8,16,32]}' \
- --quantization ascend --max_model_len 16384
+ --quantization ascend --max-model-len 16384
# `--load_format` is required only for the W8A8SC quantized weight format.
#
```
@@ -134,7 +134,7 @@ Run the following steps to start the vLLM service on NPU for the Qwen3 Dense ser
--enforce-eager \
--dtype float16 \
--quantization ascend \
- --max_model_len 10240
+ --max-model-len 10240
```
Argument notes: `--tensor-parallel-size`: `W8A8SC` quantized weights are tightly coupled to the TP size, so you must specify the TP size you plan to use at serving time when running compression. `--model` is the path to the input `w8a8s` weights, and `--output` is the output path for the compressed `w8a8sc` weights.
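
The coupling between the compression step and the serving step described in the argument notes can be made explicit by deriving both command lines from a single TP value. The following is a minimal sketch, not the project's tooling: the script name `compress_weights.py`, the helper `build_commands`, and the placeholder paths are hypothetical, and the serving flags are simply copied from the commands shown in this document.

```python
import shlex


def build_commands(tp_size, src_weights, dst_weights,
                   compress_script="compress_weights.py"):
    """Return (compress_cmd, serve_cmd) that share one TP size.

    `compress_script` is a hypothetical placeholder; substitute the
    compression entry point your setup actually uses.
    """
    if tp_size < 1:
        raise ValueError("tp_size must be a positive integer")
    tp = str(tp_size)

    # Compression: w8a8s weights in, w8a8sc weights out, at a fixed TP size.
    compress_cmd = [
        "python", compress_script,
        "--model", src_weights,
        "--output", dst_weights,
        "--tensor-parallel-size", tp,
    ]

    # Serving: load the compressed weights with the same TP size.
    serve_cmd = [
        "vllm", "serve", dst_weights,
        "--tensor-parallel-size", tp,
        "--quantization", "ascend",
        "--load_format", "sharded_state",
    ]
    return compress_cmd, serve_cmd


if __name__ == "__main__":
    compress_cmd, serve_cmd = build_commands(
        2, "/path/to/w8a8s", "/path/to/w8a8sc"
    )
    print(shlex.join(compress_cmd))
    print(shlex.join(serve_cmd))
```

Because both lists read the TP size from the same argument, changing it in one place changes it for compression and serving together, which is the constraint the notes describe.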
@@ -159,7 +159,7 @@ Run the following steps to start the vLLM service on NPU for the Qwen3 Dense ser
--additional-config '{"ascend_compilation_config": {"fuse_norm_quant": false}}' \
--compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY", "cudagraph_capture_sizes": [1,2,4,8,16,32]}' \
--quantization ascend \
- --max_model_len 16384 \
+ --max-model-len 16384 \
--no-enable-prefix-caching \
--load_format="sharded_state"
```
@@ -178,7 +178,7 @@ Run the following steps to start the vLLM service on NPU for the Qwen3 Dense ser
--additional-config '{"ascend_compilation_config": {"fuse_norm_quant": false}}' \
--compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY", "cudagraph_capture_sizes": [1,2,4,8,16]}' \
--quantization ascend \
- --max_model_len 16384 \
+ --max-model-len 16384 \
--no-enable-prefix-caching \
--load_format="sharded_state"
```
@@ -199,7 +199,7 @@ Run the following steps to start the vLLM service on NPU for the Qwen3 Dense ser
--additional-config '{"ascend_compilation_config": {"fuse_norm_quant": false}}' \
--compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY", "cudagraph_capture_sizes": [16,32]}' \
--quantization ascend \
- --max_model_len 20480 \
+ --max-model-len 20480 \
--no-enable-prefix-caching \
--load_format="sharded_state"
```