[Doc][v0.18.0] Fix documentation formatting and improve code examples (#8701)
### What this PR does / why we need it?

This PR fixes various documentation issues and improves code examples throughout the project.

Signed-off-by: MrZ20 <2609716663@qq.com>
@@ -96,7 +96,7 @@ Run the following steps to start the vLLM service on NPU for the Qwen3 Dense ser
     --served_model_name qwen --dtype float16 \
     --additional-config '{"ascend_compilation_config": {"fuse_norm_quant": false}}' \
     --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY", "cudagraph_capture_sizes": [1,2,4,8,16,32]}' \
-    --quantization ascend --max_model_len 16384
+    --quantization ascend --max-model-len 16384
 # `--load_format` is required only for the W8A8SC quantized weight format.
 #
 ```
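The rename from `--max_model_len` to `--max-model-len` follows the usual argparse convention: CLI options are declared with dashes, and the parsed value is stored under the underscore attribute name. A minimal sketch with stock `argparse` (not vLLM's actual parser, which may be more lenient about spellings):

```python
import argparse

# Declare the option the way the docs now spell it: with dashes.
parser = argparse.ArgumentParser()
parser.add_argument("--max-model-len", type=int)

# argparse stores the value under the underscore name max_model_len...
args = parser.parse_args(["--max-model-len", "16384"])
print(args.max_model_len)  # 16384

# ...but the underscore spelling is not a recognized option string
# for a plain ArgumentParser: it ends up in the unknown-args list.
_, unknown = parser.parse_known_args(["--max_model_len", "16384"])
print(unknown)  # ['--max_model_len', '16384']
```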
@@ -134,7 +134,7 @@ Run the following steps to start the vLLM service on NPU for the Qwen3 Dense ser
     --enforce-eager \
     --dtype float16 \
     --quantization ascend \
-    --max_model_len 10240
+    --max-model-len 10240
 ```
 
 Argument notes: `--tensor-parallel-size`: `W8A8SC` quantized weights are tightly coupled to the TP size, so you must specify the TP size you plan to use at serving time when running compression. `--model` is the path to the input `w8a8s` weights, and `--output` is the output path for the compressed `w8a8sc` weights.
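The argument notes above can be sketched as an invocation. This is a sketch only: `compress_w8a8s.py` is a hypothetical placeholder for the compression entry point (the real tool name is not shown in this hunk); the flags are the ones the notes describe.

```shell
# Sketch, not the actual tool: "compress_w8a8s.py" is a hypothetical
# placeholder. --tensor-parallel-size must match the TP size you will
# serve with, since W8A8SC weights are coupled to the TP layout.
python compress_w8a8s.py \
    --model /path/to/qwen3-w8a8s \
    --output /path/to/qwen3-w8a8sc \
    --tensor-parallel-size 4
```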
@@ -159,7 +159,7 @@ Run the following steps to start the vLLM service on NPU for the Qwen3 Dense ser
     --additional-config '{"ascend_compilation_config": {"fuse_norm_quant": false}}' \
     --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY", "cudagraph_capture_sizes": [1,2,4,8,16,32]}' \
     --quantization ascend \
-    --max_model_len 16384 \
+    --max-model-len 16384 \
     --no-enable-prefix-caching \
     --load_format="sharded_state"
 ```
@@ -178,7 +178,7 @@ Run the following steps to start the vLLM service on NPU for the Qwen3 Dense ser
     --additional-config '{"ascend_compilation_config": {"fuse_norm_quant": false}}' \
     --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY", "cudagraph_capture_sizes": [1,2,4,8,16]}' \
     --quantization ascend \
-    --max_model_len 16384 \
+    --max-model-len 16384 \
     --no-enable-prefix-caching \
     --load_format="sharded_state"
 ```
@@ -199,7 +199,7 @@ Run the following steps to start the vLLM service on NPU for the Qwen3 Dense ser
     --additional-config '{"ascend_compilation_config": {"fuse_norm_quant": false}}' \
     --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY", "cudagraph_capture_sizes": [16,32]}' \
     --quantization ascend \
-    --max_model_len 20480 \
+    --max-model-len 20480 \
     --no-enable-prefix-caching \
     --load_format="sharded_state"
 ```
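The inline values passed to `--additional-config` and `--compilation-config` in the commands above must be valid JSON, which is easy to get wrong when quoting them in a shell. A quick sanity check before launching (the strings are the exact ones from the last hunk):

```python
import json

# The exact config strings from the serve command; json.loads raises
# ValueError/JSONDecodeError on malformed input, so a clean parse means
# the shell quoting preserved valid JSON.
additional_config = '{"ascend_compilation_config": {"fuse_norm_quant": false}}'
compilation_config = '{"cudagraph_mode": "FULL_DECODE_ONLY", "cudagraph_capture_sizes": [16,32]}'

print(json.loads(additional_config))
print(json.loads(compilation_config)["cudagraph_capture_sizes"])  # [16, 32]
```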