[1/2/N] Enable pymarkdown and python __init__ for lint system (#2011)
### What this PR does / why we need it?
1. Enable pymarkdown check
2. Enable python `__init__.py` check for vllm and vllm-ascend
3. Clean up the code
### How was this patch tested?
- vLLM version: v0.9.2
- vLLM main:
29c6fbe58c
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
@@ -43,11 +43,13 @@ Execute the following commands on each node in sequence. The results must all be

### NPU Interconnect Verification:
#### 1. Get NPU IP Addresses

```bash
for i in {0..7}; do hccn_tool -i $i -ip -g | grep ipaddr; done
```
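If you want the bare addresses (for example, to feed the ping test in the next step), the `ipaddr` lines can be trimmed in a few lines of Python; a minimal sketch on a made-up sample of the loop's output:

```python
# Parse hccn_tool-style "ipaddr:<ip>" lines into bare addresses.
# The sample text below is made up; real input comes from the loop above.
sample = """ipaddr:10.20.0.20
ipaddr:10.20.0.21"""

ips = [line.split(":", 1)[1] for line in sample.splitlines() if line.startswith("ipaddr")]
print(ips)  # ['10.20.0.20', '10.20.0.21']
```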

#### 2. Cross-Node PING Test

```bash
# Execute on the target node (replace with actual IP)
hccn_tool -i 0 -ping -g address 10.20.0.20
```
@@ -95,6 +97,7 @@ Before launch the inference server, ensure some environment variables are set fo
Run the following scripts on two nodes respectively

**node0**

```shell
#!/bin/sh

@@ -135,6 +138,7 @@ vllm serve /root/.cache/ds_v3 \
```

**node1**

```shell
#!/bin/sh

@@ -173,7 +177,7 @@ vllm serve /root/.cache/ds_v3 \
    --additional-config '{"ascend_scheduler_config":{"enabled":true},"torchair_graph_config":{"enabled":true}}'
```
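The value passed to `--additional-config` must be valid JSON; a quick way to sanity-check the string (standard library only) before launching the server:

```python
import json

# The exact string passed to --additional-config above;
# json.loads raises a ValueError if it is malformed.
additional_config = '{"ascend_scheduler_config":{"enabled":true},"torchair_graph_config":{"enabled":true}}'
cfg = json.loads(additional_config)
print(cfg["torchair_graph_config"]["enabled"])  # True
```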

The Deployment view looks like:

![alt text](https://raw.githubusercontent.com/vllm-project/vllm-ascend/main/docs/source/assets/multi_node_dp.png)

Once your server is started, you can query the model with input prompts:

@@ -191,6 +195,7 @@ curl http://{ node0 ip:8004 }/v1/completions \

## Run benchmarks
For details please refer to [benchmark](https://github.com/vllm-project/vllm-ascend/tree/main/benchmarks)

```shell
vllm bench serve --model /root/.cache/ds_v3 --served-model-name deepseek_v3 \
--dataset-name random --random-input-len 128 --random-output-len 128 \

@@ -71,6 +71,7 @@ curl http://localhost:8000/v1/completions \
    "temperature": 0.6
  }'
```
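The same completion request can be built from Python with only the standard library; a sketch (the request is constructed but not sent here), with hypothetical payload fields standing in for the ones in the curl example:

```python
import json
import urllib.request

# Mirror of the curl request above; the prompt and field values are
# illustrative placeholders, not taken from the server docs.
payload = json.dumps({
    "prompt": "The future of AI is",
    "max_tokens": 64,
    "temperature": 0.6,
}).encode()
req = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=payload,
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send it; here we just inspect the request.
print(req.get_method())  # POST
```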

::::

::::{tab-item} v1/chat/completions

@@ -91,6 +92,7 @@ curl http://localhost:8000/v1/chat/completions \
    "add_special_tokens": true
  }'
```

::::
:::::

@@ -170,9 +172,11 @@ if __name__ == "__main__":
    del llm
    clean_up()
```

::::

::::{tab-item} Eager Mode

```{code-block} python
:substitutions:

import gc

@@ -226,6 +230,7 @@ if __name__ == "__main__":
    del llm
    clean_up()
```

::::
:::::

@@ -30,7 +30,7 @@ docker run --rm \

## Install modelslim and convert model
:::{note}
You can choose to convert the model yourself or use the quantized model we uploaded;
see https://www.modelscope.cn/models/vllm-ascend/QwQ-32B-W8A8
:::

@@ -55,6 +55,7 @@ python3 quant_qwen.py --model_path $MODEL_PATH --save_directory $SAVE_PATH --cal

## Verify the quantized model
The converted model files look like:

```bash
.
|-- config.json

@@ -72,11 +73,13 @@ Run the following script to start the vLLM server with quantized model:
:::{note}
The value "ascend" for the "--quantization" argument will be supported after [a specific PR](https://github.com/vllm-project/vllm-ascend/pull/877) is merged and released; for now, you can cherry-pick that commit.
:::

```bash
vllm serve /home/models/QwQ-32B-w8a8 --tensor-parallel-size 4 --served-model-name "qwq-32b-w8a8" --max-model-len 4096 --quantization ascend
```

Once your server is started, you can query the model with input prompts

```bash
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \

@@ -93,7 +96,7 @@ curl http://localhost:8000/v1/completions \

Run the following script to execute offline inference on multi-NPU with the quantized model:

:::{note}
To enable quantization on Ascend, the quantization method must be "ascend".
:::

```python

@@ -131,4 +134,4 @@ for output in outputs:

del llm
clean_up()
```

@@ -80,6 +80,7 @@ curl http://localhost:8000/v1/completions \
    "temperature": 0.6
  }'
```

::::

::::{tab-item} Qwen/Qwen2.5-7B-Instruct

@@ -318,6 +319,7 @@ if __name__ == "__main__":
:::::

Run script:

```bash
python example.py
```

@@ -66,6 +66,7 @@ for output in outputs:
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

::::

::::{tab-item} Eager Mode

@@ -92,6 +93,7 @@ for output in outputs:
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

::::
:::::

@@ -131,6 +133,7 @@ docker run --rm \
    -it $IMAGE \
    vllm serve Qwen/Qwen3-8B --max_model_len 26240
```

::::

::::{tab-item} Eager Mode

@@ -156,6 +159,7 @@ docker run --rm \
    -it $IMAGE \
    vllm serve Qwen/Qwen3-8B --max_model_len 26240 --enforce-eager
```

::::
:::::

@@ -191,4 +191,4 @@ Logs of the vllm server:
INFO 03-12 11:16:50 logger.py:39] Received request chatcmpl-92148a41eca64b6d82d3d7cfa5723aeb: prompt: '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>\nWhat is the text in the illustrate?<|im_end|>\n<|im_start|>assistant\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=16353, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO 03-12 11:16:50 engine.py:280] Added request chatcmpl-92148a41eca64b6d82d3d7cfa5723aeb.
INFO:     127.0.0.1:54004 - "POST /v1/chat/completions HTTP/1.1" 200 OK
```