[1/2/N] Enable pymarkdown and python __init__ for lint system (#2011)
### What this PR does / why we need it?
1. Enable pymarkdown check
2. Enable python `__init__.py` check for vllm and vllm-ascend
3. Clean up code
### How was this patch tested?
- vLLM version: v0.9.2
- vLLM main: 29c6fbe58c
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
@@ -4,6 +4,7 @@ This document details the benchmark methodology for vllm-ascend, aimed at evalua
**Benchmark Coverage**: We measure offline e2e latency and throughput, as well as fixed-QPS online serving benchmarks. For more details, see [vllm-ascend benchmark scripts](https://github.com/vllm-project/vllm-ascend/tree/main/benchmarks).
## 1. Run docker container
```{code-block} bash
:substitutions:
# Update DEVICE according to your device (/dev/davinci[0-7])
@@ -29,6 +30,7 @@ docker run --rm \
```
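Once the container is up, it can help to confirm the NPU devices are visible inside it. The check below is an optional sanity step, not part of the original guide; it assumes the Ascend `npu-smi` tool is available in the image:

```bash
# Optional sanity check (run inside the container): list the Ascend NPU devices
npu-smi info
```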
## 2. Install dependencies
```bash
cd /workspace/vllm-ascend
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
@@ -37,11 +39,13 @@ pip install -r benchmarks/requirements-bench.txt
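After installing the requirements, a lightweight sanity check can confirm the installed packages resolve cleanly. `pip check` is a standard pip subcommand; this step is an optional addition, not part of the original guide:

```bash
# Optional: report any broken or conflicting package dependencies after installation
pip check
```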
## 3. (Optional) Prepare model weights
To speed up the run, we recommend downloading the model in advance:
```bash
modelscope download --model LLM-Research/Meta-Llama-3.1-8B-Instruct
```
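If you want the weights in a specific directory, the ModelScope CLI accepts a `--local_dir` flag in recent versions; treat the flag and the target path below as assumptions to adapt to your environment:

```bash
# Download the weights into a chosen local directory (path is illustrative)
modelscope download --model LLM-Research/Meta-Llama-3.1-8B-Instruct --local_dir /root/models/Meta-Llama-3.1-8B-Instruct
```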
You can also replace all model paths in the [json](https://github.com/vllm-project/vllm-ascend/tree/main/benchmarks/tests) files with your local paths:
```json
[
  {
@@ -59,11 +63,13 @@ You can also replace all model paths in the [json](https://github.com/vllm-proje
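If you prefer to rewrite the paths in bulk instead of editing each file, a plain text substitution works. The command below is a sketch rather than part of the original guide; the model id and local path are assumptions to adapt:

```bash
# Replace the remote model id with a local path in every benchmark test config
# (adjust the model id and the local path to match your environment)
sed -i 's#LLM-Research/Meta-Llama-3.1-8B-Instruct#/root/models/Meta-Llama-3.1-8B-Instruct#g' benchmarks/tests/*.json
```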
## 4. Run benchmark script
Run the benchmark script:
```bash
bash benchmarks/scripts/run-performance-benchmarks.sh
```
After about 10 minutes, the output is as shown below:
```bash
online serving:
qps 1:
@@ -173,6 +179,7 @@ Throughput: 4.64 requests/s, 2000.51 total tokens/s, 1010.54 output tokens/s
Total num prompt tokens: 42659
Total num output tokens: 43545
```
The result JSON files are generated under `benchmark/results`.
These files contain detailed benchmarking results for further analysis.
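For a quick look at one of the generated files, standard tooling is enough; the file name below is a placeholder, and `python3 -m json.tool` simply pretty-prints JSON:

```bash
# List the generated result files, then pretty-print one of them
ls benchmark/results
python3 -m json.tool benchmark/results/<result_file>.json   # replace <result_file> with an actual file name
```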
@@ -10,6 +10,7 @@ The execution duration of each stage (including pre/post-processing, model forwa
* Use the blocking API `ProfileExecuteDuration().pop_captured_sync` at an appropriate time to get and print the execution durations of all observed stages.
**We have instrumented the key inference stages (including pre-processing, model forward pass, etc.) for execute duration profiling. Execute the script as follows:**
```bash
VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE=1 python3 vllm-ascend/examples/offline_inference_npu.py
```
@@ -36,4 +37,4 @@ VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE=1 python3 vllm-ascend/examples/offline_in
5747:(IntegratedWorker pid=1502524) Profile execute duration [Decode]: [post process]:14.21ms [prepare input and forward]:10.10ms [forward]:4.52ms
5751:(IntegratedWorker pid=1502524) Profile execute duration [Decode]: [post process]:15.03ms [prepare input and forward]:10.00ms [forward]:4.42ms
```
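To keep these per-stage duration lines for later analysis, the same command can be piped through `tee` and filtered. This wrapper is a suggestion rather than part of the original guide, and the log file name is arbitrary:

```bash
# Save the full run log and extract only the per-stage duration lines
VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE=1 \
  python3 vllm-ascend/examples/offline_inference_npu.py 2>&1 | tee profile_run.log
grep "Profile execute duration" profile_run.log
```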