[Lint]Style: reformat markdown files via markdownlint (#5884)

### What this PR does / why we need it?
Reformat markdown files via markdownlint.

- vLLM version: v0.13.0
- vLLM main: bde38c11df

---------

Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain>
Signed-off-by: MrZ20 <2609716663@qq.com>
Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain>
SILONG ZENG
2026-01-15 09:06:01 +08:00
committed by GitHub
parent 96edd4673f
commit 4811ba62e0
75 changed files with 711 additions and 308 deletions

View File

@@ -53,7 +53,7 @@ To restrict the operators that are captured, configure the `list` block:
- `scope` (list[str]): In PyTorch pynative scenarios this field restricts the dump range. Provide two module or API names that follow the tool's naming convention to lock a range; only data between the two names will be dumped. Examples:
```
```json
"scope": ["Module.conv1.Conv2d.forward.0", "Module.fc2.Linear.forward.0"]
"scope": ["Cell.conv1.Conv2d.forward.0", "Cell.fc2.Dense.backward.0"]
"scope": ["Tensor.add.0.forward", "Functional.square.2.forward"]
@@ -62,9 +62,9 @@ To restrict the operators that are captured, configure the `list` block:
The `level` setting determines what can be provided—modules when `level=L0`, APIs when `level=L1`, and either modules or APIs when `level=mix`.
- `list` (list[str]): Custom operator list. Options include:
- Supply the full names of specific APIs in PyTorch pynative scenarios to only dump those APIs. Example: `"list": ["Tensor.permute.1.forward", "Tensor.transpose.2.forward", "Torch.relu.3.backward"]`.
- When `level=mix`, you can provide module names so that the dump expands to everything produced while the module is running. Example: `"list": ["Module.module.language_model.encoder.layers.0.mlp.ParallelMlp.forward.0"]`.
- Provide a substring such as `"list": ["relu"]` to dump every API whose name contains the substring. When `level=mix`, modules whose names contain the substring are also expanded.
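Putting the options above together, a minimal sketch of how these fields might be filled in (the file name and any enclosing keys are placeholders, not the tool's reference layout):
```shell
# A rough sketch only: the `level`, `scope`, and `list` values reuse the examples
# given above; the file name and surrounding configuration keys are assumptions.
cat > dump_config_fragment.json << 'EOF'
{
  "level": "mix",
  "scope": [],
  "list": ["Module.module.language_model.encoder.layers.0.mlp.ParallelMlp.forward.0", "relu"]
}
EOF
```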
Example configuration:
@@ -188,7 +188,7 @@ Use `msprobe graph_visualize` to generate results that can be opened inside `tb_
Replace the paths with your dump directories before invoking `msprobe graph_visualize`. **If you only need to build a single graph**, omit `bench_path` to visualize one dump.
Single-rank, multi-rank, and multi-step multi-rank scenarios are also supported. `npu_path` or `bench_path` must contain folders named `rank<number>` (for example `rank0`), and every rank folder must contain a non-empty `construct.json` together with `dump.json` and `stack.json`. If any `construct.json` is empty, verify that the dump level includes `L0` or `mix`. When comparing graphs, both `npu_path` and `bench_path` must contain the same set of rank folders so they can be paired one-to-one.
```
```shell
├── npu_path or bench_path
| ├── rank0
| | ├── dump_tensor_data (only when the `tensor` option is enabled)

View File

@@ -200,10 +200,12 @@ echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```
Purpose
- Forces all CPU cores to run under the `performance` governor
- Disables dynamic frequency scaling (e.g., `ondemand`, `powersave`)
Benefits
- Keeps CPU cores at maximum frequency
- Reduces latency jitter
- Improves predictability for inference workloads
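As a quick check after applying the command (not part of the original guide), the governor can be read back from sysfs:
```shell
# Every core should now report `performance`
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c
```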
@@ -224,6 +226,7 @@ Benefits
- Improves stability for large in-memory models
Notes
- For inference workloads, swap can introduce second-level latency
- Recommended values are `0` or `1`
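One way to apply and persist the recommended value, assuming a standard sysctl setup:
```shell
# Apply the recommended swappiness immediately
sysctl -w vm.swappiness=1
# Persist it across reboots
echo "vm.swappiness=1" >> /etc/sysctl.conf
```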
@@ -244,6 +247,7 @@ Benefits
- Improves performance stability on NUMA systems
Recommended For
- Multi-socket servers
- Ascend / NPU deployments with explicit NUMA binding
- Systems with manually managed CPU and memory affinity
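An illustrative sketch of explicit NUMA binding with `numactl` (the node IDs and the serving command are placeholders, not taken from this guide):
```shell
# Pin both CPU threads and memory allocations to NUMA node 0
numactl --cpunodebind=0 --membind=0 vllm serve Qwen/Qwen2.5-7B-Instruct
```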
@@ -255,14 +259,17 @@ sysctl -w kernel.sched_migration_cost_ns=50000
```
Purpose
- Increases the cost for the scheduler to migrate tasks between CPU cores
Benefits
- Reduces frequent thread migration
- Improves CPU cache locality
- Lowers latency jitter for inference workloads
Parameter Details
- Unit: nanoseconds (ns)
- Typical recommended range: 50000 to 100000
- Higher values encourage threads to stay on the same CPU core
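To keep the setting across reboots, a common pattern (assuming the standard sysctl configuration file) is:
```shell
# Persist the scheduler migration cost and reload kernel parameters
echo "kernel.sched_migration_cost_ns=50000" >> /etc/sysctl.conf
sysctl -p
```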

View File

@@ -1,4 +1,5 @@
# Performance Benchmark
This document details the benchmark methodology for vllm-ascend, aimed at evaluating performance under a variety of workloads. To maintain alignment with vLLM, we use the [benchmark](https://github.com/vllm-project/vllm/tree/main/benchmarks) scripts provided by the vLLM project.
**Benchmark Coverage**: We measure offline E2E latency and throughput, and fixed-QPS online serving benchmarks. For more details, see [vllm-ascend benchmark scripts](https://github.com/vllm-project/vllm-ascend/tree/main/benchmarks).
@@ -38,10 +39,12 @@ pip install -r benchmarks/requirements-bench.txt
```
## 3. Run basic benchmarks
This section introduces how to run performance tests using the benchmark suite built into vLLM.
### 3.1 Dataset
VLLM supports a variety of (datasets)[https://github.com/vllm-project/vllm/blob/main/vllm/benchmarks/datasets.py].
vLLM supports a variety of [datasets](https://github.com/vllm-project/vllm/blob/main/vllm/benchmarks/datasets.py).
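For example, a rough sketch of a fixed-workload serving benchmark run (the model name, dataset path, and prompt count are placeholders; flag names may differ between vLLM versions):
```shell
python benchmarks/benchmark_serving.py \
  --backend vllm \
  --model Qwen/Qwen2.5-7B-Instruct \
  --dataset-name sharegpt \
  --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json \
  --num-prompts 200
```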
<style>
th {

View File

@@ -5,19 +5,20 @@ The execution duration of each stage (including pre/post-processing, model forwa
**To reduce this performance overhead, we added this feature, which uses the NPU event timestamp mechanism to observe device execution time asynchronously.**
## Usage
* Use the environment variable `VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE` to enable this feature.
* Use the non-blocking API `ProfileExecuteDuration().capture_async` to set observation points asynchronously when you need to observe the execution duration.
* Use the blocking API `ProfileExecuteDuration().pop_captured_sync` at an appropriate time to get and print the execution durations of all observed stages.
**We have instrumented the key inference stages (including pre-processing, model forward pass, etc.) for execution duration profiling. Execute the script as follows:**
```
```shell
VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE=1 python3 vllm-ascend/examples/offline_inference_npu.py
```
## Example Output
```
```shell
5691:(IntegratedWorker pid=1502285) Profile execute duration [Decode]: [post process]:14.17ms [prepare input and forward]:9.57ms [forward]:4.14ms
5695:(IntegratedWorker pid=1502285) Profile execute duration [Decode]: [post process]:14.29ms [prepare input and forward]:10.19ms [forward]:4.14ms
5697:(IntegratedWorker pid=1502343) Profile execute duration [Decode]: [post process]:14.81ms [prepare input and forward]:10.29ms [forward]:3.99ms

View File

@@ -15,6 +15,7 @@ pip install msserviceprofiler==1.2.2
```
### 1 Preparation
Before starting the service, set the environment variable `SERVICE_PROF_CONFIG_PATH` to point to the profiling configuration file, and set the environment variable `PROFILING_SYMBOLS_PATH` to specify the YAML configuration file for the symbols that need to be imported. After that, start the vLLM service according to your deployment method.
```bash
@@ -32,6 +33,7 @@ The file `ms_service_profiler_config.json` is the profiling configuration. If it
`service_profiling_symbols.yaml` is the configuration file containing the profiling points to be imported. You can choose **not** to set the `PROFILING_SYMBOLS_PATH` environment variable, in which case the default configuration file will be used. If the file does not exist at the path you specified, the system will likewise generate a configuration file at that path for future configuration. You can customize it according to the instructions in the `Symbols Configuration File` section below.
### 2 Enable Profiling
To enable the performance data collection switch, change the `enable` field from `0` to `1` in the configuration file `ms_service_profiler_config.json`. This can be accomplished by executing the following sed command:
```bash
@@ -39,6 +41,7 @@ sed -i 's/"enable":\s*0/"enable": 1/' ./ms_service_profiler_config.json
```
### 3 Send Requests
Choose a request-sending method that suits your actual profiling needs:
```bash
@@ -65,6 +68,7 @@ msserviceprofiler analyze --input-path=./ --output-path output
### 5 View Results
After analysis, the `output` directory will contain:
- `chrome_tracing.json`: Chrome tracing format data, which can be opened in [MindStudio Insight](https://www.hiascend.com/document/detail/zh/mindstudio/81RC1/GUI_baseddevelopmenttool/msascendinsightug/Insight_userguide_0002.html).
- `profiler.db`: Performance data in database format.
- `request.csv`: Request-related data.
@@ -77,7 +81,9 @@ After analysis, the `output` directory will contain:
---
## Appendix
(profiling-configuration-file)=
### 1 Profiling Configuration File
The profiling configuration file controls profiling parameters and behavior.
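As a minimal sanity check (only the `enable` switch is described in this guide; the remaining keys depend on your msserviceprofiler version), the current parameters can be inspected with:
```shell
# Pretty-print the profiling configuration and show the collection switch
python3 -m json.tool ./ms_service_profiler_config.json
grep '"enable"' ./ms_service_profiler_config.json
```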
@@ -116,6 +122,7 @@ The configuration is in JSON format. Main parameters:
---
(symbols-configuration-file)=
### 2 Symbols Configuration File
The symbols configuration file defines which functions/methods to profile and supports flexible configuration with custom attribute collection.