[Misc][Doc] Add service profiling feature with user guide (#3756)

### What this PR does / why we need it?
This PR adds support for the data collection capabilities of
msServiceProfiler on the vllm-ascend framework. Collection points can be
customized via a configuration file, and a default profiling
configuration is added to vllm-ascend to help developers and users debug
and optimize.

### Does this PR introduce _any_ user-facing change?
None

### How was this patch tested?

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: minghangc <29514143@qq.com>
thonean
2025-11-12 09:07:14 +08:00
committed by GitHub
parent 1c677c3b87
commit e38fe92f40
7 changed files with 1044 additions and 4 deletions


@@ -1,9 +1,10 @@
# Performance
:::{toctree}
::::{toctree}
:caption: Performance
:maxdepth: 1
performance_benchmark
profile_execute_duration
optimization_and_tuning
:::
service_profiling_guide
::::


@@ -0,0 +1,250 @@
# Service Profiling Guide
In inference service processes, we sometimes need to monitor the internal execution flow of the inference service framework to identify performance issues. By collecting start and end timestamps of key processes, identifying critical functions or iterations, recording key events, and capturing diverse types of information, we can quickly pinpoint performance bottlenecks.
This guide walks you through collecting performance data for the vllm-ascend service framework and operators. It covers the full workflow from preparation and collection to analysis and visualization, helping you quickly get started with the profiling tool.
## Quick Start
### 0 Installation
Install the `msserviceprofiler` package using pip:
```bash
pip install msserviceprofiler==1.2.2
```
### 1 Preparation
Before starting the service, set the environment variable `SERVICE_PROF_CONFIG_PATH` to point to the profiling configuration file, and set the environment variable `PROFILING_SYMBOLS_PATH` to specify the YAML configuration file for the symbols that need to be imported. After that, start the vLLM service according to your deployment method.
```bash
cd ${path_to_store_profiling_files}
# Set environment variable
export SERVICE_PROF_CONFIG_PATH=ms_service_profiler_config.json
export PROFILING_SYMBOLS_PATH=service_profiling_symbols.yaml
# Start vLLM service
vllm serve Qwen/Qwen2.5-0.5B-Instruct &
```
The file `ms_service_profiler_config.json` is the profiling configuration. If it does not exist at the specified path, a default configuration will be generated automatically. If needed, you can customize it in advance according to the instructions in the `Profiling Configuration File` section below.
`service_profiling_symbols.yaml` is the configuration file containing the profiling points to be imported. You can choose **not** to set the `PROFILING_SYMBOLS_PATH` environment variable, in which case the default configuration file will be used. If the file does not exist at the path you specified, likewise, the system will generate a configuration file at your specified path for future configuration. You can customize it according to the instructions in the `Symbols Configuration File` section below.
### 2 Enable Profiling
To enable the performance data collection switch, change the `enable` field from `0` to `1` in the configuration file `ms_service_profiler_config.json`. This can be accomplished by executing the following sed command:
```bash
sed -i 's/"enable":\s*0/"enable": 1/' ./ms_service_profiler_config.json
```
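If you prefer not to rely on `sed` (for example, when the JSON file has been reformatted), the same switch can be flipped with a few lines of Python. This is a minimal sketch: the helper name `set_profiling_enabled` is hypothetical and not part of `msserviceprofiler`; only the `enable` field comes from this guide.

```python
import json
from pathlib import Path


def set_profiling_enabled(config_path, enabled=True):
    """Set the `enable` field of the profiling config file to 1 or 0.

    Hypothetical helper for illustration; all other fields are kept as-is.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["enable"] = 1 if enabled else 0
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


# Example call (not executed here):
# set_profiling_enabled("./ms_service_profiler_config.json", enabled=True)
```

Parsing the file as JSON avoids depending on the exact whitespace around `"enable"`, which the `sed` pattern has to match.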
### 3 Send Requests
Choose a request-sending method that suits your actual profiling needs:
```bash
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-0.5B-Instruct",
"prompt": "Beijing is a",
"max_tokens": 5,
"temperature": 0
}' | python3 -m json.tool
```
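The same request can be sent from Python using only the standard library, which is convenient when scripting several requests in a row. This is a sketch under the same assumptions as the `curl` command above (server on `localhost:8000`, model `Qwen/Qwen2.5-0.5B-Instruct`); the function names are hypothetical.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="Qwen/Qwen2.5-0.5B-Instruct", max_tokens=5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_completion(prompt, **kwargs):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        return json.load(resp)


# Example call (requires a running vLLM service):
# print(send_completion("Beijing is a"))
```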
### 4 Analyze Data
```bash
# xxxx-xxxx is the directory automatically created based on vLLM startup time
cd /root/.ms_service_profiler/xxxx-xxxx
# Analyze data
msserviceprofiler analyze --input-path=./ --output-path output
```
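When several runs have been collected, it can be handy to locate the most recent run directory programmatically before calling `msserviceprofiler analyze`. The sketch below assumes that the run you want is the most recently modified subdirectory of the profiling root; the helper name `latest_profiling_dir` is hypothetical.

```python
from pathlib import Path


def latest_profiling_dir(prof_root):
    """Return the most recently modified subdirectory of `prof_root`.

    Assumption: each service start creates one subdirectory, so the newest
    modification time identifies the latest run.
    """
    candidates = [p for p in Path(prof_root).expanduser().iterdir() if p.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no profiling directories under {prof_root}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Example call (not executed here):
# print(latest_profiling_dir("~/.ms_service_profiler"))
```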
### 5 View Results
After analysis, the `output` directory will contain:
- `chrome_tracing.json`: Chrome tracing format data, which can be opened in [MindStudio Insight](https://www.hiascend.com/document/detail/zh/mindstudio/81RC1/GUI_baseddevelopmenttool/msascendinsightug/Insight_userguide_0002.html).
- `profiler.db`: Performance data in database format.
- `request.csv`: Request-related data.
- `request_summary.csv`: Overall request metrics.
- `kvcache.csv`: KV Cache-related data.
- `batch.csv`: Batch scheduling-related data.
- `batch_summary.csv`: Overall batch scheduling metrics.
- `service_summary.csv`: Overall service-level metrics.
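
For a quick look at `chrome_tracing.json` without a GUI, the sketch below totals the duration of each event name. It assumes the file follows the common Chrome Trace Event layout (either a bare list of events or an object with a `traceEvents` list) and that spans are recorded as complete events (`"ph": "X"` with a `dur` field in microseconds). The exact event layout produced by the analyzer is not documented here, so treat this as a starting point; the function names are hypothetical.

```python
import json
from collections import defaultdict


def summarize_trace(trace):
    """Return {event name: total duration in microseconds}, largest first.

    Only complete events ("ph" == "X") that carry a "dur" field are counted.
    """
    events = trace.get("traceEvents", []) if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for event in events:
        if event.get("ph") == "X" and "dur" in event:
            totals[event.get("name", "<unnamed>")] += event["dur"]
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


def summarize_trace_file(path):
    """Load a trace file from disk and summarize it."""
    with open(path, encoding="utf-8") as fh:
        return summarize_trace(json.load(fh))


# Example call (not executed here):
# for name, total_us in summarize_trace_file("output/chrome_tracing.json").items():
#     print(f"{name}: {total_us / 1000:.3f} ms")
```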
---
## Appendix
(profiling-configuration-file)=
### 1 Profiling Configuration File
The profiling configuration file controls profiling parameters and behavior.
#### File Format
The configuration is in JSON format. Main parameters:
| Parameter | Description | Required |
|:------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----:|
| enable | Switch for profiling: <br />0: disable<br />1: enable<br />Default: 0 | Yes |
| prof_dir | Directory to store collected performance data. <br />Default: $HOME/.ms_service_profiler | No |
| profiler_level | Data collection level. Default is "INFO" (normal level). | No |
| host_system_usage_freq | Sampling frequency of host CPU and memory metrics. Disabled by default. Range: integer 1 to 50, unit: Hz (times per second). Set to -1 to disable. <br />Note: Enabling this may consume significant memory. | No |
| npu_memory_usage_freq | Sampling frequency of NPU memory utilization. Disabled by default. Range: integer 1 to 50, unit: Hz (times per second). Set to -1 to disable. <br />Note: Enabling this may consume significant memory. | No |
| acl_task_time | Switch to collect operator dispatch latency and execution latency: <br />0: disable (default; 0 or invalid values mean disabled).<br />1: enable; calls `aclprofCreateConfig` with `ACL_PROF_TASK_TIME_L0`.<br />2: enable MSPTI-based data dumping; uses MSPTI for profiling and requires: `export LD_PRELOAD=$ASCEND_TOOLKIT_HOME/lib64/libmspti.so` | No |
| acl_prof_task_time_level | Level and duration for profiling: <br />L0: collect operator dispatch and execution latency only; lower overhead (no operator basic info).<br />L1: collect AscendCL interface performance (host-device and inter-device sync/async memory copy latencies), plus operator dispatch, execution, and basic info for comprehensive analysis.<br />time: profiling duration, integer 1 to 999, in seconds.<br />If unset, defaults to L0 until program exit; invalid values fall back to defaults.<br />Level and duration can be combined, e.g., `"acl_prof_task_time_level": "L1,10"`. | No |
| api_filter | Filter to select API performance data to dump. For example, specifying "matmul" dumps all API data whose `name` contains "matmul". String, case-sensitive; use ";" to separate multiple targets. Empty means dump all. <br />Effective only when `acl_task_time` is 2. | No |
| kernel_filter | Filter to select kernel performance data to dump. For example, specifying "matmul" dumps all kernel data whose `name` contains "matmul". String, case-sensitive; use ";" to separate multiple targets. Empty means dump all. <br />Effective only when `acl_task_time` is 2. | No |
| timelimit | Profiling duration for the service. The process stops automatically after this time. Range: integer 0 to 7200, unit: seconds. Default 0 means unlimited. | No |
| domain | Limit profiling to the specified domains to reduce data volume. String, separated by semicolons, case-sensitive, e.g., "Request; KVCache".<br />Empty means all available domains.<br />Available domains: Request, KVCache, ModelExecute, BatchSchedule, Communication.<br />Note: If the selected domains are incomplete, analysis output may show warnings due to missing data. See [Reference Table 1](https://www.hiascend.com/document/detail/zh/canncommercial/82RC1/devaids/Profiling/mindieprofiling_0009.html#ZH-CN_TOPIC_0000002370256365__table1985410131831). | No |
#### Example Configuration
```json
{
"enable": 1,
"prof_dir": "vllm_prof",
"profiler_level": "INFO",
"acl_task_time": 0,
"acl_prof_task_time_level": "",
"timelimit": 0
}
```
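Before starting the service, it can help to sanity-check a hand-edited configuration. The sketch below checks a few fields against the value ranges described in this section (sampling frequencies of -1 or 1 to 50, `timelimit` of 0 to 7200, `acl_task_time` of 0, 1, or 2). The function `validate_profiler_config` is hypothetical and is not part of the profiler, which applies its own handling of invalid values.

```python
def validate_profiler_config(config):
    """Return a list of problems found in a profiling config dict (empty if none)."""
    problems = []

    def is_int(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)

    enable = config.get("enable")
    if not is_int(enable) or enable not in (0, 1):
        problems.append("enable is required and must be 0 or 1")

    for key in ("host_system_usage_freq", "npu_memory_usage_freq"):
        if key in config:
            value = config[key]
            if not (is_int(value) and (value == -1 or 1 <= value <= 50)):
                problems.append(f"{key} must be -1 or an integer from 1 to 50")

    if "timelimit" in config:
        value = config["timelimit"]
        if not (is_int(value) and 0 <= value <= 7200):
            problems.append("timelimit must be an integer from 0 to 7200")

    if config.get("acl_task_time", 0) not in (0, 1, 2):
        problems.append("acl_task_time is not 0, 1, or 2 and is treated as disabled")

    return problems
```

An empty list means none of the checked fields is out of range; fields that the sketch does not know about are ignored.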
---
(symbols-configuration-file)=
### 2 Symbols Configuration File
The symbols configuration file defines which functions/methods to profile and supports flexible configuration with custom attribute collection.
#### 2.1 File Name and Loading
- Default load path: `~/.config/vllm_ascend/service_profiling_symbols.MAJOR.MINOR.PATCH.yaml` (`MAJOR.MINOR.PATCH` matches the installed vLLM version)
To customize the profiling points, it is highly recommended to copy a symbols configuration file to your working directory and point the `PROFILING_SYMBOLS_PATH` environment variable at the copy.
#### 2.2 Field Descriptions
| Field | Description | Example |
|:-----:|:-----|:-----|
| symbol | Python import path + attribute chain | `"vllm.v1.core.kv_cache_manager:KVCacheManager.free"` |
| handler | Handler type | `"timer"` (default) or `"pkg.mod:func"` (custom) |
| domain | Domain tag | `"KVCache"`, `"ModelExecute"` |
| name | Event name | `"EngineCoreExecute"` |
| min_version | Lower version constraint | `"0.9.1"` |
| max_version | Upper version constraint | `"0.11.0"` |
| attributes | Custom attribute collection | Supported only by the `"timer"` handler. See the section below |
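
To make the `symbol` format concrete, the sketch below splits a `module.path:Attr.chain` string at the colon, imports the module, and walks the attribute chain. It illustrates the naming convention only and is not the profiler's own loader; `resolve_symbol` is a hypothetical name, and the usage comment relies on a standard-library symbol so that it runs without vLLM installed.

```python
import importlib


def resolve_symbol(symbol):
    """Resolve "pkg.module:Class.method" to the object it names."""
    module_path, _, attr_chain = symbol.partition(":")
    if not module_path or not attr_chain:
        raise ValueError(f"expected 'module.path:attr.chain', got {symbol!r}")
    target = importlib.import_module(module_path)
    # Walk the dotted attribute chain, e.g. "KVCacheManager.free".
    for name in attr_chain.split("."):
        target = getattr(target, name)
    return target


# Example call with a standard-library symbol:
# resolve_symbol("json.decoder:JSONDecoder.decode")
```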
#### 2.3 Examples
- Example 1: Custom handler
```yaml
- symbol: vllm.v1.core.kv_cache_manager:KVCacheManager.free
handler: vllm_profiler.config.custom_handler_example:kvcache_manager_free_example_handler
domain: Example
name: example_custom
```
- Example 2: Default timer
```yaml
- symbol: vllm.v1.engine.core:EngineCore.execute_model
domain: ModelExecute
name: EngineCoreExecute
```
- Example 3: Version constraint
```yaml
- symbol: vllm.v1.executor.abstract:Executor.execute_model
min_version: "0.9.1"
# No handler specified -> default timer
```
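The version fields can be read as a range check on the installed vLLM version. The sketch below shows one plausible interpretation, assuming both bounds are inclusive and only the numeric `MAJOR.MINOR.PATCH` prefix is compared; the profiler's actual comparison rules are not specified in this guide, and both function names are hypothetical.

```python
import re


def parse_version(text):
    """Extract (major, minor, patch) from strings such as "0.11.0" or "0.11.0rc1"."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return tuple(int(group) for group in match.groups())


def symbol_applies(installed, min_version=None, max_version=None):
    """Return True if `installed` lies within [min_version, max_version].

    Assumption: both bounds are inclusive; a missing bound is unconstrained.
    """
    current = parse_version(installed)
    if min_version is not None and current < parse_version(min_version):
        return False
    if max_version is not None and current > parse_version(max_version):
        return False
    return True
```

Comparing integer tuples rather than strings matters here: as strings, `"0.9.1"` sorts after `"0.11.0"`, which would invert the check.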
#### 2.4 Custom Attribute Collection
The `attributes` field supports flexible custom attribute collection and allows operations and transformations on function arguments and return values.
##### Basic Syntax
- Argument access: use the parameter name directly, e.g., `input_ids`
- Return value access: use the `return` keyword
- Pipeline operations: use `|` to chain multiple operations
- Attribute access: use `attr` to access object attributes
##### Example
```yaml
- symbol: vllm_ascend.worker.model_runner_v1:NPUModelRunner.execute_model
name: ModelRunnerExecuteModel
domain: ModelExecute
attributes:
- name: device
expr: args[0] | attr device | str
- name: dp
expr: args[0] | attr dp_rank | str
- name: batch_size
expr: args[0] | attr input_batch | attr _req_ids | len
```
##### Expression Notes
1. `len(input_ids)`: get the length of parameter `input_ids`.
2. `len(return) | str`: get the length of the return value and convert to string (equivalent to `str(len(return))`).
3. `return[0] | attr input_ids | len`: get the length of the `input_ids` attribute of the first element in the return value.
##### Supported Expression Types
- Basic operations: `len()`, `str()`, `int()`, `float()`
- Index access: `return[0]`, `return['key']`
- Attribute access: `return | attr attr_name`
- Pipeline composition: chain operations with `|`
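
To show how the pipeline syntax reads, here is a small illustrative evaluator. It is not the profiler's parser: `evaluate` and the layout of the `context` dict (keys `args`, `kwargs`, `return`, plus parameter names) are assumptions made for this sketch, and the function-call form such as `len(input_ids)` is deliberately left out.

```python
import re

# Pipeline stages that map directly onto Python built-ins.
_CASTS = {"len": len, "str": str, "int": int, "float": float}
# Matches an indexed head such as args[0] or kwargs['batch_size'].
_INDEXED = re.compile(r"^(\w+)\[(.+)\]$")


def evaluate(expr, context):
    """Evaluate `head | stage | stage ...` against a dict of named values."""
    head, *stages = [part.strip() for part in expr.split("|")]

    match = _INDEXED.match(head)
    if match:
        base, raw_key = match.groups()
        if raw_key.lstrip("-").isdigit():
            key = int(raw_key)            # positional index, e.g. args[0]
        else:
            key = raw_key.strip("'\"")    # quoted key, e.g. kwargs['batch_size']
        value = context[base][key]
    else:
        value = context[head]             # parameter name or `return`

    for stage in stages:
        if stage.startswith("attr "):
            value = getattr(value, stage[len("attr "):].strip())
        elif stage in _CASTS:
            value = _CASTS[stage](value)
        else:
            raise ValueError(f"unsupported stage: {stage!r}")
    return value
```

Each stage receives the result of the previous one, so `args[0] | attr input_batch | attr _req_ids | len` reads left to right as "take the first positional argument, follow two attributes, then take the length".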
##### Advanced Examples
```yaml
attributes:
# Get tensor shape
- name: tensor_shape
expr: input_tensor | attr shape | str
# Get specific value from a dict
- name: batch_size
expr: kwargs['batch_size']
# Conditional expression (requires custom handler support)
- name: is_training_mode
expr: training | bool
# Complex data processing
- name: processed_data_len
expr: data | attr items | len | str
```
#### 2.5 Custom Handler
When `handler` specifies a custom function, it must match the following signature:
```python
def custom_handler(original_func, this, *args, **kwargs):
"""
Custom handler
Args:
original_func: the original function object
this: the bound object (for methods)
*args: positional arguments
**kwargs: keyword arguments
Returns:
processing result
"""
# Custom logic
pass
```
If the custom handler fails to import, the system will automatically fall back to the default timer mode.
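As an illustration of the signature above, the following sketch wraps the original call and reports its wall-clock time. It assumes that `original_func` is the unbound function, so `this` is passed explicitly as the first argument; if the profiler hands over a bound method instead, drop `this` from the call. The name `timing_handler` and the printed output format are hypothetical.

```python
import time


def timing_handler(original_func, this, *args, **kwargs):
    """Call the original function and print how long it took.

    Assumption: `original_func` is unbound, so `this` is forwarded explicitly.
    """
    start = time.perf_counter()
    try:
        # Always return the original result so the caller is unaffected.
        return original_func(this, *args, **kwargs)
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{type(this).__name__}.{original_func.__name__} took {elapsed_ms:.3f} ms")
```

Using `try`/`finally` means the timing is reported even when the wrapped function raises, and the exception still propagates to the caller.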


@@ -0,0 +1,575 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2025, vllm-ascend team
# This file is distributed under the same license as the vllm-ascend
# package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2025.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: vllm-ascend\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-10-31 00:00+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: zh_CN <LL@li.org>\n"
"Language: zh_CN\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
"Generated-By: Babel 2.17.0\n"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Service Profiling Guide"
msgstr "服务化性能采集指南"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "In inference service processes, we sometimes need to monitor the internal execution flow of the inference service framework to identify performance issues. By collecting start and end timestamps of key processes, identifying critical functions or iterations, recording key events, and capturing diverse types of information, we can quickly pinpoint performance bottlenecks."
msgstr "在推理服务过程中,我们有时需要监控推理服务框架的内部执行流程以定位性能问题。通过采集关键流程的起止时间、识别关键函数或迭代、记录关键事件并捕获多种类型的信息,可以快速定位性能瓶颈。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid ""
"This guide walks you through collecting performance data for the vllm-ascend "
"service framework and operators. It covers the full workflow from preparation "
"and collection to analysis and visualization, helping you quickly get started "
"with the profiling tool."
msgstr ""
"本部分将指导你如何采集 vllm-ascend 的服务化框架性能数据以及算子性能数据,覆盖从准备、采集、解析到结果展示的完整流程,帮助你快速上手性能采集工具。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Quick Start"
msgstr "快速开始"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "0 Installation"
msgstr "0 安装"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Install the `msserviceprofiler` package using pip:"
msgstr "使用 pip 安装 `msserviceprofiler` 包:"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "1 Preparation"
msgstr "1 准备采集"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid ""
"Before starting the service, set the environment variable "
"`SERVICE_PROF_CONFIG_PATH` to point to the profiling configuration file, "
"and set the environment variable `PROFILING_SYMBOLS_PATH` to specify the YAML "
"configuration file for the symbols that need to be imported. After that, start "
"the vLLM service according to your deployment method."
msgstr ""
"在启动服务之前,请设置环境变量`SERVICE_PROF_CONFIG_PATH`指定需要加载的性能分析配置文件,并设置环境变量`PROFILING_SYMBOLS_PATH`来指定需要导入的符号的 YAML 配置文件。之后,根据您的部署方式启动 vLLM 服务。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "cd ${path_to_store_profiling_files}"
msgstr "cd ${profiling 文件存放路径}"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Set environment variable"
msgstr "设置环境变量"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Start vLLM service"
msgstr "启动 vLLM 服务"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid ""
"The file `ms_service_profiler_config.json` is the profiling configuration. "
"If it does not exist at the specified path, a default configuration will be "
"generated automatically. If needed, you can customize it in advance according "
"to the instructions in the `Profiling Configuration File` section below."
msgstr ""
"其中 `ms_service_profiler_config.json` 为采集配置文件。若指定路径下不存在该文件,将自动生成一份默认配置。若有需要,可参照 `采集配置文件说明` 章节提前进行自定义配置。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid ""
"`service_profiling_symbols.yaml` is the configuration file containing "
"the profiling points to be imported. You can choose **not** to set the "
"`PROFILING_SYMBOLS_PATH` environment variable, in which case the default "
"configuration file will be used. If the file does not exist at the path "
"you specified, likewise, the system will generate a configuration file at "
"your specified path for future configuration. You can customize it according "
"to the instructions in the `Symbols Configuration File` section below."
msgstr "`service_profiling_symbols.yaml` 为需要导入的埋点配置文件。你也可以选择不设置环境变量 `PROFILING_SYMBOLS_PATH`,此时将使用默认的配置文件;若你指定的路径下不存在该文件,系统同样会在你指定的路径生成一份配置文件以便后续修改。可参考 `点位配置文件说明` 一节进行自定义。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "2 Enable Profiling"
msgstr "2 开启采集"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid ""
"To enable the performance data collection switch, change the `enable` field from "
"`0` to `1` in the configuration file `ms_service_profiler_config.json`. This can "
"be accomplished by executing the following sed command:"
msgstr "将配置文件`ms_service_profiler_config.json`中的 `enable` 字段由 `0` 修改为 `1`，即可开启性能数据采集的开关。可以通过执行下面的 sed 指令完成采集服务的开启："
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "3 Send Requests"
msgstr "3 发送请求"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid ""
"Choose a request-sending method that suits your actual profiling needs:"
msgstr "根据实际采集需求选择请求发送方式:"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "4 Analyze Data"
msgstr "4 解析数据"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "xxxx-xxxx is the directory automatically created based on vLLM startup time"
msgstr "xxxx-xxxx 为采集工具根据 vLLM 启动时间自动创建的存放目录"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Analyze data"
msgstr "解析数据"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "5 View Results"
msgstr "5 查看结果"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "After analysis, the `output` directory will contain:"
msgstr "解析完成后,`output` 目录下会生成:"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid ""
"`chrome_tracing.json`: Chrome tracing format data, which can be opened in "
"[MindStudio Insight](https://www.hiascend.com/document/detail/zh/mindstudio/81RC1/GUI_baseddevelopmenttool/msascendinsightug/Insight_userguide_0002.html)."
msgstr ""
"`chrome_tracing.json`：Chrome 追踪格式数据，可在 [MindStudio Insight](https://www.hiascend.com/document/detail/zh/mindstudio/81RC1/GUI_baseddevelopmenttool/msascendinsightug/Insight_userguide_0002.html) 中打开。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "`profiler.db`: Performance data in database format."
msgstr "`profiler.db`:数据库格式的性能数据。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "`request.csv`: Request-related data."
msgstr "`request.csv`:请求相关数据。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "`request_summary.csv`: Overall request metrics."
msgstr "`request_summary.csv`:请求总体统计指标。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "`kvcache.csv`: KV Cache-related data."
msgstr "`kvcache.csv`：KV Cache 相关数据。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "`batch.csv`: Batch scheduling-related data."
msgstr "`batch.csv`:批次调度相关数据。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "`batch_summary.csv`: Overall batch scheduling metrics."
msgstr "`batch_summary.csv`:批次调度总体统计指标。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "`service_summary.csv`: Overall service-level metrics."
msgstr "`service_summary.csv`:服务化维度总体统计指标。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Appendix"
msgstr "附录"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "1 Profiling Configuration File"
msgstr "1 采集配置文件说明"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "The profiling configuration file controls profiling parameters and behavior."
msgstr "采集配置文件用于控制性能数据采集的参数与行为。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "File Format"
msgstr "配置文件格式"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "The configuration is in JSON format. Main parameters:"
msgstr "配置文件为 JSON 格式,主要参数如下:"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Parameter"
msgstr "参数"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Description"
msgstr "说明"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Required"
msgstr "是否必选"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "enable"
msgstr "enable"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Switch for profiling: <br />0: disable<br />1: enable<br />Default: 0"
msgstr "是否开启性能数据采集的开关：<br />0：关闭<br />1：开启<br />默认值：0"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Yes"
msgstr "是"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "prof_dir"
msgstr "prof_dir"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Directory to store collected performance data. <br />Default: $HOME/.ms_service_profiler"
msgstr "采集到性能数据的存放路径,支持用户自定义。<br />默认值:$HOME/.ms_service_profiler"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "No"
msgstr "否"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "profiler_level"
msgstr "profiler_level"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Data collection level. Default is \"INFO\" (normal level)."
msgstr "数据采集等级。默认值为\"INFO\",指普通级别的性能数据。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "host_system_usage_freq"
msgstr "host_system_usage_freq"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Sampling frequency of host CPU and memory metrics. Disabled by default. Range: integer 1 to 50, unit: Hz (times per second). Set to -1 to disable. <br />Note: Enabling this may consume significant memory."
msgstr "CPU和内存系统指标采集频率，默认关闭不采集。范围：整数1~50，单位Hz，表示每秒采集的次数。设置为-1时关闭采集该指标。<br />说明：开启该功能可能占用较大内存。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "npu_memory_usage_freq"
msgstr "npu_memory_usage_freq"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Sampling frequency of NPU memory utilization. Disabled by default. Range: integer 1 to 50, unit: Hz (times per second). Set to -1 to disable. <br />Note: Enabling this may consume significant memory."
msgstr "NPU Memory使用率指标的采集频率，默认关闭不采集。范围：整数1~50，单位Hz，表示每秒采集的次数。设置为-1时关闭采集该指标。<br />说明：开启该功能可能占用较大内存。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "acl_task_time"
msgstr "acl_task_time"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Switch to collect operator dispatch latency and execution latency: <br />0: disable (default; 0 or invalid values mean disabled).<br />1: enable; calls `aclprofCreateConfig` with `ACL_PROF_TASK_TIME_L0`.<br />2: enable MSPTI-based data dumping; uses MSPTI for profiling and requires: `export LD_PRELOAD=$ASCEND_TOOLKIT_HOME/lib64/libmspti.so`"
msgstr "开启采集算子下发耗时、算子执行耗时数据的开关，取值为：<br />0：关闭。默认值，配置为0或其他非法值均表示关闭。<br />1：开启。该功能开启时调用aclprofCreateConfig接口的ACL_PROF_TASK_TIME_L0参数。<br />2：开启基于MSPTI接口的数据落盘。该功能开启时调用MSPTI接口进行性能数据采集，需要配置如下环境变量：export LD_PRELOAD=$ASCEND_TOOLKIT_HOME/lib64/libmspti.so"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "acl_prof_task_time_level"
msgstr "acl_prof_task_time_level"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Level and duration for profiling: <br />L0: collect operator dispatch and execution latency only; lower overhead (no operator basic info).<br />L1: collect AscendCL interface performance (host-device and inter-device sync/async memory copy latencies), plus operator dispatch, execution, and basic info for comprehensive analysis.<br />time: profiling duration, integer 1 to 999, in seconds.<br />If unset, defaults to L0 until program exit; invalid values fall back to defaults.<br />Level and duration can be combined, e.g., `\"acl_prof_task_time_level\": \"L1,10\"`."
msgstr "设置性能数据采集的Level等级和时长，取值为：<br />L0：Level0等级，表示采集算子下发耗时、算子执行耗时数据。与L1相比，由于不采集算子基本信息数据，采集时性能开销较小，可更精准统计相关耗时数据。<br />L1：Level1等级，采集AscendCL接口的性能数据，包括Host与Device之间、Device间的同步异步内存复制时延；采集算子下发耗时、算子执行耗时数据以及算子基本信息数据，提供更全面的性能分析数据。<br />time：采集时长，取值范围为1~999的正整数，单位s。<br />默认未配置本参数，表示采集L0数据且采集到程序执行结束。配置其他非法值时取默认值。<br />采集的Level等级和时长可同时配置，例如\"acl_prof_task_time_level\": \"L1,10\"。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "api_filter"
msgstr "api_filter"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Filter to select API performance data to dump. For example, specifying \"matmul\" dumps all API data whose `name` contains \"matmul\". String, case-sensitive; use \";\" to separate multiple targets. Empty means dump all. <br />Effective only when `acl_task_time` is 2."
msgstr "对性能数据进行过滤，配置该参数可自定义采集配置的API性能数据，例如传入\"matmul\"会落盘所有API数据中name字段包含matmul的性能数据。str类型，区分大小写，多个不同的筛选目标用\";\"隔开，默认为空，表示落盘所有数据。<br />仅当acl_task_time参数值为2时生效。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "kernel_filter"
msgstr "kernel_filter"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Filter to select kernel performance data to dump. For example, specifying \"matmul\" dumps all kernel data whose `name` contains \"matmul\". String, case-sensitive; use \";\" to separate multiple targets. Empty means dump all. <br />Effective only when `acl_task_time` is 2."
msgstr "对性能数据进行过滤，配置该参数可自定义采集配置的kernel性能数据，例如传入\"matmul\"会落盘所有kernel数据中name字段包含matmul的性能数据。str类型，区分大小写，多个不同的筛选目标用\";\"隔开，默认为空，表示落盘所有数据。<br />仅当acl_task_time参数值为2时生效。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "timelimit"
msgstr "timelimit"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Profiling duration for the service. The process stops automatically after this time. Range: integer 0 to 7200, unit: seconds. Default 0 means unlimited."
msgstr "设置服务化性能数据采集的时长，配置该参数后采集进程将在运行指定的时间后自动停止。取值范围为0~7200的整数，单位s，默认值0，表示不限制采集时间。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "domain"
msgstr "domain"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Limit profiling to the specified domains to reduce data volume. String, separated by semicolons, case-sensitive, e.g., \"Request; KVCache\".<br />Empty means all available domains.<br />Available domains: Request, KVCache, ModelExecute, BatchSchedule, Communication.<br />Note: If the selected domains are incomplete, analysis output may show warnings due to missing data. See [Reference Table 1](https://www.hiascend.com/document/detail/zh/canncommercial/82RC1/devaids/Profiling/mindieprofiling_0009.html#ZH-CN_TOPIC_0000002370256365__table1985410131831)."
msgstr "设置采集指定domain域下的性能数据，减少采集数据量。输入参数为字符串格式，英文分号作为分隔符，区分大小写，例如\"Request; KVCache\"。<br />默认为空，表示采集当前所有domain域内性能数据。<br />当前已有domain域为：Request、KVCache、ModelExecute、BatchSchedule、Communication。<br />说明：<br />若指定domain域不全，采集数据不满足，解析输出件生成时会有告警提示。[查看表1](https://www.hiascend.com/document/detail/zh/canncommercial/82RC1/devaids/Profiling/mindieprofiling_0009.html#ZH-CN_TOPIC_0000002370256365__table1985410131831)"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Example Configuration"
msgstr "配置示例"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "2 Symbols Configuration File"
msgstr "2 点位配置文件说明"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "The symbols configuration file defines which functions/methods to profile and supports flexible configuration with custom attribute collection."
msgstr "点位配置文件用于定义需要采集的函数/方法,支持灵活配置与自定义属性采集。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "2.1 File Name and Loading"
msgstr "2.1 文件命名与加载"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Default load path: `~/.config/vllm_ascend/service_profiling_symbols.MAJOR.MINOR.PATCH.yaml` (`MAJOR.MINOR.PATCH` matches the installed vLLM version)"
msgstr "默认加载路径:`~/.config/vllm_ascend/service_profiling_symbols.MAJOR.MINOR.PATCH.yaml`(随已安装的 vllm 版本变化)"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "To customize the profiling points, it is highly recommended to copy a symbols configuration file to your working directory and point the `PROFILING_SYMBOLS_PATH` environment variable at the copy."
msgstr "如需自定义采集点,推荐通过设置环境变量`PROFILING_SYMBOLS_PATH`,将一份点位配置文件复制到工作目录进行修改使用。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "2.2 Field Descriptions"
msgstr "2.2 配置字段说明"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Field"
msgstr "字段"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Example"
msgstr "示例"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "symbol"
msgstr "symbol"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Python import path + attribute chain"
msgstr "Python 导入路径 + 属性链"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "`\"vllm.v1.core.kv_cache_manager:KVCacheManager.free\"`"
msgstr "`\"vllm.v1.core.kv_cache_manager:KVCacheManager.free\"`"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "handler"
msgstr "handler"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Handler type"
msgstr "处理函数类型"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "`\"timer\"` (default) or `\"pkg.mod:func\"` (custom)"
msgstr "`\"timer\"`(默认)或 `\"pkg.mod:func\"`(自定义)"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "domain"
msgstr "domain"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Domain tag"
msgstr "埋点域标识"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "`\"KVCache\"`, `\"ModelExecute\"`"
msgstr "`\"KVCache\"`, `\"ModelExecute\"`"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "name"
msgstr "name"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Event name"
msgstr "埋点名称"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "`\"EngineCoreExecute\"`"
msgstr "`\"EngineCoreExecute\"`"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "min_version"
msgstr "min_version"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "max_version"
msgstr "max_version"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Upper version constraint"
msgstr "最高版本约束"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Lower version constraint"
msgstr "最低版本约束"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "`\"0.9.1\"`"
msgstr "`\"0.9.1\"`"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "`\"0.11.0\"`"
msgstr "`\"0.11.0\"`"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "attributes"
msgstr "attributes"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Custom attribute collection"
msgstr "自定义属性采集"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Supported only by the `\"timer\"` handler. See the section below"
msgstr "只支持 `\"timer\"` handler。详见下方自定义属性采集机制"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "2.3 Examples"
msgstr "2.3 配置示例"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Example 1: Custom handler"
msgstr "示例 1：自定义处理函数"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Example 2: Default timer"
msgstr "示例 2默认计时器"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Example 3: Version constraint"
msgstr "示例 3版本约束"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "No handler specified -> default timer"
msgstr "未指定 handler -> 默认 timer"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "2.4 Custom Attribute Collection"
msgstr "2.4 自定义属性采集机制"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "The `attributes` field supports flexible custom attribute collection and allows operations and transformations on function arguments and return values."
msgstr "`attributes` 字段支持灵活的自定义属性采集,可对函数参数与返回值进行多种操作与转换。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Basic Syntax"
msgstr "基本语法"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Argument access: use the parameter name directly, e.g., `input_ids`"
msgstr "参数访问:直接使用参数名,如 `input_ids`"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Return value access: use the `return` keyword"
msgstr "返回值访问:使用 `return` 关键字"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Pipeline operations: use `|` to chain multiple operations"
msgstr "管道操作:使用 `|` 分隔多个操作"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Attribute access: use `attr` to access object attributes"
msgstr "属性访问:使用 `attr` 获取对象属性"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Example"
msgstr "配置示例"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Expression Notes"
msgstr "表达式说明"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "`len(input_ids)`: get the length of parameter `input_ids`."
msgstr "`len(input_ids)`:获取 `input_ids` 参数的长度。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid ""
"`len(return) | str`: get the length of the return value and convert to "
"string (equivalent to `str(len(return))`)."
msgstr "`len(return) | str`:获取返回值长度并转换为字符串(等价于 `str(len(return))`)。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid ""
"`return[0] | attr input_ids | len`: get the length of the `input_ids` "
"attribute of the first element in the return value."
msgstr "`return[0] | attr input_ids | len`:获取返回值第一个元素的 `input_ids` 属性长度。"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Supported Expression Types"
msgstr "支持的表达式类型"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Basic operations: `len()`, `str()`, `int()`, `float()`"
msgstr "基础操作:`len()`, `str()`, `int()`, `float()`"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Index access: `return[0]`, `return['key']`"
msgstr "索引访问:`return[0]`, `return['key']`"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Attribute access: `return | attr attr_name`"
msgstr "属性访问:`return | attr attr_name`"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Pipeline composition: chain operations with `|`"
msgstr "管道组合:多个操作通过 `|` 连接"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Advanced Examples"
msgstr "高级示例"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Get tensor shape"
msgstr "获取张量形状"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Get specific value from a dict"
msgstr "获取字典中的特定值"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Conditional expression (requires custom handler support)"
msgstr "条件表达式(需要自定义处理函数支持)"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Complex data processing"
msgstr "复杂的数据处理"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "2.5 Custom Handler"
msgstr "2.5 自定义处理函数"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid ""
"When `handler` specifies a custom function, it must match the following "
"signature:"
msgstr "当 `handler` 字段指定自定义处理函数时,该函数需满足以下签名:"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Custom handler"
msgstr "自定义处理函数"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "original_func: the original function object"
msgstr "original_func: 原始函数对象"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "this: the bound object (for methods)"
msgstr "this: 调用对象(对于方法调用)"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "*args: positional arguments"
msgstr "*args: 位置参数"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "**kwargs: keyword arguments"
msgstr "**kwargs: 关键字参数"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "processing result"
msgstr "处理结果"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid "Custom logic"
msgstr "自定义处理逻辑"
#: ../../developer_guide/performance/service_profiling_guide.md
msgid ""
"If the custom handler fails to import, the system will automatically fall "
"back to the default timer mode."
msgstr "若自定义处理函数导入失败,系统会自动回退至默认计时器模式。"
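
The custom handler strings above describe a signature of `original_func`, `this`, `*args`, `**kwargs`, with the processing result returned. A minimal sketch of a handler with that shape follows. It is not msserviceprofiler's own code: `timing_handler`, `DemoScheduler`, and `CALL_LOG` are hypothetical names, and the sketch assumes `original_func` is the unbound function, so `this` has to be passed back explicitly.

```python
import time

# Hypothetical in-memory sink; a real handler would emit profiler events instead.
CALL_LOG = []


def timing_handler(original_func, this, *args, **kwargs):
    """Time one call of the wrapped function and pass its result through."""
    start = time.perf_counter()
    try:
        # Assumption: original_func is unbound, so `this` is passed explicitly.
        return original_func(this, *args, **kwargs)
    finally:
        # Record the duration even if the wrapped call raises.
        CALL_LOG.append((original_func.__name__, time.perf_counter() - start))


class DemoScheduler:
    """Hypothetical stand-in for a class whose method is hooked."""

    def schedule(self, batch_size, *, priority=0):
        return batch_size + priority


result = timing_handler(DemoScheduler.schedule, DemoScheduler(), 4, priority=1)
```

Returning the wrapped function's result unchanged keeps the hooked call transparent to its callers, which is presumably what a profiling hook should do.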


@@ -18,3 +18,4 @@ protobuf>3.20.0
 librosa
 soundfile
 pytest_mock
+msserviceprofiler>=1.2.2


@@ -393,7 +393,8 @@ setup(
         "vllm.general_plugins": [
             "ascend_enhanced_model = vllm_ascend:register_model",
             "ascend_kv_connector = vllm_ascend:register_connector",
-            "ascend_model_loader = vllm_ascend:register_model_loader"
+            "ascend_model_loader = vllm_ascend:register_model_loader",
+            "ascend_service_profiling = vllm_ascend:register_service_profiling"
         ],
     },
 )


@@ -34,4 +34,9 @@ def register_connector():
 def register_model_loader():
     from .model_loader.netloader import register_netloader
-    register_netloader()
+    register_netloader()
+
+
+def register_service_profiling():
+    from .profiling_config import generate_service_profiling_config
+    generate_service_profiling_config()


@@ -0,0 +1,207 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
"""
Service profiling configuration generator module.

This module generates the service_profiling_symbols.{VLLM_VERSION}.yaml
configuration file in the ~/.config/vllm_ascend/ directory.
"""

import tempfile
from pathlib import Path
from typing import Optional

import vllm
from vllm.logger import logger

VLLM_VERSION = vllm.__version__

# Configuration file name
CONFIG_FILENAME = f"service_profiling_symbols.{VLLM_VERSION}.yaml"

# Hard-coded YAML content; user-modified default symbols can be added here.
SERVICE_PROFILING_SYMBOLS_YAML = """
# ===== Batch / Scheduler =====
- symbol: vllm.v1.engine.processor:Processor.process_inputs
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.batch_hookers:process_inputs
- symbol: vllm.v1.core.sched.scheduler:Scheduler.schedule
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.batch_hookers:schedule
  name: batchFrameworkProcessing
- symbol: vllm_ascend.core.scheduler:AscendScheduler.schedule
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.batch_hookers:schedule
  name: batchFrameworkProcessing
- symbol: vllm.v1.core.sched.scheduler:Scheduler._free_request
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.batch_hookers:free_request
- symbol: vllm.v1.core.sched.scheduler:Scheduler.add_request
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.batch_hookers:add_request

# ===== KV Cache =====
- symbol: vllm.v1.core.kv_cache_manager:KVCacheManager.allocate_slots
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.kvcache_hookers:allocate_slots
- symbol: vllm.v1.core.kv_cache_manager:KVCacheManager.free
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.kvcache_hookers:free
- symbol: vllm.v1.core.kv_cache_manager:KVCacheManager.get_computed_blocks
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.kvcache_hookers:get_computed_blocks

# ===== Model Execute =====
- symbol: vllm.model_executor.layers.logits_processor:LogitsProcessor.forward
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.model_hookers:compute_logits
  name: computing_logits
- symbol: vllm.v1.sample.sampler:Sampler.forward
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.model_hookers:sampler_forward
  name: sample
- symbol: vllm.v1.executor.abstract:Executor.execute_model
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.model_hookers:execute_model
  name: modelExec
- symbol: vllm.v1.executor.multiproc_executor:MultiprocExecutor.execute_model
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.model_hookers:execute_model
  name: modelExec
- symbol: vllm_ascend.worker.model_runner_v1:NPUModelRunner.execute_model
  name: modelRunnerExec
  domain: ModelExecute
- symbol: vllm_ascend.worker.model_runner_v1:NPUModelRunner._update_states
  name: _update_states
  domain: ModelExecute
- symbol: vllm_ascend.worker.model_runner_v1:NPUModelRunner._prepare_inputs
  name: _prepare_inputs
  domain: ModelExecute
- symbol: vllm_ascend.utils:ProfileExecuteDuration.capture_async
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.model_hookers:capture_async

# ===== Request Lifecycle =====
- symbol: vllm.v1.engine.async_llm:AsyncLLM.add_request
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.request_hookers:add_request_async
- symbol: vllm.engine.async_llm_engine:AsyncLLMEngine.add_request
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.request_hookers:add_request_async
- symbol: vllm.v1.engine.output_processor:OutputProcessor.process_outputs
  min_version: "0.9.1"
  handler: msserviceprofiler.vllm_profiler.vllm_v1.request_hookers:process_outputs
"""


def get_config_dir() -> Path:
    """
    Get the vllm_ascend configuration directory path.

    Returns:
        Path: The path to the ~/.config/vllm_ascend/ directory.
    """
    home_dir = Path.home()
    config_dir = home_dir / ".config" / "vllm_ascend"
    return config_dir


def _cleanup_temp_file(tmp_path: Optional[Path]) -> None:
    """
    Clean up a temporary file if it exists.

    Args:
        tmp_path: Path to the temporary file to clean up.
    """
    if tmp_path is not None and tmp_path.exists():
        try:
            tmp_path.unlink()
        except OSError:
            pass  # Ignore cleanup errors


def generate_service_profiling_config() -> Optional[Path]:
    """
    Generate the service_profiling_symbols.{VLLM_VERSION}.yaml configuration
    file in the ~/.config/vllm_ascend/ directory.

    If the configuration file already exists, this function skips creating
    it and returns the existing file path.

    If any error occurs during file creation, it is logged but does not
    interrupt execution. The function returns None to indicate that the
    file could not be created.

    Returns:
        Optional[Path]: The path to the generated (or existing) configuration
            file. Returns None if file creation failed.
    """
    config_dir = get_config_dir()
    config_file = config_dir / CONFIG_FILENAME

    # Check if the configuration file already exists
    if config_file.exists():
        return config_file

    # Create the configuration directory if it doesn't exist
    try:
        config_dir.mkdir(parents=True, exist_ok=True)
    except (OSError, PermissionError) as e:
        logger.error(
            f"Failed to create configuration directory {config_dir}: {e}",
            exc_info=True)
        return None

    # Write the configuration file atomically using a temporary file.
    # The target file only appears once the write has succeeded completely.
    tmp_path = None
    try:
        # Create a temporary file in the same directory for atomic write
        with tempfile.NamedTemporaryFile(mode='w',
                                         encoding='utf-8',
                                         dir=config_dir,
                                         delete=False,
                                         suffix='.tmp',
                                         prefix=CONFIG_FILENAME +
                                         '.') as tmp_file:
            # Record the path before writing so that a failed write
            # still leaves a path for the cleanup step below
            tmp_path = Path(tmp_file.name)
            tmp_file.write(SERVICE_PROFILING_SYMBOLS_YAML)
        # Atomically replace the target file with the temporary file
        tmp_path.replace(config_file)
        return config_file
    except (OSError, PermissionError) as e:
        logger.error(f"Failed to write configuration file {config_file}: {e}",
                     exc_info=True)
        return None
    finally:
        # Clean up the temporary file if it wasn't successfully replaced
        _cleanup_temp_file(tmp_path)
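
The user guide asks for `PROFILING_SYMBOLS_PATH` to point at the symbols YAML before the service starts. A minimal sketch of locating the file generated above and exporting that variable follows. The helpers `expected_symbols_path` and `export_symbols_path` are hypothetical and not part of vllm-ascend; they only mirror the directory and file naming used by `get_config_dir()` and `CONFIG_FILENAME`, and whether the profiler needs the variable set explicitly for the default file is an assumption.

```python
import os
from pathlib import Path


def expected_symbols_path(vllm_version, home=None):
    """Build the path the generator would use for a given vLLM version."""
    base = Path(home) if home is not None else Path.home()
    filename = f"service_profiling_symbols.{vllm_version}.yaml"
    return base / ".config" / "vllm_ascend" / filename


def export_symbols_path(vllm_version, home=None, environ=os.environ):
    """Set PROFILING_SYMBOLS_PATH only if the generated file exists.

    Returns the path when the variable was set, otherwise None.
    """
    path = expected_symbols_path(vllm_version, home)
    if path.is_file():
        environ["PROFILING_SYMBOLS_PATH"] = str(path)
        return path
    return None
```

Passing `environ` explicitly keeps the helper testable with a plain dict. A launcher script could call `export_symbols_path("0.11.0")` before starting the service.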