provide an e2e guide for execute duration profiling (#1113)

### What this PR does / why we need it?
Provide an end-to-end (e2e) guide for execute duration profiling, including the command that runs the instrumented example script.


Signed-off-by: depeng1994 <depengzhang@foxmail.com>
depeng1994
2025-06-11 10:02:11 +08:00
committed by GitHub
parent 7bdc606677
commit 860a5ef7fd
2 changed files with 7 additions and 2 deletions

@@ -9,6 +9,11 @@ The execution duration of each stage (including pre/post-processing, model forwa
* Use the non-blocking API `ProfileExecuteDuration().capture_async` to set observation points asynchronously when you need to observe the execution duration.
* Use the blocking API `ProfileExecuteDuration().pop_captured_sync` at an appropriate time to get and print the execution durations of all observed stages.
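
The capture-then-pop pattern described in the list above can be sketched roughly as follows. This is a hypothetical, CPU-only stand-in: `DurationProfilerSketch` and the tags are invented for illustration, it assumes `capture_async` is used as a context manager around a stage, and it does not reproduce how vllm-ascend implements `ProfileExecuteDuration`. It only borrows the two method names to show why recording is cheap and reporting is deferred.

```python
import time
from contextlib import contextmanager


class DurationProfilerSketch:
    """Hypothetical stand-in; NOT the vllm-ascend ProfileExecuteDuration class."""

    def __init__(self):
        # (tag, start, end) tuples that have been captured but not yet reported.
        self._pending = []

    @contextmanager
    def capture_async(self, tag):
        # Only record timestamps here; nothing is computed or printed,
        # so the observed stage is not slowed down by reporting work.
        start = time.perf_counter()
        try:
            yield
        finally:
            self._pending.append((tag, start, time.perf_counter()))

    def pop_captured_sync(self):
        # Convert every pending observation to milliseconds, then clear the buffer.
        durations = {
            tag: (end - start) * 1000.0 for tag, start, end in self._pending
        }
        self._pending.clear()
        return durations


profiler = DurationProfilerSketch()

# Illustrative stage names; the sleeps stand in for real work.
with profiler.capture_async("prepare input"):
    time.sleep(0.01)
with profiler.capture_async("forward"):
    time.sleep(0.02)

# Collect and print all observed stages at a convenient point.
durations = profiler.pop_captured_sync()
print(" ".join(f"[{tag}]:{ms:.2f}ms" for tag, ms in durations.items()))
```

Calling `pop_captured_sync` once per step, after all observed stages have finished, keeps the reporting cost out of the measured regions.
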
**We have instrumented the key inference stages (including pre-processing, model forward pass, etc.) for execute duration profiling. Execute the script as follows:**
```
VLLM_ASCEND_MODEL_EXECUTE_TIME_OBSERVE=1 python3 vllm-ascend/examples/offline_inference_npu.py
```
## Example Output
```