xc-llm-ascend/vllm_ascend at 41eb71d665ab9f0b72b6d3bc15d41dee7fcc0f5f

EngineX/xc-llm-ascend

TMC 41eb71d665 [Refactor] profiler config optimze (#6141)

### What this PR does / why we need it?
This PR optimizes the torch_npu profiler configuration to significantly
reduce overhead and trace file size. The key changes include:

- **Enable Data Simplification:** Explicitly sets data_simplification=True
  in _ExperimentalConfig. This filters out unnecessary intermediate data
  during profiling, drastically reducing the memory footprint and I/O
  overhead.
- **Use Lightweight Stack Tracing:** Replaces with_stack with with_modules
  when torch_profiler_with_stack is enabled. In torch_npu, with_stack
  introduces heavy latency. with_modules provides equivalent semantic
  information with much lower overhead.
- **Code Simplification:** Removes redundant parameter configurations in
  _ExperimentalConfig by utilizing default values, making the codebase
  cleaner and easier to maintain.
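
The settings described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: `build_profiler_kwargs` and the dictionary layout are hypothetical names introduced here, and the commented usage at the end assumes that `torch_npu.profiler.profile` and `torch_npu.profiler._ExperimentalConfig` accept the keyword arguments named in this description.

```python
from typing import Any


def build_profiler_kwargs(torch_profiler_with_stack: bool) -> dict[str, Any]:
    """Hypothetical helper mirroring the settings described in this PR."""
    return {
        # with_modules stands in for with_stack when the user asks for
        # stack tracing; with_stack itself is no longer passed.
        "with_modules": torch_profiler_with_stack,
        # Only data_simplification is set explicitly; every other
        # _ExperimentalConfig field is left at its default value.
        "experimental_config": {"data_simplification": True},
    }


kwargs = build_profiler_kwargs(torch_profiler_with_stack=True)
print(kwargs)

# Assumed usage on an Ascend machine (not verified against the PR diff):
#   import torch_npu
#   cfg = torch_npu.profiler._ExperimentalConfig(
#       **kwargs.pop("experimental_config"))
#   prof = torch_npu.profiler.profile(experimental_config=cfg, **kwargs)
```

Keeping the settings in a plain dictionary lets the selection logic be checked on a machine without an NPU; only the commented lines need torch_npu.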

**Test setup:**
- max length = 50, profiler + stack enabled

**Before optimization:**
- Profiler data size: 651 MB
- Generation time: 3 seconds

**After optimization:**
- Profiler data size: 156 MB (≈76% reduction)
- Generation time: under 1 second
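
The reduction figure can be reproduced from the two reported sizes. The helper name `percent_reduction` below is introduced only for this illustration.

```python
def percent_reduction(before_mb: float, after_mb: float) -> float:
    """Return the size reduction as a percentage of the original size."""
    return (before_mb - after_mb) / before_mb * 100.0


# Sizes reported above: 651 MB before, 156 MB after.
reduction = percent_reduction(651, 156)
print(f"{reduction:.1f}% smaller")
```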

### Does this PR introduce _any_ user-facing change?
No API changes. Users profiling on Ascend will experience faster
profiling execution and smaller trace files when stack tracing is
enabled.

### How was this patch tested?
Manually verified on Ascend NPU by running vLLM with the profiler
enabled. Confirmed that the trace files are generated correctly and
contain the necessary stack/module info, and that they show the
reported reductions in size and time.

- vLLM version: v0.13.0
- vLLM main: d68209402d

Signed-off-by: mengchengTang <745274877@qq.com>

2026-01-27 22:09:50 +08:00

[Refact.]: refactoring 310p-kv cache allocator, align with main branch (#6270)

2026-01-27 16:26:48 +08:00

_cann_ops_custom

[Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804)

2025-11-28 18:06:39 +08:00

[Main2Main] Upgrade vllm commit to 0123 (#6169)

2026-01-27 08:44:36 +08:00

[Graph][Fusion] Add MatmulAllReduceAddRMSNorm graph fusion for npugraph_ex. (#6006)

2026-01-27 16:41:48 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #3) (#5978)

2026-01-24 22:10:18 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

device_allocator

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

[Feature] Mooncake connector get remote ptp size (#5822)

2026-01-26 14:28:33 +08:00

[EPLB][Bugfix] EPLB support fp/bf16 (#5531)

2026-01-26 14:28:16 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #5) (#5996)

2026-01-24 22:45:38 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #5) (#5996)

2026-01-24 22:45:38 +08:00

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #6) (#6001)

2026-01-24 22:08:33 +08:00

BugFix: Fix moe_load accumulation error in ACL graph mode (#6182)

2026-01-26 17:18:46 +08:00

[Main2Main] Upgrade vllm commit to 0123 (#6169)

2026-01-27 08:44:36 +08:00

[Refactor] Quantization Module Refactor (#5738)

2026-01-23 14:13:47 +08:00

[ops] support advanced apply_top_k_top_p without top_k constraint (#6098)

2026-01-26 09:08:42 +08:00

[Main2Main] Upgrade vllm commit to 0123 (#6169)

2026-01-27 08:44:36 +08:00

[Refactor] profiler config optimze (#6141)

2026-01-27 22:09:50 +08:00

[CI] optimize lint term (#5986)

2026-01-22 15:46:59 +08:00

__init__.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

ascend_config.py

[Graph][Fusion] Add MatmulAllReduceAddRMSNorm graph fusion for npugraph_ex. (#6006)

2026-01-27 16:41:48 +08:00

ascend_forward_context.py

[Main2Main] Upgrade vllm commit to 0123 (#6169)

2026-01-27 08:44:36 +08:00

batch_invariant.py

[Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977)

2026-01-19 08:59:46 +08:00

cpu_binding.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

envs.py

Default enable MLAPO (#5952)

2026-01-22 09:26:39 +08:00

flash_common3_context.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

meta_registration.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

platform.py

[Misc] Removes unnecessary graph size re-initialization (#6280)

2026-01-27 14:38:07 +08:00

profiling_config.py

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

utils.py

[Misc] Removes unnecessary graph size re-initialization (#6280)

2026-01-27 14:38:07 +08:00