xc-llm-ascend

Files

Angazenn 1d3544c887 [BugFix]converting pa get_workspace back to capturing (#5833)

### What this PR does / why we need it?

This fixes a bug in the paged attention (PA) `get_workspace` handling.
The earlier implementation called `_npu_paged_attention_get_workspace`
in `_update_pa_attn_params`. However, this could cause memory problems,
because each call to this API dynamically allocates new memory for the
workspace. Therefore, we move the call back to capturing and use a
fixed `SEQ_LEN_WITH_MAX_PA_WORKSPACE` to get the maximum workspace.
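
The idea above can be illustrated with a minimal, framework-free sketch. It does not call `torch_npu` or the real `_npu_paged_attention_get_workspace`. The class `ToyGraph`, the function `workspace_bytes`, the size formula, and the value assigned to `SEQ_LEN_WITH_MAX_PA_WORKSPACE` are all hypothetical stand-ins. It also assumes that the workspace size grows with sequence length, so one buffer sized at capture time covers every later update.

```python
# Placeholder value: the commit message does not state the real constant.
SEQ_LEN_WITH_MAX_PA_WORKSPACE = 4096


def workspace_bytes(seq_len: int) -> int:
    """Toy size model (assumption): workspace grows with sequence length."""
    return 1024 + 16 * seq_len


class ToyGraph:
    """Hypothetical stand-in for a captured graph that owns one workspace."""

    def __init__(self) -> None:
        self.workspace = None
        self.allocations = 0

    def capture(self) -> None:
        # Allocate once, sized for the assumed worst-case sequence length.
        size = workspace_bytes(SEQ_LEN_WITH_MAX_PA_WORKSPACE)
        self.workspace = bytearray(size)
        self.allocations += 1

    def update_params(self, seq_len: int) -> bytearray:
        # Per-step update: reuse the captured buffer instead of allocating.
        if self.workspace is None:
            raise RuntimeError("capture() must run before update_params()")
        if workspace_bytes(seq_len) > len(self.workspace):
            raise ValueError("seq_len exceeds the captured workspace size")
        return self.workspace


graph = ToyGraph()
graph.capture()
first = graph.update_params(128)
second = graph.update_params(2048)
print(first is second, graph.allocations)
```

In this sketch, every update returns the same buffer and the allocation counter stays at one, which is the property the fix is after: no new workspace memory is requested after capture.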

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef

Signed-off-by: Angazenn <supperccell@163.com>

2026-01-22 15:49:22 +08:00

npugraph_ex_passes

[Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912)

2026-01-16 20:57:46 +08:00

passes

[Fusion] [Graph]Add Matmul Allreduce Rmsnorm fusion Pass (#5034)

2026-01-19 09:28:07 +08:00

__init__.py

[Bugfix] add compilation/__init__.py to fix import error (#1152)

2025-06-10 17:14:25 +08:00

acl_graph.py

[BugFix]converting pa get_workspace back to capturing (#5833)