xc-llm-ascend

Files

Angazenn a970b27e2d [WIP][Perf]remove unnecessary padding before MLA V1 prefill (#917)

### What this PR does / why we need it?
Currently, the MLA V1 implementation pads q, k, and v to a `head_dim` of 256
to conform to the earlier MLA kernel. The new MLA kernel, however, supports
a `head_dim` that is not divisible by 128, so this padding is no longer
needed. Removing it avoids the extra work and improves prefill performance.
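
To illustrate why the padding is pure overhead, here is a minimal sketch in plain Python. It is not the code from this PR: the helpers `pad_to`, `dot`, and `padding_overhead` are hypothetical, the unpadded `head_dim` of 192 is an assumed example value (the description only says the dimension is not divisible by 128), and the padding is assumed to be zero-fill. Under those assumptions, a q·k score is the same with or without padding, while a quarter of each padded buffer holds no real data.

```python
import math

PADDED_DIM = 256  # padded head_dim mentioned in the PR description
HEAD_DIM = 192    # assumed example value; not stated in the PR


def pad_to(vec, target_dim):
    """Zero-pad a vector (list of floats) on the right up to target_dim."""
    if len(vec) > target_dim:
        raise ValueError("vector is longer than target_dim")
    return vec + [0.0] * (target_dim - len(vec))


def dot(a, b):
    """Plain dot product; both vectors must have the same length."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum(x * y for x, y in zip(a, b))


def padding_overhead(head_dim, padded_dim):
    """Fraction of the padded buffer that carries no real data."""
    return (padded_dim - head_dim) / padded_dim


# Toy query/key vectors for a single head and a single token.
q = [0.01 * i for i in range(HEAD_DIM)]
k = [0.02 * (HEAD_DIM - i) for i in range(HEAD_DIM)]

score_unpadded = dot(q, k)
score_padded = dot(pad_to(q, PADDED_DIM), pad_to(k, PADDED_DIM))
overhead = padding_overhead(HEAD_DIM, PADDED_DIM)

print(math.isclose(score_unpadded, score_padded))
print(overhead)
```

The zero-filled tail contributes nothing to the dot product, so a kernel that accepts the native `head_dim` can skip both the padding step and the wasted memory traffic.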

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>

2025-05-23 14:14:06 +08:00

attention

[WIP][Perf]remove unnecessary padding before MLA V1 prefill (#917)

2025-05-23 14:14:06 +08:00

core

[Feature] Impl v1 disaggregated prefill in ascend scheduler (#852)

2025-05-23 10:15:29 +08:00

device_allocator

[bugfix] Improve log level and info for custom ops build (#937)

2025-05-23 10:05:57 +08:00

distributed

[BugFix]add all2all when dp_size > 1 && downgrade npu_dequant_swiglu_quant (#819)