xc-llm-ascend

Author SHA1 Message Date

Icey

cadfa5ddc1

[Fusion] [Graph] Add qknorm rope fusion operator (#4711)

### What this PR does / why we need it?
This PR adds the `qkv_rmsnorm_rope` operator and introduces a graph
fusion pass for `qknorm_rope` operations. The implementation includes a
new configuration flag, a pattern-matching pass built on
`torch._inductor.pattern_matcher`, and a custom Triton kernel for the
fused operation.
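To illustrate what a fused QK-norm + RoPE computes, here is a minimal plain-Python sketch. It assumes a Qwen3-style pattern: split the fused QKV projection output, apply per-head RMSNorm to q and k, then apply NeoX-style ("rotate half") rotary embedding, with v passed through unchanged. The function names `qkv_norm_rope_reference` and `qkv_norm_rope_fused` are hypothetical and are not the signature of the PR's `qkv_rmsnorm_rope` operator or its Triton kernel. The fused variant only shows the idea a kernel exploits: normalizing and rotating each head in one pass instead of materializing intermediate tensors.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm over one head: x / sqrt(mean(x^2) + eps), scaled elementwise by weight.
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def rope_neox(x, cos, sin):
    # NeoX-style ("rotate half") RoPE: element i is paired with element i + half.
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    return ([a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)] +
            [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)])

def _split_qkv(qkv, num_q_heads, num_kv_heads, head_dim):
    # Assumed layout: [q heads | k heads | v heads], flattened.
    q_end = num_q_heads * head_dim
    k_end = q_end + num_kv_heads * head_dim
    return qkv[:q_end], qkv[q_end:k_end], qkv[k_end:]

def _heads(t, head_dim):
    return [t[i:i + head_dim] for i in range(0, len(t), head_dim)]

def qkv_norm_rope_reference(qkv, num_q_heads, num_kv_heads, head_dim,
                            q_weight, k_weight, cos, sin, eps=1e-6):
    # Unfused: split, norm and rope run as separate steps, each producing a new list.
    q, k, v = _split_qkv(qkv, num_q_heads, num_kv_heads, head_dim)
    q_out = [y for h in _heads(q, head_dim)
             for y in rope_neox(rms_norm(h, q_weight, eps), cos, sin)]
    k_out = [y for h in _heads(k, head_dim)
             for y in rope_neox(rms_norm(h, k_weight, eps), cos, sin)]
    return q_out, k_out, v

def _fused_head(h, weight, cos, sin, eps):
    # One pass per head: compute the RMS scale, then normalize and rotate each
    # (i, i + half) pair together, so the normalized head is never stored.
    half = len(h) // 2
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in h) / len(h) + eps)
    out = [0.0] * len(h)
    for i in range(half):
        a = h[i] * inv_rms * weight[i]
        b = h[i + half] * inv_rms * weight[i + half]
        out[i] = a * cos[i] - b * sin[i]
        out[i + half] = b * cos[i] + a * sin[i]
    return out

def qkv_norm_rope_fused(qkv, num_q_heads, num_kv_heads, head_dim,
                        q_weight, k_weight, cos, sin, eps=1e-6):
    q, k, v = _split_qkv(qkv, num_q_heads, num_kv_heads, head_dim)
    q_out = [y for h in _heads(q, head_dim) for y in _fused_head(h, q_weight, cos, sin, eps)]
    k_out = [y for h in _heads(k, head_dim) for y in _fused_head(h, k_weight, cos, sin, eps)]
    return q_out, k_out, v
```

Both functions return the same values for the same inputs. A graph fusion pass exists to replace the first shape of computation with a single call to a kernel that behaves like the second.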

Co-authored-by: Angazenn <supperccell@163.com>

### Does this PR introduce _any_ user-facing change?
Yes. This PR adds a new option to `additional_config`.

- vLLM version: v0.12.0
- vLLM main: `ad32e3e19c`

---------

Signed-off-by: wxsIcey <1790571317@qq.com>

2025-12-17 08:53:44 +08:00

wujinyuan1

545e856971

[Refactor] 3/N Refactor mla_v1.py & extract mla_cp (#4933)

RFC: https://github.com/vllm-project/vllm-ascend/issues/4629
Reason:
The context parallel (CP) code paths differ significantly from those of
normal MLA attention, yet the two are tightly coupled in mla_v1.py.

Steps:
Isolate PCP and DCP (prefill and decode context parallelism):
(1) Create a new Python file, mla_cp.py.
(2) Add the classes AscendMlaCPImpl and AscendMlaCPMetadataBuilder,
which inherit from AscendMLAImpl and AscendMLAMetadataBuilder
respectively.
(3) Move the PCP- and DCP-related methods from mla_v1.py to mla_cp.py.
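The inheritance layout in step (2) can be sketched as a toy model. Only the four class names come from this commit; the constructor arguments, the method names (`build`, `forward`, `_prefill`, `_decode`) and the `select_mla_classes` helper are hypothetical and do not reflect the real signatures in vllm-ascend. The sketch shows the general pattern: shared logic stays in the base classes, and CP-specific behavior lives only in subclass overrides.

```python
class AscendMLAMetadataBuilder:
    """Toy stand-in for the base MLA metadata builder (hypothetical API)."""

    def __init__(self, pcp_size: int = 1, dcp_size: int = 1):
        self.pcp_size = pcp_size
        self.dcp_size = dcp_size

    def build(self, num_tokens: int) -> dict:
        # Fields every MLA path needs.
        return {"num_tokens": num_tokens}


class AscendMlaCPMetadataBuilder(AscendMLAMetadataBuilder):
    def build(self, num_tokens: int) -> dict:
        meta = super().build(num_tokens)
        # CP-only fields are added here instead of in the base builder.
        meta.update(pcp_size=self.pcp_size, dcp_size=self.dcp_size)
        return meta


class AscendMLAImpl:
    """Toy stand-in for the base MLA attention implementation (hypothetical API)."""

    def forward(self, meta: dict, is_prefill: bool) -> str:
        # Shared entry point; phase-specific work is delegated to overridable hooks.
        return self._prefill(meta) if is_prefill else self._decode(meta)

    def _prefill(self, meta: dict) -> str:
        return "mla-prefill"

    def _decode(self, meta: dict) -> str:
        return "mla-decode"


class AscendMlaCPImpl(AscendMLAImpl):
    # Only the CP-specific hooks are overridden; forward() is inherited unchanged.
    def _prefill(self, meta: dict) -> str:
        return f"pcp-prefill(size={meta['pcp_size']})"

    def _decode(self, meta: dict) -> str:
        return f"dcp-decode(size={meta['dcp_size']})"


def select_mla_classes(pcp_size: int, dcp_size: int):
    # Hypothetical selector: use the CP classes only when context parallelism is enabled.
    if pcp_size > 1 or dcp_size > 1:
        return AscendMlaCPImpl, AscendMlaCPMetadataBuilder
    return AscendMLAImpl, AscendMLAMetadataBuilder
```

In this sketch the base classes contain no CP branches at all, which is the decoupling the refactor aims for: a change to the CP path touches only the subclasses.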

- vLLM version: v0.12.0
- vLLM main: `ad32e3e19c`

---------

Signed-off-by: wujinyuan1 <wjy9595@qq.com>
Co-authored-by: wujinyuan1 <wjy9595@qq.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>

2025-12-15 12:59:18 +08:00

2 Commits