xc-llm-ascend

Author SHA1 Message Date

Wang Yixuan

e0c5073956

[Bugfix] Fix bmm_transpose ops for CANN version (#4653)

### What this PR does / why we need it?
After the CANN upgrade, this custom op can no longer be used on newer
CANN versions. On those versions the op launches with redundant vector
cores even though it only uses the cube core, which causes misalignment
when copying data from UB memory to global memory. This PR restricts
the op so that it uses the cube core only.
### Does this PR introduce _any_ user-facing change?
No

- vLLM version: v0.12.0
- vLLM main: `ad32e3e19c`

---------

Signed-off-by: hust17yixuan <303660421@qq.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>

2025-12-06 10:52:46 +08:00

Wang Yixuan

c68ddc11ce

[OPS] Add bmm_transpose ops (#3990)

### What this PR does / why we need it?
Add a new fused op to custom_op that combines torch.bmm() and
transpose to achieve better performance. This op is used in mla_v1 to
replace the separate bmm and transpose calls.
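
For readers unfamiliar with the pattern, here is a minimal pure-Python sketch of what fusing a batched matmul with a transpose means. It is not the NPU kernel from this PR. The function names are hypothetical, and the choice of swapping the first two axes of the bmm output is an assumption, since the commit message does not say which axes are transposed.

```python
def bmm(a, b):
    """Batched matmul on nested lists: a is [B][M][K], b is [B][K][N]."""
    return [
        [
            [sum(a_row[k] * b_mat[k][j] for k in range(len(b_mat)))
             for j in range(len(b_mat[0]))]
            for a_row in a_mat
        ]
        for a_mat, b_mat in zip(a, b)
    ]


def transpose01(x):
    """Swap the first two axes: [B][M][N] -> [M][B][N]."""
    return [[x[i][m] for i in range(len(x))] for m in range(len(x[0]))]


def bmm_transpose_reference(a, b):
    """Hypothetical fused reference (assumed axis order).

    Each output row is written directly into the transposed layout,
    so no intermediate [B][M][N] result is materialized.
    """
    batch, rows = len(a), len(a[0])
    inner, cols = len(b[0]), len(b[0][0])
    out = [[None] * batch for _ in range(rows)]
    for i in range(batch):
        for m in range(rows):
            out[m][i] = [
                sum(a[i][m][k] * b[i][k][j] for k in range(inner))
                for j in range(cols)
            ]
    return out


a = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
b = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]

unfused = transpose01(bmm(a, b))
fused = bmm_transpose_reference(a, b)
```

Both paths give the same result. The fused version saves the extra pass over the intermediate tensor, which is the kind of saving the performance claim above would rely on.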

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?


- vLLM version: v0.11.2

---------

Signed-off-by: hust17yixuan <303660421@qq.com>

2025-12-01 09:09:51 +08:00

2 Commits