xc-llm-ascend

Author	SHA1	Message	Date
shiro-zzzz	bd8be2e759	[Kernel] Add moe normal ops (#4810) ### What this PR does / why we need it? 1. Add the implementation of the normal Aclnn operators MoeCombineNormal, MoeDispatchNormal, NotifyDispatch, and DispatchLayout. - MoeCombineNormal: implements the combine logic within MoE operations. - MoeDispatchNormal: implements the dispatch logic within MoE operations. - NotifyDispatch: exchanges topk_idx information among ranks to calculate the device memory required for the dispatch stage. - DispatchLayout: calculates the device memory layout information for the dispatch stage. 2. Provide PyTorch interfaces for the normal operators (get_dispatch_layout, dispatch_prefill, and combine_prefill) for MoE communication during the prefill stage in vLLM. - get_dispatch_layout: calculates the device memory layout information for the dispatch operator; it is called before dispatch_prefill. - dispatch_prefill: initiates the dispatch operation. - combine_prefill: initiates the combine operation. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? The functionality has been validated locally with a Qwen model. Test cases will be added once multi-NPU test support in the CI pipeline is finalized. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: shiro-zzzz <zhangdianhao@huawei.com>	2025-12-10 17:15:28 +08:00
Mengqing Cao	7e70da9fb7	Revert "[Kernel] add custom moe ops for prefill" (#4806) Reverts vllm-project/vllm-ascend#4194 because it broke CI in https://github.com/vllm-project/vllm-ascend/actions/runs/20030369087/job/57437687382?pr=4791 Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-12-08 23:20:32 +08:00
shiro-zzzz	0617d7d394	[Kernel] add custom moe ops for prefill (#4194) ### What this PR does / why we need it? 1. Add the implementation of the normal Aclnn operators MoeCombineNormal, MoeDispatchNormal, NotifyDispatch, and DispatchLayout. - MoeCombineNormal: implements the combine logic within MoE operations. - MoeDispatchNormal: implements the dispatch logic within MoE operations. - NotifyDispatch: exchanges topk_idx information among ranks to calculate the device memory required for the dispatch stage. - DispatchLayout: calculates the device memory layout information for the dispatch stage. 2. Provide PyTorch interfaces for the normal operators (get_dispatch_layout, dispatch_prefill, and combine_prefill) for MoE communication during the prefill stage in vLLM. - get_dispatch_layout: calculates the device memory layout information for the dispatch operator; it is called before dispatch_prefill. - dispatch_prefill: initiates the dispatch operation. - combine_prefill: initiates the combine operation. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? The functionality has been validated locally with a Qwen model. Test cases will be added once multi-NPU test support in the CI pipeline is finalized. - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` Signed-off-by: shiro-zzzz <zhangdianhao@huawei.com>	2025-12-08 19:11:58 +08:00
Pleaplusone	c0f0b70813	[core] Support capturing custom ops into aclgraph (#2113) ### What this PR does / why we need it? Thanks to PR https://github.com/vllm-project/vllm-ascend/pull/426, vllm-ascend supports aclgraph inference, which reduces host overhead. However, aclgraph capture relies heavily on the functionality provided by `torch.compile`, a key feature of torch 2.x. Therefore, a custom op can be captured into aclgraph only if `torch.compile` can recognize and capture it. In this PR, we register meta implementations for the current custom ops to enable FX graph capture. With these in place, the Ascend runtime can insert the custom ops into aclgraph naturally. ### Does this PR introduce _any_ user-facing change? No user-facing change. ### How was this patch tested? Tested with a unit test that integrates the `rotary_embedding` op into a small custom model and uses `torch.compile` and aclgraph to capture and replay it, verifying its functionality. - vLLM version: v0.10.0 - vLLM main: `1b99028069` --------- Signed-off-by: ganyi <pleaplusone.gy@gmail.com>	2025-08-11 15:59:42 +08:00
Pleaplusone	ce8259975e	[core] Support custom ascendc kernels in vllm-ascend (#233) This PR adds support for the custom AscendC kernel rotary_embedding in vllm-ascend, along with the related CMakeLists and setuptools changes. Related: https://github.com/vllm-project/vllm-ascend/issues/156 --------- Signed-off-by: ganyi <pleaplusone.gy@gmail.com>	2025-04-03 14:52:34 +08:00

5 Commits
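The DispatchLayout / get_dispatch_layout step described in #4810 and #4194 derives, from each token's top-k expert indices (`topk_idx`), how much data each rank and expert will receive during dispatch. The following is a minimal pure-Python sketch of that kind of bookkeeping, not the operator's implementation. It assumes experts are partitioned contiguously and evenly across ranks and that `-1` marks an unused top-k slot; the function name `dispatch_layout` and its return values are hypothetical and do not reflect the real operator's signature or output format.

```python
from typing import List, Tuple


def dispatch_layout(
    topk_idx: List[List[int]], num_experts: int, num_ranks: int
) -> Tuple[List[int], List[int], List[List[bool]]]:
    """Hypothetical sketch: count tokens per rank and per expert for dispatch.

    Assumptions (not taken from the operator): rank r owns experts
    [r * E/R, (r + 1) * E/R), and -1 marks an unused top-k slot.
    """
    if num_experts % num_ranks != 0:
        raise ValueError("num_experts must be divisible by num_ranks")
    experts_per_rank = num_experts // num_ranks

    tokens_per_expert = [0] * num_experts
    tokens_per_rank = [0] * num_ranks
    # is_token_in_rank[t][r] is True if token t is routed to any expert on rank r.
    is_token_in_rank = [[False] * num_ranks for _ in topk_idx]

    for t, experts in enumerate(topk_idx):
        for e in experts:
            if e < 0:
                continue  # masked top-k slot (assumed convention)
            tokens_per_expert[e] += 1
            is_token_in_rank[t][e // experts_per_rank] = True
        # Count a token once per destination rank, even if it selects
        # several experts hosted on that rank.
        for r in range(num_ranks):
            if is_token_in_rank[t][r]:
                tokens_per_rank[r] += 1

    return tokens_per_rank, tokens_per_expert, is_token_in_rank


if __name__ == "__main__":
    # 3 tokens, top-2 routing, 4 experts spread over 2 ranks (2 experts each).
    topk = [[0, 1], [1, 3], [2, -1]]
    per_rank, per_expert, in_rank = dispatch_layout(topk, num_experts=4, num_ranks=2)
    print(per_rank)    # [2, 2]
    print(per_expert)  # [1, 2, 1, 1]
```

The sketch counts each token once per destination rank on the assumption that a token routed to two experts on the same rank only needs to be transferred there once; the commit messages do not state whether MoeDispatchNormal deduplicates this way.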