5 Commits

Author SHA1 Message Date
luomin2005
f41eeeb11e Refactor the ops PyTorch adapter,cleanup for csrc/torch_binding.cpp (#6732)
### What this PR does / why we need it?
Refactor the ops PyTorch adapter,cleanup for csrc/torch_binding.cpp,
more details see
https://github.com/vllm-project/vllm-ascend/issues/6486

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
install the new package to test the new modification, here is the
result:


- vLLM version: v0.15.0
- vLLM main:
9562912cea

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: luomin2005 <luomin2005@huawei.com>
Co-authored-by: liziyu <56102866+liziyu179@users.noreply.github.com>
Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com>
2026-02-24 09:12:43 +08:00
Li Wang
c26ad78f86 [CI][lint] Add rule codespell back (#6236)
### What this PR does / why we need it?
After removing codepsell a while, we discovered that typo had a problem
correctly recognizing certain misspelled words, so I suggested adding it
back.

- vLLM version: v0.14.1
- vLLM main:
d68209402d

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-01-26 14:12:33 +08:00
shiro-zzzz
bd8be2e759 [Kernel] Add moe normal ops (#4810)
### What this PR does / why we need it?
1.Add the implementation of normal Aclnn operators: MoeCombineNormal,
MoeDispatchNormal, NotifyDispatch,and DispatchLayout.

- MoeCombineNormal: Implements the combine logic within MoE operations.
- MoeDispatchNormal: Implements the dispatch logic within MoE
operations.
- NotifyDispatch: Exchanges topk_idx information among different ranks
to calculate the device memory required for the dispatch stage.
- DispatchLayout: Used to calculate information related to the device
memory layout for the dispatch stage.

2.Provide PyTorch interfaces for normal operators—get_dispatch_layout,
dispatch_prefill, and combine_prefill—to be used for MoE communication
during the prefill stage in vLLM.

- get_dispatch_layout: Calculates information related to the device
memory layout for the dispatch operator, and is called before
dispatch_prefill.
- dispatch_prefill: Initiates the dispatch operation.
- combine_prefill: Initiates the combine operation.

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
The functionality has already been validated using the local Qwen model.
Test cases will be added after support for multi-NPU use cases in the CI
pipeline is finalized.

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: shiro-zzzz <zhangdianhao@huawei.com>
2025-12-10 17:15:28 +08:00
Mengqing Cao
7e70da9fb7 Revert "[Kernel] add custom moe ops for prefill" (#4806)
Reverts vllm-project/vllm-ascend#4194 as it broke CI in
https://github.com/vllm-project/vllm-ascend/actions/runs/20030369087/job/57437687382?pr=4791

Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-08 23:20:32 +08:00
shiro-zzzz
0617d7d394 [Kernel] add custom moe ops for prefill (#4194)
### What this PR does / why we need it?
1.Add the implementation of normal Aclnn operators: MoeCombineNormal,
MoeDispatchNormal, NotifyDispatch,and DispatchLayout.

- MoeCombineNormal: Implements the combine logic within MoE operations.
- MoeDispatchNormal: Implements the dispatch logic within MoE
operations.
- NotifyDispatch: Exchanges topk_idx information among different ranks
to calculate the device memory required for the dispatch stage.
- DispatchLayout: Used to calculate information related to the device
memory layout for the dispatch stage.

2.Provide PyTorch interfaces for normal operators—get_dispatch_layout,
dispatch_prefill, and combine_prefill—to be used for MoE communication
during the prefill stage in vLLM.

- get_dispatch_layout: Calculates information related to the device
memory layout for the dispatch operator, and is called before
dispatch_prefill.
- dispatch_prefill: Initiates the dispatch operation.
- combine_prefill: Initiates the combine operation.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
The functionality has already been validated using the local Qwen model.
Test cases will be added after support for multi-NPU use cases in the CI
pipeline is finalized.


- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: shiro-zzzz <zhangdianhao@huawei.com>
2025-12-08 19:11:58 +08:00