[Bugfix][DispatchFFNCombine] resolve vec error caused by unaligned UB access (#6707)
### What this PR does / why we need it?
1. Fix a vec error caused by unaligned UB access in the
DispatchFFNCombine op;
2. Fix the expert_token_nums tensor being defined on the host instead of the NPU in
moe_comm_method.py;
3. Fix a multi-core copy issue with expert_token_nums in the DispatchFFNCombine
op (a single AIV copy is sufficient).
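To illustrate the kind of alignment issue item 1 refers to, here is a minimal host-side sketch in plain C++. It assumes a 32-byte UB block size, which is an assumption for illustration and is not stated in this PR; `AlignUpElems` and `IsAligned` are hypothetical helpers and are not functions from the DispatchFFNCombine op.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: transfers into the on-chip buffer must cover whole 32-byte blocks.
constexpr uint32_t kUbBlockBytes = 32;

// Hypothetical helper: round elemCount up so that elemCount * elemBytes
// is a multiple of blockBytes.
inline uint32_t AlignUpElems(uint32_t elemCount, uint32_t elemBytes,
                             uint32_t blockBytes = kUbBlockBytes) {
    assert(elemBytes > 0 && blockBytes % elemBytes == 0);
    const uint32_t elemsPerBlock = blockBytes / elemBytes;
    return (elemCount + elemsPerBlock - 1) / elemsPerBlock * elemsPerBlock;
}

// Hypothetical helper: true when the byte length is a whole number of blocks.
inline bool IsAligned(uint32_t elemCount, uint32_t elemBytes,
                      uint32_t blockBytes = kUbBlockBytes) {
    return (static_cast<uint64_t>(elemCount) * elemBytes) % blockBytes == 0;
}
```

Under that assumption, a copy of 5 `int32_t` values (20 bytes) would not be aligned, while padding the count to `AlignUpElems(5, sizeof(int32_t))`, which is 8 elements (32 bytes), would be.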
### Does this PR introduce _any_ user-facing change?
No, this PR does not introduce any user-facing changes. The fix only
addresses internal memory access logic and does not modify any public
APIs, interfaces, or user-visible behaviors.
### How was this patch tested?
`export VLLM_ASCEND_ENABLE_FUSED_MC2=1`
- vLLM version: v0.15.0
- vLLM main: 9562912cea
Signed-off-by: xulei_ict <xulei292@huawei.com>
Co-authored-by: xulei_ict <xulei292@huawei.com>
@@ -756,8 +756,9 @@ CATLASS_DEVICE
         ExpertTokenNums.SetGlobalBuffer(reinterpret_cast<__gm__ int32_t*>(params.ptrExpertTokenNums));
         AscendC::GlobalTensor<int32_t> LcalCumsumMM;
         LcalCumsumMM.SetGlobalBuffer(reinterpret_cast<__gm__ int32_t*>(workspaceInfo.ptrcumsumMM + (params.EP - 1) * params.expertPerRank * sizeof(int32_t)));
-        CopyGMToGM(ExpertTokenNums, LcalCumsumMM, params.expertPerRank, params.ubMoveNum);
-        AscendC::SyncAll<true>();
+        if (coreIdx == 0) {
+            CopyGMToGM(ExpertTokenNums, LcalCumsumMM, params.expertPerRank, params.ubMoveNum);
+        }
 
         uint32_t curGroupOffset = 0;
         int32_t prevSumBeforeRank = 0;
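The effect of the `coreIdx == 0` guard can be sketched with a small host-side simulation in plain C++. This is only an illustration under stated assumptions: `RunCopy` is a hypothetical stand-in for the kernel body, a sequential loop stands in for the parallel cores, and it does not model hardware behaviour or synchronisation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the kernel body: every simulated core runs the
// same code. Returns how many cores actually issued the copy.
inline uint32_t RunCopy(uint32_t coreNum, bool singleCoreOnly,
                        const std::vector<int32_t>& src,
                        std::vector<int32_t>& dst) {
    assert(coreNum > 0);
    uint32_t copies = 0;
    for (uint32_t coreIdx = 0; coreIdx < coreNum; ++coreIdx) {
        // With the guard, only core 0 performs the copy.
        if (singleCoreOnly && coreIdx != 0) {
            continue;
        }
        dst.assign(src.begin(), src.end());
        ++copies;
    }
    return copies;
}
```

In this sketch the destination holds the same values either way, but the unguarded variant writes the same small buffer once per core, while the guarded variant writes it once.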