[fix]: fix precision issue in dispatch_ffn_combine_bf16 and remove redundant sync (#7198)
### What this PR does / why we need it?
- Fix the precision issue in the dispatch_ffn_combine_bf16 operator.
- Remove redundant synchronization operations in the dispatch_ffn_combine operator.
- vLLM version: v0.16.0
- vLLM main: 4034c3d32e
---------
Signed-off-by: guanguan0308 <1546542263@qq.com>
@@ -85,8 +85,8 @@ KernelMoeTokenUnpermute<T1, T2, T3, PROBS>::Init(GM_ADDR permuted_tokens, GM_ADD
                                                         GM_ADDR unpermuted_tokens,
                                                         const MoeTokenUnpermuteTilingData *__restrict tiling_data)
 {
-    this->blockIdx = get_block_idx() + get_subblockid() * get_block_num();
-    this->blockNum = get_block_num() * get_subblockdim();
+    this->blockIdx = get_block_idx();
+    this->blockNum = get_block_num();
 
     if (blockIdx >= blockNum) {
         return;
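The hunk above contains two ways of computing `blockIdx` and `blockNum`. The following is a minimal host-side sketch in plain C++ of the arithmetic behind them, not the kernel code itself. It does not call the device intrinsics: `flat_index`, `count_distinct`, `block_num`, and `subblock_dim` are hypothetical stand-ins for the expression `get_block_idx() + get_subblockid() * get_block_num()` and for the values returned by `get_block_num()` and `get_subblockdim()`. Under those assumptions, the flattened form gives every (block, sub-block) pair its own index in `[0, block_num * subblock_dim)`, while the per-block form gives all sub-blocks of a block the same index in `[0, block_num)`.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical stand-in for the flattened expression in the hunk:
// get_block_idx() + get_subblockid() * get_block_num().
uint32_t flat_index(uint32_t block_idx, uint32_t subblock_id, uint32_t block_num) {
    return block_idx + subblock_id * block_num;
}

// Counts how many distinct indices are produced when every
// (block, sub-block) pair computes its own index.
// flatten == true  : use the flattened expression.
// flatten == false : use the plain block index only.
uint32_t count_distinct(uint32_t block_num, uint32_t subblock_dim, bool flatten) {
    std::set<uint32_t> seen;
    for (uint32_t s = 0; s < subblock_dim; ++s) {
        for (uint32_t b = 0; b < block_num; ++b) {
            seen.insert(flatten ? flat_index(b, s, block_num) : b);
        }
    }
    return static_cast<uint32_t>(seen.size());
}
```

With, say, 4 blocks and 2 sub-blocks per block, the flattened form yields 8 distinct indices and the per-block form yields 4. Which scheme is correct for this kernel depends on how `MoeTokenUnpermuteTilingData` splits the work, which this page does not show.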