[Refactor] Add expert processed token count output for DispatchFFNCombine/DispatchFFNCombineBF16 (#6402)

### What this PR does / why we need it?
Add New Output for Expert Token Count
This PR adds an output tensor, `expert_token_nums`, to both operators so that the distribution of tokens among experts can be tracked:

- Tensor name: `expert_token_nums`
- Dimension: 1-D tensor
- Shape: `(local_expert_num,)`
- Data type: `int32`
- Semantics: the number of tokens actually received by each expert on the current card.

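
As a rough illustration of these semantics, the sketch below counts how many routed tokens land on each local expert of one card. It assumes (hypothetically; the operators' internal layout is not described in this PR) that global experts are split contiguously across ranks, so rank `r` owns experts `[r * local_expert_num, (r + 1) * local_expert_num)`, and that `all_topk_ids` holds the flattened top-k expert ids of every token on every rank. `count_local_expert_tokens` is a hypothetical helper, not part of either operator's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of DispatchFFNCombine's API): returns one int32
// count per local expert, matching the documented shape (local_expert_num,).
// Assumes rank `rank` owns the contiguous global expert range
// [rank * local_expert_num, (rank + 1) * local_expert_num).
std::vector<int32_t> count_local_expert_tokens(
    const std::vector<int32_t>& all_topk_ids,  // flattened top-k ids from all ranks
    int32_t rank,
    int32_t local_expert_num) {
    std::vector<int32_t> expert_token_nums(
        static_cast<std::size_t>(local_expert_num), 0);
    const int32_t first_expert = rank * local_expert_num;
    for (int32_t global_id : all_topk_ids) {
        const int32_t local_id = global_id - first_expert;
        // Tokens routed to experts owned by other ranks never reach this card.
        if (local_id >= 0 && local_id < local_expert_num) {
            ++expert_token_nums[static_cast<std::size_t>(local_id)];
        }
    }
    return expert_token_nums;
}
```

For example, with `local_expert_num = 4` on rank 1 and `all_topk_ids = {0, 5, 4, 7, 1, 4}`, the result is `{2, 1, 0, 1}`: ids 0 and 1 belong to rank 0, and expert 4 (local index 0) receives two tokens.
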
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.14.1
- vLLM main: dc917cceb8

---------

Signed-off-by: guanguan0308 <1546542263@qq.com>
Signed-off-by: guanguan0308 <162653673+guanguan0308@users.noreply.github.com>
Author: guanguan0308, 2026-02-03 10:41:06 +08:00 (committed by GitHub)
Commit: dffac6db73 (parent 26b83f8bde)
18 changed files with 97 additions and 84 deletions


@@ -194,7 +194,7 @@ void batch_matmul_transpose(const at::Tensor &tensor_a, const at::Tensor &tensor
     return;
 }
 
-at::Tensor& dispatch_ffn_combine_meta(
+std::tuple<at::Tensor&, at::Tensor&> dispatch_ffn_combine_meta(
     const at::Tensor& x,
     const at::TensorList& weight1,
     const at::TensorList& weight2,
@@ -204,9 +204,10 @@ at::Tensor& dispatch_ffn_combine_meta(
     const at::Tensor& probs,
     c10::string_view group,
     int64_t max_output_size,
-    at::Tensor& out
+    at::Tensor& out,
+    at::Tensor& expert_token_nums
 ) {
-    return out;
+    return {out, expert_token_nums};
 }
 
 at::Tensor npu_lightning_indexer_meta(
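
The new return type appears to follow the usual PyTorch convention for multi-output out-variants (for example `at::max_out`), where the function hands back references to the caller-supplied output tensors instead of allocating new ones. The minimal sketch below reproduces that pattern without ATen; `FakeTensor` and `dispatch_ffn_combine_meta_sketch` are hypothetical stand-ins, not the real types or functions.

```cpp
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical stand-in for at::Tensor so the sketch compiles without ATen.
struct FakeTensor {
    std::vector<int64_t> sizes;
};

// Mirrors the shape of the new meta signature: the preallocated outputs are
// passed in by reference and returned as a tuple of references, so nothing
// is copied or allocated here.
std::tuple<FakeTensor&, FakeTensor&> dispatch_ffn_combine_meta_sketch(
    FakeTensor& out, FakeTensor& expert_token_nums) {
    return {out, expert_token_nums};
}
```

A caller can unpack the result with C++17 structured bindings, e.g. `auto [out_ref, counts_ref] = dispatch_ffn_combine_meta_sketch(out, counts);`, and both names then refer to the original `out` and `counts` objects rather than copies.
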