[Eagle3]enhance skipping dp allreduce and add it into eagle proposer (#6192)

### What this PR does / why we need it?
This PR:
1. Enhances the logic of `_skip_all_reduce_across_dp_group` to skip all
CPU DP all-reduces for dense models. This also serves purpose 2.
2. Adds `_skip_all_reduce_across_dp_group` to eagle_proposer, so models
such as Qwen3-235B now support eagle3 spec decode. A typical setup for
these MoE models under PD disaggregation often uses `dp_size > 1`, which
forces `set_forward_context` to run a CPU DP all-reduce to retrieve
`num_tokens_across_dp` in all cases. Skipping this all-reduce greatly
improves performance; a minimal sketch of the skip logic follows below.
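
As a rough illustration, the skip path can be thought of as follows. Only `_skip_all_reduce_across_dp_group` and `num_tokens_across_dp` come from this PR; the helper name, parameters, and exact skip conditions below are illustrative assumptions, not the actual implementation:

```python
import torch

def maybe_skip_dp_all_reduce(num_tokens: int,
                             dp_size: int,
                             with_prefill: bool,
                             is_dense_model: bool):
    """Build num_tokens_across_dp locally when the CPU all-reduce is skippable.

    If every DP rank is known to run the same (padded) token count -- e.g. a
    dense model decode step, or an eagle draft dummy run -- each rank can
    construct the tensor itself instead of synchronizing over the DP group.
    """
    if dp_size == 1:
        # Single DP rank: nothing to synchronize.
        return torch.tensor([num_tokens], dtype=torch.int32)
    if is_dense_model and not with_prefill:
        # All ranks run the same padded num_tokens; no sync needed.
        return torch.full((dp_size,), num_tokens, dtype=torch.int32)
    # Caller falls back to the CPU DP all-reduce in set_forward_context.
    return None
```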

- vLLM version: v0.14.0
- vLLM main: d68209402d

---------

Signed-off-by: Angazenn <supperccell@163.com>
Author: Angazenn
Date: 2026-01-24 11:29:42 +08:00
Committed by: GitHub
Parent: 56d8f088dd
Commit: 019a2fe6e6
3 changed files with 35 additions and 9 deletions


@@ -272,6 +272,7 @@ class TestEagleProposerDummyRun(TestBase):
         self.runner.pcp_size = 1
         self.runner.dcp_size = 1
         self.runner.pin_memory = False
+        self.runner._sync_metadata_across_dp.return_value = (8, torch.tensor([8]), False)
         self.vllm_config.cache_config.block_size = 16
         self.vllm_config.scheduler_config.max_num_batched_tokens = 1024
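
For context, here is a minimal sketch of how the mocked `_sync_metadata_across_dp` behaves in a test. The field meanings of the tuple are an assumption inferred from the PR description (padded `num_tokens`, `num_tokens_across_dp`, `with_prefill`), not confirmed by the diff:

```python
from unittest.mock import MagicMock

import torch

runner = MagicMock()
# Same mock as in the diff above; field meanings are assumed to be
# (padded num_tokens, num_tokens_across_dp, with_prefill).
runner._sync_metadata_across_dp.return_value = (8, torch.tensor([8]), False)

# The dummy run can consume the synced metadata without triggering a
# real CPU all-reduce across the DP group:
num_tokens, num_tokens_across_dp, with_prefill = runner._sync_metadata_across_dp()
assert num_tokens == 8
assert torch.equal(num_tokens_across_dp, torch.tensor([8]))
assert with_prefill is False
```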