### What this PR does / why we need it?
This PR:
1. Enhances the logic of `_skip_all_reduce_across_dp_group` to skip all
cpu dp allreduce for dense models. This is also for purpose 2.
2. Adds `_skip_all_reduce_across_dp_group` into eagle_proposer. Now
models like Qwen3-235b supports eagle3 spec decode. A typical setting
for these moe models on pd disaggregation often introduce `dp_size > 1`.
This requires `set_forward_context` to call a cpu dp allreduce to
retrieve `num_tokens_across_dp` on all cases. Skipping this allreduce
greatly improves performance.
- vLLM version: v0.14.0
- vLLM main:
d68209402d
---------
Signed-off-by: Angazenn <supperccell@163.com>