xc-llm-ascend

Files

Angazenn 019a2fe6e6 [Eagle3]enhance skipping dp allreduce and add it into eagle proposer (#6192)

### What this PR does / why we need it?
This PR:
1. Enhances the logic of `_skip_all_reduce_across_dp_group` so that the
CPU DP allreduce is skipped entirely for dense models. This change also
serves item 2.
2. Adds `_skip_all_reduce_across_dp_group` to eagle_proposer. Models
like Qwen3-235b now support Eagle3 speculative decoding, and a typical
PD-disaggregation deployment of these MoE models often uses `dp_size > 1`.
In that setting, `set_forward_context` has to call a CPU DP allreduce to
retrieve `num_tokens_across_dp` in all cases. Skipping this allreduce
greatly improves performance.
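
The skip decision described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ModelInfo`, `num_tokens_across_dp` as a function, and the `cpu_all_reduce` callback are hypothetical names, and the rule "skip when the model is dense or `dp_size <= 1`" is an assumption drawn only from the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ModelInfo:
    # Hypothetical stand-in for the state the real check inspects.
    dp_size: int
    is_moe: bool


def skip_all_reduce_across_dp_group(info: ModelInfo) -> bool:
    # Assumed rule: a single DP rank has nothing to synchronize, and
    # dense models do not need per-rank token counts.
    return info.dp_size <= 1 or not info.is_moe


def num_tokens_across_dp(
    info: ModelInfo,
    local_num_tokens: int,
    cpu_all_reduce: Callable[[int], List[int]],
) -> Optional[List[int]]:
    # Return None when skipping, so no CPU collective is issued at all.
    if skip_all_reduce_across_dp_group(info):
        return None
    # Otherwise gather the token count of every DP rank.
    return cpu_all_reduce(local_num_tokens)
```

In this sketch the caller passes the collective as a callback, so the skip path can be checked without a distributed setup. How the real proposer hands the result to `set_forward_context` is not shown here.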

- vLLM version: v0.14.0
- vLLM main: d68209402d

---------

Signed-off-by: Angazenn <supperccell@163.com>

2026-01-24 11:29:42 +08:00

__init__.py

Add Medusa speculative decoding support for vllm_ascend (#5668)

2026-01-23 14:14:23 +08:00

eagle_proposer.py

[Eagle3]enhance skipping dp allreduce and add it into eagle proposer (#6192)

2026-01-24 11:29:42 +08:00

interface.py

Add Medusa speculative decoding support for vllm_ascend (#5668)

2026-01-23 14:14:23 +08:00

medusa_proposer.py

Add Medusa speculative decoding support for vllm_ascend (#5668)