xc-llm-ascend


Mengqing Cao af04ee9e7a [MoE][Dist] Fix Qwen MoE accuracy bug in DP scenario (#1856)

### What this PR does / why we need it?
Fix Qwen MoE accuracy bug in DP scenario.

Currently, the `FusedMoE` implementation in vLLM uses `All2AllManager` to
manage the different all2all algorithm branches. The default branch uses
`Multicast` in the `dispatch` phase and `all_reduce` in the `combine`
phase, neither of which is implemented in vLLM-Ascend. As a result, calls
fall through to the default implementation in `base_communicator`, whose
`dispatch` and `combine` operations are empty, which causes the accuracy
issue.
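
To make the failure mode concrete, here is a toy, single-process simulation. It is a hedged sketch, not vLLM's `All2AllManager` API. It assumes two DP ranks, four hypothetical scalar experts split two per rank, and a made-up top-2 router. It models the `Multicast` dispatch as an all-gather and the `all_reduce` combine as an element-wise sum across ranks followed by slicing each rank's own tokens back out. The no-op variant stands in for the empty `base_communicator` path.

```python
from typing import Callable, Dict, List, Set

Expert = Callable[[float], float]

# Hypothetical experts: expert e scales its input by (e + 1).
# The default argument w binds the weight at definition time (avoids late binding).
EXPERTS: Dict[int, Expert] = {e: (lambda x, w=e + 1: w * x) for e in range(4)}
# Hypothetical placement: rank 0 owns experts 0 and 1, rank 1 owns experts 2 and 3.
OWNED: Dict[int, Set[int]] = {0: {0, 1}, 1: {2, 3}}


def route(x: float) -> List[int]:
    """Made-up top-2 router: pick two neighbouring experts from the token value."""
    first = int(x) % 4
    return [first, (first + 1) % 4]


def moe_partial(tokens: List[float], owned: Set[int]) -> List[float]:
    """Sum only the contributions of locally owned experts for each token."""
    return [sum(EXPERTS[e](x) for e in route(x) if e in owned) for x in tokens]


def reference(tokens: List[float]) -> List[float]:
    """Single-device MoE: every selected expert contributes."""
    return [sum(EXPERTS[e](x) for e in route(x)) for x in tokens]


def naive_all2all(per_rank_tokens: List[List[float]]) -> List[List[float]]:
    # dispatch: every rank receives every rank's tokens (modeled as all-gather)
    gathered = [x for toks in per_rank_tokens for x in toks]
    partials = [moe_partial(gathered, OWNED[r]) for r in range(len(per_rank_tokens))]
    # combine: element-wise sum across ranks (modeled all_reduce) ...
    reduced = [sum(vals) for vals in zip(*partials)]
    # ... then each rank keeps only the slice that belongs to its own tokens
    out, start = [], 0
    for toks in per_rank_tokens:
        out.append(reduced[start:start + len(toks)])
        start += len(toks)
    return out


def noop_all2all(per_rank_tokens: List[List[float]]) -> List[List[float]]:
    # empty dispatch/combine: each rank sees only its own tokens and local experts
    return [moe_partial(toks, OWNED[r]) for r, toks in enumerate(per_rank_tokens)]


tokens = [[1.0, 2.0], [3.0]]  # rank 0 holds two tokens, rank 1 holds one
print(naive_all2all(tokens))  # [[5.0, 14.0], [15.0]], same as reference() per rank
print(noop_all2all(tokens))   # differs from reference(): remote experts are missing
```

With the naive path, each rank's output matches `reference()`. With the no-op path, each token only receives contributions from experts that happen to live on its own rank, which mirrors the accuracy drop described in this PR.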

This PR is a temporary workaround; refactoring all2all in vLLM-Ascend
would be a better long-term fix.


- vLLM version: v0.10.0
- vLLM main: ad57f23f6a

---------

Signed-off-by: MengqingCao <cmq0113@163.com>

2025-08-04 10:24:18 +08:00

doctests

Pin transformers to fix v0.9.1 doctest (#2048)

2025-07-28 17:51:56 +08:00

long_term/accuracy

[MoE][Dist] Fix Qwen MoE accuracy bug in DP scenario (#1856)

2025-08-04 10:24:18 +08:00

multicard

[MoE][Dist] Fix Qwen MoE accuracy bug in DP scenario (#1856)

2025-08-04 10:24:18 +08:00

pd_disaggreate

Disaggregate prefill for kv cache register style (#950)