xc-llm-ascend

Files

weichen 950c4b219a [main] refactor alltoallv in fused_moe (#2487)

### What this PR does / why we need it?
Refactor the all2all-related fused_experts paths (both quantized and
unquantized) into TokenDispatcherWithAll2AllV, including the dispatch and
combine calculations.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
E2E tests and unit tests (UT)
- vLLM version: v0.10.0
- vLLM main: 65197a5fb3

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>

2025-08-23 20:38:17 +08:00

e2e

[Scheduler] validate max_num_batched_tokens and max_model_len in AscendSchedulerConfig (#2434)

2025-08-23 19:39:44 +08:00

[main] refactor alltoallv in fused_moe (#2487)

2025-08-23 20:38:17 +08:00

__init__.py

[SpecDecode] Add spec decode support (#500)

2025-04-17 20:16:32 +08:00