xc-llm-ascend

Files

weichen 320edde2df [main] [refactor] refactor fused_moe.py to enable token_dispatchers (#2570)

### What this PR does / why we need it?
Enable token_dispatcher to replace fused_experts_with_xxx in eager mode
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
e2e tests and unit tests (ut)


- vLLM version: v0.10.1.1
- vLLM main: 704432af3c

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
Co-authored-by: sherie <963372609@qq.com>
Co-authored-by: weijinqian0 <12153182+weijinqian0@users.noreply.github.com>
Co-authored-by: shiyuan680 <72335504+shiyuan680@users.noreply.github.com>

2025-08-28 10:13:35 +08:00

doctests

Remove transformer pins for v0.9.1-dev (#2234)

2025-08-07 14:41:10 +08:00

models

Accuracy report formatting (#2279)

2025-08-25 09:39:30 +08:00

multicard

[2/N][Feat] Add MC2 communication method for MoE layers (#2469)

2025-08-26 19:05:23 +08:00

pd_disaggreate

Disaggregate prefill for kv cache register style (#950)