xc-llm-ascend/vllm_ascend
Jade Zheng 0dfdfa9526 [Feature] Enhance all-reduce skipping logic for MoE models in NPUModelRunner (#5329)
In addition to the existing case where `recompute_scheduler_enable` is enabled, `all_reduce` can now also be skipped when `max_num_batched_tokens` is below mc2's requirement.
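The decision described above can be sketched as a single predicate. This is a minimal illustration, not the code from this commit. The names `SkipAllReduceInputs`, `can_skip_all_reduce`, `is_moe_model` and `mc2_token_threshold` are hypothetical. `mc2_token_threshold` stands in for "mc2's requirement", and the strict `<` follows the wording "below". Limiting the check to MoE models is an assumption drawn from the commit title.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkipAllReduceInputs:
    """Hypothetical bundle of the values the commit message mentions."""
    is_moe_model: bool
    recompute_scheduler_enable: bool
    max_num_batched_tokens: int
    # Hypothetical stand-in for "mc2's requirement" (a token threshold).
    mc2_token_threshold: int


def can_skip_all_reduce(cfg: SkipAllReduceInputs) -> bool:
    """Return True when the all_reduce may be skipped (illustrative sketch only)."""
    if not cfg.is_moe_model:
        # Assumption from the commit title: the logic targets MoE models only.
        return False
    if cfg.recompute_scheduler_enable:
        # Existing condition: enabling the recompute scheduler already allowed skipping.
        return True
    # New condition: the batched-token budget stays below the mc2 threshold.
    return cfg.max_num_batched_tokens < cfg.mc2_token_threshold


if __name__ == "__main__":
    # MoE model, recompute scheduler off, 256 tokens against a threshold of 512.
    print(can_skip_all_reduce(SkipAllReduceInputs(True, False, 256, 512)))  # True
```

Keeping the two conditions as separate early returns mirrors the commit message: the recompute-scheduler path is unchanged, and the token-budget check is an additional way to reach the same outcome.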

- vLLM version: release/v0.13.0
- vLLM main: bc0a5a0c08

---------

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
2025-12-26 17:39:44 +08:00