xc-llm-ascend

Files

AlvisGong ef8157a5f2 fixed fused alltoall execute all reduce (#5109)

### What this PR does / why we need it?
Fix the fused all-to-all path so that the all-reduce on the shared expert output is also executed when moe_comm_type is MoECommType.FUSED_ALLTOALL:

    if moe_comm_type in {MoECommType.ALLTOALL, MoECommType.MC2,
                         MoECommType.FUSED_ALLTOALL} \
            and not shared_expert_dp_enabled():
        shared_out = tensor_model_parallel_all_reduce(shared_out)
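
The condition quoted above can be read as a small predicate. The following is only a sketch for illustration, not the repository's code: the `MoECommType` enum here is a stand-in (its `OTHER` member is hypothetical), and `needs_shared_all_reduce` and its `shared_expert_dp` argument are names made up for this example, standing in for the check on `shared_expert_dp_enabled()`.

```python
from enum import Enum, auto


class MoECommType(Enum):
    # Stand-in for the project's enum. Only ALLTOALL, MC2 and FUSED_ALLTOALL
    # are named in the commit message; OTHER is a hypothetical placeholder.
    OTHER = auto()
    ALLTOALL = auto()
    MC2 = auto()
    FUSED_ALLTOALL = auto()


# Communication types listed in the quoted condition.
ALL_REDUCE_COMM_TYPES = frozenset({
    MoECommType.ALLTOALL,
    MoECommType.MC2,
    MoECommType.FUSED_ALLTOALL,
})


def needs_shared_all_reduce(moe_comm_type: MoECommType,
                            shared_expert_dp: bool) -> bool:
    """Return True when the quoted condition would trigger the all-reduce.

    `shared_expert_dp` plays the role of shared_expert_dp_enabled() here
    (an assumption made to keep the sketch self-contained).
    """
    return moe_comm_type in ALL_REDUCE_COMM_TYPES and not shared_expert_dp


if __name__ == "__main__":
    for comm_type in MoECommType:
        print(comm_type.name, needs_shared_all_reduce(comm_type, False))
```

Under these assumptions, `FUSED_ALLTOALL` is treated like `ALLTOALL` and `MC2`, and the all-reduce is skipped whenever shared expert data parallelism is enabled.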


- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: AlvisGong <gwly0401@163.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>

2025-12-18 15:07:40 +08:00

fused_moe

fixed fused alltoall execute all reduce (#5109)

2025-12-18 15:07:40 +08:00

triton

qwen3_next add triton ops: fused_qkvzba_split_reshape (#4788)

2025-12-18 11:31:04 +08:00

__init__.py

[Fusion] [Graph] Add qknorm rope fusion operator (#4711)

2025-12-17 08:53:44 +08:00

activation.py

[refact] unified soc_version code (#4359)