xc-llm-ascend

Author SHA1 Message Date


yiz-liu

a6bb502e70

[2/N][Feat] Add MC2 communication method for MoE layers (#2469)

### What this PR does / why we need it?
This PR adds the MC2 communication method for MoE layers. It replaces
the previous all-gather approach when the number of tokens is small.

The key changes include:
- A new `AscendFusedMoE` layer that handles token splitting, local
computation, and final aggregation via all-gather.
- Logic in the model runner to dynamically select between the new MC2
method and the existing all-gather method based on the number of input
tokens.
- Sharding the MoE communication mask across tensor-parallel ranks.
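
As a rough illustration of the selection and mask-sharding ideas listed above, the following is a minimal sketch and not the code from this PR. The names `MoECommMethod`, `select_moe_comm_method`, `shard_mask`, and the cutoff `MC2_TOKEN_THRESHOLD` are hypothetical; the commit message does not state the actual threshold or how the mask is laid out.

```python
from enum import Enum


class MoECommMethod(Enum):
    """Hypothetical labels for the two communication paths."""
    ALL_GATHER = "all_gather"
    MC2 = "mc2"


# Assumed cutoff for illustration only; the real value is not given in the commit.
MC2_TOKEN_THRESHOLD = 512


def select_moe_comm_method(num_tokens: int,
                           threshold: int = MC2_TOKEN_THRESHOLD) -> MoECommMethod:
    """Use MC2 for small token counts, otherwise fall back to all-gather."""
    if num_tokens <= threshold:
        return MoECommMethod.MC2
    return MoECommMethod.ALL_GATHER


def shard_mask(mask: list[bool], tp_size: int, tp_rank: int) -> list[bool]:
    """Return the contiguous slice of `mask` owned by `tp_rank`.

    The mask is padded with False so that every rank receives a slice
    of the same length (an assumption made for this sketch).
    """
    per_rank = -(-len(mask) // tp_size)  # ceiling division
    padded = mask + [False] * (per_rank * tp_size - len(mask))
    return padded[tp_rank * per_rank:(tp_rank + 1) * per_rank]
```

Under these assumptions, a model runner would call `select_moe_comm_method` once per step with the current token count, and each tensor-parallel rank would keep only its own slice of the mask.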

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
Test case fixed.


- vLLM version: v0.10.1.1
- vLLM main: `b00e69f8ca`

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>

2025-08-26 19:05:23 +08:00

yangqinghao-cmss

ee6f79c44a

Add ut for test_communicator.py (#2293)

### What this PR does / why we need it?

Add unit tests (ut) for test_communicator.py.

- vLLM version: v0.10.0
- vLLM main: `e5ebeeba53`

Signed-off-by: yangqinghao-cmss <yangqinghao_yewu@cmss.chinamobile.com>

2025-08-09 08:26:04 +08:00

2 Commits