### What this PR does / why we need it?
In this PR, we add support for the H2P communication optimization when
running PanguProMoE with dp_size > 1. H2P replaces `all_reduce` with
`reduce_scatter` and `all_gather` to improve performance:
Original layer:
`input_layernorm --> attn --> tp all_reduce --> post_attention_layernorm --> dp all_gather --> moe/mlp --> dp reduce_scatter --> tp all_reduce`

With H2P:
`input_layernorm --> tp all_gather --> attn --> tp reduce_scatter --> post_attention_layernorm --> all_rank all_gather --> moe/mlp --> all_rank reduce_scatter`
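
For illustration, here is a minimal sketch of the new pattern around the attention block, written with plain `torch.distributed` collectives; `tp_group` and the `attn` callable are hypothetical placeholders for the actual vLLM-Ascend group and module objects, not the real implementation:

```python
import torch
import torch.distributed as dist

def attn_block_h2p(hidden: torch.Tensor, attn, tp_group) -> torch.Tensor:
    """Sketch of the H2P pattern: tp all_gather before attention,
    tp reduce_scatter after, replacing a single tp all_reduce."""
    tp_size = dist.get_world_size(group=tp_group)

    # tp all_gather: each rank holds a [num_tokens // tp_size, h] shard
    # after input_layernorm; gather the full [num_tokens, h] input.
    full = torch.empty(hidden.shape[0] * tp_size, hidden.shape[1],
                       dtype=hidden.dtype, device=hidden.device)
    dist.all_gather_into_tensor(full, hidden, group=tp_group)

    # Attention with a row-parallel output projection produces partial
    # sums that still need to be reduced across TP ranks.
    out = attn(full)

    # tp reduce_scatter: sum the partial results and leave each rank with
    # its own token shard, which post_attention_layernorm then consumes.
    shard = torch.empty_like(hidden)
    dist.reduce_scatter_tensor(shard, out, group=tp_group)
    return shard
```

The payoff, per the flows above, is that the layernorms operate on 1/tp_size of the tokens, and the back-to-back `dp` and `tp` collectives around the MoE collapse into single all-rank `all_gather`/`reduce_scatter` calls.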
In addition, because `reduce_scatter` requires that the number of tokens
be divisible by the group size, we pad the sequences based on
`max_tokens_across_dp`.
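
A hedged sketch of this padding step (the helper and its rounding logic are illustrative assumptions; only the name `max_tokens_across_dp` comes from this PR):

```python
import torch
import torch.nn.functional as F

def pad_tokens(hidden: torch.Tensor,
               max_tokens_across_dp: int,
               group_size: int) -> torch.Tensor:
    """Pad the token dimension so all DP ranks contribute the same count
    and that count divides evenly across the collective group."""
    # Round the dp-wide maximum up to the next multiple of group_size,
    # since reduce_scatter splits the token dimension evenly across ranks.
    target = -(-max_tokens_across_dp // group_size) * group_size
    pad_len = target - hidden.shape[0]
    if pad_len > 0:
        # F.pad pads the last dims first: (0, 0) leaves the hidden dim
        # alone, (0, pad_len) appends zero tokens at the end.
        hidden = F.pad(hidden, (0, 0, 0, pad_len))
    return hidden
```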
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
This PR has been tested with both offline and online inference using
PanguProMoE-72B.
---------
Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>