### What this PR does / why we need it?
This PR fixes a bug in the `max_num_tokens_across_dp` calculation. In the earlier version, it was computed as `graph_pad_size` plus the actual `max_num_tokens`. Since these quantities can differ per rank, this produced different `max_num_tokens_across_dp` values across DP ranks, which could lead to incorrect padding whenever padding is required.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed normally.

Signed-off-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
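To illustrate the failure mode, here is a minimal, self-contained sketch (not the actual vllm-ascend code; `pad_to_graph_size`, `buggy_max_across_dp`, and `fixed_max_across_dp` are hypothetical helpers). The buggy variant adds a rank-local padding amount to the global max, so each DP rank can arrive at a different `max_num_tokens_across_dp`; the fixed variant pads first and then takes the max, so all ranks agree:

```python
def pad_to_graph_size(num_tokens: int, graph_batch_sizes: list[int]) -> int:
    # Hypothetical helper: round num_tokens up to the smallest
    # captured graph batch size that can hold it.
    for size in sorted(graph_batch_sizes):
        if size >= num_tokens:
            return size
    return num_tokens


def buggy_max_across_dp(local_tokens: list[int],
                        graph_batch_sizes: list[int]) -> list[int]:
    # Buggy scheme: each rank adds its OWN graph_pad_size to the global
    # max token count, so the result differs from rank to rank.
    global_max = max(local_tokens)
    return [
        global_max + (pad_to_graph_size(t, graph_batch_sizes) - t)
        for t in local_tokens
    ]


def fixed_max_across_dp(local_tokens: list[int],
                        graph_batch_sizes: list[int]) -> int:
    # Fixed scheme: pad each rank's count first, then take the max,
    # so every rank computes the same max_num_tokens_across_dp.
    padded = [pad_to_graph_size(t, graph_batch_sizes) for t in local_tokens]
    return max(padded)


# Two DP ranks with 3 and 5 tokens, graph sizes {4, 8, 16}:
# buggy gives [6, 8] (ranks disagree), fixed gives 8 on every rank.
print(buggy_max_across_dp([3, 5], [4, 8, 16]))
print(fixed_max_across_dp([3, 5], [4, 8, 16]))
```

In a real deployment the per-rank token counts would be exchanged via an all-reduce/all-gather rather than a Python list, but the ordering of "pad, then reduce" versus "reduce, then pad" is the essence of the fix.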