EngineX/xc-llm-ascend
Directory: xc-llm-ascend/vllm_ascend/torchair/ops
Commit: 2f1aed98ccdb0fcbe1ff4fd0abab225bfd8d0367
Latest commit 9862a23985 (Levi): 【0.11.0-dev】optimization of kimi-k2 in cann8.3 (#4555)
### What this PR does / why we need it?
In CANN 8.3, the npu_moe_gating_top_k operator supports an expert count of 384, so Kimi-K2 can use this operator to get better performance.
---------

Signed-off-by: Levi-JQ <yujinqi2@huawei.com>
Co-authored-by: Levi-JQ <yujinqi2@huawei.com>
2025-12-09 08:49:15 +08:00
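For context, the following is a minimal PyTorch sketch of what a MoE gating top-k step computes: softmax over the router logits, per-token top-k expert selection, and renormalization of the selected weights. The fused npu_moe_gating_top_k kernel performs this kind of routing in a single NPU operator; the shapes, the value of k, and the renormalization shown here are illustrative assumptions, not the exact semantics of that kernel.

```python
import torch

# Illustrative shapes: 16 tokens routed over 384 experts (the expert count
# kimi-k2 needs), selecting the top 8 experts per token. These values are
# assumptions for the sketch, not taken from the PR.
num_tokens, num_experts, top_k = 16, 384, 8

router_logits = torch.randn(num_tokens, num_experts)

# Reference gating: softmax over expert logits, then per-token top-k selection.
probs = torch.softmax(router_logits, dim=-1)
topk_weights, topk_ids = torch.topk(probs, k=top_k, dim=-1)

# Renormalize the selected expert weights so they sum to 1 per token.
topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)

print(topk_ids.shape, topk_weights.shape)  # torch.Size([16, 8]) torch.Size([16, 8])
```

A fused kernel avoids materializing the full softmax and top-k as separate ops, which is where the performance gain for large expert counts comes from.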
| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | [1/N][refactor] torchair fused_moe refactor (#2438) | 2025-08-25 15:46:10 +08:00 |
| sequence_parallel.py | [Refactor] [SP]The sequence parallelism characteristics in the MoE and Dense models are integrated into a single solution. (#3085) | 2025-09-24 11:29:59 +08:00 |
| shared_weight_layer.py | [1/N][Feat] Cut down memory usage for o_proj in DeepSeek (#2931) | 2025-09-24 17:16:41 +08:00 |
| torchair_activation.py | [main] mlp weight prefetch in Qwen Dense Models (#2816) | 2025-09-11 21:20:09 +08:00 |
| torchair_fused_moe.py | 【0.11.0-dev】optimization of kimi-k2 in cann8.3 (#4555) | 2025-12-09 08:49:15 +08:00 |
| torchair_layernorm.py | fix deepseek torchair recompile (#3679) | 2025-10-23 22:53:13 +08:00 |
| torchair_rotary_embedding.py | Fix the bugs about operator registration by PyTorch Dispatcher (#2786) | 2025-09-13 11:58:52 +08:00 |
| torchair_vocab_parallel_embedding.py | [Feature] optimize sp & qwen3 next support sp. (#3225) | 2025-10-13 23:02:12 +08:00 |