EngineX/xc-llm-ascend
Directory: xc-llm-ascend/vllm_ascend/torchair/ops
Commit: 2f1aed98ccdb0fcbe1ff4fd0abab225bfd8d0367
Latest commit 9862a23985 (Levi): 【0.11.0-dev】optimization of kimi-k2 in cann8.3 (#4555)
### What this PR does / why we need it?
In CANN 8.3, the npu_moe_gating_top_k operator supports an expert count of 384, so Kimi-K2 can use this operator to get better performance.
---------

Signed-off-by: Levi-JQ <yujinqi2@huawei.com>
Co-authored-by: Levi-JQ <yujinqi2@huawei.com>
2025-12-09 08:49:15 +08:00
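For context, the following is a minimal PyTorch sketch of what a MoE gating top-k step computes: softmax over the router logits, per-token top-k expert selection, and renormalization of the selected weights. The fused npu_moe_gating_top_k kernel performs this kind of routing in a single NPU operator; the shapes, the value of k, and the renormalization shown here are illustrative assumptions, not the exact semantics of that kernel.

```python
import torch

# Illustrative shapes: 16 tokens routed over 384 experts (the expert count
# kimi-k2 needs), selecting the top 8 experts per token. These values are
# assumptions for the sketch, not taken from the PR.
num_tokens, num_experts, top_k = 16, 384, 8

router_logits = torch.randn(num_tokens, num_experts)

# Reference gating: softmax over expert logits, then per-token top-k selection.
probs = torch.softmax(router_logits, dim=-1)
topk_weights, topk_ids = torch.topk(probs, k=top_k, dim=-1)

# Renormalize the selected expert weights so they sum to 1 per token.
topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)

print(topk_ids.shape, topk_weights.shape)  # torch.Size([16, 8]) torch.Size([16, 8])
```

A fused kernel avoids materializing the full softmax and top-k as separate ops, which is where the performance gain for large expert counts comes from.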
| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | [1/N][refactor] torchair fused_moe refactor (#2438) | 2025-08-25 15:46:10 +08:00 |
| sequence_parallel.py | [Refactor] [SP]The sequence parallelism characteristics in the MoE and Dense models are integrated into a single solution. (#3085) | 2025-09-24 11:29:59 +08:00 |
| shared_weight_layer.py | [1/N][Feat] Cut down memory usage for o_proj in DeepSeek (#2931) | 2025-09-24 17:16:41 +08:00 |
| torchair_activation.py | [main] mlp weight prefetch in Qwen Dense Models (#2816) | 2025-09-11 21:20:09 +08:00 |
| torchair_fused_moe.py | 【0.11.0-dev】optimization of kimi-k2 in cann8.3 (#4555) | 2025-12-09 08:49:15 +08:00 |
| torchair_layernorm.py | fix deepseek torchair recompile (#3679) | 2025-10-23 22:53:13 +08:00 |
| torchair_rotary_embedding.py | Fix the bugs about operator registration by PyTorch Dispatcher (#2786) | 2025-09-13 11:58:52 +08:00 |
| torchair_vocab_parallel_embedding.py | [Feature] optimize sp & qwen3 next support sp. (#3225) | 2025-10-13 23:02:12 +08:00 |