EngineX/xc-llm-ascend
main
xc-llm-ascend/vllm_ascend/torchair
Levi 9862a23985 【0.11.0-dev】optimization of kimi-k2 in cann8.3 (#4555)
### What this PR does / why we need it?
In CANN 8.3, the npu_moe_gating_top_k operator supports expert counts up to
384, so Kimi can use this operator for better performance.
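For context, a minimal sketch of what top-k MoE gating computes, in plain NumPy rather than the fused NPU operator (the function name, renormalization step, and shapes here are illustrative assumptions, not the actual `npu_moe_gating_top_k` signature):

```python
import numpy as np

def moe_gating_top_k(router_logits, k):
    """Select the top-k experts per token from router logits and
    renormalize their softmax weights. This is the generic gating
    computation that a fused operator like npu_moe_gating_top_k
    performs in one kernel on the NPU (sketch only)."""
    # numerically stable softmax over the expert dimension
    z = router_logits - router_logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # indices of the k largest probabilities for each token
    topk_idx = np.argsort(-probs, axis=-1)[:, :k]
    topk_w = np.take_along_axis(probs, topk_idx, axis=-1)
    # renormalize so the selected weights sum to 1 per token
    topk_w = topk_w / topk_w.sum(axis=-1, keepdims=True)
    return topk_w, topk_idx

# e.g. 2 tokens routed over 384 experts (the count this PR enables), k=8
logits = np.random.randn(2, 384)
w, idx = moe_gating_top_k(logits, k=8)
```

The point of the fused operator is that the softmax, top-k selection, and renormalization above run as a single kernel instead of several separate ones.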
---------

Signed-off-by: Levi-JQ <yujinqi2@huawei.com>
Co-authored-by: Levi-JQ <yujinqi2@huawei.com>
2025-12-09 08:49:15 +08:00
models                   | [BugFix] Fix torchair+mtp bug after deleting deepseek_mtp. (#3590)                | 2025-10-21 22:23:52 +08:00
ops                      | 【0.11.0-dev】optimization of kimi-k2 in cann8.3 (#4555)                           | 2025-12-09 08:49:15 +08:00
quantization             | 【0.11.0-dev】optimization of kimi-k2 in cann8.3 (#4555)                           | 2025-12-09 08:49:15 +08:00
__init__.py              | [1/4][Refactor] Refactor torchair worker (#1885)                                  | 2025-07-21 11:50:46 +08:00
torchair_attention.py    | [v0.11.0][Perf] Eliminating the zerolike operator through patch (#3632)           | 2025-10-23 14:49:28 +08:00
torchair_mla.py          | [0.11.0][BugFix] Improve the performance of prefixcache features (#4021)          | 2025-11-10 11:51:34 +08:00
torchair_model_runner.py | fix bug when max_seqs=14 in mtp=2 scenario and raise error when cudagraph_capture_sizes can't be an integer multiple of uniform_decode_query_len (#3909) | 2025-10-31 09:25:06 +08:00
torchair_sfa.py          | For nz unset in bf16&fp16 (#4495)                                                 | 2025-11-28 17:32:25 +08:00
torchair_worker.py       | [CI] Upgrade vllm to newest commit (#3182)                                        | 2025-09-26 06:18:15 +08:00
utils.py                 | For nz unset in bf16&fp16 (#4495)                                                 | 2025-11-28 17:32:25 +08:00