EngineX/xc-llm-kunlun
1e1e870a71e4b1eb51b684cb1b4f8a1df88cdae5
xc-llm-kunlun/vllm_kunlun/v1/attention/backends/mla
fromck 74d4f804e8 add 2 kernels and optimize the calculation of topk_indices (#134)
Co-authored-by: chengxiaokang <chengxiaokang@baidu.com>
2026-01-22 10:29:28 +08:00
File                Last commit                                                              Date
__init__.py         [Feature] support deepseek v3/r1/v3.2 (#78)                              2026-01-05 22:55:35 +08:00
common.py           longcontext chunk make attention crash, fix it (#117)                    2026-01-17 18:38:23 +08:00
flashmla_sparse.py  [Misc]Specify that DS32 only supports --kv-cache-dtype bfloat16 (#119)   2026-01-17 16:52:02 +08:00
flashmla.py         enable full cudagraph for deepseek                                       2026-01-12 15:18:12 +08:00
indexer.py          add 2 kernels and optimize the calculation of topk_indices (#134)        2026-01-22 10:29:28 +08:00