EngineX/xc-llm-ascend
xc-llm-ascend/vllm_ascend/attention at commit d1f0df7b4b9ee4b750b63a1d3cade51b1697d932
Latest commit: wangxiyuan d1f0df7b4b Revert "MLA prefill preformance optimization (#5275)" (#5410)
We'll release 0.13.0 soon. The main branch is frozen. Let's revert the newest change and redo it once 0.13.0 is released.
- vLLM version: release/v0.13.0
- vLLM main: 81786c8774
2025-12-27 09:48:56 +08:00
File | Last commit | Last commit date
__init__.py | [Core] Make V1 work and enable V1 engine test (#389) | 2025-03-28 19:34:23 +08:00
attention_cp.py | [Bugfix] Fix Qwen P/D Disaggregation accuracy issue (#5340) | 2025-12-25 22:46:08 +08:00
attention_mask.py | [Model] Support pooling models (#3122) | 2025-12-10 11:37:57 +08:00
attention_v1.py | [bugfix] Fix MHA model runtime error in aclgraph mode (#5397) | 2025-12-26 21:37:28 +08:00
common_cp.py | [Refactor]5/N Extract common code of mla_v1.py & extract mla_cp (#5097) | 2025-12-24 10:25:19 +08:00
mla_cp.py | Revert "MLA prefill preformance optimization (#5275)" (#5410) | 2025-12-27 09:48:56 +08:00
mla_v1.py | [Feature] Remove the transpose step after attention and switch to transpose_batchmatmul (#5390) | 2025-12-26 22:03:46 +08:00
sfa_v1.py | [main][Refactor] Remove with_prefill parameter from set_ascend_forward_context (#5094) | 2025-12-23 14:30:50 +08:00
utils.py | [Refactor]5/N Extract common code of mla_v1.py & extract mla_cp (#5097) | 2025-12-24 10:25:19 +08:00