EngineX/xc-llm-ascend
xc-llm-ascend/vllm_ascend/attention at commit d1f0df7b4b9ee4b750b63a1d3cade51b1697d932
Latest commit: wangxiyuan d1f0df7b4b Revert "MLA prefill preformance optimization (#5275)" (#5410)
We'll release 0.13.0 soon. The main branch is frozen. Let's revert the newest change and redo it once 0.13.0 is released.
- vLLM version: release/v0.13.0
- vLLM main: 81786c8774
2025-12-27 09:48:56 +08:00
File | Last commit | Last commit date
__init__.py | [Core] Make V1 work and enable V1 engine test (#389) | 2025-03-28 19:34:23 +08:00
attention_cp.py | [Bugfix] Fix Qwen P/D Disaggregation accuracy issue (#5340) | 2025-12-25 22:46:08 +08:00
attention_mask.py | [Model] Support pooling models (#3122) | 2025-12-10 11:37:57 +08:00
attention_v1.py | [bugfix] Fix MHA model runtime error in aclgraph mode (#5397) | 2025-12-26 21:37:28 +08:00
common_cp.py | [Refactor]5/N Extract common code of mla_v1.py & extract mla_cp (#5097) | 2025-12-24 10:25:19 +08:00
mla_cp.py | Revert "MLA prefill preformance optimization (#5275)" (#5410) | 2025-12-27 09:48:56 +08:00
mla_v1.py | [Feature] Remove the transpose step after attention and switch to transpose_batchmatmul (#5390) | 2025-12-26 22:03:46 +08:00
sfa_v1.py | [main][Refactor] Remove with_prefill parameter from set_ascend_forward_context (#5094) | 2025-12-23 14:30:50 +08:00
utils.py | [Refactor]5/N Extract common code of mla_v1.py & extract mla_cp (#5097) | 2025-12-24 10:25:19 +08:00