xc-llm-ascend

Files

ZYang6263 b91a5f0968 Support DeepSeekV3.2 with MLAPO operator (#4753)

### What this PR does / why we need it?
This PR adds support for the optimized MLAPO operator in DeepSeek V3.2
(DSV3.2). The operator avoids redundant q_down recomputation.
The operator implementation and optimizations were introduced in PR
[#4707](https://github.com/vllm-project/vllm-ascend/pull/4707).

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: ZYang6263 <zy626375@gmail.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>

2025-12-07 12:40:24 +08:00

__init__.py

[Core] Make V1 work and enable V1 engine test (#389)

2025-03-28 19:34:23 +08:00

attention_cp.py

[Refactor] 1/N Refactor attention_v1 & extract attention_cp (#4628)

2025-12-06 09:33:28 +08:00

attention_mask.py

[Bugfix] Fix model run _npu_flash_attention hang issue (#4410)

2025-11-29 09:20:22 +08:00

attention_v1.py

support async mtp (#4511)