xc-llm-ascend

Files

Song Mingyang 18b90b501d [kernel] add AscendC op: lightning_indexer and sparse_flash_attention (#4625)

### What this PR does / why we need it?
Provide high-performance AscendC operators lightning_indexer and
sparse_flash_attention to improve the execution performance of the
DeepSeek v3.2 model, and adapt both operators to the vllm-ascend
framework.

### Does this PR introduce _any_ user-facing change?
No. This PR only adds underlying operator optimizations.

### How was this patch tested?

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

Signed-off-by: MingYang119 <songmingyang@huawei.com>

2025-12-03 09:53:10 +08:00

__init__.py

[Core] Make V1 work and enable V1 engine test (#389)

2025-03-28 19:34:23 +08:00

attention_mask.py

[Bugfix] Fix model run _npu_flash_attention hang issue (#4410)

2025-11-29 09:20:22 +08:00

attention_v1.py

upgrade vLLM to main (#4608)

2025-12-02 22:10:52 +08:00

mla_v1.py

upgrade vLLM to main (#4608)