xc-llm-ascend

LookAround0301 5ec96fd46c [long_seq_Feat] support chunk prefill (#4158)

### What this PR does / why we need it?
1. Optimize Qwen GQA attention in attention_v1.
2. Refactor DeepSeek MLA: all-gather KV instead of all-gather Q.
3. Refactor the model runner for chunk prefill and remove unused code.

- vLLM version: v0.11.0
- vLLM main: 2918c1b49c

---------

Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Delphine-Nic <tanwenqin@huawei.com>
Co-authored-by: Delphine-Nic <tanwenqin@huawei.com>

2025-11-14 08:43:37 +08:00

__init__.py

[Core] Make V1 work and enable V1 engine test (#389)

2025-03-28 19:34:23 +08:00

attention_mask.py

Upgrade CANN to 8.3.rc1 (#3945)

2025-11-03 20:21:07 +08:00

attention_v1.py

[long_seq_Feat] support chunk prefill (#4158)

2025-11-14 08:43:37 +08:00

mla_v1.py

[long_seq_Feat] support chunk prefill (#4158)