### What this PR does / why we need it?

Fixes a compatibility bug with `torch_npu.npu_fused_infer_attention_score`, described in https://github.com/vllm-project/vllm-ascend/issues/4020. The solution was suggested by @momo609.

Cherry-pick of https://github.com/vllm-project/vllm-ascend/pull/4025.

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

CI passed with newly added and existing tests.

Signed-off-by: Icey <1790571317@qq.com>