xc-llm-ascend

Files

Ting FU 9af34755ff [Bugfix] Fix model run _npu_flash_attention hang issue (#4410)

Fix a hang that occurred when the model ran _npu_flash_attention in
_forward_prefill_no_cache. The hang was caused by a wrong attention mask dtype.
### How was this patch tested?
Yes, tested on Qwen2.5-VL and Qwen2.5-Omni.

- vLLM version: v0.11.0
- vLLM main: 2918c1b49c

Signed-off-by: Ting FU <futing10@huawei.com>

2025-11-29 09:20:22 +08:00

test_attention_mask.py

[Bugfix] Fix model run _npu_flash_attention hang issue (#4410)

2025-11-29 09:20:22 +08:00

test_attention_v1.py

[refact] unified soc_version code (#4359)

2025-11-26 14:28:55 +08:00

test_mla_v1.py

[UT] Fix ut test (#4472)

2025-11-26 21:37:47 +08:00

test_sfa_v1.py

remove get_metadata_cls (#4087)

2025-11-19 14:58:17 +08:00