xc-llm-ascend/vllm_ascend/attention
Ting FU 9af34755ff [Bugfix] Fix model run _npu_flash_attention hang issue (#4410)
Fix a hang in model runs where `_npu_flash_attention` stalls inside
`_forward_prefill_no_cache`; the hang was caused by an attention mask with the wrong dtype.
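Since the root cause was a mask dtype mismatch, here is a minimal sketch of the kind of fix, assuming the causal prefill mask is built directly in the query's floating-point dtype so the kernel never receives a mask of an unexpected type. The helper name `_make_prefill_attn_mask` and its signature are illustrative assumptions, not the exact code from this patch:

```python
import torch


def _make_prefill_attn_mask(seq_len: int, dtype: torch.dtype,
                            device: torch.device) -> torch.Tensor:
    """Build a causal additive mask in the same floating-point dtype as
    the query/key/value tensors handed to the attention kernel."""
    # Mask out future positions (the strict upper triangle) with the most
    # negative representable value for this dtype; allowed positions stay 0.
    return torch.triu(
        torch.full((seq_len, seq_len), torch.finfo(dtype).min,
                   dtype=dtype, device=device),
        diagonal=1,
    )


# Illustrative call site: derive dtype/device from the query tensor rather
# than hard-coding them, so the mask and the inputs always agree.
# mask = _make_prefill_attn_mask(query.shape[0], query.dtype, query.device)
```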
### How was this patch tested?
Yes, tested on Qwen2.5-VL and Qwen2.5-Omni.

- vLLM version: v0.11.0
- vLLM main: 2918c1b49c

Signed-off-by: Ting FU <futing10@huawei.com>
2025-11-29 09:20:22 +08:00