xc-llm-ascend/tests
Ting FU 9af34755ff [Bugfix] Fix model run _npu_flash_attention hang issue (#4410)
Fix a hang in _npu_flash_attention when the model runs through
_forward_prefill_no_cache. The hang was caused by an attention mask with the wrong dtype.
### How was this patch tested?
Tested on Qwen2.5-VL and Qwen2.5-Omni.

- vLLM version: v0.11.0
- vLLM main: 2918c1b49c

Signed-off-by: Ting FU <futing10@huawei.com>
2025-11-29 09:20:22 +08:00