xc-llm-ascend

Files

Ting FU 9af34755ff [Bugfix] Fix model run _npu_flash_attention hang issue (#4410)

Fix a hang when the model runs _npu_flash_attention in
_forward_prefill_no_cache. The hang was caused by a wrong attention mask dtype.
### How was this patch tested?
Tested on Qwen2.5-VL and Qwen2.5-Omni.

- vLLM version: v0.11.0
- vLLM main: 2918c1b49c

Signed-off-by: Ting FU <futing10@huawei.com>

2025-11-29 09:20:22 +08:00

e2e

【OPS】qwen3-next support triton chunk_gated_delta_rule ops (#4070)

2025-11-28 20:55:43 +08:00

[Bugfix] Fix model run _npu_flash_attention hang issue (#4410)

2025-11-29 09:20:22 +08:00

__init__.py

[SpecDecode] Add spec decode support (#500)

2025-04-17 20:16:32 +08:00