xc-llm-ascend

Files

elilzhu f9535cc9e2 [BugFix] fix qwenVL quant assertion error (#3466)

### What this PR does / why we need it?
This PR fixes two issues:
1. In multimodal scenarios, weight prefetching could not be performed and an assertion error was raised.
2. The `grid_thw` data type of Qwen2-VL is standardized to `torch.int32`.
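The PR description does not show how the cast is applied. The snippet below is a minimal sketch of what normalizing `grid_thw` to `torch.int32` can look like. It assumes `grid_thw` holds one `(t, h, w)` patch-grid triple per image, as in Qwen2-VL, and the helper name `normalize_grid_thw` is hypothetical. It is not the code from this PR.

```python
import torch


def normalize_grid_thw(grid_thw) -> torch.Tensor:
    """Hypothetical helper: return grid_thw as an int32 tensor of shape (N, 3).

    Each row is assumed to be a (t, h, w) patch-grid triple for one image/video.
    """
    # Accept Python lists or tensors of any integer dtype (e.g. int64).
    grid = torch.as_tensor(grid_thw)
    if grid.dim() == 1:
        # A single (t, h, w) triple becomes a batch of one: shape (1, 3).
        grid = grid.unsqueeze(0)
    if grid.dim() != 2 or grid.shape[-1] != 3:
        raise ValueError(f"expected grid_thw of shape (N, 3), got {tuple(grid.shape)}")
    # Cast to one fixed dtype so downstream code always sees torch.int32.
    return grid.to(torch.int32)


# Usage: an int64 grid from a preprocessor is converted to int32.
grid = normalize_grid_thw(torch.tensor([[1, 16, 16], [2, 8, 4]], dtype=torch.int64))
print(grid.dtype)  # torch.int32
print(grid.prod(dim=-1).tolist())  # patches per item: [256, 64]
```

Fixing a single dtype at the boundary means later code that compares, concatenates, or indexes with `grid_thw` does not have to handle a mix of `int32` and `int64` tensors.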

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
- CI and e2e tests

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: elilzhu <2435754260@qq.com>
Co-authored-by: zhulei (AK) <z00692222@china.huawei.com>

2025-10-16 17:08:00 +08:00

layers

[Feat] Flash comm allgher ep (#3334)

2025-10-15 19:36:32 +08:00

__init__.py

[MoE] [Refactor] Combine common_fused_moe and fused_moe (#3176)

2025-10-09 14:12:46 +08:00

deepseek_mtp.py

[KVCache][Bugfix] Fix kv cache initialization error of attention layer (#3113)

2025-09-24 11:32:34 +08:00

deepseek_v2.py

[Feat] Flash comm allgher ep (#3334)