[BugFix][main] Adapted Qwen3-Next-MTP to chunked prefill (#4770)
### What this PR does / why we need it?
The pad `-1` modification is from
https://github.com/vllm-project/vllm/pull/25743.
It still has bugs for batched chunked prefill.
- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c
Signed-off-by: drslark <slarksblood@qq.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
This commit is contained in:
@@ -32,3 +32,4 @@ import vllm_ascend.patch.worker.patch_qwen2_5_vl # noqa
|
||||
import vllm_ascend.patch.worker.patch_qwen2_5_omni # noqa
|
||||
import vllm_ascend.patch.worker.patch_qwen3_vl # noqa
|
||||
import vllm_ascend.patch.worker.patch_rope # noqa
|
||||
import vllm_ascend.patch.worker.patch_qwen3_next_mtp # noqa
|
||||
|
||||
Reference in New Issue
Block a user