### What this PR does / why we need it?
Running multimodal model with ascend scheduler may cause assert error
【assert (request.num_tokens - request.num_computed_tokens) == 1】
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main:
17b4c6685c
---------
Signed-off-by: fan2956 <zhoufan53@huawei.com>