### What this PR does / why we need it?

Cherry-pick from https://github.com/vllm-project/vllm-ascend/pull/4022

A bug in the code caused an empty bubble (an idle gap). When the `npu_paged_cache_load` operator was called, it forced `seq_len2` to be transferred to the device. That transfer triggered a synchronization, which interrupted the CPU-side operator launch stream.

---------

Signed-off-by: underfituu <hzhucong@163.com>
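
To illustrate why a forced synchronization in the middle of the launch stream can produce a bubble, here is a minimal toy model in plain Python. It is not the vllm-ascend code and does not call any NPU API; the function `device_idle_time` and its parameters (`launch_cost`, `exec_cost`, `sync_after`) are hypothetical names chosen for this sketch. It assumes the host enqueues operators asynchronously, the device runs them in order, and a synchronization blocks the host until the device has finished everything launched so far.

```python
def device_idle_time(launch_cost, exec_cost, n_ops, sync_after=None):
    """Toy timeline: return how long the device sits idle between operators.

    launch_cost: host time needed to enqueue one operator (hypothetical unit).
    exec_cost:   device time needed to execute one operator.
    sync_after:  index of the operator after which the host blocks until the
                 device has drained its queue; None means the host never syncs.
    """
    host_time = 0.0        # when the host finishes its current launch
    device_free_at = 0.0   # when the device finishes its current operator
    idle = 0.0
    for i in range(n_ops):
        host_time += launch_cost                   # host enqueues operator i
        start = max(host_time, device_free_at)     # device needs the op queued
        if i > 0:
            idle += start - device_free_at         # gap spent waiting on the host
        device_free_at = start + exec_cost
        if sync_after is not None and i == sync_after:
            # Blocking sync: the host cannot launch ahead any more.
            host_time = max(host_time, device_free_at)
    return idle


no_sync = device_idle_time(1.0, 3.0, 4)
with_sync = device_idle_time(1.0, 3.0, 4, sync_after=1)
print(no_sync, with_sync)
```

In this model the host normally runs ahead of the device, so the device never waits. With a synchronization after the second operator, the host can only start the next launch once the device is already idle, and the device waits one full `launch_cost` (1.0 time unit here) before it can continue.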