[Perf] fix async copy for async scheduling (#4113)

### What this PR does / why we need it?

Only CPU tensors created with `pin_memory=True` can be copied to the device asynchronously. Two call sites currently copy non-pinned CPU tensors to the device. Those copies fall back to synchronous transfers, which reduces the expected benefit of asynchronous scheduling.

- vLLM version: v0.11.0
- vLLM main: 83f478bb19

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
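A minimal sketch of the pinned-memory requirement described above, not the code from this PR. It assumes plain PyTorch with CUDA standing in for the Ascend device; the helper names `make_staging_buffer` and `copy_to_device` are hypothetical. On a machine without an accelerator it falls back to pageable memory and a blocking copy.

```python
import torch


def make_staging_buffer(num_elems: int, dtype=torch.int32) -> torch.Tensor:
    """Allocate a CPU staging buffer, pinned when an accelerator is present.

    Assumption: CUDA availability is used as the stand-in check; the PR
    itself passes a `pin_memory` flag held by the BlockTable instance.
    """
    pin = torch.cuda.is_available()
    return torch.zeros(num_elems, dtype=dtype, device="cpu", pin_memory=pin)


def copy_to_device(src_cpu: torch.Tensor, dst: torch.Tensor) -> torch.Tensor:
    """Copy src_cpu into dst, requesting an async copy only for pinned sources.

    With a pageable (non-pinned) source, a host-to-device copy generally
    blocks the host even when non_blocking=True is passed, so the request
    is only made when the source is actually pinned.
    """
    dst.copy_(src_cpu, non_blocking=src_cpu.is_pinned())
    return dst


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    staging = make_staging_buffer(16)
    target = torch.empty(16, dtype=torch.int32, device=device)
    copy_to_device(staging, target)
    print("pinned:", staging.is_pinned(), "device:", target.device)
```

Checking `is_pinned()` at the copy site makes an accidental pageable source easy to spot, which is the kind of silent fallback this PR removes.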
2025-11-13 09:11:26 +08:00
parent c272747d13
commit 6bc770cd78
2 changed files with 4 additions and 5 deletions
--- a/vllm_ascend/worker/block_table.py
+++ b/vllm_ascend/worker/block_table.py
@@ -100,7 +100,7 @@ class BlockTable:
        self.slot_mapping_cpu = torch.zeros(
            self.max_num_batched_tokens +
            2 * self.pcp_world_size * self.max_num_reqs,
-            dtype=torch.int64,
+            dtype=torch.int32,
            device="cpu",
            pin_memory=self.pin_memory)
        self.slot_mapping_np = self.slot_mapping_cpu.numpy()