[BugFix] Improve the performance of prefixcache features (#4022)

### What this PR does / why we need it?
A bug caused idle bubbles in the execution timeline. Each call to the
npu_paged_cache_load operator forcibly transferred seq_len2 to the
device, which triggered a host-device synchronization and stalled the
CPU-side operator launch stream.
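
The general pattern behind this kind of fix can be sketched without any accelerator. The example below is an illustration only, not the change made in this PR (the diff for the operator call is not shown here). `FakeDevice`, `load_cache_per_call`, and `load_cache_hoisted` are hypothetical names, and the blocking copy is simulated with a counter. It assumes the same host-side sequence lengths are needed by every layer within one step.

```python
# Illustrative sketch only: FakeDevice and both functions are hypothetical
# and do not come from vllm-ascend or torch_npu.


class FakeDevice:
    """Stand-in for an accelerator; counts blocking host-to-device copies."""

    def __init__(self):
        self.sync_copies = 0

    def to_device(self, host_values):
        # In a real runtime, a blocking copy like this can force the host
        # to wait for the device, which stalls further operator launches.
        self.sync_copies += 1
        return tuple(host_values)


def load_cache_per_call(device, seq_lens_host, num_layers):
    """Anti-pattern: every layer's call re-transfers the same host data."""
    for _ in range(num_layers):
        seq_lens_dev = device.to_device(seq_lens_host)
        _ = sum(seq_lens_dev)  # placeholder for the per-layer operator


def load_cache_hoisted(device, seq_lens_host, num_layers):
    """Transfer once per step, then reuse the device-side copy."""
    seq_lens_dev = device.to_device(seq_lens_host)
    for _ in range(num_layers):
        _ = sum(seq_lens_dev)  # placeholder for the per-layer operator


per_call, hoisted = FakeDevice(), FakeDevice()
load_cache_per_call(per_call, [128, 256], num_layers=4)
load_cache_hoisted(hoisted, [128, 256], num_layers=4)
print(per_call.sync_copies, hoisted.sync_copies)  # prints "4 1"
```

Hoisting the transfer reduces the number of blocking copies from one per layer to one per step, which is the kind of saving that removes launch-stream stalls.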

- vLLM version: v0.11.0
- vLLM main: 83f478bb19

---------

Signed-off-by: underfituu <hzhucong@163.com>
hucong
2025-11-08 18:45:31 +08:00
committed by GitHub
parent 1d81a289d0
commit 48094148f8
5 changed files with 31 additions and 10 deletions

@@ -119,3 +119,4 @@ jobs:
config_file_path: ${{ matrix.test_config.config_file_path }}
secrets:
KUBECONFIG_B64: ${{ secrets.KUBECONFIG_B64 }}