look up multi_tp key (#3699)

### What this PR does / why we need it?

In multi-Tensor Parallel (TP) scenarios, the KV pool queried only the first GPU card. If keys on other cards had already been released, the query still reported success, which caused accuracy issues. This PR changes the KV pool's query logic to check all cards, which resolves the problem.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: fems14 <1804143737@qq.com>
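
The lookup change described in this commit message can be illustrated with a minimal sketch. It is not the code from this PR: `FakeStore`, `keys_for_all_ranks`, `count_hit_blocks`, and the `@tp_rank:` key suffix are hypothetical names invented for illustration. The only detail taken from the diff is that `batch_is_exist` returns one integer per key, with `1` meaning the key is present.

```python
# Hypothetical stand-in for the real store. Only the "1 means present"
# convention comes from the diff, where exists() compares the result to 1.
class FakeStore:
    def __init__(self, present):
        self.present = set(present)

    def batch_is_exist(self, keys):
        # One integer flag per key: 1 if present, 0 otherwise.
        return [1 if k in self.present else 0 for k in keys]


def keys_for_all_ranks(base_keys, tp_size):
    # Assumed key layout: one store key per (block, TP rank) pair,
    # grouped by block so each block's ranks are contiguous.
    return [f"{k}@tp_rank:{r}" for k in base_keys for r in range(tp_size)]


def count_hit_blocks(store, base_keys, tp_size):
    """Return how many leading blocks are present on every TP rank."""
    flags = store.batch_is_exist(keys_for_all_ranks(base_keys, tp_size))
    hits = 0
    for i in range(len(base_keys)):
        per_rank = flags[i * tp_size:(i + 1) * tp_size]
        # A block counts as a hit only if every rank still holds its key.
        if not all(f == 1 for f in per_rank):
            break
        hits += 1
    return hits


# Block b1 has been released on rank 1 but is still present on rank 0.
store = FakeStore({"b0@tp_rank:0", "b0@tp_rank:1", "b1@tp_rank:0"})
rank0_only = count_hit_blocks(store, ["b0", "b1"], tp_size=1)
all_ranks = count_hit_blocks(store, ["b0", "b1"], tp_size=2)
```

In this sketch, checking only rank 0 reports both blocks as hits, while checking every rank stops at the first block that is missing on any rank.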
2025-10-24 17:23:36 +08:00
parent c83efcb9e4
commit 82a4970fe9
3 changed files with 95 additions and 22 deletions
--- a/vllm_ascend/distributed/mooncake/mooncake_store.py
+++ b/vllm_ascend/distributed/mooncake/mooncake_store.py
@@ -68,7 +68,7 @@ class Mooncakestore():
    def exists(self, key: MooncakeEngineKey) -> bool:
        return self.store.is_exist(key.to_string()) == 1

-    def batch_exists(self, keys: list[str]) -> list[bool]:
+    def batch_exists(self, keys: list[str]) -> list[int]:
        return self.store.batch_is_exist(keys)

    def register_buffer(self, ptr, length):