[bugfix] Fixing KV Pool Memory Retention and Performance Degradation Issues (#5751)

### What this PR does / why we need it?
1. Fixed memory retention on certain GPUs caused by missing PUT operations.
2. Fixed performance degradation resulting from architectural incompatibilities in the underlying refactor.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef

---------

Signed-off-by: fems14 <1804143737@qq.com>
2026-01-09 17:46:23 +08:00
parent 3ba064f804
commit ff4c1a47b3
6 changed files with 27 additions and 22 deletions
--- a/tests/ut/distributed/mooncake/test_config_data.py
+++ b/tests/ut/distributed/mooncake/test_config_data.py
@@ -6,6 +6,9 @@ from unittest.mock import MagicMock
 fake_engine = types.ModuleType("mooncake.engine")
 fake_engine.TransferEngine = MagicMock()  # type: ignore[attr-defined]
 sys.modules["mooncake.engine"] = fake_engine
+fake_store = types.ModuleType("mooncake.store")
+fake_store.ReplicateConfig = MagicMock()  # type: ignore[attr-defined]
+sys.modules["mooncake.store"] = fake_store

 from vllm_ascend.distributed.kvpool.backend.mooncake_backend import (  # noqa: E402
    _convert_to_bytes, _parse_global_segment_size)
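
Note on the hunk above: the test registers a stub for `mooncake.store` in `sys.modules` before importing the backend, presumably because the backend now imports `ReplicateConfig` at module load. The following is a minimal, standalone sketch of that stubbing pattern. The module name `example_pkg.store` and the attribute `replica_num` are hypothetical and chosen only for illustration; they are not taken from this PR or from the Mooncake API.

```python
import sys
import types
from unittest.mock import MagicMock

# Hypothetical module name, used only for this illustration.
STUB_NAME = "example_pkg.store"

# Build an empty module object and attach a mock in place of the real class.
fake_store = types.ModuleType(STUB_NAME)
fake_store.ReplicateConfig = MagicMock()  # stands in for the real class

# Register the stub BEFORE any code tries to import it.
sys.modules[STUB_NAME] = fake_store

# The import system finds the stub in sys.modules and does not search disk,
# so this succeeds even though no such package is installed.
from example_pkg.store import ReplicateConfig  # noqa: E402

# Calling the mock returns another MagicMock that accepts any attribute.
config = ReplicateConfig()
config.replica_num = 1  # hypothetical attribute, not a documented field
```

The ordering matters: if the module under test is imported before the stub is registered, the import fails with `ModuleNotFoundError` on machines where the real dependency is absent.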