[bugfix] Fixing KV Pool Memory Retention and Performance Degradation Issues (#5751)

### What this PR does / why we need it?
1. Fixed memory retention on certain GPUs caused by missing PUT
operations.

2. Fixed performance degradation resulting from architectural
incompatibilities in the underlying refactor.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef

---------

Signed-off-by: fems14 <1804143737@qq.com>
fems14
2026-01-09 17:46:23 +08:00
committed by GitHub
parent 3ba064f804
commit ff4c1a47b3
6 changed files with 27 additions and 22 deletions

@@ -6,6 +6,9 @@ from unittest.mock import MagicMock
fake_engine = types.ModuleType("mooncake.engine")
fake_engine.TransferEngine = MagicMock() # type: ignore[attr-defined]
sys.modules["mooncake.engine"] = fake_engine
fake_store = types.ModuleType("mooncake.store")
fake_store.ReplicateConfig = MagicMock() # type: ignore[attr-defined]
sys.modules["mooncake.store"] = fake_store
from vllm_ascend.distributed.kvpool.backend.mooncake_backend import ( # noqa: E402
_convert_to_bytes, _parse_global_segment_size)
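
The hunk above registers a stub `mooncake.store` module in `sys.modules` before the backend is imported, so the test can run without the real dependency installed. A minimal, self-contained sketch of that stubbing pattern follows. The package name `fakepkg_demo` is hypothetical and used only for illustration; it is not part of this PR.

```python
import sys
import types
from unittest.mock import MagicMock

# Hypothetical package name for illustration only; the PR stubs "mooncake.store".
fake_pkg = types.ModuleType("fakepkg_demo")
fake_store = types.ModuleType("fakepkg_demo.store")

# Expose the attribute that the code under test imports by name.
fake_store.ReplicateConfig = MagicMock()
fake_pkg.store = fake_store

# Register the stubs before anything tries to import them.
sys.modules["fakepkg_demo"] = fake_pkg
sys.modules["fakepkg_demo.store"] = fake_store

# The import system finds the entry in sys.modules and returns the stub.
from fakepkg_demo.store import ReplicateConfig  # noqa: E402

# Calling the mock returns its (shared) return_value object.
config = ReplicateConfig()
```

Because the stub must be in place before the first import of the module under test, the registration lines sit above the `# noqa: E402` import, as in the diff.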