[bugfix] Fixing KV Pool Memory Retention and Performance Degradation Issues (#5751)
### What this PR does / why we need it?
1. Fixed memory retention on certain GPUs caused by missing PUT
   operations.
2. Fixed performance degradation resulting from architectural
   incompatibilities in the underlying refactor.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef
---------
Signed-off-by: fems14 <1804143737@qq.com>
@@ -6,6 +6,9 @@ from unittest.mock import MagicMock
fake_engine = types.ModuleType("mooncake.engine")
fake_engine.TransferEngine = MagicMock()  # type: ignore[attr-defined]
sys.modules["mooncake.engine"] = fake_engine
fake_store = types.ModuleType("mooncake.store")
fake_store.ReplicateConfig = MagicMock()  # type: ignore[attr-defined]
sys.modules["mooncake.store"] = fake_store

from vllm_ascend.distributed.kvpool.backend.mooncake_backend import (  # noqa: E402
    _convert_to_bytes, _parse_global_segment_size)
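
The hunk above registers stand-in modules in `sys.modules` before importing the backend, so the test file can be imported on machines where `mooncake` is not installed. The same pattern can be sketched in isolation as follows. This is only an illustration: `heavy_dep`, `heavy_dep.engine`, and `Engine` are hypothetical names and are not part of this PR or of mooncake.

```python
import sys
import types
from unittest.mock import MagicMock

# Hypothetical package and attribute names, used only for illustration.
fake_pkg = types.ModuleType("heavy_dep")
fake_engine = types.ModuleType("heavy_dep.engine")
fake_engine.Engine = MagicMock()  # stand-in for the real class
fake_pkg.engine = fake_engine

# Register the stubs before any import that refers to them.
# Registering the parent as well keeps a plain `import heavy_dep` working.
sys.modules["heavy_dep"] = fake_pkg
sys.modules["heavy_dep.engine"] = fake_engine

# The import system finds the stub in sys.modules and never touches disk.
from heavy_dep.engine import Engine  # noqa: E402

engine = Engine()
engine.connect("localhost")

# MagicMock records every call, so the test can assert on the interaction.
Engine.assert_called_once_with()
engine.connect.assert_called_once_with("localhost")
```

Because the stubs must exist before the module under test is imported, the real import has to come after the `sys.modules` assignments, which is why the import in the hunk carries `# noqa: E402`.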