[bugfix] Fixing KV Pool Memory Retention and Performance Degradation Issues (#5751)

### What this PR does / why we need it?
1. Fixed memory retention on certain GPUs caused by missing PUT
operations.

2. Fixed performance degradation resulting from architectural
incompatibilities in the underlying refactor.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef

---------

Signed-off-by: fems14 <1804143737@qq.com>
fems14
2026-01-09 17:46:23 +08:00
committed by GitHub
parent 3ba064f804
commit ff4c1a47b3
6 changed files with 27 additions and 22 deletions


@@ -223,6 +223,8 @@ class LoadSpec:
    # Whether the scheduler allow us to load the tokens
    can_load: bool
    token_len: int = 0
@dataclass
class RequestTracker: