[Bugfix][PD] Auto-clear producer KV cache if no pull notification (#2174)
### What this PR does / why we need it?
This PR addresses a critical issue where Node D (Device) failures cause
Node P (Processor) to hang due to inability to release KV cache.
**Trigger Scenarios:**
1. Node D fails mid-inference (e.g., network disconnection)
2. Node D rejects requests at a certain stage (e.g., via API server)
3. Load-test script termination causes Node P or D to abort queued
requests
**Root Cause Analysis:**
1. Currently, Node D sends a "KV cache pull complete, release approved"
message to Node P
2. This message is transmitted via the worker connector. If PD
connection breaks or requests are rejected upstream, Node D cannot send
the message
3. Node P will never release KV cache without receiving this message
**Solution:**
Following VLLM community's approach (NIXL connector timeout mechanism),
we're implementing:
- A timeout mechanism with comprehensive warnings
- Updated README documentation
- Reference: VLLM's optimization PR
[#20139](https://github.com/vllm-project/vllm/pull/20139)
### Does this PR introduce _any_ user-facing change?
None
### How was this patch tested?
None
- vLLM version: v0.10.2
- vLLM main:
9607d5eb44
---------
Signed-off-by: underfituu <hzhucong@163.com>
This commit is contained in:
@@ -19,6 +19,7 @@ import numpy.typing as npt
|
||||
import torch
|
||||
import zmq
|
||||
from mooncake.engine import TransferEngine # type: ignore
|
||||
from vllm import envs
|
||||
from vllm.config import VllmConfig
|
||||
from vllm.distributed.kv_transfer.kv_connector.v1.base import (
|
||||
KVConnectorBase_V1, KVConnectorMetadata, KVConnectorRole)
|
||||
@@ -100,7 +101,7 @@ class KVCacheTaskTracker:
|
||||
while self.delayed_free_requests:
|
||||
request_id, delay_start_time = self.delayed_free_requests[0]
|
||||
if (current_time - delay_start_time
|
||||
> envs_ascend.VLLM_ASCEND_KVCACHE_DELAY_FREE_TIMEOUT):
|
||||
> envs.VLLM_NIXL_ABORT_REQUEST_TIMEOUT):
|
||||
self.delayed_free_requests.popleft()
|
||||
expired_requests.add(request_id)
|
||||
logger.info("Force freed request: %s", request_id)
|
||||
|
||||
Reference in New Issue
Block a user