[Bugfix][PD] Auto-clear producer KV cache if no pull notification (#2174)
### What this PR does / why we need it?
This PR fixes a critical issue where a failure on Node D (the decode node)
causes Node P (the prefill node) to hang because it can never release its
KV cache.
**Trigger Scenarios:**
1. Node D fails mid-inference (e.g., network disconnection)
2. Node D rejects requests at a certain stage (e.g., via API server)
3. Load-test script termination causes Node P or D to abort queued
requests
**Root Cause Analysis:**
1. Currently, Node D sends a "KV cache pull complete, release approved"
message to Node P once it has pulled the KV cache
2. This message is transmitted via the worker connector. If the P-D
connection breaks, or requests are rejected upstream, Node D cannot send
the message
3. Node P only releases the KV cache when it receives this message, so
without it the blocks are held indefinitely
**Solution:**
Following the vLLM community's approach (the NIXL connector timeout
mechanism), this PR adds:
- A timeout mechanism: if blocks marked for delayed release on Node P are
not freed within the timeout, they are forcibly released and a warning is
logged
- Updated README documentation
- Reference: vLLM's optimization PR
[#20139](https://github.com/vllm-project/vllm/pull/20139)
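The timeout-based release described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `DelayedFreeTracker` class and its method names are hypothetical, and the 250-second value simply mirrors the default of `VLLM_ASCEND_KVCACHE_DELAY_FREE_TIMEOUT` that appears in the diff.

```python
import logging
import time
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


class DelayedFreeTracker:
    """Hypothetical helper: tracks requests on Node P whose KV blocks are
    waiting for a "pull complete" notification from Node D."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        # request id -> monotonic deadline after which blocks are force-freed
        self._deadlines: Dict[str, float] = {}

    def mark_delayed(self, req_id: str, now: Optional[float] = None) -> None:
        """Record that req_id's blocks are held until Node D pulls them."""
        now = time.monotonic() if now is None else now
        self._deadlines[req_id] = now + self.timeout_s

    def notify_pulled(self, req_id: str) -> bool:
        """Normal path: Node D confirmed the pull, so stop tracking.
        Returns False if the request was unknown or already expired."""
        return self._deadlines.pop(req_id, None) is not None

    def pop_expired(self, now: Optional[float] = None) -> List[str]:
        """Fallback path: return (and stop tracking) every request whose
        deadline has passed, so the caller can release its blocks."""
        now = time.monotonic() if now is None else now
        expired = [r for r, d in self._deadlines.items() if d <= now]
        for req_id in expired:
            del self._deadlines[req_id]
            logger.warning(
                "No pull notification for request %s within %s s; "
                "forcibly releasing its KV cache blocks.",
                req_id, self.timeout_s)
        return expired


if __name__ == "__main__":
    tracker = DelayedFreeTracker(timeout_s=250)
    tracker.mark_delayed("req-1", now=0.0)
    tracker.mark_delayed("req-2", now=0.0)
    tracker.notify_pulled("req-1")          # Node D pulled req-1 in time
    print(tracker.pop_expired(now=300.0))   # req-2 timed out
```

Under this sketch, Node P would call `pop_expired()` periodically (for example once per scheduling step) and free the blocks of every returned request, so a lost notification delays the release by at most the timeout instead of blocking it forever.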
### Does this PR introduce _any_ user-facing change?
None
### How was this patch tested?
None
- vLLM version: v0.10.2
- vLLM main: 9607d5eb44
---------
Signed-off-by: underfituu <hzhucong@163.com>
@@ -159,11 +159,6 @@ env_variables: Dict[str, Callable[[], Any]] = {
     # caused by the initialization of the Mooncake connector.
     "PHYSICAL_DEVICES":
     lambda: os.getenv("PHYSICAL_DEVICES", None),
-    # Timeout (in seconds) for delayed KVCache block release. In the prefill
-    # node, if a request is marked for delayed KV block release and the blocks
-    # are not freed within this timeout, they will be forcibly released.
-    "VLLM_ASCEND_KVCACHE_DELAY_FREE_TIMEOUT":
-    lambda: int(os.getenv("VLLM_ASCEND_KVCACHE_DELAY_FREE_TIMEOUT", 250)),
 }
 
 # end-env-vars-definition
@@ -177,4 +172,4 @@ def __getattr__(name: str):
 
 
 def __dir__():
-    return list(env_variables.keys())
+    return list(env_variables.keys())