[P/D] Fix node crash caused by a secondary release of a force-freed request (#5968)
### What this PR does / why we need it?
Releasing a request a second time after it has already been force-freed
causes the node to crash. When a request is pulled so quickly that it is no
longer in `reqs_to_process`, it should not be added to the delayed-free
queue.
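
The guard can be sketched in isolation as follows. This is a minimal illustration, not the actual implementation: `TrackerSketch` is a hypothetical stand-in for `KVCacheTaskTracker`, and the container types (a set for `reqs_to_process`, a dict for `delayed_free_requests`) are assumptions, since the diff only shows the attribute names.

```python
import threading
import time


class TrackerSketch:
    """Hypothetical stand-in for KVCacheTaskTracker; models only the guarded path."""

    def __init__(self):
        self.done_task_lock = threading.Lock()
        # Assumed types: a set of request IDs still being processed,
        # and a dict mapping request ID -> time the delayed free started.
        self.reqs_to_process = set()
        self.delayed_free_requests = {}

    def add_delayed_request(self, request_id: str, delay_start_time: float):
        """Queue a delayed free only for requests that are still tracked."""
        with self.done_task_lock:
            if request_id in self.reqs_to_process:
                self.delayed_free_requests[request_id] = delay_start_time


tracker = TrackerSketch()
tracker.reqs_to_process.add("req-a")  # still tracked
now = time.time()
tracker.add_delayed_request("req-a", now)  # queued
tracker.add_delayed_request("req-b", now)  # not tracked, so skipped
```

In this sketch, `req-b` stands for a request that was already pulled and released, so it never enters the delayed-free queue and cannot be freed a second time when the queue is drained.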
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By CI.
- vLLM version: v0.13.0
- vLLM main: 2c24bc6996
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
@@ -152,7 +152,8 @@ class KVCacheTaskTracker:
     def add_delayed_request(self, request_id: str, delay_start_time: float):
         """Add a delayed free request."""
         with self.done_task_lock:
-            self.delayed_free_requests[request_id] = delay_start_time
+            if request_id in self.reqs_to_process:
+                self.delayed_free_requests[request_id] = delay_start_time
 
     def _retrieve_expired_requests(self):
         """Retrieve all expired delayed requests."""