[ModelRunner] Use shared CachedRequestData across requests to fix CI (#1546)
### What this PR does / why we need it?
This PR (adapted from
2863befce3)
updates the CachedRequestData definition to use a single instance shared
across all requests in a batch, instead of creating a new instance per
request.
CI was broken by a change to vLLM's model_runner: `ERROR 07-01 09:53:53
[core.py:521] TypeError: 'CachedRequestData' object is not iterable`.
This PR modifies the model_runner to fix it.
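A minimal sketch of why the error appears and how code can handle both shapes. The two classes below are hypothetical stand-ins, not vLLM's real definitions. The field names `req_ids` and `num_computed_tokens` and the helper `iter_cached` are assumptions for illustration; only `num_reqs` is taken from the test change in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class PerRequestCachedData:
    """Hypothetical older shape: one object per cached request."""
    req_id: str
    num_computed_tokens: int


@dataclass
class BatchedCachedData:
    """Hypothetical newer shape: one object shared by the whole batch,
    holding parallel lists indexed by request position."""
    req_ids: list[str] = field(default_factory=list)
    num_computed_tokens: list[int] = field(default_factory=list)

    @property
    def num_reqs(self) -> int:
        # Number of cached requests in this batch.
        return len(self.req_ids)


def iter_cached(cached):
    """Yield (req_id, num_computed_tokens) pairs for either shape."""
    if isinstance(cached, list):
        # Older shape: a list of per-request objects.
        for item in cached:
            yield item.req_id, item.num_computed_tokens
    else:
        # Newer shape: a single object, so index into its parallel lists.
        for i in range(cached.num_reqs):
            yield cached.req_ids[i], cached.num_computed_tokens[i]
```

Code written as `for req in scheduled_cached_reqs` works only with the list shape. The single shared object defines no `__iter__`, so the same loop raises the `TypeError` quoted above.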
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
A passing CI run will verify this.
---------
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
@@ -192,7 +192,10 @@ def test_schedule(enable_prefix_caching: Optional[bool],
     # Test initial scheduling
     output = scheduler.schedule()
     assert len(output.scheduled_new_reqs) == len(requests)
-    assert len(output.scheduled_cached_reqs) == 0
+    if vllm_version_is("0.9.1"):
+        assert len(output.scheduled_cached_reqs) == 0
+    else:
+        assert output.scheduled_cached_reqs.num_reqs == 0
     assert len(output.finished_req_ids) == 0
     # Verify all requests are scheduled.
     for req_id, num_tokens in output.num_scheduled_tokens.items():