[P/D] layerwise connector support recompute scheduler (#5900)
### What this PR does / why we need it?
Adds recompute scheduler support to the layerwise connector.
NOTE:
Triggering a recompute invokes the tokenizer again, which may lead to
precision fluctuations in the output.
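
One way re-tokenization can change results, sketched with a toy tokenizer: decoding generated token ids to text and encoding that text again does not always return the original ids. The vocabulary and the `encode`/`decode` helpers below are hypothetical stand-ins, not the tokenizer vLLM uses, and this is only an assumed explanation of the fluctuation mentioned in the note.

```python
# Toy vocabulary (hypothetical): "ab" exists as one merged token next to "a" and "b".
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_PIECE = {token_id: piece for piece, token_id in VOCAB.items()}


def decode(token_ids):
    """Concatenate the text pieces of the given token ids."""
    return "".join(ID_TO_PIECE[t] for t in token_ids)


def encode(text):
    """Greedy longest-match tokenization over the toy vocabulary."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate piece first, then shorter ones.
        for length in range(len(text) - pos, 0, -1):
            piece = text[pos:pos + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
    return ids


# The model produced "a" then "b" as two separate tokens.
generated = [0, 1]
text = decode(generated)

# Tokenizing the same text again yields the single merged token instead.
retokenized = encode(text)
```

Both id sequences decode to the same text, but a model conditioned on `retokenized` sees a different input than one conditioned on `generated`, so later logits can differ slightly.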
[RFC]: CDCP Scheduling for Disaggregated Prefilling with KV Cache Layerwise Push Support
https://github.com/vllm-project/vllm-ascend/issues/4842
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: bde38c11df
---------
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com>
@@ -642,7 +642,7 @@ class RecomputeScheduler(Scheduler):
                 EngineCoreOutput(
                     request_id=req_info.request_id,
                     finish_reason=FinishReason.STOP,
-                    new_token_ids=[req_info.output_token_ids[-1]],
+                    new_token_ids=[],
                     stop_reason="recomputed",
                 )
             )
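
The effect of the one-line change can be sketched with stand-in types. The commit message does not state the motivation, so the scenario below rests on an assumption: the last output token had already been delivered to the consumer before the recompute was triggered. `Output` and `collect` are hypothetical simplifications, not vLLM's `EngineCoreOutput` or its output processing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Output:
    """Simplified stand-in for EngineCoreOutput (hypothetical)."""
    request_id: str
    new_token_ids: List[int]
    finish_reason: Optional[str] = None
    stop_reason: Optional[str] = None


def collect(outputs):
    """Accumulate every token id a consumer would see across outputs."""
    tokens = []
    for out in outputs:
        tokens.extend(out.new_token_ids)
    return tokens


# Assumption: tokens 11 and 12 were already streamed before the recompute.
streamed = [Output("req-0", [11]), Output("req-0", [12])]

# Before the change: the final output repeats the last output token.
old_final = Output("req-0", [12], finish_reason="stop", stop_reason="recomputed")

# After the change: the final output carries no new tokens.
new_final = Output("req-0", [], finish_reason="stop", stop_reason="recomputed")

tokens_old = collect(streamed + [old_final])
tokens_new = collect(streamed + [new_final])
```

Under that assumption, `tokens_old` ends with the last token twice, while `tokens_new` matches what was actually generated.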