[P/D] layerwise connector support recompute scheduler (#5900)

### What this PR does / why we need it? layerwise connector support recompute scheduler. NOTE： Triggering recompute will invoke the tokenizer again, which may lead to precision fluctuations. [RFC]: CDCP Scheduling for Disaggregated Prefilling with KV Cache Layerwise Push Support https://github.com/vllm-project/vllm-ascend/issues/4842 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.13.0 - vLLM main: bde38c11df --------- Signed-off-by: liziyu <liziyu16@huawei.com> Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com> Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com>
2026-02-07 15:24:42 +08:00
parent d266fd7b47
commit e5f0e0eaf7
2 changed files with 89 additions and 12 deletions
--- a/vllm_ascend/core/recompute_scheduler.py
+++ b/vllm_ascend/core/recompute_scheduler.py
@@ -642,7 +642,7 @@ class RecomputeScheduler(Scheduler):
                    EngineCoreOutput(
                        request_id=req_info.request_id,
                        finish_reason=FinishReason.STOP,
-                        new_token_ids=[req_info.output_token_ids[-1]],
+                        new_token_ids=[],
                        stop_reason="recomputed",
                    )
                )