Files
xc-llm-ascend/tests
linfeng-yuan 0fbe0831ec [bugfix][refactor] fix recompute_scheduler break with vllm 0.12.0 & support async scheduling & refactor recompute_scheduler.py (#4895)
### What this PR does / why we need it?
Currently, the initialization and fundamental functions of
RecomputeScheduler are broken with `vLLM v0.12.0`. This PR fixes the
conflicts of `RecomputeScheduler` and refactor its implementations by
inheriting original `Scheduler` of vLLM. Meanwhile, this PR also
supports async cheduling with recompute scheduler by implementing
`AsyncRecomputeScheduler` which is simply inherited `AsncyScheduler` of
vLLM and `RecomputeScheduler` of vLLM-Ascend with python MRO.
### Does this PR introduce _any_ user-facing change?
No. The switch naming is the same as v0.11.0 :
`recompute_scheduler_enable`
### How was this patch tested?
E2E serving with 2P1D dsv3.1 passed. The performance was the same as
original vllm scheduler with `async_scheduling` and preempted requests
in D Nodes are successfully transfered to Proxy and further to P Node.
This significantly improves the performance and robustness of PD
disaggregation deployments.


- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: linfeng-yuan <1102311262@qq.com>
2025-12-11 22:24:49 +08:00
..
2025-12-11 20:35:32 +08:00