[Misc][V0 Deprecation] Remove Draft Model Runner Used for V0 Spec Decode (#1810)

### What this PR does / why we need it?

Remove the draft model runner used for V0 spec decode.

This PR is part of https://github.com/vllm-project/vllm-ascend/issues/1620.

- vLLM version: v0.9.2
- vLLM main: 34cda778a0

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-07-16 10:51:23 +08:00
parent f96100fad5
commit f9e2e9bb31
4 changed files with 0 additions and 493 deletions
--- a/vllm_ascend/patch/__init__.py
+++ b/vllm_ascend/patch/__init__.py
@@ -73,21 +73,6 @@
 #    Future Plan:
 #       Keep this patch in vllm-ascend.
 #
-# ** File: worker/patch_common/patch_spec_decode_worker.py **
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-#   1. `vllm.spec_decode.spec_decode_worker.SpecDecodeWorker.create_worker`
-#    Why:
-#       We need to use the patched `TP1DraftModelRunner` in `SpecDecodeWorker.create_worker`.
-#       The mainly reason to overwrite `TP1DraftModelRunner`is the hard code of
-#           `FlashAttentionMetadata`
-#    How：
-#       ditto
-#    Related PR (if no, explain why):
-#       - https://github.com/vllm-project/vllm/pull/15195
-#       - https://github.com/vllm-project/vllm-ascend/pull/395
-#    Future Plan:
-#       Revert it when the related pr is merged in vllm and vllm-ascend.
-#
 # ** File: worker/patch_common/patch_distributed.py **
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #   1. `vllm.distributed.parallel_state.GroupCoordinator`