[SpecDecode][CI] Set default values to fix spec decode and fix multicard CI (#1109)

### What this PR does / why we need it?
- Set default values to fix spec decode
- To avoid OOM, we need to run the test in a single process

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- CI passed, especially the multicard CI
- For the spec decode test, the long-term CI passed

Closes: https://github.com/vllm-project/vllm-ascend/pull/1105

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
Co-authored-by: mengwei805 <mengwei25@huawei.com>
2025-06-07 11:23:30 +08:00
parent e9ada685ec
commit 8d00775fce
2 changed files with 13 additions and 1 deletions
--- a/vllm_ascend/patch/worker/patch_common/patch_spec_decode_worker.py
+++ b/vllm_ascend/patch/worker/patch_common/patch_spec_decode_worker.py
@@ -56,6 +56,12 @@ def create_worker(
        draft_worker_kwargs.pop("ngram_prompt_lookup_max"))
    ngram_prompt_lookup_min = (
        draft_worker_kwargs.pop("ngram_prompt_lookup_min"))
+
+    # TODO(Yizhou): A quick fix, must be refactored ASAP
+    draft_worker_kwargs["vllm_config"].parallel_config.expert_parallel_size = 1
+    draft_worker_kwargs[
+        "vllm_config"].parallel_config.expert_tensor_parallel_size = 1
+
    draft_model_config = draft_worker_kwargs["vllm_config"].model_config
    draft_parallel_config: ParallelConfig = draft_worker_kwargs[
        'vllm_config'].parallel_config
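
The effect of the added lines can be sketched in isolation. The snippet below is a minimal illustration, not the vLLM implementation: `StubParallelConfig`, `StubVllmConfig`, and `force_single_expert_parallel` are hypothetical stand-ins introduced here, and the initial sizes of 4 and 2 are arbitrary. It only shows that the quick fix mutates the parallel config held in `draft_worker_kwargs["vllm_config"]` in place, so any code reading that config afterwards sees both expert-parallel sizes as 1.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class StubParallelConfig:
    # Hypothetical stand-in for the real parallel config; only the two
    # fields touched by the patch are modelled, with arbitrary defaults.
    expert_parallel_size: int = 4
    expert_tensor_parallel_size: int = 2


@dataclass
class StubVllmConfig:
    # Hypothetical stand-in for the object stored under "vllm_config".
    parallel_config: StubParallelConfig = field(
        default_factory=StubParallelConfig)


def force_single_expert_parallel(draft_worker_kwargs: Dict[str, Any]) -> None:
    """Mirror the quick fix: set both expert-parallel sizes to 1 in place."""
    parallel_config = draft_worker_kwargs["vllm_config"].parallel_config
    parallel_config.expert_parallel_size = 1
    parallel_config.expert_tensor_parallel_size = 1


# Usage: the config object inside the kwargs dict is modified directly.
draft_worker_kwargs = {"vllm_config": StubVllmConfig()}
force_single_expert_parallel(draft_worker_kwargs)
```

Because the assignment is in place, whether the target model's config is also affected depends on whether the draft worker holds its own copy of the config; the diff alone does not show this.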