[SpecDecode][CI] Set default values to fix spec decode and fix multicard CI (#1109)

### What this PR does / why we need it?
- Set default values to fix spec decode
- To avoid oom, we need to run the test in a single process

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- CI passed, espcecially multicards CI
- For spec decode test, long term CI passed

Closes: https://github.com/vllm-project/vllm-ascend/pull/1105

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
Co-authored-by: mengwei805 <mengwei25@huawei.com>
This commit is contained in:
Yikun Jiang
2025-06-07 11:23:30 +08:00
committed by GitHub
parent e9ada685ec
commit 8d00775fce
2 changed files with 13 additions and 1 deletions

View File

@@ -56,6 +56,12 @@ def create_worker(
draft_worker_kwargs.pop("ngram_prompt_lookup_max"))
ngram_prompt_lookup_min = (
draft_worker_kwargs.pop("ngram_prompt_lookup_min"))
# TODO(Yizhou): A quick fix, must be refactored ASAP
draft_worker_kwargs["vllm_config"].parallel_config.expert_parallel_size = 1
draft_worker_kwargs[
"vllm_config"].parallel_config.expert_tensor_parallel_size = 1
draft_model_config = draft_worker_kwargs["vllm_config"].model_config
draft_parallel_config: ParallelConfig = draft_worker_kwargs[
'vllm_config'].parallel_config