[Scheduler] validate max_num_batched_tokens and max_model_len in AscendSchedulerConfig (#2434)

### What this PR does / why we need it?
Add configuration validation to the Ascend scheduler: when chunked prefill is disabled, `max_num_batched_tokens` must not be less than `max_model_len`, following vLLM.

### Does this PR introduce _any_ user-facing change?
Users can no longer set `max_num_batched_tokens` smaller than `max_model_len` with the Ascend scheduler when chunked prefill is disabled.

### How was this patch tested?
CI and vLLM serving passed.

- vLLM version: v0.10.0
- vLLM main: f77a0802b7

Signed-off-by: linfeng-yuan <1102311262@qq.com>
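
The check described in this commit message can be sketched roughly as follows. This is an illustrative sketch only, not the code added to `AscendSchedulerConfig`: the function name `validate_scheduler_limits`, its signature, and the error text are assumptions made for the example.

```python
def validate_scheduler_limits(max_num_batched_tokens: int,
                              max_model_len: int,
                              enable_chunked_prefill: bool = False) -> None:
    """Hypothetical helper mirroring the rule stated in the commit message.

    Without chunked prefill, a prompt is prefilled in a single step, so the
    per-step token budget must be able to hold a sequence of max_model_len.
    """
    if enable_chunked_prefill:
        # With chunked prefill, long prompts are split across steps,
        # so a smaller token budget is acceptable.
        return
    if max_num_batched_tokens < max_model_len:
        raise ValueError(
            f"max_num_batched_tokens ({max_num_batched_tokens}) must not be "
            f"smaller than max_model_len ({max_model_len}) when chunked "
            "prefill is disabled.")


# The updated test values (2048 / 2048) satisfy the rule.
validate_scheduler_limits(max_num_batched_tokens=2048, max_model_len=2048)

# The previous test values (200 / 2048) would be rejected.
try:
    validate_scheduler_limits(max_num_batched_tokens=200, max_model_len=2048)
except ValueError as err:
    print(f"rejected: {err}")
```

Under this rule, the tests in the diff below raise `max_num_batched_tokens` from 200 to 2048 so that it matches `max_model_len=2048`.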
2025-08-23 19:39:44 +08:00
parent 3629bc4431
commit 4af5b80606
3 changed files with 66 additions and 6 deletions
--- a/tests/e2e/singlecard/test_ascend_scheduler.py
+++ b/tests/e2e/singlecard/test_ascend_scheduler.py
@@ -16,7 +16,7 @@ def test_concurrent_partial_prefill():
                        },
                    },
                    max_num_seqs=3,
-                    max_num_batched_tokens=200,
+                    max_num_batched_tokens=2048,
                    enforce_eager=True,
                    max_model_len=2048,
                    gpu_memory_utilization=0.7) as vllm_model:
@@ -35,7 +35,7 @@ def test_prefix_cache_stats_is_recorded():
                        },
                    },
                    max_num_seqs=3,
-                    max_num_batched_tokens=200,
+                    max_num_batched_tokens=2048,
                    enforce_eager=True,
                    max_model_len=2048,
                    gpu_memory_utilization=0.7) as vllm_model: