[Core] Restore scheduling logic under default configuration (#3967)

### What this PR does / why we need it?

This PR reverts the changes introduced in PR #2894.

Initially, because the older chunked prefill ops performed poorly, the default behavior was to use the Ascend scheduler, which disabled the chunked prefill feature. Now that the new chunked prefill ops perform better, this interception strategy has been removed. This change also aligns with the community's default configuration behavior.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI passed with newly added and existing tests.

- vLLM version: v0.11.0
- vLLM main: 83f478bb19

Signed-off-by: rjg-lyh <1318825571@qq.com>
2025-11-10 17:48:56 +08:00
parent 75c3f9a780
commit a1558b99c2
2 changed files with 12 additions and 55 deletions
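Note on the test change: the unit test reads captured log lines by position (`cm.output[N]`), so removing an earlier log message shifts the index of every later one. The sketch below illustrates that effect with the standard library only. It assumes, without confirmation from this commit, that the removed interception emitted a log line before the PIECEWISE message. The logger name, `check_config`, and the first warning text are hypothetical stand-ins, not vllm-ascend code.

```python
import logging
import unittest

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("demo_platform")


def check_config(force_scheduler: bool) -> None:
    """Hypothetical stand-in for a platform config check that logs warnings."""
    if force_scheduler:
        # Assumed extra message emitted by the old interception path.
        logger.warning("chunked prefill disabled; forcing fallback scheduler")
    logger.warning(
        "PIECEWISE compilation enabled on NPU. use_inductor not supported - "
        "using only ACL Graph mode")


def captured_index(force_scheduler: bool) -> int:
    """Return the position of the ACL Graph message in the captured output."""
    case = unittest.TestCase()
    with case.assertLogs(logger, level="WARNING") as cm:
        check_config(force_scheduler)
    # cm.output holds one formatted string per record, in emission order.
    for i, line in enumerate(cm.output):
        if "using only ACL Graph mode" in line:
            return i
    return -1
```

With the extra message the target line sits at index 1; without it, at index 0. A test that matches on content rather than position, for example `any("using only ACL Graph mode" in line for line in cm.output)`, would not need updating when unrelated log lines are added or removed.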
--- a/tests/ut/test_platform.py
+++ b/tests/ut/test_platform.py
@@ -737,7 +737,7 @@ class TestNPUPlatform(TestBase):
            self.platform.check_and_update_config(VllmConfig)
            self.assertTrue(
                "PIECEWISE compilation enabled on NPU. use_inductor not supported - "
-                "using only ACL Graph mode" in cm.output[1])
+                "using only ACL Graph mode" in cm.output[0])
            if vllm_version_is("0.11.0"):
                self.assertEqual(
                    VllmConfig.compilation_config.level,