debug_aclgraph_sizes_capture (#2827)

### What this PR does / why we need it?

1. Fixes a problem with the Qwen3 MoE model: enabling data parallelism (DP) uses an extra stream, which caused an ACLgraph sizes capture error.
2. Experiments showed that in many cases some operators occupy more streams than expected, so the stream buffer reserved for ACLgraph was not large enough. After discussion, an extra 120 streams were added as a buffer.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit tests (ut).

- vLLM version: main
- vLLM main: 0ae43dbf8c

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
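
As a rough illustration of why reserving more buffer streams lowers the expected number of capture sizes in the unit tests, here is a minimal sketch. It is not the actual `update_aclgraph_sizes` implementation: the function names, the parameters (`stream_limit`, `reserved_streams`, `streams_per_graph`), and the even-sampling strategy are all hypothetical assumptions, and the numbers in the usage lines are arbitrary.

```python
def max_capture_sizes(stream_limit: int, reserved_streams: int,
                      streams_per_graph: int) -> int:
    """Hypothetical bound: how many graph sizes fit in the stream budget.

    Assumes every captured graph consumes `streams_per_graph` streams and
    that `reserved_streams` are held back as a safety buffer.
    """
    if streams_per_graph <= 0:
        raise ValueError("streams_per_graph must be positive")
    usable = max(stream_limit - reserved_streams, 0)
    return usable // streams_per_graph


def sample_sizes(original_sizes: list[int], max_count: int) -> list[int]:
    """Keep at most `max_count` sizes, evenly spaced over the original list.

    If the budget already covers the whole list, it is returned unchanged.
    Otherwise the first and last entries are always kept.
    """
    n = len(original_sizes)
    if max_count >= n:
        return list(original_sizes)
    if max_count <= 0:
        return []
    if max_count == 1:
        return [original_sizes[-1]]
    # Integer arithmetic keeps the indices strictly increasing.
    indices = [i * (n - 1) // (max_count - 1) for i in range(max_count)]
    return [original_sizes[i] for i in indices]


# Arbitrary example numbers: a larger reserved buffer leaves fewer sizes.
budget_without_buffer = max_capture_sizes(100, 0, 4)
budget_with_buffer = max_capture_sizes(100, 20, 4)
kept = sample_sizes(list(range(1, 11)), 4)
```

Under these assumptions, growing the reserved buffer shrinks the usable stream budget, which in turn shrinks the number of sizes that survive sampling. That is the direction of the change in the test expectations in the diff.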
2025-09-10 22:50:48 +08:00
parent e75b568011
commit b7df04de9b
2 changed files with 13 additions and 5 deletions
--- a/tests/ut/test_utils.py
+++ b/tests/ut/test_utils.py
@@ -259,7 +259,7 @@ class TestUtils(TestBase):
        utils.update_aclgraph_sizes(test_vllm_config)
        del os.environ['HCCL_OP_EXPANSION_MODE']
        self.assertEqual(
-            147,
+            138,
            len(test_vllm_config.compilation_config.cudagraph_capture_sizes))

        test_vllm_config.speculative_config = mock.MagicMock()
@@ -272,7 +272,7 @@ class TestUtils(TestBase):
        utils.update_aclgraph_sizes(test_vllm_config)
        del os.environ['HCCL_OP_EXPANSION_MODE']
        self.assertEqual(
-            120,
+            112,
            len(test_vllm_config.compilation_config.cudagraph_capture_sizes))

        # max_num_batch_sizes >= len(original_sizes)