[BugFix] Fix ACLgraph bug in Qwen3_32b_int8 case (#3204)

### What this PR does / why we need it?
1. Fixed capture of the graph sizes failing for the Qwen3-32b-int8
model when aclgraph, dp1, and tp4 are enabled.
2. Added an exception that is raised when size capture fails, with a
message that suggests a solution.
3. Added this common problem to the FAQ doc.
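
As a rough illustration of items 1 and 2, the sketch below shows one way to cap a list of capture sizes and to turn a capture failure into an actionable error. It is not the code from this PR: `limit_capture_sizes`, `capture_or_explain`, `CaptureSizesError`, and the error text are hypothetical names chosen for the example, and even sub-sampling is an assumed strategy.

```python
# Hypothetical sketch; names and strategy are assumptions, not this PR's code.


class CaptureSizesError(RuntimeError):
    """Raised when graph capture fails for the requested sizes."""


def limit_capture_sizes(sizes, max_count):
    """Evenly sub-sample `sizes` so that at most `max_count` entries remain.

    The first and last entries are always kept when max_count >= 2.
    """
    if max_count < 1:
        raise ValueError("max_count must be at least 1")
    if len(sizes) <= max_count:
        return list(sizes)
    if max_count == 1:
        return [sizes[0]]
    # The step is > 1 here, so the rounded indices are all distinct.
    step = (len(sizes) - 1) / (max_count - 1)
    indices = [round(i * step) for i in range(max_count)]
    indices[0], indices[-1] = 0, len(sizes) - 1
    return [sizes[i] for i in indices]


def capture_or_explain(capture, sizes):
    """Run `capture(sizes)` and re-raise any failure with a hint attached."""
    try:
        return capture(sizes)
    except Exception as exc:
        # Placeholder remedy text; the real message in the PR may differ.
        raise CaptureSizesError(
            f"Graph capture failed for {len(sizes)} sizes; "
            "consider reducing the number of capture sizes."
        ) from exc
```

A caller would first trim the list, for example `limit_capture_sizes(sizes, 137)`, and then pass the result to `capture_or_explain` together with its own capture function.
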
### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
Unit tests (UT).

- vLLM version: v0.10.2
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
lilinsiman
2025-09-28 17:44:04 +08:00
committed by GitHub
parent a86ece5e39
commit 1705501ae2
4 changed files with 47 additions and 14 deletions


@@ -260,7 +260,7 @@ class TestUtils(TestBase):
utils.update_aclgraph_sizes(test_vllm_config)
del os.environ['HCCL_OP_EXPANSION_MODE']
self.assertEqual(
- 138,
+ 137,
len(test_vllm_config.compilation_config.cudagraph_capture_sizes))
test_vllm_config.speculative_config = mock.MagicMock()
@@ -273,7 +273,7 @@ class TestUtils(TestBase):
utils.update_aclgraph_sizes(test_vllm_config)
del os.environ['HCCL_OP_EXPANSION_MODE']
self.assertEqual(
- 112,
+ 111,
len(test_vllm_config.compilation_config.cudagraph_capture_sizes))
# max_num_batch_sizes >= len(original_sizes)