[BugFix] Fix ACLgraph bug in Qwen3_32b_int8 case (#3204)
### What this PR does / why we need it?
1. Fixed the issue where capturing sizes failed for the Qwen3-32b-int8 model when aclgraph, dp1, and tp4 were enabled.
2. Added an exception that is raised when capturing sizes fails, and provided a solution.
3. Added this common problem to the FAQ doc.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
Unit tests (ut)

- vLLM version: v0.10.2
- vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.0

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
@@ -260,7 +260,7 @@ class TestUtils(TestBase):
         utils.update_aclgraph_sizes(test_vllm_config)
         del os.environ['HCCL_OP_EXPANSION_MODE']
         self.assertEqual(
-            138,
+            137,
             len(test_vllm_config.compilation_config.cudagraph_capture_sizes))

         test_vllm_config.speculative_config = mock.MagicMock()
@@ -273,7 +273,7 @@ class TestUtils(TestBase):
         utils.update_aclgraph_sizes(test_vllm_config)
         del os.environ['HCCL_OP_EXPANSION_MODE']
         self.assertEqual(
-            112,
+            111,
             len(test_vllm_config.compilation_config.cudagraph_capture_sizes))

         # max_num_batch_sizes >= len(original_sizes)
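The assertions above check how many capture sizes remain after `utils.update_aclgraph_sizes` runs. As a rough illustration only, the sketch below shows one way a list of capture sizes could be trimmed to a budget by evenly spaced sampling. The function name `limit_capture_sizes` and its sampling strategy are assumptions made for this example; only the names `max_num_batch_sizes` and `original_sizes` come from the test comment, and this is not taken from the actual vllm-ascend implementation.

```python
from typing import List


def limit_capture_sizes(original_sizes: List[int],
                        max_num_batch_sizes: int) -> List[int]:
    """Hypothetical helper: keep at most max_num_batch_sizes capture sizes.

    Assumption for illustration: sizes are sampled at evenly spaced
    indices, always keeping the first and last entries.
    """
    if max_num_batch_sizes < 1:
        raise ValueError("max_num_batch_sizes must be at least 1")

    # Budget is large enough: nothing to trim
    # (the "max_num_batch_sizes >= len(original_sizes)" case).
    if max_num_batch_sizes >= len(original_sizes):
        return list(original_sizes)

    if max_num_batch_sizes == 1:
        return [original_sizes[0]]

    last = len(original_sizes) - 1
    step = last / (max_num_batch_sizes - 1)
    indices = [round(i * step) for i in range(max_num_batch_sizes)]
    # Pin the endpoints so the smallest and largest sizes always survive.
    indices[0], indices[-1] = 0, last
    return [original_sizes[i] for i in indices]


if __name__ == "__main__":
    print(limit_capture_sizes(list(range(1, 11)), 4))  # [1, 4, 7, 10]
```

With a budget one below the list length, this sketch drops exactly one entry, which is the kind of count a test like the one above would pin down.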