[CI]Fix the error caused by layer_sharding in dsv32 (#8719)
### What this PR does / why we need it?
This PR fixes an error in DSV32 mixed deployment that was caused by enabling layer_sharding.
- Mixed deployment no longer supports layer_sharding, so the option has been removed from the service-oriented configuration.
- The error "RPC call to sample_tokens timed out" occurred because the dshm size limit was too small, so it has been increased to 512 Gi.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
The nightly test passed.

Signed-off-by: wyh145 <1987244901@qq.com>
@@ -39,7 +39,6 @@ deployment:
  --trust-remote-code
  --speculative-config '{"num_speculative_tokens": 3, "method":"deepseek_mtp"}'
  --compilation-config '{"cudagraph_capture_sizes": [8, 16, 24, 32, 40, 48], "cudagraph_mode": "FULL_DECODE_ONLY"}'
- --additional-config '{"layer_sharding": ["q_b_proj", "o_proj"]}'
  --tokenizer-mode deepseek_v32
  --reasoning-parser deepseek_v3
@@ -64,7 +63,6 @@ deployment:
  --trust-remote-code
  --speculative-config '{"num_speculative_tokens": 3, "method":"deepseek_mtp"}'
  --compilation-config '{"cudagraph_capture_sizes": [8, 16, 24, 32, 40, 48], "cudagraph_mode": "FULL_DECODE_ONLY"}'
- --additional-config '{"layer_sharding": ["q_b_proj", "o_proj"]}'
  --tokenizer-mode deepseek_v32
  --reasoning-parser deepseek_v3
  benchmarks:
@@ -38,8 +38,6 @@ test_cases:
  - '{"cudagraph_capture_sizes":[4, 8, 16, 20, 24, 28, 32], "cudagraph_mode":"FULL_DECODE_ONLY"}'
  - "--speculative-config"
  - '{"num_speculative_tokens": 3, "method":"deepseek_mtp"}'
- - "--additional-config"
- - '{"layer_sharding": ["q_b_proj", "o_proj"]}'
  - "--reasoning-parser"
  - "deepseek_v3"
  - "--tokenizer_mode"
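The removal made in the hunks above can be expressed as a small transformation on a server argument list. The sketch below is illustrative only: `drop_layer_sharding` is a hypothetical helper, not part of this PR or of any library, and it assumes the arguments are held as a flat list of strings in which `--additional-config` is followed by a JSON object.

```python
import json


def drop_layer_sharding(args):
    """Return a copy of args without layer_sharding in --additional-config.

    Hypothetical helper for illustration. The flag is dropped entirely when
    layer_sharding was its only key; otherwise only that key is removed.
    """
    cleaned = []
    i = 0
    while i < len(args):
        if args[i] == "--additional-config" and i + 1 < len(args):
            config = json.loads(args[i + 1])
            config.pop("layer_sharding", None)
            if config:  # keep the flag if other settings remain
                cleaned += [args[i], json.dumps(config)]
            i += 2  # skip the flag and its JSON value
            continue
        cleaned.append(args[i])
        i += 1
    return cleaned


# Argument list modeled on the test_cases entries shown in the diff.
server_args = [
    "--speculative-config",
    '{"num_speculative_tokens": 3, "method":"deepseek_mtp"}',
    "--additional-config",
    '{"layer_sharding": ["q_b_proj", "o_proj"]}',
    "--reasoning-parser",
    "deepseek_v3",
]
print(drop_layer_sharding(server_args))
```

Because the helper returns a new list, the original arguments stay available for comparison, which may be convenient when checking that only the layer_sharding setting changed.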