【main】SP For Qwen3 MoE (#2209)
### What this PR does / why we need it?
This PR adds sequence parallelism (SP) support for Qwen3 MoE. In the
AlltoAll, AlltoAllv, and MC2 scenarios, AllReduce is replaced with
Reduce-Scatter plus AllGather, which reduces the computation in the norm
operations and saves one AllGather communication. The feature takes
effect during the prefill (P) phase and delivers notable gains on
long sequences (e.g., 16k–25k), with performance improvements of
5%–10%.
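
The replacement described above can be illustrated with a toy, single-process simulation. This is only a sketch of the general idea, not the vllm-ascend implementation: the helper names (`all_reduce`, `reduce_scatter`, `all_gather`, `rms_norm`) and the tiny input are made up for illustration, and each "rank" is just a list entry. It shows that Reduce-Scatter followed by AllGather yields the same result as AllReduce, while each rank only normalizes its own shard of tokens.

```python
import math

def elementwise_sum(partials):
    """Sum the per-rank partial outputs token by token, value by value."""
    num_tokens = len(partials[0])
    return [
        [sum(vals) for vals in zip(*(p[t] for p in partials))]
        for t in range(num_tokens)
    ]

def all_reduce(partials):
    """Every rank receives the full summed tensor."""
    total = elementwise_sum(partials)
    return [[list(tok) for tok in total] for _ in partials]

def reduce_scatter(partials):
    """Rank r receives only the r-th slice of tokens of the summed tensor."""
    world = len(partials)
    total = elementwise_sum(partials)
    chunk = len(total) // world  # assumes tokens divide evenly across ranks
    return [total[r * chunk:(r + 1) * chunk] for r in range(world)]

def all_gather(shards):
    """Every rank receives the concatenation of all token shards."""
    full = [tok for shard in shards for tok in shard]
    return [[list(tok) for tok in full] for _ in shards]

def rms_norm(tokens, eps=1e-6):
    """Per-token RMS normalization (no learned weight, for brevity)."""
    out = []
    for tok in tokens:
        rms = math.sqrt(sum(x * x for x in tok) / len(tok) + eps)
        out.append([x / rms for x in tok])
    return out

# Two ranks, four tokens, hidden size two: partial outputs before reduction.
partials = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
    [[0.5, 0.5], [1.0, 1.0], [1.5, 1.5], [2.0, 2.0]],
]

# Baseline: AllReduce, then every rank normalizes all four tokens.
baseline = [rms_norm(full) for full in all_reduce(partials)]

# SP variant: Reduce-Scatter, normalize only the local two tokens, AllGather.
local_shards = reduce_scatter(partials)
normed_shards = [rms_norm(shard) for shard in local_shards]
sp_result = all_gather(normed_shards)

tokens_normed_per_rank_baseline = len(partials[0])   # all tokens
tokens_normed_per_rank_sp = len(local_shards[0])     # tokens / world size
```

In this sketch the norm work per rank drops from all tokens to `tokens / world_size`, which is the source of the computational benefit mentioned above.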
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
```
compilation_config={
    "pass_config": {
        "enable_sequence_parallelism": True
    }
},
enable_expert_parallel=True,
```
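
For context, the fragment above could be passed to vLLM's offline `LLM` entry point roughly as follows. This is a sketch under assumptions: the model name `Qwen/Qwen3-30B-A3B` and `tensor_parallel_size=4` are illustrative choices that are not stated in this PR, and `build_sp_engine_kwargs` and `create_llm` are hypothetical helpers.

```python
def build_sp_engine_kwargs(model: str, tensor_parallel_size: int) -> dict:
    """Collect the engine arguments used to enable SP with expert parallelism."""
    return {
        "model": model,                                # assumption: any Qwen3 MoE checkpoint
        "tensor_parallel_size": tensor_parallel_size,  # assumption: not given in the PR
        "enable_expert_parallel": True,
        "compilation_config": {
            "pass_config": {"enable_sequence_parallelism": True},
        },
    }

def create_llm(engine_kwargs: dict):
    """Build the engine; vLLM is imported lazily so this file loads without it."""
    from vllm import LLM
    return LLM(**engine_kwargs)

# Illustrative values only; creating the engine needs vLLM and suitable hardware.
engine_kwargs = build_sp_engine_kwargs("Qwen/Qwen3-30B-A3B", 4)
```

Calling `create_llm(engine_kwargs)` would then start the engine with the same two settings shown in the test configuration.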
- vLLM version: v0.10.0
- vLLM main: 9edd1db02b
---------
Signed-off-by: libaokui <libaokui@huawei.com>
Co-authored-by: libaokui <libaokui@huawei.com>
@@ -26,6 +26,7 @@ class TestNPUPlatform(TestBase):
         self.mock_vllm_config.cache_config = MagicMock()
         self.mock_vllm_config.scheduler_config = MagicMock()
         self.mock_vllm_config.speculative_config = None
+        self.mock_vllm_config.compilation_config.pass_config.enable_sequence_parallelism = False

         self.mock_ascend_config = MagicMock()
         self.mock_ascend_config.torchair_graph_config.enabled = False
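
A likely reason the test sets the flag explicitly (an inference, not stated in the PR) is that attributes of a `MagicMock` are auto-created child mocks, which are truthy. The sketch below uses a hypothetical `sequence_parallel_enabled` check as a stand-in for whatever platform code reads the flag:

```python
from unittest.mock import MagicMock

def sequence_parallel_enabled(vllm_config) -> bool:
    """Hypothetical stand-in for code that branches on the SP flag."""
    pass_config = vllm_config.compilation_config.pass_config
    return bool(pass_config.enable_sequence_parallelism)

# Flag never set: MagicMock creates a child mock on access, and it is truthy.
unset_config = MagicMock()
unset_result = sequence_parallel_enabled(unset_config)

# Flag set explicitly, as in the updated test setup.
explicit_config = MagicMock()
explicit_config.compilation_config.pass_config.enable_sequence_parallelism = False
explicit_result = sequence_parallel_enabled(explicit_config)
```

Without the explicit assignment, a test built on a bare mock would silently take the SP-enabled branch.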