[Feat] Support MLP_TP feature, exclude MOE layer (#4999)

Follow-up to #4257. This PR implements tensor parallelism (TP) for the dense FFN (MLP) layers, e.g. the first three layers of the DeepSeek model. Compared with #4257, the implementation has been refactored so the feature is supported with very little additional code.
The PR adds an `is_moe_layer` helper to mlp_tp, which enables MLP TP in models that mix dense MLP and MoE layers, such as DeepSeek or ChatGLM.
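As a rough illustration of the idea (not the actual vllm-ascend code; the layer-boundary parameter name `first_k_dense_replace` is borrowed from DeepSeek-style configs and the second helper is hypothetical), the per-layer decision might look like:

```python
# Hedged sketch of the MLP-TP / MoE layer split described above.
# In DeepSeek-style models the first `first_k_dense_replace` layers use a
# dense FFN (MLP); later layers use MoE. MLP TP must skip MoE layers.

def is_moe_layer(layer_idx: int, first_k_dense_replace: int) -> bool:
    """Return True if the layer at `layer_idx` is an MoE layer."""
    return layer_idx >= first_k_dense_replace


def use_mlp_tensor_parallel(layer_idx: int,
                            first_k_dense_replace: int,
                            mlp_tp_size: int) -> bool:
    # Apply MLP TP only when it is enabled (size > 1) and the layer
    # is a dense MLP layer, never an MoE layer.
    return mlp_tp_size > 1 and not is_moe_layer(layer_idx, first_k_dense_replace)
```

With `first_k_dense_replace = 3` and `mlp_tp_size = 2`, layers 0-2 get MLP TP while layers 3 and above (MoE) are excluded.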


- vLLM version: v0.12.0
- vLLM main: ad32e3e19c

Signed-off-by: zzhx1 <zzh_201018@outlook.com>
Co-authored-by: 子潜 <ziqian@U-DMKXH32D-2015.local>
Co-authored-by: chenxiao <Jaychou1620@Gmail.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
Author: zzhxxx
Date: 2025-12-18 20:06:53 +08:00
Committed by: GitHub
Parent: 5a88e3333b
Commit: a74a1196c5
3 changed files with 37 additions and 23 deletions

@@ -41,6 +41,7 @@ def test_init_ascend_model_parallel(mock_distributed, parallel_config):
     mock_ascend_config.finegrained_tp_config.lmhead_tensor_parallel_size = 2
     mock_ascend_config.finegrained_tp_config.oproj_tensor_parallel_size = 2
     mock_ascend_config.finegrained_tp_config.embedding_tensor_parallel_size = 2
+    mock_ascend_config.finegrained_tp_config.mlp_tensor_parallel_size = 2
     mock_ascend_config.flashcomm2_oproj_tensor_parallel_size = 2
     mock_ascend_config.pd_tp_ratio = 2
     mock_ascend_config.num_head_replica = 0
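Based on the test above, the new knob appears to live under `finegrained_tp_config` alongside the existing lmhead/oproj/embedding TP sizes. A hedged sketch of how a user-facing additional config carrying it might look (the exact config surface is an assumption inferred from the mock, not documented API):

```python
# Hypothetical additional-config fragment enabling MLP TP; the key layout
# mirrors the mocked ascend config in the test above and is an assumption.
additional_config = {
    "finegrained_tp_config": {
        # Shard dense MLP layers across 2 ranks; MoE layers are excluded.
        "mlp_tensor_parallel_size": 2,
    },
}
```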