[P/D] Improve the performance of Layerwise Connector (#5303)

### What this PR does / why we need it?
Improve the performance of the Layerwise Connector. The main changes are:
1. Use event synchronization instead of stream synchronization.
2. Access the metaserver during scheduling.
3. Transfer the KV cache for each chunked-prefill segment.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By CI.

- vLLM version: release/v0.13.0
- vLLM main: 5fbfa8d9ef

---------

Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com>
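
To illustrate point 3 of the commit message, here is a minimal sketch of sending the KV cache once per chunked-prefill segment instead of once after the whole prefill. It is not the connector's actual code: `chunk_ranges`, `prefill_with_per_chunk_transfer`, and the `send` callback are hypothetical names chosen for illustration, and the KV computation itself is omitted.

```python
from typing import Callable, Iterator, List, Tuple


def chunk_ranges(num_tokens: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [start, end) token ranges, one per prefill chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, num_tokens, chunk_size):
        yield start, min(start + chunk_size, num_tokens)


def prefill_with_per_chunk_transfer(
    num_tokens: int,
    chunk_size: int,
    send: Callable[[int, int], None],
) -> List[Tuple[int, int]]:
    """Run a (placeholder) chunked prefill and hand each chunk to `send`.

    `send` is a hypothetical callback standing in for the KV cache
    transfer; it receives the token range covered by the finished chunk.
    """
    sent: List[Tuple[int, int]] = []
    for start, end in chunk_ranges(num_tokens, chunk_size):
        # A real implementation would compute the KV cache for
        # tokens [start, end) here before transferring it.
        send(start, end)
        sent.append((start, end))
    return sent


if __name__ == "__main__":
    log: List[Tuple[int, int]] = []
    prefill_with_per_chunk_transfer(10, 4, lambda s, e: log.append((s, e)))
    print(log)
```

Under these assumptions, the transfer of an earlier chunk can overlap with the computation of later chunks, which is one plausible source of the performance gain the commit describes.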
2025-12-31 15:09:01 +08:00
parent 7d5242faca
commit 46a1614387
5 changed files with 354 additions and 202 deletions
--- a/tests/ut/attention/test_mla_v1.py
+++ b/tests/ut/attention/test_mla_v1.py
@@ -1112,6 +1112,7 @@ class TestAscendMLAImpl(TestBase):
            MagicMock(), MagicMock()
        ]
        self.impl.num_kv_heads = self.impl.num_heads
+        self.impl.is_kv_producer = False

        decode_res, prefill_res = self.impl._mla_preprocess(
            "mock_layer",