[P/D] Improve the performance of Layerwise Connector (#5303)
### What this PR does / why we need it?
This PR improves the performance of the Layerwise Connector. The main changes are:
1. Use event synchronization instead of stream synchronization.
2. Access the metaserver during scheduling.
3. Transfer the KV cache for each chunked-prefill segment.
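The first change can be pictured with a CPU-only analogy. The sketch is not the connector's code: it uses Python's `threading.Event` in place of device events and a worker thread in place of the compute stream, and `NUM_LAYERS`, `compute_stream`, and `transfer_layerwise` are hypothetical names. Each layer's KV cache is handed off as soon as that layer's event is set, whereas a stream synchronization would correspond to joining the thread before transferring anything.

```python
import threading
import time

NUM_LAYERS = 4  # hypothetical model depth, for illustration only


def compute_stream(layer_events, computed, delay=0.01):
    """Simulated compute stream (hypothetical): finishes one layer at a time
    and sets that layer's event right after the layer's work completes."""
    for layer in range(NUM_LAYERS):
        time.sleep(delay)          # stand-in for the layer's kernels
        computed.append(layer)
        layer_events[layer].set()  # analogous to recording a per-layer event


def transfer_layerwise():
    """Hand off each layer's KV cache as soon as *its* event fires (event
    sync) instead of waiting for the whole stream to drain (stream sync)."""
    layer_events = [threading.Event() for _ in range(NUM_LAYERS)]
    computed = []  # layers the simulated stream has finished, in order
    log = []       # (layer sent, snapshot of layers computed at send time)
    stream = threading.Thread(target=compute_stream, args=(layer_events, computed))
    stream.start()
    for layer in range(NUM_LAYERS):
        layer_events[layer].wait()           # block only on this one layer
        log.append((layer, list(computed)))  # the "transfer" would happen here
    # A stream-level sync would be this join, placed before any transfer.
    stream.join()
    return log


if __name__ == "__main__":
    for layer, ready in transfer_layerwise():
        print(f"sent layer {layer}; layers computed so far: {ready}")
```

Waiting on a per-layer event lets the transfer of early layers overlap with the computation of later ones. On a GPU the equivalent is recording an event on the compute stream after each layer and synchronizing on that event (for example `torch.cuda.Event.record()` and `torch.cuda.Event.synchronize()` in PyTorch); whether the Ascend backend is used in exactly this way is an assumption here.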
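The third change, transferring the KV cache per chunked-prefill segment, can be sketched as a transfer plan. This is an illustrative assumption, not the connector's implementation: `chunk_segments`, `blocks_for_segment`, and `plan_transfers` are hypothetical helpers, and the sketch assumes the chunk size is a multiple of the KV block size so that no block is split across two chunks.

```python
from typing import List, Tuple


def chunk_segments(num_tokens: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split a prompt into [start, end) token ranges, one per prefill chunk."""
    return [(s, min(s + chunk_size, num_tokens)) for s in range(0, num_tokens, chunk_size)]


def blocks_for_segment(start: int, end: int, block_size: int) -> List[int]:
    """Indices of the KV-cache blocks written by tokens [start, end)."""
    return list(range(start // block_size, (end - 1) // block_size + 1))


def plan_transfers(num_tokens: int, chunk_size: int, block_size: int):
    """Hypothetical per-segment plan: after each chunk is prefilled, send the
    KV blocks that chunk wrote, rather than sending everything at the end."""
    if chunk_size % block_size:
        # Assumption of this sketch: chunks never split a KV block.
        raise ValueError("chunk_size must be a multiple of block_size")
    return [
        ((start, end), blocks_for_segment(start, end, block_size))
        for start, end in chunk_segments(num_tokens, chunk_size)
    ]


if __name__ == "__main__":
    for (start, end), blocks in plan_transfers(10, 4, 2):
        print(f"tokens [{start}, {end}) -> send blocks {blocks}")
```

Sending each segment's blocks as soon as that chunk finishes lets the decode side start receiving KV data before the whole prompt has been prefilled.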
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
By CI.
- vLLM version: release/v0.13.0
- vLLM main: 5fbfa8d9ef
---------
Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com>
@@ -1112,6 +1112,7 @@ class TestAscendMLAImpl(TestBase):

                MagicMock(), MagicMock()
            ]
            self.impl.num_kv_heads = self.impl.num_heads
            self.impl.is_kv_producer = False

            decode_res, prefill_res = self.impl._mla_preprocess(
                "mock_layer",