### What this PR does / why we need it?
Fix the layerwise connector for the case where the decoder TP size is greater than the number of KV heads. In this case, the prefiller should push the KV cache to all decoder NPUs.
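
A minimal sketch of why every decoder NPU needs the push, assuming KV heads are replicated evenly across decoder TP ranks when the TP size exceeds the head count, and that the sizes divide evenly. The helper name `decoder_ranks_for_kv_head` is hypothetical and is not taken from the connector code:

```python
def decoder_ranks_for_kv_head(head: int, decode_tp: int, num_kv_heads: int) -> list[int]:
    """Return the decoder TP ranks that hold a given KV head.

    Assumption (illustrative only): sizes divide evenly, and when
    decode_tp > num_kv_heads each head is replicated on a contiguous
    group of decode_tp // num_kv_heads ranks.
    """
    if decode_tp <= num_kv_heads:
        # Each rank owns several heads, so a head lives on exactly one rank.
        heads_per_rank = num_kv_heads // decode_tp
        return [head // heads_per_rank]
    # More ranks than heads: each head is replicated across several ranks.
    replicas = decode_tp // num_kv_heads
    return list(range(head * replicas, (head + 1) * replicas))


# With 2 KV heads and decoder TP size 8, each head maps to 4 ranks,
# so the two heads together cover all 8 decoder ranks.
targets = {h: decoder_ranks_for_kv_head(h, decode_tp=8, num_kv_heads=2) for h in range(2)}
```

Under these assumptions, pushing each head to only one decoder rank would leave the replica ranks without KV cache, which is the case this fix addresses.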
- vLLM version: v0.13.0
- vLLM main: 2f4e6548ef
Signed-off-by: liziyu <liziyu16@huawei.com>