[P/D][main] Retire the llmdatadist connector code and related files. (#4780)
### What this PR does / why we need it?
Now that the mooncake connector is supported, the llmdatadist
connector is no longer maintained, so the llmdatadist-related
files are retired.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By CI.
- vLLM version: v0.12.0
- vLLM main: ad32e3e19c
---------
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
```diff
@@ -3398,7 +3398,7 @@ class NPUModelRunner(LoRAModelRunnerMixin, ECConnectorModelRunnerMixin):
         # init kv cache tensors
         kv_cache_raw_tensors: dict[str, Union[torch.Tensor,
                                               Optional[torch.Tensor]]] = {}
-        # llmdatadist need the addr of cache tensor be aligned with 2M
+        # prefill disaggregation need the addr of cache tensor be aligned with 2M
         alignment = 2 * 1024 * 1024
         for kv_cache_tensor in kv_cache_config.kv_cache_tensors:
             # TODO: REFACTOR ME to sharing hybrid cache
```
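The hunk above keeps the 2 MiB alignment requirement while only renaming its rationale. As a minimal sketch of what such an alignment constraint means in practice (hypothetical helpers, not the vLLM implementation — the real code over-allocates device memory and offsets into it):

```python
# 2 MiB, matching `alignment = 2 * 1024 * 1024` in the hunk above.
ALIGNMENT = 2 * 1024 * 1024


def aligned_offset(base_addr: int, alignment: int = ALIGNMENT) -> int:
    """Smallest offset >= 0 such that base_addr + offset is a
    multiple of `alignment`."""
    return (alignment - base_addr % alignment) % alignment


def carve_aligned(base_addr: int, size: int, alignment: int = ALIGNMENT):
    """Carve an aligned region of `size` bytes out of a buffer that
    starts at `base_addr`. Returns (aligned start address, total bytes
    consumed from the buffer). The caller must have over-allocated by
    up to `alignment - 1` bytes so the aligned region always fits."""
    off = aligned_offset(base_addr, alignment)
    return base_addr + off, off + size
```

This mirrors the common pattern of allocating `size + alignment - 1` bytes and then using the first address inside the buffer that is a multiple of the alignment.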
```diff
@@ -3426,7 +3426,7 @@ class NPUModelRunner(LoRAModelRunnerMixin, ECConnectorModelRunnerMixin):
         elif "attn" in layer_name and layer_name not in kv_cache_raw_tensors.keys(
         ):
             # NOTE: We need to init k cache tensor (nope cache tensor in mla) and
-            # v cache tensor (rope cache tensor in mla) separately to support llmdatadist,
+            # v cache tensor (rope cache tensor in mla) separately to support prefill disaggregation,
             # as it only support the 0-dim of kv_cache is `num_blocks`.
             # For deepseek mla, we need to spilt cache tensor accrodding to the nope head dim
             # and rope head dim.
```
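The NOTE in the hunk above says the nope and rope caches must be separate tensors because the disaggregation path only supports `num_blocks` as dim 0 of each cache. A rough sketch of that split by head dims (the function name, dict layout, and the 576/64 DeepSeek MLA head-dim values are illustrative assumptions, not the vLLM code):

```python
def mla_cache_split(num_blocks: int, block_size: int,
                    head_dim_total: int = 576, rope_dim: int = 64,
                    dtype_bytes: int = 2):
    """Describe separate nope and rope cache tensors for MLA, each
    keeping `num_blocks` as dim 0 so a per-block transfer engine can
    address them independently. `dtype_bytes=2` assumes bf16/fp16."""
    nope_dim = head_dim_total - rope_dim  # e.g. 576 - 64 = 512
    nope = {"shape": (num_blocks, block_size, nope_dim),
            "nbytes": num_blocks * block_size * nope_dim * dtype_bytes}
    rope = {"shape": (num_blocks, block_size, rope_dim),
            "nbytes": num_blocks * block_size * rope_dim * dtype_bytes}
    return nope, rope
```

Splitting along the head dimension (rather than stacking nope and rope in one tensor) is what lets both caches expose `num_blocks` as their leading dimension.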