KV‑Cache (MHA, MLA): add missing start_layer / end_layer fields to MHATokenToKVPoolHost and MLATokenToKVPoolHost (#6016)
Co-authored-by: 继优 <jiyou.ljy@alibaba-inc.com>
Co-authored-by: chus-chus <chus-chus@users.noreply.github.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
@@ -762,6 +762,8 @@ class HostKVCache(abc.ABC):
         self.size = int(device_pool.size * host_to_device_ratio)
         # Align the host memory pool size to the page size
         self.size = self.size - (self.size % self.page_size)
+        self.start_layer = device_pool.start_layer
+        self.end_layer = device_pool.end_layer
 
         assert (
             self.size > device_pool.size
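
Note: the hunk can be read as the minimal sketch below, which reproduces the size alignment and the two newly copied fields in isolation. `DevicePoolStub`, `HostPoolSketch`, the `page_size` constructor argument, the assertion message, and the `local_layer_index` helper are all hypothetical names introduced for illustration; they are not taken from the repository. The helper in particular only shows one plausible reason a host pool might need the layer range.

```python
from dataclasses import dataclass


@dataclass
class DevicePoolStub:
    """Hypothetical stand-in for the device-side KV pool (not the real class)."""
    size: int
    start_layer: int
    end_layer: int


class HostPoolSketch:
    """Illustrative only: mirrors the lines shown in the hunk above."""

    def __init__(self, device_pool, host_to_device_ratio, page_size):
        self.page_size = page_size  # assumed to be set before the hunk's lines run
        self.size = int(device_pool.size * host_to_device_ratio)
        # Align the host memory pool size to the page size
        self.size = self.size - (self.size % self.page_size)
        # Fields added by this commit: copy the layer range from the device pool
        self.start_layer = device_pool.start_layer
        self.end_layer = device_pool.end_layer
        # The real assertion is cut off in the hunk; this message is a placeholder
        assert self.size > device_pool.size, "host pool should exceed device pool"

    def local_layer_index(self, layer_id):
        # Assumed use of the new field: map a global layer id to a local index
        return layer_id - self.start_layer


device_pool = DevicePoolStub(size=1000, start_layer=4, end_layer=8)
host_pool = HostPoolSketch(device_pool, host_to_device_ratio=2.5, page_size=64)
print(host_pool.size, host_pool.start_layer, host_pool.end_layer)
```

With a ratio of 2.5 and a page size of 64, the raw host size of 2500 is rounded down to the nearest multiple of 64 before the size check runs.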