52 Commits

Author SHA1 Message Date
hzh0425
ee3bd8a1c8 feat(hicache): Support passing prefix keys for l3 store. (#9045)
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-10-10 00:22:05 -07:00
Ke Bao
31b49c0b51 EAGLE cache fix for HiCache (#11215) 2025-10-04 16:53:53 -07:00
ykwd
bfa274380b [HiCache] Configurable and Dynamic Prefetch Timeout (#10512)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-10-01 08:44:10 -05:00
huangtingwei
e05555fad8 [HiCacheStorage] mooncake store support page_first_direct layout (#10591) 2025-09-28 20:45:48 -07:00
Zhiqiang Xie
3d40794fcf [HiCache] Cleaning the deprecated host memory state (#10778) 2025-09-25 14:43:53 +08:00
Xinyuan Tong
12d6cf18f0 Refactors radix cache for extra key support (#10317)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-09-22 02:16:16 +08:00
Xuchun Shang
1ccd59c715 [HICache] introduce evict policy (#10190)
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
2025-09-18 11:10:20 +08:00
DarkSharpness
948b01a04c [Refactor] Remove Hicache Load & Write threads (#10127)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-08 22:18:50 -07:00
Shisong Ma
33467c05a4 [BUG FIX] add fail check when get fail in case wait complete block (#9971)
Co-authored-by: mashisong <mashisong@bytedance.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-07 18:34:04 -07:00
Teng Ma
41628dc1b1 [HiCache] fix: check clear() method for storage backend (#10096)
Co-authored-by: hzh0425 <hzh0425@apache.org>
2025-09-06 22:59:58 -07:00
pansicheng
f84db115b1 Add storage read/write bandwidth logs to monitor kvcache performance (#9965)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-05 16:52:55 -07:00
JinYan Su
37565b7f21 fix(cache): move ongoing_prefetch pop after validation to prevent leak (#9927)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-03 02:39:34 +00:00
huangtingwei
cb9e0e4180 [HiCacheStorage] fix abort request host memory leaks (#9874)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-01 18:59:29 -07:00
Zhiqiang Xie
8b6966d020 [HiCache] Storage Refactoring (#9797)
Co-authored-by: pansicheng <27603155+pansicheng@users.noreply.github.com>
2025-08-31 22:58:21 +08:00
Teng Ma
f05c68733e [HiCache] Clear kvcache in storage backend with fastAPI (#9750)
Co-authored-by: hzh0425 <hzh0425@apache.org>
2025-08-31 17:41:44 +08:00
Zhiqiang Xie
54e872d343 [HiCache] resolve conflict between chunked-prefill and hicache hit count (#9776) 2025-08-30 01:30:54 +08:00
Pablo Iyu Guerrero
a3aee7c377 fix: HiRadixCache: fix prefetch completion race (#9397) 2025-08-27 15:43:01 +08:00
hzh0425
c04c17edfa refactor(hicache): Introduce generic HiCacheStorageConfig for improved configuration management (#9555)
Co-authored-by: Teng Ma <805522925@qq.com>
2025-08-26 17:55:20 -07:00
Zhiqiang Xie
43de1d7304 HiCache Storage fix host memory leak (#9648) 2025-08-26 10:49:40 -07:00
Zhiqiang Xie
0eec4cb6cc HiCache, add bench long context plus minor fixs (#9086)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-11 16:54:52 -07:00
Zhiqiang Xie
9f78f391ae HiCache Storage: generate hash when inserting new nodes (#9053) 2025-08-11 14:18:59 -07:00
Zhiqiang Xie
6e0b646832 HiCache Storage tp fix (#8878) 2025-08-09 01:16:51 -07:00
pansicheng
e2fd2b9c7e Simple prefetch policy (#8692) 2025-08-08 02:09:28 -07:00
Zhiqiang Xie
dd7ca00601 Interface change for kvcache io to support page first layout (#8318) 2025-08-01 11:37:49 +08:00
Zhiqiang Xie
9305ea6c2d HiCache, fixing hash value indexing (#8636) 2025-08-01 11:29:51 +08:00
huangtingwei
d904959233 Support l3 cache (mooncake store) for hiradix cache (#7211)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com>
Co-authored-by: zuoyuan <zhangzuo21@mails.tsinghua.edu.cn>
Co-authored-by: @wangyueneng.wyn <wangyueneng.wyn@antgroup.com>
Co-authored-by: JinYan Su <jinyansu792@gmail.com>
2025-07-30 23:15:51 -07:00
huangtingwei
26c8a310bd fix incorrect increase of hit count (#8533)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-07-31 06:02:42 +00:00
yi wang
5963e50503 [bugfix] Fix 2 minor bugs in the hicache storage layer (#8404) 2025-07-31 05:47:14 +00:00
pansicheng
299803343d Add hf3fs support for hicache storage (based on #7704) (#7280)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-07-30 17:42:41 -07:00
Zhiqiang Xie
528bd1ed85 HiCache, check before terminate prefetching (#8372) 2025-07-26 23:13:16 -07:00
Zhiqiang Xie
145482f422 HiCache Storage TP Refinement (#8307)
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
2025-07-25 08:31:47 +08:00
Zhiqiang Xie
9d33fcfb8e Hicache Storage Layer Prototype (#7704) 2025-07-18 15:20:19 +08:00
Zhiqiang Xie
2fc824b84c Kernels for efficient KV cache IO (#7313) 2025-07-06 22:53:36 -07:00
Liangsheng Yin
05c9bc8956 [minor] simplify the TokenToKVPoolAllocator (#7414) 2025-06-22 12:37:18 +08:00
DarkSharpness
47367b768d [Refactor] Clean up radix cache related API (#7303)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-06-20 00:58:48 +08:00
Zhiqiang Xie
e56685ac1b Upstreaming hicache bug fixes (#7267) 2025-06-17 17:44:57 -07:00
Lianmin Zheng
a023856b12 Move host memory pools into a separate file (#7200) 2025-06-14 21:31:42 -07:00
Lifu Huang
3cf1473a09 Use monotonic clock for interval measurement (#6211)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
2025-05-17 16:49:18 -07:00
Zhiqiang Xie
70645f4d7d upstream hicache fixes (#5570) 2025-04-20 23:08:30 -07:00
Zhiqiang Xie
e2574ee986 fix hicache write back (#5543) 2025-04-19 21:56:22 -07:00
Zhiqiang Xie
3fadc64793 bug fix for hicache host eviction (#4989) 2025-04-02 00:33:50 -07:00
Zhiqiang Xie
e119f04215 Large page size aligned hierarchical caching (#4581) 2025-04-01 22:38:15 -07:00
Zhiqiang Xie
a98290aea3 Unit test for Hierarchical Caching (#4486) 2025-03-17 17:45:00 -07:00
Zhiqiang Xie
f5bbf6037d Fix: Complete int32 to int64 conversion (#4465) 2025-03-16 18:14:27 -07:00
JieXin Liang
1a3fa75f2f [Fix] use torch.cat instead of torch.concat to prevent entering the Autograd backends. (#4466) 2025-03-16 00:02:47 -07:00
Lu Changqi
0e0ec70200 Hierarchical Caching supports MLA (#4009)
Signed-off-by: Changqi Lu <luchangqi.123@bytedance.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-03-13 20:42:14 -07:00
Zhiqiang Xie
fbdb50501f Hot fix for hicache with new page aligned radixtree (#4397) 2025-03-13 15:50:49 -07:00
Lianmin Zheng
c76040e31b Support page size > 1 (#4356) 2025-03-12 22:22:39 -07:00
Zhiqiang Xie
10b544ae9b Hierarchical Caching Refactoring and Fixing TP issue (#4082) 2025-03-12 11:22:35 -07:00
Zhiqiang Xie
9376ac361d Memory pool fix for upstream change about eagle (#4170) 2025-03-07 00:58:20 -08:00