222 Commits

Author SHA1 Message Date
huangtingwei
e05555fad8 [HiCacheStorage] mooncake store support page_first_direct layout (#10591) 2025-09-28 20:45:48 -07:00
Teng Ma
9816989bff [HiCache] bug: fix mooncake store batch set v1 (#11013) 2025-09-28 23:18:48 +08:00
hzh0425
c8a5d12abe [HiCache]: Support dynamic loading backends for hicache (#10551)
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
2025-09-26 18:34:11 -07:00
yi wang
fce170480a integrate AIBrix KVcache (#10376) 2025-09-25 14:47:09 +08:00
Zhiqiang Xie
3d40794fcf [HiCache] Cleaning the deprecated host memory state (#10778) 2025-09-25 14:43:53 +08:00
pansicheng
d4041a5eeb refactor zero copy (#10300)
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: huangtingwei <141888744+huangtingwei9988@users.noreply.github.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2025-09-22 15:17:31 -07:00
Xinyuan Tong
12d6cf18f0 Refactors radix cache for extra key support (#10317)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
2025-09-22 02:16:16 +08:00
huangtingwei
7f399e4bce [HiCacheStorage]support page_first_direct layout for generic set&get (#10522) 2025-09-19 05:47:16 -07:00
FlyPanda
8b713c7248 Hicache L3 backend mooncake optimization configuration reading method (#10319)
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
Co-authored-by: huangtingwei <141888744+huangtingwei9988@users.noreply.github.com>
Co-authored-by: shicang <shicang@shicang>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-09-19 12:25:01 +08:00
Xuchun Shang
1ccd59c715 [HICache] introduce evict policy (#10190)
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
2025-09-18 11:10:20 +08:00
Lianmin Zheng
f949ad5794 [Auto Sync] Update activation.py, chunk_cache.py, utils.py (20250917) (#10538)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-16 17:06:43 -07:00
ykwd
4bb08f6e07 [Hicache] Evaluate Per-Round Metrics in Multiturn Bench (#10203)
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
2025-09-15 19:34:40 -07:00
Binyao Jiang
9752861002 [Fix] Support qwen3-next MTP+DP (#10392) 2025-09-13 17:45:04 +08:00
Yi Zhang
297d374510 support qwen3_next blackwell (#10403) 2025-09-13 17:18:26 +08:00
Binyao Jiang
31e9d3a5aa [Fix] Init mamba related memory pools with torch.zeros (#10400) 2025-09-13 14:16:48 +08:00
Teng Ma
49f169d53e [HiCache] doc: update deployment in readme (#10332)
Signed-off-by: Teng Ma <sima.mt@alibaba-inc.com>
2025-09-12 16:35:37 -07:00
Teng Ma
7fce2fd91a [HiCache] fix mooncake config in different tp size (#10377) 2025-09-12 16:34:23 -07:00
Even Zhou
16cd550c85 Support Qwen3-Next on Ascend NPU (#10379) 2025-09-12 16:31:37 -07:00
huangtingwei
b4c2c421e9 support memory_pool_host page first direct layout (#10031)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-11 23:19:44 -07:00
Stefan He
6c18ab46a2 [Qwen3-Next] switch to triton and cache conv states to accelerate MTP from 300 tok/s to 341 tok/s (#10335)
Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
2025-09-11 11:59:48 -07:00
Yi Zhang
30c6e1f569 Qwen3-Next support (#10233)
Co-authored-by: cao1zhg <114661107+cao1zhg@users.noreply.github.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: Binyao Jiang <byjiang1996@gmail.com>
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
Co-authored-by: Lifu Huang <lifu.hlf@gmail.com>
Co-authored-by: qingquansong <ustcsqq@gmail.com>
Co-authored-by: Yaoyao Ding <dingyaoyao.cs@gmail.com>
Co-authored-by: Ke Bao <ISPObaoke@163.com>
Co-authored-by: Minglei Zhu <mingleizhu1122@gmail.com>
2025-09-11 04:11:49 -07:00
Teng Ma
8471e5e616 [HiCache] feat: add mooncake backend extra config (#10213) 2025-09-09 12:50:00 -07:00
DarkSharpness
948b01a04c [Refactor] Remove Hicache Load & Write threads (#10127)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-08 22:18:50 -07:00
hzh0425
ec99668ab7 [Hicache]: Add E2E CI For 3FS-KVStore (#10131) 2025-09-08 01:54:50 -07:00
Huaiyu, Zheng
ee21817c6b enable llama3.1-8B on xpu (#9434) 2025-09-07 22:34:20 -07:00
Shisong Ma
33467c05a4 [BUG FIX] add fail check when get fail in case wait complete block (#9971)
Co-authored-by: mashisong <mashisong@bytedance.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-07 18:34:04 -07:00
Teng Ma
41628dc1b1 [HiCache] fix: check clear() method for storage backend (#10096)
Co-authored-by: hzh0425 <hzh0425@apache.org>
2025-09-06 22:59:58 -07:00
Yuwei An
9a7ced4e4d [Feature] LMCache Connector Integration (#9741)
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
Signed-off-by: YuhanLiu11 <yliu738@wisc.edu>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-06 20:14:55 -07:00
Zhiqiang Xie
0b8c5721f1 [HiStorage] Remove delete and clear as necessary methods (#10039) 2025-09-06 10:27:26 +08:00
Xinyuan Tong
273b28344b [Minor] Refactors KV memory pool (#9842) 2025-09-05 17:06:08 -07:00
pansicheng
f84db115b1 Add storage read/write bandwidth logs to monitor kvcache performance (#9965)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-05 16:52:55 -07:00
ykwd
93088b6975 [Hicache] Mooncake API Fix & Test, and Improved Readme (#9951)
Co-authored-by: Teng Ma <sima.mt@alibaba-inc.com>
2025-09-04 13:55:39 -07:00
pansicheng
d07304870b fix 3fs zerocopy (#9938)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-04 13:24:12 -07:00
hzh0425
106c2b31fb feat(hicache): Add generic hicache ci e2e test and benchmark test (#9846)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-04 20:43:46 +08:00
Xinyuan Tong
56eb5d0a3d fix swa clear(): rename is_in_free_group to is_not_in_free_group (#9914) 2025-09-03 11:42:12 -07:00
JinYan Su
37565b7f21 fix(cache): move ongoing_prefetch pop after validation to prevent leak (#9927)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-03 02:39:34 +00:00
Zhiqiang Xie
369b143366 [HiCache] Minor fix on file storage backend (#9869) 2025-09-02 15:52:37 -07:00
hzh0425
4d89389c4f Fix the key passing issue in page first layout. (#9929) 2025-09-02 11:30:11 -07:00
hzh0425
58d06fdc95 [HiCacheStorage]: Improve 3fs kvstore's performance and resolve mla issues (#9876) 2025-09-01 19:01:48 -07:00
huangtingwei
cb9e0e4180 [HiCacheStorage] fix abort request host memory leaks (#9874)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
2025-09-01 18:59:29 -07:00
huangtingwei
b361750a4a Mooncake store get zero copy meta optimization (#9857) 2025-09-01 03:27:56 -07:00
Zhiqiang Xie
8b6966d020 [HiCache] Storage Refactoring (#9797)
Co-authored-by: pansicheng <27603155+pansicheng@users.noreply.github.com>
2025-08-31 22:58:21 +08:00
Teng Ma
f05c68733e [HiCache] Clear kvcache in storage backend with fastAPI (#9750)
Co-authored-by: hzh0425 <hzh0425@apache.org>
2025-08-31 17:41:44 +08:00
Zhiqiang Xie
f9076a5a2c hot fix for mooncake batch set api (#9836) 2025-08-30 21:01:51 -07:00
hzh0425
161e9dc51e feat(hicache-3fs): 3FS-Store Backup Optimizations For MLA Model. (#9692) 2025-08-29 10:48:51 -07:00
Zhiqiang Xie
54e872d343 [HiCache] resolve conflict between chunked-prefill and hicache hit count (#9776) 2025-08-30 01:30:54 +08:00
hzh0425
38cd5fb1e0 bugfix(hicache): Move exists check before key suffixing (#9749) 2025-08-28 18:29:47 -07:00
chenxu140
74dd4249ac [Feature] Support NPUGraph for DeepSeek on Ascend NPU (#9355)
Co-authored-by: Even Zhou <even.y.zhou@outlook.com>
2025-08-28 16:06:24 -07:00
huangtingwei
55349e361d support mooncake store dp attention (#9684) 2025-08-28 12:31:31 +08:00
huangtingwei
ae7428a8a7 fix mooncake store mla zero copy meta (#9678) 2025-08-27 15:43:16 +08:00