sglang

Author	SHA1	Message	Date
Vishwanath Venkatesan	2cd2e27f80	SGLang HiCache NIXL Connector (#8488 ) Signed-off-by: Vishwanath Venkatesan <vvenkatesan@nvidia.com> Co-authored-by: Moein Khazraee <moein@nvidia.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-07-31 13:09:42 -07:00
Ke Bao	3c307dc057	Fix hf3fs_fuse import error (#8623 )	2025-07-31 22:42:31 +08:00
huangtingwei	d904959233	Support l3 cache (mooncake store) for hiradix cache (#7211 ) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu> Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com> Co-authored-by: zuoyuan <zhangzuo21@mails.tsinghua.edu.cn> Co-authored-by: @wangyueneng.wyn <wangyueneng.wyn@antgroup.com> Co-authored-by: JinYan Su <jinyansu792@gmail.com>	2025-07-30 23:15:51 -07:00
huangtingwei	26c8a310bd	fix incorrect increase of hit count (#8533 ) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-07-31 06:02:42 +00:00
yi wang	5963e50503	[bugfix] Fix 2 minor bugs in the hicache storage layer (#8404 )	2025-07-31 05:47:14 +00:00
pansicheng	299803343d	Add hf3fs support for hicache storage (based on #7704 ) (#7280 ) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-07-30 17:42:41 -07:00
hzh0425	a85ebf50b8	feat(hicache): support file backend reading directory config form env. (#8498 )	2025-07-29 21:18:46 -07:00
Zhiqiang Xie	528bd1ed85	HiCache, check before terminate prefetching (#8372 )	2025-07-26 23:13:16 -07:00
Zhiqiang Xie	145482f422	HiCache Storage TP Refinement (#8307 ) Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>	2025-07-25 08:31:47 +08:00
YiXR	a99801e075	[Performance][PD Disaggregation] optimize TokenToKVPoolAllocator by sorting free pages (#8133 ) Signed-off-by: Xingrui Yi <yixingrui@linux.alibaba.com> Co-authored-by: Xingrui Yi <yixingrui@linux.alibaba.com>	2025-07-23 13:28:12 -07:00
Zhiqiang Xie	9d33fcfb8e	Hicache Storage Layer Prototype (#7704 )	2025-07-18 15:20:19 +08:00
Ziqi Fan	01857fab61	fix: update HostKVCache init to report correct msg when available memory is not enough (#8102 )	2025-07-17 21:24:34 +08:00
hzh0425	7c39e8a198	Fix Bug 'get_cpu_copy not Implemented' in pd offloading mode (#7982 )	2025-07-14 14:57:10 -07:00
Hanming Lu	9379da77de	SWA Prefix Cache (#7367 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-07-13 12:31:07 -07:00
Ying Sheng	bcc5ba94b4	[minor fix] SWA missing methods (#7972 )	2025-07-11 23:57:02 -07:00
Ying Sheng	cee9f329c4	[minor fix] llama4 hybrid memory (#7950 )	2025-07-11 23:11:36 -07:00
ronnie_zheng	86044712c6	[feature] kv transfer support of ascend npu (#7795 ) Co-authored-by: liupeng <liupeng374@huawei.com>	2025-07-11 00:07:51 -07:00
ronnie_zheng	766392c6bd	[feature]Ascend quantization support (#7791 ) Co-authored-by: ichernob <ichernobnn@gmail.com> Co-authored-by: liupeng <liupeng374@huawei.com>	2025-07-10 09:17:37 -07:00
Zhiqiang Xie	2fc824b84c	Kernels for efficient KV cache IO (#7313 )	2025-07-06 22:53:36 -07:00
Lianmin Zheng	14229ccf8f	Move mem_fraction_static adjustment for multimodal models to `server_args.py` & Fix session control & Other cleanups (#7748 )	2025-07-04 16:33:33 -07:00
ronnie_zheng	1e0e549766	Ascend attention backend(PA&MLA) (#7722 ) Co-authored-by: Maksim <makcum888e@mail.ru> Co-authored-by: VDV1985 <vladdv85@mail.ru>	2025-07-03 09:23:19 -07:00
Lianmin Zheng	22352d47a9	Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632 ) Co-authored-by: Kan Wu <wukanustc@gmail.com>	2025-06-29 23:16:19 -07:00
tarinkk	eb6c2c1663	Hybrid kv cache for LLaMA4 (#6563 ) Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> Co-authored-by: tarinkk <rt572@physics.rutger.edu> Co-authored-by: tarinkk <rt572@rutgers.physics.edu> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2025-06-27 18:58:55 -07:00
Liangsheng Yin	05c9bc8956	[minor] simplify the `TokenToKVPoolAllocator` (#7414 )	2025-06-22 12:37:18 +08:00
Liangsheng Yin	5ea5d22170	Fix CPU offloading for MLA memory pool (#7409 )	2025-06-22 02:39:05 +08:00
Shangming Cai	187b85b7f3	[PD] Optimize custom mem pool usage and bump mooncake version (#7393 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-06-20 09:50:39 -07:00
DarkSharpness	47367b768d	[Refactor] Clean up radix cache related API (#7303 ) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-06-20 00:58:48 +08:00
Stefan He	3774f07825	Multi-Stage Awake: Support Resume and Pause KV Cache and Weights separately (#7099 )	2025-06-19 00:56:37 -07:00
Zhiqiang Xie	e56685ac1b	Upstreaming hicache bug fixes (#7267 )	2025-06-17 17:44:57 -07:00
shangmingc	c26d7349d3	[PD] Add custom memory pool option to support Mooncake PD with NVLink (#7264 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-06-17 17:21:37 -07:00
Lianmin Zheng	b1286a116a	[EAGLE] Refactor code for page size > 1 & more simplifications (#7213 )	2025-06-16 03:04:29 -07:00
Baizhou Zhang	d2679f5109	Fix ChunkCache object has no attribute 'disable' (#7217 )	2025-06-15 20:55:15 -07:00
Lianmin Zheng	fff10809bf	Revert "[EAGLE] Refactor code for page size > 1 & more simplifications" (#7210 )	2025-06-15 02:48:00 -07:00
Lianmin Zheng	5f1ab32717	[EAGLE] Refactor code for page size > 1 & more simplifications (#7163 )	2025-06-14 23:16:23 -07:00
Lianmin Zheng	38af4f68a9	Fix grammar abort & Minor style fixes (#7204 )	2025-06-14 22:49:41 -07:00
Lianmin Zheng	a6305c7d50	Lianmin/simplify memory pool (#7202 )	2025-06-14 22:25:37 -07:00
Lianmin Zheng	a023856b12	Move host memory pools into a separate file (#7200 )	2025-06-14 21:31:42 -07:00
Byron Hsu	db0cc57e75	[PD] Support decode retract and update decode.py (#7196 )	2025-06-14 19:48:05 -07:00
Faradawn Yang	777688b892	[feat]: Emit fixed-size KV blocks events (#6824 )	2025-06-11 13:07:58 -07:00
sogalin	02543b545c	Fix misusing the "_is_cuda". (#7091 )	2025-06-11 11:21:31 -07:00
Chang Su	4685fbb888	[VLM] Support chunk prefill for VLM (#6355 ) Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-05-22 20:32:41 -07:00
Byron Hsu	0a4fc73b48	[PD] Fix failure abort (#6535 )	2025-05-22 20:32:03 -07:00
Baizhou Zhang	d4c038daed	[Fix]Fix capture fail bug for DeepSeek (#6275 )	2025-05-21 11:11:20 -07:00
Lianmin Zheng	03886917bd	Disable all two stream overlap on amd (#6475 )	2025-05-20 19:06:59 -07:00
Trevor Morris	7adf245ba2	[Metrics] Add KV events publishing (#6098 )	2025-05-19 14:19:54 -07:00
wangxiyu191	155214952b	refactor: Extract repeated member variables in KVCache subclasses to base class. (#6323 )	2025-05-18 15:28:15 -07:00
doujiang24	9d24c3ffb0	chore: tiny remove duplicated code (#6392 ) Signed-off-by: doujiang24 <doujiang24@gmail.com>	2025-05-18 02:17:32 -07:00
Lifu Huang	3cf1473a09	Use monotonic clock for interval measurement (#6211 ) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>	2025-05-17 16:49:18 -07:00
Simon (Jiyou) Li	b29a026e14	KV‑Cache (MHA, MLA): add missing start_layer / end_layer fields to MHATokenToKVPoolHost and MLATokenToKVPoolHost (#6016 ) Co-authored-by: 继优 <jiyou.ljy@alibaba-inc.com> Co-authored-by: chus-chus <chus-chus@users.noreply.github.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-05-09 15:50:06 -07:00
Zhiqiang Xie	f8e460930a	Fix prefill OOM error in the case of large page size (#5081 )	2025-05-05 16:02:55 -07:00

134 Commits