sglang

Author	SHA1	Message	Date
Lianmin Zheng	55381a46ac	Revert "[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability" (#8181 )	2025-07-19 22:41:30 -07:00
ybyang	4540a4666a	[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability (#8115 ) Signed-off-by: ybyang <ybyang7@iflytek.com>	2025-07-19 18:10:00 -07:00
Lianmin Zheng	bb0e8a32b5	Clean up server args (#8161 )	2025-07-19 11:32:52 -07:00
Sai Enduri	d0510f08fe	Revert "Fix different device type adjustment in PP" (#8141 )	2025-07-18 01:12:11 -07:00
Zhiqiang Xie	9d33fcfb8e	Hicache Storage Layer Prototype (#7704 )	2025-07-18 15:20:19 +08:00
Zhao Chen	3586b4cef2	feat: add production metric for retracted requests due to insufficient kvcache (#7030 ) Signed-off-by: Zhao Chen <zhaochen.zju@gmail.com>	2025-07-17 11:59:05 -07:00
Yingchun Lai	795668dc73	feat: add tp_rank, pp_rank and dp_rank labels for scheduler metrics (#7597 ) Co-authored-by: Stefan He <hebiaobuaa@gmail.com>	2025-07-16 17:55:59 -07:00
Qiaolin Yu	69f453e5a4	Use device_group for all_gather when disabling overlap scheduling (#8001 )	2025-07-15 19:38:58 -07:00
Qiaolin Yu	3bc43c683e	Fix different device type adjustment in PP (#7760 )	2025-07-15 19:37:14 -07:00
Hanming Lu	9379da77de	SWA Prefix Cache (#7367 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-07-13 12:31:07 -07:00
kyleliang-nv	dd445a41f5	[feature] Add start step profile argument in /start_profile (#7608 )	2025-07-09 18:42:15 -07:00
Zhiqiang Xie	2fc824b84c	Kernels for efficient KV cache IO (#7313 )	2025-07-06 22:53:36 -07:00
yuhsuan-t	8d4a01cbd7	Log the timestamps of each prefill/decode iteration (#6094 ) Co-authored-by: yuhsuan-t <12108766+yuhsaun-t@users.noreply.github.com>	2025-07-07 01:57:27 +00:00
Nan Jiang	ba69c153f6	[RL]: Fix error tagging in multi-stage wake up (#7812 ) Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>	2025-07-06 16:51:29 -07:00
Stefan He	3589aa79b0	[RL] Fix illegal memory for _import_static_state (#7733 ) Co-authored-by: nanjiangwill <willjiang2018@gmail.com>	2025-07-06 16:25:21 -07:00
Cheng Wan	8fc910db03	DP Attention with Auto DeepEP Dispatch (#7222 )	2025-07-05 01:54:24 -07:00
Lianmin Zheng	14229ccf8f	Move mem_fraction_static adjustment for multimodal models to `server_args.py` & Fix session control & Other cleanups (#7748 )	2025-07-04 16:33:33 -07:00
TianyuZhang1214	0099172327	feat: use D2D instead of H2H in pp (#7673 ) Co-authored-by: alpha-baby <fujianhao1997@qq.com>	2025-07-03 10:58:50 -07:00
Chunyuan WU	1dce6c480f	[CPU] support the case where num_attention_heads or intermediate_size is not divisible by the TP size (#6771 )	2025-07-03 09:51:38 -07:00
Ziming Huang	1bebd3154e	Fix num_tokens_pre_allocated in disaggregation log (#7714 )	2025-07-02 22:31:49 -07:00
Albert	d3c275b117	Support updating weights at once by stopping all requests (#6698 ) Signed-off-by: Tianyu Zhou <albert.zty@antgroup.com> Co-authored-by: Zilin Zhu <zhuzilinallen@gmail.com>	2025-07-02 22:26:06 -07:00
Zilin Zhu	0626f678de	[RL] support update_weights_from_distributed with different group and multiple weights (#7292 )	2025-07-02 19:29:11 -07:00
Lianmin Zheng	22352d47a9	Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632 ) Co-authored-by: Kan Wu <wukanustc@gmail.com>	2025-06-29 23:16:19 -07:00
fzyzcjy	0c9c6c75a8	Move files related to EPLB (#7580 )	2025-06-29 15:39:38 -07:00
Lifu Huang	49538d111b	Support dynamic LoRA loading / unloading in engine/server API (#7446 )	2025-06-27 21:00:27 -07:00
tarinkk	eb6c2c1663	Hybrid kv cache for LLaMA4 (#6563 ) Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> Co-authored-by: tarinkk <rt572@physics.rutger.edu> Co-authored-by: tarinkk <rt572@rutgers.physics.edu> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2025-06-27 18:58:55 -07:00
Stefan He	00fbd8a484	Fix typo of flash_cache (#7513 )	2025-06-25 02:04:41 -07:00
zixuanzhang226	f3cbd24541	feat: send kvmetrics from sglang scheduler (#6721 )	2025-06-25 01:57:49 -07:00
DangKai	bc2e5645c4	fix: force synchronization between TP workers when update_weights (#6626 ) Co-authored-by: dangkai.dk <dangkai.dk@alibaba-inc.com>	2025-06-25 01:35:59 -07:00
u4lr451	ed0a0b692c	Perormance: Enable cuda graph for dp idle batch (#7269 ) Co-authored-by: austindeng <austindeng@tencent.com> Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> Co-authored-by: ch-wan <cwan39@gatech.edu>	2025-06-23 17:34:13 -07:00
Lianmin Zheng	55e03b10c4	Fix a bug in BatchTokenIDOut & Misc style and dependency updates (#7457 )	2025-06-23 06:20:39 -07:00
fzyzcjy	edc21cc8ae	Tiny add logging for GC (#7406 )	2025-06-22 12:40:02 +08:00
Liangsheng Yin	05c9bc8956	[minor] simplify the `TokenToKVPoolAllocator` (#7414 )	2025-06-22 12:37:18 +08:00
Cheng Wan	5041df2d01	Fix 7285 Merge Conflicts (#7403 )	2025-06-20 16:02:50 -07:00
Cheng Wan	73b13e69b4	Optimize DP attn scheduling for speculative decoding (#7285 )	2025-06-20 15:06:41 -07:00
Cheng Wan	e879d8b7a8	[Feature] Comprehensive Hybrid Parallelism Support (#6389 )	2025-06-20 14:43:11 -07:00
strgrb	ceba0ce4f6	support return logprobs for pipeline (#7356 ) Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>	2025-06-19 23:50:45 -07:00
Huang Long	1d6515ef2a	[Bugfix]Fix hang bug using dp attention with HiRadixCache (#7159 ) Signed-off-by: huanglong <huanglong@linux.alibaba.com>	2025-06-19 20:34:36 -07:00
Atream	4f838c09cd	[PD] Transfer hidden states for mtp when disaggregation (#7242 )	2025-06-19 11:22:47 -07:00
DarkSharpness	47367b768d	[Refactor] Clean up radix cache related API (#7303 ) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-06-20 00:58:48 +08:00
Stefan He	3774f07825	Multi-Stage Awake: Support Resume and Pause KV Cache and Weights separately (#7099 )	2025-06-19 00:56:37 -07:00
fzyzcjy	9c6a0656a3	Fix profiler error when there are idle passes (#7003 )	2025-06-18 10:55:01 -07:00
Zhiqiang Xie	e56685ac1b	Upstreaming hicache bug fixes (#7267 )	2025-06-17 17:44:57 -07:00
shangmingc	c26d7349d3	[PD] Add custom memory pool option to support Mooncake PD with NVLink (#7264 ) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-06-17 17:21:37 -07:00
u4lr451	10d60cd41b	feat: mtp support dp-attention (#6081 ) Co-authored-by: austindeng <austindeng@tencent.com> Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com> Co-authored-by: Qiaolin Yu <liin1211@outlook.com> Co-authored-by: ch-wan <cwan39@gatech.edu>	2025-06-17 00:33:28 -07:00
woodx	e30ef368ab	Feat/support rerank (#6058 )	2025-06-16 10:50:01 -07:00
Liangsheng Yin	c494386728	minor fix (#7245 )	2025-06-16 23:30:26 +08:00
Byron Hsu	88f9c347b2	[PD] use int32 for kv indices & get num_reserved_decode_tokens from server_args (#7214 )	2025-06-15 11:51:03 -07:00
Lianmin Zheng	38af4f68a9	Fix grammar abort & Minor style fixes (#7204 )	2025-06-14 22:49:41 -07:00
Byron Hsu	db0cc57e75	[PD] Support decode retract and update decode.py (#7196 )	2025-06-14 19:48:05 -07:00

329 Commits