sglang

Author	SHA1	Message	Date
ybyang	4540a4666a	[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability (#8115) Signed-off-by: ybyang <ybyang7@iflytek.com>	2025-07-19 18:10:00 -07:00
Yineng Zhang	561dd7b2ce	chore: upgrade sgl-kernel 0.2.6 (#8166)	2025-07-19 03:17:08 -07:00
Xinyuan Tong	6e923dbd30	feat: update multimodal data handling in engine entrypoint (#8002) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-07-15 00:12:22 -07:00
Yineng Zhang	732fc8e405	chore: upgrade sgl-kernel 0.2.5 (#7971)	2025-07-11 20:35:06 -07:00
Yineng Zhang	62f5522ffe	chore: upgrade sgl-kernel v0.2.4 (#7801)	2025-07-05 17:37:40 -07:00
Yineng Zhang	77cfea689d	chore: upgrade sgl-kernel v0.2.3 (#7786)	2025-07-05 01:55:55 -07:00
Yi Zhang	489934be0a	fuse renormal into moe topk softmax kernel python code (#7751) Co-authored-by: ispobock <ispobaoke@gmail.com> Co-authored-by: zhyncs <me@zhyncs.com>	2025-07-03 16:22:14 -07:00
Zilin Zhu	0626f678de	[RL] support update_weights_from_distributed with different group and multiple weights (#7292)	2025-07-02 19:29:11 -07:00
Yineng Zhang	f18a8fddd4	chore: upgrade flashinfer v0.2.7.post1 (#7698)	2025-07-01 14:05:57 -07:00
Zhiqiang Xie	f9eb04ddb2	upgrade sgl kernel to 0.2.1 for main (#7676)	2025-07-01 00:00:13 -07:00
Yineng Zhang	392e441ad1	chore: upgrade flashinfer v0.2.7 jit (#7663)	2025-06-30 13:26:26 -07:00
Lifu Huang	49538d111b	Support dynamic LoRA loading / unloading in engine/server API (#7446)	2025-06-27 21:00:27 -07:00
eigen	20beb3702b	feat: add return hidden_states at async generation (#7507)	2025-06-25 02:10:09 -07:00
zixuanzhang226	f3cbd24541	feat: send kvmetrics from sglang scheduler (#6721)	2025-06-25 01:57:49 -07:00
Chang Su	72676cd6c0	feat(oai refactor): Replace `openai_api` with `entrypoints/openai` (#7351) Co-authored-by: Jin Pan <jpan236@wisc.edu>	2025-06-21 13:21:06 -07:00
Stefan He	3774f07825	Multi-Stage Awake: Support Resume and Pause KV Cache and Weights separately (#7099)	2025-06-19 00:56:37 -07:00
ishandhanani	31fccf5a4f	chore: change logs from`INFO` to `DEBUG` for dp and add force quit for tokenizer manager (#7251)	2025-06-18 01:36:43 -07:00
woodx	e30ef368ab	Feat/support rerank (#6058)	2025-06-16 10:50:01 -07:00
Lianmin Zheng	53a525bf33	[Eagle] Fix kernel call after updating speculative sampling kernels (#7231)	2025-06-16 07:25:59 -07:00
JieXin Liang	ed89837cf4	chore: upgrade sgl-kernel v0.1.8.post2 (#7186) Co-authored-by: zhyncs <me@zhyncs.com>	2025-06-14 18:26:18 -07:00
fzyzcjy	bec3e48402	Support new DeepGEMM format in per token group quant (part 2: srt) (#7155)	2025-06-13 14:25:40 -07:00
ishandhanani	f1569876d5	feat: add direct routing strategy to DP worker (#6884)	2025-06-09 11:44:05 -07:00
Yineng Zhang	56ccd3c22c	chore: upgrade flashinfer v0.2.6.post1 jit (#6958) Co-authored-by: alcanderian <alcanderian@gmail.com> Co-authored-by: Qiaolin Yu <qy254@cornell.edu> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: ispobock <ispobaoke@gmail.com>	2025-06-09 09:22:39 -07:00
Yineng Zhang	23881fa60c	chore: upgrade sgl-kernel v0.1.6.post1 (#6957)	2025-06-07 17:18:55 -07:00
JieXin Liang	6153f2ff6e	chore: upgrade sgl-kernel v0.1.6 (#6945)	2025-06-07 02:53:26 -07:00
Chanh Nguyen	3f1e433903	Decoder-only Scoring API (#6460) Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>	2025-06-04 14:14:54 -07:00
Yineng Zhang	34c63731fc	chore: upgrade sgl-kernel v0.1.5 (#6795)	2025-05-31 18:32:00 -07:00
Lianmin Zheng	2d72fc47cf	Improve profiler and integrate profiler in bench_one_batch_server (#6787)	2025-05-31 15:53:55 -07:00
Yineng Zhang	0b07c4a99f	chore: upgrade sgl-kernel v0.1.4 (#6532)	2025-05-22 13:28:16 -07:00
Yineng Zhang	f07c6a009b	chore: upgrade sgl-kernel v0.1.3 (#6377)	2025-05-17 19:47:05 -07:00
fzyzcjy	f87283573e	Add expert distribution APIs for engine (#6290)	2025-05-17 18:31:51 -07:00
fzyzcjy	01d2838c0f	Fix stop_profile does not wait for finishing (#4741)	2025-05-17 17:06:15 -07:00
Yury Sulsky	f19a9204cd	Support precomputed multimodal features for Qwen-VL and Gemma3 models. (#6136) Co-authored-by: Yury Sulsky <ysulsky@tesla.com>	2025-05-16 12:26:15 -07:00
Lianmin Zheng	fba8eccd7e	Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-05-12 00:17:33 -07:00
Yineng Zhang	230106304d	chore: upgrade sgl-kernel v0.1.2.post1 (#6196) Co-authored-by: alcanderian <alcanderian@gmail.com>	2025-05-11 22:41:37 +08:00
Steven Shimizu	03dd785cd0	Added async_encode method to Engine (#4701)	2025-05-10 18:58:40 -07:00
ishandhanani	e444c13fb4	feat(engine): add bootstrap parameters to generate methods (dynamo) (#6075)	2025-05-07 10:33:58 -07:00
fzyzcjy	c68de47915	Super tiny fix doc (#5233)	2025-05-07 22:41:50 +08:00
Ying Sheng	11383cec3c	[PP] Add pipeline parallelism (#5724)	2025-04-30 18:18:07 -07:00
Yineng Zhang	9a6ad8916d	chore: upgrade sgl-kernel 0.1.1 (#5933)	2025-04-30 16:13:30 -07:00
Baizhou Zhang	799789afed	Bump Flashinfer to 0.2.5 (#5870) Co-authored-by: Yuhao Chen <yxckeis8@gmail.com>	2025-04-29 19:50:57 -07:00
woodx	2c3ea29476	[Feature] support auto chat template (#4949)	2025-04-28 22:34:18 -07:00
Yineng Zhang	41ac0c6d48	chore: upgrade sgl-kernel 0.1.0 (#5690)	2025-04-27 21:00:50 -07:00
fzyzcjy	1195182040	Tiny add Engine.flush_cache API (#5241)	2025-04-20 18:15:03 -07:00
tianlian yi	bc92107b03	Support server based rollout in Verlengine (#4848) Co-authored-by: Jin Pan <jpan236@wisc.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Jinn <47354855+jhinpan@users.noreply.github.com>	2025-04-12 10:07:52 -07:00
XinyuanTong	d09a51f1f6	[feat&refactor] Enhance multimodal input support with refactor io_struct (#4938) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-04-08 14:48:07 -07:00
XinyuanTong	9eb49e878b	[VLM RLHF] Take Image input for verl vlm rollout (#4915) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: GeLee <leege233@gmail.com>	2025-04-01 20:03:17 -07:00
Wei Wu	91ba98fe50	[Fix] Resolve GPU Memory Leak in update_weights_from_tensor (#4446)	2025-03-17 08:54:30 +00:00
Yinghai Lu	c614dbdf95	Nicer standalone engine inferface (#4480)	2025-03-17 01:42:04 -07:00
woodx	48efec7b05	Feature: support code completion (#3612)	2025-03-16 18:26:19 -07:00

72 Commits