72 Commits

Author SHA1 Message Date
ybyang
4540a4666a [Feature] Simple Improve Health Check Mechanism for Production-Grade Stability (#8115)
Signed-off-by: ybyang <ybyang7@iflytek.com>
2025-07-19 18:10:00 -07:00
Yineng Zhang
561dd7b2ce chore: upgrade sgl-kernel 0.2.6 (#8166) 2025-07-19 03:17:08 -07:00
Xinyuan Tong
6e923dbd30 feat: update multimodal data handling in engine entrypoint (#8002)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-07-15 00:12:22 -07:00
Yineng Zhang
732fc8e405 chore: upgrade sgl-kernel 0.2.5 (#7971) 2025-07-11 20:35:06 -07:00
Yineng Zhang
62f5522ffe chore: upgrade sgl-kernel v0.2.4 (#7801) 2025-07-05 17:37:40 -07:00
Yineng Zhang
77cfea689d chore: upgrade sgl-kernel v0.2.3 (#7786) 2025-07-05 01:55:55 -07:00
Yi Zhang
489934be0a fuse renormal into moe topk softmax kernel python code (#7751)
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: zhyncs <me@zhyncs.com>
2025-07-03 16:22:14 -07:00
Zilin Zhu
0626f678de [RL] support update_weights_from_distributed with different group and multiple weights (#7292) 2025-07-02 19:29:11 -07:00
Yineng Zhang
f18a8fddd4 chore: upgrade flashinfer v0.2.7.post1 (#7698) 2025-07-01 14:05:57 -07:00
Zhiqiang Xie
f9eb04ddb2 upgrade sgl kernel to 0.2.1 for main (#7676) 2025-07-01 00:00:13 -07:00
Yineng Zhang
392e441ad1 chore: upgrade flashinfer v0.2.7 jit (#7663) 2025-06-30 13:26:26 -07:00
Lifu Huang
49538d111b Support dynamic LoRA loading / unloading in engine/server API (#7446) 2025-06-27 21:00:27 -07:00
eigen
20beb3702b feat: add return hidden_states at async generation (#7507) 2025-06-25 02:10:09 -07:00
zixuanzhang226
f3cbd24541 feat: send kvmetrics from sglang scheduler (#6721) 2025-06-25 01:57:49 -07:00
Chang Su
72676cd6c0 feat(oai refactor): Replace openai_api with entrypoints/openai (#7351)
Co-authored-by: Jin Pan <jpan236@wisc.edu>
2025-06-21 13:21:06 -07:00
Stefan He
3774f07825 Multi-Stage Awake: Support Resume and Pause KV Cache and Weights separately (#7099) 2025-06-19 00:56:37 -07:00
ishandhanani
31fccf5a4f chore: change logs from INFO to DEBUG for dp and add force quit for tokenizer manager (#7251) 2025-06-18 01:36:43 -07:00
woodx
e30ef368ab Feat/support rerank (#6058) 2025-06-16 10:50:01 -07:00
Lianmin Zheng
53a525bf33 [Eagle] Fix kernel call after updating speculative sampling kernels (#7231) 2025-06-16 07:25:59 -07:00
JieXin Liang
ed89837cf4 chore: upgrade sgl-kernel v0.1.8.post2 (#7186)
Co-authored-by: zhyncs <me@zhyncs.com>
2025-06-14 18:26:18 -07:00
fzyzcjy
bec3e48402 Support new DeepGEMM format in per token group quant (part 2: srt) (#7155) 2025-06-13 14:25:40 -07:00
ishandhanani
f1569876d5 feat: add direct routing strategy to DP worker (#6884) 2025-06-09 11:44:05 -07:00
Yineng Zhang
56ccd3c22c chore: upgrade flashinfer v0.2.6.post1 jit (#6958)
Co-authored-by: alcanderian <alcanderian@gmail.com>
Co-authored-by: Qiaolin Yu <qy254@cornell.edu>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
2025-06-09 09:22:39 -07:00
Yineng Zhang
23881fa60c chore: upgrade sgl-kernel v0.1.6.post1 (#6957) 2025-06-07 17:18:55 -07:00
JieXin Liang
6153f2ff6e chore: upgrade sgl-kernel v0.1.6 (#6945) 2025-06-07 02:53:26 -07:00
Chanh Nguyen
3f1e433903 Decoder-only Scoring API (#6460)
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
2025-06-04 14:14:54 -07:00
Yineng Zhang
34c63731fc chore: upgrade sgl-kernel v0.1.5 (#6795) 2025-05-31 18:32:00 -07:00
Lianmin Zheng
2d72fc47cf Improve profiler and integrate profiler in bench_one_batch_server (#6787) 2025-05-31 15:53:55 -07:00
Yineng Zhang
0b07c4a99f chore: upgrade sgl-kernel v0.1.4 (#6532) 2025-05-22 13:28:16 -07:00
Yineng Zhang
f07c6a009b chore: upgrade sgl-kernel v0.1.3 (#6377) 2025-05-17 19:47:05 -07:00
fzyzcjy
f87283573e Add expert distribution APIs for engine (#6290) 2025-05-17 18:31:51 -07:00
fzyzcjy
01d2838c0f Fix stop_profile does not wait for finishing (#4741) 2025-05-17 17:06:15 -07:00
Yury Sulsky
f19a9204cd Support precomputed multimodal features for Qwen-VL and Gemma3 models. (#6136)
Co-authored-by: Yury Sulsky <ysulsky@tesla.com>
2025-05-16 12:26:15 -07:00
Lianmin Zheng
fba8eccd7e Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-05-12 00:17:33 -07:00
Yineng Zhang
230106304d chore: upgrade sgl-kernel v0.1.2.post1 (#6196)
Co-authored-by: alcanderian <alcanderian@gmail.com>
2025-05-11 22:41:37 +08:00
Steven Shimizu
03dd785cd0 Added async_encode method to Engine (#4701) 2025-05-10 18:58:40 -07:00
ishandhanani
e444c13fb4 feat(engine): add bootstrap parameters to generate methods (dynamo) (#6075) 2025-05-07 10:33:58 -07:00
fzyzcjy
c68de47915 Super tiny fix doc (#5233) 2025-05-07 22:41:50 +08:00
Ying Sheng
11383cec3c [PP] Add pipeline parallelism (#5724) 2025-04-30 18:18:07 -07:00
Yineng Zhang
9a6ad8916d chore: upgrade sgl-kernel 0.1.1 (#5933) 2025-04-30 16:13:30 -07:00
Baizhou Zhang
799789afed Bump Flashinfer to 0.2.5 (#5870)
Co-authored-by: Yuhao Chen <yxckeis8@gmail.com>
2025-04-29 19:50:57 -07:00
woodx
2c3ea29476 [Feature] support auto chat template (#4949) 2025-04-28 22:34:18 -07:00
Yineng Zhang
41ac0c6d48 chore: upgrade sgl-kernel 0.1.0 (#5690) 2025-04-27 21:00:50 -07:00
fzyzcjy
1195182040 Tiny add Engine.flush_cache API (#5241) 2025-04-20 18:15:03 -07:00
tianlian yi
bc92107b03 Support server based rollout in Verlengine (#4848)
Co-authored-by: Jin Pan <jpan236@wisc.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Jinn <47354855+jhinpan@users.noreply.github.com>
2025-04-12 10:07:52 -07:00
XinyuanTong
d09a51f1f6 [feat&refactor] Enhance multimodal input support with refactor io_struct (#4938)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
2025-04-08 14:48:07 -07:00
XinyuanTong
9eb49e878b [VLM RLHF] Take Image input for verl vlm rollout (#4915)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: GeLee <leege233@gmail.com>
2025-04-01 20:03:17 -07:00
Wei Wu
91ba98fe50 [Fix] Resolve GPU Memory Leak in update_weights_from_tensor (#4446) 2025-03-17 08:54:30 +00:00
Yinghai Lu
c614dbdf95 Nicer standalone engine inferface (#4480) 2025-03-17 01:42:04 -07:00
woodx
48efec7b05 Feature: support code completion (#3612) 2025-03-16 18:26:19 -07:00