ybyang
|
4540a4666a
|
[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability (#8115)
Signed-off-by: ybyang <ybyang7@iflytek.com>
|
2025-07-19 18:10:00 -07:00 |
|
Yineng Zhang
|
561dd7b2ce
|
chore: upgrade sgl-kernel 0.2.6 (#8166)
|
2025-07-19 03:17:08 -07:00 |
|
Xinyuan Tong
|
6e923dbd30
|
feat: update multimodal data handling in engine entrypoint (#8002)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
|
2025-07-15 00:12:22 -07:00 |
|
Yineng Zhang
|
732fc8e405
|
chore: upgrade sgl-kernel 0.2.5 (#7971)
|
2025-07-11 20:35:06 -07:00 |
|
Yineng Zhang
|
62f5522ffe
|
chore: upgrade sgl-kernel v0.2.4 (#7801)
|
2025-07-05 17:37:40 -07:00 |
|
Yineng Zhang
|
77cfea689d
|
chore: upgrade sgl-kernel v0.2.3 (#7786)
|
2025-07-05 01:55:55 -07:00 |
|
Yi Zhang
|
489934be0a
|
fuse renormal into moe topk softmax kernel python code (#7751)
Co-authored-by: ispobock <ispobaoke@gmail.com>
Co-authored-by: zhyncs <me@zhyncs.com>
|
2025-07-03 16:22:14 -07:00 |
|
Zilin Zhu
|
0626f678de
|
[RL] support update_weights_from_distributed with different group and multiple weights (#7292)
|
2025-07-02 19:29:11 -07:00 |
|
Yineng Zhang
|
f18a8fddd4
|
chore: upgrade flashinfer v0.2.7.post1 (#7698)
|
2025-07-01 14:05:57 -07:00 |
|
Zhiqiang Xie
|
f9eb04ddb2
|
upgrade sgl kernel to 0.2.1 for main (#7676)
|
2025-07-01 00:00:13 -07:00 |
|
Yineng Zhang
|
392e441ad1
|
chore: upgrade flashinfer v0.2.7 jit (#7663)
|
2025-06-30 13:26:26 -07:00 |
|
Lifu Huang
|
49538d111b
|
Support dynamic LoRA loading / unloading in engine/server API (#7446)
|
2025-06-27 21:00:27 -07:00 |
|
eigen
|
20beb3702b
|
feat: add return hidden_states at async generation (#7507)
|
2025-06-25 02:10:09 -07:00 |
|
zixuanzhang226
|
f3cbd24541
|
feat: send kvmetrics from sglang scheduler (#6721)
|
2025-06-25 01:57:49 -07:00 |
|
Chang Su
|
72676cd6c0
|
feat(oai refactor): Replace openai_api with entrypoints/openai (#7351)
Co-authored-by: Jin Pan <jpan236@wisc.edu>
|
2025-06-21 13:21:06 -07:00 |
|
Stefan He
|
3774f07825
|
Multi-Stage Awake: Support Resume and Pause KV Cache and Weights separately (#7099)
|
2025-06-19 00:56:37 -07:00 |
|
ishandhanani
|
31fccf5a4f
|
chore: change logs from INFO to DEBUG for dp and add force quit for tokenizer manager (#7251)
|
2025-06-18 01:36:43 -07:00 |
|
woodx
|
e30ef368ab
|
Feat/support rerank (#6058)
|
2025-06-16 10:50:01 -07:00 |
|
Lianmin Zheng
|
53a525bf33
|
[Eagle] Fix kernel call after updating speculative sampling kernels (#7231)
|
2025-06-16 07:25:59 -07:00 |
|
JieXin Liang
|
ed89837cf4
|
chore: upgrade sgl-kernel v0.1.8.post2 (#7186)
Co-authored-by: zhyncs <me@zhyncs.com>
|
2025-06-14 18:26:18 -07:00 |
|
fzyzcjy
|
bec3e48402
|
Support new DeepGEMM format in per token group quant (part 2: srt) (#7155)
|
2025-06-13 14:25:40 -07:00 |
|
ishandhanani
|
f1569876d5
|
feat: add direct routing strategy to DP worker (#6884)
|
2025-06-09 11:44:05 -07:00 |
|
Yineng Zhang
|
56ccd3c22c
|
chore: upgrade flashinfer v0.2.6.post1 jit (#6958)
Co-authored-by: alcanderian <alcanderian@gmail.com>
Co-authored-by: Qiaolin Yu <qy254@cornell.edu>
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: ispobock <ispobaoke@gmail.com>
|
2025-06-09 09:22:39 -07:00 |
|
Yineng Zhang
|
23881fa60c
|
chore: upgrade sgl-kernel v0.1.6.post1 (#6957)
|
2025-06-07 17:18:55 -07:00 |
|
JieXin Liang
|
6153f2ff6e
|
chore: upgrade sgl-kernel v0.1.6 (#6945)
|
2025-06-07 02:53:26 -07:00 |
|
Chanh Nguyen
|
3f1e433903
|
Decoder-only Scoring API (#6460)
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
|
2025-06-04 14:14:54 -07:00 |
|
Yineng Zhang
|
34c63731fc
|
chore: upgrade sgl-kernel v0.1.5 (#6795)
|
2025-05-31 18:32:00 -07:00 |
|
Lianmin Zheng
|
2d72fc47cf
|
Improve profiler and integrate profiler in bench_one_batch_server (#6787)
|
2025-05-31 15:53:55 -07:00 |
|
Yineng Zhang
|
0b07c4a99f
|
chore: upgrade sgl-kernel v0.1.4 (#6532)
|
2025-05-22 13:28:16 -07:00 |
|
Yineng Zhang
|
f07c6a009b
|
chore: upgrade sgl-kernel v0.1.3 (#6377)
|
2025-05-17 19:47:05 -07:00 |
|
fzyzcjy
|
f87283573e
|
Add expert distribution APIs for engine (#6290)
|
2025-05-17 18:31:51 -07:00 |
|
fzyzcjy
|
01d2838c0f
|
Fix stop_profile does not wait for finishing (#4741)
|
2025-05-17 17:06:15 -07:00 |
|
Yury Sulsky
|
f19a9204cd
|
Support precomputed multimodal features for Qwen-VL and Gemma3 models. (#6136)
Co-authored-by: Yury Sulsky <ysulsky@tesla.com>
|
2025-05-16 12:26:15 -07:00 |
|
Lianmin Zheng
|
fba8eccd7e
|
Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
|
2025-05-12 00:17:33 -07:00 |
|
Yineng Zhang
|
230106304d
|
chore: upgrade sgl-kernel v0.1.2.post1 (#6196)
Co-authored-by: alcanderian <alcanderian@gmail.com>
|
2025-05-11 22:41:37 +08:00 |
|
Steven Shimizu
|
03dd785cd0
|
Added async_encode method to Engine (#4701)
|
2025-05-10 18:58:40 -07:00 |
|
ishandhanani
|
e444c13fb4
|
feat(engine): add bootstrap parameters to generate methods (dynamo) (#6075)
|
2025-05-07 10:33:58 -07:00 |
|
fzyzcjy
|
c68de47915
|
Super tiny fix doc (#5233)
|
2025-05-07 22:41:50 +08:00 |
|
Ying Sheng
|
11383cec3c
|
[PP] Add pipeline parallelism (#5724)
|
2025-04-30 18:18:07 -07:00 |
|
Yineng Zhang
|
9a6ad8916d
|
chore: upgrade sgl-kernel 0.1.1 (#5933)
|
2025-04-30 16:13:30 -07:00 |
|
Baizhou Zhang
|
799789afed
|
Bump Flashinfer to 0.2.5 (#5870)
Co-authored-by: Yuhao Chen <yxckeis8@gmail.com>
|
2025-04-29 19:50:57 -07:00 |
|
woodx
|
2c3ea29476
|
[Feature] support auto chat template (#4949)
|
2025-04-28 22:34:18 -07:00 |
|
Yineng Zhang
|
41ac0c6d48
|
chore: upgrade sgl-kernel 0.1.0 (#5690)
|
2025-04-27 21:00:50 -07:00 |
|
fzyzcjy
|
1195182040
|
Tiny add Engine.flush_cache API (#5241)
|
2025-04-20 18:15:03 -07:00 |
|
tianlian yi
|
bc92107b03
|
Support server based rollout in Verlengine (#4848)
Co-authored-by: Jin Pan <jpan236@wisc.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: Jinn <47354855+jhinpan@users.noreply.github.com>
|
2025-04-12 10:07:52 -07:00 |
|
XinyuanTong
|
d09a51f1f6
|
[feat&refactor] Enhance multimodal input support with refactor io_struct (#4938)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
|
2025-04-08 14:48:07 -07:00 |
|
XinyuanTong
|
9eb49e878b
|
[VLM RLHF] Take Image input for verl vlm rollout (#4915)
Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
Co-authored-by: GeLee <leege233@gmail.com>
|
2025-04-01 20:03:17 -07:00 |
|
Wei Wu
|
91ba98fe50
|
[Fix] Resolve GPU Memory Leak in update_weights_from_tensor (#4446)
|
2025-03-17 08:54:30 +00:00 |
|
Yinghai Lu
|
c614dbdf95
|
Nicer standalone engine interface (#4480)
|
2025-03-17 01:42:04 -07:00 |
|
woodx
|
48efec7b05
|
Feature: support code completion (#3612)
|
2025-03-16 18:26:19 -07:00 |
|