693 Commits

Author SHA1 Message Date
Ying Sheng
d3d4d76758 [Eagle] Refactor eagle speculative decoding (#3986)
Co-authored-by: Ke Bao <ISPObaoke@163.com>
2025-03-05 08:06:07 -08:00
Mick
583d6af71b example: add vlm to token in & out example (#3941)
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
2025-03-04 22:18:26 -08:00
Lianmin Zheng
e074d84e5b [Minor] more code cleanup (#4077) 2025-03-04 21:23:47 -08:00
Lianmin Zheng
77a3954bf7 Simplify eagle tests and TP sync in grammar backend (#4066) 2025-03-04 13:40:40 -08:00
Xihuai Wang
95575aa76a Reasoning parser (#4000)
Co-authored-by: Lucas Pickup <lupickup@microsoft.com>
2025-03-03 21:16:36 -08:00
Lianmin Zheng
935cda944b Misc clean up; Remove the support of jump forward (#4032) 2025-03-03 07:02:14 -08:00
Lianmin Zheng
66301e124f Improve code styles (#4021) 2025-03-03 03:20:23 -08:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00
Lianmin Zheng
9e1014cf99 Revert "Add fast decode plan for flashinfer mla" (#4008) 2025-03-02 19:29:10 -08:00
Baizhou Zhang
fa56106731 Add fast decode plan for flashinfer mla (#3987) 2025-03-02 19:16:37 -08:00
Qiaolin Yu
40782f05d7 Refactor: Move return_hidden_states to the generate input (#3985)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
2025-03-01 17:51:29 -08:00
Chayenne
930da877c4 rename FunctionCallReqInput to ParseFunctionCallReq (#3976) 2025-02-28 18:46:25 -08:00
Baizhou Zhang
90a4b7d98a [Feature]Support ragged prefill in flashinfer mla backend (#3967)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
2025-02-28 18:13:56 -08:00
fzyzcjy
e3e0bc50a9 [Feature] SPMD for SGLang + Verl (#3852) 2025-02-28 09:53:10 -08:00
mlmz
bac414ab53 [Feature] integrate Structural Tag in xgrammar backend for function calling (#3566)
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
2025-02-27 23:33:41 -08:00
Chang Su
eec3f6d1eb [Bugfix] Fix tokenizer_manager not getting 400 when req is too long (#3678)
Co-authored-by: voidxb <unkown>
2025-02-27 22:59:43 -08:00
KCFindstr
bc20e93f2d [feat] Add Vertex AI compatible prediction route for /generate (#3866) 2025-02-27 19:42:15 -08:00
Qiaolin Yu
d6898dd253 Add return hidden state in the native API (#3897)
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-26 22:06:54 -08:00
IAN
107710268a [BugFix] Fix crash when receive a req with structed output in DP attention mode. (#3841) 2025-02-25 09:32:05 -08:00
Wang Ran (汪然)
60b771c815 Improve: fix typos (#3801)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-24 16:51:23 -08:00
Lianmin Zheng
d7934cde45 Fix CI and install docs (#3821) 2025-02-24 16:17:38 -08:00
Lianmin Zheng
62bbd34393 Revert "Extract generation_manager from tokenizer_manager" (#3829) 2025-02-24 14:49:16 -08:00
Lianmin Zheng
f2388f6b95 Revert "Rename TokenizerManager to StdOrchestrator" (#3828) 2025-02-24 14:47:59 -08:00
Zhiqiang Xie
6c7a152c5a Hierarchical Caching for SGLang (#2693)
Co-authored-by: Wenxuan Tan <wenxuan.tan@wisc.edu>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-02-23 21:56:30 -08:00
fzyzcjy
45360b2fa9 Improve: Rename TokenizerManager to StdOrchestrator (#3116) 2025-02-23 00:30:58 -08:00
fzyzcjy
3f41b18455 Improve: Extract generation_manager from tokenizer_manager (#3115) 2025-02-22 23:25:45 -08:00
Mick
45205d88a0 bench: Add MMMU benchmark for vLM (#3562) 2025-02-22 08:10:59 -08:00
fzyzcjy
9087694006 Improve: Use TypeBasedDispatcher in DetokenizerManager (#3117) 2025-02-21 19:50:46 -08:00
Shenggui Li
9af0e21ef5 [bug] fixed batch api for DeepSeek V3/R1 (#3754) 2025-02-21 10:28:16 -08:00
Yineng Zhang
714f3e6362 feat: support flashinfer mla with prefix cache (#3643) 2025-02-18 02:06:43 +08:00
Mick
bcc213df61 Model: Support Qwen 2.5 vl (#3258) 2025-02-16 00:58:53 -08:00
Yineng Zhang
70f894b810 feat: support flashinfer mla attention for deepseek v3 (#3550) 2025-02-14 08:50:14 +08:00
Jackmin801
5f0e7de339 [Feat] Return hidden states (experimental) (#3364)
Co-authored-by: Chayenne <zhaochen20@outlook.com>
2025-02-10 15:54:37 -08:00
Mick
9f635ea50d [Fix] Address remaining issues of supporting MiniCPMV (#2977) 2025-01-28 00:22:13 -08:00
Zhiqiang Xie
08104b56de Sanity check to prevent performance regression (#3171)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2025-01-27 12:28:17 -08:00
Lianmin Zheng
53cef81587 Improve weight loading and code style (#3174) 2025-01-27 03:00:41 -08:00
YAMY
b045841bae Feature/function calling update (#2700)
Co-authored-by: Mingyuan Ma <mamingyuan2001@berkeley.edu>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
2025-01-26 09:57:51 -08:00
Lianmin Zheng
1dda8c5e4c Return more infos for computing average acceptance length (#3152) 2025-01-26 04:51:54 -08:00
Lianmin Zheng
d1a0863251 Add a test case for cached_tokens (#3145) 2025-01-26 01:39:28 -08:00
Lianmin Zheng
3d8f1c9bcf Use int64 as indices for set_kv_buffer (#3039) 2025-01-21 19:46:09 -08:00
Lianmin Zheng
287d07a669 Misc fixes for eagle (flush_cache, CPU overhead) (#3014) 2025-01-20 20:27:38 -08:00
996_icu
b730aa6b9e [EAGLE] Fix some boundary situation when retract reqs and req's max token = 1 (#2939)
Co-authored-by: josephyou <josephyou@tencent.com>
2025-01-20 17:46:43 -08:00
Hongpeng Guo
949b3fbfce [Doc] Update doc of custom logit processor (#3021)
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
2025-01-20 16:50:25 -08:00
Hongpeng Guo
583697cd71 [Enhancement] Custom Logit Processor Improvement (#2998)
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
2025-01-20 02:00:35 -08:00
Lianmin Zheng
09bcbe0123 Update TypeBasedDispatcher and balance CI tests (#3001) 2025-01-19 23:37:27 -08:00
Lianmin Zheng
03464890e0 Separate two entry points: Engine and HTTP server (#2996)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
2025-01-19 22:09:24 -08:00
Lianmin Zheng
cd493b5afc Improve metrics, logging, and importing orders (#2992) 2025-01-19 18:36:59 -08:00
Lianmin Zheng
61f42b5732 Move sgl.Runtime under sglang/lang (#2990) 2025-01-19 17:10:29 -08:00
Hongpeng Guo
e403d23757 [Feature] Add sampler custom logits processor (#2396)
Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>
2025-01-19 14:46:53 -08:00
Seungduk Kim
d77caa2b75 [#2812] Make the decode status dict capcity adjustable by a CLI param (#2839) 2025-01-19 11:36:53 -08:00