Commit Graph

1489 Commits

Author SHA1 Message Date
Lianmin Zheng
66301e124f Improve code styles (#4021) 2025-03-03 03:20:23 -08:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) 2025-03-03 00:12:04 -08:00
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
yinfan98
b4d34cd35d Fix nightly-test CI (#3826) 2025-03-02 23:14:45 -08:00
Lianmin Zheng
9e1014cf99 Revert "Add fast decode plan for flashinfer mla" (#4008) 2025-03-02 19:29:10 -08:00
Baizhou Zhang
fa56106731 Add fast decode plan for flashinfer mla (#3987) 2025-03-02 19:16:37 -08:00
Zhousx
7fbab730bd [feat] add small vocab table for eagle's draft model[1]. (#3822) 2025-03-02 18:58:45 -08:00
Co-authored-by: Achazwl <323163497@qq.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Hubert Lu
9cf4077294 Enable custom AR for AMD GPUs and maintain it in sgl-kernel (#3406) 2025-03-02 15:19:06 -08:00
Ke Bao
00ce7e311c Fix all gather torch compile (#3992) 2025-03-02 00:41:38 -08:00
Co-authored-by: yizhang2077 <1109276519@qq.com>
Qiaolin Yu
40782f05d7 Refactor: Move return_hidden_states to the generate input (#3985) 2025-03-01 17:51:29 -08:00
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Chayenne
930da877c4 rename FunctionCallReqInput to ParseFunctionCallReq (#3976) 2025-02-28 18:46:25 -08:00
Baizhou Zhang
90a4b7d98a [Feature] Support ragged prefill in flashinfer mla backend (#3967) 2025-02-28 18:13:56 -08:00
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
Yineng Zhang
f3b99f73b3 update flashinfer-python version 2025-02-28 16:31:59 -08:00
Chaitanya Sri Krishna Lolla
77a6c9d229 Remove unused imports from rocm mla kernel. (#3963) 2025-02-28 10:01:08 -08:00
fzyzcjy
e3e0bc50a9 [Feature] SPMD for SGLang + Verl (#3852) 2025-02-28 09:53:10 -08:00
mlmz
bac414ab53 [Feature] integrate Structural Tag in xgrammar backend for function calling (#3566) 2025-02-27 23:33:41 -08:00
Co-authored-by: shuaills <shishuaiuoe@gmail.com>
Chang Su
eec3f6d1eb [Bugfix] Fix tokenizer_manager not getting 400 when req is too long (#3678) 2025-02-27 22:59:43 -08:00
Co-authored-by: voidxb <unkown>
Chayenne
90bc26a813 set a strict sgl-kernel version (#3950) 2025-02-27 22:44:57 -08:00
Kebe
ec0a72c2d9 Fix bench_serving not recognizing OPENAI_API_KEY (#3870) 2025-02-27 20:18:53 -08:00
Signed-off-by: Kebe <mail@kebe7jun.com>
KCFindstr
bc20e93f2d [feat] Add Vertex AI compatible prediction route for /generate (#3866) 2025-02-27 19:42:15 -08:00
Qiaolin Yu
d38878523d Fix the doc link for sampling params (#3861) 2025-02-27 13:31:43 -08:00
Yineng Zhang
564bdf29f7 upgrade flashinfer v0.2.2.post1 (#3934) 2025-02-27 09:53:48 -08:00
Enrique Shockwave
d281587989 Improve: Support xgrammar 0.1.14 (#3593) 2025-02-27 08:42:54 -08:00
laixin
b0df5d240b Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922) 2025-02-27 10:59:46 +00:00
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Qiaolin Yu
d6898dd253 Add return hidden state in the native API (#3897) 2025-02-26 22:06:54 -08:00
Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Tianxing Wu
8b681d7724 [Rocm] Fix to the rocm_mla_decode_rope.py returning random result (#3898) 2025-02-26 17:05:30 -08:00
JC1DA
7551498a69 [Feature] Support llguidance for constrained decoding (#3298) 2025-02-26 10:41:49 -08:00
lukec
21463e321a Expert Parallelism (EP) Support for DeepSeek V3/R1 (#3602) 2025-02-26 02:29:37 -08:00
Co-authored-by: laixin <xielx@shanghaitech.edu.cn>
Co-authored-by: HandH1998 <1335248067@qq.com>
Co-authored-by: laixin <q865809639@gmail.com>
Kebe
60524920ba [Bug]: Fix maximum recursion depth triggered on exception exit (#3519) 2025-02-25 09:39:38 -08:00
IAN
107710268a [BugFix] Fix crash when receive a req with structed output in DP attention mode. (#3841) 2025-02-25 09:32:05 -08:00
who who who
4606e2a3fe Bug: fix capture_bs (#3857) 2025-02-25 08:40:35 -08:00
Nicolas Castet
127998cc41 Fix allgather ops inside cuda graphs (#3709) 2025-02-25 08:39:10 -08:00
Shenggui Li
c0bb9eb3b3 [improve] made timeout configurable (#3803) 2025-02-25 00:26:08 -08:00
Yueyang Pan
7036d6fc67 [Bug]: Add missing clamp to llavavid (#3787) 2025-02-24 19:10:15 -08:00
Chaitanya Sri Krishna Lolla
6ce9dbe828 [ROCm] Enable Fused MLA Triton kernel for DeepSeekV3 (#3237) 2025-02-24 18:14:31 -08:00
Co-authored-by: HAI <hixiao@gmail.com>
Wang Ran (汪然)
60b771c815 Improve: fix typos (#3801) 2025-02-24 16:51:23 -08:00
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Lianmin Zheng
d7934cde45 Fix CI and install docs (#3821) 2025-02-24 16:17:38 -08:00
Lianmin Zheng
62bbd34393 Revert "Extract generation_manager from tokenizer_manager" (#3829) 2025-02-24 14:49:16 -08:00
Lianmin Zheng
f2388f6b95 Revert "Rename TokenizerManager to StdOrchestrator" (#3828) 2025-02-24 14:47:59 -08:00
Lianmin Zheng
c9745ee082 Fix pandas dependency in CI (#3818) 2025-02-24 05:56:57 -08:00
laixin
1a6e97577a Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3730) 2025-02-24 05:43:35 -08:00
Co-authored-by: HandH1998 <1335248067@qq.com>
Baizhou Zhang
b110084654 Refactor flashinfer logic for deepseek v3 and fix accuracy bug (#3785) 2025-02-24 04:07:25 -08:00
Lianmin Zheng
27a46317b6 Fix dependency (#3813) 2025-02-24 03:50:58 -08:00
Zhiqiang Xie
6c7a152c5a Hierarchical Caching for SGLang (#2693) 2025-02-23 21:56:30 -08:00
Co-authored-by: Wenxuan Tan <wenxuan.tan@wisc.edu>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
fzyzcjy
45360b2fa9 Improve: Rename TokenizerManager to StdOrchestrator (#3116) 2025-02-23 00:30:58 -08:00
fzyzcjy
3f41b18455 Improve: Extract generation_manager from tokenizer_manager (#3115) 2025-02-22 23:25:45 -08:00
Mick
45205d88a0 bench: Add MMMU benchmark for vLM (#3562) 2025-02-22 08:10:59 -08:00
fzyzcjy
9087694006 Improve: Use TypeBasedDispatcher in DetokenizerManager (#3117) 2025-02-21 19:50:46 -08:00
fzyzcjy
a3339d8cac Bug: Fix weight loader error when LM head weights are tied (#3766) 2025-02-21 17:53:12 -08:00
Chayenne
14d90617b0 Bug: fix lm head weights in Qwen models (#3777) 2025-02-21 16:49:31 -08:00
fzyzcjy
d37f95511d Improve: Tiny fix Olmo2 (#3348) 2025-02-21 16:09:35 -08:00