sglang

Author	SHA1	Message	Date
Lianmin Zheng	9e426466af	Clean up allocators (#9134) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-13 13:56:04 -07:00
J	0e7a5b2694	fix: prevent crashes due to logit bias dimension mismatch (#7685)	2025-07-23 15:30:55 -07:00
Lianmin Zheng	071a1f51ae	[Minor] clean up multimodal processor and tokenizer manager (#7624)	2025-06-29 02:50:14 -07:00
Brayden Zhong	ca9291181d	[Feature] Add Logit Bias (#6579) Co-authored-by: Cinjon Resnick <cinjon.resnick@gmail.com>	2025-06-10 15:39:25 -07:00
Lianmin Zheng	608668e143	Slightly improve the sampler to skip unnecessary steps (#6956)	2025-06-08 03:18:54 -07:00
Lianmin Zheng	d18c6b3358	Support incremental streaming of logprob/token_ids between scheduler and detokenizer (#6225) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-05-12 14:33:38 -07:00
applesaucethebun	2ce8793519	Add typo checker in pre-commit (#6179) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 12:55:00 +08:00
Yineng Zhang	66fc63d6b1	Revert "feat: add thinking_budget (#6089)" (#6181)	2025-05-10 16:07:45 -07:00
thyecust	63484f9fd6	feat: add thinking_budget (#6089)	2025-05-09 08:22:09 -07:00
Lianmin Zheng	177320a582	Clean up imports (#5467)	2025-04-16 15:26:49 -07:00
JieXin Liang	1a3fa75f2f	[Fix] use `torch.cat` instead of `torch.concat` to prevent entering the `Autograd` backends. (#4466)	2025-03-16 00:02:47 -07:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Qiaolin Yu	40782f05d7	Refactor: Move return_hidden_states to the generate input (#3985) Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>	2025-03-01 17:51:29 -08:00
Qiaolin Yu	d6898dd253	Add return hidden state in the native API (#3897) Co-authored-by: Beichen-Ma <mabeichen12@gmail.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-26 22:06:54 -08:00
Lianmin Zheng	27acf63bbd	Use torch.compile for scaling penalty (#3133)	2025-01-25 18:27:33 -08:00
Hongpeng Guo	583697cd71	[Enhancement] Custom Logit Processor Improvement (#2998) Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>	2025-01-20 02:00:35 -08:00
Hongpeng Guo	e403d23757	[Feature] Add sampler custom logits processor (#2396) Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>	2025-01-19 14:46:53 -08:00
Xiaoyu Zhang	d08c77c434	Sampling penalties memory interface (#2870)	2025-01-13 23:09:00 +08:00
Lianmin Zheng	bdc1acf6cd	Misc fix for min_p_sampling, --cuda-graph-bs (#2761)	2025-01-07 02:52:53 -08:00
Lianmin Zheng	21ec66e59e	Minor follow-up fixes for the logprob refactor (#2670)	2024-12-30 05:42:08 -08:00
Lianmin Zheng	9c6ba2484f	Refactor logprob computation to return the real logprob used in sampling (#2664)	2024-12-30 04:51:38 -08:00
Lianmin Zheng	0e7409adb6	Fix the overlap for xgrammar (#2377)	2024-12-06 05:49:29 -08:00
Yixin Dong	538fa0ae13	[Fix] Avoid calling fill_vocab_mask for terminated requests (#2175)	2024-11-25 17:31:25 +08:00
Lianmin Zheng	7d671e4ad2	Enable overlap by default (#2067)	2024-11-19 22:07:58 -08:00
Lianmin Zheng	ffd20fcd03	Make constrained decoding work for overlap scheduler (#2095)	2024-11-19 15:04:43 -08:00
Lianmin Zheng	b110453802	Simplify logits penalizer (#2086)	2024-11-18 17:48:28 -08:00
DarkSharpness	9c745d078e	[Performance] Update xgrammar-related constrained decoding (#2056)	2024-11-17 16:58:49 -08:00
Lianmin Zheng	edad373135	Fix illegal memory access in overlap mode & Use more fused triton kernels for building meta data (#2051)	2024-11-16 16:14:23 -08:00
Lianmin Zheng	ba069a24d3	Fix grammar backend (#2018)	2024-11-12 21:17:38 -08:00
DarkSharpness	b77a02cdfd	[Performance] Support both xgrammar and outlines for constrained decoding (#1752)	2024-10-25 21:47:02 +00:00
Lianmin Zheng	e12358dc91	Simplify the usage of device (#1734)	2024-10-20 18:17:41 -07:00
Lianmin Zheng	59cbf47626	Unify the memory pool api and tp worker API (#1724)	2024-10-19 23:19:26 -07:00
Lianmin Zheng	3db43d1b08	Fix `is_all_ready` for overlap copy (#1710)	2024-10-18 21:01:52 -07:00
Lianmin Zheng	f0f8a7699b	Simplify the nan detection and greedy check in sampler (#1709)	2024-10-18 20:21:24 -07:00
Lianmin Zheng	2bcfba1b08	Skip unnecessary penalizer (#1707)	2024-10-18 17:54:03 -07:00
Lianmin Zheng	392f2863c8	Add dtype for more operations (#1705)	2024-10-18 12:18:15 -07:00
Lianmin Zheng	4a292f670d	[Minor] Add some utility functions (#1671)	2024-10-14 20:08:03 -07:00
Lianmin Zheng	9da5a60b18	Add an option to disable penalizer (#1651)	2024-10-12 17:53:23 -07:00
Zhang, Liangang	8275049ce3	Add device support (#1607)	2024-10-11 02:05:58 -07:00
Lianmin Zheng	9244f27f0a	[Minor] Improve the style and fix flaky tests (#1584)	2024-10-06 00:10:48 -07:00
Lianmin Zheng	32eb6e96f2	Organize sampling batch info better (#1562)	2024-10-03 18:29:49 -07:00
Lianmin Zheng	317631cada	[Fix] Move ScheduleBatch out of SamplingInfo (#1556)	2024-10-02 17:18:04 -07:00
Lianmin Zheng	b564835364	[Fix] do not maintain regex_fsm in SamplingBatchInfo (#1555)	2024-10-02 13:19:44 -07:00
Lianmin Zheng	63ba2f8d7b	Clean up batch data structures: Introducing ModelWorkerBatch (#1544)	2024-09-30 06:41:49 -07:00
Lianmin Zheng	2fa5cec775	Simplify sampler and its error handling (#1441)	2024-09-16 21:23:31 -07:00
Lianmin Zheng	9ba1f09760	[Fix] Fix logprob and normalized_logprob (#1428)	2024-09-15 06:36:06 -07:00
Liangsheng Yin	70b6802982	Optimize conflicts between CUDA graph and vocab mask tensors (#1392)	2024-09-13 20:27:53 -07:00
Lianmin Zheng	fec185ce0c	Refactor attention backend (#1381)	2024-09-11 11:44:26 -07:00
Liangsheng Yin	fbb4754cb8	Fix vocab mask update bug (#1376)	2024-09-10 13:10:36 -07:00
Liangsheng Yin	a5a134f39f	Fix bugs in sampler with CUDA graph / torch.compile (#1306)	2024-09-02 23:18:48 +00:00

55 Commits