sglang

Author	SHA1	Message	Date
Zhiqiang Xie	b26cb1c55a	Fix problem of large page size with chunked prefill (#6046)	2025-05-06 15:19:47 +08:00
Zhiqiang Xie	f8e460930a	Fix prefill OOM error in the case of large page size (#5081)	2025-05-05 16:02:55 -07:00
Lianmin Zheng	c76040e31b	Support page size > 1 (#4356)	2025-03-12 22:22:39 -07:00
Zhiqiang Xie	10b544ae9b	Hierarchical Caching Refactoring and Fixing TP issue (#4082)	2025-03-12 11:22:35 -07:00
Ying Sheng	d3d4d76758	[Eagle] Refactor eagle speculative decoding (#3986) Co-authored-by: Ke Bao <ISPObaoke@163.com>	2025-03-05 08:06:07 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Zhiqiang Xie	8af7048dcf	Query remaining memory dynamically for PrefillAdder (#2941)	2025-01-17 20:20:26 -08:00
Lianmin Zheng	f65c13b559	Remove normalized_prompt_logprobs from the engine to make code easier to maintain (#2902)	2025-01-15 04:54:14 -08:00
libra	bdb3929dbb	Refactor SchedulePolicy to improve code organization (#2571)	2025-01-04 00:05:16 +08:00
Liangsheng Yin	e7ebecf82e	Fix cache hit rate when chunked prefill (#2555)	2024-12-26 03:14:28 -08:00
Lianmin Zheng	56198b45d9	Add a benchmark script for in-batch prefix caching (#2494)	2024-12-16 18:49:02 -08:00
SangBin Cho	9208618b3e	[Core] in batch prefix caching by delay scheduling (#2442)	2024-12-11 12:51:50 -08:00
Liangsheng Yin	5f12f0e7af	Fix chunked prefill when ignore eos (#2290)	2024-12-01 00:37:53 -08:00
Lianmin Zheng	f5b5f2bff9	Revert "[Fix] fix assertion error for chunked prefill when disabling cache" (#2286)	2024-11-30 19:03:42 -08:00
Rui Wang	d622851dc9	[Fix] fix assertion error for chunked prefill when disabling cache (#2282)	2024-11-30 17:53:43 -08:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
Lianmin Zheng	80e2c4a8de	Fix chunked prefill with output logprob (#2083)	2024-11-18 13:16:28 -08:00
Lianmin Zheng	1929c06762	Simplify prometheus metrics (#1981) Co-authored-by: Mohit Reddy <mohitreddy1996@users.noreply.github.com>	2024-11-10 04:39:32 -08:00
Lzhang-hub	a146d9990e	support prometheus metrics (#1853) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2024-11-05 20:42:53 -08:00
Lianmin Zheng	efbc116a0f	Do not use longest prefix matching when #queue-req is large (#1896)	2024-11-03 01:45:20 -07:00
Lianmin Zheng	2b80978859	Provide an argument to set the maximum batch size for cuda graph (#1809)	2024-10-26 15:09:33 -07:00
Lianmin Zheng	c555ce2ca2	Revert "Fix memory leak when doing chunked prefill" (#1797)	2024-10-25 10:24:44 -07:00
Liangsheng Yin	a2f5e7555f	Fix memory leak when doing chunked prefill (#1787)	2024-10-25 08:01:17 -07:00
havetc	ecb8bad276	Returning a per request metric for number of cached_tokens read (#1599)	2024-10-16 11:49:22 -07:00
Lianmin Zheng	24f3e1511c	[Minor] Improve style (#1666)	2024-10-14 05:25:00 -07:00
Ke Bao	68f8b60d22	Fix chunked prefill condition (#1594)	2024-10-07 06:34:14 +00:00
Lianmin Zheng	9244f27f0a	[Minor] Improve the style and fix flaky tests (#1584)	2024-10-06 00:10:48 -07:00
Liangsheng Yin	5d0ba4038f	Refine the add request reasons to avoid corner cases. (#1574)	2024-10-04 18:00:18 -07:00
Lianmin Zheng	36d5acfca5	Rename InputMetadata -> ForwardBatch (#1543)	2024-09-30 02:41:11 -07:00

29 Commits