sglang

Author	SHA1	Message	Date
fzyzcjy	15ddd84322	Add retry for flaky tests in CI (#4755)	2025-03-25 16:53:12 -07:00
Lianmin Zheng	bc1534ff32	Fix a draft model accuracy bug in eagle; support step=1; return logprob in eagle (#4134) Co-authored-by: Sehoon Kim <kssteven418@gmail.com> Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-06 06:13:59 -08:00
Lianmin Zheng	286e6540a6	Remove prefill-only-one-req (#4117)	2025-03-05 20:58:48 -08:00
Lianmin Zheng	77a3954bf7	Simplify eagle tests and TP sync in grammar backend (#4066)	2025-03-04 13:40:40 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Yineng Zhang	6718b10996	fix eagle unit test (#3591)	2025-02-15 23:10:48 +08:00
Lianmin Zheng	da6f8081f6	Fix CI tests (#3132)	2025-01-25 17:43:39 -08:00
Lianmin Zheng	a4331cd260	Add accuracy and latency tests of eagle into CI (#3027)	2025-01-21 02:55:14 -08:00
Yineng Zhang	67470bbb28	minor: update correct measurement unit (#2406)	2024-12-08 20:55:04 +08:00
Lianmin Zheng	ccaf1f997c	[CI] Print summary on github actions (#2274)	2024-11-29 23:48:54 -08:00
Lianmin Zheng	3c5538f781	Update CI threshold (#2186)	2024-11-25 15:24:17 -08:00
Lianmin Zheng	5652c56535	Update CI threshold & Improve code style (#2159)	2024-11-24 06:29:38 -08:00
Lianmin Zheng	7d671e4ad2	Enable overlap by default (#2067)	2024-11-19 22:07:58 -08:00
Yineng Zhang	766192610e	feat: update torch 2.5.1 (#2069)	2024-11-18 21:29:13 +08:00
Lianmin Zheng	38625e2139	Remove monkey_patch_vllm_dummy_weight_loader (#2064)	2024-11-17 15:48:12 -08:00
Lianmin Zheng	c1f401fc58	Revert "chore: update torch v2.5.1" (#2063)	2024-11-17 15:29:38 -08:00
Yineng Zhang	3b878863f7	chore: update torch v2.5.1 (#1849)	2024-11-18 00:06:00 +08:00
Ying Sheng	c98e84c21e	[Minor, Performance] Use torch.argmax for greedy sampling (#1589)	2024-10-06 13:15:05 -07:00
Ying Sheng	04b262cd91	[Fix] Fix major performance bug in certain cases (#1563) Co-authored-by: hnyls2002 <hnyls2002@gmail.com>	2024-10-04 08:51:11 +00:00
Lianmin Zheng	1acccb364a	Fix oom issues with fp8 for llama (#1454)	2024-09-18 03:45:19 -07:00
Lianmin Zheng	9463bc1385	Enable torch.compile for triton backend (#1422)	2024-09-14 15:38:37 -07:00
Lianmin Zheng	ad0ff62a4c	Balance test in CI (#1411)	2024-09-12 23:29:44 -07:00
Lianmin Zheng	68be2f6d3b	[CI] Include triton backend and online serving benchmark into CI (#1408)	2024-09-12 21:36:41 -07:00

23 Commits