sglang

Author	SHA1	Message	Date
Yineng Zhang	67470bbb28	minor: update correct measurement unit (#2406)	2024-12-08 20:55:04 +08:00
Lianmin Zheng	ccaf1f997c	[CI] Print summary on github actions (#2274)	2024-11-29 23:48:54 -08:00
Lianmin Zheng	3c5538f781	Update CI threshold (#2186)	2024-11-25 15:24:17 -08:00
Lianmin Zheng	5652c56535	Update CI threshold & Improve code style (#2159)	2024-11-24 06:29:38 -08:00
Lianmin Zheng	7d671e4ad2	Enable overlap by default (#2067)	2024-11-19 22:07:58 -08:00
Yineng Zhang	766192610e	feat: update torch 2.5.1 (#2069)	2024-11-18 21:29:13 +08:00
Lianmin Zheng	38625e2139	Remove monkey_patch_vllm_dummy_weight_loader (#2064)	2024-11-17 15:48:12 -08:00
Lianmin Zheng	c1f401fc58	Revert "chore: update torch v2.5.1" (#2063)	2024-11-17 15:29:38 -08:00
Yineng Zhang	3b878863f7	chore: update torch v2.5.1 (#1849)	2024-11-18 00:06:00 +08:00
Ying Sheng	c98e84c21e	[Minor, Performance] Use torch.argmax for greedy sampling (#1589)	2024-10-06 13:15:05 -07:00
Ying Sheng	04b262cd91	[Fix] Fix major performance bug in certain cases (#1563) Co-authored-by: hnyls2002 <hnyls2002@gmail.com>	2024-10-04 08:51:11 +00:00
Lianmin Zheng	1acccb364a	Fix oom issues with fp8 for llama (#1454)	2024-09-18 03:45:19 -07:00
Lianmin Zheng	9463bc1385	Enable torch.compile for triton backend (#1422)	2024-09-14 15:38:37 -07:00
Lianmin Zheng	ad0ff62a4c	Balance test in CI (#1411)	2024-09-12 23:29:44 -07:00
Lianmin Zheng	68be2f6d3b	[CI] Include triton backend and online serving benchmark into CI (#1408)	2024-09-12 21:36:41 -07:00