sglang

Author	SHA1	Message	Date
Fidel González	76285fdeea	Fix typo in README (#3190 )	2025-01-27 23:15:24 -08:00
Byron Hsu	988d0a4bfc	[kernel] Use sgl_kernel rope (#3169 ) Co-authored-by: zhyncs <me@zhyncs.com>	2025-01-28 14:33:11 +08:00
Xiaoyu Zhang	81262c7b72	clean up useless file (#3192 )	2025-01-28 14:29:30 +08:00
Byron Hsu	27aeb4b7d8	[test] deduplicate test_session_control (#3183 )	2025-01-28 13:17:06 +08:00
Jhin	7b9b4f4426	Docs fix about EAGLE and streaming output (#3166 ) Co-authored-by: Chayenne <zhaochenyang@ucla.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Jhin <jhinpan@umich.edu>	2025-01-27 18:10:45 -08:00
Zhiqiang Xie	08104b56de	Sanity check to prevent performance regression (#3171 ) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-01-27 12:28:17 -08:00
Yineng Zhang	cf142b6eb8	fix: update Dockerfile for cu118 (#3181 )	2025-01-27 23:46:44 +08:00
Yineng Zhang	4ab43cfb3e	chore: bump v0.4.2 (#3180 )	2025-01-27 21:42:05 +08:00
Yineng Zhang	2f79f58873	feat: use sgl-kernel 0.0.3 in sglang (#3179 )	2025-01-27 21:39:52 +08:00
Yineng Zhang	8a96f74988	chore: bump 0.0.3 for sgl-kernel (#3178 ) Co-authored-by: ispobock <ispobaoke@hotmail.com> Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com> Co-authored-by: HandH1998 <007aabbcc411@gmail.com> Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: ByronHsu <byronhsu1230@gmail.com>	2025-01-27 20:29:28 +08:00
Yineng Zhang	827aa8730b	cleanup sgl-kernel kernels (#3175 )	2025-01-27 19:11:01 +08:00
Lianmin Zheng	f8ca66fb49	Update thresholds in test_nightly_gsm8k_eval.py (#3176 )	2025-01-27 03:02:09 -08:00
Lianmin Zheng	53cef81587	Improve weight loading and code style (#3174 )	2025-01-27 03:00:41 -08:00
yigex	351a72d40b	add dsv3 mi300 triton config for block scale (#3146 )	2025-01-27 17:25:53 +08:00
Byron Hsu	514f37c32b	[kernel] Fix position ids in rope (#3173 )	2025-01-27 17:09:51 +08:00
Lianmin Zheng	52c03f16b9	Add activation parameters to fused_moe (#3170 )	2025-01-27 00:23:37 -08:00
Byron Hsu	741fccd7bf	Bump sgl kernel to 0.0.2.post19 (#3167 )	2025-01-27 15:36:07 +08:00
yizhang2077	1e3e521544	add unit test for block wise fp8 (#3156 )	2025-01-27 15:32:04 +08:00
Byron Hsu	fb11a43981	[kernel] Integrate flashinfer's rope with higher precision and better perf (#3134 )	2025-01-27 15:28:00 +08:00
Lianmin Zheng	af02f99b7c	Add more logprob tests (#3162 )	2025-01-26 22:24:55 -08:00
Jhin	9472e69963	Doc: Add Docs about EAGLE speculative decoding (#3144 ) Co-authored-by: Chayenne <zhaochenyang@ucla.edu> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-01-26 17:49:13 -08:00
Chayenne	1acc1f561a	[Docs]: Add function calling in index.rst (#3155 )	2025-01-26 11:11:27 -08:00
YAMY	b045841bae	Feature/function calling update (#2700 ) Co-authored-by: Mingyuan Ma <mamingyuan2001@berkeley.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: shuaills <shishuaiuoe@gmail.com>	2025-01-26 09:57:51 -08:00
Yineng Zhang	f265d15b96	use self-hosted to build sgl-kernel (#3154 )	2025-01-26 23:02:57 +08:00
Yineng Zhang	02431b9ad2	fix link in README (#3153 )	2025-01-26 21:30:00 +08:00
Lianmin Zheng	1dda8c5e4c	Return more infos for computing average acceptance length (#3152 )	2025-01-26 04:51:54 -08:00
Yineng Zhang	7e0976133c	udpate sgl-kernel version for srt (#3150 )	2025-01-26 20:22:34 +08:00
Lianmin Zheng	f4a92f4b56	Temporarily skip the openai frontend tests (#3151 )	2025-01-26 04:17:35 -08:00
Yineng Zhang	318260c0fa	chore: bump 0.0.2.post18 for sgl-kernel (#3149 )	2025-01-26 19:00:34 +08:00
Lianmin Zheng	4a61253123	Do not load OPENAI_KEY from secrets (#3147 )	2025-01-26 01:54:03 -08:00
Lianmin Zheng	d1a0863251	Add a test case for cached_tokens (#3145 )	2025-01-26 01:39:28 -08:00
Hubert Lu	f8b28e461a	Add CPU affinity setting to latency benchmark (#3085 )	2025-01-25 23:52:05 -08:00
HandH1998	82392da830	support w8a8 fp8 kernel with CUTLASS (#3047 ) Co-authored-by: yych0745 <1398089567@qq.com>	2025-01-26 15:46:51 +08:00
Yineng Zhang	95f789adb0	minor: cleanup sgl-kernel (#3143 )	2025-01-26 14:29:58 +08:00
Lianmin Zheng	4f118a39d7	Fix repetition penalty (#3139 )	2025-01-25 21:48:58 -08:00
yigex	66283dbc0c	[Fix] Not skip NVML Check on AMD Platform (#3135 )	2025-01-25 21:33:51 -08:00
Yineng Zhang	822bae8c00	feat: cross python wheel for sgl-kernel (#3138 )	2025-01-26 13:21:34 +08:00
Hui Liu	8e48ca8cc1	enable kv_scale for Gemma2 (#3113 )	2025-01-25 18:29:14 -08:00
Lianmin Zheng	27acf63bbd	Use torch.compile for scaling penalty (#3133 )	2025-01-25 18:27:33 -08:00
Lianmin Zheng	da6f8081f6	Fix CI tests (#3132 )	2025-01-25 17:43:39 -08:00
yinfan98	9286740eff	feat: refactor sgl-kernel and use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#3130 ) Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com> Co-authored-by: yinfan98 <1106110035@qq.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-01-26 02:55:08 +08:00
Yineng Zhang	896c07441e	update installation doc for sgl-kernel (#3129 )	2025-01-26 00:00:13 +08:00
Ke Bao	c23d5706f4	Update whl index path (#3128 )	2025-01-25 23:57:09 +08:00
Ke Bao	67ad4338e1	Update tag name for whl release (#3127 )	2025-01-25 23:14:35 +08:00
Yineng Zhang	3cab5f71ea	speedup pr test for sgl-kernel (#3126 )	2025-01-25 21:37:48 +08:00
Yineng Zhang	14e754a868	chore: bump v0.0.2.post17 for sgl-kernel (#3125 )	2025-01-25 20:43:02 +08:00
yizhang2077	98522149ff	mirror fix for custom allreduce (#3124 )	2025-01-25 18:26:41 +08:00
Xiaoyu Zhang	5d9d15e70f	support fp32 in sampling_scaling_penalties kernel (#3121 )	2025-01-25 16:52:17 +08:00
Ke Bao	665e5e85f6	Add step to update sgl-kernel whl index (#3110 )	2025-01-25 02:03:01 +08:00
Ke Bao	a22f60a313	Add workflow for sgl-kernel cu118 release (#3109 )	2025-01-24 22:30:30 +08:00


1909 Commits