sglang

Author	SHA1	Message	Date
Yineng Zhang	7e0976133c	udpate sgl-kernel version for srt (#3150)	2025-01-26 20:22:34 +08:00
Lianmin Zheng	f4a92f4b56	Temporarily skip the openai frontend tests (#3151)	2025-01-26 04:17:35 -08:00
Yineng Zhang	318260c0fa	chore: bump 0.0.2.post18 for sgl-kernel (#3149)	2025-01-26 19:00:34 +08:00
Lianmin Zheng	4a61253123	Do not load OPENAI_KEY from secrets (#3147)	2025-01-26 01:54:03 -08:00
Lianmin Zheng	d1a0863251	Add a test case for cached_tokens (#3145)	2025-01-26 01:39:28 -08:00
Hubert Lu	f8b28e461a	Add CPU affinity setting to latency benchmark (#3085)	2025-01-25 23:52:05 -08:00
HandH1998	82392da830	support w8a8 fp8 kernel with CUTLASS (#3047) Co-authored-by: yych0745 <1398089567@qq.com>	2025-01-26 15:46:51 +08:00
Yineng Zhang	95f789adb0	minor: cleanup sgl-kernel (#3143)	2025-01-26 14:29:58 +08:00
Lianmin Zheng	4f118a39d7	Fix repetition penalty (#3139)	2025-01-25 21:48:58 -08:00
yigex	66283dbc0c	[Fix] Not skip NVML Check on AMD Platform (#3135)	2025-01-25 21:33:51 -08:00
Yineng Zhang	822bae8c00	feat: cross python wheel for sgl-kernel (#3138)	2025-01-26 13:21:34 +08:00
Hui Liu	8e48ca8cc1	enable kv_scale for Gemma2 (#3113)	2025-01-25 18:29:14 -08:00
Lianmin Zheng	27acf63bbd	Use torch.compile for scaling penalty (#3133)	2025-01-25 18:27:33 -08:00
Lianmin Zheng	da6f8081f6	Fix CI tests (#3132)	2025-01-25 17:43:39 -08:00
yinfan98	9286740eff	feat: refactor sgl-kernel and use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#3130) Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com> Co-authored-by: yinfan98 <1106110035@qq.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-01-26 02:55:08 +08:00
Yineng Zhang	896c07441e	update installation doc for sgl-kernel (#3129)	2025-01-26 00:00:13 +08:00
Ke Bao	c23d5706f4	Update whl index path (#3128)	2025-01-25 23:57:09 +08:00
Ke Bao	67ad4338e1	Update tag name for whl release (#3127)	2025-01-25 23:14:35 +08:00
Yineng Zhang	3cab5f71ea	speedup pr test for sgl-kernel (#3126)	2025-01-25 21:37:48 +08:00
Yineng Zhang	14e754a868	chore: bump v0.0.2.post17 for sgl-kernel (#3125)	2025-01-25 20:43:02 +08:00
yizhang2077	98522149ff	mirror fix for custom allreduce (#3124)	2025-01-25 18:26:41 +08:00
Xiaoyu Zhang	5d9d15e70f	support fp32 in sampling_scaling_penalties kernel (#3121)	2025-01-25 16:52:17 +08:00
Ke Bao	665e5e85f6	Add step to update sgl-kernel whl index (#3110)	2025-01-25 02:03:01 +08:00
Ke Bao	a22f60a313	Add workflow for sgl-kernel cu118 release (#3109)	2025-01-24 22:30:30 +08:00
Yineng Zhang	04f0b4cbef	minor: update sgl-kernel setup (#3107)	2025-01-24 20:10:35 +08:00
Adarsh Shirawalmath	4505a43614	[Docs] minor update for phi-3 and phi-4 (#3096)	2025-01-24 04:00:20 -08:00
Trevor Morris	685a5738a7	Allow local cutlass directory to be used in sgl-kernel build (#3037)	2025-01-24 03:59:47 -08:00
Yineng Zhang	153b414e83	minor: sync flashinfer and add turbomind as 3rdparty (#3105)	2025-01-24 19:22:39 +08:00
Ke Bao	6619f48e18	Fix cu118 group gemm compile issue (#3097)	2025-01-24 15:19:09 +08:00
Byron Hsu	3ed0a547b2	[router] Fix twine uploading (#3095)	2025-01-23 21:01:01 -08:00
Byron Hsu	8d8ef8497e	bump router to 0.1.4 (#3094)	2025-01-23 20:32:43 -08:00
Byron Hsu	9a0cc2e90e	[router] Forward all request headers from router to workers (#3070)	2025-01-23 20:30:31 -08:00
Ke Bao	7bad7e75bf	Add shapes for int8 gemm benchmark (#3093)	2025-01-24 12:27:30 +08:00
simveit	1c4e0d2445	Docs: Update doc for server arguments (#2742) Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-01-23 11:32:05 -08:00
Yineng Zhang	54bac8af0b	chore: bump sgl-kernel 0.0.2.post16 (#3087)	2025-01-24 01:57:48 +08:00
Yineng Zhang	5de4051bcf	feat: integrate sampling kernels into sgl-kernel (#3086) Co-authored-by: Zihao Ye <expye@outlook.com>	2025-01-24 01:54:47 +08:00
Xiaoyu Zhang	e0cd65c2b6	[hotfix] fix test_sampling_scaling_penalties.py ci test (#3084)	2025-01-24 00:33:59 +08:00
Xiaoyu Zhang	f1b6861828	use flashinfer vec_dtypes in sgl_kernel (#3083)	2025-01-23 22:19:04 +08:00
Yineng Zhang	0da0989ad4	sync flashinfer and update sgl-kernel tests (#3081)	2025-01-23 21:13:55 +08:00
Yineng Zhang	07a22cbba3	use env variable to control the build conf on the CPU build node (#3080)	2025-01-23 20:46:49 +08:00
Yineng Zhang	3d0bfa3e17	update version setup for sgl-kernel (#3079)	2025-01-23 19:45:25 +08:00
Yineng Zhang	1f6cf0d4b9	fix build error for sgl-kernel (#3078)	2025-01-23 19:16:35 +08:00
Lianmin Zheng	553f5a3ffe	Remove torch dependency in sgl-kernel (#3074)	2025-01-23 17:23:37 +08:00
Xiaoyu Zhang	ac2dc35d0e	support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030)	2025-01-23 15:29:20 +08:00
Yineng Zhang	3e032c07cc	use v0.6.4.post1 for sgl-kernel ci (#3071)	2025-01-23 14:19:38 +08:00
Yineng Zhang	44e12ce463	docs: update developer guide for sgl-kernel (#3069)	2025-01-23 14:08:25 +08:00
Yineng Zhang	a547aad61f	docs: add developer guide for sgl-kernel (#3068)	2025-01-23 13:47:53 +08:00
Lianmin Zheng	ea535dc574	Revert "disable custom allreduce on HIP" (#3067)	2025-01-22 21:33:35 -08:00
Ke Wen	862bcff833	Support loading of larger models with on-the-fly quantization (#3061)	2025-01-22 21:33:17 -08:00
Lianmin Zheng	8b84e69f25	Fix tp token sync for dp attention (#3062)	2025-01-22 18:51:40 -08:00

1883 Commits