sglang

Author	SHA1	Message	Date
Lianmin Zheng	8d323e95e4	Use clang format 18 in pr-test-sgl-kernel.yml (#4203)	2025-03-08 01:28:10 -08:00
Yineng Zhang	96d0e37fa7	Revert "Minor improvement to per_tensor_quant_fp8 (#4197)" (#4198)	2025-03-07 22:57:09 -08:00
Rex	90bb2be27e	Minor improvement to per_tensor_quant_fp8 (#4197)	2025-03-07 22:52:12 -08:00
lukec	b93ef5e56d	Remove the vllm dependency from the moe_align function (#4164) Co-authored-by: Hongbosherlock <hongbosherlock@gmail.com>	2025-03-07 22:42:16 -08:00
Lianmin Zheng	d052f4c8a9	New clang format for sgl kernel (#4194)	2025-03-07 20:21:08 -08:00
Yineng Zhang	eb61f5c9af	Revert "ROCm: Flex Attention Enablement with custom backends (#4178)" (#4186)	2025-03-07 10:27:52 -08:00
HAI	0beea4503f	ROCm: Flex Attention Enablement with custom backends (#4178) Co-authored-by: linsun12 <linsun12@amd.com>	2025-03-07 04:38:53 -08:00
Yineng Zhang	96263f275c	chore: bump v0.0.3.post7 for sgl-kernel (#4176)	2025-03-07 01:15:34 -08:00
Yineng Zhang	94a2b9d33e	Put utils in ifndef USE_ROCM to fix CI (#4167) (#4168) Co-authored-by: Stefan He <hebiaobuaa@gmail.com>	2025-03-07 00:01:17 -08:00
Stefan He	3c3eb374b2	Remove non-existent AMD header include (#4166)	2025-03-06 23:29:30 -08:00
Stefan He	95085d65e9	[Refactor] Reducing code duplication across FP8 CUDA quantization kernels (#4163)	2025-03-06 22:58:52 -08:00
Stefan He	63ee26d162	Add sgl_per_token_quant_fp8 (#4089)	2025-03-06 20:53:05 -08:00
Xiaoyu Zhang	ad55f17182	[quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786)	2025-03-06 18:05:43 -08:00
Liu Jinjie	0804dd11a0	remove unused max_jobs in setup_rocm.py (#4126) Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>	2025-03-06 00:12:19 -08:00
Lianmin Zheng	e074d84e5b	[Minor] more code cleanup (#4077)	2025-03-04 21:23:47 -08:00
Liu Jinjie	926f8efc0c	remove unused max_jobs (#3607) Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>	2025-03-04 04:23:39 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032)	2025-03-03 07:02:14 -08:00
Lianmin Zheng	110e006673	Reorganize python source files in sgl-kernel with multiple files (#4027)	2025-03-03 06:36:40 -08:00
Lianmin Zheng	6b45a21d16	Reorganize c++ source files in sgl-kernel with multiple folders (#4025)	2025-03-03 05:32:30 -08:00
Lianmin Zheng	66301e124f	Improve code styles (#4021)	2025-03-03 03:20:23 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Hubert Lu	9cf4077294	Enable custom AR for AMD GPUs and maintain it in sgl-kernel (#3406)	2025-03-02 15:19:06 -08:00
Chayenne	18bb216c28	Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982)	2025-02-28 23:57:17 -08:00
Elfie Guo	9e74ee91da	Update cutlass dependency (#3966)	2025-02-28 16:16:31 -08:00
yiakwy-xpu-ml-framework-team	1c96fa86cf	[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613)	2025-02-27 19:42:48 -08:00
Xiaoyu Zhang	55a7ec388f	use warp shuffle style reduce and flashinfer vectorize (#3628)	2025-02-19 20:53:51 +08:00
Baizhou Zhang	67fc595bb8	[Feature] Apply Cublas Grouped Gemm kernel (#3629)	2025-02-18 15:18:31 +08:00
Xiaoyu Zhang	3efbdf68b9	fix sgl-kernel codestyle (#3563)	2025-02-14 18:05:52 +08:00
Yineng Zhang	e082142519	chore: bump 0.0.3.post6 sgl-kernel (#3555)	2025-02-14 08:55:15 +08:00
Xiaoyu Zhang	f076328bb7	fix moe_align_kernel shm init not sync bug (#3534)	2025-02-13 16:47:00 +08:00
Yineng Zhang	4430c0a513	chore: bump 0.0.3.post5 sgl-kernel (#3530)	2025-02-13 01:51:46 +08:00
yizhang2077	640363ad20	support blockwise fp8 matmul kernel (#3267)	2025-02-13 01:49:33 +08:00
Yineng Zhang	b96e92e6e6	chore: bump 0.0.3.post4 sgl-kernel (#3523)	2025-02-12 17:28:36 +08:00
Xiaoyu Zhang	bb418ced80	optimize per token group quant fp8 (#3490)	2025-02-11 22:19:05 +08:00
Yineng Zhang	6239d0b2e7	chore: bump sgl-kernel v0.0.3.post3 (#3440)	2025-02-10 04:00:52 +08:00
Yineng Zhang	4cfd3add6d	support version in sgl-kernel (#3439)	2025-02-10 03:49:52 +08:00
Yineng Zhang	29daf498cd	fix cu118 link issue (#3421)	2025-02-09 18:16:44 +08:00
Yineng Zhang	f9905d59a8	support speculative decoding kernel in sgl-kernel (#3373) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-02-07 20:29:51 +08:00
Yineng Zhang	45c87e083f	fix undefined symbol cudaGetDriverEntryPointByVersion (#3372)	2025-02-07 19:32:45 +08:00
Xiaoyu Zhang	cdae77b03d	optimize moe_align_kernel cuda (#3347)	2025-02-07 00:53:46 +08:00
Yineng Zhang	adeee15204	fix sgl-kernel build failure on AMD (#3352)	2025-02-07 00:35:59 +08:00
Xiaoyu Zhang	ad3499858e	clean moe align block kernel code and add acc test (#3332)	2025-02-06 16:42:36 +08:00
HAI	2c1a695ff1	ROCm: sgl-kernel enablement starting with sgl_moe_align_block (#3287)	2025-02-04 21:44:44 +08:00
Yineng Zhang	00fa7d0417	add copyright for sgl-kernel (#3270)	2025-02-03 21:34:44 +08:00
Yineng Zhang	7876279ea7	update cutlass dependency (#3240)	2025-02-01 03:13:44 +08:00
Yineng Zhang	3ee62235c6	revert the MoE dependence (#3230)	2025-01-31 16:51:41 +08:00
Yineng Zhang	9602c2aac7	keep the parts needed for moe_kernels (#3218)	2025-01-31 00:39:47 +08:00
Yineng Zhang	e81d7f11de	add tensorrt_llm moe_gemm as 3rdparty (#3217)	2025-01-30 23:49:14 +08:00
Yineng Zhang	222ce6f1da	add tensorrt_llm common and cutlass_extensions as 3rdparty (#3216) Co-authored-by: BBuf <35585791+BBuf@users.noreply.github.com>	2025-01-30 23:04:41 +08:00
Yineng Zhang	468d23cff9	update setup for sgl-kernel (#3214)	2025-01-30 19:47:50 +08:00

141 Commits