sglang

Author	SHA1	Message	Date
Stefan He	e0917e6bd0	Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215 ) Co-authored-by: Stefan He <bhe@linkedin.com>	2025-03-12 00:08:03 -07:00
Xiaoyu Zhang	7130a7cea9	refine sgl_moe_align_block_size_benchmark (#4327 )	2025-03-11 22:48:38 -07:00
yigex	690e1f2371	[AMD] Fix rocm sgl-kernel missing modules error (#4311 ) Co-authored-by: yiakwy-xpu-ml-framework-team <leiwang2@amd.com>	2025-03-11 10:35:28 -07:00
Yineng Zhang	cd90945518	bump sgl-kernel 0.0.4.post2 (#4288 )	2025-03-11 00:09:47 -07:00
Yineng Zhang	bde24ab31f	update deepgemm (#4284 )	2025-03-10 23:39:57 -07:00
Elfie Guo	bf2eefc0c7	Uupdate cutalss dependency for its bug fix (#4277 )	2025-03-10 17:00:05 -07:00
Yineng Zhang	3dd4feae63	add THIRDPARTYNOTICES for DeepGEMM (#4272 )	2025-03-10 11:10:57 -07:00
Lianmin Zheng	cf0ccd406e	Optimize rope in sgl kernel (#4267 )	2025-03-10 10:07:45 -07:00
Lianmin Zheng	1a5023e05d	Release sgl-kernel v0.0.4.post1 (#4255 )	2025-03-10 02:39:50 -07:00
Xiaoyu Zhang	23308a9032	fix per_token_group_quant_fp8 illegal memory when num_groups % 16 != 0 (#4231 )	2025-03-10 01:42:58 -07:00
Lianmin Zheng	aa957102a9	Simplify tests & Fix trtllm custom allreduce registration (#4252 )	2025-03-10 01:24:22 -07:00
laixin	c553e1604c	DeepGemm integrate to sgl-kernel (#4165 ) Co-authored-by: sleepcoo <sleepcoo@gmail.com> Co-authored-by: HandH1998 <1335248067@qq.com> Co-authored-by: shuaills <shishuaiuoe@gmail.com> Co-authored-by: yinfan98 <1106310035@qq.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-03-10 00:35:07 -07:00
Lianmin Zheng	7c0541b385	Move activation.cu to sgl-kernel/elementwise (#4250 )	2025-03-09 22:41:13 -07:00
Lianmin Zheng	730d084f2a	Minor style fix for sgl-kernel (#4243 )	2025-03-09 20:15:13 -07:00
Lianmin Zheng	eb06dbcbf8	Move rope and bmm into sgl-kernel (#4241 )	2025-03-09 18:38:15 -07:00
Yineng Zhang	df84ab2a5b	update sgl-kernel 3rdparty (#4228 )	2025-03-09 01:16:05 -08:00
Yineng Zhang	5c7dd14ba1	chore: bump v0.0.4 for sgl-kernel (#4223 )	2025-03-08 23:01:59 -08:00
Lianmin Zheng	8abf74e3c9	Rename files in sgl kernel to avoid nested folder structure (#4213 ) Co-authored-by: zhyncs <me@zhyncs.com>	2025-03-08 22:54:51 -08:00
Xiaoyu Zhang	79a321af55	revert pr 3628 to pass test_mla ci (#4219 )	2025-03-08 21:15:14 -08:00
Xiaoyu Zhang	b3251e9f40	refine quant kernel code style (#4211 )	2025-03-08 05:47:35 -08:00
Lianmin Zheng	8d323e95e4	Use clang format 18 in pr-test-sgl-kernel.yml (#4203 )	2025-03-08 01:28:10 -08:00
Yineng Zhang	96d0e37fa7	Revert "Minor improvement to per_tensor_quant_fp8 (#4197 )" (#4198 )	2025-03-07 22:57:09 -08:00
Rex	90bb2be27e	Minor improvement to per_tensor_quant_fp8 (#4197 )	2025-03-07 22:52:12 -08:00
lukec	b93ef5e56d	Remove the vllm dependency from the moe_align function (#4164 ) Co-authored-by: Hongbosherlock <hongbosherlock@gmail.com>	2025-03-07 22:42:16 -08:00
Lianmin Zheng	d052f4c8a9	New clang format for sgl kernel (#4194 )	2025-03-07 20:21:08 -08:00
Yineng Zhang	eb61f5c9af	Revert "ROCm: Flex Attention Enablement with custom backends (#4178 )" (#4186 )	2025-03-07 10:27:52 -08:00
HAI	0beea4503f	ROCm: Flex Attention Enablement with custom backends (#4178 ) Co-authored-by: linsun12 <linsun12@amd.com>	2025-03-07 04:38:53 -08:00
Yineng Zhang	96263f275c	chore: bump v0.0.3.post7 for sgl-kernel (#4176 )	2025-03-07 01:15:34 -08:00
Yineng Zhang	94a2b9d33e	Put utils in ifndef USE_ROCM to fix CI (#4167 ) (#4168 ) Co-authored-by: Stefan He <hebiaobuaa@gmail.com>	2025-03-07 00:01:17 -08:00
Stefan He	3c3eb374b2	Remove non-existent AMD header include (#4166 )	2025-03-06 23:29:30 -08:00
Stefan He	95085d65e9	[Refactor] Reducing code duplication across FP8 CUDA quantization kernels (#4163 )	2025-03-06 22:58:52 -08:00
Stefan He	63ee26d162	Add sgl_per_token_quant_fp8 (#4089 )	2025-03-06 20:53:05 -08:00
Xiaoyu Zhang	ad55f17182	[quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786 )	2025-03-06 18:05:43 -08:00
Liu Jinjie	0804dd11a0	remove unused max_jobs in setup_rocm.py (#4126 ) Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>	2025-03-06 00:12:19 -08:00
Lianmin Zheng	e074d84e5b	[Minor] more code cleanup (#4077 )	2025-03-04 21:23:47 -08:00
Liu Jinjie	926f8efc0c	remove unused max_jobs (#3607 ) Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>	2025-03-04 04:23:39 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032 )	2025-03-03 07:02:14 -08:00
Lianmin Zheng	110e006673	Reorganize python source files in sgl-kernel with multiple files (#4027 )	2025-03-03 06:36:40 -08:00
Lianmin Zheng	6b45a21d16	Reorganize c++ source files in sgl-kernel with multiple folders (#4025 )	2025-03-03 05:32:30 -08:00
Lianmin Zheng	66301e124f	Improve code styles (#4021 )	2025-03-03 03:20:23 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Hubert Lu	9cf4077294	Enable custom AR for AMD GPUs and maintain it in sgl-kernel (#3406 )	2025-03-02 15:19:06 -08:00
Chayenne	18bb216c28	Revert "[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x)" (#3982 )	2025-02-28 23:57:17 -08:00
Elfie Guo	9e74ee91da	Update cutlass dependency (#3966 )	2025-02-28 16:16:31 -08:00
yiakwy-xpu-ml-framework-team	1c96fa86cf	[MOE] enable efficient moe_alignment multi-blocks execution (3x~6x) (#3613 )	2025-02-27 19:42:48 -08:00
Xiaoyu Zhang	55a7ec388f	use warp shuffle style reduce and flashinfer vectorize (#3628 )	2025-02-19 20:53:51 +08:00
Baizhou Zhang	67fc595bb8	[Feature] Apply Cublas Grouped Gemm kernel (#3629 )	2025-02-18 15:18:31 +08:00
Xiaoyu Zhang	3efbdf68b9	fix sgl-kernel codestyle (#3563 )	2025-02-14 18:05:52 +08:00
Yineng Zhang	e082142519	chore: bump 0.0.3.post6 sgl-kernel (#3555 )	2025-02-14 08:55:15 +08:00
Xiaoyu Zhang	f076328bb7	fix moe_align_kernel shm init not sync bug (#3534 )	2025-02-13 16:47:00 +08:00

161 Commits