sglang

Author	SHA1	Message	Date
Stefan He	e0917e6bd0	Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215 ) Co-authored-by: Stefan He <bhe@linkedin.com>	2025-03-12 00:08:03 -07:00
Xiaoyu Zhang	7130a7cea9	refine sgl_moe_align_block_size_benchmark (#4327 )	2025-03-11 22:48:38 -07:00
Michael Yao	8f1f614ee2	[Docs] Clean up benchmark_and_profiling.md (#4297 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-03-11 21:48:21 -07:00
lambert0312	7140ba3573	Add A800 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4323 )	2025-03-11 18:25:56 -07:00
Yineng Zhang	d1da58e275	unify is_cuda and is_hip (#4321 )	2025-03-11 18:12:56 -07:00
Yineng Zhang	1cf63485c1	upgrade flashinfer 0.2.3 (#4317 ) Co-authored-by: qingquansong <qsong@linkedin.com>	2025-03-11 15:37:17 -07:00
Mick	ff2ce0b86f	refactor: move image processors to separate files (#4229 )	2025-03-11 12:35:35 -07:00
Ximingwang-09	0f2a2e3c19	Add H20 tuning configs support DeepSeek V3/R1 INT8(block-wise) (#4220 ) Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-03-11 12:32:33 -07:00
yigex	690e1f2371	[AMD] Fix rocm sgl-kernel missing modules error (#4311 ) Co-authored-by: yiakwy-xpu-ml-framework-team <leiwang2@amd.com>	2025-03-11 10:35:28 -07:00
Yineng Zhang	00f42707ea	update doc (#4299 )	2025-03-11 01:14:16 -07:00
yych0745	6a02b32d07	Add A100 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4287 ) Co-authored-by: HandH1998 <1335248067@qq.com>	2025-03-11 00:49:06 -07:00
Ke Bao	3a08f54638	Update MTP doc (#4290 )	2025-03-11 00:46:55 -07:00
lukec	dce303e279	linear support deepgemm (#4199 ) Co-authored-by: yinfan98 <1106310035@qq.com>	2025-03-11 00:38:37 -07:00
Yineng Zhang	4d27eb9ad1	update sgl-kernel 0.0.4.post2 (#4291 )	2025-03-11 00:34:33 -07:00
lambert0312	d3ecd63204	Add A800 tuning configs support DeepSeek V3/R1 BF16 and INT8(block-wise) (#4136 )	2025-03-11 00:32:25 -07:00
Yineng Zhang	cd90945518	bump sgl-kernel 0.0.4.post2 (#4288 )	2025-03-11 00:09:47 -07:00
Yineng Zhang	bde24ab31f	update deepgemm (#4284 )	2025-03-10 23:39:57 -07:00
Elfie Guo	bf2eefc0c7	Uupdate cutalss dependency for its bug fix (#4277 )	2025-03-10 17:00:05 -07:00
Lianmin Zheng	5524e7d057	Fix nightly eval for neuralmagic/Mixtral-8x7B-Instruct-v0.1-FP8 (#4279 )	2025-03-10 16:50:28 -07:00
Yineng Zhang	e187a3d595	upgrade xgrammar 0.1.15 (#4275 )	2025-03-10 14:53:24 -07:00
Yineng Zhang	3dd4feae63	add THIRDPARTYNOTICES for DeepGEMM (#4272 )	2025-03-10 11:10:57 -07:00
HandH1998	2ac189edc8	Amd test fp8 (#4261 )	2025-03-10 10:12:09 -07:00
Lianmin Zheng	5a6400eec5	Test no vllm custom allreduce (#4256 )	2025-03-10 10:08:25 -07:00
Lianmin Zheng	cf0ccd406e	Optimize rope in sgl kernel (#4267 )	2025-03-10 10:07:45 -07:00
Lianmin Zheng	3d56585a97	increase the timeout of nightly-test.yml (#4262 )	2025-03-10 05:07:03 -07:00
Lianmin Zheng	00d25a7f5e	Fix quantization and nightly tests (#4258 )	2025-03-10 03:06:21 -07:00
Lianmin Zheng	1a5023e05d	Release sgl-kernel v0.0.4.post1 (#4255 )	2025-03-10 02:39:50 -07:00
Xiaoyu Zhang	23308a9032	fix per_token_group_quant_fp8 illegal memory when num_groups % 16 != 0 (#4231 )	2025-03-10 01:42:58 -07:00
shimin	ac69885056	fix the input_ids is None error (#4144 )	2025-03-10 01:38:37 -07:00
Lianmin Zheng	aa957102a9	Simplify tests & Fix trtllm custom allreduce registration (#4252 )	2025-03-10 01:24:22 -07:00
simveit	007f8b3dc2	Added example for multimodal embedding (#4206 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-03-10 00:53:56 -07:00
DavidChan	4455b26e76	[Bug fixed] fixed the crash when enable the dp-attention on the single card (#3958 )	2025-03-10 00:50:34 -07:00
laixin	c553e1604c	DeepGemm integrate to sgl-kernel (#4165 ) Co-authored-by: sleepcoo <sleepcoo@gmail.com> Co-authored-by: HandH1998 <1335248067@qq.com> Co-authored-by: shuaills <shishuaiuoe@gmail.com> Co-authored-by: yinfan98 <1106310035@qq.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-03-10 00:35:07 -07:00
Lianmin Zheng	7c0541b385	Move activation.cu to sgl-kernel/elementwise (#4250 )	2025-03-09 22:41:13 -07:00
Lianmin Zheng	e8a69e4d0c	Clean up fp8 support (#4230 )	2025-03-09 21:46:35 -07:00
Lianmin Zheng	fbd560028a	Auto balance CI tests (#4238 )	2025-03-09 21:05:55 -07:00
Lianmin Zheng	730d084f2a	Minor style fix for sgl-kernel (#4243 )	2025-03-09 20:15:13 -07:00
Lianmin Zheng	4a05bdfa86	Revert "Check eagle server args" (#4242 )	2025-03-09 18:53:33 -07:00
Lianmin Zheng	eb06dbcbf8	Move rope and bmm into sgl-kernel (#4241 )	2025-03-09 18:38:15 -07:00
Baizhou Zhang	9dfafa743c	Fix test of flashinfer mla with nextn (#4237 )	2025-03-09 12:45:39 -07:00
Ke Bao	f1d09a6541	Update bench speculative script (#4235 )	2025-03-09 12:19:01 -07:00
Yineng Zhang	df84ab2a5b	update sgl-kernel 3rdparty (#4228 )	2025-03-09 01:16:05 -08:00
Ying Sheng	34c8898755	Check eagle server args (#4217 )	2025-03-09 01:10:43 -08:00
HandH1998	0dd6cda288	Apply sgl w8a8 fp8 kernel (#3148 )	2025-03-09 00:03:32 -08:00
Baizhou Zhang	9fb48f951f	Support nextn for flashinfer mla attention backend (#4218 )	2025-03-09 00:01:54 -08:00
Yineng Zhang	89ccb533ad	use sgl-kernel 0.0.4 (#4224 )	2025-03-08 23:43:09 -08:00
Stefan He	dceb256f1b	[docs] Unhide production metrics page (#4193 )	2025-03-08 23:41:40 -08:00
Peter Pan	0e90ae628a	[docker] Distributed Serving with k8s Statefulset ( good example for DeepSeek-R1) (#3631 ) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Co-authored-by: Kebe <kebe.liu@daocloud.io>	2025-03-08 23:41:20 -08:00
Lianmin Zheng	1361ab9e03	Lazily import lora backends (#4225 )	2025-03-08 23:39:26 -08:00
Yineng Zhang	5c7dd14ba1	chore: bump v0.0.4 for sgl-kernel (#4223 )	2025-03-08 23:01:59 -08:00

2357 Commits