sglang

Author	SHA1	Message	Date
simveit	007f8b3dc2	Added example for multimodal embedding (#4206 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-03-10 00:53:56 -07:00
DavidChan	4455b26e76	[Bug fixed] fixed the crash when enable the dp-attention on the single card (#3958 )	2025-03-10 00:50:34 -07:00
laixin	c553e1604c	DeepGemm integrate to sgl-kernel (#4165 ) Co-authored-by: sleepcoo <sleepcoo@gmail.com> Co-authored-by: HandH1998 <1335248067@qq.com> Co-authored-by: shuaills <shishuaiuoe@gmail.com> Co-authored-by: yinfan98 <1106310035@qq.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-03-10 00:35:07 -07:00
Lianmin Zheng	7c0541b385	Move activation.cu to sgl-kernel/elementwise (#4250 )	2025-03-09 22:41:13 -07:00
Lianmin Zheng	e8a69e4d0c	Clean up fp8 support (#4230 )	2025-03-09 21:46:35 -07:00
Lianmin Zheng	fbd560028a	Auto balance CI tests (#4238 )	2025-03-09 21:05:55 -07:00
Lianmin Zheng	730d084f2a	Minor style fix for sgl-kernel (#4243 )	2025-03-09 20:15:13 -07:00
Lianmin Zheng	4a05bdfa86	Revert "Check eagle server args" (#4242 )	2025-03-09 18:53:33 -07:00
Lianmin Zheng	eb06dbcbf8	Move rope and bmm into sgl-kernel (#4241 )	2025-03-09 18:38:15 -07:00
Baizhou Zhang	9dfafa743c	Fix test of flashinfer mla with nextn (#4237 )	2025-03-09 12:45:39 -07:00
Ke Bao	f1d09a6541	Update bench speculative script (#4235 )	2025-03-09 12:19:01 -07:00
Yineng Zhang	df84ab2a5b	update sgl-kernel 3rdparty (#4228 )	2025-03-09 01:16:05 -08:00
Ying Sheng	34c8898755	Check eagle server args (#4217 )	2025-03-09 01:10:43 -08:00
HandH1998	0dd6cda288	Apply sgl w8a8 fp8 kernel (#3148 )	2025-03-09 00:03:32 -08:00
Baizhou Zhang	9fb48f951f	Support nextn for flashinfer mla attention backend (#4218 )	2025-03-09 00:01:54 -08:00
Yineng Zhang	89ccb533ad	use sgl-kernel 0.0.4 (#4224 )	2025-03-08 23:43:09 -08:00
Stefan He	dceb256f1b	[docs] Unhide production metrics page (#4193 )	2025-03-08 23:41:40 -08:00
Peter Pan	0e90ae628a	[docker] Distributed Serving with k8s Statefulset ( good example for DeepSeek-R1) (#3631 ) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Co-authored-by: Kebe <kebe.liu@daocloud.io>	2025-03-08 23:41:20 -08:00
Lianmin Zheng	1361ab9e03	Lazily import lora backends (#4225 )	2025-03-08 23:39:26 -08:00
Yineng Zhang	5c7dd14ba1	chore: bump v0.0.4 for sgl-kernel (#4223 )	2025-03-08 23:01:59 -08:00
Lianmin Zheng	8abf74e3c9	Rename files in sgl kernel to avoid nested folder structure (#4213 ) Co-authored-by: zhyncs <me@zhyncs.com>	2025-03-08 22:54:51 -08:00
Yineng Zhang	ee132a4515	use latest sgl-kernel for mla test (#4222 )	2025-03-08 22:27:47 -08:00
Xiaoyu Zhang	79a321af55	revert pr 3628 to pass test_mla ci (#4219 )	2025-03-08 21:15:14 -08:00
Xihuai Wang	6eec3cdce6	docs(reasoning content): 📝 deepseek-r1 parser support qwq (#4124 )	2025-03-09 04:14:50 +00:00
Lianmin Zheng	48473684cc	Split test_mla.py into two files (#4216 )	2025-03-08 15:40:49 -08:00
Xiaoyu Zhang	b3251e9f40	refine quant kernel code style (#4211 )	2025-03-08 05:47:35 -08:00
Lianmin Zheng	2cadd51d11	Test no vllm custom allreduce (#4210 )	2025-03-08 05:23:06 -08:00
Kebe	4a893d142d	Refactor Dockerfile: unify CUDA logic and reduce image size by ~2.6 GB (#3749 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-03-08 03:01:13 -08:00
Lianmin Zheng	8d323e95e4	Use clang format 18 in pr-test-sgl-kernel.yml (#4203 )	2025-03-08 01:28:10 -08:00
Mingshan	0fe7c13be1	Fix bench_serving flush cache not recognizing OPENAI_API_KEY (#4181 ) Signed-off-by: Mingshan <git@brighill.com>	2025-03-08 01:03:38 -08:00
Lianmin Zheng	08c4d764a5	lazy import attn backends (#4200 )	2025-03-08 00:41:35 -08:00
Yineng Zhang	96d0e37fa7	Revert "Minor improvement to per_tensor_quant_fp8 (#4197 )" (#4198 )	2025-03-07 22:57:09 -08:00
Rex	90bb2be27e	Minor improvement to per_tensor_quant_fp8 (#4197 )	2025-03-07 22:52:12 -08:00
lukec	b93ef5e56d	Remove the vllm dependency from the moe_align function (#4164 ) Co-authored-by: Hongbosherlock <hongbosherlock@gmail.com>	2025-03-07 22:42:16 -08:00
Lianmin Zheng	d4017a6b63	[EAGLE] many fixes for eagle (#4195 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-07 22:12:13 -08:00
Lianmin Zheng	d052f4c8a9	New clang format for sgl kernel (#4194 )	2025-03-07 20:21:08 -08:00
saienduri	e1aaa79ac9	Update amd ci docker image to v0.4.3.post4-rocm630. (#4189 )	2025-03-07 13:02:02 -08:00
Ke Bao	20c8119915	Fix eagle hang issue for max_new_tokens=1 (#4185 )	2025-03-07 12:11:18 -08:00
Yineng Zhang	70866b6f4f	use same version for ci and pyproject (#4187 )	2025-03-07 10:39:55 -08:00
Yineng Zhang	eb61f5c9af	Revert "ROCm: Flex Attention Enablement with custom backends (#4178 )" (#4186 )	2025-03-07 10:27:52 -08:00
HAI	0beea4503f	ROCm: Flex Attention Enablement with custom backends (#4178 ) Co-authored-by: linsun12 <linsun12@amd.com>	2025-03-07 04:38:53 -08:00
Michael Yao	c827c671f7	[Docs] Improve bullets appearance and grammar (#4174 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-03-07 03:16:25 -08:00
Yineng Zhang	b55a621ffb	fix int8 doc link (#4179 )	2025-03-07 02:49:19 -08:00
lukec	ffa1b3e318	Add an example of using deepseekv3 int8 sglang. (#4177 ) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-07 01:56:09 -08:00
Yineng Zhang	7e3bb52705	update release-pypi-kernel	2025-03-07 01:48:47 -08:00
Yineng Zhang	96263f275c	chore: bump v0.0.3.post7 for sgl-kernel (#4176 )	2025-03-07 01:15:34 -08:00
Zhiqiang Xie	9376ac361d	Memory pool fix for upstream change about eagle (#4170 )	2025-03-07 00:58:20 -08:00
Yineng Zhang	94a2b9d33e	Put utils in ifndef USE_ROCM to fix CI (#4167 ) (#4168 ) Co-authored-by: Stefan He <hebiaobuaa@gmail.com>	2025-03-07 00:01:17 -08:00
Stefan He	3c3eb374b2	Remove non-existent AMD header include (#4166 )	2025-03-06 23:29:30 -08:00
Michael Yao	d557319a8b	[Docs] Fix links and grammar issues (#4162 )	2025-03-06 23:14:18 -08:00

4977 Commits