sglang

Author	SHA1	Message	Date
Yineng Zhang	df84ab2a5b	update sgl-kernel 3rdparty (#4228 )	2025-03-09 01:16:05 -08:00
Ying Sheng	34c8898755	Check eagle server args (#4217 )	2025-03-09 01:10:43 -08:00
HandH1998	0dd6cda288	Apply sgl w8a8 fp8 kernel (#3148 )	2025-03-09 00:03:32 -08:00
Baizhou Zhang	9fb48f951f	Support nextn for flashinfer mla attention backend (#4218 )	2025-03-09 00:01:54 -08:00
Yineng Zhang	89ccb533ad	use sgl-kernel 0.0.4 (#4224 )	2025-03-08 23:43:09 -08:00
Stefan He	dceb256f1b	[docs] Unhide production metrics page (#4193 )	2025-03-08 23:41:40 -08:00
Peter Pan	0e90ae628a	[docker] Distributed Serving with k8s Statefulset ( good example for DeepSeek-R1) (#3631 ) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io> Co-authored-by: Kebe <kebe.liu@daocloud.io>	2025-03-08 23:41:20 -08:00
Lianmin Zheng	1361ab9e03	Lazily import lora backends (#4225 )	2025-03-08 23:39:26 -08:00
Yineng Zhang	5c7dd14ba1	chore: bump v0.0.4 for sgl-kernel (#4223 )	2025-03-08 23:01:59 -08:00
Lianmin Zheng	8abf74e3c9	Rename files in sgl kernel to avoid nested folder structure (#4213 ) Co-authored-by: zhyncs <me@zhyncs.com>	2025-03-08 22:54:51 -08:00
Yineng Zhang	ee132a4515	use latest sgl-kernel for mla test (#4222 )	2025-03-08 22:27:47 -08:00
Xiaoyu Zhang	79a321af55	revert pr 3628 to pass test_mla ci (#4219 )	2025-03-08 21:15:14 -08:00
Xihuai Wang	6eec3cdce6	docs(reasoning content): 📝 deepseek-r1 parser support qwq (#4124 )	2025-03-09 04:14:50 +00:00
Lianmin Zheng	48473684cc	Split test_mla.py into two files (#4216 )	2025-03-08 15:40:49 -08:00
Xiaoyu Zhang	b3251e9f40	refine quant kernel code style (#4211 )	2025-03-08 05:47:35 -08:00
Lianmin Zheng	2cadd51d11	Test no vllm custom allreduce (#4210 )	2025-03-08 05:23:06 -08:00
Kebe	4a893d142d	Refactor Dockerfile: unify CUDA logic and reduce image size by ~2.6 GB (#3749 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-03-08 03:01:13 -08:00
Lianmin Zheng	8d323e95e4	Use clang format 18 in pr-test-sgl-kernel.yml (#4203 )	2025-03-08 01:28:10 -08:00
Mingshan	0fe7c13be1	Fix bench_serving flush cache not recognizing OPENAI_API_KEY (#4181 ) Signed-off-by: Mingshan <git@brighill.com>	2025-03-08 01:03:38 -08:00
Lianmin Zheng	08c4d764a5	lazy import attn backends (#4200 )	2025-03-08 00:41:35 -08:00
Yineng Zhang	96d0e37fa7	Revert "Minor improvement to per_tensor_quant_fp8 (#4197 )" (#4198 )	2025-03-07 22:57:09 -08:00
Rex	90bb2be27e	Minor improvement to per_tensor_quant_fp8 (#4197 )	2025-03-07 22:52:12 -08:00
lukec	b93ef5e56d	Remove the vllm dependency from the moe_align function (#4164 ) Co-authored-by: Hongbosherlock <hongbosherlock@gmail.com>	2025-03-07 22:42:16 -08:00
Lianmin Zheng	d4017a6b63	[EAGLE] many fixes for eagle (#4195 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-07 22:12:13 -08:00
Lianmin Zheng	d052f4c8a9	New clang format for sgl kernel (#4194 )	2025-03-07 20:21:08 -08:00
saienduri	e1aaa79ac9	Update amd ci docker image to v0.4.3.post4-rocm630. (#4189 )	2025-03-07 13:02:02 -08:00
Ke Bao	20c8119915	Fix eagle hang issue for max_new_tokens=1 (#4185 )	2025-03-07 12:11:18 -08:00
Yineng Zhang	70866b6f4f	use same version for ci and pyproject (#4187 )	2025-03-07 10:39:55 -08:00
Yineng Zhang	eb61f5c9af	Revert "ROCm: Flex Attention Enablement with custom backends (#4178 )" (#4186 )	2025-03-07 10:27:52 -08:00
HAI	0beea4503f	ROCm: Flex Attention Enablement with custom backends (#4178 ) Co-authored-by: linsun12 <linsun12@amd.com>	2025-03-07 04:38:53 -08:00
Michael Yao	c827c671f7	[Docs] Improve bullets appearance and grammar (#4174 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-03-07 03:16:25 -08:00
Yineng Zhang	b55a621ffb	fix int8 doc link (#4179 )	2025-03-07 02:49:19 -08:00
lukec	ffa1b3e318	Add an example of using deepseekv3 int8 sglang. (#4177 ) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-07 01:56:09 -08:00
Yineng Zhang	7e3bb52705	update release-pypi-kernel	2025-03-07 01:48:47 -08:00
Yineng Zhang	96263f275c	chore: bump v0.0.3.post7 for sgl-kernel (#4176 )	2025-03-07 01:15:34 -08:00
Zhiqiang Xie	9376ac361d	Memory pool fix for upstream change about eagle (#4170 )	2025-03-07 00:58:20 -08:00
Yineng Zhang	94a2b9d33e	Put utils in ifndef USE_ROCM to fix CI (#4167 ) (#4168 ) Co-authored-by: Stefan He <hebiaobuaa@gmail.com>	2025-03-07 00:01:17 -08:00
Stefan He	3c3eb374b2	Remove non-existent AMD header include (#4166 )	2025-03-06 23:29:30 -08:00
Michael Yao	d557319a8b	[Docs] Fix links and grammar issues (#4162 )	2025-03-06 23:14:18 -08:00
Stefan He	95085d65e9	[Refactor] Reducing code duplication across FP8 CUDA quantization kernels (#4163 )	2025-03-06 22:58:52 -08:00
HandH1998	c7f254468f	[Feature] DeepSeek V3/R1 INT8 Quantization (channel-wise) (#3888 ) Co-authored-by: yych0745 <1398089567@qq.com> Co-authored-by: sleepcoo <sleepcoo@gmail.com> Co-authored-by: b0urnee <2769086541@qq.com>	2025-03-06 20:54:52 -08:00
Stefan He	63ee26d162	Add sgl_per_token_quant_fp8 (#4089 )	2025-03-06 20:53:05 -08:00
Xiaoyu Zhang	ad55f17182	[quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786 )	2025-03-06 18:05:43 -08:00
Pan Lyu	361971b859	Add Support for Qwen2-VL Multi-modal Embedding Models (#3694 )	2025-03-06 16:46:20 -08:00
HAI	13bc39c5d6	ROCm: enable trillion-parameter MoE models with INT4-FP8 single node (#4152 )	2025-03-06 15:33:02 -08:00
Chayenne	9854a18a51	Hot fix small vocal eagle in docs (#4154 ) Co-authored-by: ybyang <ybyang7@iflytek.com>	2025-03-06 15:13:26 -08:00
Chayenne	ebddb65aed	Docs: add torch compile cache (#4151 ) Co-authored-by: ybyang <ybyang7@iflytek.com>	2025-03-06 14:27:09 -08:00
Adarsh Shirawalmath	19fd57bcd7	[docs] fix HF reference script command (#4148 )	2025-03-06 13:21:54 -08:00
Lianmin Zheng	9c58e68b4c	Release v0.4.3.post4 (#4140 )	2025-03-06 12:50:28 -08:00
Oliver Stanley	d03b3467b8	Fix constrained generation errors by adding datasets dependency (#4142 )	2025-03-06 12:07:51 -08:00

2316 Commits