sglang

Author	SHA1	Message	Date
fzyzcjy	b4c41f7276	Refactor DeepGEMM integration (#7150)	2025-06-13 20:41:03 -07:00
Cheng Wan	499f5e620c	Fix one missing arg in DeepEP (#6878)	2025-06-04 19:14:47 -07:00
Cheng Wan	81964328b7	Set `num_fused_shared_experts` as `num_shared_experts` when shared_experts fusion is not disabled (#6736)	2025-06-04 15:53:22 -07:00
Cheng Wan	ced3c07afe	Support token-level quantization for EP MoE (#6782)	2025-05-30 17:26:30 -07:00
Zilin Zhu	e9feb48838	[RL] Remove the w13 weight_scale and input_scale for UnquantizedEPMoE… (#6308)	2025-05-21 22:03:15 -07:00
fzyzcjy	13feffd082	Fix master CI for DeepSeek (#6447)	2025-05-20 00:31:42 -07:00
fzyzcjy	e98afbe042	Support dispatching logical to physical experts (#6385)	2025-05-19 22:13:55 -07:00
fzyzcjy	c471d39eb9	Support loading weights when physical experts are different from logical experts (#6386)	2025-05-19 21:05:53 -07:00
fzyzcjy	2df9d40aa6	Minor code cleanup refactor for DeepSeek models (#6324)	2025-05-16 19:06:03 -07:00
fzyzcjy	f194e14fb7	Reduce MoE memory usage (#6147)	2025-05-15 09:38:28 -07:00
applesaucethebun	2ce8793519	Add typo checker in pre-commit (#6179) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 12:55:00 +08:00
lukec	acc816d8a2	DeepEP normal support deepgemm-contiguous (#5626) Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com> Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> Co-authored-by: Xuting Zhou <xutingz@nvidia.com> Co-authored-by: ZhengHSI <zhenghsi@qq.com>	2025-05-08 01:20:32 -07:00
fzyzcjy	463d4b7400	Fix DeepEP cannot run on latest master (#5567)	2025-04-20 14:19:42 -07:00
Xiaoyu Zhang	d58e354472	simplify the control logic for using shared experts fusion (#5504)	2025-04-19 13:17:35 -07:00
fzyzcjy	1e0806f30b	Fix DeepGEMM masked cannot be run on groups not being multiple or 4 (#5340)	2025-04-18 22:38:07 -07:00
Lianmin Zheng	177320a582	Clean up imports (#5467)	2025-04-16 15:26:49 -07:00
fzyzcjy	8e10fec9a8	Small refactor DeepEPMode to clean up code a bit (#4992)	2025-04-03 02:56:44 -07:00
Jinyan Chen	23c764b18a	[Feature] Support DeepEP Low Latency (#4767) Co-authored-by: sleepcoo <sleepcoo@gmail.com> Co-authored-by: laixinn <xielx@shanghaitech.edu.cn> Co-authored-by: ch-wan <cwan39@gatech.edu>	2025-04-01 09:23:25 -07:00
xutizhou	c2bd094d6e	Optimize Permute Kernel in DeepEP (#4643) Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>	2025-03-22 14:30:34 -07:00
Jinyan Chen	f44db16c8e	[Feature] Integrate DeepEP into SGLang (#4232) Co-authored-by: Cheng Wan <cwan39@gatech.edu> Co-authored-by: Xuting Zhou <xutingz@nvidia.com>	2025-03-19 08:16:31 -07:00
Yineng Zhang	977d7cd26a	cleanup deps 1/n (#4400) Co-authored-by: sleepcoo <sleepcoo@gmail.com>	2025-03-14 00:00:33 -07:00
Stefan He	e0917e6bd0	Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215) Co-authored-by: Stefan He <bhe@linkedin.com>	2025-03-12 00:08:03 -07:00
Yineng Zhang	d1da58e275	unify is_cuda and is_hip (#4321)	2025-03-11 18:12:56 -07:00
Lianmin Zheng	66301e124f	Improve code styles (#4021)	2025-03-03 03:20:23 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
lukec	21463e321a	Expert Parallelism (EP) Support for DeepSeek V3/R1 (#3602) Co-authored-by: laixin <xielx@shanghaitech.edu.cn> Co-authored-by: HandH1998 <1335248067@qq.com> Co-authored-by: laixin <q865809639@gmail.com>	2025-02-26 02:29:37 -08:00
Yineng Zhang	4eb4b401cc	update and simplify CustomOp (#3249)	2025-02-01 18:56:44 +08:00
Lianmin Zheng	53cef81587	Improve weight loading and code style (#3174)	2025-01-27 03:00:41 -08:00
Lianmin Zheng	52c03f16b9	Add activation parameters to fused_moe (#3170)	2025-01-27 00:23:37 -08:00
Yineng Zhang	033c715b46	cleanup models dependencies 1/n (#2948)	2025-01-17 23:46:48 +08:00
Yineng Zhang	5dc54f1a62	feat: remove vllm distributed (#2907) Co-authored-by: Zhangyi <1109276519@qq.com>	2025-01-17 22:31:51 +08:00
Ke Bao	e835a50021	Reorg moe code (#2563)	2024-12-24 01:10:22 +08:00

32 Commits