Commit Graph

158 Commits

Author SHA1 Message Date
Li Hui
69dd878b51 Fix shared experts fusion error (#6289) 2025-05-30 01:16:11 -07:00
Zilin Zhu
51cdd81f97 [fix][RL] Fix DeepSeekV3ForCausalLM.post_load_weights for multiple update weight (#6265) 2025-05-29 16:28:10 -07:00
fzyzcjy
31589e177e Speed up two-batch overlap when having padding tokens (#6668)
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
2025-05-28 16:00:58 -07:00
fzyzcjy
541a985f85 Fuse routed_scaling_factor in DeepSeek (#6710) 2025-05-28 15:53:37 -07:00
HAI
183d9f969c DeepSeek: enable non-block-quant FP8 quantizations (#6638) 2025-05-27 09:06:40 -07:00
fzyzcjy
32cd707002 Support TP in attention for two batch overlap (#6634) 2025-05-26 20:28:12 -07:00
fzyzcjy
0ca3e56802 Tiny fix missing expert location dispatch info (#6620) 2025-05-26 08:58:31 -07:00
Yi Zhang
65f091310c refactor qwen moe code, use communicator to support tp+dp (#6581) 2025-05-25 23:01:10 -07:00
fzyzcjy
0d47788025 Support overlapping two batches (#4068) 2025-05-24 17:39:07 -07:00
fzyzcjy
b2388433be Add back DeepSeek non-TBO branches (#6578) 2025-05-24 17:34:00 -07:00
fzyzcjy
a38376fa99 Refactor attention into multiple stages (#6477) 2025-05-24 17:33:25 -07:00
fzyzcjy
fc992a09f9 Support updating expert locations dynamically (#6388) 2025-05-21 21:59:33 -07:00
Baizhou Zhang
d4c038daed [Fix] Fix capture failure bug for DeepSeek (#6275) 2025-05-21 11:11:20 -07:00
fzyzcjy
ccfe5c009d Support redundant experts in expert parallel (#6461) 2025-05-21 02:05:53 -07:00
fzyzcjy
d6e1d28c8a Refactor DeepSeek attention dispatching (#6476) 2025-05-21 02:03:39 -07:00
Lianmin Zheng
03886917bd Disable all two stream overlap on amd (#6475) 2025-05-20 19:06:59 -07:00
fzyzcjy
13feffd082 Fix master CI for DeepSeek (#6447) 2025-05-20 00:31:42 -07:00
fzyzcjy
e98afbe042 Support dispatching logical to physical experts (#6385) 2025-05-19 22:13:55 -07:00
HAI
6317c5c61f Address performance regression: disable multiple streams on ROCm (#6412) 2025-05-19 21:16:20 -07:00
fzyzcjy
d0443275f0 Refactor DeepSeek logic into atomic operations (#6326) 2025-05-19 21:05:30 -07:00
fzyzcjy
1b19df4b2a Refactor communication logic of DeepSeek for extensibility and understandability (#6321) 2025-05-19 20:14:48 -07:00
fzyzcjy
f0653886a5 Expert distribution recording without overhead for EPLB (#4957) 2025-05-19 20:07:43 -07:00
fzyzcjy
72bfb0baf0 Refactor DeepSeek MoE layer to unify the two forward branches (#6325) 2025-05-18 15:34:36 -07:00
fzyzcjy
2716830802 Speed up when having padding tokens in DeepEP (#6175) 2025-05-17 16:44:05 -07:00
fzyzcjy
2df9d40aa6 Minor code cleanup refactor for DeepSeek models (#6324) 2025-05-16 19:06:03 -07:00
fzyzcjy
8dc191f237 Fix one wasted kernel in DeepSeek and minor refactor (#6316) 2025-05-16 19:05:33 -07:00
fzyzcjy
f194e14fb7 Reduce MoE memory usage (#6147) 2025-05-15 09:38:28 -07:00
Cheng Wan
b2e95f62b4 Fix two issues related to --moe-dense-tp-size=1 (#5657)
Co-authored-by: liusy58 <liusy58@linux.alibaba.com>
Co-authored-by: 颉沆 <xiehang.lsy@alibaba-inc.com>
2025-05-12 23:51:39 -07:00
Cheng Wan
25c83fff6a Performing Vocabulary Parallelism for LM Head across Attention TP Groups (#5558)
Co-authored-by: liusy58 <liusy58@linux.alibaba.com>
2025-05-11 23:36:29 -07:00
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-11 12:55:00 +08:00
JieXin Liang
c178abdabc [fix] fix determine_n_share_experts_fusion (#6118) 2025-05-10 01:19:09 -07:00
xu-yfei
e30c273bc9 opt flashinfer mla cat (#5822)
Co-authored-by: xuyongfei.xyf <xuyongfei.xyf@antgroup.com>
2025-05-08 23:17:14 -07:00
JieXin Liang
5e02330137 [perf] dsv3 bmm fallback to bf16 (#5662) 2025-05-08 11:43:39 -07:00
lukec
acc816d8a2 DeepEP normal support deepgemm-contiguous (#5626)
Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: Xuting Zhou <xutingz@nvidia.com>
Co-authored-by: ZhengHSI <zhenghsi@qq.com>
2025-05-08 01:20:32 -07:00
Baizhou Zhang
73600673bb Clean logs for DeepSeek-V3 launching (#6079) 2025-05-07 18:54:50 -07:00
JieXin Liang
b70957fcf8 [refactor] slightly tidy fp8 module (#5993) 2025-05-07 17:28:24 -07:00
Ke Bao
d8ab60117f Overlap qk norm with two streams (#5977) 2025-05-02 09:26:30 -07:00
Ke Bao
de2faef97e Remove extra contiguous (#5953) 2025-05-01 09:28:46 -07:00
liwenju0
8fefdd32c7 [Feature] Add support for Kimi-VL model (#5383)
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
2025-04-29 21:31:19 -07:00
Ke Bao
dd408ee481 Auto set draft model path for MTP (#5793) 2025-04-29 16:25:40 -07:00
Ke Bao
799c4bb502 Fuse MLA set kv cache kernel (#5748) 2025-04-26 18:42:22 -07:00
Ke Bao
c3948ba67e Reorder loop in shared expert weight loading (#5719) 2025-04-25 17:27:42 -07:00
Yuhong Guo
5d93a950ee [BugFix] Fix combination of MTP and --n-share-experts-fusion with R1 (#5707) 2025-04-24 21:13:51 +08:00
fzyzcjy
71d1785f2d Remove unnecessary torch.full in DeepSeek (#5601) 2025-04-22 21:24:29 -07:00
Baizhou Zhang
3f87f83116 Fuse q_a_proj and kv_a_proj (#5619) 2025-04-22 20:35:08 -07:00
Ke Bao
6b6e748775 Remove q concat in FA3 backend for DeepSeek decode (#5638) 2025-04-22 11:43:12 -07:00
lambert0312
76d17c7ecb Fix shared experts fusion error without quantization (#5632) 2025-04-22 09:22:26 -07:00
JieXin Liang
4418f599a5 Fix FA3 DeepSeek prefill performance regression (#5624)
Co-authored-by: ispobock <ispobaoke@gmail.com>
2025-04-22 01:41:41 -07:00
JieXin Liang
506be6b892 [fix] fix compile_deep_gemm missing kv_b_proj (#5620) 2025-04-22 00:06:36 -07:00
Ke Bao
11b23ae97b Remove extra copy in deepseek forward absorb (#5578)
Co-authored-by: saienduri <saimanas.enduri@amd.com>
2025-04-21 19:33:21 -07:00