Commit Graph

32 Commits

Author SHA1 Message Date
valarLip
e984d5073b enable aiter_biased_grouped_topk kernel (#7423) 2025-06-24 02:09:42 -07:00
YanbingJiang
094c116f7d Update python API of activation, topk, norm and rope and remove vllm dependency (#6614) 2025-06-17 22:11:50 -07:00
Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
Co-authored-by: sdp <sdp@gnr799219.jf.intel.com>
fzyzcjy
da47621ccc Minor speedup topk postprocessing (#7058) 2025-06-13 00:50:18 -07:00
fzyzcjy
2f715f51cc Minor compile fused topk (#6944) 2025-06-07 01:40:38 -07:00
fzyzcjy
5aff1e9392 Fix Qwen3MoE missing token padding optimization (#6820) 2025-06-05 00:04:59 -07:00
Cheng Wan
81964328b7 Set num_fused_shared_experts as num_shared_experts when shared_experts fusion is not disabled (#6736) 2025-06-04 15:53:22 -07:00
Cheng Wan
8a5480528d [Refactor] Rename n_share_experts_fusion as num_fused_shared_experts (#6735) 2025-06-03 17:48:24 -07:00
fzyzcjy
0ca1811715 Support fake perfectly balanced EP dispatch algorithm (#6571) 2025-05-25 22:35:51 -07:00
Yi Zhang
e6f113569e support eplb for qwen3 (#6533) 2025-05-23 18:31:30 -07:00
Li Hui
2f42749184 Fix topk inference performance reduce (#6474) 2025-05-23 02:58:31 -07:00
fzyzcjy
e98afbe042 Support dispatching logical to physical experts (#6385) 2025-05-19 22:13:55 -07:00
fzyzcjy
f0653886a5 Expert distribution recording without overhead for EPLB (#4957) 2025-05-19 20:07:43 -07:00
fzyzcjy
2716830802 Speed up when having padding tokens in DeepEP (#6175) 2025-05-17 16:44:05 -07:00
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179) 2025-05-11 12:55:00 +08:00
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Xiaoyu Zhang
d58e354472 simplify the control logic for using shared experts fusion (#5504) 2025-04-19 13:17:35 -07:00
Xiaoyu Zhang
bed05878f6 fix kimi vl running bug after rebase main (#5461) 2025-04-18 00:17:34 -07:00
Lianmin Zheng
177320a582 Clean up imports (#5467) 2025-04-16 15:26:49 -07:00
Xiaoyu Zhang
38076dea84 apply fused moe gate in ds v3/r1 (#5371) 2025-04-14 16:24:26 -07:00
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Xiaoyu Zhang
924ca7c92c Add DeepSeek V3/R1 shared experts fusion (#4918) 2025-04-04 01:59:29 -07:00
Qingquan Song
044c315970 Make torch compile configurable for biased_grouped_topk (#4749) 2025-03-28 10:57:52 -07:00
Lianmin Zheng
74e0ac1dbd Clean up import vllm in quantization/__init__.py (#4834) 2025-03-28 10:34:10 -07:00
Xiaoyu Zhang
04e3ff6975 Support compressed tensors fp8w8a8 (#4743) 2025-03-26 13:21:25 -07:00
yuhsaun-t
199bb01d00 Add endpoints to dump selected expert ids (#4435) 2025-03-24 21:34:19 -07:00
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Ke Bao
3ded4b215d Revert "feat: update grouped_topk to support softmax and sigmoid" (#4505) 2025-03-17 11:30:26 -07:00
Yineng Zhang
ad1ae7f7cd use topk_softmax with sgl-kernel (#4439) 2025-03-14 15:59:06 -07:00
zixuanzhang226
0c227ee373 feat: update grouped_topk to support softmax and sigmoid (#3680) 2025-02-21 16:30:15 +08:00
chenxiaobing
d5d80ab477 [Bugfix] Fix scores mask for moe topk (#3705) 2025-02-21 02:17:23 +08:00
Xiaoyu Zhang
2f47d710ae refine some typo (#3473) 2025-02-10 23:35:44 +08:00
Ke Bao
1ebe1d6de5 Optimize MoE topk with torch compile (#3236) 2025-02-01 01:36:50 +08:00
Lianmin Zheng
72c7776355 Fix linear.py and improve weight loading (#2851) 2025-01-13 01:39:14 -08:00
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Yineng Zhang
635a042623 docs: update deepseek v3 example (#2592) 2024-12-26 17:43:37 +08:00
Ke Bao
e835a50021 Reorg moe code (#2563) 2024-12-24 01:10:22 +08:00
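Many of the commits above (e.g. "use topk_softmax with sgl-kernel" #4439, "Optimize MoE topk with torch compile" #3236, and the softmax/sigmoid grouped_topk change #3680 and its revert #4505) revolve around the same core operation: scoring each token's router logits and keeping the k highest-scoring experts. As a point of reference only, here is a minimal pure-Python sketch of that selection; the function name `topk_softmax` and the `renormalize` flag follow common MoE-routing convention and are not taken from the actual fused sgl-kernel implementations these commits modify.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one token's router logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def topk_softmax(logits, k, renormalize=True):
    """Score experts with softmax, then keep the k best.

    Returns (expert_ids, weights). Reference sketch only, not the
    fused CUDA kernel the commits above refer to.
    """
    probs = softmax(logits)
    # Indices of the k largest probabilities, best first.
    ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in ids]
    if renormalize:
        # Rescale the kept weights so they sum to 1.
        s = sum(weights)
        weights = [w / s for w in weights]
    return ids, weights

ids, weights = topk_softmax([0.1, 2.0, -1.0, 1.5], k=2)
# Experts 1 and 3 carry the two largest logits, so they are selected.
```

Because softmax is monotonic, the selected indices are simply the k largest logits; the softmax matters only for the routing weights, which is why a sigmoid-based variant (as in #3680) changes the weights but can keep the same selection machinery.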