Commit Graph

32 Commits

Author SHA1 Message Date
valarLip
e984d5073b enable aiter_biased_grouped_topk kernel (#7423) 2025-06-24 02:09:42 -07:00
YanbingJiang
094c116f7d Update python API of activation, topk, norm and rope and remove vllm dependency (#6614) 2025-06-17 22:11:50 -07:00
Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
Co-authored-by: sdp <sdp@gnr799219.jf.intel.com>
fzyzcjy
da47621ccc Minor speedup topk postprocessing (#7058) 2025-06-13 00:50:18 -07:00
fzyzcjy
2f715f51cc Minor compile fused topk (#6944) 2025-06-07 01:40:38 -07:00
fzyzcjy
5aff1e9392 Fix Qwen3MoE missing token padding optimization (#6820) 2025-06-05 00:04:59 -07:00
Cheng Wan
81964328b7 Set num_fused_shared_experts as num_shared_experts when shared_experts fusion is not disabled (#6736) 2025-06-04 15:53:22 -07:00
Cheng Wan
8a5480528d [Refactor] Rename n_share_experts_fusion as num_fused_shared_experts (#6735) 2025-06-03 17:48:24 -07:00
fzyzcjy
0ca1811715 Support fake perfectly balanced EP dispatch algorithm (#6571) 2025-05-25 22:35:51 -07:00
Yi Zhang
e6f113569e support eplb for qwen3 (#6533) 2025-05-23 18:31:30 -07:00
Li Hui
2f42749184 Fix topk inference performance reduce (#6474) 2025-05-23 02:58:31 -07:00
fzyzcjy
e98afbe042 Support dispatching logical to physical experts (#6385) 2025-05-19 22:13:55 -07:00
fzyzcjy
f0653886a5 Expert distribution recording without overhead for EPLB (#4957) 2025-05-19 20:07:43 -07:00
fzyzcjy
2716830802 Speed up when having padding tokens in DeepEP (#6175) 2025-05-17 16:44:05 -07:00
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179) 2025-05-11 12:55:00 +08:00
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Xiaoyu Zhang
d58e354472 simplify the control logic for using shared experts fusion (#5504) 2025-04-19 13:17:35 -07:00
Xiaoyu Zhang
bed05878f6 fix kimi vl running bug after rebase main (#5461) 2025-04-18 00:17:34 -07:00
Lianmin Zheng
177320a582 Clean up imports (#5467) 2025-04-16 15:26:49 -07:00
Xiaoyu Zhang
38076dea84 apply fused moe gate in ds v3/r1 (#5371) 2025-04-14 16:24:26 -07:00
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Xiaoyu Zhang
924ca7c92c Add DeepSeek V3/R1 shared experts fusion (#4918) 2025-04-04 01:59:29 -07:00
Qingquan Song
044c315970 Make torch compile configurable for biased_grouped_topk (#4749) 2025-03-28 10:57:52 -07:00
Lianmin Zheng
74e0ac1dbd Clean up import vllm in quantization/__init__.py (#4834) 2025-03-28 10:34:10 -07:00
Xiaoyu Zhang
04e3ff6975 Support compressed tensors fp8w8a8 (#4743) 2025-03-26 13:21:25 -07:00
yuhsaun-t
199bb01d00 Add endpoints to dump selected expert ids (#4435) 2025-03-24 21:34:19 -07:00
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Ke Bao
3ded4b215d Revert "feat: update grouped_topk to support softmax and sigmoid" (#4505) 2025-03-17 11:30:26 -07:00
Yineng Zhang
ad1ae7f7cd use topk_softmax with sgl-kernel (#4439) 2025-03-14 15:59:06 -07:00
zixuanzhang226
0c227ee373 feat: update grouped_topk to support softmax and sigmoid (#3680) 2025-02-21 16:30:15 +08:00
chenxiaobing
d5d80ab477 [Bugfix] Fix scores mask for moe topk (#3705) 2025-02-21 02:17:23 +08:00
Xiaoyu Zhang
2f47d710ae refine some typo (#3473) 2025-02-10 23:35:44 +08:00
Ke Bao
1ebe1d6de5 Optimize MoE topk with torch compile (#3236) 2025-02-01 01:36:50 +08:00
Lianmin Zheng
72c7776355 Fix linear.py and improve weight loading (#2851) 2025-01-13 01:39:14 -08:00
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Yineng Zhang
635a042623 docs: update deepseek v3 example (#2592) 2024-12-26 17:43:37 +08:00
Ke Bao
e835a50021 Reorg moe code (#2563) 2024-12-24 01:10:22 +08:00
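Many of the commits above (e.g. "use topk_softmax with sgl-kernel" #4439, "Optimize MoE topk with torch compile" #3236, and the softmax/sigmoid grouped_topk change #3680 and its revert #4505) revolve around the same core operation: scoring each token's router logits and keeping the k highest-scoring experts. As a point of reference only, here is a minimal pure-Python sketch of that selection; the function name `topk_softmax` and the `renormalize` flag follow common MoE-routing convention and are not taken from the actual fused sgl-kernel implementations these commits modify.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one token's router logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def topk_softmax(logits, k, renormalize=True):
    """Score experts with softmax, then keep the k best.

    Returns (expert_ids, weights). Reference sketch only, not the
    fused CUDA kernel the commits above refer to.
    """
    probs = softmax(logits)
    # Indices of the k largest probabilities, best first.
    ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in ids]
    if renormalize:
        # Rescale the kept weights so they sum to 1.
        s = sum(weights)
        weights = [w / s for w in weights]
    return ids, weights

ids, weights = topk_softmax([0.1, 2.0, -1.0, 1.5], k=2)
# Experts 1 and 3 carry the two largest logits, so they are selected.
```

Because softmax is monotonic, the selected indices are simply the k largest logits; the softmax matters only for the routing weights, which is why a sigmoid-based variant (as in #3680) changes the weights but can keep the same selection machinery.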