| Author | Commit | Message | Date |
|---|---|---|---|
| fzyzcjy | b4c41f7276 | Refactor DeepGEMM integration (#7150) | 2025-06-13 20:41:03 -07:00 |
| Cheng Wan | 499f5e620c | Fix one missing arg in DeepEP (#6878) | 2025-06-04 19:14:47 -07:00 |
| Cheng Wan | 81964328b7 | Set num_fused_shared_experts as num_shared_experts when shared_experts fusion is not disabled (#6736) | 2025-06-04 15:53:22 -07:00 |
| Cheng Wan | ced3c07afe | Support token-level quantization for EP MoE (#6782) | 2025-05-30 17:26:30 -07:00 |
| Zilin Zhu | e9feb48838 | [RL] Remove the w13 weight_scale and input_scale for UnquantizedEPMoE… (#6308) | 2025-05-21 22:03:15 -07:00 |
| fzyzcjy | 13feffd082 | Fix master CI for DeepSeek (#6447) | 2025-05-20 00:31:42 -07:00 |
| fzyzcjy | e98afbe042 | Support dispatching logical to physical experts (#6385) | 2025-05-19 22:13:55 -07:00 |
| fzyzcjy | c471d39eb9 | Support loading weights when physical experts are different from logical experts (#6386) | 2025-05-19 21:05:53 -07:00 |
| fzyzcjy | 2df9d40aa6 | Minor code cleanup refactor for DeepSeek models (#6324) | 2025-05-16 19:06:03 -07:00 |
| fzyzcjy | f194e14fb7 | Reduce MoE memory usage (#6147) | 2025-05-15 09:38:28 -07:00 |
| applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179); Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> | 2025-05-11 12:55:00 +08:00 |
| lukec | acc816d8a2 | DeepEP normal support deepgemm-contiguous (#5626); Co-authored-by: Yingyi Huang <yingyihuang2000@outlook.com>; Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>; Co-authored-by: Xuting Zhou <xutingz@nvidia.com>; Co-authored-by: ZhengHSI <zhenghsi@qq.com> | 2025-05-08 01:20:32 -07:00 |
| fzyzcjy | 463d4b7400 | Fix DeepEP cannot run on latest master (#5567) | 2025-04-20 14:19:42 -07:00 |
| Xiaoyu Zhang | d58e354472 | simplify the control logic for using shared experts fusion (#5504) | 2025-04-19 13:17:35 -07:00 |
| fzyzcjy | 1e0806f30b | Fix DeepGEMM masked cannot be run on groups not being multiple or 4 (#5340) | 2025-04-18 22:38:07 -07:00 |
| Lianmin Zheng | 177320a582 | Clean up imports (#5467) | 2025-04-16 15:26:49 -07:00 |
| fzyzcjy | 8e10fec9a8 | Small refactor DeepEPMode to clean up code a bit (#4992) | 2025-04-03 02:56:44 -07:00 |
| Jinyan Chen | 23c764b18a | [Feature] Support DeepEP Low Latency (#4767); Co-authored-by: sleepcoo <sleepcoo@gmail.com>; Co-authored-by: laixinn <xielx@shanghaitech.edu.cn>; Co-authored-by: ch-wan <cwan39@gatech.edu> | 2025-04-01 09:23:25 -07:00 |
| xutizhou | c2bd094d6e | Optimize Permute Kernel in DeepEP (#4643); Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com> | 2025-03-22 14:30:34 -07:00 |
| Jinyan Chen | f44db16c8e | [Feature] Integrate DeepEP into SGLang (#4232); Co-authored-by: Cheng Wan <cwan39@gatech.edu>; Co-authored-by: Xuting Zhou <xutingz@nvidia.com> | 2025-03-19 08:16:31 -07:00 |
| Yineng Zhang | 977d7cd26a | cleanup deps 1/n (#4400); Co-authored-by: sleepcoo <sleepcoo@gmail.com> | 2025-03-14 00:00:33 -07:00 |
| Stefan He | e0917e6bd0 | Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215); Co-authored-by: Stefan He <bhe@linkedin.com> | 2025-03-12 00:08:03 -07:00 |
| Yineng Zhang | d1da58e275 | unify is_cuda and is_hip (#4321) | 2025-03-11 18:12:56 -07:00 |
| Lianmin Zheng | 66301e124f | Improve code styles (#4021) | 2025-03-03 03:20:23 -08:00 |
| Lianmin Zheng | ac2387279e | Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988); Co-authored-by: SangBin Cho <rkooo567@gmail.com>; Co-authored-by: dhou-xai <dhou@x.ai>; Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu> | 2025-03-03 00:12:04 -08:00 |
| lukec | 21463e321a | Expert Parallelism (EP) Support for DeepSeek V3/R1 (#3602); Co-authored-by: laixin <xielx@shanghaitech.edu.cn>; Co-authored-by: HandH1998 <1335248067@qq.com>; Co-authored-by: laixin <q865809639@gmail.com> | 2025-02-26 02:29:37 -08:00 |
| Yineng Zhang | 4eb4b401cc | update and simplify CustomOp (#3249) | 2025-02-01 18:56:44 +08:00 |
| Lianmin Zheng | 53cef81587 | Improve weight loading and code style (#3174) | 2025-01-27 03:00:41 -08:00 |
| Lianmin Zheng | 52c03f16b9 | Add activation parameters to fused_moe (#3170) | 2025-01-27 00:23:37 -08:00 |
| Yineng Zhang | 033c715b46 | cleanup models dependencies 1/n (#2948) | 2025-01-17 23:46:48 +08:00 |
| Yineng Zhang | 5dc54f1a62 | feat: remove vllm distributed (#2907); Co-authored-by: Zhangyi <1109276519@qq.com> | 2025-01-17 22:31:51 +08:00 |
| Ke Bao | e835a50021 | Reorg moe code (#2563) | 2024-12-24 01:10:22 +08:00 |