e879d8b7a8 | Cheng Wan | [Feature] Comprehensive Hybrid Parallelism Support (#6389) | 2025-06-20 14:43:11 -07:00
a06912ad8b | Li Hui | Fix judgment condition for enabling Deepseek V3/R1 shared expert fusion optimization (#7371) | 2025-06-19 21:58:00 -07:00
094c116f7d | YanbingJiang | Update python API of activation, topk, norm and rope and remove vllm dependency (#6614) | 2025-06-17 22:11:50 -07:00
             Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>
             Co-authored-by: jianan-gu <jianan.gu@intel.com>
             Co-authored-by: sdp <sdp@gnr799219.jf.intel.com>
4f204db57c | Yineng Zhang | fix: resolve b200 dsv3 mtp issue (#7286) | 2025-06-17 16:22:46 -07:00
3eb4a800e8 | AniZpZ | Fix AWQ Dequant and Weight Loading of deepseek v2 (#6842) | 2025-06-17 13:45:10 -07:00
8c16da334e | Charles Chen | Fix Deepseek R1 0528 FP4 tensor name mismatch issue during weights loading. (#7164) | 2025-06-17 11:26:23 -07:00
10d60cd41b | u4lr451 | feat: mtp support dp-attention (#6081) | 2025-06-17 00:33:28 -07:00
             Co-authored-by: austindeng <austindeng@tencent.com>
             Co-authored-by: tianqilin.99 <tianqilin.99@bytedance.com>
             Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
             Co-authored-by: ch-wan <cwan39@gatech.edu>
5ca07eed90 | JieXin Liang | [fix] fix DeepGEMM blackwell input quant & ut & fix style and log (#7247) | 2025-06-16 11:45:54 -07:00
349bb2c92a | fzyzcjy | Fix error when disabling new DeepGEMM (#7198) | 2025-06-14 19:24:54 -07:00
55561e2553 | JieXin Liang | [fix] fix determine_num_fused_shared_experts (#7180) | 2025-06-14 17:41:22 -07:00
ed54bf9d19 | JieXin Liang | [fix] fix dsv3 weight loader tqdm and simplify shared experts fusion (#7181) | 2025-06-14 11:56:29 -07:00
b57d87c297 | fzyzcjy | Fix shared experts fusion + weight requant (#7177) | 2025-06-14 02:35:18 -07:00
93cec4335f | fzyzcjy | Support new DeepGEMM (#7172) | 2025-06-13 23:00:17 -07:00
b4c41f7276 | fzyzcjy | Refactor DeepGEMM integration (#7150) | 2025-06-13 20:41:03 -07:00
5b1afa7814 | fzyzcjy | Re-quantize DeepSeek model weights to support DeepGEMM new input format (#7156) | 2025-06-13 15:57:45 -07:00
c49c1d9226 | fzyzcjy | Remove 200us slow concat kernel (part 2: srt) (#7020) | 2025-06-13 15:19:31 -07:00
2f4ec752bc | pansicheng | filter by num_hidden_layers (#7056) | 2025-06-13 00:53:09 -07:00
f6ebba537a | fzyzcjy | Support both approximate and exact expert distribution collection (#6964) | 2025-06-09 20:56:17 -07:00
de1350ea20 | fzyzcjy | Minor remove one kernel for DeepSeek (#6977) | 2025-06-08 17:41:35 -07:00
3712abfaf9 | Xiaoyu Zhang | Fuse routed scaling factor in deepseek (#6970) | 2025-06-08 15:24:24 -07:00
1fb76ebb93 | Yineng Zhang | Revert "Fuse routed scaling factor in topk_reduce kernel (#6220)" (#6968) | 2025-06-07 21:02:49 -07:00
c2c4f57f63 | Pavani Majety | [DeepseekR1-FP4] Add Support for nvidia/DeepSeekR1-FP4 model (#6853) | 2025-06-07 17:24:35 -07:00
             Signed-off-by: Pavani Majety <pmajety@nvidia.com>
515ef4facb | Xiaoyu Zhang | Fuse routed scaling factor in topk_reduce kernel (#6220) | 2025-06-07 11:06:50 -07:00
22fe787852 | JieXin Liang | [sgl-kernel] update deepgemm (#6942) | 2025-06-06 23:24:41 -07:00
f8eaaab817 | miter | [fix] logical_to_all_physical_map index 256 is out of bounds in EP parallel. (#6767) | 2025-06-06 21:32:33 -07:00
             Signed-off-by: miter <miterv@outlook.com>
b819381fec | HAI | AITER backend extension and workload optimizations (#6838) | 2025-06-05 23:00:18 -07:00
             Co-authored-by: wunhuang <wunhuang@amd.com>
             Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
81964328b7 | Cheng Wan | Set num_fused_shared_experts as num_shared_experts when shared_experts fusion is not disabled (#6736) | 2025-06-04 15:53:22 -07:00
8a5480528d | Cheng Wan | [Refactor] Rename n_share_experts_fusion as num_fused_shared_experts (#6735) | 2025-06-03 17:48:24 -07:00
0ea330ca34 | fzyzcjy | Fix wrong weight reference in dynamic EPLB (#6818) | 2025-06-02 23:26:04 -07:00
69dd878b51 | Li Hui | Fix shared experts fusion error (#6289) | 2025-05-30 01:16:11 -07:00
51cdd81f97 | Zilin Zhu | [fix][RL] Fix DeepSeekV3ForCausalLM.post_load_weights for multiple update weight (#6265) | 2025-05-29 16:28:10 -07:00
31589e177e | fzyzcjy | Speed up when having padding tokens two-batch overlap (#6668) | 2025-05-28 16:00:58 -07:00
             Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
541a985f85 | fzyzcjy | Fuse routed_scaling_factor in DeepSeek (#6710) | 2025-05-28 15:53:37 -07:00
183d9f969c | HAI | DeepSeek: enable none block-quant FP8 quantizations (#6638) | 2025-05-27 09:06:40 -07:00
32cd707002 | fzyzcjy | Support TP in attention for two batch overlap (#6634) | 2025-05-26 20:28:12 -07:00
0ca3e56802 | fzyzcjy | Tiny fix missing expert location dispatch info (#6620) | 2025-05-26 08:58:31 -07:00
65f091310c | Yi Zhang | refactor qwen moe code, use communicator to support tp+dp (#6581) | 2025-05-25 23:01:10 -07:00
0d47788025 | fzyzcjy | Support overlapping two batches (#4068) | 2025-05-24 17:39:07 -07:00
b2388433be | fzyzcjy | Add back DeepSeek non-TBO branches (#6578) | 2025-05-24 17:34:00 -07:00
a38376fa99 | fzyzcjy | Refactor attention into multiple stages (#6477) | 2025-05-24 17:33:25 -07:00
fc992a09f9 | fzyzcjy | Support updating expert locations dynamically (#6388) | 2025-05-21 21:59:33 -07:00
d4c038daed | Baizhou Zhang | [Fix]Fix capture fail bug for DeepSeek (#6275) | 2025-05-21 11:11:20 -07:00
ccfe5c009d | fzyzcjy | Support redundant experts in expert parallel (#6461) | 2025-05-21 02:05:53 -07:00
d6e1d28c8a | fzyzcjy | Refactor DeepSeek attention dispatching (#6476) | 2025-05-21 02:03:39 -07:00
03886917bd | Lianmin Zheng | Disable all two stream overlap on amd (#6475) | 2025-05-20 19:06:59 -07:00
13feffd082 | fzyzcjy | Fix master CI for DeepSeek (#6447) | 2025-05-20 00:31:42 -07:00
e98afbe042 | fzyzcjy | Support dispatching logical to physical experts (#6385) | 2025-05-19 22:13:55 -07:00
6317c5c61f | HAI | Address performance regression: disable multiple streams on ROCm (#6412) | 2025-05-19 21:16:20 -07:00
d0443275f0 | fzyzcjy | Refactor DeepSeek logic into atomic operations (#6326) | 2025-05-19 21:05:30 -07:00
1b19df4b2a | fzyzcjy | Refactor communication logic of DeepSeek for extensibility and understandability (#6321) | 2025-05-19 20:14:48 -07:00