fzyzcjy | c6d549e773 | Multiple tiny code cleanups (#4608) | 2025-03-22 22:39:11 -07:00
xutizhou | c2bd094d6e | Optimize Permute Kernel in DeepEP (#4643) | 2025-03-22 14:30:34 -07:00
    Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Jinyan Chen | f44db16c8e | [Feature] Integrate DeepEP into SGLang (#4232) | 2025-03-19 08:16:31 -07:00
    Co-authored-by: Cheng Wan <cwan39@gatech.edu>
    Co-authored-by: Xuting Zhou <xutingz@nvidia.com>
Ke Bao | 3ded4b215d | Revert "feat: update grouped_topk to support softmax and sigmoid" (#4505) | 2025-03-17 11:30:26 -07:00
Wenbo Yang | 75b656488a | Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. (#4418) | 2025-03-17 00:03:43 -07:00
Mick | 0f52fb55ec | config: Update fused moe config (#4493) | 2025-03-16 23:51:58 -07:00
Mick | 8ec2ce0726 | perf: update fused moe config (#4459) | 2025-03-15 21:23:57 -07:00
Yineng Zhang | ad1ae7f7cd | use topk_softmax with sgl-kernel (#4439) | 2025-03-14 15:59:06 -07:00
Yineng Zhang | 977d7cd26a | cleanup deps 1/n (#4400) | 2025-03-14 00:00:33 -07:00
    Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Lianmin Zheng | c6d7f8d370 | Add some fused elementwise kernels for grok-1 (#4398) | 2025-03-13 13:39:10 -07:00
    Co-authored-by: dhou-xai <dhou@x.ai>
    Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>
Cheng Wan | 2f6bacee03 | [moe] fix: correct the cache size in the last chunk (#3679) | 2025-03-12 22:22:13 -07:00
    Co-authored-by: Abatom <abzhonghua@gmail.com>
Stefan He | e0917e6bd0 | Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215) | 2025-03-12 00:08:03 -07:00
    Co-authored-by: Stefan He <bhe@linkedin.com>
lambert0312 | 7140ba3573 | Add A800 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4323) | 2025-03-11 18:25:56 -07:00
Yineng Zhang | d1da58e275 | unify is_cuda and is_hip (#4321) | 2025-03-11 18:12:56 -07:00
Ximingwang-09 | 0f2a2e3c19 | Add H20 tuning configs support DeepSeek V3/R1 INT8(block-wise) (#4220) | 2025-03-11 12:32:33 -07:00
    Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
yych0745 | 6a02b32d07 | Add A100 tuning configs for DeepSeek R1/V3 channel-wise INT8 (#4287) | 2025-03-11 00:49:06 -07:00
    Co-authored-by: HandH1998 <1335248067@qq.com>
lambert0312 | d3ecd63204 | Add A800 tuning configs support DeepSeek V3/R1 BF16 and INT8(block-wise) (#4136) | 2025-03-11 00:32:25 -07:00
Lianmin Zheng | 00d25a7f5e | Fix quantization and nightly tests (#4258) | 2025-03-10 03:06:21 -07:00
HandH1998 | c7f254468f | [Feature] DeepSeek V3/R1 INT8 Quantization (channel-wise) (#3888) | 2025-03-06 20:54:52 -08:00
    Co-authored-by: yych0745 <1398089567@qq.com>
    Co-authored-by: sleepcoo <sleepcoo@gmail.com>
    Co-authored-by: b0urnee <2769086541@qq.com>
HAI | 13bc39c5d6 | ROCm: enable trillion-parameter MoE models with INT4-FP8 single node (#4152) | 2025-03-06 15:33:02 -08:00
HAI | 71ab0dabe0 | Fix the moe padding conditional logic (#4081) | 2025-03-05 10:56:51 -08:00
HAI | 51d25405a7 | ROCm: update aiter and its usage to fused moe (bloat16, fp8, fp8 block-quant) (#4053) | 2025-03-04 03:00:46 -08:00
kk | 11eea69e70 | Fix assert options.num_stages != 0 error in the latest ROCm build image (#4049) | 2025-03-03 20:37:03 -08:00
    Co-authored-by: wunhuang <wunhuang@amd.com>
Lianmin Zheng | 66301e124f | Improve code styles (#4021) | 2025-03-03 03:20:23 -08:00
Lianmin Zheng | ac2387279e | Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) | 2025-03-03 00:12:04 -08:00
    Co-authored-by: SangBin Cho <rkooo567@gmail.com>
    Co-authored-by: dhou-xai <dhou@x.ai>
    Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
laixin | b0df5d240b | Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922) | 2025-02-27 10:59:46 +00:00
    Co-authored-by: sleepcoo <sleepcoo@gmail.com>
lukec | 21463e321a | Expert Parallelism (EP) Support for DeepSeek V3/R1 (#3602) | 2025-02-26 02:29:37 -08:00
    Co-authored-by: laixin <xielx@shanghaitech.edu.cn>
    Co-authored-by: HandH1998 <1335248067@qq.com>
    Co-authored-by: laixin <q865809639@gmail.com>
laixin | 1a6e97577a | Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3730) | 2025-02-24 05:43:35 -08:00
    Co-authored-by: HandH1998 <1335248067@qq.com>
zixuanzhang226 | 0c227ee373 | feat: update grouped_topk to support softmax and sigmoid (#3680) | 2025-02-21 16:30:15 +08:00
HAI | 5c54ef0352 | AMD/ROCm: update AITER repo to ROCm/aiter (#3747) | 2025-02-21 00:18:08 -08:00
chenxiaobing | d5d80ab477 | [Bugfix] Fix scores mask for moe topk (#3705) | 2025-02-21 02:17:23 +08:00
Cheng Wan | 6b0aeb58fd | [moe] optim: reduce memory consumption in fused_moe (#3692) | 2025-02-20 02:25:05 +08:00
yigex | ddf39d3fce | [ROCm] Optimal MOE Tuning for AMD Radeon Graphics (#3567) | 2025-02-17 17:54:10 -08:00
Xiaoyu Zhang | c38f3aed24 | support multi-gpu block-gemm tuning (#3639) | 2025-02-18 00:00:35 +08:00
Wen-Heng (Jack) Chung | 871a4aa1bf | [ROCm] Add ROCm tuning configs for AMD Instinct MI325X. (#3536) | 2025-02-12 20:09:36 -08:00
Liangsheng Yin | 8616357a97 | Fix deepseek awq v3 (#3450) | 2025-02-12 22:09:52 +08:00
Xiaoyu Zhang | 45e3a7bc41 | use sgl_per_token_group_quant_fp8 kernel (#3493) | 2025-02-12 18:40:42 +08:00
Wen-Heng (Jack) Chung | cadd5dbe6a | Tune MI300X fused MoE Triton kernel JSON config. (#3492) | 2025-02-11 10:27:25 -08:00
yigex | fdf04a1426 | [ROCm] Add ROCm tuning config to block gemm and Re-tune for AMD Radeon Graphics (#3418) | 2025-02-10 23:55:04 -08:00
    Co-authored-by: Bruce Xue <yigex@xilinx.com>
    Co-authored-by: HAI <hixiao@gmail.com>
Xiaoyu Zhang | 2f47d710ae | refine some typo (#3473) | 2025-02-10 23:35:44 +08:00
Yineng Zhang | 64c8713573 | remove activation dependency in fused_moe (#3433) | 2025-02-10 01:18:57 +08:00
Xiaoyu Zhang | cdae77b03d | optimize moe_align_kernel cuda (#3347) | 2025-02-07 00:53:46 +08:00
Wen-Heng (Jack) Chung | c7256ca836 | [ROCm] Add tuning configs for AMD Radeon Graphics. (#3294) | 2025-02-04 10:34:57 -08:00
HAI | 2c1a695ff1 | ROCm: sgl-kernel enablement starting with sgl_moe_align_block (#3287) | 2025-02-04 21:44:44 +08:00
kushanam | d54cee1441 | adding Triton configs for DeepSeekV3 on Blackwell (#3272) | 2025-02-04 04:12:09 +08:00
Yineng Zhang | 4eb4b401cc | update and simplify CustomOp (#3249) | 2025-02-01 18:56:44 +08:00
Ke Bao | 1ebe1d6de5 | Optimize MoE topk with torch compile (#3236) | 2025-02-01 01:36:50 +08:00
Lianmin Zheng | 53cef81587 | Improve weight loading and code style (#3174) | 2025-01-27 03:00:41 -08:00
yigex | 351a72d40b | add dsv3 mi300 triton config for block scale (#3146) | 2025-01-27 17:25:53 +08:00
Lianmin Zheng | 52c03f16b9 | Add activation parameters to fused_moe (#3170) | 2025-01-27 00:23:37 -08:00