| Author | Commit | Message | Date |
|---|---|---|---|
| Yineng Zhang | c1dd773c19 | fix: use fa3 unit test on hopper only (#5304) | 2025-04-11 15:10:49 -07:00 |
| Yineng Zhang | c163bf4ff1 | chore: bump sgl-kernel v0.0.8.post1 (#5289) | 2025-04-11 02:11:53 -07:00 |
| Yineng Zhang | 5598634326 | chore: relax the torch version restriction for sgl-kernel compilation (#5288) | 2025-04-11 02:05:53 -07:00 |
| Yineng Zhang | b75275b6f2 | feat: add cu128 identifier for sgl-kernel (#5287) | 2025-04-11 01:58:46 -07:00 |
| Yineng Zhang | 7074e9ca20 | fix: enable fp4 compilation on cu128 (#5286) | 2025-04-11 01:43:44 -07:00 |
| Elfie Guo | a222945df2 | Update Makefile / build script to avoid installing incompatible torch dependency (#5245) | 2025-04-10 22:21:02 +00:00 |
| PGFLMG | ed01b4515e | [Misc] Clean sgl-kernel test (#5216) | 2025-04-10 11:28:41 -07:00 |
| HAI | d050df368c | ROCm sgl-kernel: compatible to later torch (#5167) | 2025-04-10 09:18:36 -07:00 |
| Richard Zou | 76f44c2a8d | Fix deepseek-v3 with torch.compile in PyTorch 2.6. (#5213) | 2025-04-10 09:14:38 -07:00 |
| Xiaoyu Zhang | f730362ee2 | reduce moe_align_block_size_kernel small batch mode overhead (#5086) | 2025-04-09 17:59:35 -07:00 |
| Yi Zhang | ebf495f013 | sgl-kernel use cutlass latest version for fp8 blockwise gemm (#5207) | 2025-04-09 11:47:04 -07:00 |
| yinfan98 | d2e507df3c | [Misc] clean up vllm in sgl-kernel test (#5189) | 2025-04-09 01:22:13 -07:00 |
| Trevor Morris | 11d760d56a | FP4 weight loading and inference (2/2) (#3972) | 2025-04-08 17:26:21 -07:00 |
| Ma Mingfei | a73c4df438 | Add optimized native kernels in sgl-kernel (#5150); Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com>; Co-authored-by: YanbingJiang <yanbing.jiang@intel.com>; Co-authored-by: blzheng <beilei.zheng@intel.com> | 2025-04-08 09:37:46 -07:00 |
| yinfan98 | 9798e72baa | [Misc] Use pytest.mark.skipif in sgl-kernel test (#5137) | 2025-04-07 21:35:14 -07:00 |
| Yineng Zhang | 496dde8491 | bump sgl-kernel 0.0.8 (#5089) | 2025-04-05 14:28:04 -07:00 |
| Yi Zhang | bcbbf519f9 | sgl-kernel transfer custom allreduce from trt kernel to vllm kernel (#5079) | 2025-04-05 14:23:20 -07:00 |
| Yineng Zhang | 3f287b8579 | support sgl-kernel on blackwell (#5074) | 2025-04-04 16:59:32 -07:00 |
| Xiaoyu Zhang | 924ca7c92c | Add DeepSeek V3/R1 shared experts fusion (#4918) | 2025-04-04 01:59:29 -07:00 |
| Yineng Zhang | d7954b7682 | bump sgl-kernel v0.0.7 (#5046) | 2025-04-03 13:38:13 -07:00 |
| yinfan98 | b8b6008f47 | [Fix] fix fa3 build at cu118 (#5036) | 2025-04-03 11:52:35 -07:00 |
| Zhiqiang Xie | 9d0b36c47a | fix deepgemm as well (#5030) | 2025-04-03 02:41:37 -07:00 |
| Yuhong Guo | 7d8c0ce7ce | [Build] Support build sgl-kernel with ccache (#5020) | 2025-04-03 00:22:37 -07:00 |
| Zhiqiang Xie | a2aea59b6e | update cutlass tag (#5011) | 2025-04-02 18:30:30 -07:00 |
| Xiaoyu Zhang | 2c8fd99363 | [sgl-kernel] per token group quant support COLUMN MAJOR (#4817) | 2025-04-02 18:29:59 -07:00 |
| Yuhong Guo | ee47a6c1c3 | [Build] Fix cuda12.8 build error in nvfp4_scaled_mm_kernels.cu (#4953) | 2025-03-31 12:00:34 -07:00 |
| Yineng Zhang | 6384d31776 | bump sgl-kernel v0.0.6 (#4950) | 2025-03-31 11:24:09 -07:00 |
| yinfan98 | c7457191a0 | [Fix] revert clean m.def for cudagraph (#4944) | 2025-03-31 02:08:55 -07:00 |
| Yineng Zhang | 4814ecaff9 | cleanup sgl-kernel (#4933) | 2025-03-30 14:12:30 -07:00 |
| yinfan98 | 37c66ec856 | [feat] add fa3 in sgl-kernel (#4902); Co-authored-by: Sleepcoo <Sleepcoo@gmail.com> | 2025-03-30 12:57:10 -07:00 |
| Yineng Zhang | 195a09f57c | fix bmm fp8 (#4926) | 2025-03-30 12:15:20 -07:00 |
| Adarsh Shirawalmath | 9fccda3111 | [Feature] use pytest for sgl-kernel (#4896) | 2025-03-30 10:36:52 -07:00 |
| Yi Zhang | 5ec5eaf760 | fix allreduce test (#4909) | 2025-03-29 23:16:53 -07:00 |
| yinfan98 | 0d7fe866f9 | [Misc] Clean m.def and add Development Tips (#4890) | 2025-03-29 23:06:18 -07:00 |
| Yineng Zhang | 54b9a2de0a | remove setup for sgl-kernel (#4899) | 2025-03-29 12:47:38 -07:00 |
| yinfan98 | 8e7b31546c | quick fix: add default for new kernel (#4898) | 2025-03-29 12:31:59 -07:00 |
| Qingquan Song | 45dcfc2e76 | Add deepseek style fused moe group gate selection kernel (#4530) | 2025-03-29 11:51:45 -07:00 |
| yinfan98 | ddf8981d91 | Delete test_deep_gemm.py (#4891) | 2025-03-29 10:46:11 -07:00 |
| yinfan98 | 05625b9792 | [Docs] Update DeepGEMM at README.md (#4886) | 2025-03-29 09:53:39 -07:00 |
| Yineng Zhang | ec3ee0289d | fix sgl-kernel cu118 build (#4872) | 2025-03-28 17:23:51 -07:00 |
| Yineng Zhang | 92941ce7b5 | bump sgl-kernel 0.0.5.post4 (#4768) | 2025-03-28 14:40:53 -07:00 |
| Yineng Zhang | 2bb0e7cf43 | fix sampling issue (#4871) | 2025-03-28 14:07:21 -07:00 |
| yinfan98 | 4db29e82ec | [Feat] support deepgemm for cmake (#4864) | 2025-03-28 10:51:44 -07:00 |
| Yineng Zhang | 6dea5c96bf | Revert "get the python version from env (#4729)" (#4863) | 2025-03-28 08:07:48 -07:00 |
| DavidChan | 5eae67cb1f | get the python version from env (#4729) | 2025-03-27 22:26:42 -07:00 |
| Yineng Zhang | 31dfff7da7 | use default for torch.ops (#4835) | 2025-03-27 19:09:58 -07:00 |
| Yineng Zhang | 8bf6d7f406 | support cmake for sgl-kernel (#4706); Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>; Co-authored-by: yinfan98 <1106310035@qq.com> | 2025-03-27 01:42:28 -07:00 |
| Yi Pan | 45fdf1f7f3 | Fix shared memory OOM on sm86 GPUs. (#4797) | 2025-03-26 10:41:53 -07:00 |
| Trevor Morris | e9f8e42318 | Support FP4 gemm (1/2) (#3899) | 2025-03-24 19:50:23 -07:00 |
| Chunan Zeng | 65c24c28f9 | [Quant Kernel] refactored per token group quant fp8 to support int8 up-to 2x faster (#4396) | 2025-03-23 23:44:17 -07:00 |