20 Commits

Author SHA1 Message Date
Hubert Lu
711390a971 [AMD] Support Hierarchical Caching on AMD GPUs (#8236) 2025-08-28 15:27:07 -07:00
Lianmin Zheng
ecc9f3e47a [Minor] Fix the style of sgl-kernel (#9332) 2025-08-18 23:45:00 -07:00
Hubert Lu
c6c379ab31 [AMD] Reorganize hip-related header files in sgl-kernel (#9320) 2025-08-18 16:53:44 -07:00
Hubert Lu
9c3e95d98b [AMD] Expand test coverage for AMD CI and enable apply_token_bitmask_inplace_cuda in sgl-kernel (#8268) 2025-08-15 12:32:51 -07:00
Lianmin Zheng
9e426466af Clean up allocators (#9134)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-13 13:56:04 -07:00
Hubert Lu
af4b9bae95 [AMD] Add silu_and_mul, gelu_and_mul, gelu_tanh_and_mul, and gelu_quick kernels for AMD GPUs (#7135)
Co-authored-by: yiakwy-xpu-ml-framework-team <961186938@qq.com>
Co-authored-by: HAI <hixiao@gmail.com>
2025-07-24 23:44:28 -07:00
li haoyang
28d4d47280 [Feature] Integrate quick allreduce and select the best allreduce implementation (#6619)
Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-07-24 20:48:42 -07:00
kk
8aa68ed5c4 Solve docker build failed in the virtual machine (#7290)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>
Co-authored-by: HAI <hixiao@gmail.com>
2025-06-23 09:10:30 +00:00
sogalin
4b9971e401 Add gfx950 support for sgl-kernel. (#7092)
Co-authored-by: HAI <hixiao@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-06-12 11:07:48 -07:00
Johnny
2c7dbb7cc2 [FEATURE] Enhance platform compatibility for ARM (#5746) 2025-04-29 15:06:16 -07:00
HAI
d050df368c ROCm sgl-kernel: compatible to later torch (#5167) 2025-04-10 09:18:36 -07:00
Yineng Zhang
8bf6d7f406 support cmake for sgl-kernel (#4706)
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
2025-03-27 01:42:28 -07:00
Alex Sun
af6535e7aa [ROCm] Enable MTP (NextN) on AMD GPU (#4631) 2025-03-23 22:58:05 -07:00
yiakwy-xpu-ml-framework-team
9b8333d992 [ROCm] enable moe topk softmax in amd (#4448) 2025-03-16 18:16:55 -07:00
yigex
690e1f2371 [AMD] Fix rocm sgl-kernel missing modules error (#4311)
Co-authored-by: yiakwy-xpu-ml-framework-team <leiwang2@amd.com>
2025-03-11 10:35:28 -07:00
Lianmin Zheng
8abf74e3c9 Rename files in sgl kernel to avoid nested folder structure (#4213)
Co-authored-by: zhyncs <me@zhyncs.com>
2025-03-08 22:54:51 -08:00
Liu Jinjie
0804dd11a0 remove unused max_jobs in setup_rocm.py (#4126)
Signed-off-by: Jinjie Liu <jinjie.liu@usc.edu>
2025-03-06 00:12:19 -08:00
Lianmin Zheng
6b45a21d16 Reorganize c++ source files in sgl-kernel with multiple folders (#4025) 2025-03-03 05:32:30 -08:00
Hubert Lu
9cf4077294 Enable custom AR for AMD GPUs and maintain it in sgl-kernel (#3406) 2025-03-02 15:19:06 -08:00
HAI
2c1a695ff1 ROCm: sgl-kernel enablement starting with sgl_moe_align_block (#3287) 2025-02-04 21:44:44 +08:00