| Author | Commit | Message | Date |
|---|---|---|---|
| Cheng Wan | a5f5ab4030 | update sgl-kernel for EP: kernel part (#8514); Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>; Co-authored-by: Ke Bao <ispobaoke@gmail.com> | 2025-07-30 22:19:55 -07:00 |
| Hubert Lu | af4b9bae95 | [AMD] Add silu_and_mul, gelu_and_mul, gelu_tanh_and_mul, and gelu_quick kernels for AMD GPUs (#7135); Co-authored-by: yiakwy-xpu-ml-framework-team <961186938@qq.com>; Co-authored-by: HAI <hixiao@gmail.com> | 2025-07-24 23:44:28 -07:00 |
| li haoyang | 28d4d47280 | [Feature] Integrate quick allreduce and select the best allreduce implementation (#6619); Signed-off-by: Haoyang Li <Haoyang.Li@amd.com>; Co-authored-by: ilmarkov <imarkov@redhat.com> | 2025-07-24 20:48:42 -07:00 |
| Lianmin Zheng | 5589b75024 | Add treemask mode to build_eagle_tree & release sgl-kernel 0.2.3 (#7756); Co-authored-by: Pranjal Shankhdhar <pranjal.ssh@gmail.com> | 2025-07-05 12:17:05 -07:00 |
| Yi Zhang | 2998c4bdf4 | [optimize] fuse renormalize into moe_topk_softmax (#7744); Co-authored-by: ispobock <ispobaoke@gmail.com> | 2025-07-03 12:42:44 -07:00 |
| Ke Bao | 57ab776910 | Fuse sorted_token_ids padding to moe_align_block_size kernel (#7437) | 2025-06-24 17:44:27 -07:00 |
| Zhaoyi Li | 3c9740d200 | update variable naming and comments for rocm (#5299) | 2025-04-11 23:15:05 -07:00 |
| Alex Sun | af6535e7aa | [ROCm] Enable MTP (NextN) on AMD GPU (#4631) | 2025-03-23 22:58:05 -07:00 |
| yiakwy-xpu-ml-framework-team | 9b8333d992 | [ROCm] enable moe topk softmax in amd (#4448) | 2025-03-16 18:16:55 -07:00 |
| yigex | 690e1f2371 | [AMD] Fix rocm sgl-kernel missing modules error (#4311); Co-authored-by: yiakwy-xpu-ml-framework-team <leiwang2@amd.com> | 2025-03-11 10:35:28 -07:00 |
| Lianmin Zheng | 8abf74e3c9 | Rename files in sgl kernel to avoid nested folder structure (#4213); Co-authored-by: zhyncs <me@zhyncs.com> | 2025-03-08 22:54:51 -08:00 |