JieXin Liang | ab1a4fa5cb | [fix] fix cutlass_mla_backend with cuda_graph and add sm_scale for sgl-kernel cutlass_mla (#7184) | 2025-06-14 12:45:41 -07:00
fzyzcjy | aa46ed34d2 | Remove 200us slow concat kernel (part 1: kernel) (#7145) | 2025-06-13 01:58:29 -07:00
JieXin Liang | 18efb5e8e0 | [perf][sgl-kernel] extend cutlass_mla_decode to support num_head < 128 (#6929) | 2025-06-08 19:37:34 -07:00
Trevor Morris | 84810da4ae | Add Cutlass MLA attention backend (#5390) | 2025-04-27 20:58:53 -07:00
Trevor Morris | e8f62b20ca | BLackwell cutlass mla: Add check for bad page size/block num combinations (#5431) | 2025-04-15 14:07:42 -07:00
DefTruth | 388e15c0db | kernel: support slightly faster merge_state_v2 cuda kernel (#5381) | 2025-04-14 21:28:23 -07:00
Yineng Zhang | b62e7e99b8 | feat: adapt merge_state (#5337) | 2025-04-12 21:14:04 -07:00
Trevor Morris | f65b8d5c89 | Blackwell Cutlass MLA kernel (#5142) | 2025-04-11 22:16:51 -07:00
Yineng Zhang | 31dfff7da7 | use default for torch.ops (#4835) | 2025-03-27 19:09:58 -07:00
Lianmin Zheng | 8abf74e3c9 | Rename files in sgl kernel to avoid nested folder structure (#4213) Co-authored-by: zhyncs <me@zhyncs.com> | 2025-03-08 22:54:51 -08:00