Qi Yuhang | 85ed8e0a5e | Optimize nvfp4 block scaled gemm kernel when M is small. (#10101) | 2025-09-06 22:31:00 -07:00
triple-mu | 444013585d | Fix typos and unify size(s)/stride(s) API calls (#8799) | 2025-08-08 00:18:08 -07:00
Qi Yuhang | 6e92da8fca | [Fix][Ready]Fix register spilling in cutlass nvfp4 gemm kernel on Blackwell (#8127) | 2025-07-17 20:49:36 -07:00
Yuhong Guo | ee47a6c1c3 | [Build] Fix cuda12.8 build error in nvfp4_scaled_mm_kernels.cu (#4953) | 2025-03-31 12:00:34 -07:00
Trevor Morris | e9f8e42318 | Support FP4 gemm (1/2) (#3899) | 2025-03-24 19:50:23 -07:00