| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Stefan He | 95085d65e9 | [Refactor] Reducing code duplication across FP8 CUDA quantization kernels (#4163) | 2025-03-06 22:58:52 -08:00 |
| Stefan He | 63ee26d162 | Add sgl_per_token_quant_fp8 (#4089) | 2025-03-06 20:53:05 -08:00 |
| Xiaoyu Zhang | ad55f17182 | [quant kernel] sgl-kernel support per_tensor_quant fp8 (#3786) | 2025-03-06 18:05:43 -08:00 |
| Xiaoyu Zhang | 55a7ec388f | use warp shuffle style reduce and flashinfer vectorize (#3628) | 2025-02-19 20:53:51 +08:00 |
| Baizhou Zhang | 67fc595bb8 | [Feature] Apply Cublas Grouped Gemm kernel (#3629) | 2025-02-18 15:18:31 +08:00 |
| yizhang2077 | 640363ad20 | support blockwise fp8 matmul kernel (#3267) | 2025-02-13 01:49:33 +08:00 |
| Xiaoyu Zhang | bb418ced80 | optimize per token group quant fp8 (#3490) | 2025-02-11 22:19:05 +08:00 |
| Xiaoyu Zhang | 81262c7b72 | clean up useless file (#3192) | 2025-01-28 14:29:30 +08:00 |
| HandH1998 | 82392da830 | support w8a8 fp8 kernel with CUTLASS (#3047) (Co-authored-by: yych0745 <1398089567@qq.com>) | 2025-01-26 15:46:51 +08:00 |
| Ke Bao | 7bad7e75bf | Add shapes for int8 gemm benchmark (#3093) | 2025-01-24 12:27:30 +08:00 |
| Xiaoyu Zhang | ac2dc35d0e | support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030) | 2025-01-23 15:29:20 +08:00 |
| Yineng Zhang | b7f3fec13c | minor: rename bench for sgl kernel (#2909) | 2025-01-16 05:55:43 +08:00 |
| Xiaoyu Zhang | d08c77c434 | Sampling penalties memory interface (#2870) | 2025-01-13 23:09:00 +08:00 |
| Ke Bao | 0f3eb1d294 | Support cutlass Int8 gemm (#2752) | 2025-01-06 22:51:22 +08:00 |