Kaixi Hou
|
5c34b4f1c7
|
[NVIDIA] [2/N] Optimize silu_and_mul_scaled_fp4_grouped_quant perf (#9556)
|
2025-08-29 17:17:03 -07:00 |
|
Lifu Huang
|
6e2da51561
|
Replace time.time() to time.perf_counter() for benchmarking. (#6178)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
|
2025-05-11 14:32:49 -07:00 |
|
Zhaoyi Li
|
c555d794f7
|
Minor update for ROCm variable style (#5562)
|
2025-04-19 23:45:27 -07:00 |
|
laixin
|
b0df5d240b
|
Tuning Script for Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3922)
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
|
2025-02-27 10:59:46 +00:00 |
|
Xiaoyu Zhang
|
c38f3aed24
|
support multi-gpu block-gemm tuning (#3639)
|
2025-02-18 00:00:35 +08:00 |
|
yigex
|
fdf04a1426
|
[ROCm] Add ROCm tuning config to block gemm and Re-tune for AMD Radeon Graphics (#3418)
Co-authored-by: Bruce Xue <yigex@xilinx.com>
Co-authored-by: HAI <hixiao@gmail.com>
|
2025-02-10 23:55:04 -08:00 |
|
Yineng Zhang
|
7b020cca2d
|
add tuning block wise fp8 (#3242)
Co-authored-by: HandH1998 <007aabbcc411@gmail.com>
|
2025-02-01 03:58:18 +08:00 |
|
Ke Bao
|
85b2e05770
|
Add int8 quant kernel (#2848)
|
2025-01-13 13:16:58 +08:00 |
|