[sgl-kernel performance] fix fp8 quant kernels dispatch __nv_fp8_e4m3 bug to improve performance by 10%-20% (#8499)

Co-authored-by: Ke Bao <ispobaoke@gmail.com>
Xiaoyu Zhang
2025-07-29 23:31:54 +08:00
committed by GitHub
parent 813670660c
commit 7a4309cc8a
3 changed files with 21 additions and 23 deletions

@@ -1,5 +1,4 @@
#include <ATen/cuda/CUDAContext.h>
#include <c10/util/Float8_e4m3fn.h>
#include <cuda_fp8.h>
#include <cmath>