[sgl-kernel performance] fix fp8 quant kernels dispatch __nv_fp8_e4m3 bug to improve performance 10%-20% (#8499)

Co-authored-by: Ke Bao <ispobaoke@gmail.com>
2025-07-29 23:31:54 +08:00
parent 813670660c
commit 7a4309cc8a
3 changed files with 21 additions and 23 deletions
--- a/sgl-kernel/csrc/gemm/per_token_group_quant_8bit.cu
+++ b/sgl-kernel/csrc/gemm/per_token_group_quant_8bit.cu
@@ -1,5 +1,4 @@
 #include <ATen/cuda/CUDAContext.h>
-#include <c10/util/Float8_e4m3fn.h>
 #include <cuda_fp8.h>

 #include <cmath>