[sgl-kernel performance] fix fp8 quant kernels' __nv_fp8_e4m3 dispatch bug to improve performance by 10%-20% (#8499)
Co-authored-by: Ke Bao <ispobaoke@gmail.com>
@@ -1,5 +1,4 @@
#include <ATen/cuda/CUDAContext.h>
#include <c10/util/Float8_e4m3fn.h>
#include <cuda_fp8.h>

#include <cmath>