[Feature] Support Flashinfer fp8 blockwise GEMM kernel on Blackwell (#6479)
@@ -57,6 +57,10 @@ SGLang supports various environment variables that can be used to configure its
| `SGLANG_INT4_WEIGHT` | Enable INT4 weight quantization | `false` |
| `SGLANG_MOE_PADDING` | Enable MoE padding (sets the padding size to 128 if the value is `1`; often set to `1` in Docker builds) | `0` |
| `SGLANG_FORCE_FP8_MARLIN` | Force the use of FP8 Marlin kernels even when other FP8 kernels are available | `false` |
| `SGLANG_ENABLE_FLASHINFER_GEMM` | Use FlashInfer kernels for blockwise FP8 GEMM on Blackwell GPUs | `false` |
| `SGLANG_SUPPORT_CUTLASS_BLOCK_FP8` | Use CUTLASS kernels for blockwise FP8 GEMM on Hopper or Blackwell GPUs | `false` |
| `SGLANG_CUTLASS_MOE` | Use the CUTLASS FP8 MoE kernel on Blackwell GPUs | `false` |
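
A minimal Python sketch of how the boolean flags and `SGLANG_MOE_PADDING` in this table might be read. It assumes that `1` or `true` (case-insensitive) enables a flag; that parsing rule, and the helpers `flag_enabled` and `moe_padding_size`, are hypothetical and are not SGLang's own functions. The helpers take an explicit mapping so an example environment can be checked without modifying the real process environment.

```python
import os
from typing import Mapping


def flag_enabled(name: str, env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: treat "1" or "true" (any case) as enabled.

    This parsing rule is an assumption, not documented SGLang behavior.
    Unset variables fall back to the table's default of `false`.
    """
    return env.get(name, "false").strip().lower() in ("1", "true")


def moe_padding_size(env: Mapping[str, str] = os.environ) -> int:
    """Hypothetical helper: per the table, padding size is 128 when
    SGLANG_MOE_PADDING is `1`; 0 here stands for "padding disabled"."""
    return 128 if env.get("SGLANG_MOE_PADDING", "0") == "1" else 0


if __name__ == "__main__":
    # Example environment: FlashInfer blockwise FP8 GEMM on a Blackwell GPU,
    # with MoE padding turned on.
    example_env = {"SGLANG_ENABLE_FLASHINFER_GEMM": "true", "SGLANG_MOE_PADDING": "1"}
    for var in (
        "SGLANG_ENABLE_FLASHINFER_GEMM",
        "SGLANG_SUPPORT_CUTLASS_BLOCK_FP8",
        "SGLANG_CUTLASS_MOE",
    ):
        print(var, flag_enabled(var, example_env))
    print("MoE padding:", moe_padding_size(example_env))
```

In practice the variables would be exported in the shell before starting the server (for example `SGLANG_ENABLE_FLASHINFER_GEMM=true`); the mapping argument only exists to keep the sketch side-effect free.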
## Distributed Computing