[Feature] Support Flashinfer fp8 blockwise GEMM kernel on Blackwell (#6479)

Baizhou Zhang
2025-05-28 16:03:43 -07:00
committed by GitHub
parent 31589e177e
commit 791b3bfabb
3 changed files with 26 additions and 4 deletions

@@ -57,6 +57,10 @@ SGLang supports various environment variables that can be used to configure its
| `SGLANG_INT4_WEIGHT` | Enable INT4 weight quantization | `false` |
| `SGLANG_MOE_PADDING` | Enable MoE padding (sets padding size to 128 if value is `1`, often set to `1` in Docker builds) | `0` |
| `SGLANG_FORCE_FP8_MARLIN` | Force using FP8 MARLIN kernels even if other FP8 kernels are available | `false` |
| `SGLANG_ENABLE_FLASHINFER_GEMM` | Use FlashInfer kernels when running blockwise FP8 GEMM on Blackwell GPUs | `false` |
| `SGLANG_SUPPORT_CUTLASS_BLOCK_FP8` | Use CUTLASS kernels when running blockwise FP8 GEMM on Hopper or Blackwell GPUs | `false` |
| `SGLANG_CUTLASS_MOE` | Use CUTLASS FP8 MoE kernel on Blackwell GPUs | `false` |
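
The kernel-selection flags in the table above are plain environment variables, so they can be set from a launcher script before the server process starts. The sketch below is illustrative only: `env_flag` is a hypothetical helper, not SGLang's own parser, and the assumption that `1` and `true` both count as enabled is ours, since the table only documents the `false` defaults.

```python
import os


def env_flag(name: str, default: str = "false") -> bool:
    """Hypothetical helper: read a boolean environment variable.

    Assumes "1" and "true" (case-insensitive) mean enabled; SGLang's
    actual parsing rules may differ.
    """
    return os.environ.get(name, default).strip().lower() in ("1", "true")


# Opt in to the FlashInfer blockwise FP8 GEMM path (default is `false`).
# Set this before the serving process reads its configuration.
os.environ["SGLANG_ENABLE_FLASHINFER_GEMM"] = "1"

# Report which of the three kernel flags from the table are currently on.
for flag in (
    "SGLANG_ENABLE_FLASHINFER_GEMM",
    "SGLANG_SUPPORT_CUTLASS_BLOCK_FP8",
    "SGLANG_CUTLASS_MOE",
):
    print(f"{flag} enabled: {env_flag(flag)}")
```

A check like this can be handy in a launch wrapper to confirm which FP8 kernel path was requested on a given GPU generation before starting the server.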

## Distributed Computing