Move deep gemm related arguments to sglang.srt.environ (#11547)

This commit is contained in:
Liangsheng Yin
2025-10-14 00:34:35 +08:00
committed by GitHub
parent bfadb5ea5f
commit acc2327bbd
20 changed files with 187 additions and 189 deletions

View File

@@ -144,7 +144,7 @@ With data parallelism attention enabled, we have achieved up to **1.9x** decodin
- **DeepGEMM**: The [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM) kernel library optimized for FP8 matrix multiplications.
**Usage**: The activation and weight optimization above are turned on by default for DeepSeek V3 models. DeepGEMM is enabled by default on NVIDIA Hopper GPUs and disabled by default on other devices. DeepGEMM can also be manually turned off by setting the environment variable `SGL_ENABLE_JIT_DEEPGEMM=0`.
**Usage**: The activation and weight optimization above are turned on by default for DeepSeek V3 models. DeepGEMM is enabled by default on NVIDIA Hopper GPUs and disabled by default on other devices. DeepGEMM can also be manually turned off by setting the environment variable `SGLANG_ENABLE_JIT_DEEPGEMM=0`.
Before serving the DeepSeek model, precompile the DeepGEMM kernels using:
```bash

View File

@@ -32,9 +32,9 @@ SGLang supports various environment variables that can be used to configure its
| Environment Variable | Description | Default Value |
| --- | --- | --- |
| `SGL_ENABLE_JIT_DEEPGEMM` | Enable Just-In-Time compilation of DeepGEMM kernels | `"true"` |
| `SGL_JIT_DEEPGEMM_PRECOMPILE` | Enable precompilation of DeepGEMM kernels | `"true"` |
| `SGL_JIT_DEEPGEMM_COMPILE_WORKERS` | Number of workers for parallel DeepGEMM kernel compilation | `4` |
| `SGLANG_ENABLE_JIT_DEEPGEMM` | Enable Just-In-Time compilation of DeepGEMM kernels | `"true"` |
| `SGLANG_JIT_DEEPGEMM_PRECOMPILE` | Enable precompilation of DeepGEMM kernels | `"true"` |
| `SGLANG_JIT_DEEPGEMM_COMPILE_WORKERS` | Number of workers for parallel DeepGEMM kernel compilation | `4` |
| `SGL_IN_DEEPGEMM_PRECOMPILE_STAGE` | Indicator flag used during the DeepGEMM precompile script | `"false"` |
| `SGL_DG_CACHE_DIR` | Directory for caching compiled DeepGEMM kernels | `~/.cache/deep_gemm` |
| `SGL_DG_USE_NVRTC` | Use NVRTC (instead of Triton) for JIT compilation (Experimental) | `"0"` |

View File

@@ -80,7 +80,7 @@ spec:
value: "true"
- name: SGLANG_MOONCAKE_TRANS_THREAD
value: "16"
- name: SGL_ENABLE_JIT_DEEPGEMM
- name: SGLANG_ENABLE_JIT_DEEPGEMM
value: "1"
- name: NCCL_IB_HCA
value: ^=mlx5_0,mlx5_5,mlx5_6
@@ -217,7 +217,7 @@ spec:
value: "5"
- name: SGLANG_MOONCAKE_TRANS_THREAD
value: "16"
- name: SGL_ENABLE_JIT_DEEPGEMM
- name: SGLANG_ENABLE_JIT_DEEPGEMM
value: "1"
- name: NCCL_IB_HCA
value: ^=mlx5_0,mlx5_5,mlx5_6

View File

@@ -71,7 +71,7 @@ spec:
value: "1"
- name: SGLANG_SET_CPU_AFFINITY
value: "true"
- name: SGL_ENABLE_JIT_DEEPGEMM
- name: SGLANG_ENABLE_JIT_DEEPGEMM
value: "1"
- name: NCCL_IB_QPS_PER_CONNECTION
value: "8"
@@ -224,7 +224,7 @@ spec:
value: "0"
- name: SGLANG_MOONCAKE_TRANS_THREAD
value: "8"
- name: SGL_ENABLE_JIT_DEEPGEMM
- name: SGLANG_ENABLE_JIT_DEEPGEMM
value: "1"
- name: SGL_CHUNKED_PREFIX_CACHE_THRESHOLD
value: "0"

View File

@@ -98,7 +98,7 @@ spec:
value: "1"
- name: SGLANG_SET_CPU_AFFINITY
value: "true"
- name: SGL_ENABLE_JIT_DEEPGEMM
- name: SGLANG_ENABLE_JIT_DEEPGEMM
value: "1"
- name: NCCL_IB_QPS_PER_CONNECTION
value: "8"
@@ -257,7 +257,7 @@ spec:
value: "0"
- name: SGLANG_MOONCAKE_TRANS_THREAD
value: "8"
- name: SGL_ENABLE_JIT_DEEPGEMM
- name: SGLANG_ENABLE_JIT_DEEPGEMM
value: "1"
- name: SGL_CHUNKED_PREFIX_CACHE_THRESHOLD
value: "0"
@@ -421,7 +421,7 @@ spec:
value: "true"
- name: SGLANG_MOONCAKE_TRANS_THREAD
value: "16"
- name: SGL_ENABLE_JIT_DEEPGEMM
- name: SGLANG_ENABLE_JIT_DEEPGEMM
value: "1"
- name: NCCL_IB_HCA
value: ^=mlx5_0,mlx5_5,mlx5_6
@@ -560,7 +560,7 @@ spec:
value: "5"
- name: SGLANG_MOONCAKE_TRANS_THREAD
value: "16"
- name: SGL_ENABLE_JIT_DEEPGEMM
- name: SGLANG_ENABLE_JIT_DEEPGEMM
value: "1"
- name: NCCL_IB_HCA
value: ^=mlx5_0,mlx5_5,mlx5_6