minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926)
3rdparty/amd/tuning/TUNING.md
...
```

## 2. Torch Tunable Operations

**TunableOp** is a feature in PyTorch that allows for the definition and optimization of custom kernels with tunable parameters. This feature is particularly useful for enhancing the performance of kernels by experimenting with different configurations.

### Key Environment Variables:

1. **PYTORCH_TUNABLEOP_ENABLED**:
   - Default: `0`
   - Set to `1` to enable TunableOp.

2. **PYTORCH_TUNABLEOP_TUNING**:
   - Default: `1`
   - Set to `0` to disable tuning. With tuning enabled and `PYTORCH_TUNABLEOP_ENABLED=1`, if a tuned entry is not found, the tuning step runs and the entry is recorded.

3. **PYTORCH_TUNABLEOP_VERBOSE**:
   - Default: `0`
   - Set to `1` to enable verbose output for TunableOp.

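For illustration, the three variables above can be exported before launching a workload; a minimal sketch (the launch command is a placeholder, not part of this guide):

```shell
# Enable TunableOp with tuning and verbose logging; variable names and
# defaults are as listed above.
export PYTORCH_TUNABLEOP_ENABLED=1   # default 0; 1 turns TunableOp on
export PYTORCH_TUNABLEOP_TUNING=1    # default 1; tune and record missing entries
export PYTORCH_TUNABLEOP_VERBOSE=1   # default 0; 1 prints tuning activity
# then launch the workload as usual, e.g.:
# python your_script.py
```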
The following are suggestions for optimizing matrix multiplication (GEMM) and convolution (conv) operations.

To tune Triton kernels with GEMM and convolution ops (conv), use the `torch.compile` function with the max-autotune mode. This benchmarks a predefined list of Triton configurations and selects the fastest one for each shape.

### Key Configurations:

1. **Max Autotune**:
   - Set `torch._inductor.config.max_autotune = True` or `TORCHINDUCTOR_MAX_AUTOTUNE=1`.

2. **Fine-Grained Control**:
   - Enable GEMM tuning: `torch._inductor.config.max_autotune_gemm = True`.
   - Enable tuning for pointwise/reduction ops: `torch._inductor.config.max_autotune_pointwise = True`.

3. **Backend Selection**:
   - Use `torch._inductor.config.max_autotune_gemm_backends` to limit the backends to `TRITON` for better performance.

4. **Freezing for Inference**:
   - Use `torch._inductor.config.freezing = True` to enable constant-folding optimizations.

5. **Debugging**:
   - Set `TORCH_COMPILE_DEBUG=1` to extract the Triton kernels generated by Inductor.

### Example Code Block:

```
...
TORCHINDUCTOR_FREEZING=1 your_script.sh
```

For more detailed information on tuning SGLang performance with AMD GPUs, please refer to the following link:

[ROCm Documentation: Triton Kernel Performance Optimization](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/workload.html#triton-kernel-performance-optimization)