sglangv0.5.2 & support Qwen3-Next-80B-A3B-Instruct
This commit is contained in:
19
benchmark/kernels/deepseek/README.md
Normal file
19
benchmark/kernels/deepseek/README.md
Normal file
@@ -0,0 +1,19 @@
|
||||
## DeepSeek kernels benchmark
|
||||
|
||||
|
||||
### Prerequisites
|
||||
- You should install [DeepGemm](https://github.com/deepseek-ai/DeepGEMM) from source before run `benchmark_deepgemm_fp8_gemm.py` and `benchmark_deepgemm_fp8_group_gemm.py`.
|
||||
|
||||
### Benchmark
|
||||
- `benchmark_deepgemm_fp8_gemm.py`
|
||||
```bash
|
||||
python benchmark_deepgemm_fp8_gemm.py --run_correctness --tp_size 1
|
||||
```
|
||||
|
||||
- `benchmark_deepgemm_fp8_group_gemm.py`
|
||||
```bash
|
||||
python benchmark_deepgemm_fp8_group_gemm.py --run_correctness --tp_size 1
|
||||
```
|
||||
|
||||
- You can use the `--run_correctness` parameter to verify all kernels results's correctness.
|
||||
- You can use the `--tp_size` parameter to benchmark all FP8 w8a8 block-wise matrix multiplications involved in DeepSeek V3/R1 under the current tensor parallelism (TP) setting. This benchmark compares DeepSeek's open-source [DeepGemm](https://github.com/deepseek-ai/DeepGEMM) implementation with SGLang's and VLLM Triton implementation.
|
||||
Reference in New Issue
Block a user