add deepgemm and sglang fp8 block-wise gemm benchmark (#3893)

2025-03-02 15:01:58 +08:00
parent 407e2b923d
commit 90a55e2566
2 changed files with 320 additions and 0 deletions
--- a/benchmark/kernels/deepseek/README.md
+++ b/benchmark/kernels/deepseek/README.md
@@ -0,0 +1,6 @@
+## DeepSeek kernels benchmark
+
+- `benchmark_deepgemm_fp8_gemm.py`
+    - You should install [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM) from source before running `benchmark_deepgemm_fp8_gemm.py`.
+    - You can use the `--run_correctness` parameter to verify the correctness of every kernel's results.
+    - You can use the `--tp_size` parameter to benchmark all FP8 w8a8 block-wise matrix multiplications involved in DeepSeek V3/R1 under the given tensor parallelism (TP) setting. This benchmark compares DeepSeek's open-source [DeepGEMM](https://github.com/deepseek-ai/DeepGEMM) implementation with the Triton implementations in SGLang and vLLM.