[docs] Instructions for bench_serving.py (#9071)

Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: zhaochenyang20 <zhaochenyang20@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
yhyang201
2025-08-27 09:30:57 +08:00
committed by GitHub
parent b21fdd5373
commit a85363c199
4 changed files with 327 additions and 1 deletion


@@ -31,6 +31,7 @@
[PyTorch Profiler](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html) is a convenient basic tool to inspect kernel execution time, call stacks, and kernel overlap and occupancy.
### Profile a server with `sglang.bench_serving`
```bash
# set trace path
export SGLANG_TORCH_PROFILER_DIR=/root/sglang/profile_log
@@ -44,6 +45,8 @@ python -m sglang.bench_serving --backend sglang --model meta-llama/Llama-3.1-8B-
Make sure `SGLANG_TORCH_PROFILER_DIR` is set on both the server side and the client side; otherwise, the trace file cannot be generated correctly. A reliable way to do this is to set `SGLANG_TORCH_PROFILER_DIR` in your shell's rc file (e.g., `~/.bashrc` for bash).
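As a minimal sketch, assuming a bash shell and the example trace path used above, the export can be persisted once so that every new shell (server and client alike) inherits the same value. The helper variables `PROFILE_DIR` and `LINE` are illustrative names, not part of SGLang.

```bash
# Sketch (assumes bash and ~/.bashrc; adjust the rc file for other shells).
# PROFILE_DIR and LINE are hypothetical helper names used only here.
PROFILE_DIR=/root/sglang/profile_log
LINE="export SGLANG_TORCH_PROFILER_DIR=$PROFILE_DIR"

# Append the export to ~/.bashrc only if that exact line is not already there,
# so running this snippet twice does not duplicate it.
grep -qxF "$LINE" ~/.bashrc 2>/dev/null || echo "$LINE" >> ~/.bashrc

# Also export it in the current shell; shells opened later read ~/.bashrc.
export SGLANG_TORCH_PROFILER_DIR="$PROFILE_DIR"
echo "SGLANG_TORCH_PROFILER_DIR=$SGLANG_TORCH_PROFILER_DIR"
```

Open fresh terminals for the server and for `sglang.bench_serving` after running this, so both processes see the same directory.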
For more details, please refer to [Bench Serving Guide](./bench_serving.md).
### Profile a server with `sglang.bench_offline_throughput`
```bash
export SGLANG_TORCH_PROFILER_DIR=/root/sglang/profile_log