bench_serving support PD Disaggregation (#11542)

2025-10-14 10:43:26 +08:00
parent cb8ed2c09a
commit 88a6f9dab5
3 changed files with 146 additions and 10 deletions
--- a/docs/developer_guide/benchmark_and_profiling.md
+++ b/docs/developer_guide/benchmark_and_profiling.md
@@ -47,6 +47,48 @@ Please make sure that the `SGLANG_TORCH_PROFILER_DIR` should be set at both serv

 For more details, please refer to [Bench Serving Guide](./bench_serving.md).

+### Profile in PD Disaggregation Mode
+
+When profiling in PD disaggregation mode, prefill and decode workers **must be profiled separately** due to torch profiler limitations. The `bench_serving` command provides dedicated options for this:
+
+#### Profile Prefill Workers
+
+```bash
+# set trace path
+export SGLANG_TORCH_PROFILER_DIR=/root/sglang/profile_log
+
+# start prefill and decode servers, each in its own terminal (see PD disaggregation docs for setup)
+python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --disaggregation-mode prefill
+python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --disaggregation-mode decode --port 30001 --base-gpu-id 1
+
+# start router
+python -m sglang_router.launch_router --pd-disaggregation --prefill http://127.0.0.1:30000 --decode http://127.0.0.1:30001 --host 0.0.0.0 --port 8000
+
+# send profiling request targeting prefill workers
+python -m sglang.bench_serving --backend sglang --model meta-llama/Llama-3.1-8B-Instruct --num-prompts 10 --sharegpt-output-len 100 --profile --pd-separated --profile-prefill-url http://127.0.0.1:30000
+```
+
+#### Profile Decode Workers
+
+```bash
+# send profiling request targeting decode workers
+python -m sglang.bench_serving --backend sglang --model meta-llama/Llama-3.1-8B-Instruct --num-prompts 10 --sharegpt-output-len 100 --profile --pd-separated --profile-decode-url http://127.0.0.1:30001
+```
+
+#### Important Notes
+
+- `--profile-prefill-url` and `--profile-decode-url` are **mutually exclusive**: you cannot profile prefill and decode workers in the same run
+- Both options support multiple worker URLs for multi-instance setups:
+  ```bash
+  # Profile multiple prefill workers
+  python -m sglang.bench_serving --backend sglang --model meta-llama/Llama-3.1-8B-Instruct --num-prompts 10 --profile --pd-separated --profile-prefill-url http://127.0.0.1:30000 http://127.0.0.1:30002
+
+  # Profile multiple decode workers
+  python -m sglang.bench_serving --backend sglang --model meta-llama/Llama-3.1-8B-Instruct --num-prompts 10 --profile --pd-separated --profile-decode-url http://127.0.0.1:30001 http://127.0.0.1:30003
+  ```
+- Make sure `SGLANG_TORCH_PROFILER_DIR` is set on all worker nodes before starting the servers
+- For more details on setting up PD disaggregation, see [PD Disaggregation Guide](../advanced_features/pd_disaggregation.md)
+
 ### Profile a server with `sglang.bench_offline_throughput`
 ```bash
 export SGLANG_TORCH_PROFILER_DIR=/root/sglang/profile_log