bench_serving support PD Disaggregation (#11542)
@@ -17,6 +17,10 @@ For the design details, please refer to [link](https://docs.google.com/document/
Currently, we support Mooncake and NIXL as the transfer engine.
## Profiling in PD Disaggregation Mode
To profile prefill or decode workers in PD disaggregation mode, see the [Profile In PD Disaggregation Mode](https://docs.sglang.ai/developer_guide/benchmark_and_profiling.html#profile-in-pd-disaggregation-mode) section of the Benchmark and Profiling guide. Because of torch profiler limitations, prefill and decode workers must be profiled separately, using dedicated command-line options.
## Router Integration
To deploy PD disaggregation at scale with load balancing and fault tolerance, SGLang provides a router. The router distributes requests across prefill and decode instances using one of several routing policies. For details on setting up routing with PD disaggregation, including configuration options and deployment patterns, see the [SGLang Router documentation](router.md#mode-3-prefill-decode-disaggregation).
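
To make the idea of a routing policy concrete, here is a minimal sketch of round-robin selection over prefill and decode instances. It is not the SGLang router's implementation or API: the class name `RoundRobinPDRouter`, its `route` method, and the worker URLs are all hypothetical and exist only for illustration.

```python
from itertools import cycle
from typing import Iterator, List, Tuple


class RoundRobinPDRouter:
    """Toy policy (hypothetical, not SGLang's router): pick one prefill and
    one decode worker per request, rotating through each pool in turn."""

    def __init__(self, prefill_urls: List[str], decode_urls: List[str]) -> None:
        if not prefill_urls or not decode_urls:
            raise ValueError("need at least one prefill and one decode worker")
        # Each pool is rotated independently of the other.
        self._prefill: Iterator[str] = cycle(prefill_urls)
        self._decode: Iterator[str] = cycle(decode_urls)

    def route(self) -> Tuple[str, str]:
        """Return the (prefill_url, decode_url) pair for the next request."""
        return next(self._prefill), next(self._decode)


# Placeholder addresses, assumed for the example only.
router = RoundRobinPDRouter(
    prefill_urls=["http://prefill-0:30000", "http://prefill-1:30000"],
    decode_urls=["http://decode-0:30001"],
)
pairs = [router.route() for _ in range(3)]
```

With two prefill workers and one decode worker, successive requests alternate between the prefill instances while always using the single decode instance. A production router would also need health checks and load awareness, which this sketch omits.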