From 484d0e021dcf825e692c1912365ed2a6cd3d947a Mon Sep 17 00:00:00 2001
From: Qiaolin Yu
Date: Sun, 27 Jul 2025 23:07:54 -0700
Subject: [PATCH] doc: add bench_one_batch_server in the benchmark doc (#8441)

---
 docs/references/benchmark_and_profiling.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/docs/references/benchmark_and_profiling.md b/docs/references/benchmark_and_profiling.md
index 3e96e0cef..67d85c3f6 100644
--- a/docs/references/benchmark_and_profiling.md
+++ b/docs/references/benchmark_and_profiling.md
@@ -4,10 +4,15 @@
 - Benchmark the latency of running a single static batch without a server. The arguments are the same as for `launch_server.py`.
   Note that this is a simplified test script without a dynamic batching server, so it may run out of memory for a batch size that a real server can handle. A real server truncates the prefill into several batches, while this simplified script does not.
+  - Without a server (no need to launch a server first):
+    ```bash
+    python -m sglang.bench_one_batch --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --batch 32 --input-len 256 --output-len 32
+    ```
+  - With a server (launch one with `sglang.launch_server` first, then run the following command):
+    ```bash
+    python -m sglang.bench_one_batch_server --base-url http://127.0.0.1:30000 --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --batch-size 32 --input-len 256 --output-len 32
+    ```
-  ```bash
-  python -m sglang.bench_one_batch --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --batch 32 --input-len 256 --output-len 32
-  ```

 - Benchmark offline processing. This script will start an offline engine and run the benchmark.