doc: add bench_one_batch_server in the benchmark doc (#8441)
@@ -4,10 +4,15 @@
- Benchmark the latency of running a single static batch without a server. The arguments are the same as for `launch_server.py`.
Note that this is a simplified test script without a dynamic batching server, so it may run out of memory at a batch size that a real server could handle. A real server splits the prefill into several batches, while this simplified script does not.
- Without a server (no need to launch a server first)
```bash
python -m sglang.bench_one_batch --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --batch 32 --input-len 256 --output-len 32
```
- With a server (use `sglang.launch_server` to launch a server first, then run the following command)
```bash
python -m sglang.bench_one_batch_server --base-url http://127.0.0.1:30000 --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --batch-size 32 --input-len 256 --output-len 32
```
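The note above points out that a real server splits a long prefill into several smaller forward passes to bound peak memory, while `bench_one_batch` runs the whole prefill at once. The splitting idea can be sketched in a few lines (the helper name and chunk size here are illustrative, not sglang's actual API):

```python
def chunk_prefill(tokens: list, chunk_size: int) -> list:
    """Split a prompt's token list into fixed-size prefill chunks,
    as a dynamic batching server would, to bound peak memory per pass."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

# A 10-token prompt processed in chunks of 4:
print(chunk_prefill(list(range(10)), 4))
# → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Because the standalone script skips this step, its peak memory grows with `batch_size * input_len`, which is why it can OOM where a server would not.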
- Benchmark offline processing. This script will start an offline engine and run the benchmark.
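When comparing results across these benchmarks, a common derived number is decode throughput: output tokens generated per second across the whole batch. A minimal sketch of that arithmetic (the helper and latency value are illustrative, not output of the scripts):

```python
def decode_throughput(batch_size: int, output_len: int, decode_latency_s: float) -> float:
    """Output tokens generated per second across the whole batch
    during the decode phase."""
    return batch_size * output_len / decode_latency_s

# Example: batch 32, 32 output tokens each, decode taking 2.0 s
print(decode_throughput(32, 32, 2.0))  # → 512.0
```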