From 8c5382e62c53d251104da36b121a9a28d0eae21a Mon Sep 17 00:00:00 2001
From: Ying Sheng
Date: Sat, 3 Aug 2024 12:58:41 -0700
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 85f464fce..bbbbd23c7 100644
--- a/README.md
+++ b/README.md
@@ -207,7 +207,7 @@ Instructions for supporting a new model are [here](https://github.com/sgl-projec
 ### Benchmark Performance
 
-- Benchmark a single static batch by running the following command without launching a server. The arguments are the same as those for `launch_server.py`. This is not a dynamic batching server, so it may run out of memory for a batch size that can run successfully with a real server. This is because a real server will truncate the prefill into several batches/chunks, while this unit test does not do this.
+- Benchmark a single static batch by running the following command without launching a server. The arguments are the same as for `launch_server.py`. Note that this is not a dynamic batching server, so it may run out of memory for a batch size that a real server can handle. A real server truncates the prefill into several batches, while this unit test does not. For accurate large batch testing, consider using `sglang.bench_serving`.
 ```
 python -m sglang.bench_latency --model-path meta-llama/Meta-Llama-3-8B-Instruct --batch 32 --input-len 256 --output-len 32
 ```