Rename sglang.bench_latency to sglang.bench_one_batch (#2118)

2024-11-21 20:07:48 -08:00
parent 8048c28c11
commit dfec7fca06
16 changed files with 521 additions and 599 deletions
--- a/python/sglang/srt/models/torch_native_llama.py
+++ b/python/sglang/srt/models/torch_native_llama.py
@@ -30,10 +30,10 @@ device_mesh = torch.distributed.init_device_mesh("cuda", (tp_size,))
 tensor_parallel(model, device_mesh)
 ```

-An end-to-end example can be found in `python/sglang/bench_latency.py`.
+An end-to-end example can be found in `python/sglang/bench_one_batch.py`.
 You can run it with the following command:
 ```bash
-$ python3 -m sglang.bench_latency --correct \
+$ python3 -m sglang.bench_one_batch --correct \
  --model meta-llama/Meta-Llama-3-8B \
  --json-model-override-args '{"architectures": ["TorchNativeLlamaForCausalLM"]}' \
  --tensor-parallel-size 2 \