Update benchmark scripts (#8)

2024-01-15 16:12:57 -08:00
parent 01ca82d765
commit 70359bf31a
28 changed files with 183 additions and 50 deletions
--- a/benchmark/latency_throughput/README.md
+++ b/benchmark/latency_throughput/README.md
@@ -3,19 +3,6 @@
 wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
 ```

-### Performance
-
- Model: Llama-2-7b-chat-hf
- `--num-prompts 2000 --request-rate 200`
- On 4 A10 (24G) GPUs
-
-| Backend     | Throughput      | Latency  |
-| ----------- | --------------- | -------- |
-| srt         | 5.82 requests/s | 343.54 s |
-| vllm==0.2.6 | 3.93 requests/s | 509.08 s |
-| vllm==0.2.7 | 5.02 requests/s | 398.25 s |
-
- 
 ### SGLang
 ```
 python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000
@@ -28,7 +15,7 @@ python3 bench_throughput.py --backend srt --tokenizer meta-llama/Llama-2-7b-chat

 ### vLLM
 ```
-python3 -m vllm.entrypoints.api_server --model meta-llama/Llama-2-7b-chat-hf --disable-log-requests --swap-space 16
+python3 -m vllm.entrypoints.api_server --model meta-llama/Llama-2-7b-chat-hf --disable-log-requests --swap-space 16 --port 21000
 ```

 ```