Update benchmark scripts (#8)
@@ -3,19 +3,6 @@
 wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
 ```
 
-### Performance
-
-- Model: Llama-2-7b-chat-hf
-- `--num-prompts 2000 --request-rate 200`
-- On 4 A10 (24G) GPUs
-
-| Backend     | Throughput      | Latency  |
-| ----------- | --------------- | -------- |
-| srt         | 5.82 requests/s | 343.54 s |
-| vllm==0.2.6 | 3.93 requests/s | 509.08 s |
-| vllm==0.2.7 | 5.02 requests/s | 398.25 s |
-
-
 ### SGLang
 ```
 python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000
@@ -28,7 +15,7 @@ python3 bench_throughput.py --backend srt --tokenizer meta-llama/Llama-2-7b-chat
 
 ### vLLM
 ```
-python3 -m vllm.entrypoints.api_server --model meta-llama/Llama-2-7b-chat-hf --disable-log-requests --swap-space 16
+python3 -m vllm.entrypoints.api_server --model meta-llama/Llama-2-7b-chat-hf --disable-log-requests --swap-space 16 --port 21000
 ```
 
 ```
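
Note on the removed Performance table: its "Latency" figures appear to be the total wall-clock time for the whole run rather than a per-request latency. That reading is an assumption, not something the commit states, but it can be checked against the `--num-prompts 2000` setting with a minimal sketch like the one below. The helper name `implied_throughput` is hypothetical and exists only for this check.

```python
# Figures copied from the removed "Performance" table.
# Assumption: the "Latency" column is total seconds for all requests.
NUM_PROMPTS = 2000  # from `--num-prompts 2000`

reported = {
    # backend: (reported throughput in requests/s, reported "latency" in s)
    "srt": (5.82, 343.54),
    "vllm==0.2.6": (3.93, 509.08),
    "vllm==0.2.7": (5.02, 398.25),
}


def implied_throughput(num_prompts: int, total_seconds: float) -> float:
    """Requests per second if total_seconds covers the entire run."""
    return num_prompts / total_seconds


for backend, (throughput, seconds) in reported.items():
    implied = implied_throughput(NUM_PROMPTS, seconds)
    print(f"{backend}: reported {throughput:.2f} req/s, implied {implied:.2f} req/s")
```

If the implied and reported throughput agree for each backend, the two columns carry the same information, and the second one would be better labeled as total time.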