benchmark: enhance configurable multimodal benchmarking in bench_serving (#9812)
Co-authored-by: Xiang (Kevin) Li <lik@nvidia.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
This commit is contained in:
@@ -59,15 +59,16 @@ Select with `--dataset-name`:
|
||||
- `sharegpt` (default): loads ShareGPT-style pairs; optionally restrict with `--sharegpt-context-len` and override outputs with `--sharegpt-output-len`
|
||||
- `random`: random text lengths; sampled from ShareGPT token space
|
||||
- `random-ids`: random token ids (can lead to gibberish)
|
||||
- `random-image`: generates random images and wraps them in chat messages; supports custom resolutions via 'heightxwidth' format
|
||||
- `image`: generates images and wraps them in chat messages; supports custom resolutions, multiple formats, and different content types
|
||||
- `generated-shared-prefix`: synthetic dataset with shared long system prompts and short questions
|
||||
- `mmmu`: samples from MMMU (Math split) and includes images
|
||||
|
||||
Common dataset flags:
|
||||
|
||||
- `--num-prompts N`: number of requests
|
||||
- `--random-input-len`, `--random-output-len`, `--random-range-ratio`: for random/random-ids/random-image
|
||||
- `--random-image-num-images`, `--random-image-resolution`: for random-image dataset (supports presets 1080p/720p/360p or custom 'heightxwidth' format)
|
||||
- `--random-input-len`, `--random-output-len`, `--random-range-ratio`: for random/random-ids/image
|
||||
- `--image-count`: Number of images per request (for `image` dataset).
|
||||
|
||||
- `--apply-chat-template`: apply tokenizer chat template when constructing prompts
|
||||
- `--dataset-path PATH`: file path for ShareGPT json; if blank and missing, it will be downloaded and cached
|
||||
|
||||
@@ -79,14 +80,16 @@ Generated Shared Prefix flags (for `generated-shared-prefix`):
|
||||
- `--gsp-question-len`
|
||||
- `--gsp-output-len`
|
||||
|
||||
Random Image dataset flags (for `random-image`):
|
||||
Image dataset flags (for `image`):
|
||||
|
||||
- `--random-image-num-images`: Number of images per request
|
||||
- `--random-image-resolution`: Image resolution; supports presets (1080p, 720p, 360p) or custom 'heightxwidth' format (e.g., 1080x1920, 512x768)
|
||||
- `--image-count`: Number of images per request
|
||||
- `--image-resolution`: Image resolution; supports presets (4k, 1080p, 720p, 360p) or custom 'heightxwidth' format (e.g., 1080x1920, 512x768)
|
||||
- `--image-format`: Image format (jpeg or png)
|
||||
- `--image-content`: Image content type (random or blank)
|
||||
|
||||
### Examples
|
||||
|
||||
1. To benchmark random-image dataset with 3 images per request, 500 prompts, 512 input length, and 512 output length, you can run:
|
||||
1. To benchmark image dataset with 3 images per request, 500 prompts, 512 input length, and 512 output length, you can run:
|
||||
|
||||
```bash
|
||||
python -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-3B-Instruct --disable-radix-cache
|
||||
@@ -95,10 +98,10 @@ python -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-3B-Instruct --disabl
|
||||
```bash
|
||||
python -m sglang.bench_serving \
|
||||
--backend sglang-oai-chat \
|
||||
--dataset-name random-image \
|
||||
--dataset-name image \
|
||||
--num-prompts 500 \
|
||||
--random-image-num-images 3 \
|
||||
--random-image-resolution 720p \
|
||||
--image-count 3 \
|
||||
--image-resolution 720p \
|
||||
--random-input-len 512 \
|
||||
--random-output-len 512
|
||||
```
|
||||
@@ -159,9 +162,10 @@ The script will add `Authorization: Bearer $OPENAI_API_KEY` automatically for Op
|
||||
Printed after each run:
|
||||
|
||||
- Request throughput (req/s)
|
||||
- Input token throughput (tok/s)
|
||||
- Input token throughput (tok/s) - includes both text and vision tokens
|
||||
- Output token throughput (tok/s)
|
||||
- Total token throughput (tok/s)
|
||||
- Total token throughput (tok/s) - includes both text and vision tokens
|
||||
- Total input text tokens and Total input vision tokens - per-modality breakdown
|
||||
- Concurrency: aggregate time of all requests divided by wall time
|
||||
- End-to-End Latency (ms): mean/median/std/p99 per-request total latency
|
||||
- Time to First Token (TTFT, ms): mean/median/std/p99 for streaming mode
|
||||
@@ -227,31 +231,48 @@ python3 -m sglang.bench_serving \
|
||||
--apply-chat-template
|
||||
```
|
||||
|
||||
4) Random images (VLM) with chat template:
|
||||
4) Images (VLM) with chat template:
|
||||
|
||||
```bash
|
||||
python3 -m sglang.bench_serving \
|
||||
--backend sglang \
|
||||
--host 127.0.0.1 --port 30000 \
|
||||
--model your-vlm-model \
|
||||
--dataset-name random-image \
|
||||
--random-image-num-images 2 \
|
||||
--random-image-resolution 720p \
|
||||
--dataset-name image \
|
||||
--image-count 2 \
|
||||
--image-resolution 720p \
|
||||
--random-input-len 128 --random-output-len 256 \
|
||||
--num-prompts 200 \
|
||||
--apply-chat-template
|
||||
```
|
||||
|
||||
4a) Random images with custom resolution:
|
||||
4a) Images with custom resolution:
|
||||
|
||||
```bash
|
||||
python3 -m sglang.bench_serving \
|
||||
--backend sglang \
|
||||
--host 127.0.0.1 --port 30000 \
|
||||
--model your-vlm-model \
|
||||
--dataset-name random-image \
|
||||
--random-image-num-images 1 \
|
||||
--random-image-resolution 512x768 \
|
||||
--dataset-name image \
|
||||
--image-count 1 \
|
||||
--image-resolution 512x768 \
|
||||
--random-input-len 64 --random-output-len 128 \
|
||||
--num-prompts 100 \
|
||||
--apply-chat-template
|
||||
```
|
||||
|
||||
4b) 1080p images with PNG format and blank content:
|
||||
|
||||
```bash
|
||||
python3 -m sglang.bench_serving \
|
||||
--backend sglang \
|
||||
--host 127.0.0.1 --port 30000 \
|
||||
--model your-vlm-model \
|
||||
--dataset-name image \
|
||||
--image-count 1 \
|
||||
--image-resolution 1080p \
|
||||
--image-format png \
|
||||
--image-content blank \
|
||||
--random-input-len 64 --random-output-len 128 \
|
||||
--num-prompts 100 \
|
||||
--apply-chat-template
|
||||
@@ -325,7 +346,7 @@ python3 -m sglang.bench_serving \
|
||||
- All requests failed: verify `--backend`, server URL/port, `--model`, and authentication. Check warmup errors printed by the script.
|
||||
- Throughput seems too low: adjust `--request-rate` and `--max-concurrency`; verify server batch size/scheduling; ensure streaming is enabled if appropriate.
|
||||
- Token counts look odd: prefer chat/instruct models with proper chat templates; otherwise tokenization of gibberish may be inconsistent.
|
||||
- Random-image/MMMU datasets: ensure you installed extra deps (`pillow`, `datasets`, `pybase64`).
|
||||
- Image/MMMU datasets: ensure you installed extra deps (`pillow`, `datasets`, `pybase64`).
|
||||
- Authentication errors (401/403): set `OPENAI_API_KEY` or disable auth on your server.
|
||||
|
||||
### Notes
|
||||
|
||||
Reference in New Issue
Block a user