# Benchmarks

This directory used to contain vLLM's benchmark scripts and utilities for performance testing and evaluation.

## Contents

- **Serving benchmarks**: Scripts for testing online inference performance (latency, throughput)
- **Throughput benchmarks**: Scripts for testing offline batch inference performance
- **Specialized benchmarks**: Tools for testing specific features like structured output, prefix caching, long document QA, request prioritization, and multi-modal inference
- **Dataset utilities**: Framework for loading and sampling from various benchmark datasets (ShareGPT, HuggingFace datasets, synthetic data, etc.)
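
As a rough illustration of the latency and throughput figures these benchmarks report, the sketch below summarizes a handful of per-request timings. It is not taken from vLLM: the `RequestTiming` record, the `summarize` helper, and the sample numbers are all hypothetical, and the percentile uses a simple nearest-rank rule that may differ from what the real tools compute.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Timing for one completed request (hypothetical record, not a vLLM type)."""
    start_s: float
    end_s: float
    output_tokens: int


def percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending list, with q in (0, 100]."""
    rank = max(1, math.ceil(q / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(timings):
    """Return basic latency and throughput figures for a batch of requests."""
    latencies = sorted(t.end_s - t.start_s for t in timings)
    # Throughput is measured over the wall-clock window spanning all requests.
    wall_clock_s = max(t.end_s for t in timings) - min(t.start_s for t in timings)
    total_tokens = sum(t.output_tokens for t in timings)
    return {
        "mean_latency_s": sum(latencies) / len(latencies),
        "p50_latency_s": percentile(latencies, 50),
        "p99_latency_s": percentile(latencies, 99),
        "requests_per_s": len(timings) / wall_clock_s,
        "output_tokens_per_s": total_tokens / wall_clock_s,
    }


# Made-up sample data: four overlapping requests finishing within 4 seconds.
timings = [
    RequestTiming(0.0, 1.0, 100),
    RequestTiming(0.5, 2.5, 200),
    RequestTiming(1.0, 4.0, 300),
    RequestTiming(2.0, 4.0, 200),
]
summary = summarize(timings)
print(summary)
```

Because the requests overlap, throughput is computed over the shared wall-clock window rather than by summing individual latencies, which is the usual distinction between per-request latency and aggregate throughput.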

## Usage

For detailed usage instructions, examples, and dataset information, see the [Benchmark CLI documentation](https://docs.vllm.ai/en/latest/contributing/benchmarks.html#benchmark-cli).

For the full CLI reference, see:

- <https://docs.vllm.ai/en/latest/cli/bench/latency.html>
- <https://docs.vllm.ai/en/latest/cli/bench/serve.html>
- <https://docs.vllm.ai/en/latest/cli/bench/throughput.html>
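
The three reference pages above correspond to the `vllm bench latency`, `vllm bench serve`, and `vllm bench throughput` subcommands. A minimal sketch of assembling such an invocation from Python follows. The `build_bench_command` helper and the model name `my-org/my-model` are hypothetical placeholders; consult the linked pages for the options each subcommand actually accepts.

```python
import shlex

# Subcommands taken from the three CLI reference pages linked above.
BENCH_SUBCOMMANDS = ("latency", "serve", "throughput")


def build_bench_command(subcommand, extra_args=()):
    """Build the argv list for a `vllm bench` run (hypothetical helper)."""
    if subcommand not in BENCH_SUBCOMMANDS:
        raise ValueError(f"unknown benchmark subcommand: {subcommand!r}")
    return ["vllm", "bench", subcommand, *extra_args]


# "my-org/my-model" is a placeholder, not a real model identifier.
cmd = build_bench_command("serve", ["--model", "my-org/my-model"])
print(shlex.join(cmd))  # prints: vllm bench serve --model my-org/my-model
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where vLLM is installed; building it as a list rather than a single string avoids shell-quoting problems.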