Online serving tests

  • Input length: randomly sample 200 prompts from the ShareGPT and lmarena-ai/vision-arena-bench-v0.1 (multi-modal) datasets (with a fixed random seed).
  • Output length: the corresponding output lengths of these 200 prompts.
  • Batch size: dynamically determined by vLLM and the arrival pattern of the requests.
  • Average QPS (queries per second): 1, 4, 16, and inf. QPS = inf means all requests arrive at once. For other QPS values, the arrival time of each query is drawn from a Poisson process (with a fixed random seed); see the sketch below.
  • Models: Qwen/Qwen3-8B, Qwen/Qwen2.5-VL-7B-Instruct
  • Evaluation metrics: throughput, TTFT (median time to first token), ITL (median inter-token latency), and TPOT (median time per output token).
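
The request schedule for a given average QPS can be illustrated with a short Python sketch (a simplification for illustration only, not the actual benchmark script): inter-arrival gaps are drawn from an exponential distribution with mean 1/QPS, which gives a Poisson arrival process.

```python
import numpy as np

def poisson_arrival_times(num_requests: int, qps: float, seed: int = 0) -> np.ndarray:
    """Cumulative arrival times (seconds) for num_requests requests at the given average QPS."""
    if qps == float("inf"):
        # QPS = inf: every request arrives at t = 0 (all at once).
        return np.zeros(num_requests)
    rng = np.random.default_rng(seed)                            # fixed seed -> reproducible schedule
    gaps = rng.exponential(scale=1.0 / qps, size=num_requests)   # exponential inter-arrival gaps
    return np.cumsum(gaps)

# Example: arrival schedule for 200 prompts at an average of 4 QPS.
print(poisson_arrival_times(200, qps=4.0)[:5])
```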

{serving_tests_markdown_table}

Offline tests

Latency tests

  • Input length: 32 tokens.
  • Output length: 128 tokens.
  • Batch size: fixed (8).
  • Models: Qwen/Qwen3-8B, Qwen/Qwen2.5-VL-7B-Instruct
  • Evaluation metrics: end-to-end latency (see the sketch below).
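
For context, a minimal sketch of what such a fixed-batch latency measurement looks like with vLLM's offline `LLM` API. The published numbers come from the benchmark scripts, not this snippet; the prompt text and `ignore_eos` setting here are placeholder assumptions rather than the real fixed 32-token inputs.

```python
import time
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-8B")
# 128 output tokens per request; ignore_eos keeps the output length fixed.
params = SamplingParams(max_tokens=128, ignore_eos=True)
batch = ["Hello, my name is"] * 8   # placeholder prompts; the real test uses fixed 32-token inputs

start = time.perf_counter()
llm.generate(batch, params)          # one batch, measured end to end
print(f"end-to-end latency: {time.perf_counter() - start:.3f} s")
```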

{latency_tests_markdown_table}

Throughput tests

  • Input length: randomly sample 200 prompts from the ShareGPT and lmarena-ai/vision-arena-bench-v0.1 (multi-modal) datasets (with a fixed random seed).
  • Output length: the corresponding output lengths of these 200 prompts.
  • Batch size: dynamically determined by vLLM to achieve maximum throughput.
  • Models: Qwen/Qwen3-8B, Qwen/Qwen2.5-VL-7B-Instruct
  • Evaluation metrics: throughput (see the sketch below).
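
As above, a minimal sketch of how offline throughput can be derived from a single `llm.generate()` call, assuming vLLM's offline API: total generated tokens divided by wall-clock time. The repeated prompt and `max_tokens=256` are placeholder assumptions standing in for the 200 sampled prompts and their per-prompt output lengths.

```python
import time
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-8B")
prompts = ["Summarize the history of the transformer architecture."] * 200  # stand-in for the sampled prompts
params = SamplingParams(max_tokens=256)

start = time.perf_counter()
outputs = llm.generate(prompts, params)   # vLLM batches the requests internally for maximum throughput
elapsed = time.perf_counter() - start

generated = sum(len(out.outputs[0].token_ids) for out in outputs)
print(f"throughput: {generated / elapsed:.1f} output tokens/s")
```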

{throughput_tests_markdown_table}