shiyuan680
00aa0bf33e
Support prefill cache mode using the FIA op (#3696)
### What this PR does / why we need it?
Support prefill cache mode using the FIA op for full graph mode.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.11.0rc3
- vLLM main: 17c540a993
Before this change (origin):
============ Serving Benchmark Result ============
Successful requests: 30
Maximum request concurrency: 256
Request rate configured (RPS): 0.70
Benchmark duration (s): 131.63
Total input tokens: 61363
Total generated tokens: 61440
Request throughput (req/s): 0.23
Output token throughput (tok/s): 466.77
Peak output token throughput (tok/s): 750.00
Peak concurrent requests: 30.00
Total Token throughput (tok/s): 932.95
---------------Time to First Token----------------
Mean TTFT (ms): 125.17
Median TTFT (ms): 121.51
P50 TTFT (ms): 121.51
P90 TTFT (ms): 140.91
P99 TTFT (ms): 182.36
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 43.85
Median TPOT (ms): 43.84
P50 TPOT (ms): 43.84
P90 TPOT (ms): 44.28
P99 TPOT (ms): 44.32
---------------Inter-token Latency----------------
Mean ITL (ms): 43.85
Median ITL (ms): 42.63
P50 ITL (ms): 42.63
P90 ITL (ms): 48.74
P99 ITL (ms): 59.62
==================================================
After this change:
============ Serving Benchmark Result ============
Successful requests: 30
Maximum request concurrency: 256
Request rate configured (RPS): 0.70
Benchmark duration (s): 130.10
Total input tokens: 61363
Total generated tokens: 61440
Request throughput (req/s): 0.23
Output token throughput (tok/s): 472.26
Peak output token throughput (tok/s): 750.00
Peak concurrent requests: 30.00
Total Token throughput (tok/s): 943.94
---------------Time to First Token----------------
Mean TTFT (ms): 123.69
Median TTFT (ms): 122.51
P50 TTFT (ms): 122.51
P90 TTFT (ms): 143.69
P99 TTFT (ms): 165.00
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 43.07
Median TPOT (ms): 43.13
P50 TPOT (ms): 43.13
P90 TPOT (ms): 43.50
P99 TPOT (ms): 43.57
---------------Inter-token Latency----------------
Mean ITL (ms): 43.07
Median ITL (ms): 41.81
P50 ITL (ms): 41.81
P90 ITL (ms): 48.11
P99 ITL (ms): 62.13
==================================================
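The sketch below makes the before/after comparison easier to read. It uses only numbers copied from the two results above and does two things: recomputes token throughput as tokens divided by benchmark duration (an assumption about how the benchmark derives it, which matches the reported figures to within rounding), and prints the relative change for the main latency and throughput metrics. Both runs are single runs of 30 requests, so it reports raw differences only.

```python
# Metrics copied from the two benchmark results above.
BEFORE = {  # origin
    "duration_s": 131.63,
    "output_tok_s": 466.77,
    "total_tok_s": 932.95,
    "mean_ttft_ms": 125.17,
    "p99_ttft_ms": 182.36,
    "mean_tpot_ms": 43.85,
    "p99_tpot_ms": 44.32,
    "mean_itl_ms": 43.85,
    "p99_itl_ms": 59.62,
}
AFTER = {  # with this PR
    "duration_s": 130.10,
    "output_tok_s": 472.26,
    "total_tok_s": 943.94,
    "mean_ttft_ms": 123.69,
    "p99_ttft_ms": 165.00,
    "mean_tpot_ms": 43.07,
    "p99_tpot_ms": 43.57,
    "mean_itl_ms": 43.07,
    "p99_itl_ms": 62.13,
}
INPUT_TOKENS = 61363   # identical in both runs
OUTPUT_TOKENS = 61440  # identical in both runs


def derived_throughput(duration_s):
    """Recompute (output, total) token throughput, assuming tokens / duration."""
    return OUTPUT_TOKENS / duration_s, (INPUT_TOKENS + OUTPUT_TOKENS) / duration_s


def pct_change(before, after):
    """Relative change in percent; negative means the value went down."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for name, run in (("before", BEFORE), ("after", AFTER)):
        out_tps, total_tps = derived_throughput(run["duration_s"])
        print(f"{name}: output {out_tps:.2f} tok/s, total {total_tps:.2f} tok/s")
    for key in BEFORE:
        print(f"{key:>14}: {pct_change(BEFORE[key], AFTER[key]):+.2f}%")
```

Running it shows output throughput up about 1.2% and P99 TTFT down about 9.5%, while P99 ITL rises about 4.2%.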
Signed-off-by: shiyuan680 <917935075@qq.com>
2025-10-27 19:41:07 +08:00