Log if cuda graph is used & extend cuda graph capture to cuda-graph-max-bs (#6201)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
This commit is contained in:
@@ -259,7 +259,9 @@ def throughput_test_once(
|
||||
measurement_results["total_input_tokens"]
|
||||
+ measurement_results["total_output_tokens"]
|
||||
) / latency
|
||||
measurement_results["last_gen_throughput"] = server_info["last_gen_throughput"]
|
||||
measurement_results["last_gen_throughput"] = server_info["internal_states"][0][
|
||||
"last_gen_throughput"
|
||||
]
|
||||
|
||||
return measurement_results
|
||||
|
||||
|
||||
Reference in New Issue
Block a user