[feat] support minimum token load balance in dp attention (#7379)

Guanhua Wang
2025-08-03 15:46:47 +08:00
committed by GitHub
parent b0add2da00
commit f7b2853ff8
8 changed files with 271 additions and 6 deletions

@@ -1171,6 +1171,7 @@ class ServerArgs:
             choices=[
                 "round_robin",
                 "shortest_queue",
+                "minimum_tokens",
             ],
         )
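
The hunk above only registers `minimum_tokens` as a permitted choice; the dispatch logic lives in the other changed files, which are not shown here. As a rough illustration of what a minimum-token policy could look like, the sketch below sends each request to the data-parallel worker with the fewest outstanding tokens. The class `MinimumTokensBalancer` and its methods are hypothetical names chosen for this example and are not taken from the commit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MinimumTokensBalancer:
    """Hypothetical dispatcher; an illustration, not the code added by this commit."""

    num_workers: int
    # Outstanding token count per data-parallel worker (assumed bookkeeping).
    load: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.num_workers <= 0:
            raise ValueError("num_workers must be positive")
        self.load = [0] * self.num_workers

    def dispatch(self, num_tokens: int) -> int:
        """Assign a request to the least-loaded worker and return its index."""
        # min() returns the first minimum, so ties go to the lowest index.
        target = min(range(self.num_workers), key=self.load.__getitem__)
        self.load[target] += num_tokens
        return target

    def finish(self, worker: int, num_tokens: int) -> None:
        """Release a request's tokens once it completes on `worker`."""
        self.load[worker] = max(0, self.load[worker] - num_tokens)
```

Unlike `round_robin`, which ignores request size, and `shortest_queue`, which counts requests, a policy of this kind weighs each request by its token count, so one long prompt does not look the same as one short prompt when picking a worker.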