Before refactoring cross-DP decoding metadata aggregation, clean up the
token-padding logic.
### What this PR does:
1. First checks whether any DP instance is in the prefill phase.
2. If in the `decode` phase and `torchair_graph_enabled` is true, pads
each DP instance’s token count up to the global maximum.
3. If in the `prefill` phase, or in the `decode` phase with graph mode
**disabled**, returns each DP instance’s original token count without
padding.
This reordering removes the previous two‐step padding/unpadding flow and
ensures padding only occurs when strictly necessary.
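
The decision flow above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the helper name `pad_tokens_across_dp` and its parameters `num_tokens_per_dp` and `any_dp_in_prefill` are hypothetical, and plain Python lists stand in for whatever tensors the real implementation exchanges across DP ranks.

```python
from typing import List


def pad_tokens_across_dp(
    num_tokens_per_dp: List[int],
    any_dp_in_prefill: bool,
    torchair_graph_enabled: bool,
) -> List[int]:
    """Hypothetical helper: return the per-DP token counts to use.

    Counts are padded to the global maximum only when every DP instance
    is decoding and graph mode is enabled; otherwise they are returned
    unchanged.
    """
    if not num_tokens_per_dp:
        return []

    # Prefill anywhere, or graph mode disabled: keep the original counts.
    if any_dp_in_prefill or not torchair_graph_enabled:
        return list(num_tokens_per_dp)

    # Decode phase with graph mode enabled: pad every DP instance
    # up to the largest token count seen across all instances.
    max_tokens = max(num_tokens_per_dp)
    return [max_tokens] * len(num_tokens_per_dp)
```

For example, under these assumptions `pad_tokens_across_dp([3, 7, 5], False, True)` yields `[7, 7, 7]`, while the same counts with a prefilling instance or with graph mode disabled come back as `[3, 7, 5]`. Checking the prefill condition first means there is no padded result that later needs to be undone.
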
- vLLM version: v0.10.0
- vLLM main:
bd3db7f469
Signed-off-by: yx0716 <jinyx1007@foxmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>