Backport of #7882 to releases/v0.18.0.

Adds aime2025 benchmark test for DeepSeek-V3.2-W8A8 EP with disaggregated prefill on A3 (4-node, 16 NPUs per node, accuracy benchmark baseline 66.67%).

Signed-off-by: guozr <guozr1997@hotmail.com>
Co-authored-by: guozr <guozr1997@hotmail.com>