* ci : adjust params for less runtime
* ci : gate BF16 on some hardware
* ci : move extra tests to Arm runner