In addition to the case where `recompute_scheduler_enable` is enabled,
the all_reduce can also be skipped when `max_num_batched_tokens` is
below mc2's requirement.
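
A minimal sketch of the skip condition described above, not the actual implementation. The function name `can_skip_all_reduce` and the parameter `mc2_token_requirement` are hypothetical, and "below" is read here as a strict comparison:

```python
def can_skip_all_reduce(
    recompute_scheduler_enable: bool,
    max_num_batched_tokens: int,
    mc2_token_requirement: int,  # hypothetical name for mc2's requirement
) -> bool:
    """Return True when the all_reduce is assumed to be unnecessary."""
    # Case 1: the recompute scheduler is enabled.
    if recompute_scheduler_enable:
        return True
    # Case 2: the batch token budget is below mc2's requirement
    # (assumed to be a strict "less than").
    return max_num_batched_tokens < mc2_token_requirement
```
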
- vLLM version: release/v0.13.0
- vLLM main: bc0a5a0c08
---------
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>