### What this PR does / why we need it?
If `sp` is enabled and `tp_size` >= 16,
`torch_npu.npu_mm_reduce_scatter_base` will raises a exception.
After consulting with the operator developer, we learned that the
operator does not work when `tp` = 16.
So, we disable the operator when `tp` = 16.
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested
We started a server with `sp` enabled and `tp` = 16.
It started successfully.
```text
[0;36m(APIServer pid=1855938)[0;0m INFO: Started server process [1855938]
[0;36m(APIServer pid=1855938)[0;0m INFO: Waiting for application startup.
[0;36m(APIServer pid=1855938)[0;0m INFO: Application startup complete.
```
- vLLM version: v0.13.0
- vLLM main:
d68209402d
Signed-off-by: drslark <slarksblood@qq.com>