### What this PR does / why we need it?
Fix three bugs in flash comm1 for Allgather EP
(https://github.com/vllm-project/vllm-ascend/pull/3334):
1. Calling `enable_sp()` with the `vllm_config` argument triggers a lot of
warning logs; this PR caches its return value.
2. `num_tokens_after_padding` should be a CPU tensor, as it is used as
`num_tokens_across_dp_cpu` in `DPMetadata`. Otherwise it causes many
device-to-host (d2h) copies when running the model.
3. In PD, the model runner executes `kv_connector_no_forward`, where
`num_tokens` is `None`, so that case must be handled.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
---------
Signed-off-by: realliujiaxu <realliujiaxu@163.com>