Cherry-pick of https://github.com/vllm-project/vllm-ascend/pull/7614
### What this PR does / why we need it?
Fix an incompatibility between FlashComm1 (fc1) and non-SP padding.
After PR
[non-sp-padding](https://github.com/vllm-project/vllm-ascend/pull/7297)
was merged, running kimi2.5 with flashcomm1 enabled raises an error: the
expanded size of the tensor does not match the existing size at
non-singleton dimension 0.
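The error above comes from the shape rule of a tensor `expand`: a dimension can only be expanded if its existing size is 1 or already equals the target size. The sketch below is a minimal, framework-free illustration of that rule. It assumes (not confirmed by this PR) that FlashComm1 pads the token count up to a multiple of the TP size while a tensor built on the non-SP-padding path keeps the unpadded count. `pad_to_multiple` and `expand_shape` are hypothetical helpers, not vllm-ascend APIs.

```python
def pad_to_multiple(num_tokens: int, tp_size: int) -> int:
    """Hypothetical: round num_tokens up to a multiple of tp_size,
    as a reduce-scatter style path might require."""
    return -(-num_tokens // tp_size) * tp_size


def expand_shape(existing: tuple, target: tuple) -> tuple:
    """Simplified model of the expand shape check: same rank only,
    no -1 placeholders. A dim may expand only from size 1 or if equal."""
    if len(existing) != len(target):
        raise ValueError("rank mismatch")
    for dim, (have, want) in enumerate(zip(existing, target)):
        if have != 1 and have != want:
            raise RuntimeError(
                f"The expanded size of the tensor ({want}) must match the "
                f"existing size ({have}) at non-singleton dimension {dim}."
            )
    return tuple(target)


if __name__ == "__main__":
    num_tokens, tp_size, hidden = 10, 4, 8
    padded = pad_to_multiple(num_tokens, tp_size)  # 12 padded tokens
    # Consistent padding: expanding works.
    print(expand_shape((padded, 1), (padded, hidden)))
    # Unpadded tensor (10 rows) expanded to the padded target (12 rows): fails at dim 0.
    try:
        expand_shape((num_tokens, 1), (padded, hidden))
    except RuntimeError as err:
        print(err)
```

Under these assumptions, the mismatch only shows up when one side of the computation uses the padded token count and the other does not, which is why it appears at dimension 0 (the token dimension).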
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.18.0
- vLLM-Ascend main: 9976e685b7
Signed-off-by: Wangbei25 <wangbei41@huawie.com>
Co-authored-by: Wangbei25 <wangbei41@huawie.com>