Fix incompatibility between fc1 and non-sp-padding (#7643)
cherry pick https://github.com/vllm-project/vllm-ascend/pull/7614
### What this PR does / why we need it?
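Fix the incompatibility between flashcomm1 (fc1) and non-sp-padding.
After PR
[non-sp-padding](https://github.com/vllm-project/vllm-ascend/pull/7297),
running kimi2.5 with flashcomm1 enabled raises an error: "The expanded size of the
tensor do not match the existing size at non-singleton dimension 0."
The fix allows DP padding in `_sync_batch_across_dp` whenever
`enable_sp(self.vllm_config)` is true, not only when `cudagraph_mode` is not
`CUDAGraphMode.NONE`.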
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.18.0
- vLLM-Ascend main: 9976e685b7
Signed-off-by: Wangbei25 <wangbei41@huawie.com>
Co-authored-by: Wangbei25 <wangbei41@huawie.com>
@@ -1976,7 +1976,7 @@ class NPUModelRunner(GPUModelRunner):
         _, num_tokens_across_dp, synced_cudagraph_mode = self._sync_batch_across_dp(
             num_tokens_padded=num_tokens_padded,
             cudagraph_mode=cudagraph_mode.value,
-            allow_dp_padding=cudagraph_mode != CUDAGraphMode.NONE,
+            allow_dp_padding=(cudagraph_mode != CUDAGraphMode.NONE) or enable_sp(self.vllm_config),
         )
 
         # Extract DP padding if there is any
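
The changed condition can be illustrated in isolation. The following is a minimal sketch, not the actual vLLM-Ascend code: the `CUDAGraphMode` enum is a simplified stand-in, `sp_enabled` stands in for the result of `enable_sp(self.vllm_config)` (assumed here to be a plain boolean), and both function names are hypothetical.

```python
from enum import Enum


class CUDAGraphMode(Enum):
    # Simplified stand-in for the real enum; only NONE vs. non-NONE matters here.
    NONE = 0
    PIECEWISE = 1
    FULL = 2


def allow_dp_padding_before(cudagraph_mode: CUDAGraphMode) -> bool:
    """Condition before the patch: pad across DP ranks only when graphs are used."""
    return cudagraph_mode != CUDAGraphMode.NONE


def allow_dp_padding_after(cudagraph_mode: CUDAGraphMode, sp_enabled: bool) -> bool:
    """Condition after the patch: also pad when sequence parallelism is enabled."""
    return (cudagraph_mode != CUDAGraphMode.NONE) or sp_enabled


if __name__ == "__main__":
    # Show every combination so the single behavioral difference is visible.
    for mode in CUDAGraphMode:
        for sp in (False, True):
            before = allow_dp_padding_before(mode)
            after = allow_dp_padding_after(mode, sp)
            print(f"{mode.name:<9} sp={sp!s:<5} before={before!s:<5} after={after}")
```

Under these assumptions, the only combination whose result changes is `CUDAGraphMode.NONE` with sequence parallelism enabled: padding was previously disallowed there and is now allowed.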