Update vllm pin to 12.24 (#5307)

### What this PR does / why we need it?
Bump the vLLM commit pin and fix the breakage introduced by the following upstream vLLM PR:
1. [Add MiMo-V2-Flash support](https://github.com/vllm-project/vllm/pull/30836)

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Co-authored-by: zxwang <1476209578@qq.com>

- vLLM version: release/v0.13.0
- vLLM main: 5fbfa8d9ef

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: zxwang <1476209578@qq.com>
Co-authored-by: zxwang <1476209578@qq.com>
Author: Nengjun Ma (committed by GitHub)
Date: 2025-12-24 17:24:31 +08:00
Commit: 42c989a437 (parent: a3f65b938f)
5 changed files with 8 additions and 6 deletions

@@ -34,7 +34,7 @@ jobs:
 steps:
 - name: Get vLLM version
 run: |
-VLLM_COMMIT=5fbfa8d9ef15948599631baeb91e8220b2ee9bcc
+VLLM_COMMIT=bc0a5a0c089844b17cb93f3294348f411e523586
 echo "VLLM_COMMIT=https://github.com/vllm-project/vllm/commit/$VLLM_COMMIT" >> $GITHUB_ENV
 - name: Checkout repository
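The step above appends a `NAME=value` line to the file named by `$GITHUB_ENV`. GitHub Actions then exposes that variable to the later steps of the same job. Note that `VLLM_COMMIT` ends up holding the full commit URL, not the bare SHA. The Python sketch below models this mechanism under simplifying assumptions. It handles only the single-line `NAME=value` form and ignores the multiline delimiter syntax. `append_env` and `load_env` are hypothetical helpers, not part of any GitHub API.

```python
from __future__ import annotations

import os
import tempfile

# Base URL used by the workflow step when building the VLLM_COMMIT value.
VLLM_REPO = "https://github.com/vllm-project/vllm"


def append_env(env_file: str, name: str, value: str) -> None:
    """Append one NAME=value line, like `echo "NAME=value" >> $GITHUB_ENV`."""
    with open(env_file, "a", encoding="utf-8") as f:
        f.write(f"{name}={value}\n")


def load_env(env_file: str) -> dict[str, str]:
    """Parse NAME=value lines (simplified: single-line values only).

    Only the first '=' separates name from value, so values may contain '='.
    In this sketch, a later line for the same name overrides an earlier one.
    """
    env: dict[str, str] = {}
    with open(env_file, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            name, _, value = line.partition("=")
            env[name] = value
    return env


if __name__ == "__main__":
    commit = "bc0a5a0c089844b17cb93f3294348f411e523586"  # new pin from this PR
    with tempfile.TemporaryDirectory() as tmp:
        env_file = os.path.join(tmp, "github_env")
        append_env(env_file, "VLLM_COMMIT", f"{VLLM_REPO}/commit/{commit}")
        print(load_env(env_file)["VLLM_COMMIT"])
```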

@@ -74,7 +74,7 @@ jobs:
 name: e2e-full
 strategy:
 matrix:
-vllm_version: [5fbfa8d9ef15948599631baeb91e8220b2ee9bcc, v0.13.0]
+vllm_version: [bc0a5a0c089844b17cb93f3294348f411e523586, v0.13.0]
 needs: [changes]
 if: ${{ needs.changes.outputs.e2e_tracker == 'true' }}
 uses: ./.github/workflows/_e2e_test.yaml

@@ -42,7 +42,7 @@ jobs:
 lint:
 uses: ./.github/workflows/_pre_commit.yml
 with:
-vllm: 5fbfa8d9ef15948599631baeb91e8220b2ee9bcc
+vllm: bc0a5a0c089844b17cb93f3294348f411e523586
 changes:
 runs-on: linux-aarch64-a2-0
 outputs:
@@ -90,7 +90,7 @@ jobs:
 SOC_VERSION: ascend910b1
 strategy:
 matrix:
-vllm_version: [5fbfa8d9ef15948599631baeb91e8220b2ee9bcc, v0.13.0]
+vllm_version: [bc0a5a0c089844b17cb93f3294348f411e523586, v0.13.0]
 steps:
 - name: Free up disk space
@@ -160,7 +160,7 @@ jobs:
 name: e2e-light
 strategy:
 matrix:
-vllm_version: [5fbfa8d9ef15948599631baeb91e8220b2ee9bcc, v0.13.0]
+vllm_version: [bc0a5a0c089844b17cb93f3294348f411e523586, v0.13.0]
 # Note (yikun): If CI resource are limited we can split job into two chain jobs
 needs: [lint, changes]
 # only trigger e2e test after lint passed and the change is e2e related with pull request.
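Each `matrix.vllm_version` list in these workflows makes GitHub Actions run the job once per entry. As a result, the e2e suites execute against both the newly pinned commit and the `v0.13.0` tag. The rough sketch below models how a matrix expands into the Cartesian product of its keys. It ignores `include`/`exclude`, and `expand_matrix` is a hypothetical helper, not a GitHub API.

```python
from __future__ import annotations

from itertools import product


def expand_matrix(matrix: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one job configuration per combination of matrix values.

    Simplified model: keys are combined as a Cartesian product in insertion
    order; GitHub's `include`/`exclude` rules are not modelled.
    """
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


if __name__ == "__main__":
    jobs = expand_matrix(
        {"vllm_version": ["bc0a5a0c089844b17cb93f3294348f411e523586", "v0.13.0"]}
    )
    for job in jobs:
        print(job["vllm_version"])
```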

@@ -50,7 +50,7 @@ If you're using v0.7.3, don't forget to install [mindie-turbo](https://pypi.org/
 For main branch of vLLM Ascend, we usually make it compatible with the latest vLLM release and a newer commit hash of vLLM. Please note that this table is usually updated. Please check it regularly.
 | vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
 |-------------|--------------|------------------|-------------|--------------------|
-| main | 5fbfa8d9ef15948599631baeb91e8220b2ee9bcc, v0.13.0 tag | >= 3.10, < 3.12 | 8.3.RC2 | 2.8.0 / 2.8.0 |
+| main | bc0a5a0c089844b17cb93f3294348f411e523586, v0.13.0 tag | >= 3.10, < 3.12 | 8.3.RC2 | 2.8.0 / 2.8.0 |
 ## Release cadence

@@ -109,7 +109,9 @@ class AscendQKVParallelLinear(QKVParallelLinear):
 *,
 return_bias: bool = True,
 disable_tp: bool = False,
+v_head_size: int | None = None,
 ):
+self.v_head_size = v_head_size if v_head_size is not None else head_size
 self.custom_op, _, tp_size = get_parallel_op(disable_tp, prefix, self,
 "column")
 # TODO(realliujiaxu): Replace the initialization code below with super().__init__ after linear of vllm supports custom comm group
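The added `v_head_size` argument lets the value projection use a head size different from the query/key head size. When the argument is not given it falls back to `head_size`, so existing callers keep their behavior. The sketch below illustrates the effect on the fused QKV projection by computing per-rank output widths. It assumes the layout mirrors upstream vLLM's `QKVParallelLinear`: Q and K are sized by `head_size`, V by `v_head_size`, and each rank keeps one replicated KV head when there are fewer KV heads than tensor-parallel ranks. `qkv_output_sizes` is a hypothetical helper, and the numbers are illustrative, not MiMo-V2-Flash's actual configuration.

```python
from __future__ import annotations


def qkv_output_sizes(
    head_size: int,
    total_num_heads: int,
    total_num_kv_heads: int,
    tp_size: int,
    v_head_size: int | None = None,
) -> list[int]:
    """Per-rank widths of the fused [Q, K, V] projection (assumed layout)."""
    # Same fallback as the added line in the diff.
    v_head_size = v_head_size if v_head_size is not None else head_size
    num_heads = total_num_heads // tp_size
    # With fewer KV heads than ranks, each rank keeps one (replicated) KV head.
    num_kv_heads = max(1, total_num_kv_heads // tp_size)
    return [
        num_heads * head_size,
        num_kv_heads * head_size,
        num_kv_heads * v_head_size,
    ]


if __name__ == "__main__":
    # Illustrative numbers only.
    print(qkv_output_sizes(128, 32, 8, tp_size=2))  # [2048, 512, 512]
    print(qkv_output_sizes(192, 32, 8, tp_size=2, v_head_size=128))  # [3072, 768, 512]
```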