[Main2Main] Upgrade vllm commit to 1230 (#5495)

### What this PR does / why we need it?

Upgrade vllm commit to 1230

Affected by https://github.com/vllm-project/vllm/pull/27614 (and the
core PR https://github.com/vllm-project/vllm/pull/26866), we have to
make the following changes:

1. Modify `tests/e2e/multicard/test_aclgraph_capture_replay.py` to keep
compatible with both vllm version of `v0.13.0` and latest main commitID,
while vllm enables async scheduling by default
2. Skip `test_guided_decoding.py` due to xgrammar errors
(https://github.com/vllm-project/vllm-ascend/issues/5524)

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
45c1ca1ca1

---------

Signed-off-by: wjunLu <wjunlu217@gmail.com>
This commit is contained in:
wjunLu
2025-12-31 09:44:35 +08:00
committed by GitHub
parent 5d9fde9819
commit 3c2d3e52e5
6 changed files with 10 additions and 7 deletions

View File

@@ -110,7 +110,8 @@ jobs:
pytest -sv --durations=0 tests/e2e/singlecard/test_completion_with_prompt_embeds.py pytest -sv --durations=0 tests/e2e/singlecard/test_completion_with_prompt_embeds.py
pytest -sv --durations=0 tests/e2e/singlecard/test_aclgraph_accuracy.py pytest -sv --durations=0 tests/e2e/singlecard/test_aclgraph_accuracy.py
pytest -sv --durations=0 tests/e2e/singlecard/test_async_scheduling.py pytest -sv --durations=0 tests/e2e/singlecard/test_async_scheduling.py
pytest -sv --durations=0 tests/e2e/singlecard/test_guided_decoding.py # xgrammar has parameter mismatching bug, please follows: https://github.com/vllm-project/vllm-ascend/issues/5524
# pytest -sv --durations=0 tests/e2e/singlecard/test_guided_decoding.py
# torch 2.8 doesn't work with lora, fix me # torch 2.8 doesn't work with lora, fix me
#pytest -sv --durations=0 tests/e2e/singlecard/test_ilama_lora.py #pytest -sv --durations=0 tests/e2e/singlecard/test_ilama_lora.py
pytest -sv --durations=0 tests/e2e/singlecard/test_profile_execute_duration.py pytest -sv --durations=0 tests/e2e/singlecard/test_profile_execute_duration.py

View File

@@ -34,7 +34,7 @@ jobs:
steps: steps:
- name: Get vLLM version - name: Get vLLM version
run: | run: |
VLLM_COMMIT=45c1ca1ca1ee8fa06df263c8715e8a412ff408d4 VLLM_COMMIT=7157596103666ee7ccb7008acee8bff8a8ff1731
echo "VLLM_COMMIT=https://github.com/vllm-project/vllm/commit/$VLLM_COMMIT" >> $GITHUB_ENV echo "VLLM_COMMIT=https://github.com/vllm-project/vllm/commit/$VLLM_COMMIT" >> $GITHUB_ENV
- name: Checkout repository - name: Checkout repository

View File

@@ -74,7 +74,7 @@ jobs:
name: e2e-full name: e2e-full
strategy: strategy:
matrix: matrix:
vllm_version: [45c1ca1ca1ee8fa06df263c8715e8a412ff408d4, v0.13.0] vllm_version: [7157596103666ee7ccb7008acee8bff8a8ff1731, v0.13.0]
needs: [changes] needs: [changes]
if: ${{ needs.changes.outputs.e2e_tracker == 'true' }} if: ${{ needs.changes.outputs.e2e_tracker == 'true' }}
uses: ./.github/workflows/_e2e_test.yaml uses: ./.github/workflows/_e2e_test.yaml

View File

@@ -42,7 +42,7 @@ jobs:
lint: lint:
uses: ./.github/workflows/_pre_commit.yml uses: ./.github/workflows/_pre_commit.yml
with: with:
vllm: 45c1ca1ca1ee8fa06df263c8715e8a412ff408d4 vllm: 7157596103666ee7ccb7008acee8bff8a8ff1731
changes: changes:
runs-on: linux-aarch64-a2-0 runs-on: linux-aarch64-a2-0
outputs: outputs:
@@ -90,7 +90,7 @@ jobs:
SOC_VERSION: ascend910b1 SOC_VERSION: ascend910b1
strategy: strategy:
matrix: matrix:
vllm_version: [45c1ca1ca1ee8fa06df263c8715e8a412ff408d4, v0.13.0] vllm_version: [7157596103666ee7ccb7008acee8bff8a8ff1731, v0.13.0]
steps: steps:
- name: Free up disk space - name: Free up disk space
@@ -163,7 +163,7 @@ jobs:
name: e2e-light name: e2e-light
strategy: strategy:
matrix: matrix:
vllm_version: [45c1ca1ca1ee8fa06df263c8715e8a412ff408d4, v0.13.0] vllm_version: [7157596103666ee7ccb7008acee8bff8a8ff1731, v0.13.0]
# Note (yikun): If CI resource are limited we can split job into two chain jobs # Note (yikun): If CI resource are limited we can split job into two chain jobs
needs: [lint, changes] needs: [lint, changes]
# only trigger e2e test after lint passed and the change is e2e related with pull request. # only trigger e2e test after lint passed and the change is e2e related with pull request.

View File

@@ -51,7 +51,7 @@ If you're using v0.7.3, don't forget to install [mindie-turbo](https://pypi.org/
For main branch of vLLM Ascend, we usually make it compatible with the latest vLLM release and a newer commit hash of vLLM. Please note that this table is usually updated. Please check it regularly. For main branch of vLLM Ascend, we usually make it compatible with the latest vLLM release and a newer commit hash of vLLM. Please note that this table is usually updated. Please check it regularly.
| vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | | vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
|-------------|--------------|------------------|-------------|--------------------| |-------------|--------------|------------------|-------------|--------------------|
| main | 45c1ca1ca1ee8fa06df263c8715e8a412ff408d4, v0.13.0 tag | >= 3.10, < 3.12 | 8.3.RC2 | 2.8.0 / 2.8.0 | | main | 7157596103666ee7ccb7008acee8bff8a8ff1731, v0.13.0 tag | >= 3.10, < 3.12 | 8.3.RC2 | 2.8.0 / 2.8.0 |
## Release cadence ## Release cadence

View File

@@ -107,6 +107,8 @@ def _run_worker_process(
quantization="ascend" if "W8A8" in model_path else None, quantization="ascend" if "W8A8" in model_path else None,
enable_expert_parallel=True if "DeepSeek" in model_path else False, enable_expert_parallel=True if "DeepSeek" in model_path else False,
trust_remote_code=True, trust_remote_code=True,
# vllm enables async scheduling by default, remove below when vllm >= 0.14.0
async_scheduling=False,
) )
# Expose model config to the main test process # Expose model config to the main test process