[CI] Fix server start failure when long weight loading (#7098)

### What this PR does / why we need it?

When loading large models (e.g., 163 shards), weight loading can exceed the default 600s timeout. Engine startup then times out with the error:

```shell
TimeoutError: Timed out waiting for engines to send initial message on input socket.
```

This PR increases `VLLM_ENGINE_READY_TIMEOUT_S` to 1800 in the nightly single-node e2e workflow to avoid this failure.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main: 4034c3d32e

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
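To reproduce the workaround outside CI, the same variable can be exported in the shell before the server starts. This is a minimal sketch, not part of the patch: the value 1800 simply mirrors the workflow change, and the launch command in the comment is a placeholder to adapt to your own model.

```shell
# Assumption: 1800 seconds mirrors the CI workflow value; larger models may need more.
export VLLM_ENGINE_READY_TIMEOUT_S=1800

# Confirm the value that child processes will inherit.
echo "Engine ready timeout: ${VLLM_ENGINE_READY_TIMEOUT_S}s"

# Then start the server from the same shell, for example:
#   vllm serve <your-model>
```

The variable must be set in the environment of the process that launches the engine, which is why the workflow defines it in the job-level `env` block rather than inside a single step.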
2026-03-13 08:52:56 +08:00
parent 7fe0469e27
commit 1f71da80eb
1 changed file with 3 additions and 0 deletions
--- a/.github/workflows/_e2e_nightly_single_node.yaml
+++ b/.github/workflows/_e2e_nightly_single_node.yaml
@@ -76,6 +76,7 @@ jobs:
      UV_INDEX_STRATEGY: unsafe-best-match
      UV_NO_CACHE: 1
      UV_SYSTEM_PYTHON: 1
+      VLLM_ENGINE_READY_TIMEOUT_S: 1800
    steps:
      - name: Check npu and CANN info
        run: |
@@ -204,6 +205,7 @@ jobs:
          VLLM_CI_RUNNER: ${{ inputs.runner }}
        working-directory: /vllm-workspace/vllm-ascend
        run: |
+          export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
          echo "Running pytest with tests path: ${{ inputs.tests }}"
          pytest -sv "${{ inputs.tests }}" \
          --ignore=tests/e2e/nightly/single_node/ops/singlecard_ops/test_fused_moe.py
@@ -217,6 +219,7 @@ jobs:
          CONFIG_YAML_PATH: ${{ inputs.config_file_path }}
        working-directory: /vllm-workspace/vllm-ascend
        run: |
+          export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
          echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib" >> ~/.bashrc
          echo "Running YAML-driven test with config: ${{ inputs.config_file_path }}"
          pytest -sv tests/e2e/nightly/single_node/models/scripts/test_single_node.py