[CI] Disable early exit to complete all tests (#6482)
### What this PR does / why we need it?
1. Disable early exit on the first test failure so that all tests run to completion.
2. Within each partition, tests are re-sorted by `estimated_time` in
ascending order. This allows the CI to cover as many test cases as
possible in the early stages.
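The first change only affects what happens after a test fails. The following is a minimal sketch of that control flow. It assumes each test is a hypothetical `(name, callable)` pair that returns `True` on success. The real `run_suite.py` runs actual test files, and its run loop is not shown in this commit. `run_tests` and `SuiteCase` are illustrative names only.

```python
from typing import Callable, List, Tuple

# Hypothetical model of a test: a name plus a callable returning True on success.
SuiteCase = Tuple[str, Callable[[], bool]]


def run_tests(cases: List[SuiteCase], continue_on_error: bool) -> List[str]:
    """Run cases in order and return the names of those that failed."""
    failed: List[str] = []
    for name, case in cases:
        if case():
            continue
        failed.append(name)
        if not continue_on_error:
            # Early exit: every case after the first failure is skipped,
            # so its result is never reported.
            break
    return failed


cases = [("a", lambda: True), ("b", lambda: False), ("c", lambda: False)]
print(run_tests(cases, continue_on_error=False))  # ['b']
print(run_tests(cases, continue_on_error=True))   # ['b', 'c']
```

With early exit, the failure in `c` stays hidden until `b` is fixed and the suite is re-run. Running to completion surfaces both failures in a single CI cycle.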
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.14.1
- vLLM main: dc917cceb8
---------
Signed-off-by: MrZ20 <2609716663@qq.com>
`.github/workflows/READMD.md` (4 changed lines)

```diff
@@ -47,8 +47,10 @@ To speed up CI execution, we support splitting large test suites into multiple p
 The partitioning algorithm uses a Greedy Approach to achieve load balancing, aiming to make the total estimated runtime of each partition as equal as possible.
 
 1. **Read Configuration**: The script reads all non-skipped test cases and their `estimated_time` from `config.yaml`.
-2. **Sort**: Test cases are sorted by `estimated_time` in descending order.
+2. **Sort(Balanced Assignment)**: Test cases are sorted by `estimated_time` in descending order. This ensures that the heaviest tasks are distributed first to achieve optimal load balancing across partitions.
 3. **Assign**: Iterating through the sorted test cases, each case is assigned to the partition (Bucket) with the current minimum total time.
+4. **Re-sort (Fast Feedback)**: Within each partition, tests are re-sorted by `estimated_time` in ascending order. This allows the CI to cover as many test cases as possible in the early stages.
+> TIP: If you need to prioritize a new test case, you can temporarily set its estimated_time to 0 to ensure it runs first, then update it to the actual value later.
 
 ### How to Modify Partitioning Logic
 
```
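The four README steps above can be sketched as follows. This is a minimal illustration, not the code in `run_suite.py`. `SuiteFile` is a hypothetical stand-in for a `config.yaml` entry. Using a min-heap to find the least-loaded partition is an assumption, because the `run_suite.py` hunk in this commit only shows the final re-sort lines of `auto_partition`, which the sketch reproduces. Sorting in descending order and always assigning to the least-loaded bucket is the classic longest-processing-time-first (LPT) heuristic.

```python
import heapq
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SuiteFile:
    # Hypothetical stand-in for a non-skipped entry read from config.yaml.
    name: str
    estimated_time: float


def greedy_partition(files: List[SuiteFile], rank: int, size: int) -> List[SuiteFile]:
    """Return the files assigned to partition `rank` out of `size` partitions."""
    # Sort (Balanced Assignment): visit files from longest to shortest estimate.
    order = sorted(range(len(files)), key=lambda i: files[i].estimated_time, reverse=True)

    # Assign: a min-heap of (total_time, partition_id) always pops the
    # partition with the current minimum total time (ties go to the lower id).
    heap: List[Tuple[float, int]] = [(0.0, p) for p in range(size)]
    partitions: List[List[int]] = [[] for _ in range(size)]
    for i in order:
        total, p = heapq.heappop(heap)
        partitions[p].append(i)
        heapq.heappush(heap, (total + files[i].estimated_time, p))

    # Re-sort (Fast Feedback): shortest files first within this partition.
    indices = partitions[rank]
    indices.sort(key=lambda i: files[i].estimated_time)
    return [files[i] for i in indices]


files = [SuiteFile("a", 10), SuiteFile("b", 8), SuiteFile("c", 5),
         SuiteFile("d", 4), SuiteFile("e", 3), SuiteFile("f", 1)]
print([f.name for f in greedy_partition(files, 0, 2)])  # ['f', 'd', 'a'] (total 15)
print([f.name for f in greedy_partition(files, 1, 2)])  # ['e', 'c', 'b'] (total 16)
```

If you add `SuiteFile("new", 0)` to this list, the sketch places it in partition 0, which is the lighter partition at the moment it is assigned. Because of the ascending re-sort, it also runs first in that partition. The README's TIP relies on this behavior.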
`.github/workflows/_e2e_test.yaml` (5 changed lines)

```diff
@@ -25,6 +25,7 @@ jobs:
     if: ${{ inputs.type == 'light' }}
     runs-on: linux-aarch64-a2b3-1
     strategy:
+      fail-fast: false
       matrix:
         part: [0]
     container:
@@ -89,6 +90,7 @@ jobs:
     if: ${{ inputs.type == 'full' }}
     runs-on: linux-aarch64-a2b3-1
     strategy:
+      fail-fast: false
       matrix:
         part: [0, 1]
     container:
@@ -153,6 +155,7 @@ jobs:
     if: ${{ inputs.type == 'light' }}
     runs-on: linux-aarch64-a3-2
     strategy:
+      fail-fast: false
       matrix:
         part: [0]
     container:
@@ -216,6 +219,7 @@ jobs:
     if: ${{ inputs.type == 'full' }}
     runs-on: linux-aarch64-a3-2
     strategy:
+      fail-fast: false
       matrix:
         part: [0]
     container:
@@ -287,6 +291,7 @@ jobs:
     if: ${{ inputs.type == 'full' }}
     runs-on: linux-aarch64-a3-4
     strategy:
+      fail-fast: false
       matrix:
         part: [0]
     container:
```
`.github/workflows/scripts/run_suite.py` (3 changed lines)

```diff
@@ -74,6 +74,7 @@ def auto_partition(files, rank, size):
 
     # Return the files corresponding to the indices in the specified rank's partition
     indices = partitions[rank]
+    indices.sort(key=lambda i: files[i].estimated_time)
     return [files[i] for i in indices]
 
 
@@ -189,7 +190,7 @@ def main():
     arg_parser.add_argument(
         "--continue-on-error",
         action="store_true",
-        default=False,
+        default=True,
         help="Continue running remaining tests even if one fails (useful for nightly tests)",
     )
     args = arg_parser.parse_args()
```
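The `run_suite.py` change combines `action="store_true"` with `default=True`. As a result, `--continue-on-error` evaluates to `True` whether or not the flag is passed, so early exit can no longer be selected from the command line. The sketch below demonstrates this with plain `argparse`. It also shows `argparse.BooleanOptionalAction` (Python 3.9+) as one possible way to keep the new default while still accepting `--no-continue-on-error`. That variant is only a suggestion and is not part of this commit. Both helper function names are hypothetical.

```python
import argparse


def parser_as_in_diff() -> argparse.ArgumentParser:
    """Mirrors the flag definition from the run_suite.py hunk."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--continue-on-error",
        action="store_true",
        default=True,
        help="Continue running remaining tests even if one fails",
    )
    return parser


def parser_with_opt_out() -> argparse.ArgumentParser:
    """Hypothetical variant: same default, but early exit can be re-enabled."""
    parser = argparse.ArgumentParser()
    # BooleanOptionalAction registers both --continue-on-error and
    # --no-continue-on-error for the same destination.
    parser.add_argument(
        "--continue-on-error",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Continue running remaining tests even if one fails",
    )
    return parser


# store_true with default=True: the value is True either way.
print(parser_as_in_diff().parse_args([]).continue_on_error)                       # True
print(parser_as_in_diff().parse_args(["--continue-on-error"]).continue_on_error)  # True
# The BooleanOptionalAction variant can still restore early exit.
print(parser_with_opt_out().parse_args(["--no-continue-on-error"]).continue_on_error)  # False
```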