### What this PR does / why we need it?
1. Disable early exit on the first error so that all tests run to
completion.
2. Within each partition, tests are re-sorted by `estimated_time` in
ascending order. This allows the CI to cover as many test cases as
possible in the early stages.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.14.1
- vLLM main:
dc917cceb8
---------
Signed-off-by: MrZ20 <2609716663@qq.com>
E2E Test Workflow Guide
This document provides a guide on how to manage and extend the E2E test suite for vllm-ascend. It covers how to add new test cases and understand the automatic partitioning mechanism.
1. Adding a New Test Case
All E2E test cases are defined and managed in the .github/workflows/scripts/config.yaml file.
Steps
- Prepare the Test Script: Ensure your test script (`.py` file) is placed in the appropriate location under the `tests/e2e/` directory (e.g., `tests/e2e/singlecard/` or `tests/e2e/multicard/`).
- Modify `config.yaml`: Open `.github/workflows/scripts/config.yaml` and locate the corresponding test suite (e.g., `e2e-singlecard` or `e2e-multicard-2-cards`).
- Add Configuration Entry: Add a new entry under the corresponding list. Each entry contains the following fields:
  - `name`: The relative path to the test file. If you only need to run a specific test function within the file, use `::` as a separator, e.g., `path/to/test.py::test_func`.
  - `estimated_time`: The estimated time (in seconds) required to run the test. This field is crucial as it is used for automatic load balancing (partitioning).
  - `is_skipped` (Optional): If set to `true`, the test will be skipped.
Example
Suppose you want to add a new test named tests/e2e/singlecard/test_new_feature.py with an estimated runtime of 120 seconds:
    suites:
      e2e-singlecard:
        # ... other existing tests ...
        - name: tests/e2e/singlecard/test_new_feature.py
          estimated_time: 120
To add a specific test function:
        - name: tests/e2e/singlecard/test_new_feature.py::test_specific_case
          estimated_time: 60
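To make the `::` convention in the `name` field concrete, here is a minimal sketch of how such a value could be split into a file path and an optional test function. The names `Target` and `split_test_name` are hypothetical and do not come from `run_suite.py`; the actual script may handle the string differently (for example, by passing it to pytest unchanged).

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    """Hypothetical container for one parsed `name` value."""
    path: str
    function: Optional[str]


def split_test_name(name: str) -> Target:
    """Split 'path/to/test.py::test_func' at the first '::'.

    Returns the file path and, if present, everything after the separator.
    """
    path, sep, func = name.partition("::")
    return Target(path=path, function=func if sep else None)


if __name__ == "__main__":
    print(split_test_name("tests/e2e/singlecard/test_new_feature.py"))
    print(split_test_name(
        "tests/e2e/singlecard/test_new_feature.py::test_specific_case"))
```

An entry without `::` yields `function=None`, meaning the whole file is the target.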
2. Automatic Partitioning Mechanism
To speed up CI execution, we support splitting large test suites into multiple parallel Jobs (partitions). The partitioning logic is primarily implemented in the auto_partition function in .github/workflows/scripts/run_suite.py.
Principle
The partitioning algorithm uses a Greedy Approach to achieve load balancing, aiming to make the total estimated runtime of each partition as equal as possible.
- Read Configuration: The script reads all non-skipped test cases and their `estimated_time` from `config.yaml`.
- Sort (Balanced Assignment): Test cases are sorted by `estimated_time` in descending order. Distributing the heaviest tasks first keeps the partition totals close to one another.
- Assign: Iterating through the sorted test cases, each case is assigned to the partition (Bucket) with the current minimum total time.
- Re-sort (Fast Feedback): Within each partition, tests are re-sorted by `estimated_time` in ascending order. This allows the CI to cover as many test cases as possible in the early stages.

TIP: If you need to prioritize a new test case, you can temporarily set its `estimated_time` to 0 to ensure it runs first, then update it to the actual value later.
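The four steps above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code of `auto_partition` itself: the function name `partition_sketch`, the dictionary-based entries, and the tie-breaking rule (lowest bucket index wins when totals are equal) are all assumptions.

```python
import heapq
from typing import Dict, List


def partition_sketch(tests: List[Dict], rank: int, size: int) -> List[Dict]:
    """Return the tests assigned to partition `rank` out of `size` partitions."""
    if size < 1 or not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")

    # Step 1: keep only tests that are not marked as skipped.
    active = [t for t in tests if not t.get("is_skipped", False)]

    # Step 2: heaviest tests first, so they are distributed before light ones.
    ordered = sorted(active, key=lambda t: t["estimated_time"], reverse=True)

    # Step 3: min-heap of (total_time, bucket_index); the smallest total pops
    # first, and ties fall back to the lowest bucket index (an assumption).
    heap = [(0, i) for i in range(size)]
    buckets: List[List[Dict]] = [[] for _ in range(size)]
    for test in ordered:
        total, idx = heapq.heappop(heap)
        buckets[idx].append(test)
        heapq.heappush(heap, (total + test["estimated_time"], idx))

    # Step 4: within the chosen bucket, run the shortest tests first.
    return sorted(buckets[rank], key=lambda t: t["estimated_time"])


if __name__ == "__main__":
    demo = [
        {"name": "a", "estimated_time": 300},
        {"name": "b", "estimated_time": 200},
        {"name": "c", "estimated_time": 100},
        {"name": "d", "estimated_time": 50},
        {"name": "e", "estimated_time": 999, "is_skipped": True},
    ]
    for r in range(2):
        print(r, [t["name"] for t in partition_sketch(demo, r, 2)])
```

With this demo input and two partitions, partition 0 receives `d, a` (350 seconds in total) and partition 1 receives `c, b` (300 seconds in total); the skipped entry `e` is never assigned.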
How to Modify Partitioning Logic
If you need to adjust the partitioning strategy, please modify the .github/workflows/scripts/run_suite.py file.
- Algorithm Location: `auto_partition` function.
- Input Parameters:
  - `files`: List of test files (including `estimated_time`).
  - `rank`: Index of the current partition (0 to size-1).
  - `size`: Total number of partitions.
- Invocation: CI workflows (e.g., `.github/workflows/_e2e_test.yaml`) call the script via command-line arguments:
  `python3 .github/workflows/scripts/run_suite.py --suite <suite_name> --auto-partition-id <index> --auto-partition-size <total_count>`
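For orientation, the command-line interface described above could be parsed along these lines. Only the three flag names are taken from this guide; the function name `parse_args_sketch`, the defaults, the help strings, and the validation rules are assumptions and may not match `run_suite.py`.

```python
import argparse
from typing import List, Optional


def parse_args_sketch(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the flags named in the guide; everything else is assumed."""
    parser = argparse.ArgumentParser(description="Run an E2E test suite (sketch).")
    parser.add_argument("--suite", required=True,
                        help="Suite name, e.g. e2e-singlecard")
    parser.add_argument("--auto-partition-id", type=int, default=None,
                        help="Index of this partition (0 to size-1)")
    parser.add_argument("--auto-partition-size", type=int, default=None,
                        help="Total number of partitions")
    args = parser.parse_args(argv)

    # Assumed rule: the two partition flags are given together or not at all.
    if (args.auto_partition_id is None) != (args.auto_partition_size is None):
        parser.error("--auto-partition-id and --auto-partition-size "
                     "must be used together")
    # Assumed rule: the index must fall inside [0, size).
    if args.auto_partition_size is not None and not (
            0 <= args.auto_partition_id < args.auto_partition_size):
        parser.error("--auto-partition-id must be in [0, size)")
    return args


if __name__ == "__main__":
    print(parse_args_sketch())
```

argparse converts the dashes in flag names to underscores, so the parsed values are available as `args.auto_partition_id` and `args.auto_partition_size`, which correspond to the `rank` and `size` parameters listed above.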
Notes
- Accurate Estimated Time: To achieve the best load balancing, please provide an accurate `estimated_time` in `config.yaml`. If a new test is very time-consuming but its estimated time is set too low, it may cause a specific partition to time out.
- Number of Partitions: The number of partitions (`auto-partition-size`) is typically defined in the `strategy.matrix` of the GitHub Actions workflow definition file (e.g., `_e2e_test.yaml`).
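To illustrate why an accurate `estimated_time` matters, the following sketch assigns tests by their estimated time and then sums the time they actually take. The function name `actual_partition_loads`, the tuple layout, and all numbers are made up for illustration; the greedy assignment is a simplified stand-in for the real logic.

```python
from typing import List, Tuple

# Each test is (name, estimated_time, actual_time), all in seconds.
TestTiming = Tuple[str, float, float]


def actual_partition_loads(tests: List[TestTiming], size: int) -> List[float]:
    """Assign greedily by estimated time, then report actual time per partition."""
    est_totals = [0.0] * size
    actual_totals = [0.0] * size
    # Heaviest estimate first, each test goes to the currently lightest bucket.
    for _name, est, actual in sorted(tests, key=lambda t: t[1], reverse=True):
        idx = est_totals.index(min(est_totals))
        est_totals[idx] += est
        actual_totals[idx] += actual
    return actual_totals


if __name__ == "__main__":
    base = [("a", 300, 300), ("b", 200, 200), ("c", 100, 100), ("d", 50, 50)]
    # A 600-second test declared as 10 seconds lands on an already full bucket.
    print(actual_partition_loads(base + [("n", 10, 600)], 2))   # [350.0, 900.0]
    # With an accurate estimate the load is spread evenly.
    print(actual_partition_loads(base + [("n", 600, 600)], 2))  # [650.0, 600.0]
```

In the first call one partition ends up running for 900 seconds against 350 for the other, while the accurate estimate yields 650 and 600.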
3. Running Tests Locally
You can use the run_suite.py script to run test suites locally:
# Run the full e2e-singlecard suite
python3 .github/workflows/scripts/run_suite.py --suite e2e-singlecard
# Simulate partitioned execution (e.g., partition 0 of 2)
python3 .github/workflows/scripts/run_suite.py --suite e2e-singlecard --auto-partition-id 0 --auto-partition-size 2