### What this PR does / why we need it?
This patch adds an auto-partition feature for tests. For example, before this
PR the e2e single-card tests ran for 2h40min. With auto-partitioning, the test
cases are automatically allocated into the required n parts based on their
estimated durations (greedy strategy), and the parts run in parallel.
The advantage is that the overall test duration ideally drops to roughly
1/n of the original.
### Does this PR introduce _any_ user-facing change?
- Before: the e2e single-card test takes 2h40min.
- After: the e2e single-card test takes 1h13min.
### How was this patch tested?
```shell
python .github/workflows/scripts/run_suite.py --auto-partition-size 2 --auto-partition-id 0
args=Namespace(timeout_per_file=2000, suite='e2e-singlecard', auto_partition_id=0, auto_partition_size=2, continue_on_error=False, enable_retry=False, max_attempts=2, retry_wait_seconds=60, retry_timeout_increase=600)
+----------------+--------------------+
| Suite | Partition |
|----------------+--------------------|
| e2e-singlecard | 1/2 (0-based id=0) |
+----------------+--------------------+
✅ Enabled 13 test(s) (est total 4020.0s):
- tests/e2e/singlecard/spec_decode/test_v1_spec_decode.py (est_time=1800)
- tests/e2e/singlecard/test_aclgraph_accuracy.py (est_time=480)
- tests/e2e/singlecard/test_guided_decoding.py (est_time=354)
- tests/e2e/singlecard/test_batch_invariant.py (est_time=320)
- tests/e2e/singlecard/pooling/test_embedding.py (est_time=270)
- tests/e2e/singlecard/test_quantization.py (est_time=200)
- tests/e2e/singlecard/test_llama32_lora.py (est_time=162)
- tests/e2e/singlecard/test_cpu_offloading.py (est_time=132)
- tests/e2e/singlecard/pooling/test_classification.py (est_time=120)
- tests/e2e/singlecard/test_camem.py (est_time=77)
- tests/e2e/singlecard/compile/test_norm_quant_fusion.py (est_time=70)
- tests/e2e/singlecard/test_auto_fit_max_mode_len.py (est_time=25)
- tests/e2e/singlecard/test_profile_execute_duration.py (est_time=10)
python .github/workflows/scripts/run_suite.py --auto-partition-size 2 --auto-partition-id 1
args=Namespace(timeout_per_file=2000, suite='e2e-singlecard', auto_partition_id=1, auto_partition_size=2, continue_on_error=False, enable_retry=False, max_attempts=2, retry_wait_seconds=60, retry_timeout_increase=600)
+----------------+--------------------+
| Suite | Partition |
|----------------+--------------------|
| e2e-singlecard | 2/2 (0-based id=1) |
+----------------+--------------------+
✅ Enabled 13 test(s) (est total 4025.0s):
- tests/e2e/singlecard/spec_decode/test_mtp_eagle_correctness.py (est_time=1500)
- tests/e2e/singlecard/pooling/test_scoring.py (est_time=500)
- tests/e2e/singlecard/test_aclgraph_batch_invariant.py (est_time=410)
- tests/e2e/singlecard/test_vlm.py (est_time=354)
- tests/e2e/singlecard/test_models.py (est_time=300)
- tests/e2e/singlecard/test_multistream_overlap_shared_expert.py (est_time=200)
- tests/e2e/singlecard/test_sampler.py (est_time=200)
- tests/e2e/singlecard/test_async_scheduling.py (est_time=150)
- tests/e2e/singlecard/test_aclgraph_mem.py (est_time=130)
- tests/e2e/singlecard/test_ilama_lora.py (est_time=95)
- tests/e2e/singlecard/test_completion_with_prompt_embeds.py (est_time=76)
- tests/e2e/singlecard/test_qwen3_multi_loras.py (est_time=65)
- tests/e2e/singlecard/test_xlite.py (est_time=45)
```
- vLLM version: v0.14.1
- vLLM main: dc917cceb8
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
# E2E Test Workflow Guide
This document provides a guide on how to manage and extend the E2E test suite for vllm-ascend. It covers how to add new test cases and understand the automatic partitioning mechanism.
## 1. Adding a New Test Case
All E2E test cases are defined and managed in the `.github/workflows/scripts/config.yaml` file.
### Steps
- Prepare the Test Script: Ensure your test script (`.py` file) is placed in the appropriate location under the `tests/e2e/` directory (e.g., `tests/e2e/singlecard/` or `tests/e2e/multicard/`).
- Modify `config.yaml`: Open `.github/workflows/scripts/config.yaml` and locate the corresponding test suite (e.g., `e2e-singlecard` or `e2e-multicard-2-cards`).
- Add Configuration Entry: Add a new entry under the corresponding list. Each entry contains the following fields:
  - `name`: The relative path to the test file. If you only need to run a specific test function within the file, use `::` as a separator, e.g., `path/to/test.py::test_func`.
  - `estimated_time`: The estimated time (in seconds) required to run the test. This field is crucial because it is used for automatic load balancing (partitioning).
  - `is_skipped` (optional): If set to `true`, the test will be skipped.
### Example
Suppose you want to add a new test named `tests/e2e/singlecard/test_new_feature.py` with an estimated runtime of 120 seconds:
```yaml
suites:
  e2e-singlecard:
    # ... other existing tests ...
    - name: tests/e2e/singlecard/test_new_feature.py
      estimated_time: 120
```
To add a specific test function:
```yaml
- name: tests/e2e/singlecard/test_new_feature.py::test_specific_case
  estimated_time: 60
```
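
To make the entry fields concrete, here is a minimal Python sketch of how an entry could be interpreted once `config.yaml` has been loaded (for example with PyYAML's `yaml.safe_load`). The `SuiteEntry` class and the `parse_entry`/`enabled_entries` helpers are hypothetical illustrations, not code from `run_suite.py`; the sketch assumes each suite loads as a list of dicts carrying the documented `name`, `estimated_time`, and optional `is_skipped` keys.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SuiteEntry:
    """Hypothetical in-memory form of one config.yaml entry."""
    path: str                 # test file, relative to the repo root
    test_func: Optional[str]  # part after the first "::", if any
    estimated_time: float     # seconds; drives load balancing
    is_skipped: bool = False


def parse_entry(raw: Dict[str, Any]) -> SuiteEntry:
    """Split `name` on the first "::" and normalize the other fields."""
    path, sep, func = raw["name"].partition("::")
    return SuiteEntry(
        path=path,
        test_func=func if sep else None,
        estimated_time=float(raw["estimated_time"]),
        # is_skipped is optional; a missing key means "run it"
        is_skipped=bool(raw.get("is_skipped", False)),
    )


def enabled_entries(suite: List[Dict[str, Any]]) -> List[SuiteEntry]:
    """Drop entries marked `is_skipped: true`."""
    return [e for e in map(parse_entry, suite) if not e.is_skipped]


# The two entries from the example above, as a YAML loader would return them.
example_suite = [
    {"name": "tests/e2e/singlecard/test_new_feature.py", "estimated_time": 120},
    {"name": "tests/e2e/singlecard/test_new_feature.py::test_specific_case",
     "estimated_time": 60},
]
for entry in enabled_entries(example_suite):
    print(entry.path, entry.test_func, entry.estimated_time)
```

Splitting only on the first `::` keeps pytest-style node IDs such as `test.py::TestClass::test_method` intact, with `TestClass::test_method` ending up in `test_func`.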
## 2. Automatic Partitioning Mechanism
To speed up CI execution, we support splitting large test suites into multiple parallel jobs (partitions). The partitioning logic is primarily implemented in the `auto_partition` function in `.github/workflows/scripts/run_suite.py`.
### Principle
The partitioning algorithm uses a greedy approach to load balancing, aiming to make the total estimated runtime of each partition as equal as possible.
- Read Configuration: The script reads all non-skipped test cases and their `estimated_time` from `config.yaml`.
- Sort: Test cases are sorted by `estimated_time` in descending order.
- Assign: Iterating through the sorted test cases, each case is assigned to the partition (bucket) with the current minimum total time.
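
The three steps above amount to the classic longest-processing-time-first (LPT) heuristic. The following is a minimal Python illustration of that strategy, not the actual `auto_partition` code: the `(name, estimated_time)` tuple format, the helper names, and the tie-breaking rule (on equal totals the lower bucket index wins) are assumptions, while the `files`/`rank`/`size` parameter names mirror the input parameters documented for `auto_partition`.

```python
import heapq
from typing import List, Tuple

# Assumed entry format: (test name, estimated_time in seconds).
Entry = Tuple[str, float]


def greedy_buckets(files: List[Entry], size: int) -> List[List[Entry]]:
    """Spread `files` over `size` buckets with near-equal estimated totals."""
    if size < 1:
        raise ValueError("size must be >= 1")
    buckets: List[List[Entry]] = [[] for _ in range(size)]
    # Min-heap of (current total, bucket index); equal totals pop the lower index.
    heap = [(0.0, i) for i in range(size)]
    heapq.heapify(heap)
    # Sort: longest estimated_time first (sorted() keeps file order on ties).
    for name, est in sorted(files, key=lambda f: f[1], reverse=True):
        # Assign: take the bucket with the smallest total so far.
        total, idx = heapq.heappop(heap)
        buckets[idx].append((name, est))
        heapq.heappush(heap, (total + est, idx))
    return buckets


def partition_for_rank(files: List[Entry], rank: int, size: int) -> List[Entry]:
    """Tests that partition `rank` (0-based, 0..size-1) should run."""
    if not 0 <= rank < size:
        raise ValueError(f"rank must be in [0, {size - 1}], got {rank}")
    return greedy_buckets(files, size)[rank]
```

For example, `partition_for_rank([("a.py", 300), ("b.py", 200), ("c.py", 200), ("d.py", 100)], 0, 2)` returns `[("a.py", 300), ("d.py", 100)]`, leaving `b.py` and `c.py` for rank 1, so both partitions total 400 s. Fed the 26 estimates listed in the PR's test output, this sketch also yields 13 tests per partition with totals of 4020 s and 4025 s, matching the two partitions shown there. Using a heap keeps each assignment at O(log size), though for the handful of partitions used in CI a plain `min()` over the totals would work just as well.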
### How to Modify Partitioning Logic
If you need to adjust the partitioning strategy, modify the `.github/workflows/scripts/run_suite.py` file.
- Algorithm Location: the `auto_partition` function.
- Input Parameters:
  - `files`: List of test files (including `estimated_time`).
  - `rank`: Index of the current partition (0 to `size - 1`).
  - `size`: Total number of partitions.
- Invocation: CI workflows (e.g., `.github/workflows/_e2e_test.yaml`) call the script via command-line arguments:

  ```shell
  python3 .github/workflows/scripts/run_suite.py --suite <suite_name> --auto-partition-id <index> --auto-partition-size <total_count>
  ```
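
A hypothetical sketch of the command-line surface described above, using Python's standard `argparse`. Only the three flags in the invocation come from this guide; the defaults (`e2e-singlecard` as the default suite, as the `args=Namespace(...)` line in the PR's test log suggests, and the whole suite when no partition flags are given) and the validation rules are assumptions. The real `run_suite.py` accepts further options.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run an E2E suite (sketch).")
    parser.add_argument("--suite", default="e2e-singlecard")
    # Both default to None, meaning "run the whole suite" in this sketch.
    parser.add_argument("--auto-partition-id", type=int, default=None)
    parser.add_argument("--auto-partition-size", type=int, default=None)
    args = parser.parse_args(argv)

    pid, size = args.auto_partition_id, args.auto_partition_size
    # parser.error() prints the usage message and exits with status 2.
    if (pid is None) != (size is None):
        parser.error("--auto-partition-id and --auto-partition-size go together")
    if size is not None and not (size >= 1 and 0 <= pid < size):
        parser.error(f"need size >= 1 and 0 <= id < size, got id={pid}, size={size}")
    return args


# What a CI matrix job for partition 1 of 2 would pass in:
args = parse_args(["--suite", "e2e-singlecard",
                   "--auto-partition-id", "1", "--auto-partition-size", "2"])
print(args.suite, args.auto_partition_id, args.auto_partition_size)
```

Rejecting an out-of-range id at parse time means a misconfigured matrix fails immediately instead of silently running an empty partition.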
### Notes
- Accurate Estimated Time: To achieve the best load balancing, provide an accurate `estimated_time` in `config.yaml`. If a new test is very time-consuming but its estimated time is set too low, the partition it lands in may run far longer than planned and time out.
- Number of Partitions: The number of partitions (`auto-partition-size`) is typically defined in the `strategy.matrix` of the GitHub Actions workflow file (e.g., `_e2e_test.yaml`).
## 3. Running Tests Locally
You can use the `run_suite.py` script to run test suites locally:
```shell
# Run the full e2e-singlecard suite
python3 .github/workflows/scripts/run_suite.py --suite e2e-singlecard
# Simulate partitioned execution (e.g., partition 0 of 2)
python3 .github/workflows/scripts/run_suite.py --suite e2e-singlecard --auto-partition-id 0 --auto-partition-size 2
```
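
To reproduce every job of a CI matrix locally, you can generate one command per partition id. `partition_commands` is a small hypothetical helper built only from the flags shown above; it prints the commands rather than running them, so pass each returned list to `subprocess.run(cmd, check=True)` yourself, one after another.

```python
import shlex
from typing import List

SCRIPT = ".github/workflows/scripts/run_suite.py"


def partition_commands(suite: str, size: int) -> List[List[str]]:
    """One argv list per partition id 0..size-1 (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [
        ["python3", SCRIPT, "--suite", suite,
         "--auto-partition-id", str(i), "--auto-partition-size", str(size)]
        for i in range(size)
    ]


for cmd in partition_commands("e2e-singlecard", 2):
    # shlex.join (Python 3.8+) renders a copy-pasteable shell line.
    print(shlex.join(cmd))
```

Building argv lists instead of shell strings avoids quoting problems if a suite name ever contains unusual characters.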