
Li Wang e35f304419 [CI] Auto partition for test cases (#6379)

### What this PR does / why we need it?
This patch adds an auto-partition feature for tests. For example, before this
PR the e2e single-card test ran for 2h40min. With auto-partition, test cases
are automatically distributed into the requested n parts based on their
estimated duration (greedy strategy), and the parts run in parallel. As a
result, the overall wall-clock test time drops to roughly 1/n of the
original.

### Does this PR introduce _any_ user-facing change?
Before:
e2e single-card test takes 2h40min
After:
e2e single-card test takes 1h13min

### How was this patch tested?

```shell
python .github/workflows/scripts/run_suite.py --auto-partition-size 2 --auto-partition-id 0 
args=Namespace(timeout_per_file=2000, suite='e2e-singlecard', auto_partition_id=0, auto_partition_size=2, continue_on_error=False, enable_retry=False, max_attempts=2, retry_wait_seconds=60, retry_timeout_increase=600)
+----------------+--------------------+
| Suite          | Partition          |
|----------------+--------------------|
| e2e-singlecard | 1/2 (0-based id=0) |
+----------------+--------------------+
✅ Enabled 13 test(s) (est total 4020.0s):
  - tests/e2e/singlecard/spec_decode/test_v1_spec_decode.py (est_time=1800)
  - tests/e2e/singlecard/test_aclgraph_accuracy.py (est_time=480)
  - tests/e2e/singlecard/test_guided_decoding.py (est_time=354)
  - tests/e2e/singlecard/test_batch_invariant.py (est_time=320)
  - tests/e2e/singlecard/pooling/test_embedding.py (est_time=270)
  - tests/e2e/singlecard/test_quantization.py (est_time=200)
  - tests/e2e/singlecard/test_llama32_lora.py (est_time=162)
  - tests/e2e/singlecard/test_cpu_offloading.py (est_time=132)
  - tests/e2e/singlecard/pooling/test_classification.py (est_time=120)
  - tests/e2e/singlecard/test_camem.py (est_time=77)
  - tests/e2e/singlecard/compile/test_norm_quant_fusion.py (est_time=70)
  - tests/e2e/singlecard/test_auto_fit_max_mode_len.py (est_time=25)
  - tests/e2e/singlecard/test_profile_execute_duration.py (est_time=10)

python .github/workflows/scripts/run_suite.py --auto-partition-size 2 --auto-partition-id 1
args=Namespace(timeout_per_file=2000, suite='e2e-singlecard', auto_partition_id=1, auto_partition_size=2, continue_on_error=False, enable_retry=False, max_attempts=2, retry_wait_seconds=60, retry_timeout_increase=600)
+----------------+--------------------+
| Suite          | Partition          |
|----------------+--------------------|
| e2e-singlecard | 2/2 (0-based id=1) |
+----------------+--------------------+
✅ Enabled 13 test(s) (est total 4025.0s):
  - tests/e2e/singlecard/spec_decode/test_mtp_eagle_correctness.py (est_time=1500)
  - tests/e2e/singlecard/pooling/test_scoring.py (est_time=500)
  - tests/e2e/singlecard/test_aclgraph_batch_invariant.py (est_time=410)
  - tests/e2e/singlecard/test_vlm.py (est_time=354)
  - tests/e2e/singlecard/test_models.py (est_time=300)
  - tests/e2e/singlecard/test_multistream_overlap_shared_expert.py (est_time=200)
  - tests/e2e/singlecard/test_sampler.py (est_time=200)
  - tests/e2e/singlecard/test_async_scheduling.py (est_time=150)
  - tests/e2e/singlecard/test_aclgraph_mem.py (est_time=130)
  - tests/e2e/singlecard/test_ilama_lora.py (est_time=95)
  - tests/e2e/singlecard/test_completion_with_prompt_embeds.py (est_time=76)
  - tests/e2e/singlecard/test_qwen3_multi_loras.py (est_time=65)
  - tests/e2e/singlecard/test_xlite.py (est_time=45)
```
- vLLM version: v0.14.1
- vLLM main: dc917cceb8

---------

Signed-off-by: wangli <wangli858794774@gmail.com>

2026-01-29 20:28:10 +08:00

3.9 KiB


E2E Test Workflow Guide

This document provides a guide on how to manage and extend the E2E test suite for vllm-ascend. It covers how to add new test cases and understand the automatic partitioning mechanism.

1. Adding a New Test Case

All E2E test cases are defined and managed in the .github/workflows/scripts/config.yaml file.

Steps

Prepare the Test Script: Ensure your test script (.py file) is placed in the appropriate location under the tests/e2e/ directory (e.g., tests/e2e/singlecard/ or tests/e2e/multicard/).
Modify config.yaml: Open .github/workflows/scripts/config.yaml and locate the corresponding test suite (e.g., e2e-singlecard or e2e-multicard-2-cards).
Add Configuration Entry: Add a new entry under the corresponding list. Each entry contains the following fields:
- name: The relative path to the test file. If you only need to run a specific test function within the file, use :: as a separator, e.g., path/to/test.py::test_func.
- estimated_time: The estimated time (in seconds) required to run the test. This field is crucial as it is used for automatic load balancing (partitioning).
- is_skipped (Optional): If set to true, the test will be skipped.
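A typo in one of these fields, such as a misspelled estimated_time, could break or distort partitioning. For that reason it can help to check entries before committing them. The following is a minimal sketch of such a check. It assumes the entry has already been parsed from YAML into a Python dict. validate_entry is a hypothetical helper and not part of run_suite.py. The tests/e2e/ prefix rule simply restates the placement step above.

```python
from typing import Any, Dict


def validate_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Check one suite entry and return it with defaults filled in.

    Hypothetical helper: field names mirror config.yaml, but this is not
    the validation performed by run_suite.py.
    """
    # Catch misspelled keys such as "estimate_time" early.
    unknown = set(entry) - {"name", "estimated_time", "is_skipped"}
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")

    name = entry.get("name")
    if not isinstance(name, str) or not name.startswith("tests/e2e/"):
        raise ValueError(f"name must be a path under tests/e2e/: {name!r}")

    est = entry.get("estimated_time")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(est, bool) or not isinstance(est, (int, float)) or est <= 0:
        raise ValueError(f"{name}: estimated_time must be a positive number of seconds")

    skipped = entry.get("is_skipped", False)  # optional, defaults to False
    if not isinstance(skipped, bool):
        raise ValueError(f"{name}: is_skipped must be true or false")

    return {"name": name, "estimated_time": est, "is_skipped": skipped}
```

Rejecting unknown keys is a deliberate choice in this sketch. A silently ignored typo is harder to spot than a loud error.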

Example

Suppose you want to add a new test named tests/e2e/singlecard/test_new_feature.py with an estimated runtime of 120 seconds:

```yaml
suites:
  e2e-singlecard:
    # ... other existing tests ...
    - name: tests/e2e/singlecard/test_new_feature.py
      estimated_time: 120
```

To add a specific test function:

```yaml
    - name: tests/e2e/singlecard/test_new_feature.py::test_specific_case
      estimated_time: 60
```
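The path::function form of name matches pytest's node ID syntax. A file path and an optional function selector can therefore be separated on the first ::. The sketch below assumes run_suite.py passes the name to pytest more or less unchanged, which this guide does not state. split_test_name and pytest_argv are hypothetical helpers for illustration.

```python
import sys
from typing import List, Optional, Tuple


def split_test_name(name: str) -> Tuple[str, Optional[str]]:
    """Split 'path/to/test.py::test_func' into (file path, selector or None)."""
    path, sep, selector = name.partition("::")  # split on the first :: only
    if not path.endswith(".py"):
        raise ValueError(f"expected a .py file before '::', got {path!r}")
    return path, (selector if sep else None)


def pytest_argv(name: str) -> List[str]:
    """Build a pytest command line for one config entry (hypothetical).

    pytest accepts node IDs like file.py::test_func directly, so the
    entry name is passed through unchanged once it has been checked.
    """
    split_test_name(name)  # raises ValueError on malformed names
    return [sys.executable, "-m", "pytest", name]
```

A class-based test such as test.py::TestClass::test_method is split on the first :: only, so the selector keeps TestClass::test_method intact.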

2. Automatic Partitioning Mechanism

To speed up CI execution, we support splitting large test suites into multiple parallel jobs (partitions). The partitioning logic is primarily implemented in the auto_partition function in .github/workflows/scripts/run_suite.py.

Principle

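The partitioning algorithm uses a greedy approach (longest processing time first) to balance the load. It aims to make the total estimated runtime of each partition as equal as possible. The approach does not guarantee an optimal split, but it is fast and works well when the estimates are accurate.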

Read Configuration: The script reads all non-skipped test cases and their estimated_time from config.yaml.
Sort: Test cases are sorted by estimated_time in descending order.
Assign: Iterating through the sorted test cases, each case is assigned to the partition (bucket) whose current total estimated time is smallest.
Select: Each job then runs only the tests in the partition whose index matches its --auto-partition-id.
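The sort-and-assign steps above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the code in run_suite.py. greedy_partition is a hypothetical name, and the tie-breaking rules are assumptions: name order for equal durations, and lowest bucket index for equal loads. A min-heap keyed on each bucket's running total finds the least-loaded bucket in O(log n).

```python
import heapq
from typing import Dict, List, Tuple


def greedy_partition(durations: Dict[str, float], size: int) -> List[List[str]]:
    """Longest-first greedy split: each test goes to the least-loaded bucket.

    Sketch only; tie-breaking (name order for equal durations, lowest
    bucket index for equal loads) is an assumption.
    """
    if size < 1:
        raise ValueError("size must be >= 1")
    buckets: List[List[str]] = [[] for _ in range(size)]
    # Min-heap of (current total seconds, bucket index); a sorted list is a valid heap.
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(size)]
    # Descending by estimated time, then by name so the order is deterministic.
    for name in sorted(durations, key=lambda n: (-durations[n], n)):
        load, idx = heapq.heappop(heap)  # least-loaded bucket, lowest index on ties
        buckets[idx].append(name)
        heapq.heappush(heap, (load + durations[name], idx))
    return buckets
```

This sketch was fed the 26 estimated times listed in the PR output at the top of this page, with size 2. It yields the same two groups shown there, with totals of 4020 s and 4025 s.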

How to Modify Partitioning Logic

If you need to adjust the partitioning strategy, please modify the .github/workflows/scripts/run_suite.py file.

Algorithm Location: auto_partition function.
Input Parameters:
- files: List of test files (including estimated_time).
- rank: Index of the current partition (0 to size-1).
- size: Total number of partitions.
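With the parameters listed above, each CI job calls the function with its own rank and keeps only that bucket. The jobs run independently, so the split is consistent only if every job computes exactly the same assignment. That requires a deterministic sort. Below is a minimal sketch under that assumption. Only the parameter names files, rank and size come from this guide; the body and the tie-breaking rules are hypothetical.

```python
from typing import Any, Dict, List


def auto_partition(files: List[Dict[str, Any]], rank: int, size: int) -> List[Dict[str, Any]]:
    """Return the entries assigned to partition `rank` out of `size` (sketch).

    Every job recomputes the full assignment and keeps its own bucket, so
    ties are broken deterministically: equal times by name, equal loads by
    lowest bucket index (both assumptions).
    """
    if size < 1 or not 0 <= rank < size:
        raise ValueError(f"need 0 <= rank < size, got rank={rank}, size={size}")
    loads = [0.0] * size
    buckets: List[List[Dict[str, Any]]] = [[] for _ in range(size)]
    for f in sorted(files, key=lambda f: (-f["estimated_time"], f["name"])):
        target = loads.index(min(loads))  # least-loaded bucket, lowest index on ties
        buckets[target].append(f)
        loads[target] += f["estimated_time"]
    return buckets[rank]
```

Calling it for every rank and concatenating the results should return each file exactly once. For example, estimated times of 300, 200, 200, 100, 50 and 50 split three ways give every partition a total of 300 s.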

Invocation: CI workflows (e.g., .github/workflows/_e2e_test.yaml) call the script via command-line arguments:

```shell
python3 .github/workflows/scripts/run_suite.py --suite <suite_name> --auto-partition-id <index> --auto-partition-size <total_count>
```
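A parser for these three flags might look like the following. It is a sketch, not run_suite.py's actual parser, and parse_partition_args is a hypothetical name. The e2e-singlecard default is inferred from the Namespace printed in the PR output, where no --suite flag was passed. The validation rules are assumptions made for illustration: both partition flags must be given together, and the id must lie in [0, size).

```python
import argparse
from typing import List, Optional


def parse_partition_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse --suite / --auto-partition-id / --auto-partition-size (sketch)."""
    parser = argparse.ArgumentParser(description="Run one partition of an E2E suite")
    parser.add_argument("--suite", default="e2e-singlecard")
    parser.add_argument("--auto-partition-id", type=int, default=None)
    parser.add_argument("--auto-partition-size", type=int, default=None)
    args = parser.parse_args(argv)

    # Assumed rule: the two partition flags only make sense together.
    if (args.auto_partition_id is None) != (args.auto_partition_size is None):
        parser.error("--auto-partition-id and --auto-partition-size must be given together")
    if args.auto_partition_size is not None:
        if args.auto_partition_size < 1:
            parser.error("--auto-partition-size must be >= 1")
        if not 0 <= args.auto_partition_id < args.auto_partition_size:
            parser.error("--auto-partition-id must be in [0, size)")
    return args  # both flags None means: run the whole suite
```

parser.error prints a usage message and exits with status 2. A misconfigured matrix entry therefore fails fast instead of silently running the wrong subset.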

Notes

Accurate Estimated Time: To achieve the best load balancing, please provide an accurate estimated_time in config.yaml. A new test may be very time-consuming while its estimated time is set too low. In that case, the partition it is assigned to can run far longer than the others and may time out.
Number of Partitions: The number of partitions (auto-partition-size) is typically defined in the strategy.matrix of the GitHub Actions workflow definition file (e.g., _e2e_test.yaml).

3. Running Tests Locally

You can use the run_suite.py script to run test suites locally:

```shell
# Run the full e2e-singlecard suite
python3 .github/workflows/scripts/run_suite.py --suite e2e-singlecard

# Simulate partitioned execution (e.g., partition 0 of 2)
python3 .github/workflows/scripts/run_suite.py --suite e2e-singlecard --auto-partition-id 0 --auto-partition-size 2
```
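You can check locally that the partitions of a suite together cover every test by looping over all partition ids. This mirrors the fan-out a CI matrix performs. The helpers below are hypothetical. They build the same command lines shown above, using the current Python interpreter in place of python3. If you call run_all, it executes them one after another, which takes roughly as long as the unpartitioned suite.

```python
import shlex
import subprocess
import sys
from typing import List

SCRIPT = ".github/workflows/scripts/run_suite.py"


def partition_commands(suite: str, size: int) -> List[List[str]]:
    """One command per partition id, as a CI matrix would fan out (sketch)."""
    return [
        [sys.executable, SCRIPT, "--suite", suite,
         "--auto-partition-id", str(i), "--auto-partition-size", str(size)]
        for i in range(size)
    ]


def run_all(suite: str, size: int) -> None:
    """Run each partition sequentially; raises at the first failing partition."""
    for cmd in partition_commands(suite, size):
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # CalledProcessError on non-zero exit
```

For example, run_all("e2e-singlecard", 2) from the repository root runs partition 0 and then partition 1.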
