Files
xc-llm-ascend/.github/workflows/READMD.md
Li Wang e35f304419 [CI] Auto partition for test cases (#6379)
### What this PR does / why we need it?
This patch add auto-partition feat for tests, for example, before this
pr, we are running e2e single card test for 2h40min, after the auto
partition, test case is automatically allocated into the required n
parts based on its test duration (greedy strategy) and run in parallel.
The advantage of doing this is that our overall test duration will
become 1/n of the original.

### Does this PR introduce _any_ user-facing change?
Before:
e2e single card test spend 2h40min
After:
e2e single card test spend 1h13min

### How was this patch tested?

```shell
python .github/workflows/scripts/run_suite.py --auto-partition-size 2 --auto-partition-id 0 
args=Namespace(timeout_per_file=2000, suite='e2e-singlecard', auto_partition_id=0, auto_partition_size=2, continue_on_error=False, enable_retry=False, max_attempts=2, retry_wait_seconds=60, retry_timeout_increase=600)
+----------------+--------------------+
| Suite          | Partition          |
|----------------+--------------------|
| e2e-singlecard | 1/2 (0-based id=0) |
+----------------+--------------------+
 Enabled 13 test(s) (est total 4020.0s):
  - tests/e2e/singlecard/spec_decode/test_v1_spec_decode.py (est_time=1800)
  - tests/e2e/singlecard/test_aclgraph_accuracy.py (est_time=480)
  - tests/e2e/singlecard/test_guided_decoding.py (est_time=354)
  - tests/e2e/singlecard/test_batch_invariant.py (est_time=320)
  - tests/e2e/singlecard/pooling/test_embedding.py (est_time=270)
  - tests/e2e/singlecard/test_quantization.py (est_time=200)
  - tests/e2e/singlecard/test_llama32_lora.py (est_time=162)
  - tests/e2e/singlecard/test_cpu_offloading.py (est_time=132)
  - tests/e2e/singlecard/pooling/test_classification.py (est_time=120)
  - tests/e2e/singlecard/test_camem.py (est_time=77)
  - tests/e2e/singlecard/compile/test_norm_quant_fusion.py (est_time=70)
  - tests/e2e/singlecard/test_auto_fit_max_mode_len.py (est_time=25)
  - tests/e2e/singlecard/test_profile_execute_duration.py (est_time=10)

(base) wangli@Mac-mini vllm-ascend % python .github/workflows/scripts/run_suite.py --auto-partition-size 2 --auto-partition-id 1 
args=Namespace(timeout_per_file=2000, suite='e2e-singlecard', auto_partition_id=1, auto_partition_size=2, continue_on_error=False, enable_retry=False, max_attempts=2, retry_wait_seconds=60, retry_timeout_increase=600)
+----------------+--------------------+
| Suite          | Partition          |
|----------------+--------------------|
| e2e-singlecard | 2/2 (0-based id=1) |
+----------------+--------------------+
 Enabled 13 test(s) (est total 4025.0s):
  - tests/e2e/singlecard/spec_decode/test_mtp_eagle_correctness.py (est_time=1500)
  - tests/e2e/singlecard/pooling/test_scoring.py (est_time=500)
  - tests/e2e/singlecard/test_aclgraph_batch_invariant.py (est_time=410)
  - tests/e2e/singlecard/test_vlm.py (est_time=354)
  - tests/e2e/singlecard/test_models.py (est_time=300)
  - tests/e2e/singlecard/test_multistream_overlap_shared_expert.py (est_time=200)
  - tests/e2e/singlecard/test_sampler.py (est_time=200)
  - tests/e2e/singlecard/test_async_scheduling.py (est_time=150)
  - tests/e2e/singlecard/test_aclgraph_mem.py (est_time=130)
  - tests/e2e/singlecard/test_ilama_lora.py (est_time=95)
  - tests/e2e/singlecard/test_completion_with_prompt_embeds.py (est_time=76)
  - tests/e2e/singlecard/test_qwen3_multi_loras.py (est_time=65)
  - tests/e2e/singlecard/test_xlite.py (est_time=45)
```
- vLLM version: v0.14.1
- vLLM main:
dc917cceb8

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-01-29 20:28:10 +08:00

84 lines
3.9 KiB
Markdown

# E2E Test Workflow Guide
This document provides a guide on how to manage and extend the E2E test suite for `vllm-ascend`. It covers how to add new test cases and understand the automatic partitioning mechanism.
## 1. Adding a New Test Case
All E2E test cases are defined and managed in the `.github/workflows/scripts/config.yaml` file.
### Steps
1. **Prepare the Test Script**: Ensure your test script (`.py` file) is placed in the appropriate location under the `tests/e2e/` directory (e.g., `tests/e2e/singlecard/` or `tests/e2e/multicard/`).
2. **Modify `config.yaml`**:
Open `.github/workflows/scripts/config.yaml` and locate the corresponding test suite (e.g., `e2e-singlecard` or `e2e-multicard-2-cards`).
3. **Add Configuration Entry**:
Add a new entry under the corresponding list. Each entry contains the following fields:
* `name`: The relative path to the test file. If you only need to run a specific test function within the file, use `::` as a separator, e.g., `path/to/test.py::test_func`.
* `estimated_time`: The estimated time (in seconds) required to run the test. **This field is crucial** as it is used for automatic load balancing (partitioning).
* `is_skipped` (Optional): If set to `true`, the test will be skipped.
### Example
Suppose you want to add a new test named `tests/e2e/singlecard/test_new_feature.py` with an estimated runtime of 120 seconds:
```yaml
suites:
e2e-singlecard:
# ... other existing tests ...
- name: tests/e2e/singlecard/test_new_feature.py
estimated_time: 120
```
To add a specific test function:
```yaml
- name: tests/e2e/singlecard/test_new_feature.py::test_specific_case
estimated_time: 60
```
## 2. Automatic Partitioning Mechanism
To speed up CI execution, we support splitting large test suites into multiple parallel Jobs (partitions). The partitioning logic is primarily implemented in the `auto_partition` function in `.github/workflows/scripts/run_suite.py`.
### Principle
The partitioning algorithm uses a Greedy Approach to achieve load balancing, aiming to make the total estimated runtime of each partition as equal as possible.
1. **Read Configuration**: The script reads all non-skipped test cases and their `estimated_time` from `config.yaml`.
2. **Sort**: Test cases are sorted by `estimated_time` in descending order.
3. **Assign**: Iterating through the sorted test cases, each case is assigned to the partition (Bucket) with the current minimum total time.
### How to Modify Partitioning Logic
If you need to adjust the partitioning strategy, please modify the `.github/workflows/scripts/run_suite.py` file.
* **Algorithm Location**: `auto_partition` function.
* **Input Parameters**:
* `files`: List of test files (including `estimated_time`).
* `rank`: Index of the current partition (0 to size-1).
* `size`: Total number of partitions.
* **Invocation**:
CI workflows (e.g., `.github/workflows/_e2e_test.yaml`) call the script via command-line arguments:
```bash
python3 .github/workflows/scripts/run_suite.py --suite <suite_name> --auto-partition-id <index> --auto-partition-size <total_count>
```
### Notes
* **Accurate Estimated Time**: To achieve the best load balancing, please provide an accurate `estimated_time` in `config.yaml`. If a new test is very time-consuming but the estimated time is set too low, it may cause a specific partition to timeout.
* **Number of Partitions**: The number of partitions (`auto-partition-size`) is typically defined in the `strategy.matrix` of the GitHub Actions workflow definition file (e.g., `_e2e_test.yaml`).
## 3. Running Tests Locally
You can use the `run_suite.py` script to run test suites locally:
```bash
# Run the full e2e-singlecard suite
python3 .github/workflows/scripts/run_suite.py --suite e2e-singlecard
# Simulate partitioned execution (e.g., partition 0 of 2)
python3 .github/workflows/scripts/run_suite.py --suite e2e-singlecard --auto-partition-id 0 --auto-partition-size 2
```