16 Commits

Author SHA1 Message Date
SILONG ZENG
1e3c1e76bf [Lint]Add lint hooks for clang-format, shellcheck, forbidden imports, and boolean context manager checks (#7511)
### What this PR does / why we need it?
This PR introduces several upstream `vllm`-aligned lint hooks into
`vllm-ascend` and makes them part of the actual `pre-commit` flow.

Main changes in this PR:
- add `check-boolean-context-manager` to catch boolean expressions in
`with` statements
- add `check-forbidden-imports` to forbid direct `re` imports and
disallowed direct `triton` imports
- enable shell script linting through `tools/shellcheck.sh`
- add root `.clang-format` aligned with upstream `vllm`, enable
`clang-format` in `pre-commit`, temporarily **exclude all `csrc/**`**
from `clang-format` to avoid bringing a large native code reformat into
this PR

This PR focuses on landing the smaller and immediately useful lint
alignment first, without mixing in the larger requirements-management
migration.

### Does this PR introduce _any_ user-facing change?
No.

This PR only updates repository lint configuration, static checks, and
internal import/style enforcement. It does not change runtime behavior
or public interfaces.

### How was this patch tested?
Tested locally in the project virtual environment.

Commands used:
```bash
bash format.sh
```
Verified checks passed:
``` bash
ruff check...............................................................Passed
ruff format..............................................................Passed
codespell................................................................Passed
typos....................................................................Passed
clang-format.............................................................Passed
Lint GitHub Actions workflow files.......................................Passed
Lint shell scripts.......................................................Passed
Lint PNG exports from excalidraw.........................................Passed
Check for spaces in all filenames........................................Passed
Enforce __init__.py in Python packages...................................Passed
Check for forbidden imports..............................................Passed
Check for boolean ops in with-statements.................................Passed
Suggestion...............................................................Passed
- hook id: suggestion
- duration: 0s

To bypass pre-commit hooks, add --no-verify to git commit.
```
**note:**
clang-format is enabled but currently excludes all csrc/**


- vLLM version: v0.17.0
- vLLM main:
8b6325758c

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
2026-03-24 20:03:01 +08:00
SILONG ZENG
859f2c25b9 [Nightly][Refactor]Migrate nightly single-node model tests from .py to .yaml (#6503)
### What this PR does / why we need it?
This PR refactors the nightly single-node model test by migrating test
configurations from Python scripts to a more maintainable `YAML-based`
format.

| Original PR | Python (`.py`) | YAML (`.yaml`) |
| :--- | :--- | :--- |
| [#3568](https://github.com/vllm-project/vllm-ascend/pull/3568) |
`test_deepseek_r1_0528_w8a8_eplb.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#3631](https://github.com/vllm-project/vllm-ascend/pull/3631) |
`test_deepseek_r1_0528_w8a8.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#5874](https://github.com/vllm-project/vllm-ascend/pull/5874) |
`test_deepseek_r1_w8a8_hbm.py` | `DeepSeek-R1-W8A8-HBM.yaml` |
| [#3908](https://github.com/vllm-project/vllm-ascend/pull/3908) |
`test_deepseek_v3_2_w8a8.py` | `DeepSeek-V3.2-W8A8.yaml` |
| [#5682](https://github.com/vllm-project/vllm-ascend/pull/5682) |
`test_kimi_k2_thinking.py` | `Kimi-K2-Thinking.yaml` |
| [#4111](https://github.com/vllm-project/vllm-ascend/pull/4111) |
`test_mtpx_deepseek_r1_0528_w8a8.py` | `MTPX-DeepSeek-R1-0528-W8A8.yaml`
|
| [#3733](https://github.com/vllm-project/vllm-ascend/pull/3733) |
`test_prefix_cache_deepseek_r1_0528_w8a8.py` |
`Prefix-Cache-DeepSeek-R1-0528-W8A8.yaml` |
| [#6543](https://github.com/vllm-project/vllm-ascend/pull/6543) |
`test_qwen3_235b_w8a8.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#6543](https://github.com/vllm-project/vllm-ascend/pull/6543) |
`test_qwen3_235b_a22b_w8a8_eplb.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#3973](https://github.com/vllm-project/vllm-ascend/pull/3973) |
`test_qwen3_30b_w8a8.py` | `Qwen3-30B-A3B-W8A8.yaml` |
| [#3541](https://github.com/vllm-project/vllm-ascend/pull/3541) |
`test_qwen3_32b_int8.py` | `Qwen3-32B-Int8.yaml` |
| [#3757](https://github.com/vllm-project/vllm-ascend/pull/3757) |
`test_qwq_32b.py` | `QwQ-32B.yaml` |
| [#5616](https://github.com/vllm-project/vllm-ascend/pull/5616) |
`test_qwen3_next_w8a8.py` | `Qwen3-Next-80B-A3B-Instruct-W8A8.yaml` |
| [#3541](https://github.com/vllm-project/vllm-ascend/pull/3541) |
`test_qwen2_5_vl_7b.py` | `Qwen2.5-VL-7B-Instruct.yaml` |
| [#5301](https://github.com/vllm-project/vllm-ascend/pull/5301) |
`test_qwen2_5_vl_7b_epd.py` | `Qwen2.5-VL-7B-Instruct-EPD.yaml` |
| [#3707](https://github.com/vllm-project/vllm-ascend/pull/3707) |
`test_qwen2_5_vl_32b.py` | `Qwen2.5-VL-32B-Instruct.yaml` |
| [#3676](https://github.com/vllm-project/vllm-ascend/pull/3676) |
`test_qwen3_32b_int8_a3_feature_stack3.py` |
`Qwen3-32B-Int8-A3-Feature-Stack3.yaml` |
| [#3709](https://github.com/vllm-project/vllm-ascend/pull/3709) |
`test_prefix_cache_qwen3_32b_int8.py` |
`Prefix-Cache-Qwen3-32B-Int8.yaml` |
| [#5395](https://github.com/vllm-project/vllm-ascend/pull/5395) |
`test_qwen3_next.py` | `Qwen3-Next-80B-A3B-Instruct-A2.yaml` |
| [#3474](https://github.com/vllm-project/vllm-ascend/pull/3474) |
`test_qwen3_32b.py` | `Qwen3-32B.yaml` |
| [#3541](https://github.com/vllm-project/vllm-ascend/pull/3541) |
`test_qwen3_32b_int8.py` | `Qwen3-32B-Int8-A2.yaml` |
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
2026-03-03 20:13:43 +08:00
SILONG ZENG
523e83016b [Lint]Style: Convert root, benchmarks, tools and docs to ruff format (#5843)
### What this PR does / why we need it?
Description
This PR fixes linting issues in the root directory, benchmarks/, tools/
and docs/ to align with the project's Ruff configuration.

This is part of a gradual effort to enable full linting coverage across
the repository. The corresponding paths have been removed from the
exclude list in pyproject.toml.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

---------

Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain>
Co-authored-by: root <root@LAPTOP-VQKDDVMG.localdomain>
2026-01-13 15:29:34 +08:00
SILONG ZENG
7a6fde80b1 [CI]Add Kimi k2 nightly test (#5682)
### What this PR does / why we need it?
The PR add performance and accuracy tests for **Kimi-K2-Instruct-W8A8**
and **Kimi-K2-Thinking** models to the Nightly test suite.

#### Test Configuration
**Kimi-K2-Instruct-W8A8**
- model: vllm-ascend/Kimi-K2-Instruct-W8A8
- Hardware: A3, 2 Nodes (32 NPUs total, 16 NPUs per node)
- Architecture: Unified Distributed Inference
- Parallelism: **DP4 + TP8 + EP** (Data Parallel 4, Tensor Parallel 8,
Expert Parallel enabled).
  - Optimization: **torchair graph**, **no-prefix-caching**.
  - Node 0: DP Rank 0-1, Local DP 2, Tensor Parallel 8.
  - Node 1: DP Rank 2-3, Local DP 2, Tensor Parallel 8.
- Benchmarks:
  - Performance: vllm-ascend/GSM8K-in3500-bs2800.
  - Accuracy: vllm-ascend/gsm8k-lite.

**Kimi-K2-Thinking**
- Model: moonshotai/Kimi-K2-Thinking
- Hardware: A3, 1 Node (16 NPUs total)
- Architecture: Single Node Distributed Inference
- Parallelism: TP16 + EP (Tensor Parallel 16, Expert Parallel enabled).
  - Optimization: **no-prefix-caching**
- Benchmarks:
  - Performance: vllm-ascend/GSM8K-in3500-bs400.
  - Accuracy: vllm-ascend/gsm8k-lite.


### Does this PR introduce _any_ user-facing change?
**Yes.** This PR enhances the ```AisbenchRunner``` to support dynamic
configuration of the ```trust_remote_code``` flag. This allows the
AISBench client to successfully load tokenizers for models that require
custom code execution (e.g., **Kimi-K2-Thinking and
Kimi-K2-Instruct-W8A8**).

**Changes:**
1. ```AisbenchRunner.__init__ ```Added the ability to capture the
```trust_remote_code``` parameter from the case configuration.
``` python
         self.batch_size = aisbench_config["batch_size"]
         self.request_rate = aisbench_config.get("request_rate", 0)
+        self.trust_remote_code = aisbench_config.get("trust_remote_code", False)
         self.temperature = aisbench_config.get("temperature")
         self.top_k = aisbench_config.get("top_k")
```
2. ```AisbenchRunner._init_request_conf``` Added regex substitution to
inject the parameter into the generated dynamic configuration file.
``` python
         content = re.sub(r'batch_size.*', f'batch_size = {self.batch_size},',
                          content)
+        content = re.sub(r'trust_remote_code=.*',
+                         f'trust_remote_code={self.trust_remote_code},',
+                         content)
         content = content.replace("top_k", "#top_k")
         content = content.replace("seed", "#seed")
```

**Details:**
- New Config Key: Users can add ```"trust_remote_code": True``` to any
dictionary within the ```aisbench_cases``` list.
- Default Value: Defaults to ```False``` to maintain existing security
protocols for standard models.
- Impact: Resolves ```ValueError``` when benchmarking reasoning models
or models with custom tokenizers that previously failed during the
AISBench local initialization phase.

**User Example:**
Users can now enable custom code execution for specific models (like
Kimi-K2-Thinking) directly in their test suite:
```
# Now supported in test scripts:
aisbench_cases = [{
    "case_type": "performance",
    "request_conf": "vllm_api_stream_chat",
    "trust_remote_code": True,  # New user-facing parameter
    ...
}]
```
### How was this patch tested?
Actions:
- https://github.com/vllm-project/vllm-ascend/actions/runs/20849768433

Result as following:

- **Kimi-K2-Instruct-W8A8**(25m25s)
1. Accuracy test
```
dataset    version    metric    mode      vllm-api-general-chat
---------  ---------  --------  ------  -----------------------
gsm8k      7cd45e     accuracy  gen                       96.88
```
2. Perf test
```
╒══════════════════════════╤═════════╤════════════════╤════════════════╤═══════════════╤════════════════╤════════════════╤════════════════╤════════════════╤═════╕
│ Performance Parameters   │ Stage   │ Average        │ Min            │ Max           │ Median         │ P75            │ P90            │ P99            │  N  │
╞══════════════════════════╪═════════╪════════════════╪════════════════╪═══════════════╪════════════════╪════════════════╪════════════════╪════════════════╪═════╡
│ E2EL                     │ total   │ 34571.489 ms   │ 28657.8054 ms  │ 36294.1788 ms │ 34714.7329 ms  │ 35247.2724 ms  │ 35526.6758 ms  │ 36146.4314 ms  │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼───────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ TTFT                     │ total   │ 2043.9136 ms   │ 627.4718 ms    │ 3532.3978 ms  │ 1906.0194 ms   │ 2307.7979 ms   │ 2883.8528 ms   │ 3283.7012 ms   │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼───────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ TPOT                     │ total   │ 127.5591 ms    │ 106.4937 ms    │ 137.107 ms    │ 128.3135 ms    │ 129.5704 ms    │ 131.1332 ms    │ 134.1087 ms    │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼───────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ ITL                      │ total   │ 126.5571 ms    │ 0.0095 ms      │ 1340.783 ms   │ 104.1398 ms    │ 110.1272 ms    │ 119.6124 ms    │ 950.2924 ms    │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼───────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ InputTokens              │ total   │ 3516.6055      │ 3014.0         │ 3985.0        │ 3525.0         │ 3525.0         │ 3586.8         │ 3800.67        │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼───────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ OutputTokens             │ total   │ 256.0          │ 256.0          │ 256.0         │ 256.0          │ 256.0          │ 256.0          │ 256.0          │ 512 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼───────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ OutputTokenThroughput    │ total   │ 7.4143 token/s │ 7.0535 token/s │ 8.933 token/s │ 7.3744 token/s │ 7.4118 token/s │ 7.5608 token/s │ 8.7051 token/s │ 512 │
╘══════════════════════════╧═════════╧════════════════╧════════════════╧═══════════════╧════════════════╧════════════════╧════════════════╧════════════════╧═════╛
╒══════════════════════════╤═════════╤═══════════════════╕
│ Common Metric            │ Stage   │ Value             │
╞══════════════════════════╪═════════╪═══════════════════╡
│ Benchmark Duration       │ total   │ 279430.9375 ms    │
├──────────────────────────┼─────────┼───────────────────┤
│ Total Requests           │ total   │ 512               │
├──────────────────────────┼─────────┼───────────────────┤
│ Failed Requests          │ total   │ 0                 │
├──────────────────────────┼─────────┼───────────────────┤
│ Success Requests         │ total   │ 512               │
├──────────────────────────┼─────────┼───────────────────┤
│ Concurrency              │ total   │ 63.3452           │
├──────────────────────────┼─────────┼───────────────────┤
│ Max Concurrency          │ total   │ 64                │
├──────────────────────────┼─────────┼───────────────────┤
│ Request Throughput       │ total   │ 1.8323 req/s      │
├──────────────────────────┼─────────┼───────────────────┤
│ Total Input Tokens       │ total   │ 1800502           │
├──────────────────────────┼─────────┼───────────────────┤
│ Prefill Token Throughput │ total   │ 1720.5255 token/s │
├──────────────────────────┼─────────┼───────────────────┤
│ Total generated tokens   │ total   │ 131072            │
├──────────────────────────┼─────────┼───────────────────┤
│ Input Token Throughput   │ total   │ 6443.4598 token/s │
├──────────────────────────┼─────────┼───────────────────┤
│ Output Token Throughput  │ total   │ 469.0676 token/s  │
├──────────────────────────┼─────────┼───────────────────┤
│ Total Token Throughput   │ total   │ 6912.5274 token/s │
╘══════════════════════════╧═════════╧═══════════════════╛
```

- **Kimi-K2-Thinking**(43m51s)
1. Accuracy test
```
dataset    version    metric    mode      vllm-api-general-chat
---------  ---------  --------  ------  -----------------------
gsm8k      7cd45e     accuracy  gen                      100.00
```
2. Perf test
```
╒══════════════════════════╤═════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤════════════════╤═════╕
│ Performance Parameters   │ Stage   │ Average        │ Min            │ Max            │ Median         │ P75            │ P90            │ P99            │  N  │
╞══════════════════════════╪═════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪════════════════╪═════╡
│ E2EL                     │ total   │ 172384.3573 ms │ 34456.5517 ms  │ 205922.9407 ms │ 174844.2216 ms │ 202656.092 ms  │ 204428.9502 ms │ 205468.6776 ms │ 400 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ TTFT                     │ total   │ 138740.3228 ms │ 655.1066 ms    │ 171777.3003 ms │ 141088.0561 ms │ 169237.5599 ms │ 170716.4954 ms │ 171393.1278 ms │ 400 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ TPOT                     │ total   │ 131.9374 ms    │ 90.6331 ms     │ 135.4144 ms    │ 132.405 ms     │ 132.948 ms     │ 133.7549 ms    │ 135.2543 ms    │ 400 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ ITL                      │ total   │ 130.9028 ms    │ 0.0099 ms      │ 960.3683 ms    │ 116.9623 ms    │ 122.3127 ms    │ 132.0522 ms    │ 886.4662 ms    │ 400 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ InputTokens              │ total   │ 3514.575       │ 3014.0         │ 3843.0         │ 3525.0         │ 3525.0         │ 3588.0         │ 3801.08        │ 400 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ OutputTokens             │ total   │ 256.0          │ 256.0          │ 256.0          │ 256.0          │ 256.0          │ 256.0          │ 256.0          │ 400 │
├──────────────────────────┼─────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼────────────────┼─────┤
│ OutputTokenThroughput    │ total   │ 1.6799 token/s │ 1.2432 token/s │ 7.4296 token/s │ 1.4642 token/s │ 1.4737 token/s │ 1.8754 token/s │ 7.125 token/s  │ 400 │
╘══════════════════════════╧═════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧════════════════╧═════╛
╒══════════════════════════╤═════════╤═══════════════════╕
│ Common Metric            │ Stage   │ Value             │
╞══════════════════════════╪═════════╪═══════════════════╡
│ Benchmark Duration       │ total   │ 1166795.568 ms    │
├──────────────────────────┼─────────┼───────────────────┤
│ Total Requests           │ total   │ 400               │
├──────────────────────────┼─────────┼───────────────────┤
│ Failed Requests          │ total   │ 0                 │
├──────────────────────────┼─────────┼───────────────────┤
│ Success Requests         │ total   │ 400               │
├──────────────────────────┼─────────┼───────────────────┤
│ Concurrency              │ total   │ 59.0967           │
├──────────────────────────┼─────────┼───────────────────┤
│ Max Concurrency          │ total   │ 64                │
├──────────────────────────┼─────────┼───────────────────┤
│ Request Throughput       │ total   │ 0.3428 req/s      │
├──────────────────────────┼─────────┼───────────────────┤
│ Total Input Tokens       │ total   │ 1405830           │
├──────────────────────────┼─────────┼───────────────────┤
│ Prefill Token Throughput │ total   │ 25.332 token/s    │
├──────────────────────────┼─────────┼───────────────────┤
│ Total generated tokens   │ total   │ 102400            │
├──────────────────────────┼─────────┼───────────────────┤
│ Input Token Throughput   │ total   │ 1204.864 token/s  │
├──────────────────────────┼─────────┼───────────────────┤
│ Output Token Throughput  │ total   │ 87.7617 token/s   │
├──────────────────────────┼─────────┼───────────────────┤
│ Total Token Throughput   │ total   │ 1292.6258 token/s │
╘══════════════════════════╧═════════╧═══════════════════╛
```

- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: root <root@LAPTOP-VQKDDVMG.localdomain>
2026-01-12 15:56:07 +08:00
jiangyunfan1
1486e0d06c [TEST]Add vllm bench (#5306)
### What this PR does / why we need it?
This PR adds vllm bench common method, we need it to add some test cases
later
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by running the test
- vLLM version: release/v0.13.0
- vLLM main:
5fbfa8d9ef

---------

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
2025-12-27 09:16:08 +08:00
Li Wang
7294f89e43 [CI] Add daily images build for nightly ci (#3989)
### What this PR does / why we need it?
Given the current excessively long build time of our nightly-ci, I
recommend installing necessary, confirmed versions of packages in the
Docker image to reduce the time required for integration testing.
Including Mooncake vllm with fixed tags, This is expected to reduce
nightly-ci duration by 2 hours.

- vLLM version: v0.11.0
- vLLM main:
2918c1b49c

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-11-13 20:10:12 +08:00
Li Wang
eb0a2ee2d0 [CI] Optimize nightly CI (#3898)
### What this PR does / why we need it?
This patch mainly fix the the problem of not being able to determine the
exit status of the pod's entrypoint script and some other tiny
optimizations:
1. Shorten wait for server timeout
2. fix typo
3. fix the issue of ais_bench failing to correctly access the proxy URL
in a PD separation scenario.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-30 23:42:20 +08:00
Li Wang
4a2ab13743 [CI] Optimize nightly CI (#3858)
### What this PR does / why we need it?
This patch optimize nightly CI:
1. Bug fixes ais_bench get None repo_type error
2. Fix A2 install kubectl error with arm arch
3. Fix the multi_node CI unable to determine whether the job was
successful error
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
83f478bb19

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-29 22:30:19 +08:00
jiangyunfan1
e56b0017a3 [TEST]Add aisbench log and A2 cases (#3841)
### What this PR does / why we need it?
This PR adds 2 more A2 caces which we need to test daily. It also
enhances the logging for aisbench test failures to improve issues
identification
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running the test

- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

---------

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
2025-10-28 23:33:15 +08:00
Li Wang
90ae114569 [CI] Fix nightly CI (#3821)
### What this PR does / why we need it?
This patch fix the nightly CI runs
[failure](https://github.com/vllm-project/vllm-ascend/actions/runs/18848144365)

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-28 20:40:03 +08:00
jiangyunfan1
9030106a14 [TEST]Add 2P1D multi node cases for nightly test (#3764)
### What this PR does / why we need it?
This PR adds the 2P1D multi node func/acc/perf test cases, we need test
them daily
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by running the test

- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4

---------

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
2025-10-27 23:09:15 +08:00
wangyu
d301c56d1a [TEST]Add initial multi modal cases of Qwen2.5-VL-32B-Instruct for nightly test (#3707)
### What this PR does / why we need it?
This PR adds the initial multi modal model for nightly test, including 2
cases for Qwen2.5-vl-32b acc/perf test on A3, we need test them daily.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by running the test

vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
2025-10-24 17:12:06 +08:00
jiangyunfan1
ec9ec78b53 [TEST]Add initial prefix cache case for nightly test (#3709)
### What this PR does / why we need it?
This PR adds the initial prefix cache case for nightly test for
Qwen3-32b-int8 on A3, we need test them daily.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running the test

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
2025-10-24 16:33:18 +08:00
jiangyunfan1
9434f24ded [TEST]Add initial multi modal cases for nightly test and deepseek-r1 tests (#3631)
### What this PR does / why we need it?
This PR adds the initial multi modal model for nightly test, including 3
cases for Qwen2.5-vl-7b acc/perf test on A3, we need test them daily. It
also inclues 8 cases for deepseek-r1-0528-w8a8 func, acc and perf tests
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by running the test


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
2025-10-23 17:18:49 +08:00
jiangyunfan1
80b8df881f [TEST] Add Qwen3-32b-w8a8 acc/perf A2/A3 test (#3541)
### What this PR does / why we need it?
This PR Qwen3-32b-w8a8 acc/perf 8 cases on A2 and A3, we need test them
daily.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
by running the test


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: root <root@hostname-2pbfv.foreman.pxe>
Co-authored-by: wangli <wangli858794774@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-10-21 17:34:48 +08:00
jiangyunfan1
9e59fc1510 [TEST] Add initial aisbench support and Qwen3 32B acc/perf test (#3474)
### What this PR does / why we need it?
This PR adds the first aisbench case for nightly test, it lays a
foundation for following performance and accuracy tests in nightly test.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the test

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
2025-10-20 09:33:17 +08:00