Commit Graph

16 Commits

Author SHA1 Message Date
zhangxinyuehfad
808d00406f [v0.18.0][CI]Add rank0 process count check for DeepSeek-R1-W8A8-HBM test (#8072)
### What this PR does / why we need it?
Adds a `check_rank0_process_count` validation step to the
DeepSeek-R1-W8A8-HBM nightly single-node test.

The check verifies that after the server starts, there is **exactly 1**
`vllm serve` process running on rank0. This guards against the
regression fixed in #8041 (extra NPU context leaking on device 0),
ensuring it does not silently reappear in future releases.

#### Changes

-
**`tests/e2e/nightly/single_node/models/scripts/test_single_node.py`**:
Add `run_check_rank0_process_count` async handler. It calls `npu-smi
info` for diagnostics, then uses `psutil` to assert exactly one `vllm
serve` process exists on rank0.
-
**`tests/e2e/nightly/single_node/models/configs/DeepSeek-R1-W8A8-HBM.yaml`**:
Register `check_rank0_process_count` in the `test_content` list for the
HBM test case.

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2026-04-15 17:16:27 +08:00
ZYang6263
34386c8896 [v0.18.0][CI] Fix and simplify the CI for Qwen3 32B (#8093)
### What this PR does / why we need it?
This PR fixes and simplifies the CI configuration for Qwen3 32B.

The main changes are:
- Remove the redundant `Qwen3-32B-Int8-A3-Feature-Stack3.yaml` config
and consolidate the CI setup into `Qwen3-32B-Int8.yaml`.
- Improve runtime stability by adding
`PYTORCH_NPU_ALLOC_CONF=expandable_segments:True` and setting
`--max-num-seqs 80`.
- Update the accuracy benchmark from `aime2024` to `gsm8k-lite`, and
adjust the related dataset config, output length, baseline, and
threshold accordingly.

These changes make the Qwen3 32B CI easier to maintain and more stable
in nightly validation.

---------

Signed-off-by: ZYang6263 <zy626375@gmail.com>
2026-04-10 14:22:24 +08:00
hucong
4a628f1042 [UT][v0.18.0] Fix APC nightly UT and TTFT ratio (cherry-pick #7468) (#8053)
<!--  Thanks for sending a pull request!

BEFORE SUBMITTING, PLEASE READ
https://docs.vllm.ai/en/latest/contributing/overview.html

-->
### What this PR does / why we need it?
<!--
- Please clarify what changes you are proposing. The purpose of this
section is to outline the changes and how this PR fixes the issue.
If possible, please consider writing useful notes for better and faster
reviews in your PR.

- Please clarify why the changes are needed. For instance, the use case
and bug description.

- Fixes #
-->
Cherry-pick from https://github.com/vllm-project/vllm-ascend/pull/7468

- Fix TTFT ratio threshold from 0.8 to 0.4 for prefix cache benchmarks
- Fix max_out_len values for warm_up and benchmark configs
- Applied to both DeepSeek-R1-0528-W8A8 and Qwen3-32B-Int8 configs

### Does this PR introduce _any_ user-facing change?
<!--
Note that it means *any* user-facing change including all aspects such
as API, interface or other behavior changes.
Documentation-only updates are not considered user-facing changes.
-->

### How was this patch tested?
<!--
CI passed with new added/existing test.
If it was tested in a way different from regular unit tests, please
clarify how you tested step by step, ideally copy and paste-able, so
that other reviewers can test and check, and descendants can verify in
the future.
If tests were not added, please describe why they were not added and/or
why it was difficult to add.
-->

Signed-off-by: underfituu <hzhucong@163.com>
2026-04-08 21:08:26 +08:00
guxin108
81c6f51a45 【CI】add nightly cases: MiniMax-M2.5-W8A8 Qwen3.5-27B-w8a8 Qwen3.5-397B-A1… (#7968)
### What this PR does / why we need it?
This PR Qwen3.5-27B ;MiniMax-M2.5-w8a8 ;Qwen3.5-397B-w8a8-mtp acc/perf 3
cases on A3, we need test them daily.

- vLLM version: v0.18.0
- vLLM main:
35141a7eed

Signed-off-by: guxin108 <1252896542@qq.com>
2026-04-03 17:50:59 +08:00
jiangmengyu18
3f462d251e [v0.18.0][CI] fix acc baseline of qwen3vl 235b (#7981)
### What this PR does / why we need it?
fix acc baseline of qwen3vl 235b

---------
Signed-off-by: jiangmengyu18 <56633611+jiangmengyu18@users.noreply.github.com>
2026-04-03 17:38:17 +08:00
LeeWenquan
0d773efd70 [CI]Fix qwen3Next Nightly CI config (#7903)
### What this PR does / why we need it?
Fix qwen3Next Nightly CI config in 0.18.0.
backport: #7679

Signed-off-by: Your Name <you@example.com>
Co-authored-by: Your Name <you@example.com>
2026-04-03 16:46:25 +08:00
jiangmengyu18
902d1312d9 [v0.18.0][CI] add nightly ci test for qwen3vl (#7913)
### What this PR does / why we need it?
Add nightly ci test for qwen3vl
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?

Signed-off-by: betta18 <jiangmengyu1@huawei.com>
Co-authored-by: betta18 <jiangmengyu1@huawei.com>
2026-04-03 11:39:28 +08:00
Nagisa125
2cb9195ff0 [Releases/v0.18.0][CI] Updated the parameters for the single-node test to fix the OOM issue for DeepSeek-V3.2 (#7862)
### What this PR does / why we need it?
Fix the OOM (Out-of-Memory) error in the single-node-deepseek-v3-2-w8a8
nightly test of vllm-ascend:

- Reduced the value of HCCL_BUFFSIZE

- Lowered the gpu-memory-utilization

Optimize service-side performance:
Updated service-oriented configuration parameters (e.g., max-num-seqs,
cudagraph_capture_sizes, batch_size) to improve the inference
performance,so that the performance is closer to the optimal performance
of the current mainline.
Align performance baseline with main branch:
Updated the performance baseline according to the latest performance
data

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
The test has passed.

https://github.com/vllm-project/vllm-ascend/actions/runs/23734079080/job/69134387320?pr=7793

---------

Signed-off-by: wyh145 <1987244901@qq.com>
2026-04-01 10:28:46 +08:00
LeeWenquan
9615bc33fd Fix Qwen3Next CI Config (#7561)
### What this PR does / why we need it?
This pr modifies qwen3Next nightly CI config. 
(1) Add a nightly CI .
(2) Set a more precise accuracy standard

- vLLM version: v0.18.0
- vLLM main:
6a9cceb219

Signed-off-by: Your Name <you@example.com>
Co-authored-by: Your Name <you@example.com>
2026-03-24 17:08:17 +08:00
liuhy1213-cell
fb283b5820 [CI] Add nightly CI test cases for the GLM-5 (#7429)
### What this PR does / why we need it?
Add nightly CI test cases for the GLM-5
Add model download for the GLM-5

https://github.com/vllm-project/vllm-ascend/actions/runs/23286178651/job/67710409642#logs
- vLLM version: v0.17.0
- vLLM main:
b31e9326a7
---------
Signed-off-by: liuhaiyang27 <liuhaiyang27@huawei.com>
Signed-off-by: liuhy1213-cell <liuhy1213@gmail.com>
Co-authored-by: liuhaiyang27 <liuhaiyang27@huawei.com>
2026-03-23 19:14:19 +08:00
aipaes
87d6424b2e [CI] Add nightly CI test cases for the GLM-4.7 model. (#7391)
### What this PR does / why we need it?
Add acc nightly CI test cases for the GLM-4.7 model.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
through CI

- vLLM version: v0.17.0
- vLLM main:
4034c3d32e

---------

Signed-off-by: zjks98 <zhangjiakang4@huawei.com>
Co-authored-by: zjks98 <zhangjiakang4@huawei.com>
2026-03-19 16:43:29 +08:00
LoganJane
270c5cb8cd [CI] Add nightly CI test cases for the Kimi-K2.5 (#7416)
### What this PR does / why we need it?
Add nightly CI test cases for the Kimi-K2.5.

- vLLM version: v0.17.0
- vLLM main:
4497431df6

---------

Signed-off-by: LoganJane <loganJane73@hotmail.com>
Signed-off-by: LoganJane <42287016+LoganJane@users.noreply.github.com>
2026-03-19 11:02:29 +08:00
SparrowMu
fb8e22ec00 [DOC] MiniMax-M2.5 model intro (#7296)
### What this PR does / why we need it?
1. Add nightly test on MiniMax-M2.5 with deployment method on A3
2. Add MiniMax-M2.5 deployment introduction to vllm-ascend docs

- vLLM version: v0.17.0
- vLLM main:
4034c3d32e
---------
Signed-off-by: limuyuan <limuyuan3@huawei.com>
Signed-off-by: SparrowMu <52023119+SparrowMu@users.noreply.github.com>
Co-authored-by: limuyuan <limuyuan3@huawei.com>
2026-03-18 20:14:36 +08:00
liuhy1213-cell
58725b8b24 [doc] add Prefill-Decode Disaggregation doc for GLM5.md (#7300)
### What this PR does / why we need it?
add Prefill-Decode Disaggregation doc for GLM5.md
w8a8  65k-1.5k 
Concurrency: 80 
prefixcache: 90%
tps: 2054

- vLLM version: v0.17.0

- vLLM main:
4034c3d32e
---------
Signed-off-by: liuhaiyang27 <liuhaiyang27@huawei.com>
Co-authored-by: liuhaiyang27 <liuhaiyang27@huawei.com>
2026-03-18 17:00:31 +08:00
LeeWenquan
65eae6de7b Add Ascend Ops recurrent_gated_delta_rule (#6725)
### What this PR does / why we need it?
Change recurrent_gated_delta_rule ops from triton to ascend C version
for better performance.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
9562912cea

---------

Signed-off-by: SunnyLee219 <3294305115@qq.com>
2026-03-09 14:14:14 +08:00
SILONG ZENG
859f2c25b9 [Nightly][Refactor]Migrate nightly single-node model tests from .py to .yaml (#6503)
### What this PR does / why we need it?
This PR refactors the nightly single-node model test by migrating test
configurations from Python scripts to a more maintainable `YAML-based`
format.

| Original PR | Python (`.py`) | YAML (`.yaml`) |
| :--- | :--- | :--- |
| [#3568](https://github.com/vllm-project/vllm-ascend/pull/3568) |
`test_deepseek_r1_0528_w8a8_eplb.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#3631](https://github.com/vllm-project/vllm-ascend/pull/3631) |
`test_deepseek_r1_0528_w8a8.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#5874](https://github.com/vllm-project/vllm-ascend/pull/5874) |
`test_deepseek_r1_w8a8_hbm.py` | `DeepSeek-R1-W8A8-HBM.yaml` |
| [#3908](https://github.com/vllm-project/vllm-ascend/pull/3908) |
`test_deepseek_v3_2_w8a8.py` | `DeepSeek-V3.2-W8A8.yaml` |
| [#5682](https://github.com/vllm-project/vllm-ascend/pull/5682) |
`test_kimi_k2_thinking.py` | `Kimi-K2-Thinking.yaml` |
| [#4111](https://github.com/vllm-project/vllm-ascend/pull/4111) |
`test_mtpx_deepseek_r1_0528_w8a8.py` | `MTPX-DeepSeek-R1-0528-W8A8.yaml`
|
| [#3733](https://github.com/vllm-project/vllm-ascend/pull/3733) |
`test_prefix_cache_deepseek_r1_0528_w8a8.py` |
`Prefix-Cache-DeepSeek-R1-0528-W8A8.yaml` |
| [#6543](https://github.com/vllm-project/vllm-ascend/pull/6543) |
`test_qwen3_235b_w8a8.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#6543](https://github.com/vllm-project/vllm-ascend/pull/6543) |
`test_qwen3_235b_a22b_w8a8_eplb.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#3973](https://github.com/vllm-project/vllm-ascend/pull/3973) |
`test_qwen3_30b_w8a8.py` | `Qwen3-30B-A3B-W8A8.yaml` |
| [#3541](https://github.com/vllm-project/vllm-ascend/pull/3541) |
`test_qwen3_32b_int8.py` | `Qwen3-32B-Int8.yaml` |
| [#3757](https://github.com/vllm-project/vllm-ascend/pull/3757) |
`test_qwq_32b.py` | `QwQ-32B.yaml` |
| [#5616](https://github.com/vllm-project/vllm-ascend/pull/5616) |
`test_qwen3_next_w8a8.py` | `Qwen3-Next-80B-A3B-Instruct-W8A8.yaml` |
| [#3541](https://github.com/vllm-project/vllm-ascend/pull/3541) |
`test_qwen2_5_vl_7b.py` | `Qwen2.5-VL-7B-Instruct.yaml` |
| [#5301](https://github.com/vllm-project/vllm-ascend/pull/5301) |
`test_qwen2_5_vl_7b_epd.py` | `Qwen2.5-VL-7B-Instruct-EPD.yaml` |
| [#3707](https://github.com/vllm-project/vllm-ascend/pull/3707) |
`test_qwen2_5_vl_32b.py` | `Qwen2.5-VL-32B-Instruct.yaml` |
| [#3676](https://github.com/vllm-project/vllm-ascend/pull/3676) |
`test_qwen3_32b_int8_a3_feature_stack3.py` |
`Qwen3-32B-Int8-A3-Feature-Stack3.yaml` |
| [#3709](https://github.com/vllm-project/vllm-ascend/pull/3709) |
`test_prefix_cache_qwen3_32b_int8.py` |
`Prefix-Cache-Qwen3-32B-Int8.yaml` |
| [#5395](https://github.com/vllm-project/vllm-ascend/pull/5395) |
`test_qwen3_next.py` | `Qwen3-Next-80B-A3B-Instruct-A2.yaml` |
| [#3474](https://github.com/vllm-project/vllm-ascend/pull/3474) |
`test_qwen3_32b.py` | `Qwen3-32B.yaml` |
| [#3541](https://github.com/vllm-project/vllm-ascend/pull/3541) |
`test_qwen3_32b_int8.py` | `Qwen3-32B-Int8-A2.yaml` |
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
2026-03-03 20:13:43 +08:00