82 Commits

Author SHA1 Message Date
zhangyiming
c95c271538 [E2E] Optimize nightly testcase. (#4886)
### What this PR does / why we need it?
Optimize nightly testcase.
Changes:
- tests/e2e/nightly/multi_node/config/models/Qwen3-235B-A3B.yaml: Add
accuracy and performance benchmark
- tests/e2e/models/configs/Qwen3-8B-Base.yaml: Delete
- tests/e2e/models/configs/internlm-7b.yaml: Change to
internlm3-8b-instruct
- tests/e2e/nightly/models/test_deepseek_r1_w8a8_eplb.py: Change to
DeepSeek-R1-0528-W8A8 model

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: menogrey <1299267905@qq.com>
2025-12-11 10:15:39 +08:00
Li Wang
89733111fa [Nightly] Optimize nightly online test logger info (#4798)
### What this PR does / why we need it?
This patch makes some small optimizations to the nightly CI:

1. Tune how often the logs printed by the service during startup are
polled, in order to obtain useful information more quickly.
2. Shorten the timeout for waiting on the server.
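
A minimal sketch of a wait loop with a configurable timeout and polling interval, to illustrate the two knobs mentioned above. It is not the code from this PR: the names `wait_for_server` and `_http_ok`, the default values, and the HTTP probe are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def _http_ok(url):
    """Hypothetical readiness probe: True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_for_server(url, timeout_s=600.0, poll_interval_s=5.0, probe=_http_ok):
    """Poll `probe(url)` until it succeeds or `timeout_s` elapses.

    Returns True if the server became ready, False on timeout.
    Default values are illustrative assumptions only.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

A shorter `poll_interval_s` surfaces readiness (or failure) sooner, while a shorter `timeout_s` bounds how long a broken startup can hold the job.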

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-10 09:24:19 +08:00
wangxiyuan
835b4c8f1d Drop torchair (#4814)
aclgraph is stable and fast now. Let's drop torchair graph mode now.

TODO: some logic to adapt torchair should be cleaned up as well. We'll
do it in the following PR.

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-12-10 09:20:40 +08:00
Nengjun Ma
863a5a5a17 Add gsm8k accuracy test for multi-node Qwen3-235B-A22B (#4802)
### What this PR does / why we need it?
There is currently no accuracy test for the Qwen3-235B-A22B model, so this PR adds one.

Test result:
dataset    version    metric    mode      vllm-api-general-chat
---------  ---------  --------  ------  -----------------------
gsm8k      7cd45e     accuracy  gen                       96.29

Test case running time: 30 minutes

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-12-09 23:05:41 +08:00
wangxiaoteng888
a77045f355 [P/D][main]Offline the llmdatadist connector related parts of the code and files. (#4780)
### What this PR does / why we need it?
As support for the mooncake connector is now available, the llmdatadist
connector is no longer being maintained, so the llmdatadist-related
files need to be retired.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By ci

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

---------

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
2025-12-09 22:36:43 +08:00
wangxiyuan
0b65ac6c4b remove useless patch (#4699)
patch_config is no longer needed. Let's remove it.


- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-12-08 11:02:42 +08:00
Li Wang
cd8e5be7c7 [Bugfix] Quick hot fix for nightly CI (#4727)
Quick fix for multi-node tests

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-04 23:51:16 +08:00
Li Wang
283bc5c7ba [Nightly] Optimize nightly CI (#4509)
### What this PR does / why we need it?
1. Optimize multi-node waiting logic
2. Remove the `tee` pipeline for logs, which could lead to hangs
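
One common alternative to piping output through `tee` is to write the child process's output directly to a log file, so that no extra process holds the pipe open and the exit status is that of the command itself. The sketch below illustrates that idea only; it is not taken from this PR, and `run_logged` is a hypothetical helper name.

```python
import subprocess


def run_logged(cmd, log_path):
    """Run `cmd`, sending stdout and stderr straight to `log_path`.

    Hypothetical helper: returns the command's own exit code, which a
    `cmd | tee file` shell pipeline would not report by default.
    """
    with open(log_path, "wb") as log_file:
        proc = subprocess.run(cmd, stdout=log_file, stderr=subprocess.STDOUT)
    return proc.returncode
```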

### How was this patch tested?


- vLLM version: v0.12.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.12.0

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-04 22:31:07 +08:00
zhangxinyuehfad
8813832387 [Test] Add GLM-4.5 nightly test (#4225)
### What this PR does / why we need it?
Add GLM-4.5 nightly test

- vLLM version: v0.11.2

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-12-01 22:31:56 +08:00
wangxiyuan
27b09ca9b9 [CI] drop ascend scheduler test (#4582)
Let's drop the ascend scheduler test first to ensure that everything works
without it.


- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-01 20:33:50 +08:00
Mengqing Cao
517fd9272d Revert "drop ascend scheduler" (#4580)
Reverts vllm-project/vllm-ascend#4498
- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2
2025-11-29 22:20:48 +08:00
wangxiyuan
f10acddb78 drop ascend scheduler (#4498)
The Ascend scheduler was originally added for the non-chunked-prefill case,
because the NPU ops didn't work well with chunked prefill.

Now that the ops work better with chunked prefill, it's time to remove the
Ascend scheduler and use the default vLLM scheduler.

- vLLM version: v0.11.2

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-29 16:18:34 +08:00
Li Wang
b220de33e8 [CI][Nightly] Support local debugging for multi-node CI test cases (#4489)
### What this PR does / why we need it?
This patch mainly does the following:
1. Make k8s/lws optional for multi-node testing, allowing developers to
run multi-node tests locally by explicitly passing in the IP addresses of
all nodes.
2. Allow passing a custom proxy script path in the config file to load
the proxy.

- vLLM version: v0.11.2

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-11-27 17:20:29 +08:00
Li Wang
91b6ba8ffe [CI] Fix kubernetes failed to resolve ip by dns name (#4240)
### What this PR does / why we need it?
When the pod has started but the corresponding DNS service is not yet
ready, resolving the DNS name immediately fails with an error. See
https://github.com/vllm-project/vllm-ascend/actions/runs/19436639688/job/55609108796
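
A minimal sketch of the general remedy for this kind of race, retrying the lookup until DNS is ready. It is not the code from this PR; the function name `resolve_with_retry`, the attempt count, and the delay are illustrative assumptions.

```python
import socket
import time


def resolve_with_retry(hostname, attempts=30, delay_s=2.0,
                       resolver=socket.gethostbyname):
    """Resolve `hostname`, retrying while DNS is not ready yet.

    Hypothetical helper; `attempts` (>= 1) and `delay_s` are illustrative.
    Raises TimeoutError if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return resolver(hostname)
        except socket.gaierror as err:
            # The name is not resolvable yet; wait and try again.
            last_error = err
            if attempt < attempts:
                time.sleep(delay_s)
    raise TimeoutError(
        f"could not resolve {hostname!r} after {attempts} attempts"
    ) from last_error
```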

- vLLM version: v0.11.0
- vLLM main:
2918c1b49c

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-11-19 14:38:13 +08:00
zhangxinyuehfad
67f2b3a031 [Test] Add deepseek v3.2 exp nightly test (#4191)
### What this PR does / why we need it?

- Skip the nightly image build when the GitHub event is pull_request
- Set imagePullPolicy to Always for the multi_node test
- Move the multi_node tests ahead so that some resources are cleaned up first
- Decouple the nightly image build from the nightly tests for fault tolerance

- vLLM version: v0.11.0
- vLLM main:
2918c1b49c

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
2025-11-14 15:46:10 +08:00
欧派果奶我还要
f90ed95578 [CI] Add multi-nodes EPLB configs of DeepSeek-R1-W8A8 & Qwen3-235B-W8A8 (#4144)
### What this PR does / why we need it?
Add DeepSeek-R1-W8A8 and Qwen3-235B-W8A8 configs for the multi-node and
EPLB scenarios

### Does this PR introduce _any_ user-facing change?
no

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: 白永斌 <baiyongbin3@h-partners.com>
Co-authored-by: 白永斌 <baiyongbin3@h-partners.com>
2025-11-14 08:50:29 +08:00
Li Wang
7294f89e43 [CI] Add daily images build for nightly ci (#3989)
### What this PR does / why we need it?
Given the current excessively long build time of our nightly CI, I
recommend installing the necessary, confirmed versions of packages in the
Docker image to reduce the time required for integration testing. This
includes Mooncake and vLLM at fixed tags, and is expected to reduce the
nightly CI duration by 2 hours.

- vLLM version: v0.11.0
- vLLM main:
2918c1b49c

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-11-13 20:10:12 +08:00
Li Wang
3ca11d5a7c [CI] Fix nightly-ci (#4159)
### What this PR does / why we need it?
Explicitly set `NUMEXPR_MAX_THREADS` to avoid `Error. nthreads
cannot be larger than environment variable "NUMEXPR_MAX_THREADS" (64)`
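
A minimal sketch of setting the variable from Python, assuming it has to be in place before `numexpr` is first imported. The helper name `pin_numexpr_threads` and the value 64 are illustrative and are not taken from this PR.

```python
import os


def pin_numexpr_threads(max_threads: int = 64) -> str:
    """Set NUMEXPR_MAX_THREADS unless the environment already defines it.

    Hypothetical helper; the default of 64 is only an illustrative value.
    Returns the value that is in effect afterwards.
    """
    # setdefault keeps any value that was already exported by the caller.
    return os.environ.setdefault("NUMEXPR_MAX_THREADS", str(max_threads))


# Call this before the first `import numexpr` in the process.
pin_numexpr_threads()
```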

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-11-12 22:06:49 +08:00
zhangxinyuehfad
b77b4f1abf [Test] Add nightly test for DeepSeek-V3.2-Exp (#3908)
### What this PR does / why we need it?
Add nightly test for DeepSeek-V3.2-Exp


### How was this patch tested?
test action:

https://github.com/vllm-project/vllm-ascend/actions/runs/19156153634/job/54757008557?pr=3908


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-11-11 10:29:57 +08:00
Li Wang
259eb25f88 [CI] Quick fix mooncake for nightly-ci (#4028)
### What this PR does / why we need it?
Since we have upgraded to CANN 8.3rc1, we no longer use the
privately maintained Mooncake repository and instead use the official
release published by Mooncake:
https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.7.post2 .

Next step: this is only a temporary solution. We will integrate Mooncake
into the vllm-ascend base image later for easier use. See
https://github.com/vllm-project/vllm-ascend/pull/3989
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-11-06 18:46:00 +08:00
wangxiyuan
cc2cd42ad3 Upgrade CANN to 8.3.rc1 (#3945)
### What this PR does / why we need it?
This PR upgrades CANN from 8.2rc1 to 8.3rc1 and removes the CANN version
check logic.

TODO: we noticed that UT runs fail with the CANN 8.3 image, so the base
image for UT is still 8.2. We'll fix it later.


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-03 20:21:07 +08:00
Li Wang
8f222f21f1 [CI][Nightly] Fix mooncake build (#3958)
### What this PR does / why we need it?
Fix https://github.com/vllm-project/vllm-ascend/pull/3943

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-11-03 20:07:47 +08:00
Li Wang
d0cc9c1203 [CI][Nightly] Correct the commit hash available for mooncake (#3943)
### What this PR does / why we need it?
The previous commit hash was accidentally deleted or overwritten. This
patch corrects the commit hash used for
https://github.com/AscendTransport/Mooncake to make the nightly CI happy.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-11-01 21:52:16 +08:00
Li Wang
eb0a2ee2d0 [CI] Optimize nightly CI (#3898)
### What this PR does / why we need it?
This patch mainly fixes the problem of not being able to determine the
exit status of the pod's entrypoint script, along with some other small
optimizations:
1. Shorten the wait-for-server timeout
2. Fix a typo
3. Fix the issue of ais_bench failing to correctly access the proxy URL
in a PD separation scenario.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-30 23:42:20 +08:00
Li Wang
4a2ab13743 [CI] Optimize nightly CI (#3858)
### What this PR does / why we need it?
This patch optimizes the nightly CI:
1. Fix the ais_bench error caused by a None repo_type
2. Fix the kubectl installation error on A2 with the Arm architecture
3. Fix the multi_node CI being unable to determine whether the job
succeeded
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
83f478bb19

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-29 22:30:19 +08:00
jiangyunfan1
e56b0017a3 [TEST]Add aisbench log and A2 cases (#3841)
### What this PR does / why we need it?
This PR adds 2 more A2 cases that we need to test daily. It also
enhances the logging for aisbench test failures to make issues easier
to identify.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running the test

- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

---------

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
2025-10-28 23:33:15 +08:00
Li Wang
90ae114569 [CI] Fix nightly CI (#3821)
### What this PR does / why we need it?
This patch fixes the nightly CI run
[failure](https://github.com/vllm-project/vllm-ascend/actions/runs/18848144365)

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-28 20:40:03 +08:00
Li Wang
f846bd20e4 [CI] Add multi-node test case for a2 (#3805)
### What this PR does / why we need it?
This patch adds a multi-node test case for a2.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-27 23:10:17 +08:00
jiangyunfan1
9030106a14 [TEST]Add 2P1D multi node cases for nightly test (#3764)
### What this PR does / why we need it?
This PR adds the 2P1D multi-node func/acc/perf test cases, which we need
to test daily.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by running the test

- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4

---------

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
2025-10-27 23:09:15 +08:00
Li Wang
7f73c28a24 [CI][Doc] Optimize multi-node CI (#3565)
### What this PR does / why we need it?
This pull request mainly does the following:
1. Add a doc for multi-node CI, mainly covering how the mechanism
works and how to contribute
2. Simplify the config YAML to make it more developer-friendly
3. Optimize the Mooncake installation script to prevent accidental
failures during installation
4. Fix the workflow to ensure the Kubernetes configuration can be applied correctly
5. Add Qwen3-235B-W8A8 disaggregated_prefill test
6. Add GLM-4.5 multi dp test
7. Add 2p1d 4nodes disaggregated_prefill test
8. Refactor nightly tests
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
17c540a993

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-25 09:23:47 +08:00
Li Wang
286ae9003d [CI] Multi-Node CI scalable (#3611)
### What this PR does / why we need it?
This PR adds a Jinja template for the k8s configuration file, in
preparation for the upcoming 4-node CI.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-22 14:18:43 +08:00
Li Wang
4c4a8458a5 [CI] Refactor multi-node CI (#3487)
### What this PR does / why we need it?
Refactor the multi-machine CI use case. The purpose of this PR is to
increase the ease of adding multi-machine CI use cases, allowing
developers to add multi-machine cluster model testing use cases
(including PD separation) by simply adding a new YAML configuration
file.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-17 09:04:31 +08:00