Commit Graph

26 Commits

Author SHA1 Message Date
zhangxinyuehfad
819a4459ce Drop vLLM 0.13.0 support (#6069)
### What this PR does / why we need it?
Drop vLLM 0.13.0 support, upgrade to 0.14.0

- vLLM version: v0.13.0
- vLLM main:
d68209402d

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2026-01-23 09:45:08 +08:00
wangxiyuan
69740039b7 [CI] Upgrade CANN to 8.5.0 (#6070)
### What this PR does / why we need it?
1. Upgrade CANN to 8.5.0
2. move triton-ascend 3.2.0 to requirements

note: we skipped the two failed e2e test, see
https://github.com/vllm-project/vllm-ascend/issues/6076 for more detail.
We'll fix it soon.


### How was this patch tested?
Closes: https://github.com/vllm-project/vllm-ascend/issues/5494

- vLLM version: v0.13.0
- vLLM main:
d68209402d

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-01-22 09:29:50 +08:00
Li Wang
75c92a3640 [CI] Move nightly-a2 test to hk (#5807)
### What this PR does / why we need it?
This patch initial testing involved connecting two nodes from the HK
region to nightly A2.

- vLLM version: v0.13.0
- vLLM main:
2f4e6548ef

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-01-12 22:58:35 +08:00
dependabot[bot]
ad40494b84 Bump actions/upload-artifact from 4 to 6 (#5466)
Bumps
[actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 6.

- vLLM version: v0.13.0
- vLLM main:
5326c89803

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-04 08:53:52 +08:00
Li Wang
1d81bfaed1 Fix nightly (#5413)
### What this PR does / why we need it?
This pacth mainly do the following things:
1. Bugfix for multi_node_tests log, log names must be unique when
uploading logs.
2. Optimize `get_cluster_ips` logic, increase the max retry times for
robustness
3. Abandoned the existing gh-proxy temporarily until it is stable
enough.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main:
81786c8774

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-27 18:16:46 +08:00
Li Wang
c2f776b846 [Nightly] Initial logging for nightly multi-node testing (#5362)
### What this PR does / why we need it?
Currently, our multi-node logs only show the master node's logs (via the
Kubernetes API), which is insufficient for effective problem
localization if other nodes experience issues. Therefore, this pull
request adds the ability to upload logs for other nodes.

Next plan: Output structured directory logs, including logs from each
node and the polog.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main:
bc0a5a0c08

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-26 11:39:07 +08:00
wangxiyuan
758d81dcb1 Drop 0.12.0 support (#5146)
We decided to release v0.13.0 soon. So no need to support 0.12.0 now.
Let's drop it.

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-20 09:38:53 +08:00
dependabot[bot]
5f840696c1 Bump actions/checkout from 4 to 6 (#5015)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-16 11:30:41 +08:00
dependabot[bot]
3c3c9a5386 Bump actions/checkout from 6.0.0 to 6.0.1 (#4772)
Bumps [actions/checkout](https://github.com/actions/checkout) from 6.0.0 to 6.0.1.

- vLLM version: v0.12.0
- vLLM main:
ad32e3e19c

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-08 19:15:40 +08:00
Li Wang
283bc5c7ba [Nightly] Optimize nightly CI (#4509)
### What this PR does / why we need it?
1. Optimize multi-node waiting logic
2. Remove the `tee` pipeline for logs, which will lead to hang issue

### How was this patch tested?


- vLLM version: v0.12.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.12.0

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-04 22:31:07 +08:00
wangxiyuan
3f4c0ea0a0 upgrade vLLM to 0.12.0 tag (#4647)
Upgrade vLLM to v0.12.0 tag

- vLLM version: 86e178f7c4d8c3b0eaf3c8e3f810a83f63b90e24
- vLLM main:
86e178f7c4

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-12-03 23:43:05 +08:00
wangxiyuan
7f2673ea2d upgrade vLLM to main (#4608)
1. fix https://github.com/vllm-project/vllm/pull/28542
The model structure modifications we involved in are:
     - Qwen2.5-VL(still exist some patch)
     - Qwen2-VL
     - Qwen2
     - DeepSeek series
     - Qwen-moe series
2. fix https://github.com/vllm-project/vllm/pull/29121
   the output token now  type changed from np to `list[list[int]]`

3. fix https://github.com/vllm-project/vllm/pull/29262
    `xformers` backend for multimodal now has been deprecated
4. fix https://github.com/vllm-project/vllm/pull/29342

5. fix https://github.com/vllm-project/vllm/pull/28579
6. fix https://github.com/vllm-project/vllm/pull/28718
7. fix https://github.com/vllm-project/vllm/issues/28665
8. fix https://github.com/vllm-project/vllm/pull/26847
vllm introduced the `optimization-level`, some default config has been
changed, and the param `--enforce-eager` has been deprecated
9. fix http://github.com/vllm-project/vllm/pull/29223 it retuns tuple
for sampler.
10. fix https://github.com/vllm-project/vllm/pull/29471 we'll remove the
related patch to avoid this kind of error.

Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangli <wangli858794774@gmail.com>


- vLLM version: v0.11.2

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
2025-12-02 22:10:52 +08:00
dependabot[bot]
e18e3067a7 Bump actions/checkout from 4.3.1 to 6.0.0 (#4592)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4.3.1 to 6.0.0.

- vLLM version: v0.11.2

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-02 11:59:25 +08:00
SILONG ZENG
ab37a7d5ae [main]Upgrade cann to 8.3rc2 (#4350)
### What this PR does / why we need it?
Upgrade cann to 8.3rc2

### Does this PR introduce _any_ user-facing change?
Yes, docker image will use 8.3.RC2


- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
2025-11-28 14:06:01 +08:00
wangxiyuan
bc69d7cfe1 upgrade to vllm 0.11.2 (#4400)
Bump vLLM version to v0.11.2

What's broken and changed by vLLM:
1. structured_output is broken by
https://github.com/vllm-project/vllm/pull/26866
2. get_mrope_input_positions is broken by
https://github.com/vllm-project/vllm/pull/28399
3. graph mode is broken by
https://github.com/vllm-project/vllm/pull/25110 we'll upgrade torch to
2.8 to fix the problem later
4. embedding is broken by
https://github.com/vllm-project/vllm/pull/27583
5. `get_attn_backend_cls` and attention backend is broken are broken by
https://github.com/vllm-project/vllm/pull/28534
6. spec decode is broken by
https://github.com/vllm-project/vllm/pull/28771
7. sp feature is broken by
https://github.com/vllm-project/vllm/pull/27126
8. mtp is broken by https://github.com/vllm-project/vllm/pull/27922
9. lora is broken by https://github.com/vllm-project/vllm/pull/21068
10. execute_model is broken by
https://github.com/vllm-project/vllm/pull/26866
11. `VLLM_DISABLE_SHARED_EXPERTS_STREAM` env is broken by
https://github.com/vllm-project/vllm/pull/28159
12. kv cahe is broken by https://github.com/vllm-project/vllm/pull/27753
13. dp is broken by https://github.com/vllm-project/vllm/pull/25110

 
What's broken and changed by ourself:
1. qwen vl is broken by https://github.com/vllm-project/vllm/pull/28455
We'll remove model files in the future to avoid this kind of error
2. Engine core is broken by
https://github.com/vllm-project/vllm/pull/23691 We'll remove the patch
file in the future.
3. Ascend scheduler is broken by
https://github.com/vllm-project/vllm/pull/28733 We'll remove ascend
scheudler later.
4. qwen3-next is broken by
https://github.com/vllm-project/vllm/pull/28083 We'll remove model files
in the future to avoid this kind of error
5. qwen vl is broken by https://github.com/vllm-project/vllm/pull/27764.
We'll remove model files in the future

Known issue:
1. ray doesn't work 
2. the accuracy of qwen3-next is not correct
3. qwen3-vl is broken
4. prefix cache+ ascend scheduler + deepseek v2 lite is broken.

Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: 22dimensions <waitingwind@foxmail.com>
Co-authored-by: shen-shanshan <467638484@qq.com>


- vLLM version: v0.11.2

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: leo-pony <nengjunma@outlook.com>
2025-11-26 11:48:58 +08:00
dependabot[bot]
84eae97f27 Bump actions/checkout from 4 to 6 (#4380)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.

- vLLM main:
2918c1b49c

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-25 09:05:11 +08:00
wangxiyuan
a1f142b7ad Drop 0.11.0 support (#4377)
There is a lot hack code for v0.11.0, which makes the code hard to
upgrade to newer vLLM version. Since v0.11.0 will release soon. Let's
drop v0.11.0 support first. Then we'll upgrade to v0.11.2 soon.


- vLLM version: v0.11.0
- vLLM main:
2918c1b49c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-24 17:08:20 +08:00
Li Wang
b5f7a83927 [Doc] Upgrade multi-node doc (#4365)
### What this PR does / why we need it?
When we are using `Ascend scheduler`, the param `max_num_batched_tokens`
should be larger than `max_model_len`, otherwise, will encountered the
follow error:
```shell
Value error, Ascend scheduler is enabled without chunked prefill feature. Argument max_num_batched_tokens (4096) is smaller than max_model_len (32768). This effectively limits the maximum sequence length to max_num_batched_tokens and makes vLLM reject longer sequences. Please increase max_num_batched_tokens or decrease max_model_len. [type=value_error, input_value=ArgsKwargs((), {'model_co...g': {'enabled': True}}}), input_type=ArgsKwargs]
```

### Does this PR introduce _any_ user-facing change?
Users/Developers who running the model according to the
[tutorial](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/multi_node.html),
the parameters can be specified correctly.

### How was this patch tested?

- vLLM version: v0.11.0
- vLLM main:
2918c1b49c

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-11-24 10:57:50 +08:00
Li Wang
b34f195cc8 [CI] Fix nightly CI for A2 series (#3825)
### What this PR does / why we need it?
For multi-node CI system, we need to ensure that cluster resources meet
the expected specifications before conducting multi-node
interoperability tests. Otherwise, unexpected errors may occur (for
example, we might mistakenly assume all nodes are ready and perform a
global cluster IP acquisition, which would cause an exception to be
thrown in Python because some nodes might not actually be ready at that
point). Therefore, we need to wait at the workflow level until all
resources meet the expected specifications.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0
- vLLM main:
2918c1b49c

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-11-23 23:05:33 +08:00
wangxiyuan
cc2cd42ad3 Upgrade CANN to 8.3.rc1 (#3945)
### What this PR does / why we need it?
This PR upgrade CANN from 8.2rc1 to 8.3rc1 and remove the CANN version
check logic.

TODO: we notice that UT runs failed with CANN 8.3 image. So the base
image for UT is still 8.2. We'll fix it later.


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-03 20:21:07 +08:00
Li Wang
eb0a2ee2d0 [CI] Optimize nightly CI (#3898)
### What this PR does / why we need it?
This patch mainly fix the the problem of not being able to determine the
exit status of the pod's entrypoint script and some other tiny
optimizations:
1. Shorten wait for server timeout
2. fix typo
3. fix the issue of ais_bench failing to correctly access the proxy URL
in a PD separation scenario.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-30 23:42:20 +08:00
Li Wang
4a2ab13743 [CI] Optimize nightly CI (#3858)
### What this PR does / why we need it?
This patch optimize nightly CI:
1. Bug fixes ais_bench get None repo_type error
2. Fix A2 install kubectl error with arm arch
3. Fix the multi_node CI unable to determine whether the job was
successful error
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
83f478bb19

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-29 22:30:19 +08:00
Li Wang
90ae114569 [CI] Fix nightly CI (#3821)
### What this PR does / why we need it?
This patch fix the nightly CI runs
[failure](https://github.com/vllm-project/vllm-ascend/actions/runs/18848144365)

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-28 20:40:03 +08:00
Li Wang
f846bd20e4 [CI] Add multi-node test case for a2 (#3805)
### What this PR does / why we need it?
This patch add multi-node test case for a2
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-27 23:10:17 +08:00
Li Wang
60ee4af6d0 [CI] Add custom op to nightly (#3765)
### What this PR does / why we need it?
1. Add custom op to nightly tests, fix
https://github.com/vllm-project/vllm-ascend/pull/3665
2. Correctly pass github secrets when using workflow_call, see
https://docs.github.com/en/actions/how-tos/reuse-automations/reuse-workflows
3. Fix the single node mutual cancellation issue

- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-27 14:07:03 +08:00
Li Wang
7f73c28a24 [CI][Doc] Optimize multi-node CI (#3565)
### What this PR does / why we need it?
This pull request mainly do the following things:
1. Add a doc for multi-node CI, The main content is the mechanism
principle and how to contribute
2. Simplify the config yaml for more developer-friendly
3. Optimized the mooncake installation script to prevent accidental
failures during installation
4. Fix the workflow to ensure the kubernetes can be apply correctly
5. Add Qwen3-235B-W8A8 disaggregated_prefill test
6. Add GLM-4.5 multi dp test
7. Add 2p1d 4nodes disaggregated_prefill test
8. Refactor nightly tests
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
17c540a993

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-25 09:23:47 +08:00