608 Commits

Author SHA1 Message Date
Li Wang
1f71da80eb [CI] Fix server start failure when long weight loading (#7098)
### What this PR does / why we need it?
When loading large models (e.g., 163 shards), weight loading can exceed
the default 600s timeout. Engine startup timeout with the error:
```shell
TimeoutError: Timed out waiting for engines to send initial message on input socket.
```
We should increase the `VLLM_ENGINE_READY_TIMEOUT_S ` to avoid it
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-03-13 08:52:56 +08:00
Li Wang
7fe0469e27 [CI][Misc] Use offline mode for model downloads (#7179)
### What this PR does / why we need it?
1. For all parts of the current test module involving the millisecond
download model, add the `local_file_only` parameter to specify offline
mode; this ensures that CI will not fail due to network instability.
2. Install modelscope from a fixed commit until it next release
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
 check if the env or arg `local_files_only` works
1) set the env:
```shell
export HF_HUB_OFFLINE=1
```
2) run the script
```python
from transformers import PretrainedConfig
import huggingface_hub
from modelscope.utils.hf_util import patch_hub

patch_hub()

model="Qwen/Qwen3-0.6B"
kwargs = {}


config_dict, _ = PretrainedConfig.get_config_dict(
    model,
    trust_remote_code=True,
    local_files_only=huggingface_hub.constants.HF_HUB_OFFLINE,
    **kwargs,
)

print(config_dict)
```
it works well:
```shell
2026-03-06 06:40:12,546 - modelscope - WARNING - We can not confirm the cached file is for revision: master
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
{'architectures': ['Qwen3ForCausalLM'], 'attention_bias': False, 'attention_dropout': 0.0, 'bos_token_id': 151643, 'eos_token_id': 151645, 'head_dim': 128, 'hidden_act': 'silu', 'hidden_size': 1024, 'initializer_range': 0.02, 'intermediate_size': 3072, 'max_position_embeddings': 40960, 'max_window_layers': 28, 'model_type': 'qwen3', 'num_attention_heads': 16, 'num_hidden_layers': 28, 'num_key_value_heads': 8, 'rms_norm_eps': 1e-06, 'rope_scaling': None, 'rope_theta': 1000000, 'sliding_window': None, 'tie_word_embeddings': True, 'torch_dtype': 'bfloat16', 'transformers_version': '4.51.0', 'use_cache': True, 'use_sliding_window': False, 'vocab_size': 151936, '_commit_hash': None}
```
3) test the model repo does not cached locally when the env
`HF_HUB_OFFLINE`==True
```python
from transformers import PretrainedConfig
import huggingface_hub
from modelscope.utils.hf_util import patch_hub

patch_hub()


model="FireRedTeam/FireRed-OCR"
kwargs = {}


config_dict, _ = PretrainedConfig.get_config_dict(
    model,
    trust_remote_code=True,
    local_files_only=huggingface_hub.constants.HF_HUB_OFFLINE,
    **kwargs,
)

print(config_dict)
```
and the result is as expected:
```shell
  File "/workspace/demo.py", line 12, in <module>
    config_dict, _ = PretrainedConfig.get_config_dict(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.14/lib/python3.11/site-packages/modelscope/utils/hf_util/patcher.py", line 189, in patch_get_config_dict
    model_dir = get_model_dir(pretrained_model_name_or_path,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.14/lib/python3.11/site-packages/modelscope/utils/hf_util/patcher.py", line 164, in get_model_dir
    model_dir = snapshot_download(
                ^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.14/lib/python3.11/site-packages/modelscope/hub/snapshot_download.py", line 137, in snapshot_download
    return _snapshot_download(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3.11.14/lib/python3.11/site-packages/modelscope/hub/snapshot_download.py", line 283, in _snapshot_download
    raise ValueError(
ValueError: Cannot find the requested files in the cached path and outgoing traffic has been disabled. To enable look-ups and downloads online, set 'local_files_only' to False
```
- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-03-13 08:52:24 +08:00
drizzlezyk
5fe7942bbd [CI] add action for issue labeler on issue open/edit (#7208)
### What this PR does / why we need it?
New Workflow File bot_issue_manage.yaml

Automatically runs when issues are opened or edited
Uses the official GitHub Issue Labeler action to categorize issues
Label Configuration issue-labeler.yml

Defines regex patterns for model-specific labels (310p, GLM5, Qwen 3.5,
DeepSeek, Kimi K2, Kimi K2.5)
Enables automatic issue classification based on title/content matching

### Does this PR introduce _any_ user-facing change?
No. This PR only introduces internal GitHub Actions workflow and
configuration changes. There are no API, interface, or behavior changes
visible to end users. It purely improves the issue management process on
GitHub.

### How was this patch tested?
- GitHub Actions workflow syntax is valid and follows the official
GitHub documentation
- The issue labeler action (github/issue-labeler@v3.4) is a
well-maintained official GitHub action
- Configuration file follows the expected YAML format for the
issue-labeler action
- Regex patterns for model names have been verified for correct syntax

- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

---------

Signed-off-by: drizzlezyk <drizzlezyk@163.com>
2026-03-12 20:16:17 +08:00
Li Wang
88c56e3bf2 [Misc] Fix main lint to make CI happy (#7204)
### What this PR does / why we need it?
Fix lint failed due to the merging of a previous PR.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-03-12 18:27:48 +08:00
Li Wang
d866e6b238 [Bugfix] Fixed permission issues with the automatic PR submission workflow (#7142)
### What this PR does / why we need it?
Auto submit a pull request via
https://github.com/vllm-ascend-ci/vllm-ascend, the workflow looks like:
1. get a new config.yaml via run e2e tests
2. push the changed `config.yaml` to a new branch of
https://github.com/vllm-ascend-ci/vllm-ascend
3. submit a pull request to vllm-ascend via gh cli
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-03-12 17:18:59 +08:00
tfhddd
21fea86b08 feat: [CI] Introduce uv to accelerate pip install (#7127)
### What this PR does / why we need it?
Integrates uv: Significantly accelerates pip install execution and
resolves concurrency issues caused by traditional pip caching
mechanisms.

Why pip install uc-manager is explicitly added:
This project depends on uc-manager. However, installing it via uv pip
install uc-manager currently fails due to a known issue. An issue has
already been filed with the upstream uv repository to address this.
Consequently, we explicitly invoke pip install uc-manager as a temporary
workaround to ensure the build succeeds.
https://github.com/ModelEngine-Group/unified-cache-management/issues/736

Why use UV_SYSTEM_PYTHON: 1:
No virtual environment has been created yet; this configuration has the
same effect as directly using `pip install`.

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

Signed-off-by: tfhddd <2272751277@qq.com>
2026-03-12 16:47:23 +08:00
yupeng
830f39dd70 [Bugfix][LoRA] Fix the issue when enable LoRA + tp + fully_sharded_loras (#6650)
### What this PR does / why we need it?
Fix the issue #6143 .

### Does this PR introduce _any_ user-facing change?
Allow to start the server with "--enable-lora && --fully-sharded-loras
&& --tensor_parallel_size 2".

### How was this patch tested?
pytest -sv tests/e2e/multicard/2-cards/test_llama32_lora_tp2.py
- vLLM version: v0.15.0
- vLLM main:
d7e17aaacd

---------

Signed-off-by: paulyu12 <507435917@qq.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-03-11 15:43:15 +08:00
Mengqing Cao
1a83c8e2f5 [CI] Build Image for v0.16.0rc1 (#7155)
### What this PR does / why we need it?
Build Image for v0.16.0rc1
- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

Signed-off-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-03-11 14:48:50 +08:00
SILONG ZENG
90aa048e60 [CI] Skip test_mooncake_layerwise_connector.py in ut (#7147)
### What this PR does / why we need it?
The `test_mooncake_layerwise_connector.py` file in the `ut` test will be
skipped for now and fixed later.

- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

Signed-off-by: MrZ20 <2609716663@qq.com>
2026-03-11 11:46:29 +08:00
Li Wang
881c38d210 [Misc] Download on both hk and guiyang region (#7129)
### What this PR does / why we need it?
Since the PVC files for Guiyang and Hong Kong are not shared, we need to
trigger the download of both regions simultaneously when downloading the
model to ensure that the models in all regions are synchronized.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-03-10 19:22:32 +08:00
zhangxinyuehfad
67d40f23fd [CI]Upgrade niglty multi-node-tests max-parallel to 2 (#7035)
### What this PR does / why we need it?

1. Increase nightly multi-node test max-parallel from 1 to 2, and fix
resource conflicts that arise when tests run concurrently.
2. Fix parse-trigger job: Add an if condition so it only runs on
schedule, workflow_dispatch, or PRs labeled nightly-test
3. Adjust nightly schedule: Shift trigger time from 24:00 to 23:45
(UTC+8)

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2026-03-10 16:25:51 +08:00
dependabot[bot]
3b25ded8b7 [CI] Bump docker/metadata-action from 5 to 6 (#7069)
Bumps [docker/metadata-action](https://github.com/docker/metadata-action) from 5 to 6.


- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-10 09:06:04 +08:00
dependabot[bot]
2325bbe79b [CI] Bump actions/checkout from 4 to 6 (#7070)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.

- vLLM version: v0.16.0
- vLLM main:
4034c3d32e

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-10 09:05:22 +08:00
wanghengkang
c49ce18ea5 [Test] Add e2e test cases for the Qwen-VL model adaptation to Ascend 310p (#6977)
### What this PR does / why we need it?
Add e2e test cases for the Qwen-VL model adaptation to Ascend 310p

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

Signed-off-by: gcw_61wqY8cy <wanghengkang1@huawei.com>
2026-03-06 14:25:10 +08:00
SILONG ZENG
bd571cf6d6 [Main2Main] Upgrade vLLM to 0303 (#6944)
### What this PR does / why we need it?
break:
- https://github.com/vllm-project/vllm/pull/34102 
Disable_full param replaced with valid_modes/invalid_modes API
- https://github.com/vllm-project/vllm/pull/35503
Now must return float compilation_time
- https://github.com/vllm-project/vllm/pull/35564
New sequence_lengths param added
- https://github.com/vllm-project/vllm/pull/33807
A check was performed (if runner_backend != "auto")
- https://github.com/vllm-project/vllm/pull/34861
`BaseDeviceCommunicator` now accesses PyTorch's internal `pg_map` to
check process group state
- https://github.com/vllm-project/vllm/pull/35274

**Important change:**
- https://github.com/vllm-project/vllm/pull/28672

`matcher_utils` directly accesses `torch.ops._C.*` during the import
phase. In the Ascend environment, some unregistered ops trigger
`AttributeError`, causing e2e initialization failure.

https://github.com/vllm-project/vllm-ascend/actions/runs/22607260487/job/65502047131#step:10:2323

https://github.com/vllm-project/vllm/blob/main/vllm/compilation/passes/fusion/matcher_utils.py#L29

This PR adds temporary compatibility placeholders (rms_norm,
fused_add_rms_norm, rotate_embedding, static/dynamic fp8 quant,
silu_and_mul) to
`vllm_ascend/patch/platform/patch_fusion_matcher_compat_ops.py` to
ensure no crashes during the import phase. Upstream repairs will be
considered later.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: Meihan-chen <jcccx.cmh@gmail.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
2026-03-06 09:08:52 +08:00
zhangxinyuehfad
1e4017e3fa [CI] support nightly ci for per pr by labels (#6483)
### What this PR does / why we need it?

This PR refactors the nightly CI workflows (A2 and A3) to support
running tests against a specific PR's code, in addition to the existing
scheduled/dispatch runs using pre-built images.

#### Motivation:
Previously, nightly tests could only be triggered by schedule or
workflow_dispatch, always using the pre-built nightly image. This change
allows developers to trigger nightly tests against their own PR's source
code, enabling early validation without waiting for a nightly build.

#### Changes
Trigger logic (parse-trigger job)

A new parse-trigger job is introduced in both
schedule_nightly_test_a2.yaml and schedule_nightly_test_a3.yaml to
centralize trigger evaluation:

`schedule / workflow_dispatch`: runs all tests with the pre-built image
(existing behavior preserved)
`pull_request (labeled + synchronize)`: runs only when:The PR has the
nightly-test label, and /nightly [test-names] comment exists (latest one
wins)

1. /nightly or /nightly all — runs all tests
2. /nightly test1 test2 — runs only named tests (comma-wrapped for exact
matching)

#### How to trigger
1. Add the nightly-test label to your PR
2. Comment /nightly (all tests) or /nightly test1 test2 (specific tests)
4. Re-triggering: add another /nightly comment and push a new commit
(synchronize event)

### Does this PR introduce _any_ user-facing change?
None

### How was this patch tested?

- vLLM version: v0.14.1
- vLLM main:
dc917cceb8

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2026-03-05 16:46:37 +08:00
zhangxinyuehfad
566c367a10 [CI] Add DeepSeek-V3.2 large EP nightly ci (#6378)
### What this PR does / why we need it?

Add DeepSeek-V3.2 nightly ci

Fix PD routing to exclude headless nodes when collecting
prefiller/decoder IPs

- vLLM version: v0.14.1
- vLLM main:
dc917cceb8

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2026-03-04 16:15:56 +08:00
SILONG ZENG
95b44d7b73 [bugfix]fix file not found error in nightly of single-node (#6976)
### What this PR does / why we need it?
1. The **main image build** takes approximately **two hours**. The main
image build time needs to be moved forward to **21pm(UTC+8)** to ensure
that the nightly image build can use the latest main image.
``` bash
schedule:
   # UTC+8: 8am, 12pm, 16pm, 22pm
   - cron: '0 0,4,8,14 * * *'
```
--->
``` bash
schedule:
   # UTC+8: 8am, 12pm, 16pm, 21pm
   - cron: '0 0,4,8,13 * * *'
```
Link:
https://github.com/vllm-project/vllm-ascend/actions/runs/22632712302/job/65641055135#step:8:26

2. The nightly test is encountering the following error: 
``` bash
ImportError: ascend_transport.so: cannot open shared object file: No such file or directory.
```
Path need to be added:
``` bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib" >> ~/.bashrc
```
Link:
https://github.com/vllm-project/vllm-ascend/actions/runs/22632712302/job/65641054911#step:7:529
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
2026-03-04 11:47:26 +08:00
Li Wang
d431d7d526 [CI] Enable auto upgrade e2e estimated time for auto-partition suites (#6840)
### What this PR does / why we need it?
This patch add a schedule triggered workflow for auto upgrade e2e
estimated-time for batter load balance
1. The workflow will run the full e2e test to get the duration of each
test.
2. The script `update_estimated_time.py` will upgrade the
[config.json](https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/scripts/config.yaml)
according to the latest time
3. The workflow will submit a pull request that includes changes to
`config.json` automatically
<img width="2484" height="764" alt="image"
src="https://github.com/user-attachments/assets/02f3459c-bb3b-4f8e-9966-8bb2e5c1bbea"
/>


### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
83b47f67b1
- 
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
83b47f67b1

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-03-04 10:38:34 +08:00
SILONG ZENG
859f2c25b9 [Nightly][Refactor]Migrate nightly single-node model tests from .py to .yaml (#6503)
### What this PR does / why we need it?
This PR refactors the nightly single-node model test by migrating test
configurations from Python scripts to a more maintainable `YAML-based`
format.

| Original PR | Python (`.py`) | YAML (`.yaml`) |
| :--- | :--- | :--- |
| [#3568](https://github.com/vllm-project/vllm-ascend/pull/3568) |
`test_deepseek_r1_0528_w8a8_eplb.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#3631](https://github.com/vllm-project/vllm-ascend/pull/3631) |
`test_deepseek_r1_0528_w8a8.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#5874](https://github.com/vllm-project/vllm-ascend/pull/5874) |
`test_deepseek_r1_w8a8_hbm.py` | `DeepSeek-R1-W8A8-HBM.yaml` |
| [#3908](https://github.com/vllm-project/vllm-ascend/pull/3908) |
`test_deepseek_v3_2_w8a8.py` | `DeepSeek-V3.2-W8A8.yaml` |
| [#5682](https://github.com/vllm-project/vllm-ascend/pull/5682) |
`test_kimi_k2_thinking.py` | `Kimi-K2-Thinking.yaml` |
| [#4111](https://github.com/vllm-project/vllm-ascend/pull/4111) |
`test_mtpx_deepseek_r1_0528_w8a8.py` | `MTPX-DeepSeek-R1-0528-W8A8.yaml`
|
| [#3733](https://github.com/vllm-project/vllm-ascend/pull/3733) |
`test_prefix_cache_deepseek_r1_0528_w8a8.py` |
`Prefix-Cache-DeepSeek-R1-0528-W8A8.yaml` |
| [#6543](https://github.com/vllm-project/vllm-ascend/pull/6543) |
`test_qwen3_235b_w8a8.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#6543](https://github.com/vllm-project/vllm-ascend/pull/6543) |
`test_qwen3_235b_a22b_w8a8_eplb.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#3973](https://github.com/vllm-project/vllm-ascend/pull/3973) |
`test_qwen3_30b_w8a8.py` | `Qwen3-30B-A3B-W8A8.yaml` |
| [#3541](https://github.com/vllm-project/vllm-ascend/pull/3541) |
`test_qwen3_32b_int8.py` | `Qwen3-32B-Int8.yaml` |
| [#3757](https://github.com/vllm-project/vllm-ascend/pull/3757) |
`test_qwq_32b.py` | `QwQ-32B.yaml` |
| [#5616](https://github.com/vllm-project/vllm-ascend/pull/5616) |
`test_qwen3_next_w8a8.py` | `Qwen3-Next-80B-A3B-Instruct-W8A8.yaml` |
| [#3541](https://github.com/vllm-project/vllm-ascend/pull/3541) |
`test_qwen2_5_vl_7b.py` | `Qwen2.5-VL-7B-Instruct.yaml` |
| [#5301](https://github.com/vllm-project/vllm-ascend/pull/5301) |
`test_qwen2_5_vl_7b_epd.py` | `Qwen2.5-VL-7B-Instruct-EPD.yaml` |
| [#3707](https://github.com/vllm-project/vllm-ascend/pull/3707) |
`test_qwen2_5_vl_32b.py` | `Qwen2.5-VL-32B-Instruct.yaml` |
| [#3676](https://github.com/vllm-project/vllm-ascend/pull/3676) |
`test_qwen3_32b_int8_a3_feature_stack3.py` |
`Qwen3-32B-Int8-A3-Feature-Stack3.yaml` |
| [#3709](https://github.com/vllm-project/vllm-ascend/pull/3709) |
`test_prefix_cache_qwen3_32b_int8.py` |
`Prefix-Cache-Qwen3-32B-Int8.yaml` |
| [#5395](https://github.com/vllm-project/vllm-ascend/pull/5395) |
`test_qwen3_next.py` | `Qwen3-Next-80B-A3B-Instruct-A2.yaml` |
| [#3474](https://github.com/vllm-project/vllm-ascend/pull/3474) |
`test_qwen3_32b.py` | `Qwen3-32B.yaml` |
| [#3541](https://github.com/vllm-project/vllm-ascend/pull/3541) |
`test_qwen3_32b_int8.py` | `Qwen3-32B-Int8-A2.yaml` |
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
2026-03-03 20:13:43 +08:00
Xiaoshuang Wang
f7a8befc20 [CI] Upgrade CANN to 8.5.1 (#6897)
### What this PR does / why we need it?
[CI] Upgrade CANN to 8.5.1

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with existing test.


- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

Signed-off-by: wxsIcey <1790571317@qq.com>
2026-03-03 09:02:42 +08:00
pu-zhe
632801b0ad [CI][310P] Add 310p tracked files in CI light. (#6923)
### What this PR does / why we need it?
Add 310p tracked files in CI light.
'vllm_ascend/attention/attention_v1.py'
'vllm_ascend/ops/fused_moe/**'
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI test
- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

Signed-off-by: pu-zhe <zpuaa@outlook.com>
2026-03-02 18:03:46 +08:00
dependabot[bot]
86c9109d16 Bump actions/upload-artifact from 6 to 7 (#6906)
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 6 to 7.

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 14:08:28 +08:00
dependabot[bot]
002ec24dd8 Bump actions/download-artifact from 7 to 8 (#6907)
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 7 to 8.

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-02 14:07:59 +08:00
wjunLu
c324053b44 [CI] Revert speedup image building and CI Installation related PRs (#6891)
### What this PR does / why we need it?

Revert speedup image building and CI Installation related PRs

git revert 8835236181
git revert 64fba51275
git revert 263c2f8e8d
git revert 84b00695f8


### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

---------

Signed-off-by: wjunLu <wjunlu217@gmail.com>
2026-03-02 08:53:10 +08:00
wjunLu
8835236181 [Image] Fix docker image merge tag settings (#6884)
### What this PR does / why we need it?
Fix docker image merge tag settings, to use tag as the branch name.

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

Signed-off-by: wjunLu <wjunlu217@gmail.com>
2026-03-01 12:20:57 +08:00
wjunLu
64fba51275 [Bugfix] Fix openEuler dockerfile error (#6871)
### What this PR does / why we need it?
This pull request addresses several issues within the openEuler
Dockerfiles : `Dockerfile.a3.openEuler` and `Dockerfile.openEuler` to
ensure correct installation and setup of the Mooncake dependency. The
changes primarily involve fixing incorrect file paths for the
mooncake_installer.sh script and streamlining the declaration of the
MOONCAKE_TAG build argument, leading to more robust and accurate
container builds.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

Signed-off-by: wjunLu <wjunlu217@gmail.com>
2026-02-28 20:55:18 +08:00
wjunLu
263c2f8e8d [CI] Revert auto rebase (#6867)
### What this PR does / why we need it?
Revert auto rebase

- vLLM version: v0.16.0
- vLLM main:
15d76f74e2

Signed-off-by: wjunLu <wjunlu217@gmail.com>
2026-02-28 11:54:31 +08:00
wjunLu
84b00695f8 [CI] Refactor to speedup image building and CI Installation (#6708)
### What this PR does / why we need it?
1. Refactor  image workflow using cache-from to speedup builds

![build](https://github.com/user-attachments/assets/02135c12-0069-44f8-a3ec-5c2b4282448a)

Simultaneously refactored all Dockerfiles by placing layers that rarely
change before those that change frequently, improving build cache hit
rate.

2. Refactor E2E test using vllm-ascend container images, to skip C
compile while no C code are changed

![e2e](https://github.com/user-attachments/assets/49f5b166-0df3-41e1-8f71-b3bbbed17cfd)

In this case, the job will only replace the source code of vllm-ascend
and install `requirements-dev.txt`, saving about 10min before tests

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
9562912cea

Signed-off-by: wjunLu <wjunlu217@gmail.com>
2026-02-28 09:06:00 +08:00
wjunLu
b60b991005 [CI] Add nightly test for Qwen3-235B-A22B with mooncake layerwise connector (#5441)
### What this PR does / why we need it?

Add nightly test for Qwen3-235B-A22B with  mooncake layerwise connector.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main:
81786c8774

---------

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
2026-02-27 16:31:02 +08:00
Canlin Guo
e4458b2d2b [Main2Main] Upgrade vLLM to 0226 (#6813)
### What this PR does / why we need it?

Breaking:
1. https://github.com/vllm-project/vllm/pull/33452
2. https://github.com/vllm-project/vllm/pull/33451
3. https://github.com/vllm-project/vllm/pull/32567
4. https://github.com/vllm-project/vllm/pull/32344

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
83b47f67b1

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: MrZ20 <2609716663@qq.com>
2026-02-27 16:05:21 +08:00
realliujiaxu
5def28dcd3 [Feat]support sequence parallelism by pass for VL models (#5632) 2026-02-27 08:27:41 +08:00
Li-Yongwen
2870f7c8ad [Feat] Support routing replay (#6696)
### What this PR does / why we need it?

[Feat] Support routing replay
same as https://github.com/vllm-project/vllm-ascend/pull/6666
resubmit  because of DOC failure

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
9562912cea

---------

Signed-off-by: liyongwen <1310439159@qq.com>
Signed-off-by: Li-Yongwen <63399187+Li-Yongwen@users.noreply.github.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-02-26 10:22:47 +08:00
bowenli
e3927cc8f5 [Bugfix] fix bug for mtp (#6514)
### What this PR does / why we need it?
fix(mtp): resolve MTP core bugs and enhance eager mode test cases
1. Resolved critical issues in eager mode MTP core execution logic;
2. Fixed functional bugs in the _update_states_after_model_execute
function;
3. Updated and released test_mtp_qwen3_next.py to validate eager mode
acceptance rate.
### Does this PR introduce _any_ user-facing change?
None
### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: Bowen-Leee <caoshankuangren@gmail.com>
2026-02-25 17:50:57 +08:00
Icey
ee59429015 upgrade main to 0212 (#6712)
### What this PR does / why we need it?
Fixes `transformers_utils/processors/__init__` import error, due to
https://github.com/vllm-project/vllm/pull/33247
Fixes Fused MoE break introduced by `MoERunner abstraction,` due to
https://github.com/vllm-project/vllm/pull/32344

> delete AscendMoERunnere when
https://github.com/vllm-project/vllm/pull/35178 is merged

Fixes `Make Qwen3VL compatible with Transformers v5`, due to
https://github.com/vllm-project/vllm/pull/34262

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
9562912cea

---------

Signed-off-by: wxsIcey <1790571317@qq.com>
2026-02-25 09:17:29 +08:00
Nengjun Ma
f0caeeadcb [CI] unlock when load model (#6771)
### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
9562912cea

Signed-off-by: leo-pony <nengjunma@outlook.com>
2026-02-14 18:54:04 +08:00
SILONG ZENG
e2237819a9 [CI]Fixed the spell check function in typos.toml (#6753)
### What this PR does / why we need it?
The incorrect regular expression syntax `.*[UE4M3|ue4m3].*` actually
ignores all words containing any of the following characters: `u, e, 4,
m, 3, |`

```yaml
extend-ignore-identifiers-re = [".*Unc.*", ".*_thw",
    ".*UE8M0.*", ".*[UE4M3|ue4m3].*", ".*eles.*", ".*fo.*", ".*ba.*",
    ".*ot.*", ".*[Tt]h[rR].*"]
```
===fix===>
```yaml
extend-ignore-identifiers-re = [".*Unc.*", ".*_thw",
    ".*UE8M0.*", ".*(UE4M3|ue4m3]).*", ".*eles.*", ".*fo.*", ".*ba.*",
    ".*ot.*", ".*[Tt]h[rR].*"]
```

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main:
9562912cea

Signed-off-by: MrZ20 <2609716663@qq.com>
2026-02-14 11:57:26 +08:00
jiahao.quan
7221045777 [Attention] add gpt-oss support (#5901)
### What this PR does / why we need it?
Please refer to the following link for the historical conversation
https://github.com/vllm-project/vllm-ascend/pull/4467. We have made
updates in light of the comments from the prior PR review. Given the
refactoring of the attention_v1 component, we have carried out necessary
adjustments to fit the newly revised code.

### Does this PR introduce _any_ user-facing change?

1. Modified the code in the Attention section to adapt to the SWA and
Sink features required by gpt-oss.
2. Modified the code in the MoE section to add support for bias and
swigluoai.

### How was this patch tested?
Please refer to the
https://github.com/vllm-project/vllm-ascend/pull/4467 for performance
tests, on the basis of which the accuracy tests from AIME2024 have been
newly added.

![img_v3_02tu_501e88e3-2217-4565-8edf-b9acf4f43f2g](https://github.com/user-attachments/assets/024f8283-18ab-4d4d-ab12-27917b5d7d06)


- vLLM version: v0.13.0
- vLLM main:
bde38c11df

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: mikequan0425 <mikequan0425@foxmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
Signed-off-by: pu-zhe <zpuaa@outlook.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: luomin2005 <luomin2005@huawei.com>
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: MrZ20 <2609716663@qq.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: leon_tao <taoyao2@huawei.com>
Co-authored-by: nurxat <738457498@qq.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: mikequan <199741451@qq.com>
Co-authored-by: LI SHENGYONG <49200266+shenchuxiaofugui@users.noreply.github.com>
Co-authored-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
Co-authored-by: pu-zhe <zpuaa@outlook.com>
Co-authored-by: luomin2005 <luomin2005@huawei.com>
Co-authored-by: liziyu <56102866+liziyu179@users.noreply.github.com>
Co-authored-by: wangxiaoteng <wangxiaoteng@huawei.com>
Co-authored-by: whx <56632993+whx-sjtu@users.noreply.github.com>
Co-authored-by: Cao Yi <slightwindsec@gmail.com>
Co-authored-by: Icey <1790571317@qq.com>
Co-authored-by: SILONG ZENG <2609716663@qq.com>
2026-02-12 10:55:34 +08:00
Icey
88773bb101 [main to main] upgrade main 0210 (#6673)
### What this PR does / why we need it?
upgrade vllm commit to `9562912cead1f11e8540fb91306c5cbda66f0007`

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
all tests passed

- vLLM version: v0.15.0
- vLLM main:
13397841ab

---------

Signed-off-by: wxsIcey <1790571317@qq.com>
2026-02-11 18:10:14 +08:00
jiangyunfan1
1eb07986bf [TEST]add a qwen3-30b acc case with mooncake mempool (#6244)
### What this PR does / why we need it?
This PR adds a case of qwen3-30b w8a8 with mooncake mempool, we need to
test it regual
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by running the test
- vLLM version: v0.14.1
- vLLM main:
d68209402d

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
2026-02-10 16:26:55 +08:00
wangxiyuan
2a826b5fad [Misc] upgrade to vllm main (#6646)
### What this PR does / why we need it?
This PR upgrades the core vLLM dependency to a newer version from the
main branch (`13397841ab469cecf1ed425c3f52a9ffc38139b5`). This is
necessary to keep our project up-to-date with the latest features and
fixes from upstream vLLM.

1.
ac32e66cf9
pass file is moved.

- vLLM version: v0.15.0
- vLLM main:
d7e17aaacd

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Co-authored-by: wxsIcey <1790571317@qq.com>
2026-02-10 14:08:59 +08:00
Qiu
cb7c419bc0 [Feat](sfa,dcp) support dcp for sfa (#6563)
### What this PR does / why we need it?
This PR adds DCP support to the SFA backend.

Please note that due to operator constraints, the current implementation
has to all-gather the entire KV cache and modify the block table to
satisfy the operator input requirements. This results in significantly
increased communication overhead and peak memory usage. Therefore, this
is only a temporary workaround and will be refactored once the operator
provides proper support.

Additionally, because of the above limitations,
`cp_kv_cache_interleave_size` is currently required to be equal to
`block_size`. This restriction will also be removed after the refactor.

#### Test
accuracy test using DeepSeek-V3.2-Exp-W8A8 with dp2tp8dcp8

| dataset | version | metric | mode | vllm-api-general-stream |
|----- | ----- | ----- | ----- | -----|
| gsm8kdataset | - | accuracy | gen | 96.35 |

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
2026-02-09 18:52:25 +08:00
wangxiyuan
d266fd7b47 [CI] Add workflow support for lint image build (#6489)
Support specify commit hash with lint image build workflow

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-02-07 09:32:01 +08:00
pu-zhe
1cc225711d [Refactor]310p_e2e test case update (#6539)
### What this PR does / why we need it?
This pull request significantly enhances the test suite by adding new
end-to-end test cases for Qwen3 models on the 310P hardware platform.
The primary goal is to ensure the stability and correctness of these
models under diverse operational conditions, including various
parallelism strategies, data types, and quantization methods.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
E2E test
- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: pu-zhe <zpuaa@outlook.com>
2026-02-07 09:28:37 +08:00
wangyu
c63b7a1188 [Test] Add initial multi modal cases of Qwen2.5-VL-7B-Instruct for disaggregated encoder (#5301)
### What this PR does / why we need it?
This PR adds disaggregated encoder  tests for Qwen2.5-VL-7B-Instruct 
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by running the test
by running ci

- vLLM version: release/v0.12.0

---------

Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
2026-02-06 17:30:17 +08:00
wangxiyuan
d0bc16859c [CI][Misc] Some improvement for github action (#6587)
### What this PR does / why we need it?

- This PR removes several self-hosted runner labels from the
`actionlint.yaml` configuration file. These runners are likely no longer
in use, so this change cleans up the configuration and ensures
`actionlint` has an accurate list of available runners.
- Move all Action dockerfiles to one folder
- remove useless `runner` input for e2e test.
- update workflow option version

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This is a configuration change for the CI linter. The correctness will
be verified by `actionlint` running in CI on subsequent pull requests.

- vLLM version: v0.15.0
- vLLM main:
d7e17aaacd

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2026-02-06 14:06:27 +08:00
Li Wang
d018aeb5fa [Image] Bump mooncake version to v0.3.8.post1 (#6428)
### What this PR does / why we need it?
This patch bump the mooncake version to the latest
[release](https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.8.post1)
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
test is locally
>>> from mooncake.engine import TransferEngine
- vLLM version: v0.14.1
- vLLM main:
dc917cceb8

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2026-02-06 10:54:03 +08:00
Nengjun Ma
11339eb48a [CI] Update UT CANN version to 8.5.0 for main branch (#6564)
### What this PR does / why we need it?
Update UT CANN version to 8.5.0

### Does this PR introduce _any_ user-facing change?
NA


- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
2026-02-06 10:28:42 +08:00
zhangxinyuehfad
81f3c09d6d [CI] Change A2 runner (#6557)
### What this PR does / why we need it?

This PR updates the CI runner from `linux-aarch64-a2-*` to
`linux-aarch64-a2b3-*` in various test configuration files. This change
is necessary to adapt to updates in the CI infrastructure.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The changes are configuration updates for CI tests. The correctness will
be verified by the CI pipeline.

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2026-02-05 23:43:57 +08:00
meihanc
922e5c163b [main2main] upgrade vllm main 0202 (#6560)
### What this PR does / why we need it?
1. Fix `TypeError: FusedMoEParallelConfig.__init__() missing 1 required
positional argument: 'is_sequence_parallel'` due to
https://github.com/vllm-project/vllm/pull/32567
2. Fix ` TypeError: '>' not supported between instances of 'MagicMock'
and 'int'` due to https://github.com/vllm-project/vllm/pull/33035
3. Fix `TypeError: Can't instantiate abstract class AscendMLAImpl with
abstract methods forward_mha, forward_mqa` and AttributeError: 'bool'
object has no attribute 'process_weights_after_loading' due to
https://github.com/vllm-project/vllm/pull/33284
4. Fix `'AscendSharedFusedMoE' object has no attribute
'_routed_input_transform'`due to
https://github.com/vllm-project/vllm/pull/32790
5. Fix `NPUModelRunner._dummy_run() got an unexpected keyword argument
'num_active_loras'` due to
https://github.com/vllm-project/vllm/pull/32005
6. Fix the problem caused by` 'tuple' object has no attribute 'job_id'`
due to https://github.com/vllm-project/vllm/pull/27492
7. Fix the problem that all_moe_layers is not equal to vllm.moe_forward,
vllm.moe_forward_shared due to
https://github.com/vllm-project/vllm/pull/33184
8. Add patch to fix the problem "got multiple values for keyword
argument 'add_special_tokens'" due to
https://github.com/vllm-project/vllm/pull/32863
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
2026-02-05 19:31:17 +08:00