54 Commits

Author SHA1 Message Date
starkwj
389030a8f8 add env vars & misc 2026-02-11 06:27:58 +00:00
starkwj
2a571d8bc8 support multi npu partially 2026-01-09 04:36:39 +00:00
074ae28d6e Update README.md 2026-01-05 20:33:31 +08:00
starkwj
caf0289e1a add Dockerfile and readme 2026-01-05 11:31:07 +00:00
wangxiyuan
7ee0b0b5d8 [cherry-pick]Upgrade CANN to 8.3.rc1 (#3945) (#3962)
This PR upgrades CANN from 8.2rc1 to 8.3rc1 and removes the CANN version
check logic.

TODO: UT runs fail with the CANN 8.3 image, so the base image for UT is
still 8.2. We'll fix it later.

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-06 09:05:08 +08:00
wangxiyuan
8a7154001e [0.11.0]Cherry pick pta upgrade change (#3940)
This PR cherry-picks two commits from main to upgrade torch-npu to the 2.7.1
official release

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-31 22:14:26 +08:00
wangxiyuan
ba19dd3183 Revert PTA upgrade PR (#3352)
We noticed that torch-npu 0919 doesn't work. This PR reverts the related
changes that rely on the 0919 version.
Revert PR: #3295  #3205  #3102 

Related: #3353

- vLLM version: v0.11.0
2025-10-10 14:09:53 +08:00
wangxiyuan
4abdcdba4e upgrade pta to 0919 (#3295)
### What this PR does / why we need it?
Upgrade torch-npu to the newest POC version
### Does this PR introduce _any_ user-facing change?
Yes, users need to upgrade the PTA version as well.
### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-09-30 17:14:23 +08:00
wangxiyuan
00ba071022 [Doc] Release note for v0.11.0rc0 (#3224)
### What this PR does / why we need it?
Add release note for v0.11.0rc0

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-09-30 03:26:18 +08:00
weiguihua2
b1380f3b87 [Doc] modify the version compatibility between vllm and vllm-ascend (#3130)
### What this PR does / why we need it?
Modify the version compatibility between vllm and vllm-ascend: the main
branch of vllm-ascend corresponds to the v0.10.2 tag of vllm.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main:
f225ea7dd9

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
2025-09-23 20:31:49 +08:00
wangxiyuan
048bfd5553 [Release] Add release note for v0.10.2rc1 (#2921)
Add release note for v0.10.2rc1

- vLLM version: v0.10.2
- vLLM main:
b834b4cbf1

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-09-16 01:20:05 +08:00
Yikun Jiang
8ece6956e7 Revert "Upgrade CANN version to 8.3.rc1.alpha001 (#2903)" (#2909)
### What this PR does / why we need it?
This reverts commit 339fceb89c.

### Does this PR introduce _any_ user-facing change?
Yes, use 8.2rc1 image by default

### How was this patch tested?
CI passed

- vLLM version: v0.10.2rc2
- vLLM main:
cfa3234a5b

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-09-13 16:21:54 +08:00
Yikun Jiang
339fceb89c Upgrade CANN version to 8.3.rc1.alpha001 (#2903)
### What this PR does / why we need it?
Upgrade CANN version to 8.3.rc1.alpha001

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?


- vLLM version: v0.10.2rc2
- vLLM main:
89e08d6d18

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-09-13 12:10:21 +08:00
Yikun Jiang
752e272a55 Add note for Ascend HDK version (#2765)
### What this PR does / why we need it?
Add note for Ascend HDK version

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

- vLLM version: v0.10.1.1
- vLLM main:
e599e2c65e

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-09-07 10:33:41 +08:00
Mengqing Cao
7e16b4a7cd [ReleaseNote] Add Release Note for v0.10.1rc1 (#2635)
Add Release Note for v0.10.1rc1

- vLLM version: v0.10.1.1
- vLLM main:
b5ee1e3261

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-09-04 11:26:47 +08:00
wangxiyuan
e11a1bbfc1 [Doc] Update news (#2736)
Refresh the news. Add meetup and official release info

- vLLM version: v0.10.1.1
- vLLM main:
b5ee1e3261

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-09-04 10:10:24 +08:00
wangxiyuan
41b028aa5f [Doc] add v0.9.1 release note (#2646)
Add release note for 0.9.1

- vLLM version: v0.10.1.1
- vLLM main:
8bd5844989

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-09-03 18:04:27 +08:00
Shanshan Shen
98c68220c1 [Doc] Update v0.9.1rc3 doc (#2512)
### What this PR does / why we need it?
Update the `v0.9.1rc3` doc, as a supplement to
https://github.com/vllm-project/vllm-ascend/pull/2488.

- vLLM version: v0.10.0
- vLLM main:
170e8ea9ea

Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>
2025-08-25 11:39:29 +08:00
Yikun Jiang
67a222c383 [Doc] Add feature branch policy (#2432)
### What this PR does / why we need it?

This patch adds the feature branch policy.

After this patch: maintainers are allowed to create a feature branch.
Feature branches are used for collaboration and must include an RFC
link, merge plan and mentor info.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

CI passed

- vLLM version: v0.10.0
- vLLM main:
7be5d113d8

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-08-21 10:37:21 +08:00
Mengqing Cao
4604882a3e [ReleaseNote] Release note of v0.10.0rc1 (#2225)
### What this PR does / why we need it?
Release note of v0.10.0rc1

- vLLM version: v0.10.0
- vLLM main:
8e8e0b6af1

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-08-07 14:46:49 +08:00
Yikun Jiang
54ace9e12b Add release note for v0.9.1rc2 (#2188)
### What this PR does / why we need it?
Add release note for v0.9.1rc2

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

- vLLM version: v0.10.0
- vLLM main:
c494f96fbc

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-08-06 09:04:46 +08:00
leo-pony
807f0895b2 Bump torch version to 2.7.1 (#1562)
### What this PR does / why we need it?
Bump torch version to 2.7.1 and clean up the infer schema patch
https://github.com/vllm-project/vllm-ascend/commit/857f489
(https://github.com/vllm-project/vllm-ascend/pull/837). This patch
also depends on https://github.com/vllm-project/vllm-ascend/pull/1974

### Does this PR introduce any user-facing change?
No

#### How was this patch tested?
CI passed

torch-npu 2.7.1rc1 install guide:
https://gitee.com/ascend/pytorch/tree/v2.7.1/
Install dependencies:
```
pip3 install pyyaml
pip3 install setuptools
```
Install torch-npu:

Closes: https://github.com/vllm-project/vllm-ascend/issues/1866
Closes: https://github.com/vllm-project/vllm-ascend/issues/1390


- vLLM version: v0.10.0
- vLLM main:
9af654cc38

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-08-05 08:43:24 +08:00
Mengqing Cao
ed2ab8a197 [CI/Build] Upgrade CANN to 8.2.RC1 (#1653)
### What this PR does / why we need it?
Upgrade CANN to 8.2.rc1

Backport: https://github.com/vllm-project/vllm-ascend/pull/1653

### Does this PR introduce _any_ user-facing change?
Yes, docker image will use 8.2.RC1

### How was this patch tested?
CI passed

- vLLM version: v0.10.0
- vLLM main:
7728dd77bb

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-26 22:37:46 +08:00
Yikun Jiang
8b3a483269 Add recommend version and refresh readme / contribution.md (#1757)
### What this PR does / why we need it?
Add recommend version and contribution.md

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

- vLLM version: v0.9.2
- vLLM main:
890323dc1b

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-12 12:35:40 +08:00
wangxiyuan
9c560b009a [Release] Add 0.9.2rc1 release note (#1725)
Add release note for 0.9.2rc1; we'll release soon.

- vLLM version: v0.9.2
- vLLM main:
7bd4c37ae7

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-11 17:36:05 +08:00
wangxiyuan
205cb85a1e [Doc] Fix doc typo (#1424)
1. Fix the typo
2. Fix 404 url
3. update graph mode and additional config user guide

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-06-25 19:28:26 +08:00
Pleaplusone
7e6efbf2a9 update torch-npu to 2.5.1.post1.dev20250619 (#1347)
### What this PR does / why we need it?
This PR updates torch_npu to the newest release version,
2.5.1.post1.dev20250619.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

The CI tests cover this update.

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
2025-06-23 09:02:09 +08:00
Mengqing Cao
96fa7ff63b [DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 (#1235)
### What this PR does / why we need it?
1. Fix rank set in DP scenario. The new POC version of torch-npu supports
setting `ASCEND_RT_VISIBLE_DEVICES` dynamically, so we can use the
rank set in `DPEngineCoreProc` directly instead of calculating the local
rank across DP by hand in the patched `_init_data_parallel`.

Closes: https://github.com/vllm-project/vllm-ascend/issues/1170

2. Bump torch-npu version to 2.5.1.post1.dev20250528

Closes: https://github.com/vllm-project/vllm-ascend/pull/1242
Closes: https://github.com/vllm-project/vllm-ascend/issues/1232
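
As a rough illustration of item 1, the sketch below shows what setting `ASCEND_RT_VISIBLE_DEVICES` per data-parallel process could look like. It is not the code from this PR: the helper name `visible_devices_for_dp_rank` and the contiguous device layout (each DP rank owning a consecutive block of devices) are assumptions made only for this example.

```python
import os


def visible_devices_for_dp_rank(dp_local_rank, devices_per_rank):
    """Hypothetical helper: device list for one DP rank, assuming a contiguous layout."""
    start = dp_local_rank * devices_per_rank
    return ",".join(str(d) for d in range(start, start + devices_per_rank))


def child_env(dp_local_rank, devices_per_rank, base=None):
    """Copy of the environment with ASCEND_RT_VISIBLE_DEVICES set for one DP process."""
    env = dict(os.environ if base is None else base)
    env["ASCEND_RT_VISIBLE_DEVICES"] = visible_devices_for_dp_rank(
        dp_local_rank, devices_per_rank
    )
    return env


if __name__ == "__main__":
    # Two devices per DP rank: rank 0 gets "0,1", rank 1 gets "2,3".
    for rank in range(2):
        print(rank, child_env(rank, 2, base={})["ASCEND_RT_VISIBLE_DEVICES"])
```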


### How was this patch tested?
CI passed with the newly added test.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Icey <1790571317@qq.com>
2025-06-16 23:09:53 +08:00
wangxiyuan
5903547d09 [doc] add 0.7.3.post1 release note (#1008)
Add release note for 0.7.3.post1
Add the missing release note back for 0.7.3

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-05-29 17:38:34 +08:00
Yikun Jiang
ec27af346a [Doc] Add 0.8.5rc1 release note (#756)
### What this PR does / why we need it?
Add 0.8.5rc1 release note and bump vllm version to v0.8.5.post1

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

CI passed

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-05-06 23:46:35 +08:00
Yikun Jiang
79538b5d73 Upgrade CANN version to 8.1.rc1 (#747)
### What this PR does / why we need it?

Make the CANN version bump separate from
https://github.com/vllm-project/vllm-ascend/pull/708

- Upgrade CANN version to 8.1.rc1
- Add prefix to speed up download
`m.daocloud.io/quay.io/ascend/cann:8.1.rc1-910b-ubuntu22.04-py3.10`
- Address trailing space for Dockerfile.openEuler
- Add note for `/workspace` and `/vllm-workspace` as followup of
https://github.com/vllm-project/vllm-ascend/pull/741

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?

CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
2025-05-06 05:44:18 +08:00
Yikun Jiang
2e20797934 [BUILD] Upgrade torch-npu to 2.5.1 (#661)
### What this PR does / why we need it?
torch-npu 2.5.1 has been published:
https://pypi.org/project/torch-npu/2.5.1/
It's time to remove all torch-npu dev versions from the vllm-ascend code base.

### Does this PR introduce _any_ user-facing change?
Yes, using torch-npu 2.5.1

### How was this patch tested?
- [ ] CI passed
- [ ] Manually test
- [ ] Grep all `dev2025`
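
The last checklist item can be approximated with a small scan for leftover dev pins. This is only a sketch: the function name `find_dev_pins` is hypothetical, and the sample text (including the `dev20250320` pin) is made up for the example rather than copied from the repository.

```python
def find_dev_pins(text, marker="dev2025"):
    """Return (line_number, line) pairs for every line that contains `marker`."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if marker in line
    ]


if __name__ == "__main__":
    # Made-up requirements text: one release pin and one leftover dev pin.
    sample = "torch-npu==2.5.1\ntorch-npu==2.5.1.dev20250320\n"
    for number, line in find_dev_pins(sample):
        print(f"line {number}: {line}")
```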

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-27 17:28:29 +08:00
wangxiyuan
e66ded5679 [Doc] Add release note for 0.8.4rc1 (#557)
Add release note for 0.8.4rc1; we'll release 0.8.4rc1 now.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-18 13:24:36 +08:00
Yikun Jiang
1864c40520 Add vLLM Ascend Weekly meeting link (#400)
### What this PR does / why we need it?
Add vLLM Ascend Weekly meeting link

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-27 09:00:21 +08:00
Mengqing Cao
6295d2e9bc [CI/Build][Doc] upgrade torch-npu to 0320 (#392)
### What this PR does / why we need it?
This PR upgrades torch-npu to 0320 so that #321 and
https://github.com/vllm-project/vllm-ascend/issues/267#issuecomment-2745045743
can be fixed. #372 should be reverted after this PR.

### Does this PR introduce _any_ user-facing change?
upgrade torch-npu to 0320

### How was this patch tested?
Tested locally with long-sequence inference.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-03-26 09:04:12 +08:00
wangxiyuan
befbee5883 Update README and add collect_env info (#369)
1. Doc: Fix error link
2. Doc: make the Chinese version consistent with the English one
3. Remove the unused file `test.py`
4. update `collect_env.py`
5. Fix v1 import error

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-03-21 15:43:43 +08:00
Yikun Jiang
243ed4da69 Add vLLM forum info and update readme (#366)
### What this PR does / why we need it?
Add vLLM forum info and update readme

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-03-21 09:32:42 +08:00
Shanshan Shen
c06af8b2e0 [V1][Core] Add support for V1 Engine (#295)
### What this PR does / why we need it?
Add support for V1 Engine.

Please note that this is just the initial version, and some places may
need to be fixed or optimized in the future. Feel free to leave us
comments.

### Does this PR introduce _any_ user-facing change?

To use the V1 Engine on an NPU device, you need to set the environment
variables shown below:

```bash
export VLLM_USE_V1=1
export VLLM_WORKER_MULTIPROC_METHOD=spawn
```

If you are using vLLM for offline inference, you must add a `__main__`
guard like:

```python
import vllm

if __name__ == '__main__':
    llm = vllm.LLM(...)
```

Find more details
[here](https://docs.vllm.ai/en/latest/getting_started/troubleshooting.html#python-multiprocessing).
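
The two requirements above (the environment variables and the `__main__` guard) can be combined in one launcher script. The following is a minimal sketch, not the project's own tooling: `v1_engine_env` is a hypothetical helper name, setting the variables before importing vllm is an assumption of this sketch, and the actual `vllm.LLM(...)` call is left commented out because its arguments are elided in the PR description.

```python
import os


def v1_engine_env(base=None):
    """Return a copy of `base` (default: os.environ) with the V1 Engine settings applied."""
    env = dict(os.environ if base is None else base)
    env["VLLM_USE_V1"] = "1"
    env["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
    return env


def main():
    # Assumption: the variables are set before vllm is imported.
    os.environ.update(v1_engine_env())
    # import vllm
    # llm = vllm.LLM(...)  # model arguments elided, as in the PR description
    for key in ("VLLM_USE_V1", "VLLM_WORKER_MULTIPROC_METHOD"):
        print(key, "=", os.environ[key])


# With the spawn start method, worker processes re-import this module,
# so the entry point has to sit behind the __main__ guard.
if __name__ == "__main__":
    main()
```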

### How was this patch tested?
I have tested the online serving with `Qwen2.5-7B-Instruct` using this
command:

```bash
vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240
```

Query the model with input prompts:

```bash
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "prompt": "The future of AI is",
        "max_tokens": 7,
        "temperature": 0
    }'
```
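
For scripted checks, the same request can be built in Python with only the standard library. This is a sketch equivalent to the curl call above; the helper name `build_completion_request` is hypothetical, and actually sending the request assumes the server started by `vllm serve` is listening on `localhost:8000`.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=7, temperature=0):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_completion_request(
        "http://localhost:8000", "Qwen/Qwen2.5-7B-Instruct", "The future of AI is"
    )
    print(request.get_method(), request.full_url)
    # Sending requires the server to be running:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
```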

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: didongli182 <didongli@huawei.com>
2025-03-20 19:34:44 +08:00
Yikun Jiang
18bb8d1f52 Adapt vLLM requirements changes to fix main CI (#279)
### What this PR does / why we need it?
Adapt vLLM requirements changes:
206e2577fa (diff-01ec17406c969585ed075609a2bbf2f2f4fe3e3def36946694abe6d4eb60a6f2)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-09 16:07:45 +08:00
Yikun Jiang
be58d5f3d8 Bump torch_npu version to dev20250308.3 (#276)
### What this PR does / why we need it?
Bump torch_npu version to dev20250308.3 to fix a performance regression in the
multi-stream case:
e04c580d07.


### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-09 15:59:15 +08:00
Mengqing Cao
91f7d8115d [CI/Build] Bump torch_npu to dev20250307.3 (#265)
Update the torch-npu version to fix the accuracy of torch-npu `exponential_`.
With this update, the precision issue when setting `temperature > 0` is
fixed.
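
For context on why `exponential_` accuracy matters for `temperature > 0`: one common way to sample a token is to draw an independent Exp(1) value per candidate and pick the index that maximizes `p_i / q_i`, which selects index `i` with probability `p_i`. The pure-Python sketch below illustrates that trick only; whether this is the exact code path touched by the fix is an assumption, and the function name is hypothetical.

```python
import random


def sample_with_exponential_noise(probs, rng):
    """Pick index i with probability probs[i] using one Exp(1) draw per entry."""
    best_index, best_key = None, None
    for index, p in enumerate(probs):
        if p <= 0:
            continue  # zero-probability entries can never win
        # argmax(p / q) is the same as argmin(q / p); the latter never divides by q.
        key = rng.expovariate(1.0) / p
        if best_key is None or key < best_key:
            best_index, best_key = index, key
    return best_index


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.7, 0.2, 0.1]
    draws = [sample_with_exponential_noise(probs, rng) for _ in range(10000)]
    # Empirical frequencies should be close to probs.
    print([round(draws.count(i) / len(draws), 3) for i in range(len(probs))])
```

If the exponential draws are inaccurate, the selected indices no longer follow `probs`, which is one way a sampling precision issue can show up only when `temperature > 0`.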

---------

Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-03-07 20:34:07 +08:00
Yikun Jiang
ebe14f20cf Recover vllm-ascend dev image (#209)
### What this PR does / why we need it?
Recover vllm-ascend dev image

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-03 09:08:41 +08:00
wangxiyuan
51ae37b22a [Doc] update readme (#147)
Fix doc issue in README

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-25 11:00:58 +08:00
Yikun Jiang
d21b3be685 Mark v0.7.1 as unmaintained and v0.7.3 as maintained (#139)
### What this PR does / why we need it?
Mark v0.7.1 as unmaintained and v0.7.3 as maintained:
vLLM released the v0.7.3 version:
https://github.com/vllm-project/vllm/releases/tag/v0.7.3 which includes
several commits:
- https://github.com/vllm-project/vllm/pull/12874
- https://github.com/vllm-project/vllm/pull/12432
- https://github.com/vllm-project/vllm/pull/13208

We should bump the versions to v0.7.3.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-21 22:41:44 +08:00
Yikun Jiang
7cc024a2d3 [Docs] Refactor installation doc (#78)
### What this PR does / why we need it?
Refactor installation doc

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI, preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-17 22:12:07 +08:00
Yikun Jiang
a6f91f70b7 [Doc] Add versioning_policy doc (#62)
### What this PR does / why we need it?

This patch adds the versioning policy doc for vllm-ascend.

Reference:
- https://spark.apache.org/versioning-policy.html
- https://docs.openstack.org/project-team-guide/stable-branches.html
- https://github.com/pytorch/pytorch/blob/main/RELEASE.md

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
preview: https://vllm-ascend--62.org.readthedocs.build/en/62/

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-17 14:13:28 +08:00
wangxiyuan
e264987af2 [Doc] Add install doc (#49)
Add official install guide.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-14 10:22:15 +08:00
Yikun Jiang
46977f9f06 [Doc] Add sphinx build for vllm-ascend (#55)
### What this PR does / why we need it?

This patch enables the doc build for vllm-ascend

- Add sphinx build for vllm-ascend
- Enable readthedocs for vllm-ascend
- Fix CI:
  - Exclude vllm-empty/tests/mistral_tool_use to skip the `You need to agree
    to share your contact information to access this model` error, which was
    introduced in 314cfade02
  - Install test requirements to fix
    https://github.com/vllm-project/vllm-ascend/actions/runs/13304112758/job/37151690770:
      ```
      vllm-empty/tests/mistral_tool_use/conftest.py:4: in <module>
          import pytest_asyncio
      E   ModuleNotFoundError: No module named 'pytest_asyncio'
      ```
  - exclude docs PR

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
1. test locally:
    ```bash
    # Install dependencies.
    pip install -r requirements-docs.txt
    
    # Build the docs and preview
    make clean; make html; python -m http.server -d build/html/
    ```
    
    Launch browser and open http://localhost:8000/.

2. CI passed with preview:
    https://vllm-ascend--55.org.readthedocs.build/en/55/

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-13 18:44:17 +08:00
Yikun Jiang
eb189aac81 Followup fix on official doc update (#34)
### What this PR does / why we need it?
- Fix typos: vllm-ascned --> vllm-ascend
- For version info

### Does this PR introduce _any_ user-facing change?
No


### How was this patch tested?
preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-11 14:28:26 +08:00
wangxiyuan
51eadc68b9 [Docs] Add official doc index (#29)
Add official doc index. Move the release content to the right place.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-11 12:00:27 +08:00