Commit Graph

560 Commits

Author SHA1 Message Date
wangxiyuan
7265dc090d [2/4][Refactor] Refactor torchair utils (#1892)
There is a lot of torchair-specific logic in common code, which makes the
code hard to maintain. We will create a new torchair module and move
torchair-related logic there. I plan to add 4 PRs:

1. Refactor worker
2. Refactor utils (this PR)
- a simple change that moves all torchair-related util functions to the
torchair module; see the sketch below
3. Refactor model_runner
4. Refactor attention
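
A minimal sketch of the intended layout for step 2, assuming hypothetical
helper and module names (not the exact diff):

```python
# vllm_ascend/torchair/utils.py -- hypothetical new home for torchair helpers.
def is_torchair_enabled(additional_config: dict) -> bool:
    """Hypothetical helper: check whether torchair graph mode is turned on."""
    return bool(
        additional_config.get("torchair_graph_config", {}).get("enabled"))


# vllm_ascend/utils.py -- a thin re-export could keep old imports working
# during the transition (also hypothetical):
# from vllm_ascend.torchair.utils import is_torchair_enabled  # noqa: F401
```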

- vLLM version: v0.9.2
- vLLM main:
8188196a1c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-21 19:43:30 +08:00
Shanshan Shen
957b0b611f [Misc][V0 Deprecation] Remove V0 Model Runner (#1823)
### What this PR does / why we need it?
Remove V0 model runner.

This PR is a part of
https://github.com/vllm-project/vllm-ascend/issues/1620.

- vLLM version: v0.9.2
- vLLM main:
7ba34b1241

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-07-21 16:35:50 +08:00
Shanshan Shen
a66ef39bb6 [Misc][V0 Deprecation] Remove Redundant Offline Distributed Inference Example (#1899)
### What this PR does / why we need it?
The file `offline_distributed_inference_npu.py` is the same as
`offline_inference_npu_tp2.py`, so we delete one of them.

This PR is a part of
https://github.com/vllm-project/vllm-ascend/issues/1620.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.9.2
- vLLM main:
8188196a1c

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-07-21 12:01:45 +08:00
wangxiyuan
af56ae3ed1 [1/4][Refactor] Refactor torchair worker (#1885)
There is a lot of torchair-specific logic in common code, which makes the
code hard to maintain. We will create a new torchair module and move
torchair-related logic there. I plan to add 4 PRs:

1. Refactor worker (this PR)
- create the torchair module and move torchair-related worker code to the
new module
2. Refactor utils
3. Refactor model_runner
4. Refactor attention


- vLLM version: v0.9.2
- vLLM main:
8188196a1c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-21 11:50:46 +08:00
aidoczh
c32eea96b7 [Doc] Add Chinese translation for documentation (#1870)
### What this PR does / why we need it?

This PR adds a complete Chinese translation for the documentation using
PO files and the gettext toolchain. The goal is to make the
documentation more accessible to Chinese-speaking users and help the
community grow.

### Does this PR introduce any user-facing change?

Yes. This PR introduces Chinese documentation, which users can access
alongside the original English documentation. No changes to the core
code or APIs.

### How was this patch tested?

The translated documentation was built locally using the standard
documentation build process (`make html` or `sphinx-build`). I checked
the generated HTML pages to ensure the Chinese content displays
correctly and matches the original structure. No code changes were made,
so no additional code tests are required.

vLLM version: v0.9.2  
vLLM main: vllm-project/vllm@5780121

---

Please review the translation and let me know if any improvements are
needed. I am happy to update the translation based on feedback.

- vLLM version: v0.9.2
- vLLM main:
7ba34b1241

---------

Signed-off-by: aidoczh <aidoczh@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-21 11:26:27 +08:00
Mengqing Cao
8cfd257992 [Dist][EP] Remove ETP/EP maintained in vllm-ascend (#1681)
### What this PR does / why we need it?
Remove ETP/EP maintained in branch main. We drop this as there are no
relevant scenarios using ETP now, and we may subsequently advocate
implementing expert tensor parallelism in vLLM to support scenarios
where the expert needs to be sliced.

This is a part of #1422 backport.

Fixes https://github.com/vllm-project/vllm-ascend/issues/1396
https://github.com/vllm-project/vllm-ascend/issues/1154

### Does this PR introduce _any_ user-facing change?
We'll no longer maintain ETP/EP in vllm-ascend; use the TP/EP in vLLM
instead (see the sketch below).
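
For reference, a minimal sketch of switching over, assuming vLLM's
`enable_expert_parallel` knob and an illustrative MoE model:

```python
from vllm import LLM

# Use vLLM's built-in TP/EP instead of the removed vllm-ascend ETP/EP.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",  # illustrative MoE model
    tensor_parallel_size=4,
    enable_expert_parallel=True,
)
```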

### How was this patch tested?
CI passed with newly added and existing tests.


- vLLM version: v0.9.2
- vLLM main:
fe8a2c544a

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-07-21 09:08:04 +08:00
wangxiyuan
a8b316ac5b [CI] Make AttentionBackend interface compatible to fix broken CI (#1893)
vLLM commit
752c6ade2e
removed `blocksparse_params` from the attention backend. This PR makes the
same change to keep CI green.
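
A sketch of the shape of the change, with an illustrative class rather than
the exact diff:

```python
class AscendAttentionBackendImpl:  # illustrative, not the exact class
    # Upstream dropped `blocksparse_params` from this signature; accepting
    # **extra keeps the plugin importable across nearby vLLM commits.
    def __init__(self, num_heads: int, head_size: int, scale: float,
                 **extra) -> None:
        self.num_heads = num_heads
        self.head_size = head_size
        self.scale = scale
```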


- vLLM version: v0.9.2
- vLLM main:
9499e26e2a

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-21 08:21:06 +08:00
JohnJan
54f2b31184 [Doc] Add a doc for qwen omni (#1867)
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>

### What this PR does / why we need it?
Add an FAQ note for Qwen Omni.
Fixes issue 1 of https://github.com/vllm-project/vllm-ascend/issues/1760

- vLLM version: v0.9.2
- vLLM main:
b9a21e9173
2025-07-20 09:05:41 +08:00
wangxiyuan
2b726d8f90 [CI] Fix broken CI (#1889)
1. vLLM commit
45badd05d0
changed the pooling check logic, which broke vLLM Ascend.
2. vLLM commit
3e04107d97
requires a higher version of transformers. The transformers version bug
has been fixed by
e936e401de.
It is now safe to remove the version limit.
3. vLLM commit
217937221b
added a new input `enable_eplb` for the FusedMoE ops.

This PR fixes the broken CI.
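
For item 3, the adaptation amounts to accepting the new argument; a hedged
sketch (class name and handling are hypothetical):

```python
import torch.nn as nn


class AscendFusedMoE(nn.Module):  # hypothetical subclass
    def __init__(self, *args, enable_eplb: bool = False, **kwargs):
        super().__init__()
        # Accept the new upstream flag, but reject it until EPLB is supported.
        if enable_eplb:
            raise NotImplementedError("EPLB is not supported on Ascend yet")
```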


- vLLM version: v0.9.2
- vLLM main:
6a971ed692

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-20 02:11:57 +08:00
leo-pony
2ee90461d0 Fix e2e data parallel test: add resource release code (#1881)
### What this PR does / why we need it?
Fix the e2e data parallel test: add resource release code and give engines
more time to pause their processing loops before exiting.
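
Roughly the pattern involved, as a sketch (names and timing are illustrative):

```python
import gc
import time


def run_and_cleanup(make_llm):
    llm = make_llm()
    try:
        ...  # run the data-parallel inference under test
    finally:
        del llm        # drop the last engine reference
        gc.collect()   # release resources held by the engine
        time.sleep(5)  # let engine loops pause before the process exits
```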

### Does this PR introduce _any_ user-facing change?
No

- vLLM version: v0.9.2
- vLLM main:
5895afd780

Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-07-19 11:39:48 +08:00
xleoken
b824525be3 Move deepseek_v3 from deepseek_v2.py (#1793)
### What this PR does / why we need it?
Before this patch, the registered class appeared as
`vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM`, which is not a
friendly format.

```
WARNING 07-14 23:57:34 [registry.py:413] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 07-14 23:57:34 [registry.py:413] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 07-14 23:57:34 [registry.py:413] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
```
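
After the move, each architecture can be registered from its own module;
roughly (the module path follows the PR title, the exact call site is
illustrative):

```python
from vllm import ModelRegistry

# Each custom architecture now points at its own module file.
ModelRegistry.register_model(
    "DeepseekV3ForCausalLM",
    "vllm_ascend.models.deepseek_v3:CustomDeepseekV3ForCausalLM")
```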


### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Local Test.


- vLLM version: v0.9.2
- vLLM main:
bcdfb2a330

Signed-off-by: xleoken <xleoken@163.com>
2025-07-19 11:37:03 +08:00
Shanshan Shen
ab68d31a24 [Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878)
### What this PR does / why we need it?
This PR is a part of
https://github.com/vllm-project/vllm-ascend/issues/1620.

- vLLM version: v0.9.2
- vLLM main:
5895afd780

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-07-19 09:42:32 +08:00
lianyibo
53d2ea3789 [Bugfix]Fix the performance gap between 0.9.2rc1 and 0.9.1 (#1811)
### What this PR does / why we need it?

maybe fixes
[#1728](https://github.com/vllm-project/vllm-ascend/issues/1728#issuecomment-3065083433)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Test Qwen3-32B tp=4 with: 

```bash
vllm serve --port 1234 Qwen/Qwen3-32B \
    --served-model-name Qwen3-32B \
    --tensor-parallel-size 4 \
    --swap-space 16 \
    --max-model-len 6000 \
    --load-format dummy \
    --disable-log-stats \
    --disable-log-requests
```

Requests: batch_size=128, input/output tokens=1024

**In 0.9.2rc1**

```text
=====================================================
Total TPS with    prefill(tokens/s)         : 785.1395
Total TPS without prefill                   : 846.6809
Mean TPS with    prefill                    : 6.1339
Mean TPS without prefill                    : 6.6147
=====================================================
Mean TTFT(ms)                               : 10307.8123
Max  TTFT(ms)                               : 21423.0733
Min  TTFT(ms)                               : 362.3602
=====================================================
Mean TPOT(ms)                               : 151.3051
Max  TPOT(ms)                               : 159.4649
Min  TPOT(ms)                               : 140.899
=====================================================
Total Time(s)                               : 175.6032
Request Throughput(requests/s)              : 0.7289
=====================================================
```

**Apply this PR**

```text
=====================================================
Total TPS with    prefill(tokens/s)         : 811.0014
Total TPS without prefill                   : 876.4423
Mean TPS with    prefill                    : 6.3359
Mean TPS without prefill                    : 6.8472
=====================================================
Mean TTFT(ms)                               : 10263.8382
Max  TTFT(ms)                               : 21151.2547
Min  TTFT(ms)                               : 375.9136
=====================================================
Mean TPOT(ms)                               : 146.1686
Max  TPOT(ms)                               : 154.0957
Min  TPOT(ms)                               : 136.8879
=====================================================
Total Time(s)                               : 169.8579
Request Throughput(requests/s)              : 0.7536
=====================================================
```

The TPOT performance gap between these two sets of data is about 3%.

- vLLM version: v0.9.2
- vLLM main:
8dfb45ca33

Signed-off-by: lianyibo <lianyibo1@kunlunit.com>
2025-07-18 23:09:54 +08:00
Mengqing Cao
574fe407eb [1/N][CustomOp] Register activation customop instead of overwrite forward_oot (#1841)
### What this PR does / why we need it?
We'll refactor `CustomOp` in vllm-ascend starting from this PR.

Use the function `CustomOp.register_oot` to register the custom op,
taking `AscendQuickGELU` as an example:
```python
from vllm.model_executor.custom_op import CustomOp

from vllm_ascend.ops.activation import AscendQuickGELU

CustomOp.register_oot(_decorated_op_cls=AscendQuickGELU, name="QuickGELU")
```

This is a quick adaptation to the `CustomOp.register_oot` mechanism from vllm
0.9.2. As a further step, we can drop the inheritance from `QuickGELU` and
write our own `QuickGELU` entirely, as sketched below.
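
That further step might look like this minimal sketch, assuming the
`forward_oot` hook on `CustomOp`:

```python
import torch
from vllm.model_executor.custom_op import CustomOp


class AscendQuickGELU(CustomOp):  # sketch: no longer inherits vLLM's QuickGELU
    def forward_oot(self, x: torch.Tensor) -> torch.Tensor:
        # quick_gelu(x) = x * sigmoid(1.702 * x)
        return x * torch.sigmoid(1.702 * x)
```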

Part of https://github.com/vllm-project/vllm-ascend/pull/1647

- vLLM version: v0.9.2
- vLLM main:
8dfb45ca33

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-07-18 23:07:14 +08:00
Shanshan Shen
8a91e6e59c [Misc][V0 Deprecation] Remove V0 Related Custom Ops (#1871)
### What this PR does / why we need it?
This PR is a part of
https://github.com/vllm-project/vllm-ascend/issues/1620.

- vLLM version: v0.9.2
- vLLM main:
ca4eb82bcb

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-07-18 23:06:03 +08:00
li chaoran
3e39d7234c [CI] Switching to infra cache server to reduce network pressure (#1792)
### What this PR does / why we need it?
This PR introduces the infra cache server to speed up apt/pip package
installation.

### Does this PR introduce _any_ user-facing change?
None

### How was this patch tested?
Tested locally. With this config, network bandwidth usage dropped from 100%
to 5% when a new PR was submitted.


- vLLM version: v0.9.2
- vLLM main:
8dfb45ca33

---------

Signed-off-by: mywaaagh_admin <pkwarcraft@gmail.com>
2025-07-18 18:39:25 +08:00
Shanshan Shen
d08ff304cd [Misc][V0 Deprecation] Remove V0 Attention (#1835)
### What this PR does / why we need it?
This PR is a part of
https://github.com/vllm-project/vllm-ascend/issues/1620.

- vLLM version: v0.9.2
- vLLM main:
8dfb45ca33

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-07-18 14:10:13 +08:00
xudongLi-cmss
33ef5dc813 add unit test for func wrapper (#1863)
### What this PR does / why we need it?
Add unit tests for the func wrapper file.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with newly added tests.

- vLLM version: v0.9.2
- vLLM main:
8dfb45ca33

Signed-off-by: lixudong <lixudong@cmss.chinamobile.com>
2025-07-18 11:05:17 +08:00
Li Wang
f9dfde02fd [Bugfix] Fix broken CI (#1848)
### What this PR does / why we need it?
- Fix broken commit by
[#20927](https://github.com/vllm-project/vllm/pull/20927)
- Fix broken commit by
[#20466](https://github.com/vllm-project/vllm/pull/20466)
- TODO: fully adapt to the upstream refactor; for now, let's
make CI happy

- vLLM version: v0.9.2
- vLLM main:
11dfdf21bf

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-07-17 20:10:12 +08:00
Zhu Yi Lin
538dd357e6 Add graph mode and improve on multi_npu_moge.md (#1849)
### What this PR does / why we need it?
Add graph mode content and improve multi_npu_moge.md.

### Does this PR introduce _any_ user-facing change?
yes

### How was this patch tested?
CI passed with existing tests.


- vLLM version: v0.9.2
- vLLM main:
5a7fb3ab9e

Signed-off-by: GDzhu01 <809721801@qq.com>
2025-07-17 17:53:37 +08:00
Shanshan Shen
aeb5aa8b88 [Misc][V0 Deprecation] Add __main__ guard to all offline examples (#1837)
### What this PR does / why we need it?
Add `__main__` guard to all offline examples.
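
The guard applied to each example looks like this (the body is illustrative):

```python
def main():
    ...  # build the LLM and run offline inference


if __name__ == "__main__":
    main()
```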

- vLLM version: v0.9.2
- vLLM main:
76b494444f

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-07-17 14:13:30 +08:00
Shanshan Shen
19e37cd379 [Misc] Add fusion_result.json to .gitignore (#1836)
### What this PR does / why we need it?
Add `fusion_result.json` to `.gitignore`.

- vLLM version: v0.9.2
- vLLM main:
72ad273582

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-07-17 11:54:49 +08:00
Icey
875a920d4a [Platform] Add support for Atlas A3 series (#1794)
### What this PR does / why we need it?
Add support for Ascend A3 and remove latest tag

### Does this PR introduce _any_ user-facing change?
Users can run vLLM on the Atlas A3 series

### How was this patch tested?
CI passed with:

- remove latest tag test:
https://github.com/wxsIcey/wxs-vllm-ascend/actions/runs/16267635040/job/45926924765
- E2E image build for A3
- CI test on A3 with e2e test and longterm test
- Unit tests are missing because real A3 hardware is needed for testing

Closes: https://github.com/vllm-project/vllm-ascend/issues/1696


- vLLM version: v0.9.2
- vLLM main:
d0dc4cfca4

---------

Signed-off-by: Icey <1790571317@qq.com>
2025-07-17 11:13:02 +08:00
wangxiyuan
ef99fe1c54 [Test] Clean up duplicate test for ascend scheduler (#1819)
There are some duplicate tests for the Ascend scheduler. This PR removes them
to make the tests clearer.

After this PR, the single-card e2e run time is reduced from 47 min to
46 min.

- vLLM version: v0.9.2
- vLLM main:
1eb2b9c102

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-16 17:57:48 +08:00
Shanshan Shen
c66b0827a7 [Misc][V0 Deprecation] Remove Pooling Model Runner (#1824)
### What this PR does / why we need it?
Remove pooling model runner.

This PR is a part of
https://github.com/vllm-project/vllm-ascend/issues/1620.

- vLLM version: v0.9.2
- vLLM main:
d31a647124

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-07-16 17:48:21 +08:00
Yikun Jiang
ba7e934b21 Remove redundant empty lines in commit msg (#1814)
### What this PR does / why we need it?
Remove redundant empty lines in commit msg

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested locally: https://github.com/Yikun/vllm-ascend/pull/48

- vLLM version: v0.9.2
- vLLM main:
d0dc4cfca4

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-16 16:50:44 +08:00
Shanshan Shen
06655002c5 [Misc][V0 Deprecation] Remove V0 Worker (#1821)
### What this PR does / why we need it?
Remove V0 worker.

This PR is a part of
https://github.com/vllm-project/vllm-ascend/issues/1620.

- vLLM version: v0.9.2
- vLLM main:
6cbc4d4bea

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-07-16 14:07:17 +08:00
Shanshan Shen
b005def0a5 [Misc][V0 Deprecation] Remove Multi-Step Model Runner (#1820)
### What this PR does / why we need it?
Remove multi-step model runner.

This PR is a part of
https://github.com/vllm-project/vllm-ascend/issues/1620.

- vLLM version: v0.9.2
- vLLM main:
34cda778a0

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-07-16 14:06:49 +08:00
Shanshan Shen
f9e2e9bb31 [Misc][V0 Deprecation] Remove Draft Model Runner Used for V0 Spec Decode (#1810)
### What this PR does / why we need it?
Remove draft model runner used for V0 spec decode.

This PR is a part of
https://github.com/vllm-project/vllm-ascend/issues/1620.

- vLLM version: v0.9.2
- vLLM main:
34cda778a0

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-07-16 10:51:23 +08:00
Shanshan Shen
f96100fad5 [Misc][V0 Deprecation] Remove V0 related codes of test, example, platform (#1805)
### What this PR does / why we need it?
Remove V0 related codes of test, example, platform.

This PR is a part of
https://github.com/vllm-project/vllm-ascend/issues/1620.

- vLLM version: v0.9.2
- vLLM main:
235bfd5dfe

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-07-15 19:58:55 +08:00
Shanshan Shen
a929699e98 [Misc][V0 Deprecation] Remove multi-step worker (#1809)
### What this PR does / why we need it?
Remove multi-step worker

This PR is a part of
https://github.com/vllm-project/vllm-ascend/issues/1620.

- vLLM version: v0.9.2
- vLLM main:
235bfd5dfe

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-07-15 19:48:47 +08:00
wangxiyuan
bf2549856f [CI] Fix changes CI to recover codecov (#1799)
Add a `checkout` action before `dorny/paths-filter` to make it work in the
`push` case.
This is a known issue: `dorny/paths-filter` works without `checkout` in the
`pull_request` case but fails in the `push` case. More detail is here:
https://github.com/dorny/paths-filter/issues/60#issuecomment-1464281021

The push CI works after this PR. The test result is here:

https://github.com/wangxiyuan/vllm-ascend/actions/runs/16285606468/job/45983607539
- vLLM version: v0.9.2
- vLLM main:
d4d309409f

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-15 15:01:13 +08:00
wangxiyuan
787010a637 [Test] Remove VLLM_USE_V1 in example and tests (#1733)
V1 is enabled by default; there is no need to set it by hand now. This PR
removes the useless setting from examples and tests.

- vLLM version: v0.9.2
- vLLM main:
9ad0a4588b

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-15 12:49:57 +08:00
wangxiyuan
eb921d2b6f [Doc] Fix 404 error (#1797)
Fix URL 404 errors in the doc.
- vLLM version: v0.9.2
- vLLM main:
9ad0a4588b

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-15 11:52:38 +08:00
wangxiyuan
7bdada58eb [Misc] Remove VLLM_USE_V1 usage in code (#1764)
We plan to remove V0 code from this version. The first step is to delete
v0 usage.

Related: https://github.com/vllm-project/vllm-ascend/issues/1620

- vLLM version: v0.9.2
- vLLM main:
61e20828da

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-15 11:52:16 +08:00
wangxiyuan
494b0f474f [CI]Fix broken CI (#1773)
This PR fixes the broken CI. It requires
https://github.com/vllm-project/vllm/pull/20900 to be merged first.

- vLLM version: v0.9.2
- vLLM main:
e8cc53af5e

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-15 00:54:20 +08:00
Li Wang
afcfe91dfa [Doc] Fix multi node doc (#1783)
### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?
Pin docker image to latest release
### How was this patch tested?


- vLLM version: v0.9.2
- vLLM main:
1e9438e0b0

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-07-14 17:56:57 +08:00
zhangxinyuehfad
cabfb2bc31 [Test] Resolve vllm-ascend version accuracy test (#1769)
### What this PR does / why we need it?
Resolve vllm-ascend version for accuracy test

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.9.2
- vLLM main:
66f6fbd393

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-07-14 15:43:37 +08:00
Shanshan Shen
d3c6dd985a [Misc] Add include dir to .gitignore (#1771)
### What this PR does / why we need it?
Add `include` dir to `.gitignore`.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.9.2
- vLLM main:
66f6fbd393

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-07-14 12:05:29 +08:00
Li Wang
9cd4ac76a1 [CI] Remove benchmark patch and increase the scheduler frequency (#1762)
### What this PR does / why we need it?
This PR does the following:
1. Removes the `benchmark_datasets.py` patch
2. Increases the scheduler frequency to twice per day; due to the
recent large number of daily submissions, we need to increase the
default test time (6h)
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.9.2
- vLLM main:
247102f07f

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-07-13 20:00:35 +08:00
Yikun Jiang
d118bf8a26 Update README.zh.md to fix typo (#1758)
### What this PR does / why we need it?


Update README.zh.md to fix typo

### Does this PR introduce _any_ user-facing change?


No

### How was this patch tested?


CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-12 14:01:34 +08:00
Yikun Jiang
eff4b5791c Recover offline_inference_npu.py to make doctest passed (#1756)
### What this PR does / why we need it?
Rename offline_inference_npu_v1.py to offline_inference_npu.py to
recover doctest

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

- vLLM version: v0.9.2
- vLLM main:
a8593237c0

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-12 12:36:35 +08:00
Yikun Jiang
8b3a483269 Add recommend version and refresh readme / contribution.md (#1757)
### What this PR does / why we need it?
Add the recommended version and refresh README / contribution.md.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

- vLLM version: v0.9.2
- vLLM main:
890323dc1b

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-12 12:35:40 +08:00
wangxiyuan
3c404de1b1 [Release]Update release note (#1753)
There are still issues with PP in some cases, such as aclgraph and ray.
Remove the related doc from the release note.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-11 17:58:26 +08:00
wangxiyuan
b5b7e0ecc7 [Doc] Add qwen3 embedding 8b guide (#1734)
1. Add a tutorial for qwen3-embedding-8b (see the sketch below)
2. Remove VLLM_USE_V1=1 from docs; it's no longer needed since 0.9.2
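
Illustrative usage along the lines of the new tutorial (exact flags may
differ):

```python
from vllm import LLM

llm = LLM(model="Qwen/Qwen3-Embedding-8B", task="embed")
(output,) = llm.embed(["vLLM Ascend supports embedding models."])
print(output.outputs.embedding[:4])  # first few dimensions of the vector
```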


- vLLM version: v0.9.2
- vLLM main:
5923ab9524

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-11 17:40:17 +08:00
wangxiyuan
9c560b009a [Release] Add 0.9.2rc1 release note (#1725)
Add the release note for 0.9.2rc1; we'll release soon.

- vLLM version: v0.9.2
- vLLM main:
7bd4c37ae7

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-11 17:36:05 +08:00
zhangxinyuehfad
1b4a2f3817 [CI] Add accuracy ci for DP and EP and TP and ETP (#1140)
### What this PR does / why we need it?

Add accuracy CI for DP, EP and TP.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.9.2
- vLLM main:
35514b682a

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-07-11 17:25:17 +08:00
Pr0Wh1teGivee
d13fb0766e [Perf] add patch to optimize apply_topk_topp (#1732)
### What this PR does / why we need it?
Performance optimization for apply_top_k_top_p
### Does this PR introduce _any_ user-facing change?
Set `VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION` to enable this feature.
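
Assumed usage (the variable must be set before the engine is created; the
model is illustrative):

```python
import os

os.environ["VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-8B")
out = llm.generate(["Hello"], SamplingParams(top_k=50, top_p=0.9))
```
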
### How was this patch tested?
e2e & ut

- vLLM version: v0.9.2
- vLLM main:
6a9e6b2abf

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
2025-07-11 15:32:02 +08:00
weiguihua2
aa4240c67f Support pipeline parallel in V1 Engine (#1700)
### What this PR does / why we need it?
This patch supports pipeline parallel in V1 Engine

### Does this PR introduce _any_ user-facing change?
Yes, users can run PP in V1

### How was this patch tested?
Manual test.
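
A minimal sketch of what was exercised (model and stage count illustrative):

```python
from vllm import LLM, SamplingParams

# Two pipeline stages in the V1 engine.
llm = LLM(model="Qwen/Qwen3-32B", pipeline_parallel_size=2)
print(llm.generate(["Hello"], SamplingParams(max_tokens=16)))
```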

- vLLM version: v0.9.2
- vLLM main:
31d5c1797f

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
2025-07-11 15:30:51 +08:00
zhangxinyuehfad
1cd27da5fb [Test] Remove VLLM_USE_V1 in accuracy test (#1739)
### What this PR does / why we need it?
Remove VLLM_USE_V1 in accuracy test

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-07-11 15:29:11 +08:00