Commit Graph

183 Commits

Author SHA1 Message Date
Jade Zheng
955411611c Nominate Mengqing Cao as vllm-ascend maintainer (#2433)
I would like to nominate Mengqing Cao (@MengqingCao
https://github.com/MengqingCao) as a maintainer, starting with my +1.

## Reason

Review Quality: She has completed [120+
reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+is%3Aclosed+commenter%3Amengqingcao+-author%3Amengqingcao)
since Feb. 2025, including high-quality reviews such as
[#review-3077842852](https://github.com/vllm-project/vllm-ascend/pull/2088#pullrequestreview-3077842852),
[comment-2990074116](https://github.com/vllm-project/vllm-ascend/pull/1032#issuecomment-2990074116),
and
[comment-2921063723](https://github.com/vllm-project/vllm-ascend/pull/1013#issuecomment-2921063723).

Sustained and Quality Contributions: She has a deep understanding of
the vLLM and vLLM Ascend codebases and has made solid contributions. Her
vLLM contributions and her help with vLLM Ascend releases are the main
reasons I nominated her:

- vLLM: It is worth mentioning that she has completed [28+ PR
contributions](https://github.com/vllm-project/vllm/pulls?q=is%3Apr+author%3AMengqingCao+is%3Amerged+)
in vllm-project/vllm, especially in the vLLM platform module, to improve
vLLM multi-hardware support. She is one of the important co-authors of
[vllm#8054](https://github.com/vllm-project/vllm/pull/8054) and the hardware
plugin RFC, which made the vllm-ascend plugin possible.

Community Involvement: She is also very active and has been involved in [60+
issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aclosed%20-author%3AMengqingCao%20commenter%3AMengqingCao).

So I think she's a great addition to the vLLM Ascend Maintainer team.

- **Review Quality:**

She has completed 120+ reviews since Feb. 2025:

https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+is%3Aclosed+commenter%3Amengqingcao+-author%3Amengqingcao

These include high-quality reviews such as
https://github.com/vllm-project/vllm-ascend/pull/2088#pullrequestreview-3077842852,
https://github.com/vllm-project/vllm-ascend/pull/1446#issuecomment-3015166908,
https://github.com/vllm-project/vllm-ascend/pull/1032#issuecomment-2990074116,
https://github.com/vllm-project/vllm-ascend/pull/1013#issuecomment-2921063723

- **Sustained Contributions:**

99+ PRs merged in vllm-project/vllm-ascend:

https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3AMengqingCao+is%3Amerged

- **Quality Contribution:**

She is one of the important co-authors of
https://github.com/vllm-project/vllm/pull/8054, which made the vllm-ascend
plugin possible.

It is worth mentioning that she has completed 28+ PR contributions in
vllm-project/vllm, especially in the vLLM platform module, to improve vLLM
multi-hardware support:

https://github.com/vllm-project/vllm/pulls?q=is%3Apr+author%3AMengqingCao+is%3Amerged+.

In Q2 2025, she also led the [[RFC]: E2E CI test for key
features](https://github.com/vllm-project/vllm-ascend/issues/413) and
[[RFC]: Unit test coverage
improvement](https://github.com/vllm-project/vllm-ascend/issues/1298) to
help vllm ascend improve its test coverage.

Her main contributions focus on the adaptation of parallel strategies
and communicator, such as
https://github.com/vllm-project/vllm-ascend/pull/1800,
https://github.com/vllm-project/vllm-ascend/pull/1856.

These contributions are sufficient to show that she has a “Deep understanding
of vLLM and vLLM Ascend codebases”.

- **Community Involvement:**

Involved in 63+ issues as a reviewer:
https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aclosed%20-author%3AMengqingCao%20commenter%3AMengqingCao

She led the v0.10.1 release as release manager


- vLLM version: v0.10.0
- vLLM main:
78dba404ad

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
2025-08-19 14:13:54 +08:00
wangxiyuan
6335fe39ea Nominate ApsarasX as vllm-ascend maintainer (#2419)
I would like to nominate Wengang Chen (@ApsarasX
https://github.com/ApsarasX) as a maintainer, starting with my +1.

## Reason
Review Quality: He focuses on reviewing the vLLM Ascend core module, with
100+ high-quality reviews, such as [#2326
(comment)](https://github.com/vllm-project/vllm-ascend/pull/2326#discussion_r2268509365),
[#768
(comment)](https://github.com/vllm-project/vllm-ascend/pull/768#discussion_r2075278516),
[#2312
(comment)](https://github.com/vllm-project/vllm-ascend/pull/2312#issuecomment-3174677159),
[#2268
(comment)](https://github.com/vllm-project/vllm-ascend/pull/2268#discussion_r2260920578),
[#2192
(comment)](https://github.com/vllm-project/vllm-ascend/pull/2192#issuecomment-3149414586),
[#2156
(comment)](https://github.com/vllm-project/vllm-ascend/pull/2156#discussion_r2249096673).
This helped vLLM Ascend v0.9.x and v0.10.x to be released with high
quality.

Sustained and Quality Contributions: He has a very good habit of sharing
his design ideas, development process, and performance test results, as in
[#966](https://github.com/vllm-project/vllm-ascend/pull/966). He has
contributed [many
PRs](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3AApsarasX+is%3Amerged+),
including valuable bug fixes and performance improvements.

Community Involvement: He is actively involved in community discussions, is
collaborative, and helps users solve problems. He has been involved in [120+ PRs
and
issues](https://github.com/vllm-project/vllm-ascend/issues?q=commenter%3AApsarasX).
He is also a speaker at the [vLLM Beijing
Meetup](https://mp.weixin.qq.com/s/7n8OYNrCC_I9SJaybHA_-Q).

So I think he's a great addition to the vLLM Ascend Maintainer team.

- Review Quality:
108+ PRs with valuable reviews
https://github.com/vllm-project/vllm-ascend/pulls?q=commenter%3AApsarasX
such as:

https://github.com/vllm-project/vllm-ascend/pull/2326#discussion_r2268509365

https://github.com/vllm-project/vllm-ascend/pull/768#discussion_r2075278516

https://github.com/vllm-project/vllm-ascend/pull/2312#issuecomment-3174677159

https://github.com/vllm-project/vllm-ascend/pull/2268#discussion_r2260920578

https://github.com/vllm-project/vllm-ascend/pull/2192#issuecomment-3149414586

https://github.com/vllm-project/vllm-ascend/pull/2156#discussion_r2249096673

-  Sustained and Major Contributions
https://github.com/vllm-project/vllm-ascend/pulls/ApsarasX

-  Quality Contribution:

https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3AApsarasX+is%3Aclosed
Good quality and well documented, for example:
[Perf] Refactor tensor disposal logic to reduce memory usage
https://github.com/vllm-project/vllm-ascend/pull/966

- Community Involvement:
7 issues:

https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aclosed%20author%3AApsarasX

- 120+ PRs and issues:

https://github.com/vllm-project/vllm-ascend/issues?q=commenter%3AApsarasX

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-08-19 10:44:35 +08:00
TaoYu Chen
9e7c168d99 Add ModelRunner_prepare_inputs doc (#1493)
### What this PR does / why we need it?
To help more developers quickly get started with vLLM, we need to write
clear and easy-to-understand code documentation and technical
interpretations. This will effectively lower the learning curve, attract
more excellent contributors, and collectively build a better developer
community.

Add ModelRunner_prepare_inputs doc

### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
Pass CI


- vLLM version: v0.10.0
- vLLM main:
4be02a3776

---------

Signed-off-by: ChenTaoyu-SJTU <ctynb@qq.com>
2025-08-18 15:41:24 +08:00
Li Wang
2ad7e1251e [Doc] Fix quant documentation to make it reproducible (#2277)
### What this PR does / why we need it?
Fixed the expression of msit for code clone

- vLLM version: v0.10.0
- vLLM main:
afa5b7ca0b

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-08-14 17:19:47 +08:00
jack
8bfd16a145 [Doc] Add container image save/load FAQ for offline environments (#2347)
### What this PR does / why we need it?

Add Docker export/import guide for air-gapped environments

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

NA

- vLLM version: v0.10.0
- vLLM main:
d16aa3dae4

Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
2025-08-13 16:00:43 +08:00
Mengqing Cao
49ec6c98b7 [Doc] Update faq (#2334)
### What this PR does / why we need it?
  - update deterministic calculation
  - update supported devices

### Does this PR introduce _any_ user-facing change?
- Users should update ray and protobuf when using ray as the distributed
backend
- Users should switch to `export HCCL_DETERMINISTIC=true` when
enabling deterministic calculation
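
For launchers written in Python, the `export HCCL_DETERMINISTIC=true` setting above can be passed to a child process through its environment. The following is a minimal sketch, not part of the PR: the helper name `deterministic_env` is hypothetical, and it assumes the variable only needs to be present in the environment when the process starts.

```python
import os
import subprocess
import sys


def deterministic_env(base=None):
    """Return a copy of the environment with HCCL_DETERMINISTIC=true set.

    `deterministic_env` is a hypothetical helper name; only the variable
    name and value come from the FAQ update described above.
    """
    env = dict(os.environ if base is None else base)
    env["HCCL_DETERMINISTIC"] = "true"
    return env


if __name__ == "__main__":
    # Start a child interpreter and confirm that it sees the variable.
    result = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ['HCCL_DETERMINISTIC'])"],
        env=deterministic_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout.strip())
```

Copying the environment instead of mutating `os.environ` keeps the setting scoped to the launched process.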

### How was this patch tested?
N/A

- vLLM version: v0.10.0
- vLLM main:
ea1292ad3e

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-08-12 14:12:53 +08:00
Wang Kunpeng
dc585f148a [main][prefill optimization] Optimize parallel strategies to reduce communication overhead (#2198)
### What this PR does / why we need it?
1. Shared Expert Sharding Strategy Update: Switched from TP-aligned to
pure DP for shared experts, enabling more efficient execution.
2. O_Proj AllReduce → ReduceScatter: Reduced communication overhead by
using ReduceScatter, made possible by pure DP sharding.
3. AllGather Postponed: Delayed until after the QKV down projection to reduce
synchronization impact during prefill.

### How was this patch tested?
Added a UT case in `tests/ut/attention/test_mla_v1.py`

#### How to run

Use the parameter `--additional_config='{"enable_shared_expert_dp": true}'`

##### a. How to run eager mode

e.g.:
python -m vllm.entrypoints.openai.api_server --model=/model_path
--trust-remote-code -tp 8 -dp 2 --enable_expert_parallel --port 8002
--max-model-len 5120 --max-num-batched-tokens 16384 --enforce-eager
--disable-log-requests
--additional_config='{"ascend_scheduler_config":{"enabled":true},"enable_shared_expert_dp":
true,"chunked_prefill_for_mla":true}'

##### b. How to run graph mode

e.g.:
python -m vllm.entrypoints.openai.api_server --model=/model_path
--trust-remote-code -tp 8 -dp 2 --enable_expert_parallel --port 8002
--max-model-len 5120 --max-num-batched-tokens 16384
--disable-log-requests
--additional_config='{"ascend_scheduler_config":{"enabled":true},"enable_shared_expert_dp":
true,"chunked_prefill_for_mla":true,"torchair_graph_config":{"enabled":true}}'
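
The two commands above differ only in `--enforce-eager` and the `torchair_graph_config` entry, so the long `--additional_config` JSON is easy to get wrong by hand. The following is a minimal sketch that generates both variants from one place. The function names `build_additional_config` and `build_server_command` are hypothetical helpers; the flags and JSON keys are copied from the commands above.

```python
import json
import shlex


def build_additional_config(graph_mode: bool) -> str:
    """Serialize the --additional_config payload used in the commands above."""
    config = {
        "ascend_scheduler_config": {"enabled": True},
        "enable_shared_expert_dp": True,
        "chunked_prefill_for_mla": True,
    }
    if graph_mode:
        # Graph mode adds the torchair section; eager mode omits it.
        config["torchair_graph_config"] = {"enabled": True}
    return json.dumps(config, separators=(",", ":"))


def build_server_command(model_path: str, graph_mode: bool) -> list:
    """Assemble the api_server argument list for eager or graph mode."""
    cmd = [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        f"--model={model_path}",
        "--trust-remote-code", "-tp", "8", "-dp", "2",
        "--enable_expert_parallel", "--port", "8002",
        "--max-model-len", "5120",
        "--max-num-batched-tokens", "16384",
        "--disable-log-requests",
        f"--additional_config={build_additional_config(graph_mode)}",
    ]
    if not graph_mode:
        # Eager mode is selected with --enforce-eager, as in example (a).
        cmd.append("--enforce-eager")
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line for each mode.
    print(shlex.join(build_server_command("/model_path", graph_mode=False)))
    print(shlex.join(build_server_command("/model_path", graph_mode=True)))
```

Building the JSON with `json.dumps` avoids quoting mistakes, and `shlex.join` quotes the result so it can be pasted into a shell.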


- vLLM version: v0.10.0
- vLLM main:
9edd1db02b

---------

Signed-off-by: Wang Kunpeng <1289706727@qq.com>
Signed-off-by: SlightwindSec <slightwindsec@gmail.com>
Co-authored-by: SlightwindSec <slightwindsec@gmail.com>
2025-08-12 14:12:12 +08:00
Mengqing Cao
4604882a3e [ReleaseNote] Release note of v0.10.0rc1 (#2225)
### What this PR does / why we need it?
Release note of v0.10.0rc1

- vLLM version: v0.10.0
- vLLM main:
8e8e0b6af1

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-08-07 14:46:49 +08:00
zhangxinyuehfad
92eebc0c9b [Doc] Update user guide for suported models (#2263)
### What this PR does / why we need it?
Update user guide for supported models

- vLLM version: v0.10.0
- vLLM main:
4be02a3776

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-08-07 14:39:51 +08:00
22dimensions
440d28a138 [Tutorial] Add qwen3 8b w4a8 tutorial (#2249)
### What this PR does / why we need it?

Add a new single-NPU quantization tutorial that uses the latest qwen3
model.

- vLLM version: v0.10.0
- vLLM main:
8e8e0b6af1

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
2025-08-07 14:39:38 +08:00
zhangxinyuehfad
bcd0b532f5 [Doc] Update user guide for using lm-eval (#1325)
### What this PR does / why we need it?
Update user guide for using lm-eval
1. Add instructions for using lm-eval with an online server
2. Add instructions for using offline datasets

- vLLM version: v0.10.0
- vLLM main:
9edd1db02b

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-08-07 14:15:49 +08:00
zhangxinyuehfad
dbba3cabb0 [Doc] Update tutorials for single_npu_audio and single_npu_multimodal (#2252)
### What this PR does / why we need it?
Update tutorials for single_npu_audio and single_npu_multimodal

- vLLM version: v0.10.0
- vLLM main:
6b47ef24de

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-08-07 14:08:14 +08:00
Li Wang
bf84f2dbfa [Doc] Support kimi-k2-w8a8 (#2162)
### What this PR does / why we need it?
The kimi-k2 model is similar to the deepseek model, so only a few
changes are needed to support it. What this PR does:
1. Add kimi-k2-w8a8 deployment doc
2. Update quantization doc
3. Upgrade torchair support list
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.10.0
- vLLM main:
9edd1db02b

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-08-06 19:28:47 +08:00
Yikun Jiang
54ace9e12b Add release note for v0.9.1rc2 (#2188)
### What this PR does / why we need it?
Add release note for v0.9.1rc2

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

- vLLM version: v0.10.0
- vLLM main:
c494f96fbc

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-08-06 09:04:46 +08:00
leo-pony
807f0895b2 Bump torch version to 2.7.1 (#1562)
### What this PR does / why we need it?
Bump torch version to 2.7.1 and clean up the infer schema patch
https://github.com/vllm-project/vllm-ascend/commit/857f489
(https://github.com/vllm-project/vllm-ascend/pull/837). This patch
also depends on: https://github.com/vllm-project/vllm-ascend/pull/1974

### Does this PR introduce any user-facing change?
No

#### How was this patch tested?
CI passed

torch-npu 2.7.1rc1 install guide:
https://gitee.com/ascend/pytorch/tree/v2.7.1/
Install dependencies:
```
pip3 install pyyaml
pip3 install setuptools
```
install torch-npu:

Closes: https://github.com/vllm-project/vllm-ascend/issues/1866
Closes: https://github.com/vllm-project/vllm-ascend/issues/1390


- vLLM version: v0.10.0
- vLLM main:
9af654cc38

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-08-05 08:43:24 +08:00
leo-pony
f0c1f0c828 [Doc] Add qwen vl example in tutorials for 310I series (#2160)
### What this PR does / why we need it?
Add qwen vl example in tutorials for 310I series. 

Model: Qwen2.5-VL-3B-Instruct
Accuracy test result, dataset MMM-val:
| | 910B3 | 310P3 |
| --- | --- | --- |
|Summary|0.455 | 0.46 |
|--art_and_design| 0.558 | 0.566 |
|--business| 0.373 | 0.366 |
|--health_and_medicine|0.513 | 0.52 |
|--science|0.333 | 0.333 |
|--tech_and_engineering|0.362 | 0.380 |
|--humanities_and_social_science|0.691 | 0.691 |
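
To make the comparison in the table easier to read, the per-category gap between the two devices can be computed directly. The following is a small sketch using only the scores copied from the table. The names `SCORES`, `deltas`, and `largest_gap` are illustrative and not part of the PR.

```python
# Scores copied from the table above as (910B3, 310P3) pairs.
SCORES = {
    "Summary": (0.455, 0.46),
    "art_and_design": (0.558, 0.566),
    "business": (0.373, 0.366),
    "health_and_medicine": (0.513, 0.52),
    "science": (0.333, 0.333),
    "tech_and_engineering": (0.362, 0.380),
    "humanities_and_social_science": (0.691, 0.691),
}


def deltas(scores):
    """Return 310P3 minus 910B3 for each row, rounded to three decimals."""
    return {name: round(b - a, 3) for name, (a, b) in scores.items()}


def largest_gap(scores):
    """Return the row with the largest absolute difference and its delta."""
    diff = deltas(scores)
    name = max(diff, key=lambda key: abs(diff[key]))
    return name, diff[name]


if __name__ == "__main__":
    for name, delta in deltas(SCORES).items():
        print(f"{name}: {delta:+.3f}")
    print("largest gap:", largest_gap(SCORES))
```

On these numbers, no category differs by more than 0.018 between the two devices.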

Function test result:

1. Online:
![image](https://github.com/user-attachments/assets/d81bba61-df28-4676-a246-c5d094815ac7)
![image](https://github.com/user-attachments/assets/0be81628-9999-4ef2-93c1-898b3043e09e)

2. Offline:
![image](https://github.com/user-attachments/assets/603275c1-6ed6-4cfc-a6e2-7726156de087)

- vLLM version: v0.10.0
- vLLM main:
ad57f23f6a

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-08-02 08:58:56 +08:00
Mengqing Cao
936df1cb9b [Doc] Fix cann related urls (#2106)
### What this PR does / why we need it?
Fix cann related urls in installation doc.

### Does this PR introduce _any_ user-facing change?
Users who install CANN manually can use the correct URLs after this PR.

### How was this patch tested?
N/A

- vLLM version: v0.10.0
- vLLM main:
5bbaf492a6

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-07-30 22:31:30 +08:00
Shanshan Shen
61fc35184b [Doc] Add performance tuning doc to main (#1392)
### What this PR does / why we need it?
Add performance tuning doc to main.

Closes: https://github.com/vllm-project/vllm-ascend/issues/1387


- vLLM version: v0.9.1
- vLLM main:
923147b5e8

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>
2025-07-29 19:36:34 +08:00
Wang Kunpeng
e3a2443c3a [main][Doc] add mla pertoken quantization FAQ (#2018)
### What this PR does / why we need it?
When using deepseek series models generated by the --dynamic parameter,
if torchair graph mode is enabled, we should modify the configuration
file in the CANN package to prevent incorrect inference results.

- vLLM version: v0.10.0
- vLLM main:
7728dd77bb

---------

Signed-off-by: Wang Kunpeng <1289706727@qq.com>
2025-07-27 08:47:51 +08:00
Mengqing Cao
ed2ab8a197 [CI/Build] Upgrade CANN to 8.2.RC1 (#1653)
### What this PR does / why we need it?
Upgrade CANN to 8.2.rc1

Backport: https://github.com/vllm-project/vllm-ascend/pull/1653

### Does this PR introduce _any_ user-facing change?
Yes, docker image will use 8.2.RC1

### How was this patch tested?
CI passed

- vLLM version: v0.10.0
- vLLM main:
7728dd77bb

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-26 22:37:46 +08:00
zhangxinyuehfad
d1c640841b [Bugfix] Fix num_hidden_layers when Qwen2-Audio 7B (#1803)
### What this PR does / why we need it?
Fix the num_hidden_layers lookup for Qwen2-Audio 7B and #1760:
```
INFO 07-15 04:38:53 [platform.py:174] PIECEWISE compilation enabled on NPU. use_inductor not supported - using only ACL Graph mode
Traceback (most recent call last):
  File "/workspace/test1.py", line 58, in <module>
    main(audio_count)
  File "/workspace/test1.py", line 38, in main
    llm = LLM(model="Qwen/Qwen2-Audio-7B-Instruct",
  File "/vllm-workspace/vllm/vllm/entrypoints/llm.py", line 271, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/vllm-workspace/vllm/vllm/engine/llm_engine.py", line 494, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
  File "/vllm-workspace/vllm/vllm/engine/arg_utils.py", line 1286, in create_engine_config
    config = VllmConfig(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/pydantic/_internal/_dataclasses.py", line 123, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
  File "/vllm-workspace/vllm/vllm/config.py", line 4624, in __post_init__
    current_platform.check_and_update_config(self)
  File "/vllm-workspace/vllm-ascend/vllm_ascend/platform.py", line 180, in check_and_update_config
    update_aclgraph_sizes(vllm_config)
  File "/vllm-workspace/vllm-ascend/vllm_ascend/utils.py", line 307, in update_aclgraph_sizes
    num_hidden_layers = vllm_config.model_config.hf_config.num_hidden_layers
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/transformers/configuration_utils.py", line 211, in __getattribute__
    return super().__getattribute__(key)
AttributeError: 'Qwen2AudioConfig' object has no attribute 'num_hidden_layers'
```
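
The traceback above fails because the top-level config of a composite model does not expose `num_hidden_layers` itself. The following is a minimal sketch of a defensive lookup. It is not necessarily the change this PR made. It assumes the layer count lives on a nested `text_config` object, and it uses `SimpleNamespace` stand-ins instead of real `transformers` config classes; the helper name `get_num_hidden_layers` is hypothetical.

```python
from types import SimpleNamespace


def get_num_hidden_layers(hf_config):
    """Read num_hidden_layers from a flat or a composite config object.

    Assumption: composite (e.g. audio + text) configs keep the language
    model settings under a `text_config` attribute.
    """
    if hasattr(hf_config, "num_hidden_layers"):
        return hf_config.num_hidden_layers
    text_config = getattr(hf_config, "text_config", None)
    if text_config is not None and hasattr(text_config, "num_hidden_layers"):
        return text_config.num_hidden_layers
    raise AttributeError(
        f"{type(hf_config).__name__} has no num_hidden_layers, "
        "directly or under text_config"
    )


if __name__ == "__main__":
    # Stand-in objects; the layer counts here are arbitrary examples.
    flat = SimpleNamespace(num_hidden_layers=28)
    composite = SimpleNamespace(text_config=SimpleNamespace(num_hidden_layers=32))
    print(get_num_hidden_layers(flat))
    print(get_num_hidden_layers(composite))
```

Raising a descriptive `AttributeError` when neither location has the field keeps unexpected config layouts visible instead of silently guessing a value.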

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Closes: https://github.com/vllm-project/vllm-ascend/issues/1780
https://github.com/vllm-project/vllm-ascend/issues/1760
https://github.com/vllm-project/vllm-ascend/issues/1276
https://github.com/vllm-project/vllm-ascend/issues/359

- vLLM version: v0.10.0
- vLLM main:
7728dd77bb

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-07-26 20:13:00 +08:00
Yikun Jiang
17a430f7b8 Upgrade vLLM to v0.10.0 (#1927)
### What this PR does / why we need it?
- Upgrade to v0.10.0
- Drop v0.9.2 version compatibility
- Add patch for
`vllm_ascend/patch/worker/patch_common/patch_sampler_gather_logprobs.py`
as workaround of
f3a683b7c9
for v0.10.0 and also add e2e test `test_models_prompt_logprobs`
- Pin transformers<4.54.0 as workaround of
https://github.com/vllm-project/vllm-ascend/issues/2034

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Test locally:
`VLLM_USE_MODELSCOPE=true pytest -sv
tests/e2e/singlecard/test_offline_inference.py::test_models_prompt_logprobs`
- CI passed

- vLLM version: v0.9.2
- vLLM main:
7728dd77bb

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-26 15:43:29 +08:00
Li Wang
bdfb065b5d [1/2/N] Enable pymarkdown and python __init__ for lint system (#2011)
### What this PR does / why we need it?
1. Enable pymarkdown check
2. Enable python `__init__.py` check for vllm and vllm-ascend
3. Clean up code

### How was this patch tested?


- vLLM version: v0.9.2
- vLLM main:
29c6fbe58c

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-07-25 22:16:10 +08:00
wangxiyuan
326dcf2576 [Doc] Update support feature (#1828)
The feature support matrix is out of date. This PR refreshes the content.

- vLLM version: v0.9.2
- vLLM main:
107111a859

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-23 15:19:15 +08:00
aidoczh
c32eea96b7 [Doc]Add Chinese translation for documentation (#1870)
### What this PR does / why we need it?

This PR adds a complete Chinese translation for the documentation using
PO files and the gettext toolchain. The goal is to make the
documentation more accessible to Chinese-speaking users and help the
community grow.

### Does this PR introduce any user-facing change?

Yes. This PR introduces Chinese documentation, which users can access
alongside the original English documentation. No changes to the core
code or APIs.

### How was this patch tested?

The translated documentation was built locally using the standard
documentation build process (`make html` or `sphinx-build`). I checked
the generated HTML pages to ensure the Chinese content displays
correctly and matches the original structure. No code changes were made,
so no additional code tests are required.

vLLM version: v0.9.2  
vLLM main: vllm-project/vllm@5780121

---

Please review the translation and let me know if any improvements are
needed. I am happy to update the translation based on feedback.

- vLLM version: v0.9.2
- vLLM main:
7ba34b1241

---------

Signed-off-by: aidoczh <aidoczh@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-21 11:26:27 +08:00
Mengqing Cao
8cfd257992 [Dist][EP] Remove ETP/EP maintained in vllm-ascend (#1681)
### What this PR does / why we need it?
Remove ETP/EP maintained in branch main. We drop this because there are no
relevant scenarios that use ETP now, and we may subsequently advocate
implementing expert tensor parallelism in vLLM to support scenarios
where experts need to be sliced.

This is a part of #1422 backport.

Fixes https://github.com/vllm-project/vllm-ascend/issues/1396
https://github.com/vllm-project/vllm-ascend/issues/1154

### Does this PR introduce _any_ user-facing change?
We'll not maintain etp/ep in vllm-ascend anymore, and use the tp/ep in
vllm instead.

### How was this patch tested?
CI passed with newly added and existing tests.


- vLLM version: v0.9.2
- vLLM main:
fe8a2c544a

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-07-21 09:08:04 +08:00
JohnJan
54f2b31184 [Doc] Add a doc for qwen omni (#1867)
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>

### What this PR does / why we need it?
Add FAQ note for qwen omni
Fixes https://github.com/vllm-project/vllm-ascend/issues/1760 issue1



- vLLM version: v0.9.2
- vLLM main:
b9a21e9173
2025-07-20 09:05:41 +08:00
Zhu Yi Lin
538dd357e6 Add graph mode and improve on multi_npu_moge.md (#1849)
### What this PR does / why we need it?
Add graph mode and improve on multi_npu_moge.md

### Does this PR introduce _any_ user-facing change?
yes

### How was this patch tested?
CI passed with new and existing tests.


- vLLM version: v0.9.2
- vLLM main:
5a7fb3ab9e

Signed-off-by: GDzhu01 <809721801@qq.com>
2025-07-17 17:53:37 +08:00
wangxiyuan
eb921d2b6f [Doc] Fix 404 error (#1797)
Fix url 404 error in doc
- vLLM version: v0.9.2
- vLLM main:
9ad0a4588b

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-15 11:52:38 +08:00
Li Wang
afcfe91dfa [Doc] Fix multi node doc (#1783)
### What this PR does / why we need it?

### Does this PR introduce _any_ user-facing change?
Pin docker image to latest release
### How was this patch tested?


- vLLM version: v0.9.2
- vLLM main:
1e9438e0b0

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-07-14 17:56:57 +08:00
wangxiyuan
3c404de1b1 [Release]Update release note (#1753)
There are still issues with pp in some cases, such as aclgraph and ray. Remove
the related doc from the release note.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-11 17:58:26 +08:00
wangxiyuan
b5b7e0ecc7 [Doc] Add qwen3 embedding 8b guide (#1734)
1. Add the tutorials for qwen3-embedding-8b
2. Remove VLLM_USE_V1=1 from the docs; it is no longer needed as of 0.9.2


- vLLM version: v0.9.2
- vLLM main:
5923ab9524

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-11 17:40:17 +08:00
wangxiyuan
9c560b009a [Release] Add 0.9.2rc1 release note (#1725)
Add release note for 0.9.2rc1; we'll release soon.

- vLLM version: v0.9.2
- vLLM main:
7bd4c37ae7

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-11 17:36:05 +08:00
wangxiyuan
3d1e6a5929 [Doc] Update user doc index (#1581)
Add a user doc index to make the user guide clearer
- vLLM version: v0.9.1
- vLLM main:
49e8c7ea25

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-10 14:26:59 +08:00
Li Wang
c7446438a9 [1/N][CI] Move linting system to pre-commits hooks (#1256)
### What this PR does / why we need it?

Follow the vllm-project/vllm linting approach:
https://github.com/vllm-project/vllm/blob/main/.pre-commit-config.yaml

Enable pre-commit to avoid low-level errors as much as possible.

This PR is one step of #1241. The purpose is to make the linting system
clearer and more convenient. This step mainly covers the following:
yapf, actionlint, ruff, typos, isort, mypy, png-lint, signoff-commit,
enforce-import-regex-instead-of-re.

TODO:
- clang-format (check for csrc with Google style): needs code cleanup, disabled for now
- pymarkdown: needs code cleanup, disabled for now
- shellcheck: needs code cleanup, disabled for now

### Does this PR introduce _any_ user-facing change?

Only developer UX change:

https://vllm-ascend--1256.org.readthedocs.build/en/1256/developer_guide/contributing.html#run-lint-locally

```
pip install -r requirements-lint.txt && pre-commit install
bash format.sh
```

### How was this patch tested?

CI passed with new added/existing test.

Co-authored-by: Yikun [yikunkero@gmail.com](mailto:yikunkero@gmail.com)
Co-authored-by: wangli
[wangli858794774@gmail.com](mailto:wangli858794774@gmail.com)
- vLLM version: v0.9.1
- vLLM main:
5358cce5ff

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-07-10 14:17:15 +08:00
Yikun Jiang
997f156a51 Use ci_vllm_version when recording vLLM commit (#1689)
### What this PR does / why we need it?
Use ci_vllm_version when recording vllm commit

Followup on https://github.com/vllm-project/vllm-ascend/pull/1623

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Test manually.
$ python3 docs/source/conf.py | jq .ci_vllm_version | tr -d '"'
v0.9.2
- Test on my local repo: https://github.com/Yikun/vllm-ascend/pull/35
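
The manual check above pipes the output of `docs/source/conf.py` through `jq` to read `ci_vllm_version`. The same extraction can be sketched in Python. This assumes, as the `jq` pipeline implies, that the script prints a JSON object with a top-level `ci_vllm_version` key. The sample string below is a hypothetical stand-in for that output, and `extract_ci_vllm_version` is an illustrative name.

```python
import json


def extract_ci_vllm_version(conf_output: str) -> str:
    """Return the ci_vllm_version field from JSON text printed by conf.py."""
    return json.loads(conf_output)["ci_vllm_version"]


if __name__ == "__main__":
    # Hypothetical sample of the JSON the script is assumed to print.
    sample_output = '{"ci_vllm_version": "v0.9.2"}'
    print(extract_ci_vllm_version(sample_output))
```

Parsing the JSON removes the need for `tr -d '"'`, since `json.loads` already returns an unquoted string.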

- vLLM version: v0.9.1
- vLLM main:
49e8c7ea25

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-10 11:07:27 +08:00
Li Wang
0c4aa2b4f1 [Doc] Add multi node data parallel doc (#1685)
### What this PR does / why we need it?
 add multi node data parallel doc
### Does this PR introduce _any_ user-facing change?
 add multi node data parallel doc
### How was this patch tested?

- vLLM version: v0.9.1
- vLLM main:
805d62ca88

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-07-10 09:36:37 +08:00
leo-pony
b4b19ea588 [Doc] Add multi-npu qwen3-MoE-32B Tutorials (#1419)
Signed-off-by: leo-pony <nengjunma@outlook.com>

### What this PR does / why we need it?
Add multi-npu qwen3-MoE-32B Tutorials
Related RFC: https://github.com/vllm-project/vllm-ascend/issues/1248
- vLLM version: v0.9.1
- vLLM main:
5358cce5ff

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-07-10 09:06:51 +08:00
wangxiyuan
830332ebfc Clean up v0.9.1 code (#1672)
vllm has released 0.9.2. This PR drops 0.9.1 support.

- vLLM version: v0.9.1
- vLLM main:
b942c094e3

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-09 08:52:24 +08:00
Yikun Jiang
e4e9ea02ab Upgrade vLLM version to v0.9.2 (#1652)
### What this PR does / why we need it?

This patch upgrades the vLLM version to v0.9.2. It does not remove the
v0.9.1 compatibility code, to make review easier.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- vLLM version: v0.9.1
- vLLM main:
14601f5fba
- Accuracy test with 0.9.2:
https://github.com/vllm-project/vllm-ascend/actions/runs/16121612087

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-08 14:18:17 +08:00
Yikun Jiang
0c1d239df4 Add unit test local cpu guide and enable base testcase (#1566)
### What this PR does / why we need it?
Use the base test and clean up all manual patch code
- Clean up EPLB config to avoid tmp test file
- Use BaseTest with global cache
- Add license
- Add a doc on setting up unit tests in a local env

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-07-06 10:42:27 +08:00
Angazenn
a5f33590d3 [CORE]initial support for torchair with non-mla backend (#1506)
### What this PR does / why we need it?
This PR supports torchair graph mode with non-mla backend on both 800IA2
and 300I Duo platforms. The main change is to add
`attention_v1_torchair.py` to support specific attention related
operations that are required by torchair.

### Does this PR introduce _any_ user-facing change?
Before this PR, vLLM-Ascend only allowed deepseek to use torchair. Now it
can also be used with pangu. In addition, we add a supported-model list to
control which types of models can use torchair.

### How was this patch tested?
We have tested it with PanguProMoE on both 800IA2 and 300I Duo platforms,
and the model generates answers normally.

---------

Signed-off-by: angazenn <zengyanjia@huawei.com>
Signed-off-by: tianyitang <tangtianyi4@huawei.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
Co-authored-by: tianyitang <tangtianyi4@huawei.com>
2025-07-03 22:21:42 +08:00
yupeng
d96da1f00c [DOC] Fix word spelling (#1595)
### What this PR does / why we need it?
Fix word spelling in DOC.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
No.

Signed-off-by: paulyu12 <507435917@qq.com>
2025-07-02 21:42:39 +08:00
yupeng
c3c8c9317c [DOC] add LoRA user guide (#1265)
### What this PR does / why we need it?
Add LoRA user guide to DOC. The content refers to [LoRA
Adapters](https://docs.vllm.ai/en/latest/features/lora.html).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No

---------

Signed-off-by: paulyu12 <507435917@qq.com>
2025-07-02 14:41:31 +08:00
leo-pony
53ec583bbb [Docs] Update Altlas 300I series doc and fix CI lint (#1537)
### What this PR does / why we need it?
- Update Atlas 300I series doc: clean up unused parameters and enable
optimized ops
- Fix code spell CI

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-06-30 23:34:00 +08:00
Shanshan Shen
ba577dfc52 [Doc] Add Structured Output guide (#1499)
### What this PR does / why we need it?
Add Structured Output guide.


Signed-off-by: shen-shanshan <467638484@qq.com>
2025-06-30 17:21:44 +08:00
Yikun Jiang
e4df0a4395 Add Pangu MoE Pro for 300I series docs (#1516)
### What this PR does / why we need it?
Add Pangu MoE Pro for 300I series docs

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-06-30 13:37:22 +08:00
Yikun Jiang
cad4c693c6 Add Pangu MoE Pro docs (#1512)
### What this PR does / why we need it?
This PR adds Pangu MoE Pro 72B docs

[1] https://gitcode.com/ascend-tribe/pangu-pro-moe-model

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-06-30 12:15:33 +08:00
Zhu Yi Lin
b308a7a258 support pangumoe w8a8c8 and docs (#1477)
### What this PR does / why we need it?
support pangu moe w8a8c8

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed with newly added tests.

Signed-off-by: zhuyilin <809721801@qq.com>
2025-06-28 18:51:07 +08:00
Shanshan Shen
99e685532d [Doc] Add Qwen2.5-VL eager mode doc (#1394)
### What this PR does / why we need it?
Add Qwen2.5-VL eager mode doc.

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-06-28 09:08:51 +08:00