745 Commits

Author SHA1 Message Date
wangxiyuan
de7649492d [Refactor] cleanup converting_weight_acl_format_format (#2482)
Move maybe_converting_weight_acl_format_format to the torchair module, since it is
only used with 310p+torchair.

- vLLM version: v0.10.1.1
- vLLM main:
49ab23b3cc

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-08-25 19:48:55 +08:00
Wang Yixuan
0f81e032f0 [1/N][refactor] torchair fused_moe refactor (#2438)
### What this PR does / why we need it?
Move the torchair-related fused_moe code into torchair_fused_moe to make
the code clearer. As a next step, we'll remove all torchair-related code outside
of torchair_fused_moe.

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
vLLM version: v0.10.0
vLLM main:
08d5f7113a

- vLLM version: v0.10.1.1
- vLLM main:
170e8ea9ea

Signed-off-by: hust17yixuan <303660421@qq.com>
2025-08-25 15:46:10 +08:00
Shanshan Shen
334c44613a [Doc] Update release version info (#2518)
### What this PR does / why we need it?
Update release version info.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.10.1.1
- vLLM main:
712d0f88d8

Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>
2025-08-25 15:39:10 +08:00
Shanshan Shen
98c68220c1 [Doc] Update v0.9.1rc3 doc (#2512)
### What this PR does / why we need it?
Update the `v0.9.1rc3` doc. This supplements
https://github.com/vllm-project/vllm-ascend/pull/2488.

- vLLM version: v0.10.0
- vLLM main:
170e8ea9ea

Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>
2025-08-25 11:39:29 +08:00
Mengqing Cao
4c4ffeebe5 [Doc] update vllm version in ci (#2513)
### What this PR does / why we need it?
update vllm version in ci

- vLLM version: v0.10.0
- vLLM main:
170e8ea9ea

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-08-25 11:35:37 +08:00
Shanshan Shen
0767d51dd5 [Structured Output][CI] Add test for outlines backend for structured output in CI (#2283)
### What this PR does / why we need it?
Add test for `outlines` backend for structured output in CI.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

Tests have all passed with:

```bash
pytest -sv tests/e2e/singlecard/test_guided_decoding.py
```

- vLLM version: v0.10.0
- vLLM main:
53415653ff

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-08-25 09:59:13 +08:00
Icey
891b2bfe71 Accuracy report formatting (#2279)
### What this PR does / why we need it?
Accuracy report formatting

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
CI passed with existing test.


- vLLM version: v0.10.0
- vLLM main:
53415653ff

---------

Signed-off-by: Icey <1790571317@qq.com>
2025-08-25 09:39:30 +08:00
Icey
f796e6280b [CustomOp] Register RotaryEmbedding instead of overwrite forward (#2385)
### What this PR does / why we need it?
Register RotaryEmbedding instead of overwrite forward

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with new added/existing test.

- vLLM version: v0.10.0
- vLLM main:
808d2e9aa0

---------

Signed-off-by: Icey <1790571317@qq.com>
Signed-off-by: wxsIcey <1790571317@qq.com>
2025-08-25 09:32:35 +08:00
weichen
950c4b219a [main] refactor alltoallv in fused_moe (#2487)
### What this PR does / why we need it?
Refactor all2all-related fused_experts (both quantized/unquantized) into
TokenDispatcherWithAll2AllV, including dispatch & combine calculation.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
E2E & UT
- vLLM version: v0.10.0
- vLLM main:
65197a5fb3

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
2025-08-23 20:38:17 +08:00
linfeng-yuan
4af5b80606 [Scheduler] validate max_num_batched_tokens and max_model_len in AscendSchedulerConfig (#2434)
### What this PR does / why we need it?
Add a configuration check for the ascend scheduler: if chunked_prefill
is disabled, max_num_batched_tokens must not be less than max_model_len,
following vLLM.
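
The check described above can be sketched roughly as follows. This is a minimal illustration, not the actual `AscendSchedulerConfig` code; the function name `validate_scheduler_limits` and its parameter names are hypothetical.

```python
def validate_scheduler_limits(
    max_num_batched_tokens: int,
    max_model_len: int,
    chunked_prefill_enabled: bool,
) -> None:
    """Raise ValueError when a full-length prompt could never be scheduled.

    Hypothetical helper for illustration only.
    """
    # Without chunked prefill, a prompt has to fit into a single batch, so the
    # per-step token budget must cover the longest allowed sequence.
    if not chunked_prefill_enabled and max_num_batched_tokens < max_model_len:
        raise ValueError(
            f"max_num_batched_tokens ({max_num_batched_tokens}) must be >= "
            f"max_model_len ({max_model_len}) when chunked prefill is disabled"
        )
```

With chunked prefill enabled, a long prompt can be split across steps, which is why the sketch skips the comparison in that case.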

### Does this PR introduce _any_ user-facing change?
users cannot set max_num_batched_tokens smaller than max_model_len with
ascend scheduler
### How was this patch tested?
CI and vllm serving passed

- vLLM version: v0.10.0
- vLLM main:
f77a0802b7

Signed-off-by: linfeng-yuan <1102311262@qq.com>
2025-08-23 19:39:44 +08:00
ZhaoJiangJiang
3629bc4431 feat: add mtp ut and fix some bugs (#2453)
### What this PR does / why we need it?
Fix mtp mode ut

### Does this PR introduce _any_ user-facing change?
Nothing

### How was this patch tested?
This can be tested in the same way as a unit test.


- vLLM version: v0.10.0
- vLLM main:
53415653ff

Signed-off-by: 赵江江 <zhaojiangjiang1@h-partners.com>
Co-authored-by: 赵江江 <zhaojiangjiang1@h-partners.com>
2025-08-22 17:09:08 +08:00
weiguihua2
dd04a96ee3 [Bugfix] Fix the bug of incorrect precision (#2479)
### What this PR does / why we need it?
Fix the bug of incorrect precision

- vLLM version: v0.10.0
- vLLM main:
53415653ff

---------

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
2025-08-22 17:08:56 +08:00
Shanshan Shen
f0be3eed84 [Doc] Add release note for v0.9.1rc3 (#2488)
### What this PR does / why we need it?

Add release note for `v0.9.1rc3`.

- vLLM version: v0.10.0
- vLLM main:
53415653ff

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-08-22 16:06:29 +08:00
Mengqing Cao
60ac4fb576 [QuickFix] Skip failed ut to recover CI quickly (#2484)
### What this PR does / why we need it?
Skip failed ut to recover CI quickly
related ut:
- `test_embed_models_correctness`: revert me when pooler is adapted with
the latest vllm main
- `test_check_and_update_config_enforce_eager_mode`: revert me when the
occasional failure is fixed

- vLLM version: v0.10.0
- vLLM main:
8896eb72eb

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-08-22 14:14:51 +08:00
LookAround0301
e9fb895b10 [Doc] Add feature branch long_seq_optimization (#2477)
### What this PR does / why we need it?
Add cp/sp feature branch

- vLLM version: v0.10.0
- vLLM main:
0c6e40bbaa

Signed-off-by: LookAround <lixushi@huawei.com>
2025-08-22 08:53:12 +08:00
Mengqing Cao
b0403f8d8a [CI] fix ci (#2464)
### What this PR does / why we need it?
1. use actions/checkout@v5 instead of v4
2. remove the dbo test case because it has an issue and will be
refactored later
3. make vllm-ascend compatible with vllm v0.10.1.1 and add CI for it
4. fix sampler api changes introduced by
https://github.com/vllm-project/vllm/pull/22387
5. fix qwen3 moe config changes introduced by
https://github.com/vllm-project/vllm/pull/20562
6. fix kvcache block changes introduced by
https://github.com/vllm-project/vllm/pull/23262

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with existing test.


- vLLM version: v0.10.0
- vLLM main:
0c6e40bbaa

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-08-22 07:30:48 +08:00
linfeng-yuan
0ca3f48c90 [2/N][refactor] torchair deepseek mla backend refactor (#2459)
### What this PR does / why we need it?
This PR moves the current unified mla backend to the torchair folder and removes
torchair-related code in attention/mla_v1.py (1.3k -> 0.9k).

 
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Running eager mode with mla backend, and torchair mode with code before
[2445](https://github.com/vllm-project/vllm-ascend/pull/2445)


- vLLM version: v0.10.0
- vLLM main:
f571ff8eb6

Signed-off-by: linfeng-yuan <1102311262@qq.com>
2025-08-21 14:02:30 +08:00
Yikun Jiang
67a222c383 [Doc] Add feature branch policy (#2432)
### What this PR does / why we need it?

This patch adds the feature branch policy.

After this patch: maintainers are allowed to create a feature branch.
Feature branches are used for collaboration and must include an RFC
link, merge plan and mentor info.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

CI passed

- vLLM version: v0.10.0
- vLLM main:
7be5d113d8

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-08-21 10:37:21 +08:00
sherie
3fb80ee356 add mlp tp optimze (#2120)
### What this PR does / why we need it?
For dense models, by not applying tensor parallelism (TP) to the
attention module and applying TP to the MLP module, the allreduce
operations in the attention module can be eliminated, thereby reducing
computational overhead. However, this approach increases memory usage,
so the environment variable VLLM_ASCEND_ENABLE_MLP_OPTIMZE is used to
control this optimization.

- vLLM main:
b17109beea

Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
2025-08-21 09:22:07 +08:00
yupeng
973a7cfdf0 [DOC] update doc: LoRA with ACLGraph (#2430)
### What this PR does / why we need it?
Update DOC. Guide users to run LoRA with ACLGraph.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
No.

- vLLM version: v0.10.0
- vLLM main:
de7b67a023

---------

Signed-off-by: paulyu12 <507435917@qq.com>
2025-08-21 08:55:55 +08:00
weiguihua2
0dca4c6dbd refact runner model v1 (#2461)
Refactor model runner v1.

### What this PR does / why we need it?
1. Separate the execute-model logic from the prepare-input logic
2. Split the torchair code out of model runner v1

- vLLM version: v0.10.0
- vLLM main:
68fcd3fa73

---------

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
2025-08-21 08:54:57 +08:00
Wang Kunpeng
1de16ead8e [main][bugfix] Modify the default value of the enable_shared_expert_dp to false (#2457)
### What this PR does / why we need it?
enable_shared_expert_dp is currently on by default. This optimization is
currently only valid for deepseek series models, and enabling it by default
affects the accuracy of the qwen series models.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
use parameter --additional_config='{"enable_shared_expert_dp": true}'

- vLLM version: v0.10.0
- vLLM main:
d983769c41

Signed-off-by: Wang Kunpeng <1289706727@qq.com>
2025-08-20 20:25:53 +08:00
Wang Kunpeng
c40d4171bc [main][quantization] Adapt to the new format of ds w4a8 weight (#2392)
### What this PR does / why we need it?

The deepseek w4a8 weights we supported before were in the mindie format,
which uses int8 to represent int4, so the weight size is similar to
w8a8, and a few extra steps are needed for vllm-ascend to load them
correctly.

Now we can directly use the new weight format, which packs two int4 values
into one int8. The weight size is reduced, and vllm-ascend can use the
weights directly without those extra operations. Weights in the previous
mindie format remain supported.

The weight changes in the new version:
1. The weight is packed (two int4 values packed into one int8)
2. The bias required in the apply method is directly generated by
modelslim
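
To make the packing idea concrete, here is a small pure-Python sketch of storing two signed int4 values in one byte and recovering them. It is an illustration only: the function names are hypothetical, and the nibble order (first value in the low nibble) is an assumption, so the layout modelslim actually produces may differ.

```python
def pack_int4(values):
    """Pack a flat list of signed int4 values (-8..7) into bytes, two per byte."""
    if len(values) % 2:
        raise ValueError("need an even number of int4 values")
    packed = bytearray()
    for low, high in zip(values[0::2], values[1::2]):
        for v in (low, high):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in a signed int4")
        # Assumed layout: first value in the low nibble, second in the high nibble.
        packed.append(((high & 0x0F) << 4) | (low & 0x0F))
    return bytes(packed)


def unpack_int4(packed):
    """Inverse of pack_int4: recover the signed int4 values from packed bytes."""

    def sign_extend(nibble):
        # Nibbles 8..15 encode the negative values -8..-1 (two's complement).
        return nibble - 16 if nibble > 7 else nibble

    values = []
    for byte in packed:
        values.append(sign_extend(byte & 0x0F))
        values.append(sign_extend(byte >> 4))
    return values
```

Packing halves the storage compared with keeping each int4 value in its own int8, which matches the size reduction described above.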

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?

Adding ut case in `tests/ut/quantization/test_w4a8_dynamic.py`

#### 1. How to get weights using Modelslim

##### Installation steps

Use the branch br_release_MindStudio_8.1.RC2_TR5_20260624:

git clone -b br_release_MindStudio_8.1.RC2_TR5_20260624 https://gitee.com/ascend/msit.git
cd msit/msmodelslim
bash install.sh

##### Generate w4a8 weights

cd /example/DeepSeek
Command reference: msmodelslim/example/DeepSeek/README.md Execute the
[pre-check](https://gitee.com/ascend/msit/blob/br_release_MindStudio_8.1.RC2_TR5_20260624/msmodelslim/example/DeepSeek/README.md#%E8%BF%90%E8%A1%8C%E5%89%8D%E5%BF%85%E6%A3%80)
and [DeepSeek-R1 w4a8 mix
quantization](https://gitee.com/ascend/msit/blob/br_release_MindStudio_8.1.RC2_TR5_20260624/msmodelslim/example/DeepSeek/README.md#deepseek-r1-w4a8-%E6%B7%B7%E5%90%88%E9%87%8F%E5%8C%96%E5%89%8D%E4%B8%89%E5%B1%82-mlpw8a8-dynamic-%E9%87%8F%E5%8C%96mla%E5%85%B1%E4%BA%AB%E4%B8%93%E5%AE%B6w8a8%E9%87%8F%E5%8C%96%E8%B7%AF%E7%94%B1%E4%B8%93%E5%AE%B6w4a8-dynamic%E9%87%8F%E5%8C%96)
chapter
Reference command: python3 quant_deepseek_w4a8.py --model_path {Original weight path} --save_path {Generate weight path}

##### Adapt to vllm-ascend

Modification in `config.json`: `"model_type":deepseekv2` is changed to
`"model_type":deepseek_v3`.
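
The `config.json` change above could be scripted along these lines. This is only a sketch: the helper name `set_model_type` is hypothetical, and the path to the generated weights is whatever `--save_path` pointed to.

```python
import json
from pathlib import Path


def set_model_type(config_path, new_type="deepseek_v3"):
    """Rewrite the model_type field of a config.json and return the old value."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    old_type = config.get("model_type")
    config["model_type"] = new_type
    # Write the file back, leaving all other keys untouched.
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return old_type
```

Returning the previous value makes it easy to confirm that the file really contained `deepseekv2` before the change.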

#### 2. How to run w4a8

##### a. How to run eager mode

export VLLM_ASCEND_MLA_PA=1

python -m vllm.entrypoints.openai.api_server --model=$1
--trust-remote-code -tp $2 -dp $3 --enable_expert_parallel
--quantization ascend --port $4 --max-model-len $5 --max-num-seqs $6
--enforce-eager
eg: python -m vllm.entrypoints.openai.api_server
--model=/weightpath/w4a8_4_layer --trust-remote-code -tp 4 -dp 4
--enable_expert_parallel --quantization ascend --port 8002
--max-model-len 5120 --max-num-seqs 128 --enforce-eager

##### b. How to run graph mode

export HCCL_BUFFSIZE=1024

python -m vllm.entrypoints.openai.api_server --model=$1
--trust-remote-code -tp $2 -dp $3 --enable_expert_parallel
--quantization ascend --port $4 --max-model-len $5
--additional_config='{"ascend_scheduler_config":{"enabled":true},"torchair_graph_config":{"enabled":true}}'
eg: python -m vllm.entrypoints.openai.api_server
--model=/weight/dsr1_w4a8_vllm --trust-remote-code -tp 4 -dp 4
--enable_expert_parallel --quantization ascend --port 8002
--max-model-len 5120
--additional_config='{"ascend_scheduler_config":{"enabled":true},"torchair_graph_config":{"enabled":true}}'


- vLLM version: v0.10.0
- vLLM main:
103f1ec8d3

---------

Signed-off-by: Wang Kunpeng <1289706727@qq.com>
2025-08-20 20:25:18 +08:00
wangxiyuan
eccfb715f6 [CI] Fix UT (#2452)
Make UT CI happy 

- vLLM version: v0.10.0
- vLLM main:
d983769c41

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
2025-08-20 16:26:07 +08:00
sherie
3f867ee708 refactor allgather/mc2-related fused_experts (#2369)
### What this PR does / why we need it?
refactor allgather/mc2-related fused_experts

- vLLM version: v0.10.0
- vLLM main:
de7b67a023

Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
2025-08-20 14:20:46 +08:00
wangxiyuan
73acdcfc3b [PD] Correct the ip and port env (#2450)
1. rename `VLLM_LLMDD_RPC_PORT` to `VLLM_ASCEND_LLMDD_RPC_PORT` to make
the prefix the same in vllm-ascend
2. enable `VLLM_ASCEND_LLMDD_RPC_IP` env for PD feature.
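
As a rough illustration of how the two variables named above might be consumed, the sketch below reads them from the environment. The function name and both default values are hypothetical placeholders, not values taken from vllm-ascend.

```python
import os


def get_llmdd_rpc_endpoint(default_ip="0.0.0.0", default_port=5557):
    """Return (ip, port) from the VLLM_ASCEND_LLMDD_RPC_* environment variables.

    The defaults here are illustrative assumptions only.
    """
    ip = os.environ.get("VLLM_ASCEND_LLMDD_RPC_IP", default_ip)
    port = int(os.environ.get("VLLM_ASCEND_LLMDD_RPC_PORT", default_port))
    return ip, port
```

Sharing the `VLLM_ASCEND_` prefix keeps both settings discoverable alongside the other vllm-ascend environment variables.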

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-08-20 11:39:05 +08:00
Nicholas Tao
7bec1a9b9c qwen3_moe/qwen25 support torchair graph (#2403)
### What this PR does / why we need it?
Added support for the TorchAir graph mode in qwen3_moe and qwen2.5
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
```python
from vllm import LLM

# `model`, `GPUs_per_dp_rank`, and `trust_remote_code` are defined by the caller.
llm = LLM(
    model=model,
    tensor_parallel_size=GPUs_per_dp_rank,
    enforce_eager=False,
    enable_expert_parallel=True,
    max_model_len=4096,
    max_num_seqs=16,
    trust_remote_code=trust_remote_code,
    gpu_memory_utilization=0.4,
    additional_config={
        "torchair_graph_config": {
            "enabled": True,
            "use_cached_graph": False,
            "graph_batch_sizes_init": False,
            "graph_batch_sizes": [16],
        },
        "ascend_scheduler_config": {
            "enabled": True,
            "chunked_prefill_enabled": True,
        },
        "refresh": True,
    },
)
```

- vLLM version: v0.10.0
- vLLM main:
b87cb97a53

Signed-off-by: taoyuxiang <oui.nicholas.tao@gmail.com>
2025-08-20 11:23:50 +08:00
wangxiyuan
31ae249742 [misc] remove useless envs (#2448)
Remove the env used for v0 pd feature.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-08-20 10:50:21 +08:00
Mengqing Cao
3a384492e1 [CI] add lint block before running e2e (#2447)
### What this PR does / why we need it?
Add a lint block before running e2e. Follow-up to
https://github.com/vllm-project/vllm-ascend/pull/2445

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
N/A

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-08-20 09:53:23 +08:00
Mengqing Cao
1327f9be1c Fix some ci issue and refactor modelrunner (#2445)
### What this PR does / why we need it?
Fix some ci issue and refactor modelrunner

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with existing test.

- vLLM version: v0.10.0
- vLLM main:
4d9c61993a

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
Co-authored-by: weiguihua2 <weiguihua2@huawei.com>
2025-08-20 09:01:04 +08:00
Jade Zheng
955411611c Nominate Mengqing Cao as vllm-ascend maintainer (#2433)
I would like to nominate Mengqing Cao (@MengqingCao
https://github.com/MengqingCao) as a maintainer, starting with my +1.

## Reason

Review Quality: She has completed [120+
reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+is%3Aclosed+commenter%3Amengqingcao+-author%3Amengqingcao)
since Feb. 2025, including high-quality reviews such as
[#review-3077842852](https://github.com/vllm-project/vllm-ascend/pull/2088#pullrequestreview-3077842852),
[comment-2990074116](https://github.com/vllm-project/vllm-ascend/pull/1032#issuecomment-2990074116),
and
[comment-2921063723](https://github.com/vllm-project/vllm-ascend/pull/1013#issuecomment-2921063723).

Sustained and Quality Contributions: She has a deep understanding of the
vLLM and vLLM Ascend codebases and a record of solid contributions. Her
vLLM contributions and her help with vLLM Ascend releases are the main
reasons I nominated her:

- vLLM: Notably, she has completed [28+ PR
contributions](https://github.com/vllm-project/vllm/pulls?q=is%3Apr+author%3AMengqingCao+is%3Amerged+)
in vllm-project/vllm, especially in the vLLM platform module, to improve
vLLM's multi-hardware support. She is one of the important co-authors of
[vllm#8054](https://github.com/vllm-project/vllm/pull/8054) and the hardware
plugin RFC, which made the vllm-ascend plugin possible.

Community Involvement: She is also very active and has been involved in [60+
issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aclosed%20-author%3AMengqingCao%20commenter%3AMengqingCao).

So I think she's a great addition to the vLLM Ascend Maintainer team.

- **Review Quality:**

She has completed 120+ reviews since Feb. 2025:

https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+is%3Aclosed+commenter%3Amengqingcao+-author%3Amengqingcao

High-quality reviews include
https://github.com/vllm-project/vllm-ascend/pull/2088#pullrequestreview-3077842852,
https://github.com/vllm-project/vllm-ascend/pull/1446#issuecomment-3015166908,
https://github.com/vllm-project/vllm-ascend/pull/1032#issuecomment-2990074116,
https://github.com/vllm-project/vllm-ascend/pull/1013#issuecomment-2921063723

- **Sustained Contributions:**

99+ PRs merged in vllm-project/vllm-ascend

https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3AMengqingCao+is%3Amerged

- **Quality Contribution:**

She is one of the important co-authors of
https://github.com/vllm-project/vllm/pull/8054 , which made the vllm-ascend
plugin possible.

Notably, she has completed 28+ PR contributions in
vllm-project/vllm, especially in the vLLM platform module, to improve vLLM's
multi-hardware support:

https://github.com/vllm-project/vllm/pulls?q=is%3Apr+author%3AMengqingCao+is%3Amerged+.

In 2025 Q2, she also led the [[RFC]: E2E CI test for key
features](https://github.com/vllm-project/vllm-ascend/issues/413) and
[[RFC]: Unit test coverage
improvement](https://github.com/vllm-project/vllm-ascend/issues/1298) to
help vllm ascend improve its test coverage.

Her main contributions focus on the adaptation of parallel strategies
and communicator, such as
https://github.com/vllm-project/vllm-ascend/pull/1800,
https://github.com/vllm-project/vllm-ascend/pull/1856.

These contributions are sufficient to prove she has a “Deep understanding
of vLLM and vLLM Ascend codebases”.

- **Community Involvement:**

Involved in reviewing 63+ issues:
https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aclosed%20-author%3AMengqingCao%20commenter%3AMengqingCao

She led the v0.10.1 release as release manager


- vLLM version: v0.10.0
- vLLM main:
78dba404ad

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
2025-08-19 14:13:54 +08:00
xleoken
d91c6daf89 [improve] Remove redundant parentheses in pangu_moe.py (#2081)
### What this PR does / why we need it?

Remove redundant parentheses in pangu_moe.py.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Local.

- vLLM version: v0.10.0
- vLLM main:
099c046463

Signed-off-by: xleoken <xleoken@163.com>
2025-08-19 11:00:18 +08:00
wangxiyuan
6335fe39ea Nominate ApsarasX as vllm-ascend maintainer (#2419)
I would like to nominate Wengang Chen (@ApsarasX
https://github.com/ApsarasX) as a maintainer, starting with my +1.

## Reason
Review Quality: He focuses on reviewing the vLLM Ascend core module, with
100+ high-quality reviews, such as [#2326
(comment)](https://github.com/vllm-project/vllm-ascend/pull/2326#discussion_r2268509365),
[#768
(comment)](https://github.com/vllm-project/vllm-ascend/pull/768#discussion_r2075278516),
[#2312
(comment)](https://github.com/vllm-project/vllm-ascend/pull/2312#issuecomment-3174677159),
[#2268
(comment)](https://github.com/vllm-project/vllm-ascend/pull/2268#discussion_r2260920578),
[#2192
(comment)](https://github.com/vllm-project/vllm-ascend/pull/2192#issuecomment-3149414586),
[#2156
(comment)](https://github.com/vllm-project/vllm-ascend/pull/2156#discussion_r2249096673).
This helped vLLM Ascend v0.9.x and v0.10.x to be released with high
quality.

Sustained and Quality Contributions: He has a very good habit of sharing
his design ideas, development process, and performance test results, such as
in [#966](https://github.com/vllm-project/vllm-ascend/pull/966). He has
contributed [many
PRs](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3AApsarasX+is%3Amerged+),
including valuable bugfixes and perf improvements.

Community Involvement: Actively involved in community discussions, he is
collaborative and helps users solve problems, and has been involved in [120+
PRs and
issues](https://github.com/vllm-project/vllm-ascend/issues?q=commenter%3AApsarasX).
He was also a speaker at the [vLLM Beijing
Meetup](https://mp.weixin.qq.com/s/7n8OYNrCC_I9SJaybHA_-Q).

So I think he's a great addition to the vLLM Ascend Maintainer team.

- Review Quality:
108+ PRs with valuable reviews
https://github.com/vllm-project/vllm-ascend/pulls?q=commenter%3AApsarasX
such as

https://github.com/vllm-project/vllm-ascend/pull/2326#discussion_r2268509365

https://github.com/vllm-project/vllm-ascend/pull/768#discussion_r2075278516

https://github.com/vllm-project/vllm-ascend/pull/2312#issuecomment-3174677159

https://github.com/vllm-project/vllm-ascend/pull/2268#discussion_r2260920578

https://github.com/vllm-project/vllm-ascend/pull/2192#issuecomment-3149414586

https://github.com/vllm-project/vllm-ascend/pull/2156#discussion_r2249096673

- Sustained and Major Contributions:
https://github.com/vllm-project/vllm-ascend/pulls/ApsarasX

- Quality Contribution:

https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3AApsarasX+is%3Aclosed
Good quality with thorough documentation, e.g.
[Perf] Refactor tensor disposal logic to reduce memory usage
https://github.com/vllm-project/vllm-ascend/pull/966

- Community Involvement:
7 issues:

https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aclosed%20author%3AApsarasX

- 120+ PRs and issues:

https://github.com/vllm-project/vllm-ascend/issues?q=commenter%3AApsarasX

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-08-19 10:44:35 +08:00
Shanshan Shen
83e0f41408 [3/N][Refactor] Move torchair_attention to torchair dir (#2017)
### What this PR does / why we need it?

1. Move `torchair_attention` to `torchair` dir.
2. Make `AscendAttentionTorchairBackend` extend `AscendAttentionBackend`
to reduce duplicate methods.
3. Make `AscendTorchairMetadata` extend `AscendMetadata` to reduce
duplicate properties.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.10.0
- vLLM main:
0933f9d518

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-08-19 10:25:22 +08:00
xleoken
2a763b8326 [Bug] Fix bug in test_chunked.py (#1992)
### What this PR does / why we need it?

1. Remove the return statement, which always skipped the logic after it.

2. Replace `deepseek` with `Qwen2.5-Instruct` to avoid OOM in the GitHub e2e
test env.

3. Fix the comparison logic

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
Local Test.


- vLLM version: v0.10.0
- vLLM main:
0933f9d518

Signed-off-by: xleoken <xleoken@163.com>
2025-08-19 10:23:47 +08:00
G.O.D
27d038dc66 fix doc typo (#2407)
fix doc typo

- vLLM version: v0.10.0
- vLLM main:
5f5664b3e4

---------

Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-19 09:10:01 +08:00
Pleaplusone
3f4a358b14 [Bugfix] Fix custom op register issue (#2409)
### What this PR does / why we need it?
Our current code registers the custom ops during the platform
initialization phase. However, when a new process is started to create a
worker, the earlier patch loses its effect on the custom ops, which leads
to a fallback to the native path implemented in vllm. This PR moves the patch
code to the worker to make sure the custom op patch works as
expected.

### Does this PR introduce _any_ user-facing change?
No

- vLLM version: v0.10.0
- vLLM main:
8ea0c2753a

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
2025-08-19 09:09:43 +08:00
liuchenbing
3648d18e67 Add Custom Kernels For LoRA Performance (#2325)
### What this PR does / why we need it?
Add two custom operators (sgmv_shrink and sgmv_expand) to address the
performance issues of LoRA. Also enable LoRA operators to run in ACL graph
mode, so as to improve model inference performance.
### Does this PR introduce _any_ user-facing change?
No user-facing change.
### How was this patch tested?
In tests with the Qwen2.5 7B model using vllm-ascend
version v0.9.2.rc1 in ACL graph mode, TTFT, TPOT and throughput
improved by about 100%.

- vLLM version: v0.10.0
- vLLM main:
1f83e7d849

---------

Signed-off-by: liuchn <909698896@qq.com>
Co-authored-by: liuchn <909698896@qq.com>
2025-08-19 09:09:11 +08:00
dependabot[bot]
8fb50a4248 Bump actions/checkout from 4 to 5 (#2420)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5.

- vLLM version: v0.10.0
- vLLM main:
5f5664b3e4

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-19 08:54:56 +08:00
TaoYu Chen
9e7c168d99 Add ModelRunner_prepare_inputs doc (#1493)
### What this PR does / why we need it?
To help more developers quickly get started with vLLM, we need to write
clear and easy-to-understand code documentation and technical
interpretations. This will effectively lower the learning curve, attract
more excellent contributors, and collectively build a better developer
community.

Add ModelRunner_prepare_inputs doc

### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
Pass CI


- vLLM version: v0.10.0
- vLLM main:
4be02a3776

---------

Signed-off-by: ChenTaoyu-SJTU <ctynb@qq.com>
2025-08-18 15:41:24 +08:00
linfeng-yuan
3fc31ee1cb [1/N][refactor] torchair deepseek modeling refactor (#2384)
### What this PR does / why we need it?

Move the torchair-related model arch into the torchair module to make the code
clearer. As a next step, we'll remove all torchair-related code outside of the
torchair module.

### Does this PR introduce _any_ user-facing change?
No.

- vLLM version: v0.10.0
- vLLM main:
08d5f7113a

Signed-off-by: linfeng-yuan <1102311262@qq.com>
2025-08-18 15:00:37 +08:00
Pleaplusone
19fdc9a3f0 [Bugfix] Fix header include issue in rope (#2397)
### What this PR does / why we need it?
vLLM-Ascend's rope implementation includes several header files that are
not supposed to be included by outside users. The current implementation may
break when the CANN toolkit is updated. This PR removes those incompatible
includes so that upgrading the CANN toolkit is safe.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested by rope unittest

- vLLM version: v0.10.0
- vLLM main:
3e6dd40016

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
2025-08-18 14:33:38 +08:00
Chao Lei
03ca2b26ca [P/D] Mooncake Connector for v1 distributed (#1568)
### What this PR does / why we need it?
This PR adopts Mooncake TransferEngine for kv cache registration and a
pull_blocks-style disaggregated prefill implementation.

### Does this PR introduce any user-facing change?
No

### Dependencies
1. CANN dependencies
Using Mooncake TransferEngine with Ascend Transport requires CANN
version 8.2.RC1 or higher (for details, see
Mooncake [#502](https://github.com/kvcache-ai/Mooncake/pull/502)).

2. vllm-ascend
This PR depends on changes introduced by #950 (modifications to
`model_runner_v1`) and #1361 (updates to `schedule`), both of which have
been merged into the `v0.9.1-dev` branch and are expected to land in
`main` shortly.

### How was this patch tested?


- vLLM version: v0.10.0
- vLLM main:
1c859a1387

---------

Signed-off-by: leichao.lc <leichao139636@163.com>
Co-authored-by: jianzs <zheng.shoujian@outlook.com>
Co-authored-by: zzy-ContiLearn <1831242919@qq.com>
Co-authored-by: fems14 <1804143737@qq.com>
Co-authored-by: Dreamerleader <2270923832@qq.com>
Co-authored-by: chris668899 <15105191595@126.com>
Co-authored-by: Pz1116 <zpbzpb123123@gmail.com>
2025-08-18 14:30:07 +08:00
CaveNightingale
2bb7e55022 [Bugfix][PD]fix non-working disaggregated prefill (#2374)
### What this PR does / why we need it?

Mainline vLLM fixes its disaggregated prefill in
https://github.com/vllm-project/vllm/pull/22598 . But it is still not
working in vllm-ascend.
To be concrete, decoder instances crashed before vllm's fix and hang after
vllm's fix on Ascend devices.
This patch allows disaggregated prefill to work.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Qwen3-0.6B 1P1D tp=1 dp=1


- vLLM version: v0.10.0
- vLLM main:
0fe85087a9

---------

Signed-off-by: CaveNightingale <cavenightingale@foxmail.com>
2025-08-15 16:59:52 +08:00
22dimensions
1b40665548 [Misc] remove unused file (cache.py) (#2377)
### What this PR does / why we need it?
cache.py only contains a function that will never be called, so remove
it.

### Does this PR introduce _any_ user-facing change?
No

- vLLM version: v0.10.0
- vLLM main:
f1f0d2fab8

Signed-off-by: 22dimensions <waitingwind@foxmail.com>
2025-08-15 10:27:43 +08:00
Mengqing Cao
61866b8ac6 [Quickfix] update CachedRequestState as NewRequestData changed (#2367)
### What this PR does / why we need it?
1. update `CachedRequestState` as `NewRequestData` changed in
https://github.com/vllm-project/vllm/pull/22570
2. drop maintenance of vllm v0.10.0 in the branch main

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with existing test.


- vLLM version: v0.10.0
- vLLM main:
92ff41abea

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-08-15 07:35:27 +08:00
Li Wang
2ad7e1251e [Doc] Fix quant documentation to make it reproducible (#2277)
### What this PR does / why we need it?
Fixed the expression of msit for code clone

- vLLM version: v0.10.0
- vLLM main:
afa5b7ca0b

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-08-14 17:19:47 +08:00
Icey
c721ae6042 [CustomOp] Register RMSNorm instead of overwrite forward_oot (#2284)
### What this PR does / why we need it?
Use the function `CustomOp.register_oot` to register the custom op:
```python
from vllm.model_executor.custom_op import CustomOp
CustomOp.register_oot(_decorated_op_cls=AscendRMSNorm, name="RMSNorm")
```
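
For readers unfamiliar with the registration pattern, the toy sketch below shows the general idea of an out-of-tree (OOT) registry that swaps in a platform-specific class by name. It does not reproduce vLLM's internals: `ToyCustomOp` and its `resolve` method are hypothetical, and the two norm classes are empty stand-ins.

```python
class ToyCustomOp:
    """Toy base class holding a name -> class registry of OOT overrides."""

    _oot_registry = {}

    @classmethod
    def register_oot(cls, op_cls, name):
        # Remember which class should replace the in-tree op called `name`.
        cls._oot_registry[name] = op_cls

    @classmethod
    def resolve(cls, name, default_cls):
        # Fall back to the in-tree class when no override was registered.
        return cls._oot_registry.get(name, default_cls)


class RMSNorm(ToyCustomOp):
    def forward(self, x):
        return "native"


class AscendRMSNorm(RMSNorm):
    def forward(self, x):
        return "ascend"


ToyCustomOp.register_oot(AscendRMSNorm, name="RMSNorm")
op = ToyCustomOp.resolve("RMSNorm", RMSNorm)()
```

Compared with overwriting `forward_oot` on the original class, registering a replacement class leaves the in-tree implementation untouched.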

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with new added/existing test.

- vLLM version: v0.10.0
- vLLM main:
afa5b7ca0b

---------

Signed-off-by: Icey <1790571317@qq.com>
2025-08-14 17:18:30 +08:00
shiyuan680
e14f2ef669 refactor select_experts of moe module (#2150)
### What this PR does / why we need it?
This PR refactors select_experts of the moe module.
It merges the implementations of the quantized and non-quantized methods into a
new class,
used in a vLLM-like way, e.g. ExpertsSelector.select_experts.
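
As a rough picture of what a shared expert-selection routine computes, here is a pure-Python sketch of softmax top-k routing for a single token. It is an assumption-laden illustration: the PR does not state the routing scheme, and this free function with its `renormalize` flag is hypothetical rather than the actual `ExpertsSelector.select_experts` signature.

```python
import math


def select_experts(router_logits, top_k, renormalize=True):
    """Return (weights, expert_ids) of the top_k experts for one token."""
    # Numerically stable softmax over the router logits.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the top_k most probable experts, highest first.
    expert_ids = sorted(
        range(len(probs)), key=lambda i: probs[i], reverse=True
    )[:top_k]
    weights = [probs[i] for i in expert_ids]

    if renormalize:
        # Rescale the selected weights so they sum to 1.
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return weights, expert_ids
```

Because selection depends only on the router logits, the same routine can in principle serve both quantized and non-quantized expert paths, which is the kind of sharing the refactor describes.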
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
test in qwen3-moe and all ut.

- vLLM version: v0.10.0
- vLLM main:
e18859298d

Signed-off-by: yangcheng <yangcheng104@huawei.com>
Co-authored-by: yangcheng (AJ) <y00806874@china.huawei.com>
2025-08-14 11:50:53 +08:00
Shanshan Shen
103654ccd6 [Misc] Remove redundant imported envs, using envs_ascend instead (#2193)
### What this PR does / why we need it?
Remove redundant imported `envs`, using `envs_ascend` instead.

```python
import vllm.envs as envs_vllm
import vllm_ascend.envs as envs_ascend
```

- vLLM version: v0.10.0
- vLLM main:
71683ca6f6

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-08-14 09:33:39 +08:00