Commit Graph

1148 Commits

Author SHA1 Message Date
wangx700
24d6314718 [Bugfix] fix sleepmode level2 e2e test (#4019)
### What this PR does / why we need it?

enable sleepmode level2 e2e test and add the check logic to ensure the
nz is not enabled.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

use e2e tests


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: wangx700 <wangxin700@huawei.com>
2025-11-08 14:11:55 +08:00
offline893
f7ca3bc0fa [CI]Fix eplb ci. (#4052)
### What this PR does / why we need it?
This pr fixes ci on eplb

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: offline0806 <3337230449@qq.com>
Co-authored-by: offline0806 <3337230449@qq.com>
2025-11-07 23:53:35 +08:00
drslark
23b785fdfb [Feat] Adapted mtp function to Qwen3-next (#3918)
### What this PR does / why we need it?

Adapts mtp function to Qwen3-next.

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: drslark <slarksblood@qq.com>
2025-11-07 16:39:03 +08:00
lilinsiman
22286fc67d [UT] Add new ut case for aclgraph in auto enable (#4031)
### What this PR does / why we need it?
add new ut case for aclgraph in auto enable

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ut

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
2025-11-07 10:39:11 +08:00
Li Wang
259eb25f88 [CI] Quick fix mooncake for nightly-ci (#4028)
### What this PR does / why we need it?
Since we have upgraded to CANN 8.3rc1, we will no longer use the
privately maintained Mooncake repository, but instead use the official
release released by Mooncake:
https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.7.post2 .

Next step: this is only a temporary solution. We will integrate mooncake
into the vllm-ascend base image later for easier use. see
https://github.com/vllm-project/vllm-ascend/pull/3989
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-11-06 18:46:00 +08:00
jiangyunfan1
34b278a339 [TEST]Update nightly acc test standard (#4032)
### What this PR does / why we need it?
This PR updates the acc test standard for some cases, we need it to
better maintain acc

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
by running the test

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
2025-11-06 16:58:38 +08:00
weiguihua2
2eebe1dc0a [feat]decode convert bsnd to tnd and fix bug when pcp and dcp (#3980)
### What this PR does / why we need it?
1、in attention_v1 module, convert bsnd t0 tnd when pcp and dcp
2、fix tochair bug: service startup problem

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
2025-11-06 14:58:24 +08:00
Liziqi-77
25b24c02ea [Feat](Mooncake) Supports multiple input suffixes for global_segment_size (#3690)
### What this PR does / why we need it?
- global_segment_size and local_buffer_size use constants for unified
management.
- Newly added support for input formats ending with GB, MB, KB, and B,
while being compatible with existing input methods.

### Does this PR introduce _any_ user-facing change?
- Users can use new input methods
- The documentation has also been modified

### How was this patch tested?


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: 李子琦 <liziqi_ing@163.com>
2025-11-06 14:48:15 +08:00
zxr2333
b206e831e9 [P/D]Make kv-transfer env variable take effect & Fix load-balance proxy (#3981)
### What this PR does / why we need it?
Make kv-transfer env variable take effect and Fix load-balance proxy.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By CI.


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: nwpu-zxr <zhouxuerong2@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
2025-11-06 12:02:47 +08:00
XiaoxinWang
738bf2b720 support qwen3-next full_decode_only mode. (#3949)
### What this PR does / why we need it?
support qwen3-next full_decode_only mode. 
bs=1, max_token=1024
| branch| tps| e2e time|
| --- | --- | --- |
|piecewise  |3.06  | 8.15 |
|fulldecodeonly | 7.2 | 3.47 |

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
2025-11-05 08:46:05 +08:00
zhangxinyuehfad
49e6983b3b [Test] Add accuracy test for qwen3-30b-a3b-w8a8 (#3807)
### What this PR does / why we need it?
Add accuracy test for qwen3-30b-a3b-w8a8
This PR depends on https://github.com/vllm-project/vllm-ascend/pull/3799

### How was this patch tested?
qwen3-30b-a3b-w8a8 accuarcy test ok:

https://github.com/vllm-project/vllm-ascend/actions/runs/19062045267/job/54443732877?pr=3807
- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-11-04 18:56:31 +08:00
realliujiaxu
bedf223771 [Perf] move quant before allgather in Allgather EP (#3420)
### What this PR does / why we need it?
move quant before allgather in Allgather EP, rely on
https://github.com/vllm-project/vllm-ascend/pull/3334

Deepseek R1 W8A8 performance on A2 with
`HCCL_ALGO="level0:NA;level1:pipeline"`:
| Seq length | Mean TTFT (ms) main | Mean TTFT (ms)  this PR |
|----------|----------|----------|
| 4k   |  375.21  | 364.99   |
| 16k  | 1465.23   | 1421.75  |
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
2025-11-04 16:49:58 +08:00
jiangyunfan1
44b58b8665 [TEST]Add full graph for multimodal nightly tests (#3968)
### What this PR does / why we need it?
This PR adds full graph for multimodal nightly test, we need to maintain
this senario

### How was this patch tested?
by running the test
- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
2025-11-04 16:47:48 +08:00
ZengSilong
dc1a6cb503 [Test]Add accuracy test for multiple models (#3823)
### What this PR does / why we need it?
Add accuracy test for multiple models:
- Meta_Llama_3.1_8B_Instruct
- Qwen2.5-Omni-7B
- Qwen3-VL-8B-Instruct

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
2025-11-04 14:46:39 +08:00
zhangxinyuehfad
646fbac7a9 [Test] Add accuracy test for qwen3-8b-w8a8 (#3799)
### What this PR does / why we need it?
Add accuracy test for qwen3-8b-w8a8

- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-11-04 09:23:11 +08:00
wangxiyuan
cc2cd42ad3 Upgrade CANN to 8.3.rc1 (#3945)
### What this PR does / why we need it?
This PR upgrade CANN from 8.2rc1 to 8.3rc1 and remove the CANN version
check logic.

TODO: we notice that UT runs failed with CANN 8.3 image. So the base
image for UT is still 8.2. We'll fix it later.


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-03 20:21:07 +08:00
CodeCat
49d74785c4 [Test] Add new e2e test use deepseek-v2-lite in ge graph mode (#3937)
### What this PR does / why we need it?
The current test cases lack end-to-end (e2e) testing for the
deepseek-v2-lite network in ge graph mode.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: CodeNine-CJ <chenjian343@huawei.com>
2025-11-03 20:10:01 +08:00
Li Wang
8f222f21f1 [CI][Nightly] Fix mooncake build (#3958)
### What this PR does / why we need it?
Fix https://github.com/vllm-project/vllm-ascend/pull/3943

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-11-03 20:07:47 +08:00
1Fire4
0b9b6d79fe [Feat][UT] Support Deepseekv32 FULL_DECODE_ONLY mode and add unit test of sfa_v1 (#3763)
### What this PR does / why we need it?
- Add support for DeepSeek v3.2 in FULL_DECODE_ONLY mode.
- Add unit test for sfa_v1.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: 1Fire4 <wangdingyi2@huawei.com>
2025-11-03 10:02:47 +08:00
Li Wang
d0cc9c1203 [CI][Nightly] Correct the commit hash available for mooncake (#3943)
### What this PR does / why we need it?
Because the previous commit hash was accidentally deleted or
overwritten. This patch correct the commit hash available for
https://github.com/AscendTransport/Mooncake to make nightly ci happy
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-11-01 21:52:16 +08:00
wangxiyuan
fcc9a0eaeb Update torch-npu version to 2.7.1 (#3896)
### What this PR does / why we need it?
Upgrade torch-npu to the official release version 2.7.1


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-10-31 17:16:31 +08:00
Canlin Guo
f99762eb25 [E2E][MM] Add e2e tests for InternVL model (#3796)
### What this PR does / why we need it?

As a validation for #3664, add end-to-end tests to monitor the InternVL
model and ensure its continuous proper operation. This PR is only for
single-card. So the models that have more parameters than 8B like 78B
are needed to test using multi-cards.
 

### Does this PR introduce _any_ user-facing change?

None.

### How was this patch tested?

`pytest -sv tests/e2e/singlecard/multi-modal/test_internvl.py`


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
2025-10-31 15:42:47 +08:00
lilinsiman
1f486b2dd1 [Test] Add new test model for aclgraph single_request (#3888)
### What this PR does / why we need it?
add new test model for aclgraph single_request

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ut

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
2025-10-31 11:23:13 +08:00
lilinsiman
35a913cf1e add new e2e tests case for aclgraph memory (#3879)
### What this PR does / why we need it?
add new e2e tests case for aclgraph memory

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ut

- vLLM version: v0.11.0rc3
- vLLM main:
83f478bb19

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
2025-10-31 09:16:52 +08:00
Li Wang
eb0a2ee2d0 [CI] Optimize nightly CI (#3898)
### What this PR does / why we need it?
This patch mainly fix the the problem of not being able to determine the
exit status of the pod's entrypoint script and some other tiny
optimizations:
1. Shorten wait for server timeout
2. fix typo
3. fix the issue of ais_bench failing to correctly access the proxy URL
in a PD separation scenario.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-30 23:42:20 +08:00
wangxiaoteng888
2c291bc63f [bugfix] layerwise D first plan (#3866)
### What this PR does / why we need it?
Refactored the layerwise code to send to the D node first, preventing
P-node hangs due to communication timeouts when DP > 1.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By ci

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
2025-10-30 22:20:34 +08:00
jiangyunfan1
655a229455 [TEST]Add MALPO for aclgraph in nightly test (#3894)
### What this PR does / why we need it?
This PR adds MALPO for deepseek aclgraph, we need to test it nightly
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running the test

- vLLM version: v0.11.0
- vLLM main:
83f478bb19

---------

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
2025-10-30 18:25:54 +08:00
Song Zhixin
216fc0e8e4 [feature] Prompt Embeddings Support for v1 Engine (#3026)
### What this PR does / why we need it?
this PR based on
[19746](https://github.com/vllm-project/vllm/issues/19746), support
Prompt Embeddings for v1 engine on NPU

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

```python
python examples/prompt_embed_inference.py
```


- vLLM version: v0.11.0
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

---------

Signed-off-by: jesse <szxfml@gmail.com>
2025-10-30 17:15:57 +08:00
xuyexiong
eff3e5fc6f [FEAT] Refactor spec decode to support efficient padded speculation (#3528)
### What this PR does / why we need it?
1. Refactor the file `mtp_proposer.py`, splits torchair related codes
into `mtp_torchair_proposer.py`
2. According to https://github.com/vllm-project/vllm/pull/24539,
implements padded speculative decoding as described in
https://github.com/vllm-project/vllm/issues/21984.
### Does this PR introduce _any_ user-facing change?
User can use `disable_padded_drafter_batch` to disable/enable padded
speculation, default is `False`.
offline example:
```
speculative_config={"method": "deepseek_mtp", "num_speculative_tokens": 1, "disable_padded_drafter_batch": False}
```

### How was this patch tested?

- [x] egaer with pad/unpad:
- [x] aclgraph with pad/unpad
- [x] torchair with pad/unpad

performance test of deepseek-r1 with tp16、dp1
aclgraph with pad ITL: 168ms
aclgraph with unpad ITL: 169ms
original: 178ms


- vLLM version: v0.11.0rc3
- vLLM main:
83f478bb19

---------

Signed-off-by: xuyexiong <xuyexiong@huawei.com>
2025-10-30 16:53:05 +08:00
Meihan-chen
67dd3a4581 [UT] fix skip ut test for test_utils (#3803)
### What this PR does / why we need it?
[UT] fix ut test for test_utils that
https://github.com/vllm-project/vllm-ascend/pull/3612 skipped.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
vLLM version: v0.11.0rc3
vLLM main:
17c540a993

- vLLM version: v0.11.0rc3
- vLLM main:
83f478bb19

---------

Signed-off-by: Meihan-chen <jcccx.cmh@gmail.com>
2025-10-30 15:52:53 +08:00
offline893
14ca1e5cb2 [CI]Fix oom of deepseek-eplb nigtly test. (#3884)
### What this PR does / why we need it?
Fix oom of deepseek-eplb nigtly test

- vLLM version: v0.11.0rc3
- vLLM main:
83f478bb19

---------

Signed-off-by: offline0806 <3337230449@qq.com>
Co-authored-by: offline0806 <3337230449@qq.com>
2025-10-30 10:18:07 +08:00
baxingpiaochong
d6ef3df3b3 [Bugfix]fix_mulit_connector_bug (#3332)
### What this PR does / why we need it?
When using multi connector, the multi connector does not define
get_finished_count, which will cause the kv cache to be released
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main:
83f478bb19

---------

Signed-off-by: baxingpiaochong <771405853@qq.com>
2025-10-29 23:23:06 +08:00
offline893
5f176ca992 [CI]Fix eplb nightly tests. (#3863)
### What this PR does / why we need it?

Fix eplb nightly tests.
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
83f478bb19

---------

Signed-off-by: offline0806 <3337230449@qq.com>
Co-authored-by: offline0806 <3337230449@qq.com>
2025-10-29 23:06:05 +08:00
Li Wang
4a2ab13743 [CI] Optimize nightly CI (#3858)
### What this PR does / why we need it?
This patch optimize nightly CI:
1. Bug fixes ais_bench get None repo_type error
2. Fix A2 install kubectl error with arm arch
3. Fix the multi_node CI unable to determine whether the job was
successful error
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
83f478bb19

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-29 22:30:19 +08:00
realliujiaxu
74191864b7 [Perf] Delete redundant operations in model_runner and forward_context (#3677)
### What this PR does / why we need it?

Remove redundant operations from `model_runner` and `forward_context`.
This optimization can significantly reduce the idle time (bubble) before
decoding when running models with small parameter counts (e.g.,
Qwen/Qwen2.5-0.5B).

Testing on 800I A2, bubble is reduced from 3.8ms to 2.8ms :
Before
<img width="1655" height="696" alt="image"
src="https://github.com/user-attachments/assets/d7608e52-2438-46dd-8fc9-391fd6274495"
/>

After
<img width="1607" height="774" alt="image"
src="https://github.com/user-attachments/assets/56daf081-2dba-4d2e-99d4-e055187d9806"
/>

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

---------

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
2025-10-29 15:59:55 +08:00
weichen
0d1859af08 [Bugfix] [MoE] fix error in deepseek when using allgather (#3824)
### What this PR does / why we need it?
After refactoring vllm_ascend/models and FusedMoE, we are unable to pass
`gate` from deepseekv2.py to `AscendFusedMoE.forward`, which will result
in error when running deepseek v3/r1 with allgather.
Hence, this pr removes `gate` related computations from FusedMoE module
in eager/aclgraph mode.
### Does this PR introduce _any_ user-facing change?
`rm_router_logits` is deprecated in eager/aclgraph.
### How was this patch tested?
e2e & ut

- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
2025-10-29 14:51:39 +08:00
Mengqing Cao
900086fdc6 [HybridKV][Bugfix] Fix Hybrid kvcache sharing bug in same attention type (#3760)
### What this PR does / why we need it?
Part of https://github.com/vllm-project/vllm-ascend/pull/3106
Fix Hybrid kvcache sharing bug in same attention type
Change the `shared_by` logic so that the same attention spec could share
the same buffer instead of allocating more hbm.
After this pr, kvcache memory saved 50% in qwen3-next compared with
before (`self_attn:linear_attn=1:3` in an `attn_group`), and
`gpu_memory_utilization` could increase to `0.8` on Qwen3-Next when
running on A2 64G/card with tp4

<img width="2833" height="1540" alt="image"
src="https://github.com/user-attachments/assets/2a91fa99-fb0f-447c-9e8b-acd587890fbe"
/>

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
Test pass with the latest e2e test case on qwen3-next

- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-10-29 14:18:52 +08:00
jiangyunfan1
e56b0017a3 [TEST]Add aisbench log and A2 cases (#3841)
### What this PR does / why we need it?
This PR adds 2 more A2 caces which we need to test daily. It also
enhances the logging for aisbench test failures to improve issues
identification
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running the test

- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

---------

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
2025-10-28 23:33:15 +08:00
Li Wang
90ae114569 [CI] Fix nightly CI (#3821)
### What this PR does / why we need it?
This patch fix the nightly CI runs
[failure](https://github.com/vllm-project/vllm-ascend/actions/runs/18848144365)

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?


- vLLM version: v0.11.0rc3
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.1

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-28 20:40:03 +08:00
Li Wang
f846bd20e4 [CI] Add multi-node test case for a2 (#3805)
### What this PR does / why we need it?
This patch add multi-node test case for a2
### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-27 23:10:17 +08:00
jiangyunfan1
9030106a14 [TEST]Add 2P1D multi node cases for nightly test (#3764)
### What this PR does / why we need it?
This PR adds the 2P1D multi node func/acc/perf test cases, we need test
them daily
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
by running the test

- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4

---------

Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
2025-10-27 23:09:15 +08:00
Li Wang
60ee4af6d0 [CI] Add custom op to nightly (#3765)
### What this PR does / why we need it?
1. Add custom op to nightly tests, fix
https://github.com/vllm-project/vllm-ascend/pull/3665
2. Correctly pass github secrets when using workflow_call, see
https://docs.github.com/en/actions/how-tos/reuse-automations/reuse-workflows
3. Fix the single node mutual cancellation issue

- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-10-27 14:07:03 +08:00
weiguihua2
4312a92a4f [feat]dcp pcp support aclgraph (#3731)
### What this PR does / why we need it?
dcp pcp support  full aclgraph, including mla attention_v1

- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
2025-10-27 09:58:23 +08:00
ck-hw-1018
7572939b94 add qwq testcase (#3757)
### What this PR does / why we need it?
This PR adds a qwq case for nightly test for qwen-qwq on A3 ,we need
test them daily

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
by running the test


- vLLM version: v0.11.0rc3
- vLLM main:
c9461e05a4

---------

Signed-off-by: ckhw <cuikai1@huawei.com>
2025-10-25 17:11:35 +08:00
zzzzwwjj
e5676fc36e [main] remove dbo code (#3712)
### What this PR does / why we need it?
Remove codes of dbo.
Currently, vLLM has supported dbo with pr:
https://github.com/vllm-project/vllm/pull/23693.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main:
17c540a993

Signed-off-by: zzzzwwjj <1183291235@qq.com>
2025-10-25 15:53:01 +08:00
Icey
d9cdc65854 Upgrade to new vllm commit (#3719)
### What this PR does / why we need it?
Upgrade to new vllm commit:
c9461e05a4

- Fix many imports, caused by
https://github.com/vllm-project/vllm/pull/26908
- Fix import ```sha256```, caused by
https://github.com/vllm-project/vllm/pull/27169
- Remove ```SchedulerConfig.send_delta_data```, caused by
https://github.com/vllm-project/vllm/pull/27142
- Fix ```FusedMoE``` because of dual stream execution, caused by
https://github.com/vllm-project/vllm/pull/26440

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with new added/existing test.


- vLLM version: v0.11.0rc3
- vLLM main:
17c540a993

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
2025-10-25 15:36:32 +08:00
HuaJiaHeng
11f75883be [Test] add test for prefix cache feature of deepseek (#3733)
### What this PR does / why we need it?
This PR adds a prefix cache case for nightly test for
DeepSeek-r1-0528-W8A8 on A3, we need test them daily.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the test

- vLLM version: v0.11.0rc3
- vLLM main:
17c540a993

---------

Signed-off-by: root <root@hostname-2pbfv.foreman.pxe>
Co-authored-by: root <root@hostname-2pbfv.foreman.pxe>
2025-10-25 14:08:15 +08:00
weichen
63c363d3de [Refactor] [MoE] Rename moe-related classes & files (#3646)
### What this PR does / why we need it?
1. Rename common_fused_moe.py to fused_moe.py.
2. Rename fused_moe_prepare_and_finalize.py / FusedMoEPrepareAndFinalize
to prepare_finalize.py / PrepareAndFinalize.
3. Rename vllm_ascend/ops/moe to vllm_ascend/ops/fused_moe.
4. Move vllm_ascend/ops/fused_moe.py to
vllm_ascend/ops/fused_moe/fused_moe.py
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
e2e & ut

- vLLM version: v0.11.0rc3
- vLLM main:
17c540a993

Signed-off-by: Pr0Wh1teGivee <calvin_zhu0210@outlook.com>
2025-10-25 11:22:03 +08:00
zhangxinyuehfad
8f6f967028 [Test] Add e2e test and accuracy test for Qwen3-Next-80B-A3B-Instruct (#3450)
### What this PR does / why we need it?

Add e2e test and accuracy test for Qwen3-Next-80B-A3B-Instruct

### How was this patch tested?
accuracy test:
https://github.com/vllm-project/vllm-ascend/actions/runs/18771221544/job/53556027634?pr=3450
ci test:
https://github.com/vllm-project/vllm-ascend/actions/runs/18771221530/job/53556027614?pr=3450
<img width="1703" height="562" alt="image"
src="https://github.com/user-attachments/assets/973b6cfa-8240-41e3-893a-5024ff8d0693"
/>



- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-10-25 10:57:56 +08:00
whx
d5609e2c48 [BugFix] Comment out newly added vlm e2e. (#3736)
This PR comments out newly added vlm e2e test of ascend scheduler
scenario because I found that when running in multi-batch this will
stuck. Need to add this back after dealing with this issue.
- vLLM version: v0.11.0rc3
- vLLM main:
17c540a993

Signed-off-by: whx-sjtu <2952154980@qq.com>
2025-10-25 10:34:59 +08:00