Commit Graph

16 Commits

Author SHA1 Message Date
Chen Chen
6b290acfe1 remove redundant params in mla_preprocess kernel (#3530)
### What this PR does / why we need it?

This pull request removes the redundant parameters `gamma1` and `beta1`
(also named `gamma0`/`beta0` in some places) from the `mla_preprocess`
kernel and its calling hierarchy. The changes are consistent across C++
kernel code, bindings, and Python call sites. The parameters were unused
in the lower-level functions, so their removal is a good cleanup.

### Does this PR introduce _any_ user-facing change?

The python interface of the kernel is affected, and the params of
`gamma0` and `beta0` are not needed.

### How was this patch tested?

The unit-test of the kernel is adapted accordingly.


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: mojave2 <chenchen145@huawei.com>
2025-10-21 19:20:13 +08:00
anon189Ty
248ee7fa11 [Feat]Make full graph mode compalible with MTP (#3276)
### What this PR does / why we need it?
Make the Full Graph mode can run with MTP.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: anon189Ty <Stari_Falcon@outlook.com>
2025-10-17 20:19:56 +08:00
lilinsiman
1b424fb7f1 ACLgraph enable: Test cases revisions for all features (#3388)
### What this PR does / why we need it?
This PR revise the test cases of various features on the warehouse which
add the enablement of aclgraph to the test cases.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ut

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
2025-10-17 17:15:19 +08:00
zouyida2052
3642b64afc bugfix for mtp with multistream_moe (#3419)
### What this PR does / why we need it?
when infer deepseek mtp layer with multistream_moe, we should pass a
boolean to evaluate this feature and fix bugs when we are in mtp layer

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
2025-10-15 08:59:58 +08:00
wangxiyuan
c73dd8fecb [CI] Fix CI by addressing max_split_size_mb config (#3258)
### What this PR does / why we need it?
Fix CI by addressing max_split_size_mb config

### Does this PR introduce _any_ user-facing change?
No, test onyl

### How was this patch tested?
Full CI passed, espcially eagle one


- vLLM version: v0.10.2
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-09-29 14:05:12 +08:00
wangxiyuan
15b8aff582 [CI] Add max_split_size_mb for e2e test to avoid oom (#3252)
### What this PR does / why we need it?
we add a patch for model weight loader to avoid using vLLM weight loader
v2, since v2 will lead unknown issue for torchair. While this patch make
some unknown memory usage problem. To quick fix the problem, let's
expend the `max_split_size_mb` to a larger value to avoid weight load
oom issue.

Further solution is to remove the patch and address weight loader v2
from vLLM.

Closes: https://github.com/vllm-project/vllm-ascend/issues/3251

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

- vLLM version: v0.10.2
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-09-29 09:13:08 +08:00
Icey
2a9d02e080 [Bugfix] eagle and eagle3 spec decode failures and enable e2e test (#2979)
### What this PR does / why we need it?
- Fix the bug https://github.com/vllm-project/vllm-ascend/issues/2978
- Enable e2e test,
- Adapt to scenarios where Speculative tokens are greater than 2,
- Fix the bug that causes Eagle3 inference failures under high
concurrency and improve the acceptance rate of draft models, by
https://github.com/vllm-project/vllm-ascend/pull/2794

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
CI passed with new added/existing test.

Co-authored-by: hukongyi
[hukongyi@cmbchina.com](mailto:hukongyi@cmbchina.com)
Co-authored-by: guanyuzhu
[zhuguanyu@huawei.com](mailto:zhuguanyu@huawei.com)
Co-authored-by: liumail680
[liumail680@163.com](mailto:liumail680@163.com)


- vLLM version: v0.10.2
- vLLM main:
f225ea7dd9

---------

Signed-off-by: Icey <1790571317@qq.com>
2025-09-25 14:39:12 +08:00
xuyexiong
6681dde902 [Feat][Graph] Support MTP for ACL Graph (#2932)
### What this PR does / why we need it?
This PR depends on the merge of #2707 and has adapted the aclgraph
functionality to support MTP.

### How was this patch tested?


- vLLM version: v0.10.2
- vLLM main:
2b85697031

---------

Signed-off-by: xuyexiong <xuyexiong@huawei.com>
2025-09-18 14:05:33 +08:00
1092626063
5b3646ab21 [FEATURE][MTP] Support MTP > 1 (#2708)
### What this PR does / why we need it?
[RFC:Support MTP > 1 for
DeepSeek](https://github.com/vllm-project/vllm-ascend/issues/2745)

- [x] dp1 tp16
- [x] dp4 tp4
- [x] dp2 tp 8
- [x] torchair graph

- vLLM version: v0.10.1.1
- vLLM main:
c9f7081f9c

Signed-off-by: 1092626063 <1092626063@qq.com>
2025-09-05 09:11:22 +08:00
wangxiyuan
24d4dad7b2 [CI] Enable MTP torchair e2e test (#2705)
enable MTP torchair e2e test

- vLLM version: v0.10.1.1
- vLLM main:
ce30dca5c4

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-09-03 08:57:43 +08:00
xuyexiong
214b32a346 [V1][BUGFIX][0.10.1] FIX mtp on main branch (#2632)
### What this PR does / why we need it?
Fix MTP torchair bug caused by torchair refactor and moe refactor

Depends on PRs:
fused moe fix: https://github.com/vllm-project/vllm-ascend/pull/2627 
torchair multi DP fix:
https://github.com/vllm-project/vllm-ascend/pull/2626

### Does this PR introduce _any_ user-facing change?
when dp is enabled, to run mtp online server, need to disable server log
due to the current metrics does not support multi dp
`--disable-log-stats`
### How was this patch tested?


- vLLM version: v0.10.1.1
- vLLM main:
7c8271cd1e

Signed-off-by: xuyexiong <xuyexiong@huawei.com>
2025-09-02 11:12:41 +08:00
wangxiyuan
fef18b60bc Refactor e2e CI (#2276)
Refactor E2E CI to make it clear and faster
1. remove some uesless e2e test
2. remove some uesless function
3. Make sure all test runs with VLLMRunner to avoid oom error
4. Make sure all ops test end with torch.empty_cache to avoid oom error
5. run the test one by one to avoid resource limit error


- vLLM version: v0.10.1.1
- vLLM main:
a344a5aa0a

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-09-02 09:02:22 +08:00
ZhaoJiangJiang
3629bc4431 feat: add mtp ut and fix some bugs (#2453)
### What this PR does / why we need it?
Fix mtp mode ut

### Does this PR introduce _any_ user-facing change?
Nothing

### How was this patch tested?
This can be tested in the same way as a unit test.


- vLLM version: v0.10.0
- vLLM main:
53415653ff

Signed-off-by: 赵江江 <zhaojiangjiang1@h-partners.com>
Co-authored-by: 赵江江 <zhaojiangjiang1@h-partners.com>
2025-08-22 17:09:08 +08:00
wangxiyuan
9260910c8d [CI] Fix broken CI (#2302)
1. disable test_eagle_ccorrectness test, we'll reopen it once oom error
fixed.
2. drop transformers version limit for main, since vLLM rely on
>=4.55.0, see:
65552b476b
3. fix kv_connector_output bug, see:
796bae07c5

- vLLM version: v0.10.0
- vLLM main:
d1af8b7be9

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-08-11 11:22:32 +08:00
wangxiyuan
787010a637 [Test] Remove VLLM_USE_V1 in example and tests (#1733)
V1 is enabled by default, no need to set it by hand now. This PR remove
the useless setting in example and tests

- vLLM version: v0.9.2
- vLLM main:
9ad0a4588b

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-15 12:49:57 +08:00
Mengqing Cao
dd22ac38b2 [CI/UT][Refactor] move e2e spec decode and deepseek acc test to per pr (#1136)
### What this PR does / why we need it?
1. run deepseek acc ut per pr --- multicard CI time increased by 9 min
2. run spec decode e2e test on v1 per pr --- singlecard CI time
increased by 3 min (partly is disabled due to not work now)
~~3. align the output of whether dbo is enabled or not~~
    The generated results with and without dbo cannot be aligned.

https://github.com/vllm-project/vllm-ascend/actions/runs/15822900528/job/44600029405?pr=1136
4. skip V0 mtp test due to failure in
https://github.com/vllm-project/vllm-ascend/actions/runs/16012172833/job/45171988816
5. fix some version conflicts
### How was this patch tested?
CI passed with new added test.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-07-04 18:05:45 +08:00