### What this PR does / why we need it?
[Feat] Supports Aclgraph for bge-m3
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
```
pytest -s tests/e2e/singlecard/test_embedding.py
pytest -s tests/e2e/singlecard/test_embedding_aclgraph.py
```
To start an online server with batch size 10 and a sequence length of 8192 per sequence, we
set `--max-num-batched-tokens` to 8192*10 = 81920 so that the encoder input is not chunked:
```
vllm serve /home/data/bge-m3 --max_model_len 1024 --served-model-name "bge-m3" --task embed --host 0.0.0.0 --port 9095 --max-num-batched-tokens 81920 --compilation-config '{"cudagraph_capture_sizes":[8192, 10240, 20480, 40960, 81920]}'
```
With batch size 10 and a sequence length of 8192 per sequence, QPS improves from 85 to 104,
about a 22% gain, because much of the host-bound overhead is removed.
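
The token budget and the reported gain follow from simple arithmetic. A minimal sketch of that calculation is below; the helper names are illustrative only and are not part of vLLM or vllm-ascend.

```python
def max_batched_tokens(batch_size: int, seq_len: int) -> int:
    """Token budget needed so a full batch fits in one scheduling step."""
    return batch_size * seq_len


def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, as a fraction."""
    return (after - before) / before


# Numbers taken from the description above.
budget = max_batched_tokens(batch_size=10, seq_len=8192)
gain = relative_gain(before=85, after=104)

print(budget)         # 81920
print(f"{gain:.1%}")  # 22.4%
```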
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
---------
Signed-off-by: xuyexiong <xuyexiong@huawei.com>
Co-authored-by: wangyongjun <1104133197@qq.com>
### What this PR does / why we need it?
Make test_pipeline_parallel take effect in the full CI test.
### Does this PR introduce _any_ user-facing change?
NA
### How was this patch tested?
NA
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
---------
Signed-off-by: leo-pony <nengjunma@outlook.com>
### What this PR does / why we need it?
1. Enable tests/e2e/multicard/test_external_launcher.py
2. Add e2e test for sleep mode in level2
### Does this PR introduce _any_ user-facing change?
not involved
### How was this patch tested?
CI passed with existing test.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
Signed-off-by: huangxialu <huangxialu1@huawei.com>
Co-authored-by: Shangwei-Li <lishangwei2@huawei.com>
### What this PR does / why we need it?
This PR adds multi-node tests. As a first step, it adds a
`deepseek-v3` dp+tp+ep test.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
We noticed that torch_npu 0919 doesn't work. This PR reverts the related
changes that rely on the 0919 version.
Reverted PRs: #3295, #3205, #3102
Related: #3353
- vLLM version: v0.11.0
### What this PR does / why we need it?
Calculate in advance the workspace memory size needed for the
PagedAttention operator to avoid deadlocks during resource cleanup. This
PR requires torch_npu version 0920 or newer.
### How was this patch tested?
- vLLM version: v0.11.0
---------
Signed-off-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
Co-authored-by: wangxiaoxin-sherie <wangxiaoxin7@huawei.com>
### What this PR does / why we need it?
1. Clean up v0.10.2 support in the UT and e2e tests.
2. Remove the v0.11.0 periodic job, since we're at v0.11.0 now.
3. Remove the useless patches for deepseek v3.2. They have already been
done in vLLM.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
There are 3 steps to upgrade vllm-ascend to the newest vLLM. We'll create 3
PRs:
- [x] Upgrade vLLM to v0.11.0 to make CI happy first.
- [ ] Move deepseek v3.2 to the vLLM way.
- [ ] Then we'll add a new PR to add vLLM main support.
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.11.0
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Bump version to v0.11.0rc2 and prepare vLLM Ascend v0.11.0rc0
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
- vLLM version: v0.10.2
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0
---------
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Fix CI by addressing max_split_size_mb config
### Does this PR introduce _any_ user-facing change?
No, test only
### How was this patch tested?
Full CI passed, especially the eagle one
- vLLM version: v0.10.2
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Add vLLM 0.11.0 release hourly job to monitor release branch changes
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
- vLLM version: v0.10.2
- vLLM main:
https://github.com/vllm-project/vllm/commit/releases/v0.11.0
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
- Pin vLLM commit to releases/v0.11.0 branch.
- Fix the breaking change introduced by vLLM commit
d4d9899860
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main:
17b4c6685c
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
The LoRA e2e test uses the ilama-3.2-1B model, which uses the transformers.py
model file. Its self-attention layer names end with "\*.attn", not
"\*.self_attn".
Some other models also have attention layer names ending with "\*.attn", such
as baichuan.py and bert.py.
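
The naming difference can be handled by matching on more than one suffix. The following is only a sketch of that idea: the suffix tuple and the function name are assumptions for illustration, not the actual vllm-ascend code.

```python
# Hypothetical suffix list; the real set used by vllm-ascend may differ.
ATTN_LAYER_SUFFIXES = (".self_attn", ".attn")


def is_attention_layer(layer_name: str) -> bool:
    """Return True if the layer name ends with a known attention suffix."""
    # str.endswith accepts a tuple and checks each suffix in turn.
    return layer_name.endswith(ATTN_LAYER_SUFFIXES)


# Illustrative layer names, not taken from a real checkpoint.
names = [
    "model.layers.0.self_attn",
    "model.layers.0.attn",
    "model.layers.0.mlp",
]
matched = [name for name in names if is_attention_layer(name)]
print(matched)
```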
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
pytest -sv tests/e2e/singlecard/test_ilama_lora.py
pytest -sv tests/e2e/multicard/test_ilama_lora_tp2.py
- vLLM version: v0.10.2
- vLLM main:
17b4c6685c
---------
Signed-off-by: paulyu12 <507435917@qq.com>
### What this PR does / why we need it?
Upgrade vLLM to newest commit
- Fix aclgraph not working, caused by
24fab45d96
- Fix the PoolerOutput import error, caused by
755ed7b05b
- Fix the aclgraph weight load error, keeping it consistent with the torchair fix.
4492e3a554
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
All tests should pass
- vLLM version: v0.10.2
- vLLM main:
52d0cb8458
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
The `ready` label is now used to trigger the full e2e test. If a PR is
ready and then hits a merge conflict, there is no need to drop the `ready` label.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
This is just a GitHub Actions change. No functional test is needed.
- vLLM version: v0.10.2
- vLLM main:
52d0cb8458
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Upgrade vLLM to newest commit.
1. Remove the useless function get_state_cls; it has already been removed from
vLLM.
e6750d0b18
2. Fix the UT broken by
6160ba4151
- vLLM version: v0.10.2
- vLLM main:
b1068903fd
---------
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Bump vLLM commit hash to
f225ea7dd9
### How was this patch tested?
- vLLM version: v0.10.2
- vLLM main:
5aeb925452
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it?
Follow up on the `UniformTypeKVCacheSpecs` changes introduced by
https://github.com/vllm-project/vllm/pull/25101, which supports different
hidden sizes in uniform type kvcache specs.
This also fixes the CI issue `TypeError: AttentionGroup.__init__()
missing 1 required positional argument: 'kv_cache_spec'`.
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
Tests passed with existing e2e tests.
- vLLM version: v0.10.2
- vLLM main:
c60e6137f0
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
### What this PR does / why we need it?
Followup on https://github.com/vllm-project/vllm-ascend/pull/3064
1. Limit the vLLM version to the same hash as the one used for mypy.
2. Fix the vLLM version bug in the e2e light test.
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
CI passed
- vLLM version: v0.10.2
- vLLM main:
c60e6137f0
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
1. Refactor CI to reuse a base workflow and enable the 2-hourly trigger job
for main:
- Extract the e2e test into _e2e_test.yaml
- Reuse _e2e_test in the light / full jobs
- Enable the 2-hourly trigger job for main
2. Rename the e2e test to ascend test to make sure the action displays the label
3. Re-enable UT coverage, which has been failing since
5bcb4c1528
and was disabled in
6d8bc38c7b
### Does this PR introduce _any_ user-facing change?
Only developer behavior changes:
- Every job triggers the full test with the vLLM release and hash
- The full job runs every 2 hours with vLLM main
- e2e light test (30 mins): `lint` (6 mins) ---> ut (10 mins) --->
`v0.10.2 + main / 4 jobs` (15 mins)
- e2e full test (1.5h): `ready label` ---> `v0.10.2 + main / 4 jobs`,
about 1.5h
- scheduled test: every 2 hours ---> `v0.10.2 + main / 4 jobs`, about 1.5h
### How was this patch tested?
CI passed
- vLLM version: v0.10.2
- vLLM main:
c60e6137f0
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Bump main to
c60e6137f0
- Updated imports from `vllm.config` to
`vllm.config.model` (aed16879a9)
https://github.com/vllm-project/vllm/pull/25252
- Refactored `vllm_ascend/sample/sampler.py` to use string values for
`logprobs_mode` instead of the `LogprobsMode` enum, simplifying logprobs
mode handling and improving compatibility with recent vLLM changes
(aed16879a9)
https://github.com/vllm-project/vllm/pull/25252
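
To illustrate what handling `logprobs_mode` as a plain string can look like, here is a small standalone sketch. It is not the code in `vllm_ascend/sample/sampler.py`: the four mode names are assumed from upstream vLLM, the function names are hypothetical, and the raw/processed distinction is deliberately ignored to keep the example short.

```python
import math
from typing import List

# Assumed mode names; check the vLLM version in use for the exact set.
VALID_LOGPROBS_MODES = frozenset({
    "raw_logprobs", "raw_logits", "processed_logprobs", "processed_logits",
})


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum for x in logits]


def values_to_report(logits: List[float], logprobs_mode: str) -> List[float]:
    """Return logits or log-probabilities, chosen by a plain string mode."""
    if logprobs_mode not in VALID_LOGPROBS_MODES:
        raise ValueError(f"unknown logprobs_mode: {logprobs_mode!r}")
    if logprobs_mode.endswith("_logits"):
        return list(logits)
    return log_softmax(logits)


logits = [1.0, 2.0, 3.0]
as_logits = values_to_report(logits, "raw_logits")
as_logprobs = values_to_report(logits, "processed_logprobs")
```

Comparing strings avoids importing an enum whose location or existence changes between vLLM versions, at the cost of losing static checking of the mode name, which is why the sketch validates the value explicitly.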
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
- vLLM version: v0.10.2
- vLLM main:
6d8246aaff
---------
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Fix VocabParallelEmbedding UT
### How was this patch tested?
CI passed with newly added and existing tests.
- vLLM version: main
- vLLM main:
f592b3174b
---------
Signed-off-by: Icey <1790571317@qq.com>
### What this PR does / why we need it?
Bump vLLM version to v0.10.2
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
- vLLM version: v0.10.2rc3
- vLLM main:
15b8fef453
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
This reverts commit 339fceb89c.
### Does this PR introduce _any_ user-facing change?
Yes, use 8.2rc1 image by default
### How was this patch tested?
CI passed
- vLLM version: v0.10.2rc2
- vLLM main:
cfa3234a5b
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Enable push trigger for image job
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
Followup on https://github.com/vllm-project/vllm-ascend/pull/2864
- vLLM version: v0.10.2rc2
- vLLM main:
89e08d6d18
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Upgrade CANN version to 8.3.rc1.alpha001
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.10.2rc2
- vLLM main:
89e08d6d18
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Upgrade vLLM version to 0.10.2rc2
### Does this PR introduce _any_ user-facing change?
Yes, image will use 0.10.2rc2 vLLM
### How was this patch tested?
- vLLM version: main
- vLLM main:
f17c075884
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
- Enable label-based image test and use free runner to run lint
- soft revert
26f388ba08
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: main
- vLLM main:
404c85ca72
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
This PR prefetches the weights of the MLP layers in Qwen dense models,
mainly to optimize performance in the decode phase.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI passed with newly added and existing tests.
- vLLM version: main
- vLLM main:
a1213fae5f
Signed-off-by: rjg-lyh <1318825571@qq.com>
Co-authored-by: Shuming19 <313093131@qq.com>
### What this PR does / why we need it?
Adapt the LLMdatadist connector to distributed KV aggregation on the main
branch. Previously, the P node returned "finish sending" only when TP0
responded; now it returns "finish sending" as soon as each NPU receives it.
The D node sends a finish-receive signal to the corresponding TP
rank of the P node.
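
The per-rank behaviour described above can be pictured with a small bookkeeping sketch. This is an illustration only, not the connector's implementation: the class, method names, and request IDs are hypothetical, and the mapping between D-node and P-node ranks when their TP sizes differ is not modelled.

```python
from typing import Dict, Set


class PerRankFinishTracker:
    """Records, per P-node TP rank, which requests the D node has confirmed."""

    def __init__(self, tp_size: int) -> None:
        self.tp_size = tp_size
        self._finished: Dict[int, Set[str]] = {r: set() for r in range(tp_size)}

    def on_finish_receive(self, tp_rank: int, request_id: str) -> None:
        """Handle a finish-receive signal addressed to one TP rank."""
        if not 0 <= tp_rank < self.tp_size:
            raise ValueError(f"tp_rank {tp_rank} out of range")
        self._finished[tp_rank].add(request_id)

    def finished_sending(self, tp_rank: int) -> Set[str]:
        """Requests this rank can report as finished, without waiting on TP0."""
        return set(self._finished[tp_rank])


tracker = PerRankFinishTracker(tp_size=8)
tracker.on_finish_receive(tp_rank=3, request_id="req-1")
rank3_done = tracker.finished_sending(3)
rank0_done = tracker.finished_sending(0)
```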
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
gsm8k test
2*A3, 1P 1D
P: dp2 tp8, D: dp4 tp4
P: dp2 tp8, D: dp2 tp8
- vLLM version: main
- vLLM main:
cc99baf14d
Signed-off-by: liziyu <liziyu16@huawei.com>
### What this PR does / why we need it?
Update pre_commit runner
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: main
- vLLM main:
0ae43dbf8c
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
### What this PR does / why we need it?
Remove compatibility maintenance for vllm v0.10.1 and v0.10.1.1
### Does this PR introduce _any_ user-facing change?
The main branch of vllm-ascend will no longer be compatible with vLLM v0.10.1 and
v0.10.1.1.
### How was this patch tested?
CI passed with existing test.
- vLLM version: v0.10.1.1
- vLLM main:
6fb2788163
---------
Signed-off-by: MengqingCao <cmq0113@163.com>
1. Only run the light e2e test before the PR is `ready`, to reduce CI time.
2. Run the full test once the PR is labeled `ready` and `ready for test`.
3. Run the lint job on a self-hosted CPU container to avoid long waits.
- vLLM version: v0.10.1.1
- vLLM main:
6910b56da2
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
### What this PR does / why we need it?
Remove git .extraheader and fetch all commits in
/vllm-workspace/vllm-ascend
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
Closes: https://github.com/vllm-project/vllm-ascend/issues/2735
- vLLM version: v0.10.1.1
- vLLM main:
51d5e9be7d
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
### What this PR does / why we need it?
Fix accuracy issue on prefix caching with AscendScheduler
### How was this patch tested?
CI passed with `test_prefix_cache_with_ascend_scheduler`
- vLLM version: v0.10.1.1
- vLLM main:
6997a25ac6
---------
Signed-off-by: MengqingCao <cmq0113@163.com>