Commit Graph

97 Commits

Yikun Jiang
96d6fa7c90 [Docker] Fix openEuler image suffix (#586)
### What this PR does / why we need it?
There was a bug when we released v0.8.4rc1 (the openEuler image tag was
wrongly set to 0.8.4rc1). According to the docker/metadata-action docs, the
suffix should be appended:
```
tags: |
  type=pep440,enable=true,priority=900,prefix=,suffix=,pattern=,value=
```

This patch fixes the openEuler image suffix so that the pep440 tag rule works.
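
Below is an illustrative check of the expected tag derivation under the suffix rule above; `image_tag` is a hypothetical helper, not part of the workflow:

```python
# Hypothetical sketch: a pep440 git tag like v0.8.4rc1 plus the
# "-openeuler" suffix should yield the image tag 0.8.4rc1-openeuler.
def image_tag(git_tag: str, suffix: str = "-openeuler") -> str:
    return git_tag.removeprefix("v") + suffix

assert image_tag("v0.8.4rc1") == "0.8.4rc1-openeuler"
```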

This patch also removes the cache step, because exporting the cache takes
more than 10 minutes while saving little time on the next trigger.

### Does this PR introduce _any_ user-facing change?
Yes, the Docker image tag is now set correctly.

### How was this patch tested?
I tested in my fork repo by setting the default branch:
- Released a tag: v0.7.88rc1 (a pep440 tag)
- The log shows `--label
org.opencontainers.image.version=v0.7.88rc1-openeuler`, which is the right rule:


https://github.com/Yikun/vllm-ascend/actions/runs/14560411481/job/40842950165#step:9:205

Related: https://github.com/vllm-project/vllm-ascend/pull/489

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-21 08:55:26 +08:00
wangxiyuan
42c7fbb10e [Misc] Fix import error and address nits to make CI happy (#563)
1. Add a `vllm_version_is` function to check the vllm version (a minimal
sketch follows this list).
2. `ensure_kv_transfer_initialized` and `get_kv_transfer_group` were moved
elsewhere on the vllm main branch in 3408e47159; this patch fixes the
resulting import error.
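
A minimal sketch of such a version gate, assuming vllm exposes `__version__` and an exact-match comparison; the real `vllm_version_is` in vllm-ascend may differ:

```python
import vllm
from packaging.version import Version

def vllm_version_is(target: str) -> bool:
    """Return True if the installed vllm version equals target."""
    return Version(vllm.__version__) == Version(target)

# Usage: guard imports whose location moved between vllm releases.
if vllm_version_is("0.8.4"):
    pass  # import from the old location
else:
    pass  # import from the new location
```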

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-18 12:23:32 +08:00
Mengqing Cao
6ee7f5cf71 [SpecDecode] Add spec decode support (#500)
### What this PR does / why we need it?
Backport: https://github.com/vllm-project/vllm-ascend/pull/252
This adds speculative decoding support on Ascend, including speculating with
a draft model, matching n-grams in the prompt, using MLP speculators, and
using EAGLE-based draft models.
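
As context for what the backport enables, invoking one of these modes from user code looks roughly like this sketch; the `speculative_config` fields and the model name are assumptions that vary across vLLM versions:

```python
from vllm import LLM, SamplingParams

# Hedged sketch: n-gram prompt-lookup speculation.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # illustrative model
    speculative_config={
        "method": "ngram",
        "num_speculative_tokens": 4,
        "prompt_lookup_max": 4,
    },
)
outputs = llm.generate(["The capital of France is"],
                       SamplingParams(temperature=0, max_tokens=32))
print(outputs[0].outputs[0].text)
```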

Backport: https://github.com/vllm-project/vllm-ascend/pull/423
The spec decode `MultiStepWorker` now fully supports `TP1DraftModelRunner`:
it can run the draft model runner with multi-step prepare directly on the
NPU, and the draft model runner can use MLA.

1. Before this PR, `MultiStepWorker` never stepped into the branch using
NPU prepare, only into the branch using CPU prepare (`line 52`
of `vllm_ascend/patch/patch_multi_step_worker.py`). Although this has
no effect on the correct operation of speculative decoding, and the
performance of the two branches is basically the same in the current
version, this PR enables entering the NPU branch. In general, there
are two main changes in `patch_multi_step_worker.py`: first, the
`is_cuda_like()` check is removed and the `TP1DraftModelRunner`
rewritten in vllm_ascend is used; second, the
`supports_gpu_multi_step()` function is made to return true on NPU
devices when the outer `MultiStepWorker` can work correctly.

2. Before this PR, `TP1DraftModelRunner` only supported Attention on NPU,
but not MLA. The relevant adaptation is in
`vllm_ascend/worker/draft_model_runner.py`. Although it is unclear why the
`input_positions` of `model_input.attn_metadata` in vllm-ascend
needs to be added in `execute_model`, `model_runner.py` does it, so the
corresponding changes were made here as well. Otherwise, when the attention
backend is MLA, it reports that `input_positions` cannot be found.

3. Two lines in `draft_model_runner.py` (around `line 118`) are commented
out to support the K>1 scenario:
  ```
  # lora_mapping=model_input.lora_mapping,
  # lora_requests=model_input.lora_requests,
  ```
Comments were added; in the future, when vllm-ascend supports the LoRA
feature, these changes can be restored.

TODO:
- [ ] revert the patch when the related issues are addressed in vllm

### How was this patch tested?
CI passed with newly added tests.
- e2e test for medusa proposer:
tests/singlecard/spec_decode/e2e/test_medusa_correctness.py
- e2e test for mlp proposer:
tests/singlecard/spec_decode/e2e/test_mlp_correctness.py
- e2e test for n-gram proposer:
tests/singlecard/spec_decode/e2e/test_ngram_correctness.py

Tests for patched files:
- tests/singlecard/spec_decode/test_dynamic_spec_decode.py
- tests/singlecard/spec_decode/test_multi_step_worker.py
- tests/singlecard/spec_decode/test_ngram_worker.py
- tests/singlecard/spec_decode/test_spec_decode_worker.py

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Co-authored-by: mengwei805 <mengwei25@huawei.com>
2025-04-17 20:16:32 +08:00
hfadzxy
9935d45728 [CI]Add model basic accuracy test(Qwen2.5-0.5B-Instruct) (#460)
### What this PR does / why we need it?
Add a basic model accuracy test (Qwen2.5-0.5B-Instruct).

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-04-17 14:59:56 +08:00
Li Wang
9859e7313f [CI]Add global env to runner (#537)
### What this PR does / why we need it?
- Add `HF_TOKEN` as a global env var on the runner
- Add `HF_ENDPOINT` as a global env var on the runner
- Change the concurrency group to rely on the current PR number

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-04-17 10:08:00 +08:00
wangxiyuan
434749d299 [CI] update 0.8.3 to 0.8.4 (#528)
Update 0.8.3 CI to 0.8.4

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-16 09:26:30 +08:00
Li Wang
13480d1238 [CI]Fix workflow (#532)
### What this PR does / why we need it?
Make the linux-npu-4 runner run jobs in parallel for now.


Signed-off-by: wangli <wangli858794774@gmail.com>
2025-04-15 19:55:41 +08:00
wangxiyuan
9c7428b3d5 [CI] enable custom ops build (#466)
### What this PR does / why we need it?
This PR enables the custom ops build by default.

### Does this PR introduce _any_ user-facing change?

Yes, installing vllm-ascend from source now triggers the custom ops build
step.

### How was this patch tested?
Via the image build and e2e CI.

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-12 10:24:53 +08:00
Icey
d05ea17427 Add openEuler based container image for vLLM Ascend (#489)
### What this PR does / why we need it?

Provide users with openEuler-based vLLM images and update the quick start
README accordingly.

### Does this PR introduce _any_ user-facing change?

None

### How was this patch tested?

No tests are needed.

---------

Signed-off-by: Icey <1790571317@qq.com>
2025-04-10 14:30:49 +08:00
Li Wang
afdbf77483 [CI] Add new runner and enable QwQ multinpu test (#417)
### What this PR does / why we need it?

- Add a new runner to the continuous integration system and keep the
original CI runner until the new runner runs stably
- Add distributed test cases

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-04-08 16:52:45 +08:00
Yikun Jiang
adabdeea7f Set numpy < 2.0.0 to resolve numpy VersionConflict (#476)
### What this PR does / why we need it?
vLLM bumped the numpy version to 2.x in
8427f70493
, which causes a
`pip._vendor.pkg_resources.ContextualVersionConflict: (numpy 2.2.4
(/usr/local/python3.10/lib/python3.10/site-packages),
Requirement.parse('numpy==1.26.4'), {'vllm-ascend'})` failure when
installing vllm-ascend. This PR resolves the issue by:
- Setting numpy < 2.0.0 to resolve the numpy VersionConflict (see the check
sketched below)
- Syncing requirements and the toml file
- Reordering dependencies
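
As an illustration of the pin (the real fix lives in requirements.txt and pyproject.toml; this runtime check is hypothetical):

```python
# Verify at runtime that the installed numpy satisfies the pin.
from importlib.metadata import version
from packaging.version import Version

numpy_version = Version(version("numpy"))
assert numpy_version < Version("2.0.0"), (
    f"vllm-ascend pins numpy < 2.0.0, found {numpy_version}"
)
```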


### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Closes: https://github.com/vllm-project/vllm-ascend/issues/473

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-07 16:07:21 +08:00
Pleaplusone
ce8259975e [core] Support custom ascendc kernels in vllm-ascend (#233)
This PR adds custom AscendC kernel support for rotary_embedding in
vllm-ascend; the related CMakeLists and setuptools changes are also included.

Related: https://github.com/vllm-project/vllm-ascend/issues/156

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
2025-04-03 14:52:34 +08:00
dependabot[bot]
78083d405e Bump actions/setup-python from 5.4.0 to 5.5.0 (#440)
Bumps [actions/setup-python](https://github.com/actions/setup-python)
from 5.4.0 to 5.5.0.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-01 14:34:33 +08:00
Mengqing Cao
2dbd763584 [CI] Fix mypy CI (#443)
### What this PR does / why we need it?
Fix CI by updating mypy and pinning the numpy version.

_the modification of model_runner_v1 is just to make CI happy_

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-04-01 09:25:33 +08:00
wangxiyuan
b6499ed97d [CI] Use CI pool (#428)
Use the CI pool instead of self-hosted runners for e2e tests to speed up CI.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-03-29 12:42:59 +08:00
wangxiyuan
31f29b9f30 [Core] Make V1 work and enable V1 engine test (#389)
1. Ensure the version is a string before parsing it in collect_env (a
sketch follows this list).
2. Add a basic V1 engine test.
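
A minimal sketch of the coercion in item 1, assuming version attributes may arrive as non-`str` objects; `safe_parse` is a hypothetical name, not the exact collect_env code:

```python
from packaging.version import Version

def safe_parse(raw) -> Version:
    # Version attributes reported by modules are not always plain str,
    # so coerce before parsing to avoid a TypeError.
    return Version(str(raw))
```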

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-03-28 19:34:23 +08:00
Zhenyu Zheng
4804b74e95 Update 110-user-story.yml (#402)
Fix a few typos in issue template

Signed-off-by: Zhenyu Zheng <zheng.zhenyu@outlook.com>
2025-03-27 08:58:57 +08:00
Zhenyu Zheng
0b5a9643fd Add an example for user stories (#399)
Add an example for user stories and fix some typos.

Add a new user story section to the docs to collect user stories of
vllm-ascend, along with an example and an issue template for collecting
user stories.

Signed-off-by: Zhenyu Zheng <zheng.zhenyu@outlook.com>
2025-03-26 16:25:57 +08:00
Mengqing Cao
6295d2e9bc [CI/Build][Doc] upgrade torch-npu to 0320 (#392)
### What this PR does / why we need it?
This PR upgrades torch-npu to 0320, so that #321 and
https://github.com/vllm-project/vllm-ascend/issues/267#issuecomment-2745045743
can be fixed; #372 should be reverted after this PR.

### Does this PR introduce _any_ user-facing change?
Upgrades torch-npu to 0320.

### How was this patch tested?
Tested locally with long-sequence inference.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-03-26 09:04:12 +08:00
Mengqing Cao
8996733307 [CI] fix vllm test (#365)
Fix the vllm test.

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-03-24 16:09:06 +08:00
wangxiyuan
663dca7578 [CI] fix race condition problem (#353)
Fix a race condition.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-03-19 17:04:36 +08:00
Yikun Jiang
18bb8d1f52 Adapt vLLM requirements changes to fix main CI (#279)
### What this PR does / why we need it?
Adapt vLLM requirements changes:
206e2577fa (diff-01ec17406c969585ed075609a2bbf2f2f4fe3e3def36946694abe6d4eb60a6f2)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-09 16:07:45 +08:00
Yikun Jiang
be58d5f3d8 Bump torch_npu version to dev20250308.3 (#276)
### What this PR does / why we need it?
Bump the torch_npu version to dev20250308.3 to fix a performance regression
in the multi-stream case:
e04c580d07.


### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-09 15:59:15 +08:00
Mengqing Cao
91f7d8115d [CI/Build] Bump torch_npu to dev20250307.3 (#265)
Update the torch-npu version to fix torch_npu `exponential_` accuracy.
With this update, the precision issue when setting `temperature > 0` is
fixed.
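
For reference, the affected path is ordinary non-greedy sampling; a minimal sketch, with an illustrative model name:

```python
from vllm import LLM, SamplingParams

# temperature > 0 exercises the sampling path whose precision the
# torch-npu bump fixes.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
outputs = llm.generate(
    ["Write a haiku about the sea."],
    SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```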

---------

Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-03-07 20:34:07 +08:00
Yikun Jiang
35cb7b5234 [CI] Add dispatch job to leverage dynamic devices (#251)
### What this PR does / why we need it?
Add dispatch job to leverage jobs to dynamic devices include 2 stage as
below:

The dispatch job will spend extra about `10s * parallel number + 30s`
time to wait other job launch container and release lock.

- **Stage 1: Acquire lock.**
The dispatch job uses lockfile to acquire a lock and then obtains a device
number dynamically.
- **Stage 2.1: Launch container with dynamic device.**
The device number is passed via a job output, and the container job is
started with that device.
- **Stage 2.2: Release lock.**
Once the job has started, the lock is released.

On the backend, multiple paths are used to set up multiple self-hosted
runners as a load balancer:
```
$ pwd
/home/action
$ ll | grep actions
drwx------   6 action action 4096 Mar  7 08:55 actions-runner-01
drwx------   6 action action 4096 Mar  7 08:55 actions-runner-02
drwx------   6 action action 4096 Mar  7 08:55 actions-runner-03
drwx------   6 action action 4096 Mar  7 08:56 actions-runner-04
drwx------   4 action action 4096 Jan 24 22:08 actions-runner-05
drwx------   4 action action 4096 Jan 24 22:08 actions-runner-06
```

```
adduser -G docker action
su action
pip3 install docker prettytable
sudo yum install procmail
```
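
A minimal Python sketch of the lock-then-dispatch idea, using `fcntl.flock` rather than procmail's `lockfile`; the device list and lock paths are illustrative:

```python
import fcntl
import os

DEVICES = [f"/dev/davinci{i}" for i in range(2, 5)]  # illustrative

def acquire_device():
    """Try each device's lock file; return (device, fd) on success."""
    for dev in DEVICES:
        fd = os.open(f"/tmp/{os.path.basename(dev)}.lock",
                     os.O_CREAT | os.O_RDWR)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return dev, fd  # keep fd open; closing it releases the lock
        except BlockingIOError:
            os.close(fd)
    return None, None  # all devices busy; caller should retry
```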

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
- CI passed
- E2E tested manually, triggering 3 jobs in parallel:
- [1st
job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711345757/job/38348309297)
dispatched to /dev/davinci2
- [2nd
job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711348739/job/38348316250)
dispatched to /dev/davinci3
- [3rd
job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711351493/job/38348324551)
dispatched to /dev/davinci4

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-07 09:47:13 +08:00
Shanshan Shen
b9f0e25c16 [Misc] Add collect_env.py scripts for bug reporting (#175)
### What this PR does / why we need it?
Add the `collect_env.py` script from vLLM and remove the `nvidia`, `gpu`,
and `cuda` related code, so that vllm-ascend users can collect their
environment info when reporting bugs.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Running `python collect_env.py` works.


Signed-off-by: Shanshan Shen <467638484@qq.com>
2025-03-04 14:14:37 +08:00
Yikun Jiang
ebe14f20cf Recover vllm-ascend dev image (#209)
### What this PR does / why we need it?
Recover vllm-ascend dev image

### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-03 09:08:41 +08:00
dependabot[bot]
81dfaae88b Bump docker/setup-buildx-action from 2 to 3 (#191)
Bumps
[docker/setup-buildx-action](https://github.com/docker/setup-buildx-action)
from 2 to 3.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-28 09:06:46 +08:00
dependabot[bot]
a710a7563a Bump docker/setup-qemu-action from 2 to 3 (#192)
Bumps
[docker/setup-qemu-action](https://github.com/docker/setup-qemu-action)
from 2 to 3.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-28 09:06:13 +08:00
dependabot[bot]
a5564ed5d8 Bump actions/setup-python from 5.3.0 to 5.4.0 (#193)
Bumps [actions/setup-python](https://github.com/actions/setup-python)
from 5.3.0 to 5.4.0.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-27 20:05:15 +08:00
Yuanhao Ji
6aed83335c [CI] Add dependabot support and labeler workflow (#162)
Add dependabot support and labeler workflow

---------

Signed-off-by: Yuanhao Ji <jiyuanhao@apache.org>
2025-02-27 19:46:31 +08:00
wangxiyuan
6042c210bc [CI] upgrade to newest pta (#187)
Upgrade to the newest torch-npu.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: angazenn <zengyanjia@huawei.com>
2025-02-27 16:40:23 +08:00
Mengqing Cao
94483775e1 [CI] fix hf_token (#180)
Fix the bug introduced by #173

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-26 17:29:31 +08:00
Mengqing Cao
78530c0667 [CI/Build] add HF_TOKEN for model downloading (#173)
### What this PR does / why we need it?
Add `HF_TOKEN` for downloading models that require access rights on the
Hugging Face Hub. This fixes the CI errors in #123 and #76.
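
For illustration, a gated download consuming the injected secret; the repo id is an assumption, and `snapshot_download` comes from `huggingface_hub`:

```python
import os

from huggingface_hub import snapshot_download

# The runner injects HF_TOKEN; pass it explicitly for gated repos.
snapshot_download(
    repo_id="Qwen/Qwen2.5-0.5B-Instruct",  # illustrative model
    token=os.environ.get("HF_TOKEN"),
)
```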

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-26 15:35:03 +08:00
Mengqing Cao
3a7882208f [CI] enable test if pytest.ini changes (#151)
Enable tests when pytest.ini changes.

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-24 16:47:05 +08:00
Yikun Jiang
72a43a61d8 [Docs] Add issue template (#113)
### What this PR does / why we need it?
Add issue templates.

Most of the templates in this PR are from vllm-project/vllm:
https://github.com/vllm-project/vllm/tree/main/.github/ISSUE_TEMPLATE

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Tested on my local repo by setting the default branch to ISSUE_TEMPLATE:
https://github.com/Yikun/vllm-ascend/issues

https://github.com/Yikun/vllm-ascend/issues/new/choose

Closes: https://github.com/vllm-project/vllm-ascend/issues/48

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-21 17:20:21 +08:00
wangxiyuan
5f465010de [Core] Cherry pick from 0.7.1 to keep the main code newest (#127)
Cherry-pick from 0.7.1 to keep the main code up to date.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-21 17:07:37 +08:00
Mengqing Cao
36991b2052 [CI] enable CI on all branch (#124)
Enable CI on all branches.
Install torch-npu 2.5.1.dev20250218 so that CI can be enabled on all
branches, in preparation for merging 0.7.1-dev into main.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-21 16:16:48 +08:00
wangxiyuan
cff03a4913 [CI] change to quay.io (#102)
Change the Docker registry to quay.io.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-19 17:04:46 +08:00
wangxiyuan
fafd70e91c [Doc] Update doc to work with release (#85)
1. Update the CANN image name.
2. Add a pta install step.
3. Update the vllm-ascend Docker image name to ghcr.
4. Update quick_start to use the vllm-ascend image directly.
5. Fix `note` style.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-19 09:51:43 +08:00
Yikun Jiang
bfbfbce184 [CI] Add container image build ci (#64)
### What this PR does / why we need it?
Add container image build CI:
- Enable branch and tag Docker image publishing
    - branch images: `vllm-ascend:main`, `vllm-ascend:v0.7.1-dev`
    - tag image: `vllm-ascend:v0.7.1rc1`
- Enable the PR Docker image build check
- Other changes:
    - Prepare `REPO_OWNER`, because ghcr requires lowercase
- Add a `Free up disk space` step to avoid `No space left on device`, as in
https://github.com/vllm-project/vllm-ascend/issues/27
- Set up qemu with an image to resolve
https://github.com/docker/setup-qemu-action/issues/198

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
Build: CI passed with [push
false](https://github.com/vllm-project/vllm-ascend/actions/runs/13347017608/job/37278724158?pr=64).
Notes for the test cases:
1. Merge commits to the `main` and `v0.7.1-dev` branches:
 main: https://github.com/Yikun/vllm-ascend/actions/runs/13347238961
--> ghcr.io/yikun/vllm-ascend:main OK
 v0.7.1-dev:
https://github.com/Yikun/vllm-ascend/actions/runs/13347229912 -->
ghcr.io/yikun/vllm-ascend:v0.7.1-dev OK

2. Create pep440 tags from GitHub releases (v0.7.1rc1, v0.7.1,
v0.7.1rc1.dev1); every full release also gets `latest`:
 v0.7.5 --> v0.7.5, latest
 v0.7.5rc1 --> v0.7.5rc1
 v0.7.5rc1.dev1 --> v0.7.5rc1.dev1
 v0.7.5rc1.post1 --> v0.7.5rc1.post1 (no latest; a TODO was added)

3. Create an unknown tag from a GitHub release:
 Creating 0.7.1 on v0.7.1-dev does not trigger (only the `v` prefix
triggers).

4. Create a tag from git:
 Also works, e.g. `git tag v0.7.99; git push origin v0.7.99` from
publish-image.

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-17 09:07:35 +08:00
Yikun Jiang
c1ac822642 [CI] Switch to cann latest version (#63)
### What this PR does / why we need it?
Switch to the latest CANN version.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-16 13:38:01 +08:00
Yikun Jiang
46977f9f06 [Doc] Add sphinx build for vllm-ascend (#55)
### What this PR does / why we need it?

This patch enables the doc build for vllm-ascend

- Add sphinx build for vllm-ascend
- Enable readthedocs for vllm-ascend
- Fix CI:
- exclude vllm-empty/tests/mistral_tool_use to skip the `You need to agree
to share your contact information to access this model` error introduced
in
314cfade02
- Install test requirements to fix
https://github.com/vllm-project/vllm-ascend/actions/runs/13304112758/job/37151690770:
      ```
      vllm-empty/tests/mistral_tool_use/conftest.py:4: in <module>
          import pytest_asyncio
      E   ModuleNotFoundError: No module named 'pytest_asyncio'
      ```
  - exclude docs PRs

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
1. Tested locally:
    ```bash
    # Install dependencies.
    pip install -r requirements-docs.txt
    
    # Build the docs and preview
    make clean; make html; python -m http.server -d build/html/
    ```
    
    Launch browser and open http://localhost:8000/.

2. CI passed with preview:
    https://vllm-ascend--55.org.readthedocs.build/en/55/

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-13 18:44:17 +08:00
Yikun Jiang
28d7691361 [FOLLOWUP][Misc] Remove unused mypy config for base_communicator (#45)
### What this PR does / why we need it?
- Remove the unused communicator mypy config to address
https://github.com/vllm-project/vllm-ascend/pull/24#issuecomment-2647696781
- Add mypy.ini to the trigger list

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-02-12 09:17:05 +08:00
wangxiyuan
c59375caff [Misc] version control by setuptools_scm (#21)
Manage the package version with setuptools_scm, to stay consistent with
vllm.
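
A minimal sketch of consuming the scm-derived version at runtime; the fallback string is an assumption:

```python
# setuptools_scm derives the version from git tags, matching vllm's
# approach; fall back when running outside a git checkout.
try:
    from setuptools_scm import get_version
    __version__ = get_version()
except Exception:
    __version__ = "0.0.0.dev0"  # illustrative fallback
```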

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-02-10 09:36:09 +08:00
Mengqing Cao
7d9ae22ecb [CI] use pytest.ini to manage vllm native tests (#5)
### What this PR does / why we need it?
Use `pytest.ini` to manage vllm native tests.
This converts the original test-script whitelist into a blacklist, to avoid
missing newly added test scripts from upstream vLLM.

**note**: _we do **not** manage the test scripts of vLLM-Ascend in
`pytest.ini`, because doing so would create conflicts between vLLM's and
vLLM-Ascend's `conftest.py`._
### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with existing tests.

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-02-06 23:57:51 +08:00
Yikun Jiang
d5e7756028 [Core] Init vllm-ascend (#3)
### What this PR does / why we need it?
vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on
the Ascend NPU.

This plugin is the recommended approach for supporting the Ascend
backend within the vLLM community. It adheres to the principles outlined
in the [RFC]: Hardware pluggable, providing a hardware-pluggable
interface that decouples the integration of the Ascend NPU from vLLM.
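
A sketch of how such a hardware plugin registers itself, assuming vLLM's `vllm.platform_plugins` entry point group; the entry point and class names are illustrative:

```python
# setup.py (sketch): expose a hook that vLLM discovers via entry points.
from setuptools import setup

setup(
    name="vllm_ascend",
    entry_points={
        "vllm.platform_plugins": [
            # vLLM calls this function and imports the returned
            # platform class path.
            "ascend = vllm_ascend:register",
        ],
    },
)
```

with a corresponding hook such as:

```python
def register() -> str:
    """Return the fully qualified platform class (illustrative path)."""
    return "vllm_ascend.platform.NPUPlatform"
```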

This patch also includes changes to make CI work and to use caching to speed
up the e2e test, including:
1. Change the push (post-merge CI) and pull_request (PR CI) trigger branch
to main
2. Make mypy work by ignoring base_communicator and clearing unused deps
3. Several improvements for vllm_ascend_test:
     - use caches (pip, ms, hf) to speed up the e2e test (25 mins --> 5 mins)
- switch the `git clone` command to `action/checkout` to speed up checkout
     - enable `-sv` for pytest for a better info dump
- remove network host mode to resolve `docker: conflicting options: cannot
attach both user-defined and non-user-defined network-modes`, which is a
problem on docker 1.45 but not on 1.39
4. Adapt MLA decode optimizations:
cabaf4eff3

### Does this PR introduce _any_ user-facing change?
Yes, this PR initializes the plugin.

### How was this patch tested?
- This is the first PR to make the Ascend NPU work with vLLM. All code is
tested on Ascend with the vLLM V0 engine.
- CI passed

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
2025-02-05 10:53:12 +08:00