xc-llm-ascend

Author	SHA1	Message	Date
Yikun Jiang	96d6fa7c90	[Docker] Fix openEuler image suffix (#586 ) ### What this PR does / why we need it? There was a bug when we released v0.8.4rc1: the openEuler image tag was wrongly set to 0.8.4rc1. According to the docker-meta-action docs, the suffix should be appended: ``` tags: | type=pep440,enable=true,priority=900,prefix=,suffix=,pattern=,value= ``` This patch fixes the openEuler image suffix so that the pep440 tag rule works. It also removes the cache step, because exporting the cache takes more than 10 minutes but saves less time than that on the next trigger. ### Does this PR introduce _any_ user-facing change? Yes, the docker image tag is now set correctly. ### How was this patch tested? I tested in my fork repo by setting the default branch: - release a tag: v0.7.88rc1 (pep440 tag) - The log shows `--label org.opencontainers.image.version=v0.7.88rc1-openeuler`, which follows the right rule: https://github.com/Yikun/vllm-ascend/actions/runs/14560411481/job/40842950165#step:9:205 Related: https://github.com/vllm-project/vllm-ascend/pull/489 Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-04-21 08:55:26 +08:00
wangxiyuan	42c7fbb10e	[Misc] Fix import error and address nits to make CI happy (#563 ) 1. Add `vllm_version_is` function to check the vllm version. 2. `ensure_kv_transfer_initialized` and `get_kv_transfer_group` have been moved elsewhere in the vllm main branch via `3408e47159`; this patch fixes the import error. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-18 12:23:32 +08:00
Mengqing Cao	6ee7f5cf71	[SpecDecode] Add spec decode support (#500 ) ### What this PR does / why we need it? Backport: https://github.com/vllm-project/vllm-ascend/pull/252 This supports speculative decoding on Ascend, including speculating with a draft model, by matching n-grams in the prompt, using MLP speculators, and using EAGLE based draft models. Backport: https://github.com/vllm-project/vllm-ascend/pull/423 The spec decode MultiStepWorker now fully supports TP1DraftModelRunner, supports running the draft_model_runner with multi-step prepare directly on the NPU, and supports a draft_model_runner that uses MLA. 1. Before this PR, `MultiStepWorker` would not step into the branch using NPU prepare, but only into the branch using CPU prepare (`line 52` of `vllm_ascend/patch/patch_multi_step_worker.py`). Although this has `no effect` on the `correct operation` of speculative decoding and the performance of the two branches is basically the same as of the current version, I support entering this branch in this PR. In general, there are two main changes in `patch_multi_step_worker.py`: first, the `is_cuda_like()` check is removed and the `TP1DraftModelRunner` rewritten in vllm_ascend is used; second, the `supports_gpu_multi_step()` function is made to return true on NPU devices when the outer Multi_step_worker can work correctly. 3. Before this PR, `TP1DraftModelRunner` only supported Attention on NPU, but not MLA. The relevant adaptation is in `vllm_ascend/worker/draft_model_runner.py`. Although I don’t know why the `input_positions` of `model_input.attn_metadata` in vllm-ascend needs to be added in `execute_model`, it is done in `model_runner.py`, so I also made corresponding changes. Otherwise, when atten_backend is MLA, it will report that input_positions cannot be found. 4. I commented out two lines in `draft_model_runner.py` at `line118` to support the scenario of K>1. ``` # lora_mapping=model_input.lora_mapping, # lora_requests=model_input.lora_requests, ``` I added comments. In the future, when vllm-ascend supports the lora feature, the changes here can be restored. TODO: - [ ] revert the patch when the related issues are addressed in vllm ### How was this patch tested? CI passed with newly added tests. - e2e test for medusa proposer: tests/singlecard/spec_decode/e2e/test_medusa_correctness.py - e2e test for mlp proposer: tests/singlecard/spec_decode/e2e/test_mlp_correctness.py - e2e test for n-gram proposer: tests/singlecard/spec_decode/e2e/test_ngram_correctness.py Tests for patched files: - tests/singlecard/spec_decode/test_dynamic_spec_decode.py - tests/singlecard/spec_decode/test_multi_step_worker.py - tests/singlecard/spec_decode/test_ngram_worker.py - tests/singlecard/spec_decode/test_spec_decode_worker.py --------- Signed-off-by: MengqingCao <cmq0113@163.com> Co-authored-by: mengwei805 <mengwei25@huawei.com>	2025-04-17 20:16:32 +08:00
hfadzxy	9935d45728	[CI]Add model basic accuracy test(Qwen2.5-0.5B-Instruct) (#460 ) ### What this PR does / why we need it? Add model basic accuracy test(Qwen2.5-0.5B-Instruct) Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-04-17 14:59:56 +08:00
Li Wang	9859e7313f	[CI]Add global env to runner (#537 ) ### What this PR does / why we need it? - add `HF_TOKEN` as global var to the runner - add `HF_ENDPOINT` as global var to the runner - change concurrency group, rely on current pr num --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-04-17 10:08:00 +08:00
wangxiyuan	434749d299	[CI] update 0.8.3 to 0.8.4 (#528 ) Update 0.8.3 CI to 0.8.4 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-16 09:26:30 +08:00
Li Wang	13480d1238	[CI]Fix workflow (#532 ) ### What this PR does / why we need it? make linux-npu-4 runner run parallel for now Signed-off-by: wangli <wangli858794774@gmail.com>	2025-04-15 19:55:41 +08:00
wangxiyuan	9c7428b3d5	[CI] enable custom ops build (#466 ) ### What this PR does / why we need it? This PR enables the custom ops build by default. ### Does this PR introduce _any_ user-facing change? Yes, users who install vllm-ascend from source will now trigger the custom ops build step. ### How was this patch tested? By image build and e2e CI --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-12 10:24:53 +08:00
Icey	d05ea17427	Add openEuler based container image for vLLM Ascend (#489 ) ### What this PR does / why we need it? Provide users with openEuler-based vllm images, so modify the quick start readme ### Does this PR introduce _any_ user-facing change? None ### How was this patch tested? There is no need for performing any test. --------- Signed-off-by: Icey <1790571317@qq.com>	2025-04-10 14:30:49 +08:00
Li Wang	afdbf77483	[CI] Add new runner and enable QwQ multinpu test (#417 ) ### What this PR does / why we need it? - Add a new runner to the continuous integration system and keep the original CI runner until the new runner runs stably - Add distributed test cases ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-04-08 16:52:45 +08:00
Yikun Jiang	adabdeea7f	Set numpy < 2.0.0 to resolve numpy VersionConflict (#476 ) ### What this PR does / why we need it? vLLM bumped the numpy version to 2.x in `8427f70493`, which causes a `pip._vendor.pkg_resources.ContextualVersionConflict: (numpy 2.2.4 (/usr/local/python3.10/lib/python3.10/site-packages), Requirement.parse('numpy==1.26.4'), {'vllm-ascend'})` failure when vllm-ascend is installed. This PR resolves the issue by: - Set numpy < 2.0.0 to resolve numpy VersionConflict - Sync requirements and toml - Reorder ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes: https://github.com/vllm-project/vllm-ascend/issues/473 Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-04-07 16:07:21 +08:00
Pleaplusone	ce8259975e	[core] Support custom ascendc kernels in vllm-ascend (#233 ) This PR adds support for the custom ascendc kernel rotary_embedding in vllm-ascend; the related CMakeLists and setuptools changes are also included. Related: https://github.com/vllm-project/vllm-ascend/issues/156 --------- Signed-off-by: ganyi <pleaplusone.gy@gmail.com>	2025-04-03 14:52:34 +08:00
dependabot[bot]	78083d405e	Bump actions/setup-python from 5.4.0 to 5.5.0 (#440 ) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.4.0 to 5.5.0. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-04-01 14:34:33 +08:00
Mengqing Cao	2dbd763584	[CI] Fix mypy CI (#443 ) ### What this PR does / why we need it? Fix CI by updating mypy and pinning the numpy version. _The modification of model_runner_v1 is just to make CI happy._ ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed Signed-off-by: MengqingCao <cmq0113@163.com>	2025-04-01 09:25:33 +08:00
wangxiyuan	b6499ed97d	[CI] Use CI pool (#428 ) Use CI pool instead of self-host for e2e test to speed up CI. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-03-29 12:42:59 +08:00
wangxiyuan	31f29b9f30	[Core] Make V1 work and enable V1 engine test (#389 ) 1. Make sure the version is string before parse in collect_env 2. Add basic V1 engine test Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-03-28 19:34:23 +08:00
Zhenyu Zheng	4804b74e95	Update 110-user-story.yml (#402 ) Fix a few typos in issue template Signed-off-by: Zhenyu Zheng <zheng.zhenyu@outlook.com>	2025-03-27 08:58:57 +08:00
Zhenyu Zheng	0b5a9643fd	Add an example for user stories (#399 ) Add an example for user stories and fix some typos. Add a new section, user story, to the docs to collect user stories of vllm-ascend; also add an example and the issue template for collecting user stories. Signed-off-by: Zhenyu Zheng <zheng.zhenyu@outlook.com>	2025-03-26 16:25:57 +08:00
Mengqing Cao	6295d2e9bc	[CI/Build][Doc] upgrade torch-npu to 0320 (#392 ) ### What this PR does / why we need it? This pr upgrades torch-npu to 0320, so that #321, https://github.com/vllm-project/vllm-ascend/issues/267#issuecomment-2745045743 could be fixed, and #372 should be reverted after this pr ### Does this PR introduce _any_ user-facing change? upgrade torch-npu to 0320 ### How was this patch tested? tested locally with long seq inferencing. --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-03-26 09:04:12 +08:00
Mengqing Cao	8996733307	[CI] fix vllm test (#365 ) fix vllm test Signed-off-by: MengqingCao <cmq0113@163.com>	2025-03-24 16:09:06 +08:00
wangxiyuan	663dca7578	[CI] fix race condition problem (#353 ) fix race condition problem Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-03-19 17:04:36 +08:00
Yikun Jiang	18bb8d1f52	Adapt vLLM requirements changes to fix main CI (#279 ) ### What this PR does / why we need it? Adapt vLLM requirements changes: `206e2577fa (diff-01ec17406c969585ed075609a2bbf2f2f4fe3e3def36946694abe6d4eb60a6f2)` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-09 16:07:45 +08:00
Yikun Jiang	be58d5f3d8	Bump torch_npu version to dev20250308.3 (#276 ) ### What this PR does / why we need it? Bump torch_npu version to dev20250308.3 to fix performance regression on multi-stream case: `e04c580d07` . ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-09 15:59:15 +08:00
Mengqing Cao	91f7d8115d	[CI/Build] Bump torch_npu to dev20250307.3 (#265 ) Update torch-npu version to fix torch npu exponential_ accuracy. With this update, the precision issue when setting `temperature > 0` is fixed. --------- Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-03-07 20:34:07 +08:00
Yikun Jiang	35cb7b5234	[CI] Add dispatch job to leverage dynamic devices (#251 ) ### What this PR does / why we need it? Add a dispatch job that assigns jobs to dynamic devices, in two stages as described below. The dispatch job spends about `10s * parallel number + 30s` of extra time waiting for other jobs to launch their containers and release the lock. - Stage 1: Acquire lock: add a dispatch job that uses lockfile to acquire the lock and then gets a device number dynamically - Stage 2.1: Launch container with dynamic device: pass the device number via output and start the container job with that device - Stage 2.2: Release lock: once the job has started, release the lock. In the backend, we use multiple paths to set up multiple self-hosted runners as a load balancer: ``` $ pwd /home/action $ ll | grep actions drwx------ 6 action action 4096 Mar 7 08:55 actions-runner-01 drwx------ 6 action action 4096 Mar 7 08:55 actions-runner-02 drwx------ 6 action action 4096 Mar 7 08:55 actions-runner-03 drwx------ 6 action action 4096 Mar 7 08:56 actions-runner-04 drwx------ 4 action action 4096 Jan 24 22:08 actions-runner-05 drwx------ 4 action action 4096 Jan 24 22:08 actions-runner-06 ``` ``` adduser -G docker action su action pip3 install docker prettytable sudo yum install procmail ``` ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? - CI passed - E2E tested manually, triggered 3 jobs in parallel: - [1st job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711345757/job/38348309297) dispatch to /dev/davinci2. - [2nd job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711348739/job/38348316250) dispatch to /dev/davinci3 - [3rd job](https://github.com/vllm-project/vllm-ascend/actions/runs/13711351493/job/38348324551) dispatch to /dev/davinci4 Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-07 09:47:13 +08:00
Shanshan Shen	b9f0e25c16	[Misc] Add collect_env.py scripts for bug reporting (#175 ) ### What this PR does / why we need it? Add `collect_env.py` scripts from vLLM and remove `nvidia`, `gpu`, `cuda` related codes, thus users of vllm-ascend can collect their env info when reporting bugs. ### Does this PR introduce _any_ user-facing change? no. ### How was this patch tested? Run `python collect_env.py` works Signed-off-by: Shanshan Shen <467638484@qq.com>	2025-03-04 14:14:37 +08:00
Yikun Jiang	ebe14f20cf	Recover vllm-ascend dev image (#209 ) ### What this PR does / why we need it? Recover vllm-ascend dev image ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-03 09:08:41 +08:00
dependabot[bot]	81dfaae88b	Bump docker/setup-buildx-action from 2 to 3 (#191 ) Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 2 to 3. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-02-28 09:06:46 +08:00
dependabot[bot]	a710a7563a	Bump docker/setup-qemu-action from 2 to 3 (#192 ) Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 2 to 3. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-02-28 09:06:13 +08:00
dependabot[bot]	a5564ed5d8	Bump actions/setup-python from 5.3.0 to 5.4.0 (#193 ) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.3.0 to 5.4.0. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-02-27 20:05:15 +08:00
Yuanhao Ji	6aed83335c	[CI] Add dependabot support and labeler workflow (#162 ) Add dependabot support and labeler workflow --------- Signed-off-by: Yuanhao Ji <jiyuanhao@apache.org>	2025-02-27 19:46:31 +08:00
wangxiyuan	6042c210bc	[CI] upgrade to newest pta (#187 ) Upgrade to newest torch-npu Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: angazenn <zengyanjia@huawei.com>	2025-02-27 16:40:23 +08:00
Mengqing Cao	94483775e1	[CI] fix hf_token (#180 ) Fix the bug introduced by #173 Signed-off-by: MengqingCao <cmq0113@163.com>	2025-02-26 17:29:31 +08:00
Mengqing Cao	78530c0667	[CI/Build] add HF_TOKEN for model downloading (#173 ) ### What this PR does / why we need it? Add `HF_TOKEN` for downloading models that require access rights from huggingface hub. This will fix the CI error in #123 and #76 Signed-off-by: MengqingCao <cmq0113@163.com>	2025-02-26 15:35:03 +08:00
Mengqing Cao	3a7882208f	[CI] enable test if pytest.ini changes (#151 ) enable test if pytest.ini changes Signed-off-by: MengqingCao <cmq0113@163.com>	2025-02-24 16:47:05 +08:00
Yikun Jiang	72a43a61d8	[Docs] Add issue template (#113 ) ### What this PR does / why we need it? Add issue templates. Most of templates in this PR are from vllm-project/vllm: https://github.com/vllm-project/vllm/tree/main/.github/ISSUE_TEMPLATE ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Test on my local repo by setting default branch to ISSUE_TEMPLATE: https://github.com/Yikun/vllm-ascend/issues https://github.com/Yikun/vllm-ascend/issues/new/choose Closes: https://github.com/vllm-project/vllm-ascend/issues/48 --------- Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-21 17:20:21 +08:00
wangxiyuan	5f465010de	[Core] Cherry pick from 0.7.1 to keep the main code newest (#127 ) Cherry pick from 0.7.1 to keep the main code newest Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-21 17:07:37 +08:00
Mengqing Cao	36991b2052	[CI] enable CI on all branch (#124 ) Enable CI on all branches. Install torch-npu-2.5.1.dev20250218 so that we can enable CI on all branches and prepare for merging 0.7.1-dev to main --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-02-21 16:16:48 +08:00
wangxiyuan	cff03a4913	[CI] change to quay.io (#102 ) change docker registry to quay Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-19 17:04:46 +08:00
wangxiyuan	fafd70e91c	[Doc] Update doc to work with release (#85 ) 1. Update CANN image name 2. Add pta install step 3. update vllm-ascend docker image name to ghcr 4. update quick_start to use vllm-ascend image directly. 5. fix `note` style Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-19 09:51:43 +08:00
Yikun Jiang	bfbfbce184	[CI] Add container image build ci (#64 ) ### What this PR does / why we need it? Add container image build ci: - Enable branch, tag docker image publish - branch image: `vllm-ascend:main`, `vllm-ascend:v0.7.1-dev` - tag image: `vllm-ascend:v0.7.1rc1` - Enable PR docker image build check - other changes: - Prepare the `REPO_OWNER` because ghcr requires lowercase - Add `Free up disk space` step to avoid `No space left on device` like https://github.com/vllm-project/vllm-ascend/issues/27 - Setup qemu with image to resolve https://github.com/docker/setup-qemu-action/issues/198 ### Does this PR introduce _any_ user-facing change? NO ### How was this patch tested? build: CI passed [push false](https://github.com/vllm-project/vllm-ascend/actions/runs/13347017608/job/37278724158?pr=64) Note for test case: 1. merge commits to `main`, `v0.7.1-dev` branch ✅ main: https://github.com/Yikun/vllm-ascend/actions/runs/13347238961 --> ghcr.io/yikun/vllm-ascend:main OK ✅ v0.7.1-dev: https://github.com/Yikun/vllm-ascend/actions/runs/13347229912 --> ghcr.io/yikun/vllm-ascend:v0.7.1-dev OK 2. create pep440 tag from github release: v0.7.1rc1, v0.7.1, v0.7.1rc1.dev1 all release has latest ✅ v0.7.5 --> v0.7.5, latest ✅ v0.7.5rc1 --> v0.7.5rc1 ✅ v0.7.5rc1.dev1 --> v0.7.5rc1.dev1 (no latest, add a todo here) v0.7.5rc1.post1 --> v0.7.5rc1.post1 3. create unknown tag from github release: ✅ create 0.7.1 on v0.7.1-dev: not triggered (only the v prefix triggers) 4. create tag from git: ✅ also works, `git tag v0.7.99;git push origin v0.7.99` from publish-image Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-17 09:07:35 +08:00
Yikun Jiang	c1ac822642	[CI] Switch to cann latest version (#63 ) ### What this PR does / why we need it? Switch to cann latest version ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-16 13:38:01 +08:00
Yikun Jiang	46977f9f06	[Doc] Add sphinx build for vllm-ascend (#55 ) ### What this PR does / why we need it? This patch enables the doc build for vllm-ascend - Add sphinx build for vllm-ascend - Enable readthedocs for vllm-ascend - Fix CI: - exclude vllm-empty/tests/mistral_tool_use to skip `You need to agree to share your contact information to access this model`, which was introduced in `314cfade02` - Install test req to fix https://github.com/vllm-project/vllm-ascend/actions/runs/13304112758/job/37151690770: ``` vllm-empty/tests/mistral_tool_use/conftest.py:4: in <module> import pytest_asyncio E ModuleNotFoundError: No module named 'pytest_asyncio' ``` - exclude docs PR ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? 1. test locally: ```bash # Install dependencies. pip install -r requirements-docs.txt # Build the docs and preview make clean; make html; python -m http.server -d build/html/ ``` Launch browser and open http://localhost:8000/. 2. CI passed with preview: https://vllm-ascend--55.org.readthedocs.build/en/55/ Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-13 18:44:17 +08:00
Yikun Jiang	28d7691361	[FOLLOWUP][Misc] Remove unused mypy config for base_communicator (#45 ) ### What this PR does / why we need it? - Remove on communicator mypy to address: https://github.com/vllm-project/vllm-ascend/pull/24#issuecomment-2647696781 - Add mypy.ini to trigger list ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-02-12 09:17:05 +08:00
wangxiyuan	c59375caff	[Misc] version control by setuptools_scm (#21 ) make package version control by setuptools_scm to keep the same with vllm Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-10 09:36:09 +08:00
Mengqing Cao	7d9ae22ecb	[CI] use pytest.ini to manage vllm native tests (#5 ) ### What this PR does / why we need it? Use `pytest.ini` to manage vllm native tests. This will convert the original test script whitelist to a blacklist to prevent missing the newly added test scripts of the upstream vLLM. note: _we do not manage the test scripts of vLLM-Ascend in `pytest.ini`, because if we do so, there will be conflicts between vLLM and vLLM-Ascend's `conftest.py`._ ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with new existing test. Signed-off-by: MengqingCao <cmq0113@163.com>	2025-02-06 23:57:51 +08:00
Yikun Jiang	d5e7756028	[Core] Init vllm-ascend (#3 ) ### What this PR does / why we need it? vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on the Ascend NPU. This plugin is the recommended approach for supporting the Ascend backend within the vLLM community. It adheres to the principles outlined in the [RFC]: Hardware pluggable, providing a hardware-pluggable interface that decouples the integration of the Ascend NPU with vLLM. This patch also includes changes to make CI work and uses caching to speed up the e2e test, including: 1. Change push (post merge ci) and pull_request (pr ci) trigger branch to main 2. Make mypy work by ignoring base_communicator and clearing unused deps 3. Several improvements for vllm_ascend_test: - use cache (pip, ms, hf) to speed up e2e test (25mins --> 5mins) - switch `git clone` command to `action/checkout` to speed up checkout - Enable sv for pytest for better info dump - Remove network host to resolve `docker: conflicting options: cannot attach both user-defined and non-user-defined network-modes`, which is a problem on docker 1.45 but not on 1.39. 4. Adapt MLA decode optimizations: `cabaf4eff3` ### Does this PR introduce _any_ user-facing change? Yes, init the PR. ### How was this patch tested? - This is the first PR to make ascend NPU work on vLLM. All code is tested on ascend with vLLM V0 Engine. - CI passed --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: MengqingCao <cmq0113@163.com> Co-authored-by: wangshuai09 <391746016@qq.com> Co-authored-by: Shanshan Shen <467638484@qq.com> Co-authored-by: wangli <wangli858794774@gmail.com>	2025-02-05 10:53:12 +08:00

97 Commits