xc-llm-ascend

Author	SHA1	Message	Date
Mengqing Cao	77cd960524	[Misc] fast fail for exiting if tools/install_flash_infer_attention_score_ops_a2.sh (#5422 ) ### What this PR does / why we need it? Use `set -euo pipefail` to exit if tools/install_flash_infer_attention_score_ops_a2.sh failed in any line - vLLM version: release/v0.13.0 - vLLM main: `81786c8774` --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-27 17:30:34 +08:00
jiangyunfan1	1486e0d06c	[TEST]Add vllm bench (#5306 ) ### What this PR does / why we need it? This PR adds vllm bench common method, we need it to add some test cases later ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? by running the test - vLLM version: release/v0.13.0 - vLLM main: `5fbfa8d9ef` --------- Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-12-27 09:16:08 +08:00
Mengqing Cao	8ed6f98a5a	[Build] Add installation script of fused_infer_attention_score kernel with flash decoding (#5402 ) ### What this PR does / why we need it? Add installation script of `fused_infer_attention_score` kernel with flash decoding ### Userface changes Users can install the kernel `fused_infer_attention_score` with flash decoding feature by `bash tools/install_flash_infer_attention_score_ops_a2.sh` or `bash tools/install_flash_infer_attention_score_ops_a3.sh` - vLLM version: release/v0.13.0 - vLLM main: `254f6b9867` --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-12-27 02:01:06 +08:00
jiangyunfan1	48854aef5c	[TEST]Add sending request with and without chat (#5286 ) ### What this PR does / why we need it? This PR adds the method for sending chat and non-chat request, we need it to test much folloing cases. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? by running the test - vLLM version: release/v0.13.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-12-26 18:04:17 +08:00
zhangyiming	35dbdbb398	[Doc] Add new contributors and relative scripts. (#5070 ) ### What this PR does / why we need it? [Doc] Add new contributors and relative scripts. Usage of scripts: - `export GITHUB_TOKEN=<your github token>` - `bash tools/collect_user_first_contribution.sh vllm-project/vllm-ascend <base_sha> <head_sha>` and save the result to one temporary file such as `contributors.txt` - `python tools/format_contributors.py contributors.txt --start <start index now>` - Use the output to update the `contributors.md` - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: menogrey <1299267905@qq.com>	2025-12-23 10:01:45 +08:00
Li Wang	7294f89e43	[CI] Add daily images build for nightly ci (#3989 ) ### What this PR does / why we need it? Given the current excessively long build time of our nightly-ci, I recommend installing necessary, confirmed versions of packages in the Docker image to reduce the time required for integration testing. Including Mooncake vllm with fixed tags, This is expected to reduce nightly-ci duration by 2 hours. - vLLM version: v0.11.0 - vLLM main: `2918c1b49c` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-11-13 20:10:12 +08:00
jiangyunfan1	0e6e08e939	[TEST]Update nightly cases and add mtpx (#4111 ) ### What this PR does / why we need it? This PR updates some nightly test cases and adds mtpx cases, we need to test them daily ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-11-11 17:39:58 +08:00
Li Wang	9cc42226d5	[CI] Integrate mooncake to vllm-ascend base image (#4062 ) ### What this PR does / why we need it? This patch aims to integrate the mooncake [v0.3.7.2.post2](https://github.com/kvcache-ai/Mooncake/releases/tag/v0.3.7.post2) to vllm-ascend images - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-11-11 15:51:16 +08:00
Li Wang	eb0a2ee2d0	[CI] Optimize nightly CI (#3898 ) ### What this PR does / why we need it? This patch mainly fix the the problem of not being able to determine the exit status of the pod's entrypoint script and some other tiny optimizations: 1. Shorten wait for server timeout 2. fix typo 3. fix the issue of ais_bench failing to correctly access the proxy URL in a PD separation scenario. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0 - vLLM main: `83f478bb19` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-10-30 23:42:20 +08:00
Li Wang	4a2ab13743	[CI] Optimize nightly CI (#3858 ) ### What this PR does / why we need it? This patch optimize nightly CI: 1. Bug fixes ais_bench get None repo_type error 2. Fix A2 install kubectl error with arm arch 3. Fix the multi_node CI unable to determine whether the job was successful error ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: `83f478bb19` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-10-29 22:30:19 +08:00
jiangyunfan1	e56b0017a3	[TEST]Add aisbench log and A2 cases (#3841 ) ### What this PR does / why we need it? This PR adds 2 more A2 caces which we need to test daily. It also enhances the logging for aisbench test failures to improve issues identification ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.1 --------- Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-10-28 23:33:15 +08:00
Li Wang	90ae114569	[CI] Fix nightly CI (#3821 ) ### What this PR does / why we need it? This patch fix the nightly CI runs [failure](https://github.com/vllm-project/vllm-ascend/actions/runs/18848144365) ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.1 --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-10-28 20:40:03 +08:00
jiangyunfan1	9030106a14	[TEST]Add 2P1D multi node cases for nightly test (#3764 ) ### What this PR does / why we need it? This PR adds the 2P1D multi node func/acc/perf test cases, we need test them daily ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? by running the test - vLLM version: v0.11.0rc3 - vLLM main: `c9461e05a4` --------- Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com> Signed-off-by: wangli <wangli858794774@gmail.com> Co-authored-by: wangli <wangli858794774@gmail.com>	2025-10-27 23:09:15 +08:00
wangyu	d301c56d1a	[TEST]Add initial multi modal cases of Qwen2.5-VL-32B-Instruct for nightly test (#3707 ) ### What this PR does / why we need it? This PR adds the initial multi modal model for nightly test, including 2 cases for Qwen2.5-vl-32b acc/perf test on A3, we need test them daily. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? by running the test vLLM version: v0.11.0rc3 vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: wangyu31577 <wangyu31577@hundsun.com> Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>	2025-10-24 17:12:06 +08:00
jiangyunfan1	ec9ec78b53	[TEST]Add initial prefix cache case for nightly test (#3709 ) ### What this PR does / why we need it? This PR adds the initial prefix cache case for nightly test for Qwen3-32b-int8 on A3, we need test them daily. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-10-24 16:33:18 +08:00
jiangyunfan1	9434f24ded	[TEST]Add initial multi modal cases for nightly test and deepseek-r1 tests (#3631 ) ### What this PR does / why we need it? This PR adds the initial multi modal model for nightly test, including 3 cases for Qwen2.5-vl-7b acc/perf test on A3, we need test them daily. It also inclues 8 cases for deepseek-r1-0528-w8a8 func, acc and perf tests ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? by running the test - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-10-23 17:18:49 +08:00
jiangyunfan1	80b8df881f	[TEST] Add Qwen3-32b-w8a8 acc/perf A2/A3 test (#3541 ) ### What this PR does / why we need it? This PR Qwen3-32b-w8a8 acc/perf 8 cases on A2 and A3, we need test them daily. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? by running the test - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: jiangyunfan1 <jiangyunfan1@h-partners.com> Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: root <root@hostname-2pbfv.foreman.pxe> Co-authored-by: wangli <wangli858794774@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-10-21 17:34:48 +08:00
jiangyunfan1	9e59fc1510	[TEST] Add initial aisbench support and Qwen3 32B acc/perf test (#3474 ) ### What this PR does / why we need it? This PR adds the first aisbench case for nightly test, it lays a foundation for following performance and accuracy tests in nightly test. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: jiangyunfan1 <jiangyunfan1@h-partners.com>	2025-10-20 09:33:17 +08:00
Li Wang	205eff2b12	[Bugfix] Disable check vllm init temporary (#2250 ) ### What this PR does / why we need it? For the vllm src https://github.com/vllm-project/vllm/tree/main/vllm/attention/layers do not have `__init__.py`, which will break the python src init check, so we skip it for now ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.10.0 - vLLM main: `6b47ef24de` Signed-off-by: wangli <wangli858794774@gmail.com>	2025-08-07 10:37:22 +08:00
Li Wang	bdfb065b5d	[1/2/N] Enable pymarkdown and python __init__ for lint system (#2011 ) ### What this PR does / why we need it? 1. Enable pymarkdown check 2. Enable python `__init__.py` check for vllm and vllm-ascend 3. Make clean code ### How was this patch tested? - vLLM version: v0.9.2 - vLLM main: `29c6fbe58c` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-07-25 22:16:10 +08:00
Li Wang	c7446438a9	[1/N][CI] Move linting system to pre-commits hooks (#1256 ) ### What this PR does / why we need it? Follow vllm-project/vllm lint way: https://github.com/vllm-project/vllm/blob/main/.pre-commit-config.yaml Enable pre-commit to avoid some low level error AMAP. This pr is one step of #1241, The purpose is make linting system more clear and convenient, on this step, Mainly did the following things: yapf, actionlint, ruff, typos, isort, mypy, png-lint, signoff-commit, enforce-import-regex-instead-of-re. TODO: - clang-format(check for csrc with google style) need clean code, disable for now - pymarkdown need clean code, disable for now - shellcheck need clean code, disable for now ### Does this PR introduce _any_ user-facing change? Only developer UX change: https://vllm-ascend--1256.org.readthedocs.build/en/1256/developer_guide/contributing.html#run-lint-locally ``` pip install -r requirements-lint.txt && pre-commit install bash format.sh ``` ### How was this patch tested? CI passed with new added/existing test. Co-authored-by: Yikun [yikunkero@gmail.com](mailto:yikunkero@gmail.com) Co-authored-by: wangli [wangli858794774@gmail.com](mailto:wangli858794774@gmail.com) - vLLM version: v0.9.1 - vLLM main: `5358cce5ff` --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-07-10 14:17:15 +08:00
Shuqiao Li	01e3d59eae	add workflow to build and release wheel (#775 ) ### What this PR does / why we need it? This is a continuing work of #716. This PR add workflow to build and release wheel, and also release source to PYPI. We have 3 conditions to trigger the workflow: 1. PR to `main` and `-dev` 2. push to `main` and `-dev` 3. push tag with name of `v*` Release to PYPI will only be done under condition 3. Under condition 1 and 2, it will generate .tar.gz and build .whl, upload to github artifacts but will not release. update: Will build .whl and upload to github artifacts with scheduled task. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? All triggered conditions are well tested with my fork repo. --------- Signed-off-by: Shuqiao Li <celestialli@outlook.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-05-26 14:18:26 +08:00
wangxiyuan	0dae55a9a3	[MISC] fix format check error (#654 ) This pr makes format.sh works as expect. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-29 11:14:19 +08:00
hfadzxy	9935d45728	[CI]Add model basic accuracy test(Qwen2.5-0.5B-Instruct) (#460 ) ### What this PR does / why we need it? Add model basic accuracy test(Qwen2.5-0.5B-Instruct) Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-04-17 14:59:56 +08:00
wangxiyuan	9c7428b3d5	[CI] enable custom ops build (#466 ) ### What this PR does / why we need it? This PR enable custom ops build by default. ### Does this PR introduce _any_ user-facing change? Yes, users now install vllm-ascend from source will trigger custom ops build step. ### How was this patch tested? By image build and e2e CI --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-12 10:24:53 +08:00
Shanshan Shen	c06af8b2e0	[V1][Core] Add support for V1 Engine (#295 ) ### What this PR does / why we need it? Add support for V1 Engine. Please note that this is just the initial version, and there may be some places need to be fixed or optimized in the future, feel free to leave some comments to us. ### Does this PR introduce _any_ user-facing change? To use V1 Engine on NPU device, you need to set the env variable shown below: ```bash export VLLM_USE_V1=1 export VLLM_WORKER_MULTIPROC_METHOD=spawn ``` If you are using vllm for offline inferencing, you must add a `__main__` guard like: ```bash if __name__ == '__main__': llm = vllm.LLM(...) ``` Find more details [here](https://docs.vllm.ai/en/latest/getting_started/troubleshooting.html#python-multiprocessing). ### How was this patch tested? I have tested the online serving with `Qwen2.5-7B-Instruct` using this command: ```bash vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240 ``` Query the model with input prompts: ```bash curl http://localhost:8000/v1/completions \ -H "Content-Type: application/json" \ -d '{ "model": "Qwen/Qwen2.5-7B-Instruct", "prompt": "The future of AI is", "max_tokens": 7, "temperature": 0 }' ``` --------- Signed-off-by: shen-shanshan <467638484@qq.com> Co-authored-by: didongli182 <didongli@huawei.com>	2025-03-20 19:34:44 +08:00
Mengqing Cao	7d9ae22ecb	[CI] use pytest.ini to manage vllm native tests (#5 ) ### What this PR does / why we need it? Use `pytest.ini` to manage vllm native tests. This will convert the original test script whitelist to a blacklist to prevent missing the newly added test scripts of the upstream vLLM. note: _we do not manage the test scripts of vLLM-Ascend in `pytest.ini`, because if we do so, there will be conflicts between vLLM and vLLM-Ascend's `conftest.py`._ ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with new existing test. Signed-off-by: MengqingCao <cmq0113@163.com>	2025-02-06 23:57:51 +08:00
Yikun Jiang	d5e7756028	[Core] Init vllm-ascend (#3 ) ### What this PR does / why we need it? vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on the Ascend NPU. This plugin is the recommended approach for supporting the Ascend backend within the vLLM community. It adheres to the principles outlined in the [RFC]: Hardware pluggable, providing a hardware-pluggable interface that decouples the integration of the Ascend NPU with vLLM. This patch also include changes to make CI work and use cache speed up e2e test, including: 1. Change push (post merge ci) and pull_request (pr ci) trigger branch to main 2. Make mypy work by ignore base_communicator and clear unused deps 3. Several improvements for vllm_ascend_test: - use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins) - switch `git clone` command to `action/checkout` to speedup checkout and - Enable sv for pytest for better info dump - Remove network host to resole `docker: conflicting ontions: cannot attach both user-defined and non-user-definednetwork-modes`, which is a problem on docker 1.45 but not on 1.39. 4. Adapt MLA decode optimizations: `cabaf4eff3` ### Does this PR introduce _any_ user-facing change? Yes, init the PR. ### How was this patch tested? - This is the first PR to make ascend NPU work on vLLM. All code is tested on ascend with vLLM V0 Engine. - CI passed --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: MengqingCao <cmq0113@163.com> Co-authored-by: wangshuai09 <391746016@qq.com> Co-authored-by: Shanshan Shen <467638484@qq.com> Co-authored-by: wangli <wangli858794774@gmail.com>	2025-02-05 10:53:12 +08:00

28 Commits