xc-llm-ascend

Author	SHA1	Message	Date
zhangxinyuehfad	ca0823f238	[0.11.0][Bugfix] fix fastapi version (#5052 ) ### What this PR does / why we need it? fix fastapi version Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-12-16 11:34:11 +08:00
zhangxinyuehfad	a686f2962a	[0.11.0][Bugfix] fix e2e full test (#4424 ) ### What this PR does / why we need it? pin Transformer version to 4.57.1 fix 'dict' object has no attribute 'model_type' https://github.com/vllm-project/vllm-ascend/actions/runs/19660859460/job/56306822464 picked from https://github.com/vllm-project/vllm-ascend/pull/4423 Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-11-25 21:21:42 +08:00
wangxiyuan	8a7154001e	[0.11.0]Chery pick pta upgrade change (#3940 ) This PR cherry-pick two commit from main to upgrade torch-npu to 2.7.1 official release --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-10-31 22:14:26 +08:00
Mengqing Cao	1b16c01afd	[v0.11.0-dev][Installation] limit opencv-python-headless version to resolve numpy version conflict (#3767 ) ### What this PR does / why we need it? vllm requires opencv-python-headless >= 4.11.0 which requires (numpy<2.3.0,>=2), but vllm-ascend numpy version must be less than 2.0.0, so limit opencv-python-headless less than 4.11.0.86 will fix this conflict. backport of `afc58184ec` Signed-off-by: 22dimensions <waitingwind@foxmail.com> Co-authored-by: 22dimensions <waitingwind@foxmail.com>	2025-10-25 18:18:28 +08:00
wangxiyuan	ba19dd3183	Revert PTA upgrade PR (#3352 ) we notice that torch npu 0919 doesn't work. This PR revert related change which rely on 0919 version. Revert PR: #3295 #3205 #3102 Related: #3353 - vLLM version: v0.11.0	2025-10-10 14:09:53 +08:00
wangxiyuan	4abdcdba4e	upgrade pta to 0919 (#3295 ) ### What this PR does / why we need it? Upgrade torch-npu to the newest POC version ### Does this PR introduce _any_ user-facing change? yes, user need upgrade the pta version as well. ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.0 --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-09-30 17:14:23 +08:00
wangxiyuan	9260910c8d	[CI] Fix broken CI (#2302 ) 1. disable test_eagle_ccorrectness test, we'll reopen it once oom error fixed. 2. drop transformers version limit for main, since vLLM rely on >=4.55.0, see: `65552b476b` 3. fix kv_connector_output bug, see: `796bae07c5` - vLLM version: v0.10.0 - vLLM main: `d1af8b7be9` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-08-11 11:22:32 +08:00
leo-pony	807f0895b2	Bump torch version to 2.7.1 (#1562 ) ### What this PR does / why we need it? Bump torch version to 2.7.1, and cleanup infer schema patch https://github.com/vllm-project/vllm-ascend/commit/857f489 (https://github.com/vllm-project/vllm-ascend/pull/837), this patch depends on also: https://github.com/vllm-project/vllm-ascend/pull/1974 ### Does this PR introduce any user-facing change? No #### How was this patch tested? CI passed torch-npu 2.7.1rc1 install guide: https://gitee.com/ascend/pytorch/tree/v2.7.1/ install depending: ``` pip3 install pyyaml pip3 install setuptools ``` install torch-npu: Closes: https://github.com/vllm-project/vllm-ascend/issues/1866 Closes: https://github.com/vllm-project/vllm-ascend/issues/1390 - vLLM version: v0.10.0 - vLLM main: `9af654cc38` --------- Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-08-05 08:43:24 +08:00
Yikun Jiang	17a430f7b8	Upgrade vLLM to v0.10.0 (#1927 ) ### What this PR does / why we need it? - Upgrade to v0.10.0 - Drop v0.9.2 version compatibility - Add patch for `vllm_ascend/patch/worker/patch_common/patch_sampler_gather_logprobs.py` as workaround of `f3a683b7c9` for v0.10.0 and also add e2e test `test_models_prompt_logprobs` - Pin transformers<4.54.0 as workaround of https://github.com/vllm-project/vllm-ascend/issues/2034 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Test locally: `VLLM_USE_MODELSCOPE=true pytest -sv tests/e2e/singlecard/test_offline_inference.py::test_models_prompt_logprobs` - CI passed - vLLM version: v0.9.2 - vLLM main: `7728dd77bb` --------- Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-07-26 15:43:29 +08:00
wangxiyuan	2b726d8f90	[CI] Fix broken CI (#1889 ) 1. vLLM commit `45badd05d0` changed the pooling check logic which broken vLLM Ascend. 2. vLLM commit `3e04107d97` requires higher version of transformers. The transformers version bug has been fixed by `e936e401de`. We can safe to remove the version limit now. 3. vLLM commit `217937221b` added a new input `enable_eplb` for FusedMoe Ops This PR fix the broken CI. - vLLM version: v0.9.2 - vLLM main: `6a971ed692` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-20 02:11:57 +08:00
Mengqing Cao	2cf9c4c3a2	[CI/Build] Fix version conflict on transformers (#1490 ) ### What this PR does / why we need it? Fix version conflict on transformers: `pip._vendor.pkg_resources.ContextualVersionConflict: (transformers 4.53.0 (/usr/local/python3.10.17/lib/python3.10/site-packages), Requirement.parse('transformers<4.53.0'), {'vllm-ascend'})` Fix https://github.com/vllm-project/vllm-ascend/actions/runs/15933263325/job/44947231642 ### Does this PR introduce _any_ user-facing change? Fix broken build ### How was this patch tested? CI passed with new existing test. Signed-off-by: MengqingCao <cmq0113@163.com>	2025-06-28 15:11:04 +08:00
Mengqing Cao	d59e7fa095	[CI] Pin transformers<4.53.0 and fix EPLB load_weights to make CI passed (#1482 ) ### What this PR does / why we need it? - Fix vLLM EPLB break `e9fd658a73` by recovering load_weights back to [v0.9.1 version](`07b8fae219`) temporarily. - Fix transformers>=4.53.0 image processor break Related: https://github.com/vllm-project/vllm-ascend/issues/1470 - Mirror torch_npu requirements to pyproject.toml ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-28 00:12:43 +08:00
Pleaplusone	7e6efbf2a9	update torch-npu to 2.5.1.post1.dev20250619 (#1347 ) ### What this PR does / why we need it? This PR update the torch_npu to newest release version 2.5.1.post1.dev20250619 . ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI tested will guarantee the update Signed-off-by: ganyi <pleaplusone.gy@gmail.com>	2025-06-23 09:02:09 +08:00
Mengqing Cao	96fa7ff63b	[DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 (#1235 ) ### What this PR does / why we need it? 1. Fix rank set in DP scenario. The new poc version of torch-npu support setting `ASCEND_RT_VISIBLE_DEVICES` dynamically, thus we could use the rank set in `DPEngineCoreProc` directly instead of calculating local rank across dp by hand in the patched `_init_data_parallel` Closes: https://github.com/vllm-project/vllm-ascend/issues/1170 2. Bump torch-npu version to 2.5.1.post1.dev20250528 Closes: https://github.com/vllm-project/vllm-ascend/pull/1242 Closes: https://github.com/vllm-project/vllm-ascend/issues/1232 ### How was this patch tested? CI passed with new added test. --------- Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Icey <1790571317@qq.com> Co-authored-by: Icey <1790571317@qq.com>	2025-06-16 23:09:53 +08:00
Yikun Jiang	4976b48b98	[Build] Move numba/quart to requirments and update DS baseline and sync graph typo fix (#1121 ) ### What this PR does / why we need it? 1. The dependency was introduced by https://github.com/vllm-project/vllm-ascend/pull/874 - Move numba/quart from requirements-dev to requirments - Align pyproject.toml with requirements 2. This patch also fix deepseek accuracy baseline which https://github.com/vllm-project/vllm-ascend/pull/1118 was not addressed. According to https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite the gsm8k is about `41.1` 3. This also sync the vLLM upstream changes: `eaa2e51088` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed vllm ascend test (basic workflow) vllm longterm test (spec decode) Closes: https://github.com/vllm-project/vllm-ascend/issues/1120 --------- Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-06-08 22:33:37 +08:00
wangxiyuan	6193ba679b	[CI] add codespell CI and fix format.sh (#827 ) 1. Fix format check error to make format.sh work 2. Add codespell check CI 3. Add the missing required package for vllm-ascend. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-05-12 22:04:48 +08:00
whx	8b194ad12e	[Disaggregated Prefill] P2P Disaggregated Prefill based on llm_datadist (#694 ) ### What this PR does / why we need it? - This PR proposes a P2P version of Disaggregated Prefill based on llm_datadist which manages data transfer. - This solution reconstructs previous offline single-node Disaggregated Prefill solution, and supports multi-node and online serveing now. - Currently this solution supports 1P1D situation of Deepseek hybrid parallelism (P: TP+EP, D: DP+EP). Note that xPyD situation is considered in the solution design, and will be supported soon within v1 engine. --------- Signed-off-by: hw_whx <wanghexiang7@huawei.com> Signed-off-by: ganyi <pleaplusone.gy@gmail.com> Co-authored-by: hw_whx <wanghexiang7@huawei.com> Co-authored-by: ganyi <pleaplusone.gy@gmail.com>	2025-05-01 22:31:36 +08:00
Yikun Jiang	2e20797934	[BUILD] Upgrade torch-npu to 2.5.1 (#661 ) ### What this PR does / why we need it? The torch-npu 2.5.1 are published: https://pypi.org/project/torch-npu/2.5.1/ It's time to remove all torch-npu dev version from vllm-ascend code base ### Does this PR introduce _any_ user-facing change? Yes, using torch-npu 2.5.1 ### How was this patch tested? - [ ] CI passed - [ ] Manually test - [ ] Grep all `dev2025` --------- Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-04-27 17:28:29 +08:00
Bug Hunter Yan	05bdcbeae4	support aclgraph (#426 ) <!-- Thanks for sending a pull request! BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing/overview.html --> ### What this PR does / why we need it? <!-- - Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. If possible, please consider writing useful notes for better and faster reviews in your PR. - Please clarify why the changes are needed. For instance, the use case and bug description. - Fixes # --> This PR supports the access of vllm-acend to the piecewise_graph feature provided by the v1 engine. 1. register unifiled_ascend_attention_with_output for piecewise_graph to split graph. 2. support NPUGraph to accelerate kernel launch. ### Does this PR introduce _any_ user-facing change? <!-- Note that it means any user-facing change including all aspects such as API, interface or other behavior changes. Documentation-only updates are not considered user-facing changes. --> support npugraph to default， Users can disenable the npugraph feature by configuring enforce_eager. This has corresponding requirements for the versions of torch_npu and CANN, and they need to support graph capture. ### How was this patch tested? <!-- CI passed with new added/existing test. If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future. If tests were not added, please describe why they were not added and/or why it was difficult to add. --> it turn to default --------- Signed-off-by: Bug Hunter Yan <yanpq@zju.edu.cn> Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com> Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>	2025-04-23 20:56:24 +08:00
wangxiyuan	bbe7ccd366	[MISC] Add patch module (#526 ) This PR added patch module for vllm 1. platform patch: the patch will be registered when load the platform 2. worker patch: the patch will be registered when worker is started. The detail is: 1. patch_common: patch for main and 0.8.4 version 4. patch_main: patch for main verison 5. patch_0_8_4: patch for 0.8.4 version	2025-04-16 09:28:58 +08:00
wangxiyuan	9c7428b3d5	[CI] enable custom ops build (#466 ) ### What this PR does / why we need it? This PR enable custom ops build by default. ### Does this PR introduce _any_ user-facing change? Yes, users now install vllm-ascend from source will trigger custom ops build step. ### How was this patch tested? By image build and e2e CI --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-12 10:24:53 +08:00
Yikun Jiang	579d858a20	Set torchvision<0.21.0 to match torch/torch_npu version (#479 ) ### What this PR does / why we need it? Set torchvision<0.21.0 to match torch/torch_npu version to resolve `RuntimeError: operator torchvision::nms does not exist`. Closes: https://github.com/vllm-project/vllm-ascend/issues/477 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-04-08 09:15:42 +08:00
Yikun Jiang	adabdeea7f	Set numpy < 2.0.0 to resolve numpy VersionConflict (#476 ) ### What this PR does / why we need it? vLLM bumps numpy version to 2.x: `8427f70493` , this will cause a `pip._vendor.pkg_resources.ContextualVersionConflict: (numpy 2.2.4 (/usr/local/python3.10/lib/python3.10/site-packages), Requirement.parse('numpy==1.26.4'), {'vllm-ascend'})` failure when vllm ascend install. This PR resolved the issue by: - Set numpy < 2.0.0 to resolve numpy VersionConflict - Sync requirements and toml - Reorder ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes: https://github.com/vllm-project/vllm-ascend/issues/473 Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-04-07 16:07:21 +08:00
Pleaplusone	ce8259975e	[core] Support custom ascendc kernels in vllm-ascend (#233 ) This PR add custom ascendc kernel rotary_embedding support in vllm-ascend, related CMakeLists and setuptools is also added in this PR. Related: https://github.com/vllm-project/vllm-ascend/issues/156 --------- Signed-off-by: ganyi <pleaplusone.gy@gmail.com>	2025-04-03 14:52:34 +08:00
Mengqing Cao	2dbd763584	[CI] Fix mypy CI (#443 ) ### What this PR does / why we need it? Fix CI by updating mypy and pining numpy version _the modification of model_runner_v1 is just to make CI happy_ ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed Signed-off-by: MengqingCao <cmq0113@163.com>	2025-04-01 09:25:33 +08:00
Mengqing Cao	36991b2052	[CI] enable CI on all branch (#124 ) Enable CI on all branch. Installing with the torch-npu-2.5.1.dev20250218 so that we could enable CI on all branch and prepare for merging 0.7.1-dev to main --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-02-21 16:16:48 +08:00
Huazhong Ji	c8b57d10b2	[Misc] update the dependency version of torch-npu (#50 ) ### What this PR does / why we need it? This PR updates the dependency version of vllm-ascend on torch-npu, so that the vllm-ascend can be installed in a later version environment (like to torch-npu 2.6.0rc1), ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI Test Signed-off-by: ji-huazhong <hzji210@gmail.com>	2025-02-12 15:50:38 +08:00
wangxiyuan	c59375caff	[Misc] version control by setuptools_scm (#21 ) make package version control by setuptools_scm to keep the same with vllm Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-02-10 09:36:09 +08:00
Yikun Jiang	d5e7756028	[Core] Init vllm-ascend (#3 ) ### What this PR does / why we need it? vLLM Ascend plugin (vllm-ascend) is a backend plugin for running vLLM on the Ascend NPU. This plugin is the recommended approach for supporting the Ascend backend within the vLLM community. It adheres to the principles outlined in the [RFC]: Hardware pluggable, providing a hardware-pluggable interface that decouples the integration of the Ascend NPU with vLLM. This patch also include changes to make CI work and use cache speed up e2e test, including: 1. Change push (post merge ci) and pull_request (pr ci) trigger branch to main 2. Make mypy work by ignore base_communicator and clear unused deps 3. Several improvements for vllm_ascend_test: - use cache (pip, ms, hf) speed up e2e test (25mins --> 5mins) - switch `git clone` command to `action/checkout` to speedup checkout and - Enable sv for pytest for better info dump - Remove network host to resole `docker: conflicting ontions: cannot attach both user-defined and non-user-definednetwork-modes`, which is a problem on docker 1.45 but not on 1.39. 4. Adapt MLA decode optimizations: `cabaf4eff3` ### Does this PR introduce _any_ user-facing change? Yes, init the PR. ### How was this patch tested? - This is the first PR to make ascend NPU work on vLLM. All code is tested on ascend with vLLM V0 Engine. - CI passed --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: MengqingCao <cmq0113@163.com> Co-authored-by: wangshuai09 <391746016@qq.com> Co-authored-by: Shanshan Shen <467638484@qq.com> Co-authored-by: wangli <wangli858794774@gmail.com>	2025-02-05 10:53:12 +08:00

29 Commits