xc-llm-ascend/requirements.txt

# Should be mirrored in pyproject.toml
cmake>=3.26
decorator
einops
numpy<2.0.0
packaging
pip
pybind11
pyyaml
scipy
setuptools>=64
setuptools-scm>=8
torch>=2.5.1
torchvision<0.21.0
wheel
# Remove after https://github.com/vllm-project/vllm-ascend/issues/2034
transformers<4.54.0

# requirements for disaggregated prefill
msgpack
quart

# Required for N-gram speculative decoding
numba

# Install torch_npu
--pre
--extra-index-url https://mirrors.huaweicloud.com/ascend/repos/pypi
torch-npu==2.5.1.post1.dev20250619

# Line history (from git blame), one entry per commit, oldest first.
#
# [Core] Init vllm-ascend (#3), 2025-02-05 10:53:12 +08:00
#   Lines: decorator, pyyaml, scipy
#   Initial commit of the vLLM Ascend plugin, a backend plugin for running vLLM on the Ascend NPU.
#   Signed-off-by: wangxiyuan, Yikun Jiang
#   Co-authored-by: wangxiyuan, MengqingCao, wangshuai09, Shanshan Shen, wangli
#
# Set numpy < 2.0.0 to resolve numpy VersionConflict (#476), 2025-04-07 16:07:21 +08:00
#   Lines: header comment, cmake>=3.26, numpy<2.0.0, pybind11, setuptools>=64, setuptools-scm>=8
#   vLLM bumped numpy to 2.x (https://github.com/vllm-project/vllm/commit/8427f70493ed67bf26cb9e7fa98ac202b991c37d),
#   which made installing vllm-ascend fail with pip._vendor.pkg_resources.ContextualVersionConflict
#   (numpy 2.2.4 installed, numpy==1.26.4 required). The PR also synced requirements with the toml file and reordered entries.
#   Closes: https://github.com/vllm-project/vllm-ascend/issues/473
#   Signed-off-by: Yikun Jiang
#
# Set torchvision<0.21.0 to match torch/torch_npu version (#479), 2025-04-08 09:15:42 +08:00
#   Lines: torchvision<0.21.0
#   Resolves "RuntimeError: operator torchvision::nms does not exist".
#   Closes: https://github.com/vllm-project/vllm-ascend/issues/477
#   Signed-off-by: Yikun Jiang
#
# [MISC] Add patch module (#526), 2025-04-16 09:28:58 +08:00
#   Lines: packaging
#   Adds a patch module for vllm: platform patches are registered when the platform is loaded,
#   worker patches when the worker starts (patch_common for main and 0.8.4, patch_main for main, patch_0_8_4 for 0.8.4).
#
# support aclgraph (#426), 2025-04-23 20:56:24 +08:00
#   Lines: pip, torch>=2.5.1, wheel
#   Gives vllm-ascend access to the piecewise_graph feature of the v1 engine: registers
#   unifiled_ascend_attention_with_output so piecewise_graph can split the graph, and supports NPUGraph
#   to accelerate kernel launch. NPUGraph is on by default; users can disable it with enforce_eager.
#   Requires torch_npu and CANN versions that support graph capture.
#   Signed-off-by: Bug Hunter Yan, Yizhou Liu
#
# [Disaggregated Prefill] P2P Disaggregated Prefill based on llm_datadist (#694), 2025-05-01 22:31:36 +08:00
#   Lines: "# requirements for disaggregated prefill", msgpack, quart
#   Reworks the previous offline single-node solution to support multi-node and online serving.
#   Currently supports the 1P1D case of Deepseek hybrid parallelism (P: TP+EP, D: DP+EP).
#   Signed-off-by: hw_whx, ganyi
#
# [CI] add codespell CI and fix format.sh (#827), 2025-05-12 22:04:48 +08:00
#   Lines: einops
#   Adds the missing required package for vllm-ascend.
#   Signed-off-by: wangxiyuan
#
# [Build] Move numba/quart to requirments and update DS baseline and sync graph typo fix (#1121), 2025-06-08 22:33:37 +08:00
#   Lines: "# Required for N-gram speculative decoding", numba
#   Moves numba/quart from requirements-dev to requirements (dependency introduced by
#   https://github.com/vllm-project/vllm-ascend/pull/874) and aligns pyproject.toml with requirements.
#   Closes: https://github.com/vllm-project/vllm-ascend/issues/1120
#   Signed-off-by: Yikun Jiang
#
# [DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 (#1235), 2025-06-16 23:09:53 +08:00
#   Lines: "# Install torch_npu", --pre, --extra-index-url https://mirrors.huaweicloud.com/ascend/repos/pypi
#   Closes: https://github.com/vllm-project/vllm-ascend/issues/1170, https://github.com/vllm-project/vllm-ascend/pull/1242,
#   https://github.com/vllm-project/vllm-ascend/issues/1232
#   Signed-off-by: MengqingCao, Icey
#
# update torch-npu to 2.5.1.post1.dev20250619 (#1347), 2025-06-23 09:02:09 +08:00
#   Lines: torch-npu==2.5.1.post1.dev20250619
#   Signed-off-by: ganyi
#
# Upgrade vLLM to v0.10.0 (#1927), 2025-07-26 15:43:29 +08:00
#   Lines: "# Remove after https://github.com/vllm-project/vllm-ascend/issues/2034", transformers<4.54.0
#   Pins transformers<4.54.0 as a workaround for https://github.com/vllm-project/vllm-ascend/issues/2034.
#   Signed-off-by: Yikun Jiang
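
The header comment asks that this file be mirrored in pyproject.toml. One way to check that is sketched below. It is a minimal sketch and not part of the repository: the helper names `normalize`, `parse_requirements`, and `unmirrored` are hypothetical, and the comparison is purely textual (whitespace removed, project names lowercased with `-`, `_`, and `.` treated as equivalent), so it does not reason about overlapping version ranges.

```python
import re

# Hypothetical helpers for illustration; not part of vllm-ascend.
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(.*)$")


def normalize(spec: str) -> str:
    """Strip whitespace and normalize the project name of one requirement specifier."""
    compact = re.sub(r"\s+", "", spec)
    match = _NAME.match(compact)
    if match is None:
        return compact.lower()
    name, rest = match.groups()
    # Treat runs of '-', '_' and '.' in the name as equivalent.
    return re.sub(r"[-_.]+", "-", name).lower() + rest


def parse_requirements(text: str) -> set[str]:
    """Collect specifiers, skipping blank lines, comments, and pip options such as --pre."""
    specs = set()
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing inline comments
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        specs.add(normalize(line))
    return specs


def unmirrored(requirements_text: str, pyproject_requires: list[str]) -> set[str]:
    """Return specifiers found in requirements.txt but missing from the pyproject list."""
    mirrored = {normalize(r) for r in pyproject_requires}
    return parse_requirements(requirements_text) - mirrored
```

To use it, pass the text of this file and a list of requirement strings read from pyproject.toml, for example with the standard-library `tomllib` module (Python 3.11+). Which table in pyproject.toml holds the mirrored list is an assumption to verify against the repository. An empty result means every entry here has a textual match.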