Commit Graph

76 Commits

Author SHA1 Message Date
Yikun Jiang
79538b5d73 Upgrade CANN version to 8.1.rc1 (#747)
### What this PR does / why we need it?

Bump the CANN version separately from
https://github.com/vllm-project/vllm-ascend/pull/708

- Upgrade CANN version to 8.1.rc1
- Add a mirror prefix to speed up the image download (see the pull sketch below):
`m.daocloud.io/quay.io/ascend/cann:8.1.rc1-910b-ubuntu22.04-py3.10`
- Address trailing space in Dockerfile.openEuler
- Add a note for `/workspace` and `/vllm-workspace` as a follow-up of
https://github.com/vllm-project/vllm-ascend/pull/741
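For instance, pulling the base image through the mirror prefix might look like this (a sketch; the prefix simply proxies quay.io and can be dropped, at the cost of a slower download):

```bash
docker pull m.daocloud.io/quay.io/ascend/cann:8.1.rc1-910b-ubuntu22.04-py3.10
```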

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?

CI passed

Co-authored-by: MengqingCao <cmq0113@163.com>

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: MengqingCao <cmq0113@163.com>
2025-05-06 05:44:18 +08:00
hfadzxy
affca6f348 [Test] Add accuracy test report workflow (#542)
### What this PR does / why we need it?
1. Provide an accuracy test report for development branch releases.
2. Models and datasets for the accuracy test:

| Model                   | Datasets               |
|-------------------------|------------------------|
| Qwen2.5-7B-Instruct     | ceval-val, gsm8k, mmlu |
| Qwen3-8B                | ceval-val, gsm8k, mmlu |
| Llama-3.1-8B-Instruct   | ceval-val, gsm8k, mmlu |
| Qwen2.5-VL-7B-Instruct  | mmmu_val               |

### Does this PR introduce _any_ user-facing change?
This PR will display the accuracy test report of the release version in
docs/source/developer_guide/accuracy_report:
Qwen2.5-7B-Instruct.md
Qwen3-8B.md
Llama-3.1-8B-Instruct.md
Qwen2.5-VL-7B-Instruct.md

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-04-30 14:53:58 +08:00
Li Wang
90aabaeb2e [Doc] Add benchmark guide (#635)
### What this PR does / why we need it?
 Add benchmark developer guide

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-04-30 09:17:59 +08:00
wangxiyuan
0dae55a9a3 [MISC] fix format check error (#654)
This PR makes format.sh work as expected.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-29 11:14:19 +08:00
Yikun Jiang
d39855b075 Update installation and tutorial doc (#711)
### What this PR does / why we need it?
Update installation and tutorial doc

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-28 21:52:17 +08:00
wangxiyuan
5995d23532 [Doc] Add 0.8.4rc2 release note (#705)
Add 0.8.4rc2 release note

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-28 21:51:35 +08:00
Li Wang
58f9d932d3 [Doc] Update faqs (#699)
### What this PR does / why we need it?
Update FAQs to make them clearer


Signed-off-by: wangli <wangli858794774@gmail.com>
2025-04-28 18:48:23 +08:00
Li Wang
d0a0c81ced [Doc] Add deepseek-v2-lite w8a8 quantization tutorial (#630)
### What this PR does / why we need it?
Add deepseek-v2-lite w8a8 quantization tutorial

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-04-28 17:14:26 +08:00
wangxiyuan
5de3646522 [MISC] Make vllm version configurable (#651)
Sometimes, users install a dev/editable version of vllm. In this case, we
should make sure vllm-ascend works as well.

This PR adds a new env `VLLM_VERSION`. It's intended for developers who edit
vllm; they should set this env to declare which vllm version is installed
and used.
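A sketch of the intended usage (the version string is an example; match it to the vllm you actually installed):

```bash
# Hypothetical workflow: declare the vllm version your editable install provides,
# then build vllm-ascend against it.
export VLLM_VERSION=0.8.4
pip install -e .
```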

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-28 14:19:06 +08:00
Yikun Jiang
413657ae43 [FOLLOWUP][DOC] Fix pip install cmd in installation.md (#680)
### What this PR does / why we need it?
Fix pip install cmd in installation.md

Followup on: https://github.com/vllm-project/vllm-ascend/pull/661

### Does this PR introduce _any_ user-facing change?
No, doc only

### How was this patch tested?
Preview

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-27 18:37:25 +08:00
Yikun Jiang
2e20797934 [BUILD] Upgrade torch-npu to 2.5.1 (#661)
### What this PR does / why we need it?
torch-npu 2.5.1 is published:
https://pypi.org/project/torch-npu/2.5.1/
It's time to remove all torch-npu dev versions from the vllm-ascend code base.
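Upgrading an existing environment is then simply (a sketch):

```bash
# Replace any pinned dev2025* build with the released wheel
pip install --upgrade torch-npu==2.5.1
```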

### Does this PR introduce _any_ user-facing change?
Yes, using torch-npu 2.5.1

### How was this patch tested?
- [ ] CI passed
- [ ] Manually test
- [ ] Grep all `dev2025`

---------

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-27 17:28:29 +08:00
wangxiyuan
c99c4c8c70 [Doc] Update feature support list (#650)
1. Remove the Chinese doc. The content is out of date and we don't have
enough time to maintain it.
2. Update the feature support matrix: refresh the content and add V1 status.

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-26 10:27:29 +08:00
RongRongStudio
848e041a54 Using EvalScope evaluation (#611)
### What this PR does / why we need it?
Use EvalScope for evaluation (including accuracy eval and stress test); a sketch follows the links:
- https://evalscope.readthedocs.io/en/latest/user_guides/stress_test/quick_start.html#basic-usage
- https://evalscope.readthedocs.io/en/latest/get_started/basic_usage.html#model-api-service-evaluation
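A hedged sketch of a stress-test run against a local OpenAI-compatible endpoint (flags and dataset adapted from the linked quick start; model name and port are placeholders):

```bash
evalscope perf \
  --url "http://localhost:8000/v1/chat/completions" \
  --api openai \
  --model Qwen2.5-7B-Instruct \
  --dataset openqa \
  --parallel 1 --number 15
```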

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Test locally

---------

Signed-off-by: RongRongStudio <82669040+RongRongStudio@users.noreply.github.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-23 00:50:09 +08:00
Shuqiao Li
ad845bfe82 fix doc to mention env setting for v0.7.3-dev (#602)
### What this PR does / why we need it?
fix doc to mention env setting for v0.7.3-dev

Signed-off-by: Shuqiao Li <celestialli@outlook.com>
2025-04-22 14:11:41 +08:00
Mengqing Cao
c5850d302d [Doc] Update installation (#596)
Many users face a failed installation when using `pip install -e .`;
this is mainly introduced by the released `torch-npu` version conflicting
with `torch>=2.5.1`. The conflict mainly exists in the temporary env of the
pyproject build.
This PR updates the installation tutorial to use `python setup.py develop`
as a quick fix, as sketched below.
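A minimal sketch of the workaround (run inside your target environment):

```bash
# Build in the current environment instead of pip's isolated pyproject build env,
# where the torch-npu / torch>=2.5.1 conflict occurs.
git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
python setup.py develop
```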

cc @wangxiyuan

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-04-22 09:04:20 +08:00
Yikun Jiang
12cae04db9 [quantization] Support w8a8 quantization (#580)
### What this PR does / why we need it?

Add a `VLLMAscendQuantizer` to support w8a8 static (W8A8) and dynamic
quantization on linear and moe layers (W8A8_DYNAMIC). The quantizer is
enabled if a model has a [quantize
field](https://huggingface.co/vllm-ascend/Qwen2.5-0.5B-Instruct-w8a8/blob/main/config.json#L27).
If MindIE Turbo is installed, the MindIE Turbo Quantizer is applied;
otherwise VLLMAscendQuantizer is used directly.

- This patch fixes installation docs so that installation works
- This patch enables norm quantization by patching `RMSNorm.__init__`,
`RMSNorm.forward_oot`, and `NPUModelRunnerBase.load_model`
- Add `AscendW8A8LinearMethod` for W8A8
- Add `AscendW8A8DynamicLinearMethod` and
`AscendW8A8DynamicFusedMoEMethod` for W8A8_DYNAMIC
- Add an e2e test for `vllm-ascend/Qwen2.5-0.5B-Instruct-w8a8`

### Does this PR introduce _any_ user-facing change?
Yes, w8a8 quantization is supported. After this patch, users can use
commands like the one below to run w8a8 models:

```bash
vllm serve /root/.cache/modelscope/hub/Qwen/Qwen2.5-7B-Instruct-w8a8 --served-model-name "qwen2.5-7B"
```

### How was this patch tested?
0. CI passed: add e2e test for `vllm-ascend/Qwen2.5-0.5B-Instruct-w8a8`
1. From @Yikun:
I tested Qwen2.5-0.5B-Instruct-w8a8; the functional test all went well. Please
refer to
https://github.com/vllm-project/vllm-ascend/pull/580#issuecomment-2816747613

2. From @dingdingchaomian:
Tested with the qwen2.5-72b-instruct and deepseek-v2-lite-chat models; both
models were quantized using Ascend's msmodelslim tool:
- Qwen2.5-72b-instruct was tested twice, once for w8a8 static and once
for w8a8 dynamic.
- Deepseek-v2-lite-chat was tested once because its quantization uses
both static and dynamic w8a8.

Models were tested using both offline inference and online serving, and
both work well. The inference code is exactly the same as the
examples in
https://vllm-ascend.readthedocs.io/en/latest/quick_start.html, with the
model path and tensor parallel number changed.

---------

Signed-off-by: dingdingchaomian <wangce21@huawei.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: dingdingchaomian <wangce21@huawei.com>
Co-authored-by: Angazenn <zengyanjia@huawei.com>
Co-authored-by: liujiaxu <liujiaxu4@huawei.com>
Co-authored-by: ApsarasX <apsarax@outlook.com>
Co-authored-by: ganyi1996ppo <pleaplusone.gy@gmail.com>
2025-04-20 18:14:05 +08:00
Shanshan Shen
985b0548b0 [Doc] Update v0.8.4 release note, add contents for structured output feature (#576)
### What this PR does / why we need it?
Update v0.8.4 release note:

- Add contents for structured output feature.
- Remove redundant `(` in spec decoding.

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
Preview

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-18 17:44:16 +08:00
Mengqing Cao
2c903bc7ac [Doc] Update doc for custom ops build (#570)
- update the doc about custom ops compilation

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-04-18 15:35:10 +08:00
Mengqing Cao
b91f9a5afd [Doc][Build] Update build doc and faq (#568)
Update build doc and faq about deepseek w8a8

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-04-18 14:16:41 +08:00
wangxiyuan
e66ded5679 [Doc] Add release note for 0.8.4rc1 (#557)
Add release note for 0.8.4rc1, we'll release 0.8.4rc1 now.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-18 13:24:36 +08:00
Shanshan Shen
7eeff60715 [Doc] Update FAQ doc (#561)
### What this PR does / why we need it?
Update FAQ doc to make `docker pull` usage clearer


Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-18 13:13:13 +08:00
Mengqing Cao
b71f193cb0 [Model][Doc] Update model support list (#552)
Update model support list
cc @Yikun plz help review, thanks!

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-04-17 19:32:20 +08:00
hfadzxy
9935d45728 [CI]Add model basic accuracy test(Qwen2.5-0.5B-Instruct) (#460)
### What this PR does / why we need it?
Add model basic accuracy test(Qwen2.5-0.5B-Instruct)

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-04-17 14:59:56 +08:00
Li Wang
64fdf4cbef [Doc]Update faq (#536)
### What this PR does / why we need it?
update performance and accuracy faq

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-04-17 14:56:51 +08:00
hfadzxy
00de2ee6ad [Doc] update faq about progress bar display issue (#538)
### What this PR does / why we need it?
update faq about progress bar display issue

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
2025-04-16 16:07:08 +08:00
Mengqing Cao
fe13cd9ea5 [Doc] update faq about w8a8 (#534)
update faq about w8a8

---------

Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-04-16 09:37:21 +08:00
wangxiyuan
bbe7ccd366 [MISC] Add patch module (#526)
This PR adds a patch module for vllm:
1. platform patch: the patch is registered when the platform is loaded
2. worker patch: the patch is registered when the worker is started

The details are:
1. patch_common: patch for both main and the 0.8.4 version
2. patch_main: patch for the main version
3. patch_0_8_4: patch for the 0.8.4 version
2025-04-16 09:28:58 +08:00
Shanshan Shen
bcbc04f92b [Doc] Add environment variables doc (#519)
### What this PR does / why we need it?
Add environment variables doc.
---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-15 16:09:36 +08:00
wangxiyuan
5c6d79687c [Doc] Update FAQ (#518)
Update FAQ

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-15 10:17:56 +08:00
wangxiyuan
5fa70b6393 [Build] Update doc (#509)
1. Install torch-npu before vllm-ascend to ensure the custom ops build
succeeds.
2. Set `COMPILE_CUSTOM_KERNELS=0` if users want to disable the custom ops
build (a sketch follows below).
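A sketch of point 2, assuming a source checkout (the variable is read at build time):

```bash
# Skip compiling custom ops during a source install
COMPILE_CUSTOM_KERNELS=0 pip install -e .
```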

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-14 14:38:50 +08:00
Shanshan Shen
11ecbfdb31 [Doc] Update FAQ doc (#504)
### What this PR does / why we need it?
Update FAQ doc.
---------

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-04-14 11:11:40 +08:00
wangxiyuan
9c7428b3d5 [CI] enable custom ops build (#466)
### What this PR does / why we need it?
This PR enables the custom ops build by default.

### Does this PR introduce _any_ user-facing change?

Yes, installing vllm-ascend from source now triggers the custom ops
build step.

### How was this patch tested?
By image build and e2e CI

---------

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-12 10:24:53 +08:00
Icey
d05ea17427 Add openEuler based container image for vLLM Ascend (#489)
### What this PR does / why we need it?

Provide users with openEuler-based vLLM images and update the quick
start readme accordingly (image pull sketched below)
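Pulling the openEuler-based image might look like this (tag is hypothetical; check the quick start for the current one):

```bash
docker pull quay.io/ascend/vllm-ascend:v0.7.3-openeuler
```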

### Does this PR introduce _any_ user-facing change?

None

### How was this patch tested?

There is no need to perform any test.

---------

Signed-off-by: Icey <1790571317@qq.com>
2025-04-10 14:30:49 +08:00
jinyuxin
5d6239306b [DOC] Update multi_node.md (#468)
### What this PR does / why we need it?
- Added instructions for verifying multi-node communication environment.
- Included explanations of Ray-related environment variables for
configuration.
- Provided detailed steps for launching services in a multi-node
environment (see the Ray bring-up sketch below).
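A minimal sketch of the Ray cluster bring-up those steps describe (HEAD_IP is a placeholder; see multi_node.md for the Ascend-specific environment variables):

```bash
# On the head node:
ray start --head --port=6379
# On every worker node, pointing at the head:
ray start --address="$HEAD_IP:6379"
# Verify all nodes have joined before launching vllm serve:
ray status
```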
### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
manually tested.

Signed-off-by: jinyuxin <jinyuxin2@huawei.com>
2025-04-08 14:19:57 +08:00
hfadzxy
94bf9c379e [Doc]Add developer guide for using lm-eval (#456)
### What this PR does / why we need it?
Add developer guide for using lm-eval
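As a flavor of what the guide covers, a hedged sketch of scoring a vLLM-served model with lm-eval's OpenAI-compatible backend (model, port, and task are placeholders):

```bash
# Assumes `vllm serve Qwen/Qwen2.5-0.5B-Instruct` is already running on :8000
lm_eval --model local-completions \
  --model_args model=Qwen/Qwen2.5-0.5B-Instruct,base_url=http://localhost:8000/v1/completions \
  --tasks gsm8k
```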

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
test manually

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-04-01 23:43:51 +08:00
Yikun Jiang
c42e21a5aa [Docs] Add install system dependencies in install doc (#438)
### What this PR does / why we need it?
Add install system dependencies in install doc

Resolve:
```
$ pip install vllm==v0.7.3
CMake Error at CMakeLists.txt:14 (project):
  No CMAKE_CXX_COMPILER could be found.
  Tell CMake where to find the compiler by setting either the environment
  variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.
// ... ...
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for vllm
Failed to build vllm
ERROR: Failed to build installable wheels for some pyproject.toml based projects (vllm)
```
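The underlying fix is installing a C/C++ toolchain before building; on Ubuntu this is roughly (package list assumed, not quoted from the doc):

```bash
apt-get update
apt-get install -y gcc g++ cmake libnuma-dev
```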

Closes: https://github.com/vllm-project/vllm-ascend/issues/439 


### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-31 14:17:55 +08:00
hfadzxy
7beb4339dc [Doc]Add developer guide for using OpenCompass (#368)
### What this PR does / why we need it?
Add developer guide for using OpenCompass

### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?

test manually

---------

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-31 00:24:25 +08:00
wangxiyuan
ca8b1c3e47 [Doc] Add 0.7.3rc2 release note (#419)
Add 0.7.3rc2 release note. We'll release 0.7.3rc2 right now.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-03-29 09:02:08 +08:00
Tony
b1557abab6 fix multistep bug, remove useless code (#355)
1. Remove useless code in attention.py
2. Multistep now uses StatefulModelInputForNPU instead of
StatefulModelInput

Signed-off-by: new-TonyWang <wangtonyyu222@gmail.com>
2025-03-28 09:55:35 +08:00
Zhenyu Zheng
0b5a9643fd Add an example for user stories (#399)
Add an example for user stories and fix some typos

Add a new section, user stories, to the docs to collect user stories of
vllm-ascend; also add an example and an issue template for collecting user
stories

Signed-off-by: Zhenyu Zheng <zheng.zhenyu@outlook.com>
2025-03-26 16:25:57 +08:00
Mengqing Cao
d4accf4ec2 [Doc][Model] update LLaVA 1.6 support (#373)
update LLaVA 1.6 support

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-03-26 09:07:55 +08:00
Mengqing Cao
6295d2e9bc [CI/Build][Doc] upgrade torch-npu to 0320 (#392)
### What this PR does / why we need it?
This PR upgrades torch-npu to the 0320 build so that #321 and
https://github.com/vllm-project/vllm-ascend/issues/267#issuecomment-2745045743
can be fixed; #372 should be reverted after this PR.

### Does this PR introduce _any_ user-facing change?
upgrade torch-npu to 0320

### How was this patch tested?
Tested locally with long-sequence inference.

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
2025-03-26 09:04:12 +08:00
Shanshan Shen
3fb3b5cf75 [Doc] Update model support doc (add QwQ-32B) (#388)
### What this PR does / why we need it?

Update model support doc (add QwQ-32B)


Signed-off-by: Shanshan Shen <87969357+shen-shanshan@users.noreply.github.com>
2025-03-25 11:40:50 +08:00
Shanshan Shen
c06af8b2e0 [V1][Core] Add support for V1 Engine (#295)
### What this PR does / why we need it?
Add support for V1 Engine.

Please note that this is just the initial version; there may be some
places that need to be fixed or optimized in the future. Feel free to leave
comments for us.

### Does this PR introduce _any_ user-facing change?

To use the V1 Engine on an NPU device, you need to set the env variables
shown below:

```bash
export VLLM_USE_V1=1
export VLLM_WORKER_MULTIPROC_METHOD=spawn
```

If you are using vllm for offline inference, you must add a `__main__`
guard like:

```python
if __name__ == '__main__':

    llm = vllm.LLM(...)
```

Find more details
[here](https://docs.vllm.ai/en/latest/getting_started/troubleshooting.html#python-multiprocessing).

### How was this patch tested?
I have tested the online serving with `Qwen2.5-7B-Instruct` using this
command:

```bash
vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240
```

Query the model with input prompts:

```bash
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "prompt": "The future of AI is",
        "max_tokens": 7,
        "temperature": 0
    }'
```

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: didongli182 <didongli@huawei.com>
2025-03-20 19:34:44 +08:00
Shanshan Shen
441a62e937 [Doc] Fix bugs of installation doc and format tool (#330)
### What this PR does / why we need it?
Fix bugs of installation doc and format tool.

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
no.

Signed-off-by: shen-shanshan <467638484@qq.com>
2025-03-14 10:21:35 +08:00
wangxiyuan
c25631ec7b [Doc] Add the release note for 0.7.3rc1 (#285)
Add the release note for 0.7.3rc1

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-03-13 17:57:06 +08:00
Li Wang
41aba1cfc1 [Doc]Fix tutorial doc expression (#319)
Fix tutorial doc expression

Signed-off-by: wangli <wangli858794774@gmail.com>
2025-03-13 15:24:05 +08:00
xiemingda
59ea23d0d3 [Doc] Add Single NPU (Qwen2.5-VL-7B) tutorial (#311)
Run vllm-ascend on Single NPU

What this PR does / why we need it?
Add a vllm-ascend inference/serving tutorial doc for the
Qwen/Qwen2.5-VL-7B-Instruct model

Does this PR introduce any user-facing change?
no

How was this patch tested?
no

Signed-off-by: xiemingda <xiemingda1002@gmail.com>
2025-03-12 20:37:12 +08:00
Yikun Jiang
007aeaa48b [Doc] Change distributed_executor_backend to mp (#287)
### What this PR does / why we need it?
Fix `ValueError: Unrecognized distributed executor backend tp. Supported
values are 'ray', 'mp' 'uni', 'external_launcher' or custom ExecutorBase
subclass.`
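That is, the docs now pass a supported value explicitly; a sketch (model is a placeholder):

```bash
# 'mp' (multiprocessing) is one of the supported backends listed in the error
vllm serve Qwen/Qwen2.5-7B-Instruct --distributed-executor-backend mp
```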

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Test on my local node

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-10 11:27:26 +08:00
Yikun Jiang
38334f5daa [Docs] Re-arch on doc and make QwQ doc work (#271)
### What this PR does / why we need it?
Re-arch the tutorials; move single NPU / multi NPU / multi node to the index.
- Unify docker run cmd
- Use a dropdown to hide the build-from-source installation doc
- Re-arch tutorials to include Qwen/QwQ/DeepSeek
- Make the QwQ doc work

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI test



Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-03-10 09:27:48 +08:00