xc-llm-ascend

Author	SHA1	Message	Date
Yikun Jiang	d7e1110c8e	Re-patch TritonPlaceholder on main to make CI happy (#753 ) ### What this PR does / why we need it? Re-patch TritonPlaceholder on main to make CI happy - Add triton patch back until https://github.com/vllm-project/vllm/pull/17446 is resolved - Move patch_main before patch_common to resolve minicpm triton import issue - Add `0.8.5` and `0.8.5.post1` to make patch work on all 0.8.5 versions Related: - https://github.com/vllm-project/vllm-ascend/pull/704 - https://github.com/vllm-project/vllm-ascend/pull/690 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? All CI passed, including main Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-05-05 23:22:24 +08:00
wangxiyuan	f8350569e6	[CI] upgrade vllm to 0.8.5 (#715 ) 1. Upgrade vllm to 0.8.5 2. Drop 0.8.4 support 3. Keep doc to 0.8.4rc2 until we release 0.8.5 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-30 09:15:50 +08:00
wangxiyuan	95e7aa4736	[Platform] format platform to make it more clear (#610 ) Platform should only contain functions that are based on vllm. This PR moves the unrelated functions to the right place to make platform clearer. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-30 09:03:10 +08:00
Pleaplusone	e74331a1ed	Add dp initialize patch with hccl backend (#626 ) ### What this PR does / why we need it? Add dp stateless process group initialization path with hccl backend as vllm-ascend patch. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? --------- Signed-off-by: ganyi <pleaplusone.gy@gmail.com>	2025-04-23 15:47:51 +08:00
wangxiyuan	538a69c145	[Patch] format patch module to make it more clear (#601 ) Format patch module to make it clearer. Add the patch doc description; new patches must follow this guide. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-22 14:13:00 +08:00
Pleaplusone	d12a057df8	Add note for deepseek related docs and remove unnecessary comments (#590 ) ### What this PR does / why we need it? Add notes for deepseek's patch and remove some of the unnecessary comments --------- Signed-off-by: ganyi <pleaplusone.gy@gmail.com>	2025-04-22 09:59:09 +08:00
Shuqiao Li	5442b463fd	add doc for patch_config (#574 ) ### What this PR does / why we need it? add doc for patch_config ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No code changed. Signed-off-by: Shuqiao Li <celestialli@outlook.com>	2025-04-21 10:33:38 +08:00
Pleaplusone	1a1f9a6d89	port deepseekv2 and mtp to main branch (#429 ) ### What this PR does / why we need it? This PR ports all the deepseek graph mode code and mtp code from v0.7.3 to the main branch --------- Signed-off-by: SidaoY <1024863041@qq.com> Signed-off-by: linfeng-yuan <1102311262@qq.com> Signed-off-by: Yizhou Liu <liuyizhou5@h-partners.com> Signed-off-by: mengwei805 <mengwei25@huawei.com> Signed-off-by: libaokui <libaokui@huawei.com> Signed-off-by: q00832892 <qiaoyang19@huawei.com> Signed-off-by: ganyi <pleaplusone.gy@gmail.com> Co-authored-by: SidaoY <1024863041@qq.com> Co-authored-by: linfeng-yuan <1102311262@qq.com> Co-authored-by: Yizhou Liu <liuyizhou5@h-partners.com> Co-authored-by: mengwei805 <mengwei25@huawei.com> Co-authored-by: libaokui <libaokui@huawei.com>	2025-04-19 17:38:18 +08:00
Shuqiao Li	84563fc65d	Add sleep mode feature for Ascend NPU (#513 ) ### What this PR does / why we need it? This PR adds a sleep mode feature for vllm-ascend. When sleeping, it mainly does two things: - offload model weights - discard kv cache RLHF tools (such as https://github.com/volcengine/verl and https://github.com/OpenRLHF/OpenRLHF) have a strong need for sleep mode to accelerate the training process. This PR may solve #375 and #320 . ### Does this PR introduce _any_ user-facing change? No existing user interfaces changed. Users will have two new methods (`sleep()` and `wake_up()`) to use. ### How was this patch tested? This PR is tested with Qwen/Qwen2.5-0.5B-Instruct. At first, we have free NPU memory M1. After `llm = LLM("Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)` is executed, we have free NPU memory M2, with M2 < M1. Then we call `llm.sleep(level=1)` and have free NPU memory M3, with M3 > M2 and M3 very close to M1. In addition, the output tokens are the same before sleep and after wake up, given the same input tokens and the config `SamplingParams(temperature=0, max_tokens=10)`. This PR uses the CMake procedure of #371 , thanks a lot. Signed-off-by: Shuqiao Li <celestialli@outlook.com>	2025-04-18 13:11:39 +08:00
wangxiyuan	42c7fbb10e	[Misc] Fix import error and address nits to make CI happy (#563 ) 1. Add `vllm_version_is` function to check vllm version. 2. `ensure_kv_transfer_initialized` and `get_kv_transfer_group` were moved elsewhere in the vllm main branch by `3408e47159`; this patch fixes the import error. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-18 12:23:32 +08:00
wangxiyuan	bbe7ccd366	[MISC] Add patch module (#526 ) This PR adds a patch module for vllm 1. platform patch: the patch is registered when the platform is loaded 2. worker patch: the patch is registered when the worker is started. The details are: 1. patch_common: patch for main and 0.8.4 version 2. patch_main: patch for main version 3. patch_0_8_4: patch for 0.8.4 version	2025-04-16 09:28:58 +08:00
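
The patch layout introduced in #526, the `vllm_version_is` helper from #563, and the ordering change in #753 together suggest a version-gated patch selection. The following is only a minimal sketch of that idea under stated assumptions, not the actual vllm-ascend code: the function takes the installed version as an explicit argument so the example is self-contained, and `select_patches` and the module name `patch_0_8_5` are hypothetical names used for illustration.

```python
# Hypothetical sketch of version-gated patch selection; not the vllm-ascend source.

def vllm_version_is(target: str, installed: str) -> bool:
    """Return True when the installed vLLM version string equals `target`.

    Assumption: the installed version is passed in explicitly so that this
    sketch runs without vLLM being installed.
    """
    return installed == target


def select_patches(installed: str) -> list[str]:
    """Return the patch module names to load, in load order.

    Assumption: released versions get a version-specific patch module
    (hypothetical name `patch_0_8_5`); anything else is treated as main.
    The version-specific or main patch is listed before `patch_common`.
    """
    released = ("0.8.5", "0.8.5.post1")
    if any(vllm_version_is(v, installed) for v in released):
        return ["patch_0_8_5", "patch_common"]
    return ["patch_main", "patch_common"]


if __name__ == "__main__":
    for version in ("0.8.5", "0.8.5.post1", "0.9.0.dev0"):
        print(version, select_patches(version))
```

Matching on exact version strings keeps the check simple, but each post-release has to be listed explicitly, which is consistent with #753 adding both `0.8.5` and `0.8.5.post1`.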

11 Commits