xc-llm-ascend

Author	SHA1	Message	Date
Mengqing Cao	6eddbd2521	[CI/UT][PD Disaggreate] Initialize PD Disaggreate UT (#889 ) Initialize PD Disaggreate UT --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-05-29 10:17:12 +08:00
wangxiyuan	f6e5decc10	[CI] upgrade to vllm 0.9.0 (#959 ) Upgrade to vllm 0.9.0. 0.8.5 will not be supported any more. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-05-28 21:18:41 +08:00
wangxiyuan	e2a0c19cea	[CI] Refactor CI (#952 ) 1. remove some useless test func and file 2. fix format.sh problem 3. enable full test for singlecard and multicard 4. move long term test to long_term folder. For this kind of test, it only runs by labeled and daily test. Include: spec decode、accuracy test ## After refactor: There are 4 test modules - `singlecard`: contains the test running on one NPU. It'll be run for each PR and daily test. - `multicard`: contains the test running on multi NPUs. It'll be run for each PR and daily test. - `long_term`: contains the test that cost much time(Now include `spec decode` and `accuracy` test). It'll be run for the PR with `long-term-test` labeled and daily test. - `e2e`: contains the test for doc and pd feature. It'll be run for the PR with `pd-test` labeled and daily test. ## Todo: 1. some test are skipped, they should be fixed and reenabled in the future. 2. pyhccl test for multicard doesn't work at all. It should be enabled as well. 3. ensure long-term-test pass by daily test. ### Know issue Now, `ready` labels is required to start pd test or long term test. And when `long-term-test` or `pd-test` is labeled after another one, the old labeled test will be re-run again. So the labeled test should be ran in the following step: 1. decide which test need run, then label it. `long-term-test` or `pd-test` or both. 2. add `ready-for-test` label, then the test will be ran. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-05-28 06:31:35 +08:00
Shuqiao Li	01e3d59eae	add workflow to build and release wheel (#775 ) ### What this PR does / why we need it? This is a continuing work of #716. This PR add workflow to build and release wheel, and also release source to PYPI. We have 3 conditions to trigger the workflow: 1. PR to `main` and `-dev` 2. push to `main` and `-dev` 3. push tag with name of `v*` Release to PYPI will only be done under condition 3. Under condition 1 and 2, it will generate .tar.gz and build .whl, upload to github artifacts but will not release. update: Will build .whl and upload to github artifacts with scheduled task. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? All triggered conditions are well tested with my fork repo. --------- Signed-off-by: Shuqiao Li <celestialli@outlook.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-05-26 14:18:26 +08:00
jiangpeng	df58fb80ee	Spec decode support for V1 Engine (#874 ) <!-- Thanks for sending a pull request! BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing/overview.html --> ### What this PR does / why we need it? <!-- - Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. If possible, please consider writing useful notes for better and faster reviews in your PR. - Please clarify why the changes are needed. For instance, the use case and bug description. - Fixes # --> Make spec decode support for V1 Engine - Currently, Ascend does not support the triton kernel. PyTorch is used to rewrite the `rejection_sampler.py` triton kernel. However, PyTorch is not as good as Triton. Therefore, ascend c is used to implement the function in the future. - Currently, spec decode supports only the ngram algorithm. The eagle algorithm needs to be further adapted. ### Does this PR introduce _any_ user-facing change? <!-- Note that it means any user-facing change including all aspects such as API, interface or other behavior changes. Documentation-only updates are not considered user-facing changes. --> Not change user facing. ### How was this patch tested? <!-- CI passed with new added/existing test. If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future. If tests were not added, please describe why they were not added and/or why it was difficult to add. --> test by `tests/singlecard/spec_decode/e2e/test_v1_spec_decode.py` and `tests/sample/test_rejection_sampler.py`, test base function of rejection sampler and e2e function of spec decode. Signed-off-by: ponix-j <657511300@qq.com>	2025-05-23 14:25:46 +08:00
yupeng	0f53b138f6	[V1][LoRA][Test] V1 Engine LoRA support & e2e test (#893 ) ### What this PR does / why we need it? Add V1Engine LoRA support. Add LoRA e2e test on single card and multiple cards. ### Does this PR introduce _any_ user-facing change? support lora for V1 ### How was this patch tested? CI passed with new added test --------- Signed-off-by: jesse <szxfml@gmail.com> Signed-off-by: paulyu <paulyu0307@gmail.com> Signed-off-by: paulyu12 <507435917@qq.com> Co-authored-by: jesse <szxfml@gmail.com> Co-authored-by: paulyu <paulyu0307@gmail.com>	2025-05-22 19:20:51 +08:00
Wan_Danfeng	5cf9ff18e9	[Performance]: Custom AscendC Kernel of Multi-Step Prepare Input (#814 ) ### What this PR does / why we need it? - According to https://github.com/vllm-project/vllm-ascend/issues/807, we pull request for customer ascendc kernel of multi-step. - also a bug we found in multi_step_runner.py is fixed when we use multi-step on V0 Engine. ### Does this PR introduce _any_ user-facing change? no user-facing change ### How was this patch tested? we add Unit Test file and offline inference file to test the custom ascendc kernel. See test/ops/test_multi_step.py and examples/offline_multi_step.py --------- Signed-off-by: wan_danfeng <wonderful199082@126.com>	2025-05-20 09:31:30 +08:00
Yikun Jiang	508242425c	[CI][1/N] Add basic ci for PD disaggregation (#830 ) ### What this PR does / why we need it? Add basic CI for PD disaggregation, and enable it when schedule and label with `module:pd` - Updated `.github/actionlint.yaml` to add a new self-hosted runner configuration: `linux-arm64-npu-static-8`. - Introduced a new GitHub Actions workflow `.github/workflows/vllm_ascend_test_pd.yaml` for PD disaggregation testing: - Scheduled to run daily at 23:00 UTC and triggered by pull request label `module:pd`. - Added steps for baisci installation and other steps will add in followup PR Related: https://github.com/vllm-project/vllm-ascend/issues/841 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - CI passed - No trigger by default <img width="847" alt="image" src="https://github.com/user-attachments/assets/23aa128f-526d-447f-91c8-8ebf6be8400f" /> - Trigger only if we tag with pd <img width="930" alt="image" src="https://github.com/user-attachments/assets/aef1caca-2029-48e8-a6e6-860136adcd37" /> Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-05-14 18:04:16 +08:00
Yikun Jiang	59e02502b1	[CI] Add e2e test frame work and doctest (#730 ) ### What this PR does / why we need it? Add quickstart doctest CI ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? - CI passed - Run `/vllm-ascend/tests/e2e/run_doctests.sh` Related: https://github.com/vllm-project/vllm-ascend/issues/725 Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-05-14 09:27:54 +08:00
wangxiyuan	6193ba679b	[CI] add codespell CI and fix format.sh (#827 ) 1. Fix format check error to make format.sh work 2. Add codespell check CI 3. Add the missing required package for vllm-ascend. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-05-12 22:04:48 +08:00
hfadzxy	4a2505f81f	[accuracy test]Update cann version and huggingface-hub version for Qwen3 (#823 ) ### What this PR does / why we need it? 1. update cann version to 8.1.0 for multimodal 2. fix huggingface-hub version to adapt to qwen3 3. change Qwen3-8B to Qwen-8B-Base, Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-05-12 19:12:48 +08:00
Li Wang	8e4e791fcd	[CI] Add deepseek-v2-lite test (#631 ) ### What this PR does / why we need it? Add deepseek-v2-lite test, part of #499 --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-05-12 14:59:17 +08:00
wemaster	19c8e134e4	[CI/UT] fix spec ut in vllm-ascend main and vllm main (#759 ) ### What this PR does / why we need it? #### 1. fix spec ut in vllm-ascend main and vllm main As https://github.com/vllm-project/vllm-ascend/pull/694 and https://github.com/vllm-project/vllm-ascend/pull/749 verify, Now, vllm-ascend main and vllm 0.8.5, spec UT is happy, but vllm-ascend main and vllm main, CI is fail. I found the reason is a triton bug https://github.com/triton-lang/triton/issues/2266, but i I didn't figure it out that why the bug did not effect vllm-ascend main and vllm 0.8.5, maybe the usage of triton have changed when vllm 0.8.5 to latest main As the bug describe, I changed the minimum block_size in UT from 8 to 16, and the modification is verified locally to be effective. #### 2. modify some case skip form. I modified some commented out cases to skipif form, which is more standardized. ### Does this PR introduce _any_ user-facing change? None ### How was this patch tested? CI Signed-off-by: mengwei805 <mengwei25@huawei.com>	2025-05-10 09:45:56 +08:00
Li Wang	58d2f85c4a	[CI] Fix schedule trigger bug (#757 ) ### What this PR does / why we need it? This PR aims to fix nightly ci [broken](https://github.com/vllm-project/vllm-ascend/actions/runs/14848150987) We have a workflow containing multiple triggers: - push events (to the default branch) - pull requests (against the default branch) - scheduled events Our paths-filter action works great for the first two use-cases, detecting the context and base to compare against. However, it fails for scheduled events giving the error `This action requires 'base' input to be configured or 'repository.default_branch' to be set in the event payload.` For the scheduling trigger event, we choose to skip this filter because we don't need its results: ``` - name: Check for changes in Speculative Decode if: github.event_name != 'schedule' ``` Signed-off-by: wangli <wangli858794774@gmail.com>	2025-05-10 09:45:07 +08:00
Yikun Jiang	5897dc5bbe	[Build] Bump vLLM version to v0.8.5.post1 (#755 ) ### What this PR does / why we need it? Bump vllm version to v0.8.5.post1 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-05-06 11:44:12 +08:00
Yikun Jiang	79538b5d73	Upgrade CANN version to 8.1.rc1 (#747 ) ### What this PR does / why we need it? Make CANN version bump separately from https://github.com/vllm-project/vllm-ascend/pull/708 - Upgrade CANN version to 8.1.rc1 - Add prefix to speed up download `m.daocloud.io/quay.io/ascend/cann:8.1.rc1-910b-ubuntu22.04-py3.10` - Address tail sapce for Dockerfile.openEuler - Add note for `/workspace` and `/vllm-workspace` as followup of https://github.com/vllm-project/vllm-ascend/pull/741 ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? CI passed Co-authored-by: MengqingCao <cmq0113@163.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: MengqingCao <cmq0113@163.com>	2025-05-06 05:44:18 +08:00
Yikun Jiang	d2ead057ae	Re-enable Speculative Decode test for vLLM v0.8.5 (#749 ) ### What this PR does / why we need it? Re-enable Speculative Decode test for vLLM v0.8.5 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-05-02 14:44:48 +08:00
whx	8b194ad12e	[Disaggregated Prefill] P2P Disaggregated Prefill based on llm_datadist (#694 ) ### What this PR does / why we need it? - This PR proposes a P2P version of Disaggregated Prefill based on llm_datadist which manages data transfer. - This solution reconstructs previous offline single-node Disaggregated Prefill solution, and supports multi-node and online serveing now. - Currently this solution supports 1P1D situation of Deepseek hybrid parallelism (P: TP+EP, D: DP+EP). Note that xPyD situation is considered in the solution design, and will be supported soon within v1 engine. --------- Signed-off-by: hw_whx <wanghexiang7@huawei.com> Signed-off-by: ganyi <pleaplusone.gy@gmail.com> Co-authored-by: hw_whx <wanghexiang7@huawei.com> Co-authored-by: ganyi <pleaplusone.gy@gmail.com>	2025-05-01 22:31:36 +08:00
Mengqing Cao	399b03830d	[Build][Bugfix] Fix source code path to avoid reference error (#726 ) ### What this PR does / why we need it? Fix source code path to avoid reference error in docker image fix https://github.com/vllm-project/vllm-ascend/issues/725 Signed-off-by: MengqingCao <cmq0113@163.com>	2025-04-30 17:38:13 +08:00
hfadzxy	affca6f348	[Test] Add accuracy test report workflow (#542 ) ### What this PR does / why we need it? 1. Provide accuracy test report for development branch release. 2. Models and datasets for accuracy test： \| Model \| datasets \| \|---------------------------- \| --------------------------- \| \| Qwen2.5-7B-Instruct \| ceval-val, gsm8k, mmlu \| \| Qwen3-8B \| ceval-val, gsm8k, mmlu \| \| Llama-3.1-8B-Instruct \| ceval-val, gsm8k, mmlu \| \| Qwen2.5-VL-7B-Instruct \| mmmu_val \| ### Does this PR introduce _any_ user-facing change? This PR will display the accuracy test report of the release versionin docs/source/developer_guide/accuracy_report。 Qwen2.5-7B-Instruct.md Qwen3-8B.md Llama-3.1-8B-Instruct.md Qwen2.5-VL-7B-Instruct .md Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-04-30 14:53:58 +08:00
wangxiyuan	f8350569e6	[CI] upgrade vllm to 0.8.5 (#715 ) 1. Upgrade vllm to 0.8.5 2. Drop 0.8.4 support 3. Keep doc to 0.8.4rc2 until we release 0.8.5 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-30 09:15:50 +08:00
Li Wang	7aee9228f0	[CI] Add nightly CI (#668 ) ### What this PR does / why we need it? Add nightly CI for basic function and model usability --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-04-29 16:35:52 +08:00
wemaster	54c0e63df7	[MTP] follow custom deepseek modeling changes to support graph mode (#636 ) <!-- Thanks for sending a pull request! BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing/overview.html --> ### What this PR does / why we need it? As custom deepseek modeling do some changes to support graph mode in https://github.com/vllm-project/vllm-ascend/pull/585, so i follow it to change custom deepseek_mtp modeling. And some modifications for k>1 were not carried over by the https://github.com/vllm-project/vllm-ascend/pull/429, now i add it. In order to better take care of the MTP feature in the vllm-ascend repository, I added cases related to graph mode(torchair), but i skip it since torchair can not correctly clean up memory in vllmrunner. Also i add some case for MTP quantization weights, but test weight is not ready, so i skip it and i will open it when test quant weights is ready. https://github.com/vllm-project/vllm-ascend/pull/648 did not completely fix the sample change(https://github.com/vllm-project/vllm-ascend/issues/660) issue, I added the relevant changes. ### Does this PR introduce _any_ user-facing change? now, u can use following method to use mtp in deepseek v3/r1 float or quant weights with eager mode. ```python llm = LLM( model="wemaster/deepseek_mtp_main_random_bf16", tensor_parallel_size=2, speculative_config={ "num_speculative_tokens": 1, }, enforce_eager=True, trust_remote_code=True, disable_log_stats=False, gpu_memory_utilization=0.8, max_model_len=64, ) ``` or use mtp in deepseek v3/r1 float or quant weights with graph mode（torchair） ```python llm = LLM( model="wemaster/deepseek_mtp_main_random_bf16", tensor_parallel_size=2, speculative_config={ "num_speculative_tokens": 1, }, trust_remote_code=True, additional_config={ 'enable_graph_mode': True, }, disable_log_stats=False, gpu_memory_utilization=0.8, max_model_len=64, ) ``` add notes: 1. now, we support k>1, so u can set num_speculative_tokens > 1 if there is sufficient redundant computing power; 2. MTP is not supported in V1, we will support it when vLLM does it in https://github.com/vllm-project/vllm/issues/13500. 3. if u run MTP failed by `segmentation fault`, u can follow v0.7.3 patch https://github.com/vllm-project/vllm-ascend/pull/236 file `vllm_ascend/patch/patch_metrics.py` method `__npu_async_metrics_collector_init__` ### How was this patch tested? local tested passed and test by CI Signed-off-by: mengwei805 <mengwei25@huawei.com>	2025-04-28 21:18:53 +08:00
dependabot[bot]	8849cf1eda	Bump actions/setup-python from 5.5.0 to 5.6.0 (#697 ) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.5.0 to 5.6.0. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-04-28 14:06:38 +08:00
Icey	ee7a0e2cd4	Update openEuler dockerfile for COMPILE_CUSTOM_KERNELS=1 (#689 ) ### What this PR does / why we need it? Update openEuler dockerfile for COMPILE_CUSTOM_KERNELS=1 ### Does this PR introduce _any_ user-facing change? No Signed-off-by: Icey <1790571317@qq.com>	2025-04-28 11:45:46 +08:00
Yikun Jiang	2e20797934	[BUILD] Upgrade torch-npu to 2.5.1 (#661 ) ### What this PR does / why we need it? The torch-npu 2.5.1 are published: https://pypi.org/project/torch-npu/2.5.1/ It's time to remove all torch-npu dev version from vllm-ascend code base ### Does this PR introduce _any_ user-facing change? Yes, using torch-npu 2.5.1 ### How was this patch tested? - [ ] CI passed - [ ] Manually test - [ ] Grep all `dev2025` --------- Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-04-27 17:28:29 +08:00
Bug Hunter Yan	05bdcbeae4	support aclgraph (#426 ) <!-- Thanks for sending a pull request! BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing/overview.html --> ### What this PR does / why we need it? <!-- - Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. If possible, please consider writing useful notes for better and faster reviews in your PR. - Please clarify why the changes are needed. For instance, the use case and bug description. - Fixes # --> This PR supports the access of vllm-acend to the piecewise_graph feature provided by the v1 engine. 1. register unifiled_ascend_attention_with_output for piecewise_graph to split graph. 2. support NPUGraph to accelerate kernel launch. ### Does this PR introduce _any_ user-facing change? <!-- Note that it means any user-facing change including all aspects such as API, interface or other behavior changes. Documentation-only updates are not considered user-facing changes. --> support npugraph to default， Users can disenable the npugraph feature by configuring enforce_eager. This has corresponding requirements for the versions of torch_npu and CANN, and they need to support graph capture. ### How was this patch tested? <!-- CI passed with new added/existing test. If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future. If tests were not added, please describe why they were not added and/or why it was difficult to add. --> it turn to default --------- Signed-off-by: Bug Hunter Yan <yanpq@zju.edu.cn> Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com> Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>	2025-04-23 20:56:24 +08:00
wemaster	0ae9ee0f8a	[BUGFIX] main-sd-bugfix && [UT] add mtp UT (#593 ) ### What this PR does / why we need it? The pr will fix some bug about spec decode / MTP The pr add a mtp e2e UT `test_mtp_correctness.py` vllm_ascend/attention/attention.py 1. add support `self.attn_mask_cache` only has 1 element to cover scene in which both spec docode and chunked prefill are enabled. vllm_ascend/distributed/parallel_state.py 1. remove 2 assert because spec decode worker would use init_worker twice vllm_ascend/models/deepseek_mtp.py 1. remove unused params; 2. add support w8a8 in `CustomDeepSeekMTP` vllm_ascend/quantization/quant_config.py 1. use `AscendUnquantizedFusedMoEMethod` instead of `UnquantizedFusedMoEMethod` other 1. replace `from vllm.logger import init_logger` to `from vllm.logger import logger` all of the vllm-ascend project ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Signed-off-by: mengwei805 <mengwei25@huawei.com>	2025-04-21 19:25:51 +08:00
Yikun Jiang	96d6fa7c90	[Docker] Fix openEuler image suffix (#586 ) ### What this PR does / why we need it? There was a bug when we release v0.8.4rc1 (openEuler image tag was wrong set to 0.8.4rc1), according doc of docker-meta-action, it should be append suffix: ``` tags: \| type=pep440,enable=true,priority=900,prefix=,suffix=,pattern=,value= ``` This patch just fix openEuler image suffix to make pep440 tag rule work. This patch also remove the cache step because the cache step bring more than 10mins export, but reduce less time in next trigger. ### Does this PR introduce _any_ user-facing change? Yes, docker image tag set to right ### How was this patch tested? I test with in my fork repo by setting default branch: - release a tag: v0.7.88rc1 (pep440 tag) - The log show `--label org.opencontainers.image.version=v0.7.88rc1-openeuler` is right rule https://github.com/Yikun/vllm-ascend/actions/runs/14560411481/job/40842950165#step:9:205 Related: https://github.com/vllm-project/vllm-ascend/pull/489 Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-04-21 08:55:26 +08:00
wangxiyuan	42c7fbb10e	[Misc] Fix import error and address nits to make CI happy (#563 ) 1. Add `vllm_version_is` function to check vllm version. 2. `ensure_kv_transfer_initialized` and `get_kv_transfer_group ` have been moved to other place in vllm main branch via `3408e47159` , this patch fix the import error. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-18 12:23:32 +08:00
Mengqing Cao	6ee7f5cf71	[SpecDecode] Add spec decode support (#500 ) ### What this PR does / why we need it? Backport: https://github.com/vllm-project/vllm-ascend/pull/252 This support speculative decoding in Ascend, including speculating with a draft model、by matching n-grams in the prompt、using MLP speculators and using EAGLE based draft models. Backport: https://github.com/vllm-project/vllm-ascend/pull/423 spec decode MultiStepWorker support TP1DraftModelRunner fully, support run the draft_model_runner with multi-step prepare on the NPU directly and support draft_model_runner use MLA. 1. before this pr, `MultiStepWorker` would not step into the branch using NPU prepare, but only into the branch using CPU prepare (`line 52` of `vllm_ascend/patch/patch_multi_step_worker.py`). Although this has `no effect` on the `correct operation` of speculative decoding and the performance of the two branches is basically the same as of the current version, I support entering this branch in this PR. In general, there are two main changes in `patch_multi_step_worker.py`: first, the `is_cuda_like()` check is removed and the `TP1DraftModelRunner` rewritten in vllm_ascend is used; second, the `supports_gpu_multi_step()` function is made to return true on NPU devices when outer Multi_step_worker could work correct. 3. before this pr, `TP1DraftModelRunner` only supports Attention on NPU, but not MLA. The relevant adaptation is in `vllm_ascend/worker/draft_model_runner.py`. Although I don’t know why the `input_positions` of `model_input.attn_metadata` in vllm-ascend needs to be added in `execute_model`, it is done in `model_runner.py`, so I also made corresponding changes. Otherwise, when atten_backend is MLA, it will prompt that input_positions cannot be found. 4. I commented out two lines in `draft_model_runner.py` in `line118` to support the scenario of K>1. ``` # lora_mapping=model_input.lora_mapping, # lora_requests=model_input.lora_requests, ``` I added comments. In the future, when vllm-ascend supports lora feature, the changes here can be restored. TODO： - [ ] revert the patch when the related issues are addressed in vllm ### How was this patch tested? CI passed with new added test. - e2e test for medusa proposer: tests/singlecard/spec_decode/e2e/test_medusa_correctness.py - e2e test for mlp proposer: tests/singlecard/spec_decode/e2e/test_mlp_correctness.py - e2e test for n-gram proposer: tests/singlecard/spec_decode/e2e/test_ngram_correctness.py Tests for patched files: - tests/singlecard/spec_decode/test_dynamic_spec_decode.py - tests/singlecard/spec_decode/test_multi_step_worker.py - tests/singlecard/spec_decode/test_ngram_worker.py - tests/singlecard/spec_decode/test_spec_decode_worker.py --------- Signed-off-by: MengqingCao <cmq0113@163.com> Co-authored-by: mengwei805 <mengwei25@huawei.com>	2025-04-17 20:16:32 +08:00
hfadzxy	9935d45728	[CI]Add model basic accuracy test(Qwen2.5-0.5B-Instruct) (#460 ) ### What this PR does / why we need it? Add model basic accuracy test(Qwen2.5-0.5B-Instruct) Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-04-17 14:59:56 +08:00
Li Wang	9859e7313f	[CI]Add global env to runner (#537 ) ### What this PR does / why we need it? - add `HF_TOKEN` as global var to the runner - add `HF_ENDPOINT` as global var to the runner - change concurrency group, rely on current pr num --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-04-17 10:08:00 +08:00
wangxiyuan	434749d299	[CI] update 0.8.3 to 0.8.4 (#528 ) Update 0.8.3 CI to 0.8.4 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-16 09:26:30 +08:00
Li Wang	13480d1238	[CI]Fix workflow (#532 ) ### What this PR does / why we need it? make linux-npu-4 runner run parallel for now Signed-off-by: wangli <wangli858794774@gmail.com>	2025-04-15 19:55:41 +08:00
wangxiyuan	9c7428b3d5	[CI] enable custom ops build (#466 ) ### What this PR does / why we need it? This PR enable custom ops build by default. ### Does this PR introduce _any_ user-facing change? Yes, users now install vllm-ascend from source will trigger custom ops build step. ### How was this patch tested? By image build and e2e CI --------- Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-04-12 10:24:53 +08:00
Icey	d05ea17427	Add openEuler based container image for vLLM Ascend (#489 ) ### What this PR does / why we need it? Provide users with openEuler-based vllm images, so modify the quick start readme ### Does this PR introduce _any_ user-facing change? None ### How was this patch tested? There is no need for performing any test. --------- Signed-off-by: Icey <1790571317@qq.com>	2025-04-10 14:30:49 +08:00
Li Wang	afdbf77483	[CI] Add new runner and enable QwQ multinpu test (#417 ) ### What this PR does / why we need it? - Add a new runner to the continuous integration system and keep the original CI runner until the new runner runs stably - Add distributed test cases ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed --------- Signed-off-by: wangli <wangli858794774@gmail.com>	2025-04-08 16:52:45 +08:00
Yikun Jiang	adabdeea7f	Set numpy < 2.0.0 to resolve numpy VersionConflict (#476 ) ### What this PR does / why we need it? vLLM bumps numpy version to 2.x: `8427f70493` , this will cause a `pip._vendor.pkg_resources.ContextualVersionConflict: (numpy 2.2.4 (/usr/local/python3.10/lib/python3.10/site-packages), Requirement.parse('numpy==1.26.4'), {'vllm-ascend'})` failure when vllm ascend install. This PR resolved the issue by: - Set numpy < 2.0.0 to resolve numpy VersionConflict - Sync requirements and toml - Reorder ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Closes: https://github.com/vllm-project/vllm-ascend/issues/473 Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-04-07 16:07:21 +08:00
Pleaplusone	ce8259975e	[core] Support custom ascendc kernels in vllm-ascend (#233 ) This PR add custom ascendc kernel rotary_embedding support in vllm-ascend, related CMakeLists and setuptools is also added in this PR. Related: https://github.com/vllm-project/vllm-ascend/issues/156 --------- Signed-off-by: ganyi <pleaplusone.gy@gmail.com>	2025-04-03 14:52:34 +08:00
dependabot[bot]	78083d405e	Bump actions/setup-python from 5.4.0 to 5.5.0 (#440 ) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.4.0 to 5.5.0. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-04-01 14:34:33 +08:00
Mengqing Cao	2dbd763584	[CI] Fix mypy CI (#443 ) ### What this PR does / why we need it? Fix CI by updating mypy and pining numpy version _the modification of model_runner_v1 is just to make CI happy_ ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed Signed-off-by: MengqingCao <cmq0113@163.com>	2025-04-01 09:25:33 +08:00
wangxiyuan	b6499ed97d	[CI] Use CI pool (#428 ) Use CI pool instead of self-host for e2e test to speed up CI. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-03-29 12:42:59 +08:00
wangxiyuan	31f29b9f30	[Core] Make V1 work and enable V1 engine test (#389 ) 1. Make sure the version is string before parse in collect_env 2. Add basic V1 engine test Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-03-28 19:34:23 +08:00
Zhenyu Zheng	4804b74e95	Update 110-user-story.yml (#402 ) Fix a few typos in issue template Signed-off-by: Zhenyu Zheng <zheng.zhenyu@outlook.com>	2025-03-27 08:58:57 +08:00
Zhenyu Zheng	0b5a9643fd	Add an example for user stories (#399 ) Add an example for user stories and fix some typo Add a new section, user story in the docs, to collect user stories of llvm-ascend, also add an example and the issue template to collect user story Signed-off-by: Zhenyu Zheng <zheng.zhenyu@outlook.com>	2025-03-26 16:25:57 +08:00
Mengqing Cao	6295d2e9bc	[CI/Build][Doc] upgrade torch-npu to 0320 (#392 ) ### What this PR does / why we need it? This pr upgrades torch-npu to 0320, so that #321, https://github.com/vllm-project/vllm-ascend/issues/267#issuecomment-2745045743 could be fixed, and #372 should be reverted after this pr ### Does this PR introduce _any_ user-facing change? upgrade torch-npu to 0320 ### How was this patch tested? tested locally with long seq inferencing. --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-03-26 09:04:12 +08:00
Mengqing Cao	8996733307	[CI] fix vllm test (#365 ) fix vllm test Signed-off-by: MengqingCao <cmq0113@163.com>	2025-03-24 16:09:06 +08:00
wangxiyuan	663dca7578	[CI] fix race condition problem (#353 ) fix race condition problem Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-03-19 17:04:36 +08:00
Yikun Jiang	18bb8d1f52	Adapt vLLM requirements changes to fix main CI (#279 ) ### What this PR does / why we need it? Adapt vLLM requirements changes: `206e2577fa (diff-01ec17406c969585ed075609a2bbf2f2f4fe3e3def36946694abe6d4eb60a6f2)` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed Signed-off-by: Yikun Jiang <yikunkero@gmail.com>	2025-03-09 16:07:45 +08:00

1 2

75 Commits