xc-llm-ascend

Author	SHA1	Message	Date
Yizhou	ec1d2b5c04	[Test] Temporarily skip flaky ACL graph test (#3577 ) ### What this PR does / why we need it? Disables the `FULL_DECODE_ONLY` end-to-end test, which fails intermittently. This prevents CI blockages while the root cause of the flakiness is investigated. ### Does this PR introduce _any_ user-facing change? None. ### How was this patch tested? None needed. - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>	2025-10-21 17:16:15 +08:00
lilinsiman	70bef33f13	add new accuracy test case for aclgraph (#3390 ) ### What this PR does / why we need it? Add new accuracy test case Deepseek-V2-Lite-W8A8 for aclgraph ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-10-20 20:04:04 +08:00
panchao-hub	1756efa5fd	[Feat][Graph] Support FULL_DECODE_ONLY mode for MLA models (#3125 ) ### What this PR does / why we need it? Adds support for capturing the Multi-head Latent Attention (MLA) decode operation into an ACL graph. This improves performance by compiling the attention kernel for single-token decoding. Key changes include: - Implementing the graph capture logic for the MLA kernel, including workspace management and parameter updates. - Modifying the rotary embedding (RoPE) handling to use pre-allocated tensors, which is a requirement for graph capture. - Adding a `build_for_graph_capture` method to the MLA metadata builder to create dummy metadata during the graph compilation phase. Known issues: - Currently, MTP is not supported in FULL_DECODE_ONLY mode; we're working on a fix. - We are preparing to remove update_mla_attn_params with auto_dispatch_capture. ### Does this PR introduce _any_ user-facing change? compilation_config={ "cudagraph_mode": "FULL_DECODE_ONLY", }, ### How was this patch tested? - vLLM version: v0.11.0 --------- Signed-off-by: panchao-hub <315134829@qq.com> Signed-off-by: p00465316 <panchao13@huawei.com> Co-authored-by: p00465316 <panchao13@huawei.com> Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>	2025-10-10 16:31:20 +08:00
wangxiyuan	fef18b60bc	Refactor e2e CI (#2276 ) Refactor E2E CI to make it clearer and faster: 1. Remove some useless e2e tests 2. Remove some useless functions 3. Make sure all tests run with VLLMRunner to avoid OOM errors 4. Make sure all ops tests end with torch.empty_cache to avoid OOM errors 5. Run the tests one by one to avoid resource limit errors - vLLM version: v0.10.1.1 - vLLM main: `a344a5aa0a` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-09-02 09:02:22 +08:00
22dimensions	9e65da990e	[Misc] Add warning for incompatible Ray backend with ACL Graph mode (#2132 ) ### What this PR does / why we need it? cherry-pick #1501 from 0.9.1-dev to main Currently, Ray is not compatible with ACL Graph, so we need to fall back to eager mode when using the Ray backend. co-authored: Yizhou Liu <liu_yizhou@outlook.com> - vLLM version: v0.10.0 - vLLM main: `2836dd73f1` Signed-off-by: 22dimensions <waitingwind@foxmail.com>	2025-08-01 09:06:09 +08:00
Mengqing Cao	574fe407eb	[1/N][CustomOp] Register activation customop instead of overwrite forward_oot (#1841 ) ### What this PR does / why we need it? Starting with this PR, we'll refactor `CustomOp` in vllm-ascend. Use `CustomOp.register_oot` to register custom ops, taking `AscendQuickGELU` as an example: ```python from vllm.model_executor.custom_op import CustomOp from vllm_ascend.ops.activation import AscendQuickGELU CustomOp.register_oot(_decorated_op_cls=AscendQuickGELU, name="QuickGELU") ``` This is a quick adaptation to the `CustomOp.register_oot` mechanism from vllm 0.9.2. As a next step, we can stop inheriting from `QuickGELU` and write our own `QuickGELU` entirely. Part of https://github.com/vllm-project/vllm-ascend/pull/1647 - vLLM version: v0.9.2 - vLLM main: `8dfb45ca33` --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-07-18 23:07:14 +08:00
wangxiyuan	787010a637	[Test] Remove VLLM_USE_V1 in example and tests (#1733 ) V1 is enabled by default, so there is no need to set it by hand now. This PR removes the useless setting in examples and tests. - vLLM version: v0.9.2 - vLLM main: `9ad0a4588b` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-15 12:49:57 +08:00
Mengqing Cao	cc210f46e6	[AscendScheduler][Bugfix] Remove num_draft_tokens while allocating slots (#1718 ) ### What this PR does / why we need it? Now there is no need to calculate `num_draft_tokens` when allocating slots. This PR follows the changes in vllm: https://github.com/vllm-project/vllm/pull/20701 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with existing test - vLLM version: v0.9.2 - vLLM main: `cc876d0f29` --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-07-10 18:47:45 +08:00
ApsarasX	c58accc15e	[Bugfix] Support Qwen3-MOE on aclgraph mode (#1381 ) ### What this PR does / why we need it? Fix the shape of the `npu_moe_init_routing` input parameters to support aclgraph mode on qwen3-moe In addition to this PR, resolving the `gatherv3` error might be necessary. See related PR https://github.com/vllm-project/vllm-ascend/pull/1297 https://github.com/vllm-project/vllm-ascend/pull/1446 Thanks to @yiz-liu for providing the idea ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tested on Qwen3-30B-A3B Closes: https://github.com/vllm-project/vllm-ascend/issues/1368 --------- Signed-off-by: ApsarasX <apsarax@outlook.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-07-06 15:29:36 +08:00
wangxiyuan	69b817ed65	[CI] Add unit test framework (#1201 ) This PR adds the unit test framework to enable UT for vLLM Ascend. Unit tests run on CPU machines. Like the e2e tests, they run once the lint check passes. For unit tests, this PR creates a new folder called `ut` under the `tests` module. The test files in `ut` should mirror the layout of the code in `vllm-ascend`, and each file name should start with the `test_` prefix. For example, this PR adds `test_ascend_config.py` to test `ascend_config.py`. A new file, `worker/test_worker_v1.py`, is also added as a placeholder; it should become the unit test for `vllm-ascend/worker/worker_v1.py`. Additionally, a new `fake_weight` folder is added; it contains the config.json from `facebook/opt-125m` so that the tests do not always need to access Hugging Face. TODO: Add the remaining unit test files one by one in the future. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-06-16 18:32:28 +08:00

10 Commits
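Commit 9e65da990e falls back to eager mode when the Ray backend is selected, because Ray is not compatible with ACL Graph. A minimal sketch of such a guard follows. `resolve_enforce_eager` is a hypothetical helper: its argument names mirror vLLM's `distributed_executor_backend` and `enforce_eager` options, but this is not the actual vllm-ascend code.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_enforce_eager(distributed_executor_backend: Optional[str],
                          enforce_eager: bool) -> bool:
    """Hypothetical guard: force eager mode when the Ray backend is selected.

    Assumes, as the commit states, that Ray is not compatible with ACL Graph.
    Returns the effective ``enforce_eager`` value.
    """
    if distributed_executor_backend == "ray" and not enforce_eager:
        logger.warning(
            "Ray backend is not compatible with ACL Graph mode; "
            "falling back to eager mode.")
        return True
    return enforce_eager


print(resolve_enforce_eager("ray", False))  # True (graph mode disabled)
print(resolve_enforce_eager("mp", False))   # False (graph mode kept)
```

Warning and falling back, rather than raising, keeps Ray deployments usable at the cost of graph-mode speedups.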
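The `CustomOp.register_oot` example in commit 574fe407eb replaces an in-tree op with a platform-specific class registered under the in-tree op's name. The simplified registry below illustrates that lookup-by-name idea only. `CustomOpBase` and the toy `forward` bodies are hypothetical assumptions and do not reproduce vLLM's implementation.

```python
from typing import Dict


class CustomOpBase:
    """Hypothetical, simplified stand-in for a custom-op base class."""

    # Shared registry: in-tree op name -> out-of-tree replacement class.
    op_registry_oot: Dict[str, type] = {}

    def __new__(cls, *args, **kwargs):
        # Instantiate the registered replacement (if any) instead of cls.
        target = CustomOpBase.op_registry_oot.get(cls.__name__, cls)
        return super().__new__(target)

    @classmethod
    def register_oot(cls, op_cls: type, name: str) -> type:
        """Register op_cls as the replacement for the op called `name`."""
        CustomOpBase.op_registry_oot[name] = op_cls
        return op_cls


class QuickGELU(CustomOpBase):
    def forward(self, x: float) -> str:
        return "native QuickGELU"


class AscendQuickGELU(QuickGELU):
    def forward(self, x: float) -> str:
        return "Ascend QuickGELU"


CustomOpBase.register_oot(AscendQuickGELU, name="QuickGELU")
op = QuickGELU()
print(type(op).__name__)  # AscendQuickGELU
print(op.forward(1.0))    # Ascend QuickGELU
```

Because the replacement subclasses the in-tree op, model code that constructs `QuickGELU()` needs no changes; the swap happens at instantiation time.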
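The unit-test layout convention in commit 69b817ed65 (files under `tests/ut` mirror the source tree and carry a `test_` prefix) amounts to a simple path mapping. `ut_path_for` below is a hypothetical helper, not part of vllm-ascend, and assumes paths are given relative to the package root.

```python
from pathlib import PurePosixPath


def ut_path_for(source_path: str) -> str:
    """Hypothetical helper: map a module path (relative to the package root)
    to its mirrored unit-test path under tests/ut."""
    src = PurePosixPath(source_path)
    # Keep the directory structure and prefix the file name with "test_".
    return str(PurePosixPath("tests", "ut", *src.parent.parts,
                             "test_" + src.name))


print(ut_path_for("ascend_config.py"))     # tests/ut/test_ascend_config.py
print(ut_path_for("worker/worker_v1.py"))  # tests/ut/worker/test_worker_v1.py
```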