xc-llm-ascend

Author	SHA1	Message	Date
rjg-lyh	ebd45b6596	[V0.11.0][Core] Restore scheduling logic under default configuration (#4094) ### What this PR does / why we need it? Cherry-pick #3967 from main branch. This PR reverts the changes introduced in PR #2894. Initially, because the older chunked prefill ops performed poorly, the default behavior was to use the Ascend scheduler, which disabled the chunked prefill feature. Now that the new chunked prefill ops perform better, this interception strategy has been removed. This change also aligns with the community's default configuration behavior. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with newly added and existing tests. Signed-off-by: rjg-lyh <1318825571@qq.com>	2025-11-10 20:02:23 +08:00
lilinsiman	016337eaec	[v0.11.0][UT] Add new ut case for aclgraph enable (#4038) ### What this PR does / why we need it? Add a new UT case for enabling aclgraph. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? UT. Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-11-07 11:35:24 +08:00
zhangxinyuehfad	fdac146f71	[UT] fix skip ut test and enable ut test run normally (#3410) ### What this PR does / why we need it? Fix the skipped UT tests and enable the UT tests to run normally. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: hfadzxy <starmoon_zhang@163.com>	2025-10-20 16:30:57 +08:00
linfeng-yuan	1c5900327b	[refactor] refactor deepseek-related files (#2849) ### What this PR does / why we need it? This PR deletes ~2K lines of DeepSeek modeling code. It falls back from the CustomDeepseekV2 modules to the original vllm implementations and adapts to some changes in vllm related to DeepSeek and MoE. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? E2E vllm serving with torchair graph mode and eager mode. - vLLM version: v0.10.2 - vLLM main: `759ef49b15` --------- Signed-off-by: linfeng-yuan <1102311262@qq.com> Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com> Co-authored-by: yiz-liu <136800916+yiz-liu@users.noreply.github.com> Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>	2025-09-16 14:13:07 +08:00
wangxiyuan	7d6d9449a8	[Misc] Move lora patch file into lora module (#2797) Clean up a useless file in the patch module. Updating the LoRA support list in vLLM Ascend is sufficient; there is no need to patch vLLM. - vLLM version: v0.10.1.1 - vLLM main: `f4962a6d55` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-09-08 21:42:12 +08:00
无脸男	0c0789be74	[Feat] allow using aclgraph in ray backend (#2589) ### What this PR does / why we need it? Allow using aclgraph in the Ray backend, for TP + PP + aclgraph across multiple machines. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.10.1.1 - vLLM main: `4ba0c587ba` Signed-off-by: withHades <244036962@qq.com>	2025-09-04 11:45:56 +08:00
wangxiyuan	f023bd52bf	[CI] Make test_platform UT stable (#2696) Make test_platform stable - vLLM version: v0.10.1.1 - vLLM main: `56d04089ef` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-09-02 18:34:04 +08:00
weijinqian0	6f1047d5fd	[CI] fix UT error. (#2644) `69f46359dd` changed the VL input usage; this PR fixes the related UT failure. - vLLM version: v0.10.1.1 - vLLM main: `d660c98c1b` --------- Signed-off-by: weijinqian_v1 <weijinqian@huawei.com> Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>	2025-08-30 12:04:01 +08:00
Mengqing Cao	6c973361fc	[Bugfix] Fix aclgraph not enabled by default (#2590) ### What this PR does / why we need it? Because vllm sets `cudagraph_mode` to `NONE` before `check_and_update_config` in the post-init of `VllmConfig` (`5da4f5d857/vllm/config/__init__.py (L3630)`), `cudagraph_mode` is never `None`, so we must remove this check and add it back when the related adaptation in vllm is done. Part of https://github.com/vllm-project/vllm-ascend/pull/2577; will add the e2e test on applying reply after the CI refactor is done. ### How was this patch tested? CI passed with existing tests. - vLLM version: v0.10.1.1 - vLLM main: `f48a9af892` Signed-off-by: MengqingCao <cmq0113@163.com>	2025-08-28 14:08:31 +08:00
Mengqing Cao	a9e78a3299	[Aclgraph] Update compilation config in `check_and_update_config` (#2540) ### What this PR does / why we need it? This PR updates the compilation config in `check_and_update_config`: we use `compilation_config.level` to update `compilation_config.cudagraph_mode` to ensure the config is correct. It also sets `compilation_config.cudagraph_num_of_warmups = 1` when V1 is enabled, because this is also used in torchair graph mode. This fixes https://github.com/vllm-project/vllm-ascend/issues/2523 and fixes the bug where `aclgraphmode` is always `NONE` while running forward in aclgraph mode. ### How was this patch tested? CI passed with newly added and existing tests. - vLLM version: v0.10.1.1 - vLLM main: `f58675bfb3` --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-08-27 09:30:25 +08:00
zhanghw0354	b3fdd78a6b	[Main][Refactor]Change ASCEND_QUATIZATION_METHOD to ASCEND_QUANTIZATION_METHOD (#2517) ### What this PR does / why we need it? The constant ASCEND_QUATIZATION_METHOD in vllm_ascend/utils.py is misspelled and should be corrected to ASCEND_QUANTIZATION_METHOD. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed with newly added and existing tests. - vLLM version: v0.10.1.1 - vLLM main: `c9abb10489` Signed-off-by: zhanghaiwen <zhanghaiwen@cmss.chinamobile.com> Co-authored-by: zhanghaiwen <zhanghaiwen@cmss.chinamobile.com>	2025-08-26 09:06:16 +08:00
Mengqing Cao	60ac4fb576	[QuickFix] Skip failed ut to recover CI quickly (#2484) ### What this PR does / why we need it? Skip failed UTs to recover CI quickly. Related UTs: - `test_embed_models_correctness`: revert me when pooler is adapted with the latest vllm main - `test_check_and_update_config_enforce_eager_mode`: revert me when the occasional failure is fixed - vLLM version: v0.10.0 - vLLM main: `8896eb72eb` Signed-off-by: MengqingCao <cmq0113@163.com>	2025-08-22 14:14:51 +08:00
linfeng-yuan	0ca3f48c90	[2/N][refactor] torchair deepseek mla backend refactor (#2459) ### What this PR does / why we need it? This PR moves the current unified MLA backend to the torchair folder and removes torchair-related code from attention/mla_v1.py (1.3k -> 0.9k). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Running eager mode with mla backend, and torchair mode with code before [2445](https://github.com/vllm-project/vllm-ascend/pull/2445) - vLLM version: v0.10.0 - vLLM main: `f571ff8eb6` Signed-off-by: linfeng-yuan <1102311262@qq.com>	2025-08-21 14:02:30 +08:00
Mengqing Cao	1327f9be1c	Fix some ci issue and refactor modelrunner (#2445) ### What this PR does / why we need it? Fix some CI issues and refactor modelrunner. ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? CI passed with existing tests. - vLLM version: v0.10.0 - vLLM main: `4d9c61993a` --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: weiguihua2 <weiguihua2@huawei.com> Co-authored-by: wangli <wangli858794774@gmail.com> Co-authored-by: weiguihua2 <weiguihua2@huawei.com>	2025-08-20 09:01:04 +08:00
Shanshan Shen	83e0f41408	[3/N][Refactor] Move `torchair_attention` to `torchair` dir (#2017) ### What this PR does / why we need it? 1. Move `torchair_attention` to `torchair` dir. 2. Make `AscendAttentionTorchairBackend` extend `AscendAttentionBackend` to reduce duplicate methods. 3. Make `AscendTorchairMetadata` extend `AscendMetadata` to reduce duplicate properties. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.10.0 - vLLM main: `0933f9d518` --------- Signed-off-by: shen-shanshan <467638484@qq.com>	2025-08-19 10:25:22 +08:00
lbk-sys	c611291661	[main] SP For Qwen3 MoE (#2209) ### What this PR does / why we need it? Qwen3 MoE supports SP. In scenarios like AlltoAll, AlltoAllv, and MC2, replacing AllReduce with Reduce-Scatter and AllGather achieves computational benefits in norm operations while saving one AllGather communication. This feature is enabled during the P-phase and delivers notable gains in long-sequence scenarios (e.g., 16k–25k), with performance improvements reaching 5%–10%. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ``` compilation_config={ "pass_config":{ "enable_sequence_parallelism": True } }, enable_expert_parallel=True, ``` - vLLM version: v0.10.0 - vLLM main: `9edd1db02b` --------- Signed-off-by: libaokui <libaokui@huawei.com> Co-authored-by: libaokui <libaokui@huawei.com>	2025-08-07 09:15:49 +08:00
leo-pony	807f0895b2	Bump torch version to 2.7.1 (#1562) ### What this PR does / why we need it? Bump torch version to 2.7.1, and clean up the infer schema patch https://github.com/vllm-project/vllm-ascend/commit/857f489 (https://github.com/vllm-project/vllm-ascend/pull/837). This patch also depends on: https://github.com/vllm-project/vllm-ascend/pull/1974 ### Does this PR introduce any user-facing change? No #### How was this patch tested? CI passed. torch-npu 2.7.1rc1 install guide: https://gitee.com/ascend/pytorch/tree/v2.7.1/ Install dependencies: ``` pip3 install pyyaml pip3 install setuptools ``` install torch-npu: Closes: https://github.com/vllm-project/vllm-ascend/issues/1866 Closes: https://github.com/vllm-project/vllm-ascend/issues/1390 - vLLM version: v0.10.0 - vLLM main: `9af654cc38` --------- Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: leo-pony <nengjunma@outlook.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com>	2025-08-05 08:43:24 +08:00
wangxiyuan	af56ae3ed1	[1/4][Refactor] Refactor torchair worker (#1885) There is a lot of torchair-specific logic in common code, which makes the code hard to maintain. We will create a new torchair module and move torchair-related logic there. I plan to add 4 PRs. 1. Refactor worker (this PR) - create torchair module and move torchair related code in worker to the new module 2. Refactor utils 3. Refactor model_runner 4. Refactor attention - vLLM version: v0.9.2 - vLLM main: `8188196a1c` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-21 11:50:46 +08:00
Mengqing Cao	8cfd257992	[Dist][EP] Remove ETP/EP maintained in vllm-ascend (#1681) ### What this PR does / why we need it? Remove ETP/EP maintained in branch main. We drop this because there are no relevant scenarios for ETP now, and we may subsequently advocate implementing expert tensor parallelism in vLLM to support scenarios where the experts need to be sliced. This is a part of the #1422 backport. Fixes https://github.com/vllm-project/vllm-ascend/issues/1396 https://github.com/vllm-project/vllm-ascend/issues/1154 ### Does this PR introduce _any_ user-facing change? We'll not maintain etp/ep in vllm-ascend anymore, and use the tp/ep in vllm instead. ### How was this patch tested? CI passed with newly added and existing tests. - vLLM version: v0.9.2 - vLLM main: `fe8a2c544a` Signed-off-by: MengqingCao <cmq0113@163.com>	2025-07-21 09:08:04 +08:00
Shanshan Shen	f96100fad5	[Misc][V0 Deprecation] Remove V0 related codes of test, example, platform (#1805) ### What this PR does / why we need it? Remove V0-related code from tests, examples, and platform. This PR is a part of https://github.com/vllm-project/vllm-ascend/issues/1620. - vLLM version: v0.9.2 - vLLM main: `235bfd5dfe` --------- Signed-off-by: shen-shanshan <467638484@qq.com>	2025-07-15 19:58:55 +08:00
wangxiyuan	787010a637	[Test] Remove VLLM_USE_V1 in example and tests (#1733) V1 is enabled by default, so there is no need to set it by hand now. This PR removes the useless setting in examples and tests. - vLLM version: v0.9.2 - vLLM main: `9ad0a4588b` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-15 12:49:57 +08:00
wangxiyuan	7bdada58eb	[Misc] Remove VLLM_USE_V1 usage in code (#1764) We plan to remove V0 code from this version. The first step is to delete v0 usage. Related: https://github.com/vllm-project/vllm-ascend/issues/1620 - vLLM version: v0.9.2 - vLLM main: `61e20828da` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-15 11:52:16 +08:00
Angazenn	a5f33590d3	[CORE]initial support for torchair with non-mla backend (#1506) ### What this PR does / why we need it? This PR supports torchair graph mode with non-mla backend on both 800IA2 and 300I Duo platforms. The main change is to add `attention_v1_torchair.py` to support specific attention-related operations that are required by torchair. ### Does this PR introduce _any_ user-facing change? Before this PR, vLLM-Ascend only allowed deepseek to use torchair. Now we can also use it with pangu. In addition, we add a supported-model list to control which types of models can use torchair. ### How was this patch tested? We have tested it with PanguProMoE on both 800IA2 and 300I Duo platforms, and the model generates answers normally. --------- Signed-off-by: angazenn <zengyanjia@huawei.com> Signed-off-by: tianyitang <tangtianyi4@huawei.com> Co-authored-by: angazenn <zengyanjia@huawei.com> Co-authored-by: tianyitang <tangtianyi4@huawei.com>	2025-07-03 22:21:42 +08:00
zhanghw0354	9fb3d558e5	[Test]Add unit test for platform.py (#1476) ### What this PR does / why we need it? According to issue #1298, this pull request adds unit test code for platform.py. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI passed with newly added and existing tests. --------- Signed-off-by: zhanghw0354 <zhanghaiwen_yewu@cmss.chinamobile.com> Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: Yikun Jiang <yikunkero@gmail.com> Signed-off-by: angazenn <zengyanjia@huawei.com> Signed-off-by: zhuyilin <809721801@qq.com> Co-authored-by: Shanshan Shen <467638484@qq.com> Co-authored-by: Mengqing Cao <cmq0113@163.com> Co-authored-by: Yikun Jiang <yikunkero@gmail.com> Co-authored-by: Angazenn <92204292+Angazenn@users.noreply.github.com> Co-authored-by: angazenn <zengyanjia@huawei.com> Co-authored-by: Zhu Yi Lin <116337067+GDzhu01@users.noreply.github.com>	2025-07-02 17:46:06 +08:00

24 Commits