xc-llm-ascend

Author	SHA1	Message	Date
Chen Chen	6b290acfe1	remove redundant params in mla_preprocess kernel (#3530) ### What this PR does / why we need it? This pull request removes the redundant parameters `gamma1` and `beta1` (also named `gamma0`/`beta0` in some places) from the `mla_preprocess` kernel and its calling hierarchy. The changes are consistent across C++ kernel code, bindings, and Python call sites. The parameters were unused in the lower-level functions, so their removal is a good cleanup. ### Does this PR introduce _any_ user-facing change? The Python interface of the kernel is affected: the `gamma0` and `beta0` params are no longer needed. ### How was this patch tested? The unit test of the kernel is adapted accordingly. - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: mojave2 <chenchen145@huawei.com>	2025-10-21 19:20:13 +08:00
anon189Ty	248ee7fa11	[Feat] Make full graph mode compatible with MTP (#3276) ### What this PR does / why we need it? Make Full Graph mode run with MTP. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: anon189Ty <Stari_Falcon@outlook.com>	2025-10-17 20:19:56 +08:00
lilinsiman	1b424fb7f1	ACLgraph enable: Test case revisions for all features (#3388) ### What this PR does / why we need it? This PR revises the test cases of various features in the repository so that they run with aclgraph enabled. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? ut - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: lilinsiman <lilinsiman@gmail.com>	2025-10-17 17:15:19 +08:00
zouyida2052	3642b64afc	bugfix for mtp with multistream_moe (#3419) ### What this PR does / why we need it? When running inference on the DeepSeek MTP layer with multistream_moe, we should pass a boolean to evaluate this feature, and fix bugs that occur when we are in the MTP layer. - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 --------- Signed-off-by: zouyida2052 <zouyida2002@gmail.com>	2025-10-15 08:59:58 +08:00
wangxiyuan	c73dd8fecb	[CI] Fix CI by addressing max_split_size_mb config (#3258) ### What this PR does / why we need it? Fix CI by addressing max_split_size_mb config ### Does this PR introduce _any_ user-facing change? No, test only ### How was this patch tested? Full CI passed, especially the eagle one - vLLM version: v0.10.2 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.0 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-09-29 14:05:12 +08:00
wangxiyuan	15b8aff582	[CI] Add max_split_size_mb for e2e test to avoid oom (#3252) ### What this PR does / why we need it? We added a patch to the model weight loader to avoid using vLLM weight loader v2, since v2 leads to an unknown issue with torchair. However, this patch causes some unknown memory usage problems. As a quick fix, let's expand `max_split_size_mb` to a larger value to avoid OOM during weight loading. The longer-term solution is to remove the patch and address the weight loader v2 issue in vLLM. Closes: https://github.com/vllm-project/vllm-ascend/issues/3251 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - vLLM version: v0.10.2 - vLLM main: https://github.com/vllm-project/vllm/commit/releases/v0.11.0 Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-09-29 09:13:08 +08:00
Icey	2a9d02e080	[Bugfix] eagle and eagle3 spec decode failures and enable e2e test (#2979) ### What this PR does / why we need it? - Fix the bug https://github.com/vllm-project/vllm-ascend/issues/2978 - Enable e2e test - Adapt to scenarios where the number of speculative tokens is greater than 2 - Fix the bug that causes Eagle3 inference failures under high concurrency and improve the acceptance rate of draft models, by https://github.com/vllm-project/vllm-ascend/pull/2794 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? CI passed with newly added and existing tests. Co-authored-by: hukongyi <hukongyi@cmbchina.com> Co-authored-by: guanyuzhu <zhuguanyu@huawei.com> Co-authored-by: liumail680 <liumail680@163.com> - vLLM version: v0.10.2 - vLLM main: `f225ea7dd9` --------- Signed-off-by: Icey <1790571317@qq.com>	2025-09-25 14:39:12 +08:00
xuyexiong	6681dde902	[Feat][Graph] Support MTP for ACL Graph (#2932) ### What this PR does / why we need it? This PR depends on the merge of #2707 and adapts the aclgraph functionality to support MTP. ### How was this patch tested? - vLLM version: v0.10.2 - vLLM main: `2b85697031` --------- Signed-off-by: xuyexiong <xuyexiong@huawei.com>	2025-09-18 14:05:33 +08:00
1092626063	5b3646ab21	[FEATURE][MTP] Support MTP > 1 (#2708) ### What this PR does / why we need it? [RFC: Support MTP > 1 for DeepSeek](https://github.com/vllm-project/vllm-ascend/issues/2745) - [x] dp1 tp16 - [x] dp4 tp4 - [x] dp2 tp8 - [x] torchair graph - vLLM version: v0.10.1.1 - vLLM main: `c9f7081f9c` Signed-off-by: 1092626063 <1092626063@qq.com>	2025-09-05 09:11:22 +08:00
wangxiyuan	24d4dad7b2	[CI] Enable MTP torchair e2e test (#2705) enable MTP torchair e2e test - vLLM version: v0.10.1.1 - vLLM main: `ce30dca5c4` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-09-03 08:57:43 +08:00
xuyexiong	214b32a346	[V1][BUGFIX][0.10.1] FIX mtp on main branch (#2632) ### What this PR does / why we need it? Fix MTP torchair bug caused by the torchair refactor and the MoE refactor. Depends on PRs: fused moe fix: https://github.com/vllm-project/vllm-ascend/pull/2627 torchair multi DP fix: https://github.com/vllm-project/vllm-ascend/pull/2626 ### Does this PR introduce _any_ user-facing change? When DP is enabled, running the MTP online server requires disabling server log stats with `--disable-log-stats`, because the current metrics do not support multi-DP. ### How was this patch tested? - vLLM version: v0.10.1.1 - vLLM main: `7c8271cd1e` Signed-off-by: xuyexiong <xuyexiong@huawei.com>	2025-09-02 11:12:41 +08:00
wangxiyuan	fef18b60bc	Refactor e2e CI (#2276) Refactor E2E CI to make it clearer and faster 1. remove some useless e2e tests 2. remove some useless functions 3. Make sure all tests run with VLLMRunner to avoid OOM errors 4. Make sure all ops tests end with torch.empty_cache to avoid OOM errors 5. run the tests one by one to avoid resource limit errors - vLLM version: v0.10.1.1 - vLLM main: `a344a5aa0a` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-09-02 09:02:22 +08:00
ZhaoJiangJiang	3629bc4431	feat: add mtp ut and fix some bugs (#2453) ### What this PR does / why we need it? Fix mtp mode ut ### Does this PR introduce _any_ user-facing change? Nothing ### How was this patch tested? This can be tested in the same way as a unit test. - vLLM version: v0.10.0 - vLLM main: `53415653ff` Signed-off-by: 赵江江 <zhaojiangjiang1@h-partners.com> Co-authored-by: 赵江江 <zhaojiangjiang1@h-partners.com>	2025-08-22 17:09:08 +08:00
wangxiyuan	9260910c8d	[CI] Fix broken CI (#2302) 1. disable the test_eagle_ccorrectness test; we'll reopen it once the OOM error is fixed. 2. drop the transformers version limit for main, since vLLM relies on >=4.55.0, see: `65552b476b` 3. fix kv_connector_output bug, see: `796bae07c5` - vLLM version: v0.10.0 - vLLM main: `d1af8b7be9` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-08-11 11:22:32 +08:00
wangxiyuan	787010a637	[Test] Remove VLLM_USE_V1 in example and tests (#1733) V1 is enabled by default, so there is no need to set it by hand now. This PR removes the unnecessary setting in the examples and tests. - vLLM version: v0.9.2 - vLLM main: `9ad0a4588b` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-07-15 12:49:57 +08:00
Mengqing Cao	dd22ac38b2	[CI/UT][Refactor] move e2e spec decode and deepseek acc test to per pr (#1136) ### What this PR does / why we need it? 1. run deepseek acc ut per PR (multicard CI time increased by 9 min) 2. run spec decode e2e test on v1 per PR (singlecard CI time increased by 3 min; part of it is disabled because it does not work now) ~~3. align the output of whether dbo is enabled or not~~ The generated results with and without dbo cannot be aligned. https://github.com/vllm-project/vllm-ascend/actions/runs/15822900528/job/44600029405?pr=1136 4. skip V0 mtp test due to failure in https://github.com/vllm-project/vllm-ascend/actions/runs/16012172833/job/45171988816 5. fix some version conflicts ### How was this patch tested? CI passed with newly added tests. --------- Signed-off-by: MengqingCao <cmq0113@163.com>	2025-07-04 18:05:45 +08:00

16 Commits