xc-llm-ascend

Author	SHA1	Message	Date
Levi	ecd4232698	[Feat] flashcomm2+oshard Generalized (#4723) ### What this PR does / why we need it? [FlashComm2](https://gitcode.com/ascend-tribe/ascend-inference-cluster/blob/main/FlashComm/FlashComm2%E5%A4%A7%E6%A8%A1%E5%9E%8B%E6%8E%A8%E7%90%86%E4%B8%AD%E4%BB%A5%E5%AD%98%E6%8D%A2%E4%BC%A0%E7%9A%84%E9%80%9A%E4%BF%A1%E4%BC%98%E5%8C%96%E6%8A%80%E6%9C%AF.pdf) stores the o_proj matrix redundantly, which puts pressure on GPU memory. We propose the FlashComm2+Oshard approach, which integrates the shared linear layer feature (#2931). It distributes the weights layer by layer across the GPUs and fetches each layer's o_proj via asynchronous broadcast operations, alleviating memory pressure while achieving nearly lossless performance compared to the original FlashComm2. This PR implements a generalized FlashComm2+Oshard solution. Use the following environment variable and option to enable FlashComm2 with Oshard: ```shell export VLLM_ASCEND_FLASHCOMM2_PARALLEL_SIZE=1 --additional-config '{ "layer_sharding": ["o_proj"] }' ``` ### How was this patch tested? - vLLM version: v0.12.0 - vLLM main: `ad32e3e19c` --------- Signed-off-by: Levi-JQ <yujinqi2@huawei.com> Co-authored-by: Levi-JQ <yujinqi2@huawei.com>	2026-01-10 22:57:57 +08:00
ZT-AIA	e11ff8e535	[BugFix] Fix the error when using Ascend custom operators with rank=128 (#5394) ### What this PR does / why we need it? The custom Ascend operators sgmv_expand and sgmv_shrink support only ranks 8, 16, 32, and 64. When rank >= 128, the rank is outside the operators' supported range, causing the model to report an error. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Depends on this commit https://github.com/vllm-project/vllm/pull/31408 - vLLM version: release/v0.13.0 - vLLM main: `254f6b9867` --------- Signed-off-by: ZT-AIA <1028681969@qq.com> Signed-off-by: ZT-AIA <63220130+ZT-AIA@users.noreply.github.com>	2026-01-09 15:57:43 +08:00
LI SHENGYONG	b69db4ce55	[EPLB][CI] EPLB add aclgraph and redundant expert ci (#5625) ### What this PR does / why we need it? EPLB currently has no CI coverage for aclgraph and redundant experts; this PR adds it. release on #5529 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? Tested the use cases added in this PR. PASSED ===== warnings summary ===== <frozen importlib._bootstrap>:241 <frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute <frozen importlib._bootstrap>:241 <frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ===== 1 passed, 2 warnings in 272.24s (0:04:32) ===== - vLLM version: v0.13.0 - vLLM main: `8be6432bda` Signed-off-by: shenchuxiaofugui <1311027364@qq.com>	2026-01-08 09:51:48 +08:00
Li Wang	1165b2c863	[1/N][CI] Refactor accuracy test (#5400) ### What this PR does / why we need it? 1. Accuracy testing no longer compares eager and graph modes; instead, it directly extracts the golden result under the graph mode configuration (the implicit purpose of this case is to verify whether modifications affect existing results). 2. Next step: finer-grained supervision of logits/sampler results. ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: release/v0.13.0 - vLLM main: `254f6b9867` Signed-off-by: wangli <wangli858794774@gmail.com>	2026-01-07 20:58:15 +08:00
wangxiyuan	6f7a81cd9f	[CI] cleanup single/multi-card test (#5623) 1. Speed up the e2e light test. 2. Create `2-cards` and `4-cards` folders in multicard. 3. Move ops to nightly. 4. Run tests in alphabetical order. - vLLM version: v0.13.0 - vLLM main: `8be6432bda` Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-01-07 14:13:34 +08:00

5 Commits
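
Commit e11ff8e535 states that the custom sgmv_expand and sgmv_shrink operators cover only ranks 8, 16, 32, and 64. The guard below is a minimal sketch of that kind of check, not the code from the PR: the function names and the "custom"/"fallback" labels are hypothetical, and only the list of supported ranks comes from the commit message.

```python
# Ranks that the commit message says the custom sgmv_expand / sgmv_shrink
# operators support. Everything else in this sketch is an assumption.
SUPPORTED_CUSTOM_OP_RANKS = frozenset({8, 16, 32, 64})


def use_custom_sgmv(rank: int) -> bool:
    """Return True when the rank is inside the custom operators' range."""
    return rank in SUPPORTED_CUSTOM_OP_RANKS


def select_sgmv_backend(rank: int) -> str:
    """Pick a backend label for the given rank.

    "custom" and "fallback" are hypothetical labels used for illustration;
    they are not identifiers from vllm-ascend.
    """
    return "custom" if use_custom_sgmv(rank) else "fallback"


if __name__ == "__main__":
    for rank in (8, 64, 128):
        print(rank, select_sgmv_backend(rank))
```

Checking membership in an explicit set, rather than testing `rank < 128`, also routes unsupported small ranks (for example 4 or 24) away from the custom operators.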