EngineX/xc-llm-ascend

752e272a55 Add note for Ascend HDK version (#2765) Yikun Jiang 2025-09-07 10:33:41 +08:00
5a7181569c [feat]: oproj tensor parallelism in pure DP and graph-mode scenarios. (#2167) lidenghui1110 2025-09-07 10:31:32 +08:00
a58b43b72c Remove git .extraheader and fecth all commtis in /vllm-workspace/vllm-ascend (#2746) Yikun Jiang 2025-09-05 09:45:11 +08:00
51a2aec115 Delete redundant codes related to communication (#2717) henryxuxu0716 2025-09-05 09:39:39 +08:00
5b3646ab21 [FEATURE][MTP] Support MTP > 1 (#2708) 1092626063 2025-09-05 09:11:22 +08:00
83eb40a51c [Fix][MoE] Refine MoE communication strategy (#2734) yiz-liu 2025-09-05 09:04:04 +08:00
4c90fa79ca [Misc] Remove useless PD check in deepseek (#2739) liziyu 2025-09-04 22:22:19 +08:00
3a2a7d88db [Doc] Update accuracy reports for v0.10.1rc1 (#2755) vllm-ascend-ci 2025-09-04 22:17:17 +08:00
f86596a66c allgather use fusedop. (#2689) sherie 2025-09-04 11:56:29 +08:00
7d47d8f4f6 [Fix] fix resources limit error when apply speculative decoding and aclgraph (#2472) 无脸男 2025-09-04 11:50:43 +08:00
0c0789be74 [Feat] allow using aclgraph in ray backend (#2589) 无脸男 2025-09-04 11:45:56 +08:00
aff5189c87 [main] Fuse GroupedMatmul, Swiglu and DynamicQuant in W8A8_DYNAMIC quantized MoE layers (#2275) Ruri 2025-09-04 11:37:32 +08:00
37f5a29cd4 [1/N][Refactor][Quantization] remove redundant quantizer class (#2680) 22dimensions 2025-09-04 11:35:14 +08:00
d4370ebc42 [Refactor] Refactor Spec Decode (#2668) Icey 2025-09-04 11:34:47 +08:00
7e16b4a7cd [ReleaseNote] Add Release Note for v0.10.1rc1 (#2635) Mengqing Cao 2025-09-04 11:26:47 +08:00
e7409e95ee [1/N][Draft][Refactor]torchair pangu_moe modeling refactor (#2437) Angazenn 2025-09-04 10:39:21 +08:00
a58013440a [BugFix][MLA] Fix attn_mask bug for ring mla (#2704) whx 2025-09-04 10:22:46 +08:00
e11a1bbfc1 [Doc] Update news (#2736) wangxiyuan 2025-09-04 10:10:24 +08:00
984bd7c13a [Bugfix][APC] Fix accuracy issue on prefix caching with AscendScheduler (#2714) Mengqing Cao 2025-09-04 08:22:46 +08:00
df88a2ecc8 [P/D]mooncake_connector adapted to 0.10.1 (#2664) baxingpiaochong 2025-09-04 08:22:10 +08:00
07d44ade19 bugfix: fix initialization error for mooncake in k8s (#2541) zhiyuanzhang 2025-09-03 22:25:08 +08:00
41b028aa5f [Doc] add v0.9.1 release note (#2646) wangxiyuan 2025-09-03 18:04:27 +08:00
90a75a90a9 [bugfix] fix torchair runtime error caused by configuration mismtaches and file missing (#2532) linfeng-yuan 2025-09-03 17:56:12 +08:00
5889fa1b1c [bugfix] ascend schedule encountered an incorrect req block length in the check_watermark_for_prefill function (#2508) liziyu 2025-09-03 16:54:39 +08:00
59d23c39eb [DP] External dp server starter (#2685) whx 2025-09-03 16:30:26 +08:00
c03321781a [CI] skip unstable UT (#2716) wangxiyuan 2025-09-03 15:53:50 +08:00
3584306387 [Bugfix] Fix qwen2.5-vl-without-padding (#2623) Li Wang 2025-09-03 14:38:55 +08:00
bece793be6 [CI] Disable per-PR triggering for A3 (#2710) Li Wang 2025-09-03 11:52:34 +08:00
eaeb2efb20 [Main][Feat]Set the Profiler parameters through environment variables consistent with vLLM (#2608) zhanghw0354 2025-09-03 10:58:08 +08:00
93754d8061 [Bugfix] Fix long context seq accuracy problem for GLM4.5 (#2601) Shanshan Shen 2025-09-03 09:18:44 +08:00
b84465c525 [Perf]Enable npu_moe_gating_top_k_softmax on quantized scenarios (#2633) Angazenn 2025-09-03 09:14:17 +08:00
24d4dad7b2 [CI] Enable MTP torchair e2e test (#2705) wangxiyuan 2025-09-03 08:57:43 +08:00
af62af3cc5 [Image] Upgrade openEuler to 24.03 (#2631) Icey 2025-09-02 20:09:09 +08:00
0829b4873f [CI] recover e2e test (#2688) wangxiyuan 2025-09-02 18:49:17 +08:00
f023bd52bf [CI] Make test_platform UT stable (#2696) wangxiyuan 2025-09-02 18:34:04 +08:00
c1e607b7b7 [Misc] Clean up uesless code in rotary_embedding (#2663) wangxiyuan 2025-09-02 17:25:33 +08:00
253b01b9a5 [7/N][refactor]fix torchair rope ops (#2683) Wang Yixuan 2025-09-02 17:21:56 +08:00
9f1e054fe3 [Bugfix][LoRA][Operator] Fix LoRA custom operators accuracy issue (#2672) yupeng 2025-09-02 11:46:59 +08:00
214b32a346 [V1][BUGFIX][0.10.1] FIX mtp on main branch (#2632) xuyexiong 2025-09-02 11:12:41 +08:00
fef18b60bc Refactor e2e CI (#2276) wangxiyuan 2025-09-02 09:02:22 +08:00
0df059f41a [CI] Fix CI Break: upstream adds routed_scaling_factor in forward_oot interface (#2675) leo-pony 2025-09-01 19:02:50 +08:00
ea53f9076e support torchair mode (#2641) panchao-hub 2025-09-01 15:49:07 +08:00
b72e34013f Add ut for mla (#2637) LeeWenquan 2025-09-01 14:07:57 +08:00
ad13964c71 [6/N][refactor]delete torchair in rotary ops (#2581) Wang Yixuan 2025-09-01 09:10:15 +08:00
c2c97f3079 [5/N][refactor]add torchair rotary ops (#2559) Wang Yixuan 2025-09-01 09:09:21 +08:00
3a5fc5ee01 [Refactor][MoE] remove redundant code after refactoring fused_moe (#2612) weichen 2025-08-30 22:28:50 +08:00
20ae71291d [torchair]remove aicpu op (#2640) panchao-hub 2025-08-30 15:51:12 +08:00
7215454de6 bugfix for torchair graph (#2639) panchao-hub 2025-08-30 15:49:48 +08:00
6f1047d5fd [CI] fix UT error. (#2644) weijinqian0 2025-08-30 12:04:01 +08:00
d3c93fba5c [3/N][Feat][Graph] Support all-to-all and quantized models with ACL Graph (#2614) yiz-liu 2025-08-30 11:00:35 +08:00
91c35d765a [Bugfix] Fix mc2 operator error in aclgraph + ep<16 scenario (#2609) Mengqing Cao 2025-08-29 21:59:16 +08:00
ee6d141dd4 [MAIN][BUGFIX] BugFix: Resolve the issue of waiting queue accumulation when requests are canceled. (#2426) wangxiaoteng666 2025-08-29 17:19:23 +08:00
52aff9e229 [main] [bugfix] Fix misjudging quantized/unquantized scenarios (#2627) weichen 2025-08-29 16:20:22 +08:00
aadc75c247 [Fix] Resolve data-parallel (DP) assertion errors in TorchAir (#2626) yiz-liu 2025-08-29 16:06:49 +08:00
600b08f754 [Feat]: Add custom lmhead tensor model parallel (#2309) lidenghui1110 2025-08-29 11:41:21 +08:00
e7ad4a64f4 [CI] Add e2e ci test for A3 (#2573) zhangxinyuehfad 2025-08-29 09:33:42 +08:00
dfc7eb39ad [Fix] Fix DP-related padding logic (#2582) yiz-liu 2025-08-28 19:39:58 +08:00
175f6bc445 Support v0.10.1 (#2584) Yikun Jiang 2025-08-28 18:47:53 +08:00
6c973361fc [Bugfix] Fix aclgraph not enabled by default (#2590) Mengqing Cao 2025-08-28 14:08:31 +08:00
cf96366a39 [Bugfix][LoRA][Patch] Fix the LoRA inference bug after upstream vLLM codebase changed (#2560) yupeng 2025-08-28 10:40:51 +08:00
1191a64ae5 [Feat]attention add sliding windows size (#2528) yeyifan 2025-08-28 10:37:19 +08:00
c8d1df3a3f [Refactor][WIP] Refactor mla_v1 by moving all MLA preprocessing ops into mla_v1 attention impl (#2465) LeeWenquan 2025-08-28 10:35:57 +08:00
320edde2df [main] [refactor] refactor fused_moe.py to enable token_dispatchers (#2570) weichen 2025-08-28 10:13:35 +08:00
936c102105 [bugfix][refactor]fix torchair_w8a8 (#2569) Wang Yixuan 2025-08-28 09:10:31 +08:00
a955e5d404 [4/N][refactor]delete torchair from quantization (#2535) Wang Yixuan 2025-08-28 09:10:03 +08:00
c578f817ca [CustomOp] Register VocabParallelEmbedding instead of overwrite forward (#2515) Icey 2025-08-28 08:57:34 +08:00
516e14ae6a [Doc] Upgrade to multi-node tutorial model to deepseek-v3.1-w8a8 (#2553) Li Wang 2025-08-27 14:16:44 +08:00
2bfbf9b9b3 [main][bugfix] Fix bugs and refactor cached mask generation logic (#2442) rjg-lyh 2025-08-27 12:07:29 +08:00
6881c19458 [main] convert the format of gmm to nz (#2474) huangxialu 2025-08-27 11:25:02 +08:00
c0e12143a3 [CI] Fix UT failure (#2563) wangxiyuan 2025-08-27 11:24:35 +08:00
20a7bc4b71 [3/N][refactor] refactoer quantization (#2504) Wang Yixuan 2025-08-27 10:45:50 +08:00
acdc53c2f6 [Bugfix] Fix the bug of cos invalid shape when dp (#2558) weiguihua2 2025-08-27 10:36:23 +08:00
a9e78a3299 [Aclgraph] Update compilation config in check_and_update_config (#2540) Mengqing Cao 2025-08-27 09:30:25 +08:00
f22077daa6 [Embedding] Recover embedding function (#2483) wangxiyuan 2025-08-27 09:22:01 +08:00
6a4ec186e7 [Qwen-moe] Remove the minor operation arange (#2373) s30076806 2025-08-27 09:13:31 +08:00
358ba68994 [main][bugfix] Fix MatmulNZ format bug on some machines (#2549) rjg-lyh 2025-08-27 09:08:17 +08:00
042605f4b2 [Doc] Add stable modelslim branch (#2545) Li Wang 2025-08-27 09:05:46 +08:00
8151a9d5a4 [Test]Add unit test for worker_v1.py (#2547) zhanghw0354 2025-08-26 22:00:49 +08:00
a6bb502e70 [2/N][Feat] Add MC2 communication method for MoE layers (#2469) yiz-liu 2025-08-26 19:05:23 +08:00
5d8ec28009 [2/N][refactor] split torchair from fused_moe (#2503) Wang Yixuan 2025-08-26 14:12:43 +08:00
cfe77e83ae [Bugfix]Support Qwen3-MOE on aclgraph mode in sizes capture and add new ut (#2511) lilinsiman 2025-08-26 12:39:21 +08:00
b3fdd78a6b [Main][Refactor]Change ASCEND_QUATIZATION_METHOD to ASCEND_QUANTIZATION_METHOD (#2517) zhanghw0354 2025-08-26 09:06:16 +08:00
21b5727f9a [CI] Upgrade vllm in accuracy and performance CI (#2527) Mengqing Cao 2025-08-26 08:49:49 +08:00
7e494e94a9 [CI] Fix broken ci (#2530) wangxiyuan 2025-08-26 07:42:24 +08:00
99bf25af76 [Fix] Add operations in _dummy_run to maintain synchronization with _process_reqs, resolving a service hang (#2454) yiz-liu 2025-08-25 19:56:02 +08:00
de7649492d [Refactor] cleanup converting_weight_acl_format_format (#2482) wangxiyuan 2025-08-25 19:48:55 +08:00
0f81e032f0 [1/N][refactor] torchair fused_moe refactor (#2438) Wang Yixuan 2025-08-25 15:46:10 +08:00
334c44613a [Doc] Update release version info (#2518) Shanshan Shen 2025-08-25 15:39:10 +08:00
98c68220c1 [Doc] Update v0.9.1rc3 doc (#2512) Shanshan Shen 2025-08-25 11:39:29 +08:00
4c4ffeebe5 [Doc] update vllm version in ci (#2513) Mengqing Cao 2025-08-25 11:35:37 +08:00
0767d51dd5 [Structured Output][CI] Add test for outlines backend for structured output in CI (#2283) Shanshan Shen 2025-08-25 09:59:13 +08:00
891b2bfe71 Accuracy report formatting (#2279) Icey 2025-08-25 09:39:30 +08:00
f796e6280b [CustomOp] Register RotaryEmbedding instead of overwrite forward (#2385) Icey 2025-08-25 09:32:35 +08:00
950c4b219a [main] refactor alltoallv in fused_moe (#2487) weichen 2025-08-23 20:38:17 +08:00
4af5b80606 [Scheduler] validate max_num_batched_tokens and max_model_len in AscendSchedulerConfig (#2434) linfeng-yuan 2025-08-23 19:39:44 +08:00
3629bc4431 feat: add mtp ut and fix some bugs (#2453) ZhaoJiangJiang 2025-08-22 17:09:08 +08:00
dd04a96ee3 [Bugfix] Fix the bug of incorrect precision (#2479) weiguihua2 2025-08-22 17:08:56 +08:00
f0be3eed84 [Doc] Add release note for v0.9.1rc3 (#2488) Shanshan Shen 2025-08-22 16:06:29 +08:00
60ac4fb576 [QuickFix] Skip failed ut to recover CI quickly (#2484) Mengqing Cao 2025-08-22 14:14:51 +08:00
e9fb895b10 [Doc] Add feature branch long_seq_optimization (#2477) LookAround0301 2025-08-22 08:53:12 +08:00
