xc-llm-ascend

EngineX/xc-llm-ascend


46a1614387 [P/D] Improve the performance of Layerwise Connector (#5303) zxr2333 2025-12-31 15:09:01 +08:00
7d5242faca [Refactor] Formatting output types related to FuseMoE (#5481) Jade Zheng 2025-12-31 14:24:37 +08:00
38570cfeb6 [Feature] Support kv nz feature for DeepSeek decode node in disagg-prefill scenario (#3072) Jade Zheng 2025-12-31 14:24:04 +08:00
a539ae753a [feature] mooncake support pcp/dcp in common conditions (#5224) wangxiaochao6 2025-12-31 09:53:03 +08:00
a5ae07a5d2 [Bugfix] Fix mm_merge (#5249) Li Wang 2025-12-31 09:49:55 +08:00
3c2d3e52e5 [Main2Main] Upgrade vllm commit to 1230 (#5495) wjunLu 2025-12-31 09:44:35 +08:00
5d9fde9819 [Feature] Refactor PCP &DCP related code (#5214) zhenwenqi2024 2025-12-31 09:29:57 +08:00
46862ce1af [main][test] Refactor the mtp and eagle test case (#5326) lilinsiman 2025-12-31 09:22:58 +08:00
bdc721d35a [smoke][bugfix] moe_init_routing_v2 active_expert_range use int type (#5521) LI SHENGYONG 2025-12-31 09:19:04 +08:00
2ee17e50a1 [2/N] Upgrade nightly doc (#5534) Li Wang 2025-12-31 09:11:42 +08:00
98798d80a0 [Doc] Add new contributors. (#5537) zhangyiming 2025-12-31 07:39:42 +08:00
073097a9a1 [3/N][Nightly] Move ops tests to nightly (#5538) Li Wang 2025-12-30 20:50:44 +08:00
e760aae1df [1/N] Refactor nightly test structure (#5479) Li Wang 2025-12-30 19:03:02 +08:00
c85cc045f8 Docs: Remove deprecated --task parameter for embedding models (#5257) huqi 2025-12-30 16:09:07 +08:00
71f729a661 Revert "moe_gating_top_k" (#5512) zzzzwwjj 2025-12-30 15:05:47 +08:00
4ff4d1cef9 [Doc] Fix issue link for 0.12.0 (#5500) wangxiyuan 2025-12-30 10:34:20 +08:00
8c4e9bb76b [CI]update triton ascend version (#5392) meihanc 2025-12-30 09:51:45 +08:00
45c3c279e2 moe_gating_top_k (#5271) ZCG12345 2025-12-30 09:28:01 +08:00
15d73f248e [refactor] refactor model runner capture model (#5230) weiguihua2 2025-12-30 08:32:14 +08:00
5e96f94d2a Update corresponding vllm commit ID to 12 29 (#5475) Nengjun Ma 2025-12-29 22:48:05 +08:00
51da5ea543 [Kernel]update csrc cmakelist for open-source cann (#5458) Fager10086 2025-12-29 20:34:53 +08:00
d5f72835e6 [OP] add custom op aclnnMoeInitRoutingCustom (#5251) jiazhengyi 2025-12-29 19:29:40 +08:00
92353c0643 [Refactor][EAGLE] 1/N delete __init__ in mtp_proposer (#5176) Zetong Li 2025-12-29 16:25:52 +08:00
28b7614322 [Refactor][Triton] Move reject sample triton kernels into ops/triton (#5324) whx 2025-12-29 16:15:41 +08:00
e7e1a7dc05 [Feature] support eager mode in model runner v2 (#5210) Ronald 2025-12-29 15:28:34 +08:00
4da46da9bf [feature] fia support sliding windows (#5239) yeyifan 2025-12-29 14:56:25 +08:00
d8e15dae6c Optimize some rejectsampler functions to make npu op launch non-blocking (#4587) ZongYuan Zhan 2025-12-29 14:10:39 +08:00
3e67e8276c [Feature] Support to use fullgraph with eagle (#5118) anon189Ty 2025-12-29 09:54:51 +08:00
f81cf694b2 [EPLB][refactor] Modification of the initialization logic for expert_map and log2phy（depend on pr5285） (#5311) LI SHENGYONG 2025-12-29 09:26:14 +08:00
23169021d9 [Refactor]6/N Extract common code of class AscendMLAImpl (#5314) wujinyuan1 2025-12-28 10:40:45 +08:00
dbe4c338f2 [Refactor] cache cos/sin in mla & remove parameter model in builder. (#5277) weijinqian0 2025-12-28 10:35:07 +08:00
24328aaf00 update vllm pin to 12.27 (#5412) ZT-AIA 2025-12-28 00:19:36 +08:00
1b5d5abf86 [ReleaseNote] Add release note for v0.13.0rc1 (#5334) Mengqing Cao 2025-12-27 18:46:57 +08:00
58adf7c8ac [Bugfix] Correctly handle the output shape in multimodal attention (#5443) Li Wang 2025-12-27 18:42:46 +08:00
1d81bfaed1 Fix nightly (#5413) Li Wang 2025-12-27 18:16:46 +08:00
e91e11d3b0 [bugfix] fix typo of _skip_all_reduce_across_dp_group (#5435) jiangkuaixue123 2025-12-27 17:50:04 +08:00
c30c3dc831 [Doc]modify pcp tutorial doc (#5440) weiguihua2 2025-12-27 17:47:09 +08:00
77cd960524 [Misc] fast fail for exiting if tools/install_flash_infer_attention_score_ops_a2.sh (#5422) Mengqing Cao 2025-12-27 17:30:34 +08:00
b8b5521f5b [Doc] Update DeepSeek V3.1/R1 2P1D doc (#5387) MengLong Chen 2025-12-27 17:28:43 +08:00
843751768e [DOC]Fix model weight download links (#5436) cookieyyds 2025-12-27 17:14:31 +08:00
04104031d0 [Doc] Modify DeepSeek-R1/V3.1 documentation (#5426) Zhu Yi Lin 2025-12-27 17:13:58 +08:00
09f71c14a6 Revert "[feat] enable hierarchical mc2 ops on A2 by default (#5300)" (#5434) realliujiaxu 2025-12-27 17:06:58 +08:00
2add3dc3e0 [Bugfix] fix greedy temperature detection (#5417) realliujiaxu 2025-12-27 17:04:10 +08:00
eab306b09c [doc] Update Qwen3-235B doc for reproducing latest performance (#5323) Angazenn 2025-12-27 15:55:58 +08:00
12da9f9460 [feat] enable hierarchical mc2 ops on A2 by default (#5300) hwhaokun 2025-12-27 15:45:25 +08:00
be2a947521 [Doc] delete environment variable HCCL_OP_EXPANSION_MODE in DeepSeekV3.1/R1 (#5419) Zhu Yi Lin 2025-12-27 12:44:50 +08:00
ca31d6823e [Doc] add long_sequence feature user guide (#5343) LookAround0301 2025-12-27 10:44:43 +08:00
cb2fbf7df2 [bugfix] solve dp scenario Host-Device sync (#5298) hwhaokun 2025-12-27 10:36:59 +08:00
69f96950e1 [Doc] modify pcp tutorials (#5411) weiguihua2 2025-12-27 10:36:10 +08:00
3f33ad23fe [BugFix] Fix npu-cpu offloading interface change bug. (#5290) whx 2025-12-27 10:21:20 +08:00
2ef4d1979e [bugfix][main]KV Pool for KV Transfer in PD Disaggregation Scenarios (#5398) fems14 2025-12-27 09:53:57 +08:00
ce52e17bf3 [Doc]add long sequence tutorials (#5364) weiguihua2 2025-12-27 09:52:11 +08:00
d1f0df7b4b Revert "MLA prefill performance optimization (#5275)" (#5410) wangxiyuan 2025-12-27 09:48:56 +08:00
711f1861e4 MLA prefill performance optimization (#5275) pichangping 2025-12-27 09:19:45 +08:00
1486e0d06c [TEST]Add vllm bench (#5306) jiangyunfan1 2025-12-27 09:16:08 +08:00
16ef2474bf [Test] Add acceptance test for eagle/eagle3 (#5366) Zetong Li 2025-12-27 08:50:01 +08:00
8ed6f98a5a [Build] Add installation script of fused_infer_attention_score kernel with flash decoding (#5402) Mengqing Cao 2025-12-27 02:01:06 +08:00
f5af6bbd1e [CI] Add qwen-235b-a22b a2 multi-node test (#5393) Nengjun Ma 2025-12-26 23:46:09 +08:00
1d8aa892bf Update vllm pin to 12.26 (#5378) ZT-AIA 2025-12-26 23:44:48 +08:00
8b9ca86827 [Feature] Remove the transpose step after attention and switch to transpose_batchmatmul (#5390) Jade Zheng 2025-12-26 22:03:46 +08:00
bc5b7a5fb5 [bugfix] Fix MHA model runtime error in aclgraph mode (#5397) Wang Kunpeng 2025-12-26 21:37:28 +08:00
7685d0c239 rollback causal_conv1d_fn to torch ops & update qwen3Next doc (#5391) LeeWenquan 2025-12-26 19:57:38 +08:00
48854aef5c [TEST]Add sending request with and without chat (#5286) jiangyunfan1 2025-12-26 18:04:17 +08:00
0dfdfa9526 [Feature] Enhance all-reduce skipping logic for MoE models in NPUModelRunner (#5329) Jade Zheng 2025-12-26 17:39:44 +08:00
06732dbf5b [Doc] update R1/V3.1 doc (#5383) Zhu Yi Lin 2025-12-26 17:09:22 +08:00
8ed87dfa84 [doc] Add context parallel user guide (#5358) zhangsicheng5 2025-12-26 17:03:47 +08:00
09390eaf32 [Bugfix] Fix unsuitable moe_comm_type under ep=1 scenario (#5388) Zetong Li 2025-12-26 16:45:45 +08:00
da0b113cf5 [doc]<PCP&DCP> add developer guide for PCP&DCP (#5372) Qiu 2025-12-26 05:17:38 -03:00
135cc0a505 vllm-ascend vnpu v1 starkwj 2025-12-26 07:37:35 +00:00
18302c8467 Revert "Add MagicMTP(block verify) and Triton optimization (#4443)" (#5380) Zhu Yi Lin 2025-12-26 15:06:13 +08:00
45c5bcd962 [E2E] Optimize the E2E test time. (#5294) zhangyiming 2025-12-26 14:17:50 +08:00
29d2fe653d cleanup ascend config (#5296) wangxiyuan 2025-12-26 14:07:37 +08:00
adaa89a7a5 Update vllm pin to 12.25 (#5342) ZT-AIA 2025-12-26 14:05:40 +08:00
c2f776b846 [Nightly] Initial logging for nightly multi-node testing (#5362) Li Wang 2025-12-26 11:39:07 +08:00
320877d488 move contiguous in fused_sigmoid_gating_delta_rule_update to model_runner_v1 (#5274) XiaoxinWang 2025-12-26 09:19:47 +08:00
9b2a7d8866 [BugFix][Fusion] Patch compile backend to make fusion available (#5308) Icey 2025-12-26 09:18:16 +08:00
7372225bcb [FIX] Update _causal_conv1d_update_kernel for Efficient Conv State Handling on NPU (#5322) Qi Mao 2025-12-26 09:12:30 +08:00
4ce32c1a8d [CI] Skip failed test cases to recover CI (#5368) Mengqing Cao 2025-12-26 08:18:23 +08:00
1858f3d36e [Bugfix] Fix Qwen P/D Disaggregation accuracy issue (#5340) Feng Liu 2025-12-25 22:46:08 +08:00
2da8038dd2 [doc] update using command (#5373) cookieyyds 2025-12-25 22:28:35 +08:00
59f11dd1cb [Bugfix] fix xlite decode-only e2e test (#5354) Magnus 2025-12-25 16:30:17 +08:00
d752c030e9 [Bugfix] fix pcp 128K break (#5266) weiguihua2 2025-12-25 11:58:52 +08:00
8caad0510d fix e2e rejection-sampler error (#5341) Aoxuan Chen 2025-12-25 11:39:38 +08:00
2ae0bad96d Remove VLLM_ASCEND_ENABLE_DENSE_OPTIMIZE (#5272) wangxiyuan 2025-12-25 11:09:56 +08:00
13cd6362c6 [bugfix] fix Error 'ValueError: Duplicate layer name' (#5280) Wang Kunpeng 2025-12-25 10:43:24 +08:00
30778f371b [BugFix] Fix num_pcp_pads Assignment Issues (#5273) dsxsteven 2025-12-25 10:38:09 +08:00
fca2f948c1 [E2E Refactor] Enable skipped e2e case (#5287) wjunLu 2025-12-25 09:18:05 +08:00
a9fccbeb30 [CI] add xlite e2e test (#5305) Magnus 2025-12-25 09:17:06 +08:00
6d25372baa Add MagicMTP(block verify) and Triton optimization (#4443) Aoxuan Chen 2025-12-25 09:00:25 +08:00
a90482803d [Kernel] add l2norm triton kernel (#4595) Ascendyh 2025-12-25 06:06:18 +08:00
e54630e01c Revert [KV-Sharing] Support KV-Sharing feature in CLA models (#4138) (#5317) Mengqing Cao 2025-12-24 22:24:17 +08:00
fb3d6ca08c Cleanup useless env (#5270) wangxiyuan 2025-12-24 22:07:59 +08:00
5018f2d8fd [quantization] Add w8a16 quantization support (#4541) TmacAaron 2025-12-24 19:49:32 +08:00
515267de22 [perf][bugfix] improve performance of rejection sampler and eliminate HD synchronize in TopKTopPSampler (#4154) linfeng-yuan 2025-12-24 19:10:33 +08:00
2f03a2f4a4 [CI] Skip some failed ops tests (#5309) Li Wang 2025-12-24 18:29:34 +08:00
42c989a437 Update vllm pin to 12.24 (#5307) Nengjun Ma 2025-12-24 17:24:31 +08:00
a3f65b938f [Doc] Add pa_shape_list description to qwen dense tutorial (#5225) ZYang6263 2025-12-24 14:40:20 +08:00
9227e6af73 [bugfix] remove the EP buffer allocation introduced by fused-op dispatch_ffn_c… (#5284) Chen Chen 2025-12-24 11:26:19 +08:00
74a1de50a9 [E2E] Optimize e2e test. (#5091) zhangyiming 2025-12-24 10:41:55 +08:00
bd4fb871c6 [CI] Add skipped testcases. (#5254) zhangyiming 2025-12-24 10:41:32 +08:00

Branches: br/v0.18.0, br/v0.18.0rc1, v0.11.0