xc-llm-ascend

EngineX/xc-llm-ascend

8b3a7a9e87 [bugfix] Support dsv3.2 enable both mtp and full_decode_only (#5679) cookieyyds 2026-01-08 15:47:31 +08:00
ccbc5e2ba1 [Feat][Bugfix][main] Adapted SP to eagle3 (#5562) drslark 2026-01-08 15:33:52 +08:00
d03cc9c456 [CI] Fix image build workflow_dispatch error (#5717) wangxiyuan 2026-01-08 15:07:33 +08:00
920bbe932f [CI] Drop outdated cases (#5709) Li Wang 2026-01-08 11:23:44 +08:00
b69db4ce55 [EPLB][CI] EPLB add aclgraph and redundant expert ci (#5625) LI SHENGYONG 2026-01-08 09:51:48 +08:00
264cc254cc [CI] fix image build tag (#5703) wangxiyuan 2026-01-08 09:27:45 +08:00
48811bc0b8 Optimize the print info format when deprecated code is used in vllm-ascend (#5696) Nengjun Ma 2026-01-08 09:26:49 +08:00
8763953f56 [Feature] add the magicmtp speculative decoding acceleration algorithm (#5542) Aoxuan Chen 2026-01-08 09:15:55 +08:00
481138e1d2 [bugfix] adapt to new implemented get_kv_cache_spec in cpuoffload connector (#4311) lidenghui1110 2026-01-08 09:15:09 +08:00
f7db812ed7 [refactor] Refactor the interface for shard weight and remove the flashcomm2 o_shared interface. (#5181) zzhxxx 2026-01-08 09:05:02 +08:00
20a8cf061b [BugFix][P/D] Fix pre-create link parameter error (#5694) zxr2333 2026-01-08 08:41:10 +08:00
3be8e33fe9 [Kernel] Add moe_gating_top_k operator support for Ascend NPU (#5579) ZCG12345 2026-01-07 21:42:31 +08:00
1165b2c863 [1/N][CI] Refactor accuracy test (#5400) Li Wang 2026-01-07 20:58:15 +08:00
b94fc13d3f [BugFix][Fusion] Fix graph fusion failure problem (#5676) Icey 2026-01-07 18:42:55 +08:00
137f28341d [Tests] Add qwen3-8b nightly test (#5597) Icey 2026-01-07 18:42:05 +08:00
3f4f2b4ae6 [Refactor] Import global var from vllm instead of overwrite it (#5469) Mengqing Cao 2026-01-07 18:41:45 +08:00
380f089fbf [Refactor] Fix AttentionMaskBuilder singleton and remove redundant pcp_prefill_mask (#4870) LICO67373 2026-01-07 17:09:52 +08:00
91790fd85a [CI] move image and wheel job to schedule way (#5685) wangxiyuan 2026-01-07 16:40:19 +08:00
1140789e83 [Bugfix] Fix the graph capture failure issue in the eagle3+full scenario. (#5553) 无脸男 2026-01-07 15:57:16 +08:00
fa0fb46853 fix reload return value starkwj 2026-01-07 07:42:30 +00:00
2b8a9ce8bd [Bugfix] fix resource are insufficient when pcp and piecewise (#5377) weiguihua2 2026-01-07 15:39:52 +08:00
4f9808002b [CI] Add workflow to cancel running workflows on PR close (#5646) Paco Xu 2026-01-07 15:38:10 +08:00
d314ea8d3d [CI] Bump lm-eval version to v0.4.9.2 (#5655) Li Wang 2026-01-07 14:15:53 +08:00
6f7a81cd9f [CI] cleanup single/multi-card test (#5623) wangxiyuan 2026-01-07 14:13:34 +08:00
1afbc01ed4 [misc]Add Kimi-K2 series to CI model list (#5656) SILONG ZENG 2026-01-07 11:32:48 +08:00
d6bb17f10e [Bugfix]Add register_kv_cache in ucm_connector (#5657) UnifiedCacheManager 2026-01-07 11:30:33 +08:00
cd59323e40 [Bugfix] Revert pr4214 multi-stream collect expert hotpot (#5529) LI SHENGYONG 2026-01-07 11:26:47 +08:00
25baf6df09 [Feature]EPLB:Adapt DispatchGmmCombineDecode operator to eplb tensor list and expert token numbers (#5552) wangyibo1005 2026-01-07 11:23:42 +08:00
086c093347 [CI] Add DeepSeek-V3.2-W8A8 nightly ci test (#5371) starmountain1997 2026-01-07 10:02:02 +08:00
cbc987db0b [bugfix (pcp)] fix chunked prefill accuracy issue (#5647) Feng Liu 2026-01-07 10:01:27 +08:00
1112208052 [Refactor] Cleanup platform (#5566) wangxiyuan 2026-01-07 09:25:55 +08:00
6ea2afe5fa [Feature] implement basic framework for batch invariant (#5517) Ronald 2026-01-07 09:11:26 +08:00
bdedf3c9f8 [Graph][Fusion] Add AddRMSNormSPPattern and AddRMSNormSPPatternWithBias (#5569) CodeCat 2026-01-07 09:03:45 +08:00
ad9b711f89 [Bugfix] fix dcp_only bug and add e2e accuracy test for dcp only and pcp only (#5565) zhenwenqi2024 2026-01-06 22:48:21 +08:00
77a029979e Revert "[BugFix][Fusion] Fix graph fusion failure problem (#5253)" (#5667) Fager10086 2026-01-06 21:55:47 +08:00
330e25ab1d [P/D] Performance enhancement of Layerwise connector in TP asymmetric scenarios (#5540) liziyu 2026-01-06 20:25:36 +08:00
cd1162e25a [Misc] Remove useless weight loader patch (#5619) wangxiyuan 2026-01-06 20:17:32 +08:00
089ca2ddcc [Nightly][Test] Add Qwen3-Next-80B-A3B-Instruct-W8A8 nightly test (#5616) InSec 2026-01-06 17:36:00 +08:00
cc0110abb4 [Bugfix] Remove swa parameter of fia (#5602) yeyifan 2026-01-06 17:24:43 +08:00
29e2f9a43e Bugfix: Align expert map shapes with redundant experts in EPLB adjustment (#5285) Mercykid-bash 2026-01-06 17:22:36 +08:00
fe3f2c7702 [Refactor][EAGLE] 3/N delete redundant methods in mtp_proposer (#5420) Zetong Li 2026-01-06 16:47:39 +08:00
b94d589769 [MM][Bugfix] Update hf_config to hf_text_config (#5319) Shanshan Shen 2026-01-06 16:41:39 +08:00
293b2275df [CI] Specify the version of xlite (#5612) Magnus 2026-01-06 16:02:16 +08:00
b8f245792e [Main2Main] Upgrade vllm commit to 0106 (#5617) wjunLu 2026-01-06 15:50:40 +08:00
c1dcddce3f [CI]update bisheng version (#5621) meihanc 2026-01-06 15:22:22 +08:00
e07938047e [UT][PCP&DCP] UT for block_table.py (#5032) Qiu 2026-01-06 11:19:25 +08:00
3cf059a72b [Main2Main] Upgrade vllm commit to 0105 (#5595) wjunLu 2026-01-06 08:44:29 +08:00
c5e2f48510 [CI] mv ops to correct path (#5615) Li Wang 2026-01-05 23:17:07 +08:00
129ba9fe1b [BugFix] Fix Smoke Testing Bug for DSR1 longseq (#5613) dsxsteven 2026-01-05 22:40:28 +08:00
8eae949d11 Revert "[Feat] enable hierarchical mc2 ops on A2 by default (#5545)" (#5611) ZixuanWang 2026-01-05 22:39:05 +08:00
11e75494b1 [TRITON][TEST]Add nightly test for triton split_qkv_rmsnorm_rope (#5267) Angazenn 2026-01-05 21:35:37 +08:00
a2daacbd71 [perf] Fix MLAPO weight disposal for KV-consumer MLA in PD-mix deploy... (#5192) Chen Chen 2026-01-05 21:29:45 +08:00
074ae28d6e Update README.md lumian 2026-01-05 20:33:31 +08:00
b10ef9b9f3 [docs] Correct image about prefill phase of PCP (#5598) Qiu 2026-01-05 20:21:59 +08:00
a034941d06 [CI] update triton-ascend version (#5584) meihanc 2026-01-05 20:20:11 +08:00
473431e7e2 [P/D]Remove mooncake kvpool unused parameter local_hostname (#5574) Chao Lei 2026-01-05 20:18:59 +08:00
d86021f7b4 [Bugfix] record cos and sin cache in AscendRotaryEmbedding (#5516) Debonet 2026-01-05 20:12:41 +08:00
16b1bee804 [bugfix] fix test_camem failed with triton-ascend (#5492) meihanc 2026-01-05 20:10:54 +08:00
58e8d19c35 [UT]add triton ops ut : test_fused_qkvzba_split_reshape_cat (#5474) ZT-AIA 2026-01-05 20:05:07 +08:00
1e6228d8cd [CI] Download models from ms (#5405) Li Wang 2026-01-05 19:59:13 +08:00
2d22700d69 Docs: Add A3 Docker image guidance for Atlas A3 machines (#5256) huqi 2026-01-05 19:42:42 +08:00
9d8b4c8d9d [Doc] Add NNAL installation guide and requirements (#5235) huqi 2026-01-05 19:40:26 +08:00
caf0289e1a add Dockerfile and readme starkwj 2026-01-05 09:10:56 +00:00
ec3563334b Add the requirement of arctic-inference which speculative decoding with suffix_decode (#5045) frankie 2026-01-05 19:15:49 +08:00
e7b623b363 [BugFix][Fusion] Fix graph fusion failure problem (#5253) Icey 2026-01-05 17:49:09 +08:00
4a3663327b [Refactor]7/N Extract common code to common_cp (#5490) wujinyuan1 2026-01-05 17:41:12 +08:00
755caeb06e [Feat][Spec] Optimize token index calculation in spec decode with Triton kernel (#5356) Yizhou 2026-01-05 16:51:29 +08:00
8ffe3f5d78 feat: implement high-performance Triton kernels for rejection sampling: optimization for rejection_random_sample_kernel (#5259) daniel 2026-01-05 16:03:02 +08:00
91bf524364 [BugFix][kernel] fix matmul_allreduce_add_rmsnorm_kernel (#5335) Trunrain 2026-01-05 15:19:54 +08:00
6c1a685b30 [Doc] add new doc for mooncake: PD-Colocated cross-node multi-instance validation of Mooncake's KV Cache reuse and performance. (#5415) zhangmuzhi_yuwan 2026-01-05 14:19:57 +08:00
549be94397 [Bugfix] fix pcp + eplb error (#5561) weiguihua2 2026-01-05 14:08:11 +08:00
52863c4165 [Refactor][EAGLE] 2/N: load model and generate token (#5437) lilinsiman 2026-01-05 14:07:54 +08:00
50e7934415 MLA prefill performance optimization (#5456) pichangping 2026-01-05 11:41:59 +08:00
c23cf30709 [Doc] eval-type not support service but server (#2920) L4 2026-01-05 11:17:39 +08:00
2b5536362a [CI] skip xlite-decode-only e2e test (#5407) Magnus 2026-01-05 11:05:26 +08:00
a099b994b3 [Doc] update supported models (#5379) zhangxinyuehfad 2026-01-05 09:21:52 +08:00
42774df744 [Bugfix] Fix weight transpose in RL scenarios (#5567) panchao-hub 2026-01-05 09:17:26 +08:00
d25a2c20c5 [Bugfix] Fix chunk prefill bug for long_sequence feature (#5444) LookAround0301 2026-01-05 09:16:36 +08:00
fbb93ad8f2 [bugfix]update bishengir source envs (#5582) meihanc 2026-01-05 09:13:40 +08:00
7cf65d0581 [Doc]modify the quantization user guide and add a quantization adaptation developer guide (#5554) InSec 2026-01-05 09:12:11 +08:00
96775a27a8 [refactor](UT,PCP,DCP) refactor pcp&dcp patches in UTs (#5505) Qiu 2026-01-05 09:05:45 +08:00
46c2fc6a3c [KVPOOL]decode save kvcache (#5168) baxingpiaochong 2026-01-04 22:22:01 +08:00
350b95efcf [BugFix]Disable dispatch_gmm_combine_decode operator when mtp drafter model uses non-w8a8 while main model uses w8a8, or drafter model is eagle series (#5293) wangqiankun13 2026-01-04 17:51:28 +08:00
f15dc3fa02 [bugfix](pcp) expand max_num_tokens for pcp pad (#5478) Qiu 2026-01-04 17:25:40 +08:00
749c4a3deb [Doc] Fix typo in ASCEND_RT_VISIBLE_DEVICES (#5581) Cao Yi 2026-01-04 17:01:02 +08:00
d462577504 [Recover] [Bugfix] support mtp kv transfer and pp partition by hand in kv transfer (#4892) (revert in #4981) (#5511) lidenghui1110 2026-01-04 16:49:33 +08:00
7c210225a2 [Perf][PCP][DCP] add multi-stream for GQA to enable computation-communication overlap (#5382) Qiu 2026-01-04 16:33:18 +08:00
37fd48bee5 [CI] Move longseq Nightly CI (#5577) dsxsteven 2026-01-04 15:42:43 +08:00
fb9fdcdbe4 [Feat] enable hierarchical mc2 ops on A2 by default (#5545) hwhaokun 2026-01-04 14:44:20 +08:00
363ac1b80f [Feat][main] Supported to use full-graph with Qwen3-Next-MTP (#5477) drslark 2026-01-04 12:03:21 +08:00
fd4b4fd06f [Doc] Fix spelling mistake of environment variable name ASCEND_RT_VISIBLE_DEVICES in Doc (#5570) TmacAaron 2026-01-04 11:52:58 +08:00
1d7539ab3f Cleanup pass config override (#5283) wangxiyuan 2026-01-04 11:52:12 +08:00
3c7e6c6817 [CI] Add multi-nodes longseq configs of DeepSeek-R1-W8A8 & Qwen3-235B-W8A8 (#5381) dsxsteven 2026-01-04 10:38:40 +08:00
799b41a9f4 Bump actions/download-artifact from 4 to 7 (#5465) dependabot[bot] 2026-01-04 08:54:06 +08:00
ad40494b84 Bump actions/upload-artifact from 4 to 6 (#5466) dependabot[bot] 2026-01-04 08:53:52 +08:00
32a56496cc [Nightly] Trigger image build for nightly (#5547) Li Wang 2026-01-04 08:50:57 +08:00
d193316ded [P/D] Bugfix zmq send/receive failed (#5503) Chao Lei 2025-12-31 19:17:08 +08:00
80fc0f5b9e [Graph][Fusion] Add AddRMSNorm(with bias) (#5491) CodeCat 2025-12-31 17:10:26 +08:00
d07d8a4535 [Model] Add LongCat-Flash (#3833) Chu Yuelin 2025-12-31 17:06:55 +08:00
03679cf1d3 [Bugfix] fix the precision issues that may raise from the inter-layer reuse of the workspace in certain scenarios (#5522) 无脸男 2025-12-31 16:54:04 +08:00
