Commit Graph

  • 3be8e33fe9 [Kernel] Add moe_gating_top_k operator support for Ascend NPU (#5579) ZCG12345 2026-01-07 21:42:31 +08:00
  • 1165b2c863 [1/N][CI] Refactor accuracy test (#5400) Li Wang 2026-01-07 20:58:15 +08:00
  • b94fc13d3f [BugFix][Fusion] Fix graph fusion failure problem (#5676) Icey 2026-01-07 18:42:55 +08:00
  • 137f28341d [Tests] Add qwen3-8b nightly test (#5597) Icey 2026-01-07 18:42:05 +08:00
  • 3f4f2b4ae6 [Refactor] Import global var form vllm instead of overwirte it (#5469) Mengqing Cao 2026-01-07 18:41:45 +08:00
  • 380f089fbf [Refactor] Fix AttentionMaskBuilder singleton and remove redundant pcp_prefill_mask (#4870) LICO67373 2026-01-07 17:09:52 +08:00
  • 91790fd85a [CI] move image and wheel job to schedule way (#5685) wangxiyuan 2026-01-07 16:40:19 +08:00
  • 1140789e83 [Bugfix] Fix the graph capture failure issue in the eagle3+full scenario. (#5553) 无脸男 2026-01-07 15:57:16 +08:00
  • fa0fb46853 fix reload return value starkwj 2026-01-07 07:42:30 +00:00
  • 2b8a9ce8bd [Bugfix] fix resource are insufficient when pcp and piecewise (#5377) weiguihua2 2026-01-07 15:39:52 +08:00
  • 4f9808002b [CI] Add workflow to cancel running workflows on PR close (#5646) Paco Xu 2026-01-07 15:38:10 +08:00
  • d314ea8d3d [CI] Bump lm-eval version to v0.4.9.2 (#5655) Li Wang 2026-01-07 14:15:53 +08:00
  • 6f7a81cd9f [CI] cleanup single/multi-card test (#5623) wangxiyuan 2026-01-07 14:13:34 +08:00
  • 1afbc01ed4 [misc]Add Kimi-K2 series to CI model list (#5656) SILONG ZENG 2026-01-07 11:32:48 +08:00
  • d6bb17f10e [Bugfix]Add register_kv_cache in ucm_connector (#5657) UnifiedCacheManager 2026-01-07 11:30:33 +08:00
  • cd59323e40 [Bugfix] Revert pr4214 multi-stream collect expert hotpot (#5529) LI SHENGYONG 2026-01-07 11:26:47 +08:00
  • 25baf6df09 [Feature]EPLB:Adapt DispatchGmmCombineDecode operator to eplb tensor list and expert token numbers (#5552) wangyibo1005 2026-01-07 11:23:42 +08:00
  • 086c093347 [CI] Add DeepSeek-V3.2-W8A8 nightly ci test (#5371) starmountain1997 2026-01-07 10:02:02 +08:00
  • cbc987db0b [bugfix (pcp)] fix chunked prefill accurancy issue (#5647) Feng Liu 2026-01-07 10:01:27 +08:00
  • 1112208052 [Refactor] Cleanup platform (#5566) wangxiyuan 2026-01-07 09:25:55 +08:00
  • 6ea2afe5fa [Feature] implement basic framework for batch invariant (#5517) Ronald 2026-01-07 09:11:26 +08:00
  • bdedf3c9f8 [Graph][Fusion] Add AddRMSNormSPPattern and AddRMSNormSPPatternWithBias (#5569) CodeCat 2026-01-07 09:03:45 +08:00
  • ad9b711f89 [Bugfix] fix dcp_only bug and add e2e accuracy test for dcp only and pcp only (#5565) zhenwenqi2024 2026-01-06 22:48:21 +08:00
  • 77a029979e Revert "[BugFix][Fusion] Fix graph fusion failure problem (#5253)" (#5667) Fager10086 2026-01-06 21:55:47 +08:00
  • 330e25ab1d [P/D] Performance enhancement of Layerwise connector in TP asymmetric scenarios (#5540) liziyu 2026-01-06 20:25:36 +08:00
  • cd1162e25a [Misc] Remove useless weight loader patch (#5619) wangxiyuan 2026-01-06 20:17:32 +08:00
  • 089ca2ddcc [Nightly][Test] Add Qwen3-Next-80B-A3B-Instruct-W8A8 nightly test (#5616) InSec 2026-01-06 17:36:00 +08:00
  • cc0110abb4 [Bugfix] Remove swa parameter of fia (#5602) yeyifan 2026-01-06 17:24:43 +08:00
  • 29e2f9a43e Bugfix: Align expert map shapes with redundant experts in EPLB adjustment (#5285) Mercykid-bash 2026-01-06 17:22:36 +08:00
  • fe3f2c7702 [Refactor][EAGLE] 3/N delete redundant methods in mtp_proposer (#5420) Zetong Li 2026-01-06 16:47:39 +08:00
  • b94d589769 [MM][Bugfix] Update hf_config to hf_text_config (#5319) Shanshan Shen 2026-01-06 16:41:39 +08:00
  • 293b2275df [CI] Specify the version of xlite (#5612) Magnus 2026-01-06 16:02:16 +08:00
  • b8f245792e [Main2Main] Upgrade vllm commit to 0106 (#5617) wjunLu 2026-01-06 15:50:40 +08:00
  • c1dcddce3f [CI]update bisheng version (#5621) meihanc 2026-01-06 15:22:22 +08:00
  • e07938047e [UT][PCP&DCP] UT for block_table.py (#5032) Qiu 2026-01-06 11:19:25 +08:00
  • 3cf059a72b [Main2Main] Upgrade vllm commit to 0105 (#5595) wjunLu 2026-01-06 08:44:29 +08:00
  • c5e2f48510 [CI] mv ops to correct path (#5615) Li Wang 2026-01-05 23:17:07 +08:00
  • 129ba9fe1b [BugFix] Fix Smoke Testing Bug for DSR1 longseq (#5613) dsxsteven 2026-01-05 22:40:28 +08:00
  • 8eae949d11 Revert "[Feat] enable hierarchical mc2 ops on A2 by default (#5545)" (#5611) ZixuanWang 2026-01-05 22:39:05 +08:00
  • 11e75494b1 [TRITON][TEST]Add nightly test for triton split_qkv_rmsnorm_rope (#5267) Angazenn 2026-01-05 21:35:37 +08:00
  • a2daacbd71 [perf] Fix MLAPO weight disposal for KV-consumer MLA in PD-mix deploy... (#5192) Chen Chen 2026-01-05 21:29:45 +08:00
  • 074ae28d6e Update README.md lumian 2026-01-05 20:33:31 +08:00
  • b10ef9b9f3 [docs] Correct image about prefill phase of PCP (#5598) Qiu 2026-01-05 20:21:59 +08:00
  • a034941d06 [CI] update triton-ascend version (#5584) meihanc 2026-01-05 20:20:11 +08:00
  • 473431e7e2 [P/D]Remove mooncake kvpool unused parameter local_hostname (#5574) Chao Lei 2026-01-05 20:18:59 +08:00
  • d86021f7b4 [Bugfix] record cos and sin cache in AscendRotaryEmbedding (#5516) Debonet 2026-01-05 20:12:41 +08:00
  • 16b1bee804 [bugfix] fix test_camem failed with triton-ascend (#5492) meihanc 2026-01-05 20:10:54 +08:00
  • 58e8d19c35 [UT]add triton ops ut : test_fused_qkvzba_split_reshape_cat (#5474) ZT-AIA 2026-01-05 20:05:07 +08:00
  • 1e6228d8cd [CI] Download models from ms (#5405) Li Wang 2026-01-05 19:59:13 +08:00
  • 2d22700d69 Docs: Add A3 Docker image guidance for Atlas A3 machines (#5256) huqi 2026-01-05 19:42:42 +08:00
  • 9d8b4c8d9d [Doc] Add NNAL installation guide and requirements (#5235) huqi 2026-01-05 19:40:26 +08:00
  • caf0289e1a add Dockerfile and readme starkwj 2026-01-05 09:10:56 +00:00
  • ec3563334b Add the requirement of arctic-inference which speculative decoding with suffix_decode (#5045) frankie 2026-01-05 19:15:49 +08:00
  • e7b623b363 [BugFix][Fusion] Fix graph fusion failure problem (#5253) Icey 2026-01-05 17:49:09 +08:00
  • 4a3663327b [Refactor]7/N Extract common code to common_cp (#5490) wujinyuan1 2026-01-05 17:41:12 +08:00
  • 755caeb06e [Feat][Spec] Optimize token index calculation in spec decode with Triton kernel (#5356) Yizhou 2026-01-05 16:51:29 +08:00
  • 8ffe3f5d78 feat: implement high-performance Triton kernels for rejection sampling: optimization for rejection_random_sample_kernel (#5259) daniel 2026-01-05 16:03:02 +08:00
  • 91bf524364 [BugFix][kernel] fix matmul_allreduce_add_rmsnorm_kernel (#5335) Trunrain 2026-01-05 15:19:54 +08:00
  • 6c1a685b30 [Doc] add new doc for mooncake: PD-Colocated cross-node multi-instance validation of Mooncake's KV Cache reuse and performance. (#5415) zhangmuzhi_yuwan 2026-01-05 14:19:57 +08:00
  • 549be94397 [Bugfix] fix pcp + eplb error (#5561) weiguihua2 2026-01-05 14:08:11 +08:00
  • 52863c4165 [Refactor][EAGLE] 2/N: load model and generate token (#5437) lilinsiman 2026-01-05 14:07:54 +08:00
  • 50e7934415 MLA prefill preformance optimization (#5456) pichangping 2026-01-05 11:41:59 +08:00
  • c23cf30709 [Doc] eval-type not support service but server (#2920) L4 2026-01-05 11:17:39 +08:00
  • 2b5536362a [CI] skip xlite-decode-only e2e test (#5407) Magnus 2026-01-05 11:05:26 +08:00
  • a099b994b3 [Doc] update supported models (#5379) zhangxinyuehfad 2026-01-05 09:21:52 +08:00
  • 42774df744 [Bugfix] Fix weight transpose in RL scenarios (#5567) panchao-hub 2026-01-05 09:17:26 +08:00
  • d25a2c20c5 [Bugfix] Fix chunk prefill bug for long_sequence feature (#5444) LookAround0301 2026-01-05 09:16:36 +08:00
  • fbb93ad8f2 [bugfix]update bishengir source envs (#5582) meihanc 2026-01-05 09:13:40 +08:00
  • 7cf65d0581 [Doc]modify the quantization user guide and add a quantization adaptation developer guide (#5554) InSec 2026-01-05 09:12:11 +08:00
  • 96775a27a8 [refactor](UT,PCP,DCP) refactor pcp&dcp patches in UTs (#5505) Qiu 2026-01-05 09:05:45 +08:00
  • 46c2fc6a3c [KVPOOL]decode save kvcache (#5168) baxingpiaochong 2026-01-04 22:22:01 +08:00
  • 350b95efcf [BugFix]Disable dispatch_gmm_combine_decode operator when mtp drafter model uses non-w8a8 while main model uses w8a8, or drafter model is eagle series (#5293) wangqiankun13 2026-01-04 17:51:28 +08:00
  • f15dc3fa02 [bugfix](pcp) expand max_num_tokens for pcp pad (#5478) Qiu 2026-01-04 17:25:40 +08:00
  • 749c4a3deb [Doc] Fix typo in ASCEND_RT_VISIBLE_DEVICES (#5581) Cao Yi 2026-01-04 17:01:02 +08:00
  • d462577504 [Recover] [Bugfix] support mtp kv transfer and pp partition by hand in kv transfer (#4892) (revert in #4981) (#5511) lidenghui1110 2026-01-04 16:49:33 +08:00
  • 7c210225a2 [Perf][PCP][DCP] add multi-stream for GQA to enable computation-communication overlap (#5382) Qiu 2026-01-04 16:33:18 +08:00
  • 37fd48bee5 [CI] Move longseq Nightly CI (#5577) dsxsteven 2026-01-04 15:42:43 +08:00
  • fb9fdcdbe4 [Feat] enable hierarchical mc2 ops on A2 by default (#5545) hwhaokun 2026-01-04 14:44:20 +08:00
  • 363ac1b80f [Feat][main] Supported to use full-graph with Qwen3-Next-MTP (#5477) drslark 2026-01-04 12:03:21 +08:00
  • fd4b4fd06f [Doc] Fix spelling mistake of environment variable name ASCEND_RT_VISIBLE_DEVICES in Doc (#5570) TmacAaron 2026-01-04 11:52:58 +08:00
  • 1d7539ab3f Cleanup pass config override (#5283) wangxiyuan 2026-01-04 11:52:12 +08:00
  • 3c7e6c6817 [CI] Add multi-nodes longseq configs of DeepSeek-R1-W8A8 & Qwen3-235B-W8A8 (#5381) dsxsteven 2026-01-04 10:38:40 +08:00
  • 799b41a9f4 Bump actions/download-artifact from 4 to 7 (#5465) dependabot[bot] 2026-01-04 08:54:06 +08:00
  • ad40494b84 Bump actions/upload-artifact from 4 to 6 (#5466) dependabot[bot] 2026-01-04 08:53:52 +08:00
  • 32a56496cc [Nightly] Trigger image build for nightly (#5547) Li Wang 2026-01-04 08:50:57 +08:00
  • d193316ded [P/D] Bugfix zmq send/receive failed (#5503) Chao Lei 2025-12-31 19:17:08 +08:00
  • 80fc0f5b9e [Graph][Fusion] Add AddRMSNorm(with bias) (#5491) CodeCat 2025-12-31 17:10:26 +08:00
  • d07d8a4535 [Model] Add LongCat-Flash (#3833) Chu Yuelin 2025-12-31 17:06:55 +08:00
  • 03679cf1d3 [Bugfix] fix the precision issues that may raise from the inter-layer reuse of the workspace in certain scenarios (#5522) 无脸男 2025-12-31 16:54:04 +08:00
  • 46a1614387 [P/D] Improve the performance of Layerwise Connector (#5303) zxr2333 2025-12-31 15:09:01 +08:00
  • 7d5242faca [Refactor] Formatting output types related to FuseMoE (#5481) Jade Zheng 2025-12-31 14:24:37 +08:00
  • 38570cfeb6 [Feature] Support kv nz feature for DeepSeek decode node in disagg-prefill scenario (#3072) Jade Zheng 2025-12-31 14:24:04 +08:00
  • a539ae753a [feature] mooncake support pcp/dcp in common conditions (#5224) wangxiaochao6 2025-12-31 09:53:03 +08:00
  • a5ae07a5d2 [Bugfix] Fix mm_merge (#5249) Li Wang 2025-12-31 09:49:55 +08:00
  • 3c2d3e52e5 [Main2Main] Upgrade vllm commit to 1230 (#5495) wjunLu 2025-12-31 09:44:35 +08:00
  • 5d9fde9819 [Feature] Refactor PCP &DCP related code (#5214) zhenwenqi2024 2025-12-31 09:29:57 +08:00
  • 46862ce1af [main][test] Refactor the mtp and eagle test case (#5326) lilinsiman 2025-12-31 09:22:58 +08:00
  • bdc721d35a [smoke][bugfix] moe_init_routing_v2 active_expert_range use int type (#5521) LI SHENGYONG 2025-12-31 09:19:04 +08:00
  • 2ee17e50a1 [2/N] Upgrade nightly doc (#5534) Li Wang 2025-12-31 09:11:42 +08:00
  • 98798d80a0 [Doc] Add new contributors. (#5537) zhangyiming 2025-12-31 07:39:42 +08:00