Commit Graph

  • 592cfb6a6f [CI] Add Triton Ascend in CI (#4921) meihanc 2025-12-23 12:47:35 +08:00
  • 2e010e12dd [EPLB][CI] Add dynamic EPLB CI for qwen3-moe (#5179) LI SHENGYONG 2025-12-23 11:31:00 +08:00
  • 449f8f65a7 [KV-Sharing] Support KV-Sharing feature in CLA models (#4138) Mengqing Cao 2025-12-23 10:48:31 +08:00
  • 9a79cbaecb [ModelRunner] Add hunyuan-vl basic support (#5151) Li Wang 2025-12-23 10:46:54 +08:00
  • c9b5881bcd [Doc] fix docs set rope_theta value is 10e6 in qwen3-235b model (#5258) rongfu.leng 2025-12-23 10:21:46 +08:00
  • 6c478531f8 [CustomOp] Register AscendApplyRotaryEmb CustomOp and remove related patch (#4667) Shanshan Shen 2025-12-23 10:04:37 +08:00
  • 35dbdbb398 [Doc] Add new contributors and relative scripts. (#5070) zhangyiming 2025-12-23 10:01:45 +08:00
  • 3d04ae8e7d [Main] [Patch] support balance scheduling patch (#5212) Zhu Yi Lin 2025-12-23 09:04:38 +08:00
  • f883a2edb9 [Doc] Update the weight download URL. (#5238) zhangyiming 2025-12-23 08:53:30 +08:00
  • c3a8d13ca7 [refactor] Remove unnecessary attributes from set_ascend_forward_context (#5204) Wang Kunpeng 2025-12-23 08:49:52 +08:00
  • 95e8a52156 [Refactor] move the metadata from attention_v1 to util(ready for extract common_cp) & realize Ascendmetadata inherit from the parent class. (#5203) weijinqian0 2025-12-23 00:10:52 +08:00
  • 3d9954eff0 [Bugfix] Use hf_text_config instead of hf_config to support multimodal PD-Disaggregated (#5205) ApsarasX 2025-12-22 20:21:45 +08:00
  • 3ba920a65b [TEST]Update mm param --mm-processor-cache-gb (#5242) jiangyunfan1 2025-12-22 18:54:03 +08:00
  • 61efaffcaf [Bugfix] Implement multimodal_cpu_fields in model runner (#5196) zhangxinyuehfad 2025-12-22 18:39:45 +08:00
  • 052e472453 [bugfix] fix w8a8dynamic fused_moe trans nz (#5199) zzzzwwjj 2025-12-22 17:45:34 +08:00
  • 55beac9c91 [Feat]Xlite Qwen3-vl Support (#5228) lvjunqi 2025-12-22 16:30:52 +08:00
  • 78aa7f2693 [feature] support pcp + mtp in full graph (#4572) zhangsicheng5 2025-12-22 16:13:39 +08:00
  • 12d581605b [Triton]support swiglu_quant triton in w4a8 (#5161) Zhu Yi Lin 2025-12-22 16:01:58 +08:00
  • 60d9398f6d [1/N][Eagle3] Aligns auxiliary hidden state usage for eagle3 models (#5162) Yizhou 2025-12-22 15:24:54 +08:00
  • b62b2ebd9b [Doc] Update readme (#5226) wangxiyuan 2025-12-22 15:21:16 +08:00
  • 4861484b68 Bump actions/checkout from 4 to 6 (#5234) dependabot[bot] 2025-12-22 15:16:43 +08:00
  • 11a25497ce Bump actions/upload-artifact from 4 to 6 (#5233) dependabot[bot] 2025-12-22 15:15:45 +08:00
  • 64669c4243 [misc][FlashComm1][ACLGraph] Incompatibility between Flashcomm1 and FULL_DECODE_ONLY. (#5200) Qiu 2025-12-22 03:33:32 -03:00
  • b84ad8c5d8 [CustomOp] Register AscendMMEncoderAttention CustomOp and remove related patch (#4750) Shanshan Shen 2025-12-22 14:32:53 +08:00
  • b2c121637f [task] Add fused gdn gating triton kernel (#4304) Ascendyh 2025-12-22 14:09:19 +08:00
  • ea6206bb18 [bugfix][ACLGraph][MTP] deletes cudagraph_batch_sizes in MtpProposer (#5183) Qiu 2025-12-22 03:08:27 -03:00
  • dc047489c7 [Doc] Fix DeepSeek-V3.2 tutorial. (#5190) zhangyiming 2025-12-22 11:30:17 +08:00
  • 492173cf89 [Misc] Cleanup useless print and logger (#5220) wangxiyuan 2025-12-22 11:28:26 +08:00
  • e117b3d693 [Perf] vectorize PCP/DCP loops in mla_v1.py (#5003) Feng Liu 2025-12-22 11:06:30 +08:00
  • 49838d4bec [Perf] vectorize PCP/DCP loops in attention_cp.py (#4944) Feng Liu 2025-12-22 11:06:19 +08:00
  • 904c18f929 [Feature]Use DispatchGmmCombineDecode operator to replace MC2(Optional) (#5040) wangqiankun13 2025-12-21 15:23:59 +08:00
  • 67a0325cf2 [BugFix]Fix wrong _cos, _sin instantiation (#5154) Angazenn 2025-12-20 22:52:50 +08:00
  • 5d02eed16f [Performance] Add async exponential while model executing (#4501) YuhanBai 2025-12-20 21:23:21 +08:00
  • 58773af708 [Fix] Delete pooling redundant code (#4940) lianyibo 2025-12-20 20:47:30 +08:00
  • 21745221a3 [lint]clean code (#5218) weiguihua2 2025-12-20 18:24:04 +08:00
  • bbde0f9743 [CI] fix lint (#5216) wangxiyuan 2025-12-20 17:03:25 +08:00
  • 74aa968a9f [e2e] add pcp e2e (#5141) weiguihua2 2025-12-20 16:56:46 +08:00
  • 5d59bf8ca0 [CI] unblock CI on suffix spec decoding (#4813) Mengqing Cao 2025-12-20 14:54:49 +08:00
  • 758d81dcb1 Drop 0.12.0 support (#5146) wangxiyuan 2025-12-20 09:38:53 +08:00
  • 243ab7d720 [CI] Use offline mode for nightly test (#5187) Li Wang 2025-12-19 21:21:42 +08:00
  • 14931d2a86 [CI] Fix image merge bug (#5197) Li Wang 2025-12-19 17:30:48 +08:00
  • 141bd913e1 restore matmul_allreduce_add_rmsnrom aclnn interface (#5119) Trunrain 2025-12-19 17:06:59 +08:00
  • 17f2eead99 [Doc]Add the user_guide doc file regarding fine-grained TP. (#5084) zzhxxx 2025-12-19 16:37:25 +08:00
  • 0cc3fc357f [perf] qwen3_next add triton ops : fused_sigmoid_gating_delta_rule_update (#4818) XiaoxinWang 2025-12-19 16:34:11 +08:00
  • 118b0ed346 [Feature] Add token mask for DispatchGmmCombineDecode operator (#5171) wangqiankun13 2025-12-19 16:31:48 +08:00
  • 636265be6d [CI] Improve CI (#5078) wangxiyuan 2025-12-19 15:34:35 +08:00
  • 35ad11b637 [Refactor] remove some metadata variables in attention_v1. (#5160) weijinqian0 2025-12-19 14:57:09 +08:00
  • bc05a81bf2 Add Qwen3-VL-235B-A22B-Instruct tutorials (#5167) luluxiu520 2025-12-19 14:56:17 +08:00
  • 5ab6d124e5 [Doc] Add a perf tune section (#5127) Li Wang 2025-12-19 14:52:52 +08:00
  • a6eaf816f1 [Image] Refactor image build (#5175) Li Wang 2025-12-19 14:35:51 +08:00
  • cc23067f1e [refactor] refactor weight trans nz and transpose (#4878) zzzzwwjj 2025-12-19 14:27:24 +08:00
  • ea8f544ce7 [BugFix]Fix precision issue for LoRA feature (#4141) hukongyi 2025-12-19 14:22:06 +08:00
  • f952de93df [Doc] Deepseekv3.1/R1 doc enhancement (#4827) 1092626063 2025-12-19 10:52:33 +08:00
  • 76e58d66be support basic long_seq feature st (#5140) LookAround0301 2025-12-19 10:50:01 +08:00
  • cee9b715b5 [Bugfix] install triton for test_custom_op (#5112) zhangxinyuehfad 2025-12-19 10:40:46 +08:00
  • ca6f631cba [2/N][Pangu][MoE] Remove Pangu Related Code (#5130) weichen 2025-12-19 09:00:07 +08:00
  • 1b47fca0e8 [bugfix] Use FUSED_MC2 MoE comm path for the op dispatch_ffn_combine (#5156) Chen Chen 2025-12-18 23:34:31 +08:00
  • 73e4b4f496 [BugFix] Fix top_p,top_k issue with EAGLE and add top_p,top_k in EAGLE e2e (#5131) zhaomingyu13 2025-12-18 23:07:14 +08:00
  • 073a3a6e6c [Doc][P/D] Fix MooncakeConnector's name (#5172) zxr2333 2025-12-18 22:29:19 +08:00
  • 2304218f90 [Bugfix] Fix in_profile_run in mtp_proposer dummy_run (#5165) Zetong Li 2025-12-18 22:27:47 +08:00
  • 7d32371b7e [Doc] Refact benchmark doc (#5173) Li Wang 2025-12-18 22:26:13 +08:00
  • 6cb76ecd02 [Nightly] Avoid max_model_len being smaller than the decoder prompt to prevent single-node-accuray-tests from failing (#5174) ZT-AIA 2025-12-18 22:25:45 +08:00
  • 632eab28b7 [BugFix]Fix incorrect get_current_vllm_config (#5121) Angazenn 2025-12-18 22:21:36 +08:00
  • fd9a47c04d fix vl pd smoke error (#5103) shaopeng-666 2025-12-18 22:20:45 +08:00
  • ff3914e31a [Fix] Refines decode mode padding condition for uniform queries (#5164) Yizhou 2025-12-18 21:09:23 +08:00
  • acc3578f58 [Graph][Fusion]Add new pattern for AddRmsnormQuant with SP. (#5077) Angazenn 2025-12-18 20:25:44 +08:00
  • a74a1196c5 [Feat] Support MLP_TP feature, exclude MOE layer (#4999) zzhxxx 2025-12-18 20:06:53 +08:00
  • 5a88e3333b feat: implement high-performance Triton kernels for rejection sampling (#4830) yuxingcyx 2025-12-18 19:42:10 +08:00
  • 0f571c347b Nominate new maintainers @zzzzwwjj @realliujiaxu @LCAIZJ (#5152) wangxiyuan 2025-12-18 18:49:07 +08:00
  • 9fcaf66646 fix: use batch_matmul_transpose operator in MLA _v_up_proj for better performance (#5142) LICO67373 2025-12-18 16:48:55 +08:00
  • b69b04d3a9 implement model runner v2 basic framework (#5051) Ronald 2025-12-18 15:51:54 +08:00
  • 1c8c23de58 [Bugfix] fix pipeline parallelism bug introduced by async-scheduling refactor work (#4973) lidenghui1110 2025-12-18 15:27:55 +08:00
  • 9268ad11e3 Qwen3-Next:Update the gpu-memory-utilization parameter to 0.7 (#5129) ming1212 2025-12-18 15:16:33 +08:00
  • ef8157a5f2 fixed fused alltoall execute all reduce (#5109) AlvisGong 2025-12-18 15:07:40 +08:00
  • 78602eab4f [UT] Add mooncake ut test (#5080) Yuzhou Tong 2025-12-18 15:07:14 +08:00
  • 9045843c90 [UT]Ut for function cumsum_group_list in moe_mlp (ref #5025) (#5036) Clorist33 2025-12-18 15:00:16 +08:00
  • 543f122101 [Fix] Fix DeepSeek V3.2 "no attr" error (#5147) Yizhou 2025-12-18 14:46:41 +08:00
  • b0376abd4c [feat] proxy support elastic scaling (#5063) yuxinshan 2025-12-18 14:29:53 +08:00
  • 71e544e259 [test] add w4a8 accuracy case (#5110) ck-hw-1018 2025-12-18 14:10:14 +08:00
  • 39fb9e7c83 qwen3_next add triton ops : fused_qkvzba_split_reshape (#4788) ZT-AIA 2025-12-18 11:31:04 +08:00
  • 07014e2101 [UT] Add model_runner pcp related UTs (#4951) zhangsicheng5 2025-12-18 10:54:57 +08:00
  • 879ec2d1c4 [Doc] add qwen3 reranker (#5086) TingW09 2025-12-18 10:54:07 +08:00
  • 8069442b41 enable npugraph_ex (#5120) panchao-hub 2025-12-18 09:08:40 +08:00
  • 39bdd4cfaa fix profile run for vl model (#5136) shaopeng-666 2025-12-17 23:51:31 +08:00
  • 43d974c6f7 [Fix] Synchronize the host query_start_loc with device values to prevent shape mismatches (#5134) Yizhou 2025-12-17 23:50:12 +08:00
  • 950570f8d1 [Bugfix]delete profile_run in model_runner (#5122) zhenwenqi2024 2025-12-17 23:48:34 +08:00
  • 98e6e57622 [Refactor] 4/N Distinguish the branches based on the applicable scenarios of PA and FIA Ops. (#5081) weijinqian0 2025-12-17 23:14:02 +08:00
  • 7671ce1bf1 Fix a data conversion bug introduced by commit 3b7eb51 in main#4655 (#5115) Yuzhou Tong 2025-12-17 20:19:02 +08:00
  • 7f1e93f185 [Bugfix][MoE] Remove All2All in w4a8_dynamic (#4977) weichen 2025-12-17 17:39:57 +08:00
  • 97537709ae [BugFix] Fix mooncake bug in PCP scenario (#5055) dsxsteven 2025-12-17 16:32:16 +08:00
  • eda3cabf5b [UT] add pcp&dcp UT for mla_cp (#4953) Feng Liu 2025-12-17 16:19:27 +08:00
  • 724d04391e [model] Support PanguUltraMoE (#4615) JeffLee1874 2025-12-17 16:15:29 +08:00
  • f0060fc822 [Pangu][MoE] Remove PanguProMoEV1 related code (#5088) weichen 2025-12-17 16:14:42 +08:00
  • 3f7a2fba70 [main][doc] Instructions for using permissions added to docker (#5092) lilinsiman 2025-12-17 15:26:09 +08:00
  • 06b82e7503 [main] rename device type (#5099) zzzzwwjj 2025-12-17 14:08:19 +08:00
  • 4144376e88 [CI] Fix UT (#5106) wangxiyuan 2025-12-17 09:52:20 +08:00
  • bf97048bce [feat]pd disaggregated support cross-machine (#5008) weiguihua2 2025-12-17 09:28:03 +08:00
  • 153eeaa621 [Bugfix] Fix DeepSeek FIA error in async_scheduling with mtp (#5046) Wang Yixuan 2025-12-17 09:20:44 +08:00
  • 06f33540c4 [UT]add the UT of pcp and dcp in the attention_cp file (#5054) pichangping 2025-12-17 09:11:33 +08:00
  • cadfa5ddc1 [Fusion] [Graph] Add qknorm rope fusion operator (#4711) Icey 2025-12-17 08:53:44 +08:00