Commit Graph

  • 073097a9a1 [3/N][Nightly] Move ops tests to nightly (#5538) Li Wang 2025-12-30 20:50:44 +08:00
  • e760aae1df [1/N] Refactor nightly test structure (#5479) Li Wang 2025-12-30 19:03:02 +08:00
  • c85cc045f8 Docs: Remove deprecated --task parameter for embedding models (#5257) huqi 2025-12-30 16:09:07 +08:00
  • 71f729a661 Revert "moe_gating_top_k" (#5512) zzzzwwjj 2025-12-30 15:05:47 +08:00
  • 4ff4d1cef9 [Doc] Fix issue link for 0.12.0 (#5500) wangxiyuan 2025-12-30 10:34:20 +08:00
  • 8c4e9bb76b [CI]update triton ascend version (#5392) meihanc 2025-12-30 09:51:45 +08:00
  • 45c3c279e2 moe_gating_top_k (#5271) ZCG12345 2025-12-30 09:28:01 +08:00
  • 15d73f248e [refactor] refactor model runner capture model (#5230) weiguihua2 2025-12-30 08:32:14 +08:00
  • 5e96f94d2a Update corresponding vllm commit ID to 12 29 (#5475) Nengjun Ma 2025-12-29 22:48:05 +08:00
  • 51da5ea543 [Kernel]update csrc cmakelist for open-source cann (#5458) Fager10086 2025-12-29 20:34:53 +08:00
  • d5f72835e6 [OP] add custom op aclnnMoeInitRoutingCustom (#5251) jiazhengyi 2025-12-29 19:29:40 +08:00
  • 92353c0643 [Refactor][EAGLE] 1/N delete __init__ in mtp_proposer (#5176) Zetong Li 2025-12-29 16:25:52 +08:00
  • 28b7614322 [Refactor][Triton] Move reject sample triton kernels into ops/triton (#5324) whx 2025-12-29 16:15:41 +08:00
  • e7e1a7dc05 [Feature] support eager mode in model runner v2 (#5210) Ronald 2025-12-29 15:28:34 +08:00
  • 4da46da9bf [feature] fia support sliding windows (#5239) yeyifan 2025-12-29 14:56:25 +08:00
  • d8e15dae6c Optimize some rejectsampler functions to make npu op launch non-blocking (#4587) ZongYuan Zhan 2025-12-29 14:10:39 +08:00
  • 3e67e8276c [Feature] Support to use fullgraph with eagle (#5118) anon189Ty 2025-12-29 09:54:51 +08:00
  • f81cf694b2 [EPLB][refactor] Modification of the initialization logic for expert_map and log2phy(depend on pr5285) (#5311) LI SHENGYONG 2025-12-29 09:26:14 +08:00
  • 23169021d9 [Refactor]6/N Extract common code of class AscendMLAImpl (#5314) wujinyuan1 2025-12-28 10:40:45 +08:00
  • dbe4c338f2 [Refactor] cache cos/sin in mla & remove parameter model in builder. (#5277) weijinqian0 2025-12-28 10:35:07 +08:00
  • 24328aaf00 update vllm pin to 12.27 (#5412) ZT-AIA 2025-12-28 00:19:36 +08:00
  • 1b5d5abf86 [ReleaseNote] Add release note for v0.13.0rc1 (#5334) Mengqing Cao 2025-12-27 18:46:57 +08:00
  • 58adf7c8ac [Bugfix] Correctly handle the output shape in multimodal attention (#5443) Li Wang 2025-12-27 18:42:46 +08:00
  • 1d81bfaed1 Fix nightly (#5413) Li Wang 2025-12-27 18:16:46 +08:00
  • e91e11d3b0 [bugfix] fix typo of _skip_all_reduce_across_dp_group (#5435) jiangkuaixue123 2025-12-27 17:50:04 +08:00
  • c30c3dc831 [Doc]modify pcp tutorial doc (#5440) weiguihua2 2025-12-27 17:47:09 +08:00
  • 77cd960524 [Misc] fast fail for exiting if tools/install_flash_infer_attention_score_ops_a2.sh (#5422) Mengqing Cao 2025-12-27 17:30:34 +08:00
  • b8b5521f5b [Doc] Update DeepSeek V3.1/R1 2P1D doc (#5387) MengLong Chen 2025-12-27 17:28:43 +08:00
  • 843751768e [DOC]Fix model weight download links (#5436) cookieyyds 2025-12-27 17:14:31 +08:00
  • 04104031d0 [Doc] Modify DeepSeek-R1/V3.1 documentation (#5426) Zhu Yi Lin 2025-12-27 17:13:58 +08:00
  • 09f71c14a6 Revert "[feat] enable hierarchical mc2 ops on A2 by default (#5300)" (#5434) realliujiaxu 2025-12-27 17:06:58 +08:00
  • 2add3dc3e0 [Bugfix] fix greedy temperature detection (#5417) realliujiaxu 2025-12-27 17:04:10 +08:00
  • eab306b09c [doc] Update Qwen3-235B doc for reproducing latest performance (#5323) Angazenn 2025-12-27 15:55:58 +08:00
  • 12da9f9460 [feat] enable hierarchical mc2 ops on A2 by default (#5300) hwhaokun 2025-12-27 15:45:25 +08:00
  • be2a947521 [Doc] delete environment variable HCCL_OP_EXPANSION_MODE in DeepSeekV3.1/R1 (#5419) Zhu Yi Lin 2025-12-27 12:44:50 +08:00
  • ca31d6823e [Doc] add long_sequence feature user guide (#5343) LookAround0301 2025-12-27 10:44:43 +08:00
  • cb2fbf7df2 [bugfix] solve dp scenario Host-Device sync (#5298) hwhaokun 2025-12-27 10:36:59 +08:00
  • 69f96950e1 [Doc] modify pcp tutorials (#5411) weiguihua2 2025-12-27 10:36:10 +08:00
  • 3f33ad23fe [BugFix] Fix npu-cpu offloading interface change bug. (#5290) whx 2025-12-27 10:21:20 +08:00
  • 2ef4d1979e [bugfix][main]KV Pool for KV Transfer in PD Disaggregation Scenarios (#5398) fems14 2025-12-27 09:53:57 +08:00
  • ce52e17bf3 [Doc]add long sequence tutorials (#5364) weiguihua2 2025-12-27 09:52:11 +08:00
  • d1f0df7b4b Revert "MLA prefill preformance optimization (#5275)" (#5410) wangxiyuan 2025-12-27 09:48:56 +08:00
  • 711f1861e4 MLA prefill preformance optimization (#5275) pichangping 2025-12-27 09:19:45 +08:00
  • 1486e0d06c [TEST]Add vllm bench (#5306) jiangyunfan1 2025-12-27 09:16:08 +08:00
  • 16ef2474bf [Test] Add acceptance test for eagle/eagle3 (#5366) Zetong Li 2025-12-27 08:50:01 +08:00
  • 8ed6f98a5a [Build] Add installation script of fused_infer_attention_score kernel with flash decoding (#5402) Mengqing Cao 2025-12-27 02:01:06 +08:00
  • f5af6bbd1e [CI] Add qwen-235b-a22b a2 multi-node test (#5393) Nengjun Ma 2025-12-26 23:46:09 +08:00
  • 1d8aa892bf Update vllm pin to 12.26 (#5378) ZT-AIA 2025-12-26 23:44:48 +08:00
  • 8b9ca86827 [Feature] Remove the transpose step after attention and switch to transpose_batchmatmul (#5390) Jade Zheng 2025-12-26 22:03:46 +08:00
  • bc5b7a5fb5 [bugfix] Fix MHA model runtime error in aclgraph mode (#5397) Wang Kunpeng 2025-12-26 21:37:28 +08:00
  • 7685d0c239 rollback causal_conv1d_fn to torch ops & update qwen3Next doc (#5391) LeeWenquan 2025-12-26 19:57:38 +08:00
  • 48854aef5c [TEST]Add sending request with and without chat (#5286) jiangyunfan1 2025-12-26 18:04:17 +08:00
  • 0dfdfa9526 [Feature] Enhance all-reduce skipping logic for MoE models in NPUModelRunner (#5329) Jade Zheng 2025-12-26 17:39:44 +08:00
  • 06732dbf5b [Doc] update R1/V3.1 doc (#5383) Zhu Yi Lin 2025-12-26 17:09:22 +08:00
  • 8ed87dfa84 [doc] Add context parallel user guide (#5358) zhangsicheng5 2025-12-26 17:03:47 +08:00
  • 09390eaf32 [Bugfix] Fix unsuitable moe_comm_type under ep=1 scenario (#5388) Zetong Li 2025-12-26 16:45:45 +08:00
  • da0b113cf5 [doc]<PCP&DCP> add developer guide for PCP&DCP (#5372) Qiu 2025-12-26 05:17:38 -03:00
  • 135cc0a505 vllm-ascend vnpu v1 starkwj 2025-12-26 07:37:35 +00:00
  • 18302c8467 Revert "Add MagicMTP(block verify) and Triton optimization (#4443)" (#5380) Zhu Yi Lin 2025-12-26 15:06:13 +08:00
  • 45c5bcd962 [E2E] Optimize the E2E test time. (#5294) zhangyiming 2025-12-26 14:17:50 +08:00
  • 29d2fe653d cleanup ascend config (#5296) wangxiyuan 2025-12-26 14:07:37 +08:00
  • adaa89a7a5 Update vllm pin to 12.25 (#5342) ZT-AIA 2025-12-26 14:05:40 +08:00
  • c2f776b846 [Nightly] Initial logging for nightly multi-node testing (#5362) Li Wang 2025-12-26 11:39:07 +08:00
  • 320877d488 move contiguous in fused_sigmoid_gating_delta_rule_update to model_runner_v1 (#5274) XiaoxinWang 2025-12-26 09:19:47 +08:00
  • 9b2a7d8866 [BugFix][Fusion] Patch compile backend to make fusion available (#5308) Icey 2025-12-26 09:18:16 +08:00
  • 7372225bcb [FIX] Update _causal_conv1d_update_kernel for Efficient Conv State Handling on NPU (#5322) Qi Mao 2025-12-26 09:12:30 +08:00
  • 4ce32c1a8d [CI] Skip failed test cases to recover CI (#5368) Mengqing Cao 2025-12-26 08:18:23 +08:00
  • 1858f3d36e [Bugfix] Fix Qwen P/D Disaggregation accuracy issue (#5340) Feng Liu 2025-12-25 22:46:08 +08:00
  • 2da8038dd2 [doc] update using command (#5373) cookieyyds 2025-12-25 22:28:35 +08:00
  • 59f11dd1cb [Bugfix] fix xlite decode-only e2e test (#5354) Magnus 2025-12-25 16:30:17 +08:00
  • d752c030e9 [Bugfix] fix pcp 128K break (#5266) weiguihua2 2025-12-25 11:58:52 +08:00
  • 8caad0510d fix e2e rejection-sampler error (#5341) Aoxuan Chen 2025-12-25 11:39:38 +08:00
  • 2ae0bad96d Remove VLLM_ASCEND_ENABLE_DENSE_OPTIMIZE (#5272) wangxiyuan 2025-12-25 11:09:56 +08:00
  • 13cd6362c6 [bugfix] fix Error 'ValueError: Duplicate layer name' (#5280) Wang Kunpeng 2025-12-25 10:43:24 +08:00
  • 30778f371b [BugFix] Fix num_pcp_pads Assignment Issues (#5273) dsxsteven 2025-12-25 10:38:09 +08:00
  • fca2f948c1 [E2E Refactor] Enable skipped e2e case (#5287) wjunLu 2025-12-25 09:18:05 +08:00
  • a9fccbeb30 [CI] add xlite e2e test (#5305) Magnus 2025-12-25 09:17:06 +08:00
  • 6d25372baa Add MagicMTP(block verify) and Triton optimization (#4443) Aoxuan Chen 2025-12-25 09:00:25 +08:00
  • a90482803d [Kernel] add l2norm triton kernel (#4595) Ascendyh 2025-12-25 06:06:18 +08:00
  • e54630e01c Revert [KV-Sharing] Support KV-Sharing feature in CLA models (#4138) (#5317) Mengqing Cao 2025-12-24 22:24:17 +08:00
  • fb3d6ca08c Cleanup uesless env (#5270) wangxiyuan 2025-12-24 22:07:59 +08:00
  • 5018f2d8fd [quantization] Add w8a16 quantization support (#4541) TmacAaron 2025-12-24 19:49:32 +08:00
  • 515267de22 [perf][bugfix] improve performance of rejection sampler and eliminate HD synchronize in TopKTopPSampler (#4154) linfeng-yuan 2025-12-24 19:10:33 +08:00
  • 2f03a2f4a4 [CI] Skip some failed ops tests (#5309) Li Wang 2025-12-24 18:29:34 +08:00
  • 42c989a437 Update vllm pin to 12.24 (#5307) Nengjun Ma 2025-12-24 17:24:31 +08:00
  • a3f65b938f [Doc] Add pa_shape_list description to qwen dense tutorial (#5225) ZYang6263 2025-12-24 14:40:20 +08:00
  • 9227e6af73 [bugfix] remove the EP buffer allocation introduced by fused-op dispatch_ffn_c… (#5284) Chen Chen 2025-12-24 11:26:19 +08:00
  • 74a1de50a9 [E2E] Optimize e2e test. (#5091) zhangyiming 2025-12-24 10:41:55 +08:00
  • bd4fb871c6 [CI] Add skipped testcases. (#5254) zhangyiming 2025-12-24 10:41:32 +08:00
  • 7ff1db4b84 [Refactor]5/N Extract common code of mla_v1.py & extract mla_cp (#5097) wujinyuan1 2025-12-24 10:25:19 +08:00
  • 2a2d527e96 fix transformer version to 4.57.3 (#5250) shaopeng-666 2025-12-23 23:55:40 +08:00
  • 3b59f20a28 update to vllm 12-19 (#5223) Nengjun Ma 2025-12-23 23:52:11 +08:00
  • e14514e2fd [Bugfix] quick fix balance scheduling patch (#5281) Zhu Yi Lin 2025-12-23 21:23:05 +08:00
  • ffe51eedd6 [Refactor][MoE] Reuse vLLM's all_reduce logic (#5189) weichen 2025-12-23 18:53:48 +08:00
  • 8ae7fca947 [CI] refect e2e ci test (#5246) zhangxinyuehfad 2025-12-23 18:42:35 +08:00
  • 5d1f6daef6 [CI] Mock spawn for vlm tests (#5279) Li Wang 2025-12-23 18:35:06 +08:00
  • cb963c53a5 [Doc] Added deploying on k8s with kthena (#4674) Tiger Xu / Zhonghu Xu 2025-12-23 17:46:04 +08:00
  • 22138e2727 [main][Refactor] Remove with_prefill parameter from set_ascend_forward_context (#5094) Slightwind 2025-12-23 14:30:50 +08:00
  • fa0c212bfa [test]Corrected the Qwen3-Omni-30B-A3B-Instruct accuracy test configuration in nightly tests. (#5195) SILONG ZENG 2025-12-23 14:17:27 +08:00
  • 29a93daa82 [CI]refactor: standardize test case naming convention (#5243) SILONG ZENG 2025-12-23 14:13:42 +08:00