Commit Graph

  • fc2bcbe21c [Ops] Fix bug in register_custom_ops without forward_context (#2883) rjg-lyh 2025-09-12 16:58:08 +08:00
  • 6d8bc38c7b Enable label-based image test and use free runner to run lint (#2864) Yikun Jiang 2025-09-12 10:49:42 +08:00
  • 778cb72556 fix bug when rotary_dim is not 128 (#2847) realliujiaxu 2025-09-12 09:49:36 +08:00
  • f5a97e8fa5 [Quantization] register AscendQuantRMSNorm for quantization (#2856) 22dimensions 2025-09-11 23:14:02 +08:00
  • eab3635850 [Bugfix] Retrieve num_redundant_experts from eplb_config in torchair qwen3_moe.py (#2857) wyu0-0 2025-09-11 22:15:19 +08:00
  • aeffe27b30 [Perf]set moe w2_weight default to be nz (#2842) Angazenn 2025-09-11 21:40:54 +08:00
  • 9615dea3a7 Refactor tensor_parallel and comm_utils (#2814) wuweiqiang24 2025-09-11 21:26:36 +08:00
  • 0005479b9c [main] mlp weight prefetch in Qwen Dense Models (#2816) rjg-lyh 2025-09-11 21:20:09 +08:00
  • c3c2221503 [Feat]support dynamic quantization in allgather (#2841) 无脸男 2025-09-11 18:47:20 +08:00
  • 07c58669fd [Bugfix] Update lm_eval version to remove deprecated param (#2871) Li Wang 2025-09-11 18:39:03 +08:00
  • bd3dedea61 support qwen25 vl w8a8 quantization (#2778) 6lazijiamo 2025-09-11 16:40:51 +08:00
  • 2b9269b581 [Perf][V1] Fully overlap model execution (#2783) jiangpeng 2025-09-11 16:35:36 +08:00
  • 923cdaeba3 fix ascend fused moe spelling error (#2863) zhaozx-cn 2025-09-11 14:35:46 +08:00
  • b9a0a75c78 fix qwen torchair attention PrefillCacheHit (#2787) zhaozx-cn 2025-09-11 14:26:59 +08:00
  • 7b2ecc1e9a [Feat] Unquantized linear nz support (#2619) anon189Ty 2025-09-11 11:40:00 +08:00
  • 5691104249 LLMdatadist connector adapt the distributed KV aggregation (#2718) liziyu 2025-09-11 11:37:41 +08:00
  • c2fdd4b8bc [CI/UT] Fix UTs on register customop and warm up model (#2862) Mengqing Cao 2025-09-11 11:30:16 +08:00
  • b7df04de9b debug_aclgraph_sizes_capture (#2827) lilinsiman 2025-09-10 22:50:48 +08:00
  • e75b568011 [CI] Update pre_commit runner (#2850) zhangxinyuehfad 2025-09-10 20:23:25 +08:00
  • b7ee3fdad3 [Code clean] Remove the unnecessary code (#2815) Jiawei Li 2025-09-10 17:19:39 +08:00
  • 88d7af62be [main] adjust the position of warm_up_atb (#2823) huangxialu 2025-09-10 14:06:38 +08:00
  • 22b425765a [Bugfix] Fix broken CI (#2825) Li Wang 2025-09-10 13:29:29 +08:00
  • aa4d2a91ed Refactor AscendMultiHeadLatentAttention (#2826) Icey 2025-09-10 11:26:11 +08:00
  • 168ad600b5 [main] add pd transfer for ascend scheduler (#2753) CaranLic 2025-09-10 08:46:39 +08:00
  • edf1f600ad [CI] Remove compatibility maintenance for vllm v0.10.1 and v0.10.1.1 (#2840) Mengqing Cao 2025-09-10 08:43:10 +08:00
  • 93e28e6862 add weight transpose check. (#2756) sherie 2025-09-09 20:33:43 +08:00
  • e13c4ddb42 [Fix] Fix SharedFusedMoE (#2817) yiz-liu 2025-09-09 18:19:56 +08:00
  • 7a205dbaa8 [main] Optimize rope in Qwen Models (#2571) rjg-lyh 2025-09-09 14:28:14 +08:00
  • 5bcb4c1528 [CI] Reduce CI time (#2801) wangxiyuan 2025-09-09 10:52:14 +08:00
  • 1bbb20ea13 [main] flashcomm_v1 optim in Qwen Dense Models (#2802) rjg-lyh 2025-09-08 22:52:24 +08:00
  • 4df8df5b94 [bugfix] fix deepseek rope sincoscache re-generation (#2744) zzzzwwjj 2025-09-08 22:03:34 +08:00
  • 7d6d9449a8 [Misc] Move lora patch file into lora module (#2797) wangxiyuan 2025-09-08 21:42:12 +08:00
  • 85d989a3b9 [Misc] Remove pangu model file (#2798) wangxiyuan 2025-09-08 21:30:37 +08:00
  • a041d4f328 [main] [refactor] refactor common_fused_moe.py (#2706) weichen 2025-09-08 20:09:50 +08:00
  • 1a82b16355 Remove unused code in fused_moe.py (#2805) machenglong2025 2025-09-08 20:05:19 +08:00
  • d51694a77b [2/N][Refactor][Quantization] clean quantization patch (#2785) 22dimensions 2025-09-08 17:31:53 +08:00
  • cd88f89267 Bump actions/github-script from 7 to 8 (#2803) dependabot[bot] 2025-09-08 14:53:26 +08:00
  • d3c3538ddc [Bugfix]fix bug when graph_size is not divisible by tp_size (#2719) realliujiaxu 2025-09-08 14:52:33 +08:00
  • dd087effcc Refector prepare_inputs in model_runner_v1.py (#2750) TaoYu Chen 2025-09-08 10:45:23 +08:00
  • c735bb0941 [Fix] Ensure metadata sync across DP ranks in eager mode (#2766) yiz-liu 2025-09-08 09:55:16 +08:00
  • 2693196ef8 add gatherep select. (#2740) sherie 2025-09-08 09:15:50 +08:00
  • 6666e5265d Added support for KV connector v1 (#2039) Marco Barletta 2025-09-08 03:04:22 +02:00
  • 2967e5e22a [Benchmark] Correctly kill vllm process in performance benchamrk (#2782) Li Wang 2025-09-07 10:36:34 +08:00
  • a746f8274f [DOC] Qwen3 PD disaggregation user guide (#2751) yupeng 2025-09-07 10:35:37 +08:00
  • b2f77d3aa8 [fix] prefill unsupport sliding window attention (#2758) yeyifan 2025-09-07 10:34:38 +08:00
  • 752e272a55 Add note for Ascend HDK version (#2765) Yikun Jiang 2025-09-07 10:33:41 +08:00
  • 5a7181569c [feat]: oproj tensor parallelism in pure DP and graph-mode scenarios. (#2167) lidenghui1110 2025-09-07 10:31:32 +08:00
  • a58b43b72c Remove git .extraheader and fecth all commtis in /vllm-workspace/vllm-ascend (#2746) Yikun Jiang 2025-09-05 09:45:11 +08:00
  • 51a2aec115 Delete redundant codes related to communication (#2717) henryxuxu0716 2025-09-05 09:39:39 +08:00
  • 5b3646ab21 [FEATURE][MTP] Support MTP > 1 (#2708) 1092626063 2025-09-05 09:11:22 +08:00
  • 83eb40a51c [Fix][MoE] Refine MoE communication strategy (#2734) yiz-liu 2025-09-05 09:04:04 +08:00
  • 4c90fa79ca [Misc] Remove useless PD check in deepseek (#2739) liziyu 2025-09-04 22:22:19 +08:00
  • 3a2a7d88db [Doc] Update accuracy reports for v0.10.1rc1 (#2755) vllm-ascend-ci 2025-09-04 22:17:17 +08:00
  • f86596a66c allgather use fusedop. (#2689) sherie 2025-09-04 11:56:29 +08:00
  • 7d47d8f4f6 [Fix] fix resources limit error when apply speculative decoding and aclgraph (#2472) 无脸男 2025-09-04 11:50:43 +08:00
  • 0c0789be74 [Feat] allow using aclgraph in ray backend (#2589) 无脸男 2025-09-04 11:45:56 +08:00
  • aff5189c87 [main] Fuse GroupedMatmul, Swiglu and DynamicQuant in W8A8_DYNAMIC quantized MoE layers (#2275) Ruri 2025-09-04 11:37:32 +08:00
  • 37f5a29cd4 [1/N][Refactor][Quantization] remove redundant quantizer class (#2680) 22dimensions 2025-09-04 11:35:14 +08:00
  • d4370ebc42 [Refactor] Refactor Spec Decode (#2668) Icey 2025-09-04 11:34:47 +08:00
  • 7e16b4a7cd [ReleaseNote] Add Release Note for v0.10.1rc1 (#2635) Mengqing Cao 2025-09-04 11:26:47 +08:00
  • e7409e95ee [1/N][Draft][Refactor]torchair pangu_moe modeling refactor (#2437) Angazenn 2025-09-04 10:39:21 +08:00
  • a58013440a [BugFix][MLA] Fix attn_mask bug for ring mla (#2704) whx 2025-09-04 10:22:46 +08:00
  • e11a1bbfc1 [Doc] Update news (#2736) wangxiyuan 2025-09-04 10:10:24 +08:00
  • 984bd7c13a [Bugfix][APC] Fix accuracy issue on prefix caching with AscendScheduler (#2714) Mengqing Cao 2025-09-04 08:22:46 +08:00
  • df88a2ecc8 [P/D]mooncake_connector adapted to 0.10.1 (#2664) baxingpiaochong 2025-09-04 08:22:10 +08:00
  • 07d44ade19 bugfix: fix initialization error for mooncake in k8s (#2541) zhiyuanzhang 2025-09-03 22:25:08 +08:00
  • 41b028aa5f [Doc] add v0.9.1 release note (#2646) wangxiyuan 2025-09-03 18:04:27 +08:00
  • 90a75a90a9 [bugfix] fix torchair runtime error caused by configuration mismtaches and file missing (#2532) linfeng-yuan 2025-09-03 17:56:12 +08:00
  • 5889fa1b1c [bugfix] ascend schedule encountered an incorrect req block length in the check_watermark_for_prefill function (#2508) liziyu 2025-09-03 16:54:39 +08:00
  • 59d23c39eb [DP] External dp server starter (#2685) whx 2025-09-03 16:30:26 +08:00
  • c03321781a [CI] skip unstable UT (#2716) wangxiyuan 2025-09-03 15:53:50 +08:00
  • 3584306387 [Bugfix] Fix qwen2.5-vl-without-padding (#2623) Li Wang 2025-09-03 14:38:55 +08:00
  • bece793be6 [CI] Disable per-PR triggering for A3 (#2710) Li Wang 2025-09-03 11:52:34 +08:00
  • eaeb2efb20 [Main][Feat]Set the Profiler parameters through environment variables consistent with vLLM (#2608) zhanghw0354 2025-09-03 10:58:08 +08:00
  • 93754d8061 [Bugfix] Fix long context seq accuracy problem for GLM4.5 (#2601) Shanshan Shen 2025-09-03 09:18:44 +08:00
  • b84465c525 [Perf]Enable npu_moe_gating_top_k_softmax on quantized scenarios (#2633) Angazenn 2025-09-03 09:14:17 +08:00
  • 24d4dad7b2 [CI] Enable MTP torchair e2e test (#2705) wangxiyuan 2025-09-03 08:57:43 +08:00
  • af62af3cc5 [Image] Upgrade openEuler to 24.03 (#2631) Icey 2025-09-02 20:09:09 +08:00
  • 0829b4873f [CI] recover e2e test (#2688) wangxiyuan 2025-09-02 18:49:17 +08:00
  • f023bd52bf [CI] Make test_platform UT stable (#2696) wangxiyuan 2025-09-02 18:34:04 +08:00
  • c1e607b7b7 [Misc] Clean up uesless code in rotary_embedding (#2663) wangxiyuan 2025-09-02 17:25:33 +08:00
  • 253b01b9a5 [7/N][refactor]fix torchair rope ops (#2683) Wang Yixuan 2025-09-02 17:21:56 +08:00
  • 9f1e054fe3 [Bugfix][LoRA][Operator] Fix LoRA custom operators accuracy issue (#2672) yupeng 2025-09-02 11:46:59 +08:00
  • 214b32a346 [V1][BUGFIX][0.10.1] FIX mtp on main branch (#2632) xuyexiong 2025-09-02 11:12:41 +08:00
  • fef18b60bc Refactor e2e CI (#2276) wangxiyuan 2025-09-02 09:02:22 +08:00
  • 0df059f41a [CI] Fix CI Break: upstream adds routed_scaling_factor in forward_oot interface (#2675) leo-pony 2025-09-01 19:02:50 +08:00
  • ea53f9076e support torchair mode (#2641) panchao-hub 2025-09-01 15:49:07 +08:00
  • b72e34013f Add ut for mla (#2637) LeeWenquan 2025-09-01 14:07:57 +08:00
  • ad13964c71 [6/N][refactor]delete torchair in rotary ops (#2581) Wang Yixuan 2025-09-01 09:10:15 +08:00
  • c2c97f3079 [5/N][refactor]add torchair rotary ops (#2559) Wang Yixuan 2025-09-01 09:09:21 +08:00
  • 3a5fc5ee01 [Refactor][MoE] remove redundant code after refactoring fused_moe (#2612) weichen 2025-08-30 22:28:50 +08:00
  • 20ae71291d [torchair]remove aicpu op (#2640) panchao-hub 2025-08-30 15:51:12 +08:00
  • 7215454de6 bugfix for torchair graph (#2639) panchao-hub 2025-08-30 15:49:48 +08:00
  • 6f1047d5fd [CI] fix UT error. (#2644) weijinqian0 2025-08-30 12:04:01 +08:00
  • d3c93fba5c [3/N][Feat][Graph] Support all-to-all and quantized models with ACL Graph (#2614) yiz-liu 2025-08-30 11:00:35 +08:00
  • 91c35d765a [Bugfix] Fix mc2 operator error in aclgraph + ep<16 scenario (#2609) Mengqing Cao 2025-08-29 21:59:16 +08:00
  • ee6d141dd4 [MAIN][BUGFIX] BugFix: Resolve the issue of waiting queue accumulation when requests are canceled. (#2426) wangxiaoteng666 2025-08-29 17:19:23 +08:00
  • 52aff9e229 [main] [bugfix] Fix misjudging quantized/unquantized scenarios (#2627) weichen 2025-08-29 16:20:22 +08:00
  • aadc75c247 [Fix] Resolve data-parallel (DP) assertion errors in TorchAir (#2626) yiz-liu 2025-08-29 16:06:49 +08:00
  • 600b08f754 [Feat]: Add custom lmhead tensor model parallel (#2309) lidenghui1110 2025-08-29 11:41:21 +08:00