Commit Graph

  • aff5189c87 [main] Fuse GroupedMatmul, Swiglu and DynamicQuant in W8A8_DYNAMIC quantized MoE layers (#2275) Ruri 2025-09-04 11:37:32 +08:00
  • 37f5a29cd4 [1/N][Refactor][Quantization] remove redundant quantizer class (#2680) 22dimensions 2025-09-04 11:35:14 +08:00
  • d4370ebc42 [Refactor] Refactor Spec Decode (#2668) Icey 2025-09-04 11:34:47 +08:00
  • 7e16b4a7cd [ReleaseNote] Add Release Note for v0.10.1rc1 (#2635) Mengqing Cao 2025-09-04 11:26:47 +08:00
  • e7409e95ee [1/N][Draft][Refactor]torchair pangu_moe modeling refactor (#2437) Angazenn 2025-09-04 10:39:21 +08:00
  • a58013440a [BugFix][MLA] Fix attn_mask bug for ring mla (#2704) whx 2025-09-04 10:22:46 +08:00
  • e11a1bbfc1 [Doc] Update news (#2736) wangxiyuan 2025-09-04 10:10:24 +08:00
  • 984bd7c13a [Bugfix][APC] Fix accuracy issue on prefix caching with AscendScheduler (#2714) Mengqing Cao 2025-09-04 08:22:46 +08:00
  • df88a2ecc8 [P/D]mooncake_connector adapted to 0.10.1 (#2664) baxingpiaochong 2025-09-04 08:22:10 +08:00
  • 07d44ade19 bugfix: fix initialization error for mooncake in k8s (#2541) zhiyuanzhang 2025-09-03 22:25:08 +08:00
  • 41b028aa5f [Doc] add v0.9.1 release note (#2646) wangxiyuan 2025-09-03 18:04:27 +08:00
  • 90a75a90a9 [bugfix] fix torchair runtime error caused by configuration mismatches and file missing (#2532) linfeng-yuan 2025-09-03 17:56:12 +08:00
  • 5889fa1b1c [bugfix] ascend schedule encountered an incorrect req block length in the check_watermark_for_prefill function (#2508) liziyu 2025-09-03 16:54:39 +08:00
  • 59d23c39eb [DP] External dp server starter (#2685) whx 2025-09-03 16:30:26 +08:00
  • c03321781a [CI] skip unstable UT (#2716) wangxiyuan 2025-09-03 15:53:50 +08:00
  • 3584306387 [Bugfix] Fix qwen2.5-vl-without-padding (#2623) Li Wang 2025-09-03 14:38:55 +08:00
  • bece793be6 [CI] Disable per-PR triggering for A3 (#2710) Li Wang 2025-09-03 11:52:34 +08:00
  • eaeb2efb20 [Main][Feat]Set the Profiler parameters through environment variables consistent with vLLM (#2608) zhanghw0354 2025-09-03 10:58:08 +08:00
  • 93754d8061 [Bugfix] Fix long context seq accuracy problem for GLM4.5 (#2601) Shanshan Shen 2025-09-03 09:18:44 +08:00
  • b84465c525 [Perf]Enable npu_moe_gating_top_k_softmax on quantized scenarios (#2633) Angazenn 2025-09-03 09:14:17 +08:00
  • 24d4dad7b2 [CI] Enable MTP torchair e2e test (#2705) wangxiyuan 2025-09-03 08:57:43 +08:00
  • af62af3cc5 [Image] Upgrade openEuler to 24.03 (#2631) Icey 2025-09-02 20:09:09 +08:00
  • 0829b4873f [CI] recover e2e test (#2688) wangxiyuan 2025-09-02 18:49:17 +08:00
  • f023bd52bf [CI] Make test_platform UT stable (#2696) wangxiyuan 2025-09-02 18:34:04 +08:00
  • c1e607b7b7 [Misc] Clean up useless code in rotary_embedding (#2663) wangxiyuan 2025-09-02 17:25:33 +08:00
  • 253b01b9a5 [7/N][refactor]fix torchair rope ops (#2683) Wang Yixuan 2025-09-02 17:21:56 +08:00
  • 9f1e054fe3 [Bugfix][LoRA][Operator] Fix LoRA custom operators accuracy issue (#2672) yupeng 2025-09-02 11:46:59 +08:00
  • 214b32a346 [V1][BUGFIX][0.10.1] FIX mtp on main branch (#2632) xuyexiong 2025-09-02 11:12:41 +08:00
  • fef18b60bc Refactor e2e CI (#2276) wangxiyuan 2025-09-02 09:02:22 +08:00
  • 0df059f41a [CI] Fix CI Break: upstream adds routed_scaling_factor in forward_oot interface (#2675) leo-pony 2025-09-01 19:02:50 +08:00
  • ea53f9076e support torchair mode (#2641) panchao-hub 2025-09-01 15:49:07 +08:00
  • b72e34013f Add ut for mla (#2637) LeeWenquan 2025-09-01 14:07:57 +08:00
  • ad13964c71 [6/N][refactor]delete torchair in rotary ops (#2581) Wang Yixuan 2025-09-01 09:10:15 +08:00
  • c2c97f3079 [5/N][refactor]add torchair rotary ops (#2559) Wang Yixuan 2025-09-01 09:09:21 +08:00
  • 3a5fc5ee01 [Refactor][MoE] remove redundant code after refactoring fused_moe (#2612) weichen 2025-08-30 22:28:50 +08:00
  • 20ae71291d [torchair]remove aicpu op (#2640) panchao-hub 2025-08-30 15:51:12 +08:00
  • 7215454de6 bugfix for torchair graph (#2639) panchao-hub 2025-08-30 15:49:48 +08:00
  • 6f1047d5fd [CI] fix UT error. (#2644) weijinqian0 2025-08-30 12:04:01 +08:00
  • d3c93fba5c [3/N][Feat][Graph] Support all-to-all and quantized models with ACL Graph (#2614) yiz-liu 2025-08-30 11:00:35 +08:00
  • 91c35d765a [Bugfix] Fix mc2 operator error in aclgraph + ep<16 scenario (#2609) Mengqing Cao 2025-08-29 21:59:16 +08:00
  • ee6d141dd4 [MAIN][BUGFIX] BugFix: Resolve the issue of waiting queue accumulation when requests are canceled. (#2426) wangxiaoteng666 2025-08-29 17:19:23 +08:00
  • 52aff9e229 [main] [bugfix] Fix misjudging quantized/unquantized scenarios (#2627) weichen 2025-08-29 16:20:22 +08:00
  • aadc75c247 [Fix] Resolve data-parallel (DP) assertion errors in TorchAir (#2626) yiz-liu 2025-08-29 16:06:49 +08:00
  • 600b08f754 [Feat]: Add custom lmhead tensor model parallel (#2309) lidenghui1110 2025-08-29 11:41:21 +08:00
  • e7ad4a64f4 [CI] Add e2e ci test for A3 (#2573) zhangxinyuehfad 2025-08-29 09:33:42 +08:00
  • dfc7eb39ad [Fix] Fix DP-related padding logic (#2582) yiz-liu 2025-08-28 19:39:58 +08:00
  • 175f6bc445 Support v0.10.1 (#2584) Yikun Jiang 2025-08-28 18:47:53 +08:00
  • 6c973361fc [Bugfix] Fix aclgraph not enabled by default (#2590) Mengqing Cao 2025-08-28 14:08:31 +08:00
  • cf96366a39 [Bugfix][LoRA][Patch] Fix the LoRA inference bug after upstream vLLM codebase changed (#2560) yupeng 2025-08-28 10:40:51 +08:00
  • 1191a64ae5 [Feat]attention add sliding windows size (#2528) yeyifan 2025-08-28 10:37:19 +08:00
  • c8d1df3a3f [Refactor][WIP] Refactor mla_v1 by moving all MLA preprocessing ops into mla_v1 attention impl (#2465) LeeWenquan 2025-08-28 10:35:57 +08:00
  • 320edde2df [main] [refactor] refactor fused_moe.py to enable token_dispatchers (#2570) weichen 2025-08-28 10:13:35 +08:00
  • 936c102105 [bugfix][refactor]fix torchair_w8a8 (#2569) Wang Yixuan 2025-08-28 09:10:31 +08:00
  • a955e5d404 [4/N][refactor]delete torchair from quantization (#2535) Wang Yixuan 2025-08-28 09:10:03 +08:00
  • c578f817ca [CustomOp] Register VocabParallelEmbedding instead of overwrite forward (#2515) Icey 2025-08-28 08:57:34 +08:00
  • 516e14ae6a [Doc] Upgrade to multi-node tutorial model to deepseek-v3.1-w8a8 (#2553) Li Wang 2025-08-27 14:16:44 +08:00
  • 2bfbf9b9b3 [main][bugfix] Fix bugs and refactor cached mask generation logic (#2442) rjg-lyh 2025-08-27 12:07:29 +08:00
  • 6881c19458 [main] convert the format of gmm to nz (#2474) huangxialu 2025-08-27 11:25:02 +08:00
  • c0e12143a3 [CI] Fix UT failure (#2563) wangxiyuan 2025-08-27 11:24:35 +08:00
  • 20a7bc4b71 [3/N][refactor] refactor quantization (#2504) Wang Yixuan 2025-08-27 10:45:50 +08:00
  • acdc53c2f6 [Bugfix] Fix the bug of cos invalid shape when dp (#2558) weiguihua2 2025-08-27 10:36:23 +08:00
  • a9e78a3299 [Aclgraph] Update compilation config in check_and_update_config (#2540) Mengqing Cao 2025-08-27 09:30:25 +08:00
  • f22077daa6 [Embedding] Recover embedding function (#2483) wangxiyuan 2025-08-27 09:22:01 +08:00
  • 6a4ec186e7 [Qwen-moe] Remove the minor operation arange (#2373) s30076806 2025-08-27 09:13:31 +08:00
  • 358ba68994 [main][bugfix] Fix MatmulNZ format bug on some machines (#2549) rjg-lyh 2025-08-27 09:08:17 +08:00
  • 042605f4b2 [Doc] Add stable modelslim branch (#2545) Li Wang 2025-08-27 09:05:46 +08:00
  • 8151a9d5a4 [Test]Add unit test for worker_v1.py (#2547) zhanghw0354 2025-08-26 22:00:49 +08:00
  • a6bb502e70 [2/N][Feat] Add MC2 communication method for MoE layers (#2469) yiz-liu 2025-08-26 19:05:23 +08:00
  • 5d8ec28009 [2/N][refactor] split torchair from fused_moe (#2503) Wang Yixuan 2025-08-26 14:12:43 +08:00
  • cfe77e83ae [Bugfix]Support Qwen3-MOE on aclgraph mode in sizes capture and add new ut (#2511) lilinsiman 2025-08-26 12:39:21 +08:00
  • b3fdd78a6b [Main][Refactor]Change ASCEND_QUATIZATION_METHOD to ASCEND_QUANTIZATION_METHOD (#2517) zhanghw0354 2025-08-26 09:06:16 +08:00
  • 21b5727f9a [CI] Upgrade vllm in accuracy and performance CI (#2527) Mengqing Cao 2025-08-26 08:49:49 +08:00
  • 7e494e94a9 [CI] Fix broken ci (#2530) wangxiyuan 2025-08-26 07:42:24 +08:00
  • 99bf25af76 [Fix] Add operations in _dummy_run to maintain synchronization with _process_reqs, resolving a service hang (#2454) yiz-liu 2025-08-25 19:56:02 +08:00
  • de7649492d [Refactor] cleanup converting_weight_acl_format_format (#2482) wangxiyuan 2025-08-25 19:48:55 +08:00
  • 0f81e032f0 [1/N][refactor] torchair fused_moe refactor (#2438) Wang Yixuan 2025-08-25 15:46:10 +08:00
  • 334c44613a [Doc] Update release version info (#2518) Shanshan Shen 2025-08-25 15:39:10 +08:00
  • 98c68220c1 [Doc] Update v0.9.1rc3 doc (#2512) Shanshan Shen 2025-08-25 11:39:29 +08:00
  • 4c4ffeebe5 [Doc] update vllm version in ci (#2513) Mengqing Cao 2025-08-25 11:35:37 +08:00
  • 0767d51dd5 [Structured Output][CI] Add test for outlines backend for structured output in CI (#2283) Shanshan Shen 2025-08-25 09:59:13 +08:00
  • 891b2bfe71 Accuracy report formatting (#2279) Icey 2025-08-25 09:39:30 +08:00
  • f796e6280b [CustomOp] Register RotaryEmbedding instead of overwrite forward (#2385) Icey 2025-08-25 09:32:35 +08:00
  • 950c4b219a [main] refactor alltoallv in fused_moe (#2487) weichen 2025-08-23 20:38:17 +08:00
  • 4af5b80606 [Scheduler] validate max_num_batched_tokens and max_model_len in AscendSchedulerConfig (#2434) linfeng-yuan 2025-08-23 19:39:44 +08:00
  • 3629bc4431 feat: add mtp ut and fix some bugs (#2453) ZhaoJiangJiang 2025-08-22 17:09:08 +08:00
  • dd04a96ee3 [Bugfix] Fix the bug of incorrect precision (#2479) weiguihua2 2025-08-22 17:08:56 +08:00
  • f0be3eed84 [Doc] Add release note for v0.9.1rc3 (#2488) Shanshan Shen 2025-08-22 16:06:29 +08:00
  • 60ac4fb576 [QuickFix] Skip failed ut to recover CI quickly (#2484) Mengqing Cao 2025-08-22 14:14:51 +08:00
  • e9fb895b10 [Doc] Add feature branch long_seq_optimization (#2477) LookAround0301 2025-08-22 08:53:12 +08:00
  • b0403f8d8a [CI] fix ci (#2464) Mengqing Cao 2025-08-22 07:30:48 +08:00
  • 0ca3f48c90 [2/N][refactor] torchair deepseek mla backend refactor (#2459) linfeng-yuan 2025-08-21 14:02:30 +08:00
  • 67a222c383 [Doc] Add feature branch policy (#2432) Yikun Jiang 2025-08-21 10:37:21 +08:00
  • 3fb80ee356 add mlp tp optimize (#2120) sherie 2025-08-21 09:22:07 +08:00
  • 973a7cfdf0 [DOC] update doc: LoRA with ACLGraph (#2430) yupeng 2025-08-21 08:55:55 +08:00
  • 0dca4c6dbd refact runner model v1 (#2461) weiguihua2 2025-08-21 08:54:57 +08:00
  • 1de16ead8e [main][bugfix] Modify the default value of the enable_shared_pert_dp to false (#2457) Wang Kunpeng 2025-08-20 20:25:53 +08:00
  • c40d4171bc [main][quantization] Adapt to the new format of ds w4a8 weight (#2392) Wang Kunpeng 2025-08-20 20:25:18 +08:00
  • eccfb715f6 [CI] Fix UT (#2452) wangxiyuan 2025-08-20 16:26:07 +08:00
  • 3f867ee708 refactor allgather/mc2-related fused_experts (#2369) sherie 2025-08-20 14:20:46 +08:00
  • 73acdcfc3b [PD] Correct the ip and port env (#2450) wangxiyuan 2025-08-20 11:39:05 +08:00