Commit Graph

  • e7ad4a64f4 [CI] Add e2e ci test for A3 (#2573) zhangxinyuehfad 2025-08-29 09:33:42 +08:00
  • dfc7eb39ad [Fix] Fix DP-related padding logic (#2582) yiz-liu 2025-08-28 19:39:58 +08:00
  • 175f6bc445 Support v0.10.1 (#2584) Yikun Jiang 2025-08-28 18:47:53 +08:00
  • 6c973361fc [Bugfix] Fix aclgraph not enabled by default (#2590) Mengqing Cao 2025-08-28 14:08:31 +08:00
  • cf96366a39 [Bugfix][LoRA][Patch] Fix the LoRA inference bug after upstream vLLM codebase changed (#2560) yupeng 2025-08-28 10:40:51 +08:00
  • 1191a64ae5 [Feat]attention add sliding windows size (#2528) yeyifan 2025-08-28 10:37:19 +08:00
  • c8d1df3a3f [Refactor][WIP] Refactor mla_v1 by moving all MLA preprocessing ops into mla_v1 attention impl (#2465) LeeWenquan 2025-08-28 10:35:57 +08:00
  • 320edde2df [main] [refactor] refactor fused_moe.py to enable token_dispatchers (#2570) weichen 2025-08-28 10:13:35 +08:00
  • 936c102105 [bugfix][refactor]fix torchair_w8a8 (#2569) Wang Yixuan 2025-08-28 09:10:31 +08:00
  • a955e5d404 [4/N][refactor]delete torchair from quantization (#2535) Wang Yixuan 2025-08-28 09:10:03 +08:00
  • c578f817ca [CustomOp] Register VocabParallelEmbedding instead of overwrite forward (#2515) Icey 2025-08-28 08:57:34 +08:00
  • 516e14ae6a [Doc] Upgrade to multi-node tutorial model to deepseek-v3.1-w8a8 (#2553) Li Wang 2025-08-27 14:16:44 +08:00
  • 2bfbf9b9b3 [main][bugfix] Fix bugs and refactor cached mask generation logic (#2442) rjg-lyh 2025-08-27 12:07:29 +08:00
  • 6881c19458 [main] convert the format of gmm to nz (#2474) huangxialu 2025-08-27 11:25:02 +08:00
  • c0e12143a3 [CI] Fix UT failure (#2563) wangxiyuan 2025-08-27 11:24:35 +08:00
  • 20a7bc4b71 [3/N][refactor] refactoer quantization (#2504) Wang Yixuan 2025-08-27 10:45:50 +08:00
  • acdc53c2f6 [Bugfix] Fix the bug of cos invalid shape when dp (#2558) weiguihua2 2025-08-27 10:36:23 +08:00
  • a9e78a3299 [Aclgraph] Update compilation config in check_and_update_config (#2540) Mengqing Cao 2025-08-27 09:30:25 +08:00
  • f22077daa6 [Embedding] Recover embedding function (#2483) wangxiyuan 2025-08-27 09:22:01 +08:00
  • 6a4ec186e7 [Qwen-moe] Remove the minor operation arange (#2373) s30076806 2025-08-27 09:13:31 +08:00
  • 358ba68994 [main][bugfix] Fix MatmulNZ format bug on some machines (#2549) rjg-lyh 2025-08-27 09:08:17 +08:00
  • 042605f4b2 [Doc] Add stable modelslim branch (#2545) Li Wang 2025-08-27 09:05:46 +08:00
  • 8151a9d5a4 [Test]Add unit test for worker_v1.py (#2547) zhanghw0354 2025-08-26 22:00:49 +08:00
  • a6bb502e70 [2/N][Feat] Add MC2 communication method for MoE layers (#2469) yiz-liu 2025-08-26 19:05:23 +08:00
  • 5d8ec28009 [2/N][refactor] split torchair from fused_moe (#2503) Wang Yixuan 2025-08-26 14:12:43 +08:00
  • cfe77e83ae [Bugfix]Support Qwen3-MOE on aclgraph mode in sizes capture and add new ut (#2511) lilinsiman 2025-08-26 12:39:21 +08:00
  • b3fdd78a6b [Main][Refactor]Change ASCEND_QUATIZATION_METHOD to ASCEND_QUANTIZATION_METHOD (#2517) zhanghw0354 2025-08-26 09:06:16 +08:00
  • 21b5727f9a [CI] Upgrade vllm in accuracy and performance CI (#2527) Mengqing Cao 2025-08-26 08:49:49 +08:00
  • 7e494e94a9 [CI] Fix broken ci (#2530) wangxiyuan 2025-08-26 07:42:24 +08:00
  • 99bf25af76 [Fix] Add operations in _dummy_run to maintain synchronization with _process_reqs, resolving a service hang (#2454) yiz-liu 2025-08-25 19:56:02 +08:00
  • de7649492d [Refactor] cleanup converting_weight_acl_format_format (#2482) wangxiyuan 2025-08-25 19:48:55 +08:00
  • 0f81e032f0 [1/N][refactor] torchair fused_moe refactor (#2438) Wang Yixuan 2025-08-25 15:46:10 +08:00
  • 334c44613a [Doc] Update release version info (#2518) Shanshan Shen 2025-08-25 15:39:10 +08:00
  • 98c68220c1 [Doc] Update v0.9.1rc3 doc (#2512) Shanshan Shen 2025-08-25 11:39:29 +08:00
  • 4c4ffeebe5 [Doc] update vllm version in ci (#2513) Mengqing Cao 2025-08-25 11:35:37 +08:00
  • 0767d51dd5 [Structured Output][CI] Add test for outlines backend for structured output in CI (#2283) Shanshan Shen 2025-08-25 09:59:13 +08:00
  • 891b2bfe71 Accuracy report formatting (#2279) Icey 2025-08-25 09:39:30 +08:00
  • f796e6280b [CustomOp] Register RotaryEmbedding instead of overwrite forward (#2385) Icey 2025-08-25 09:32:35 +08:00
  • 950c4b219a [main] refactor alltoallv in fused_moe (#2487) weichen 2025-08-23 20:38:17 +08:00
  • 4af5b80606 [Scheduler] validate max_num_batched_tokens and max_model_len in AscendSchedulerConfig (#2434) linfeng-yuan 2025-08-23 19:39:44 +08:00
  • 3629bc4431 feat: add mtp ut and fix some bugs (#2453) ZhaoJiangJiang 2025-08-22 17:09:08 +08:00
  • dd04a96ee3 [Bugfix] Fix the bug of incorrect precision (#2479) weiguihua2 2025-08-22 17:08:56 +08:00
  • f0be3eed84 [Doc] Add release note for v0.9.1rc3 (#2488) Shanshan Shen 2025-08-22 16:06:29 +08:00
  • 60ac4fb576 [QuickFix] Skip failed ut to recover CI quickly (#2484) Mengqing Cao 2025-08-22 14:14:51 +08:00
  • e9fb895b10 [Doc] Add feature branch long_seq_optimization (#2477) LookAround0301 2025-08-22 08:53:12 +08:00
  • b0403f8d8a [CI] fix ci (#2464) Mengqing Cao 2025-08-22 07:30:48 +08:00
  • 0ca3f48c90 [2/N][refactor] torchair deepseek mla backend refactor (#2459) linfeng-yuan 2025-08-21 14:02:30 +08:00
  • 67a222c383 [Doc] Add feature branch policy (#2432) Yikun Jiang 2025-08-21 10:37:21 +08:00
  • 3fb80ee356 add mlp tp optimze (#2120) sherie 2025-08-21 09:22:07 +08:00
  • 973a7cfdf0 [DOC] update doc: LoRA with ACLGraph (#2430) yupeng 2025-08-21 08:55:55 +08:00
  • 0dca4c6dbd refact runner model v1 (#2461) weiguihua2 2025-08-21 08:54:57 +08:00
  • 1de16ead8e [main][bugfix] Modify the default value of the enable_shared_pert_dp to false (#2457) Wang Kunpeng 2025-08-20 20:25:53 +08:00
  • c40d4171bc [main][quantization] Adapt to the new format of ds w4a8 weight (#2392) Wang Kunpeng 2025-08-20 20:25:18 +08:00
  • eccfb715f6 [CI] Fix UT (#2452) wangxiyuan 2025-08-20 16:26:07 +08:00
  • 3f867ee708 refactor allgather/mc2-related fused_experts (#2369) sherie 2025-08-20 14:20:46 +08:00
  • 73acdcfc3b [PD] Correct the ip and port env (#2450) wangxiyuan 2025-08-20 11:39:05 +08:00
  • 7bec1a9b9c qwen3_moe/qwen25 support torchair graph (#2403) Nicholas Tao 2025-08-20 11:23:50 +08:00
  • 31ae249742 [misc] remove uesless envs (#2448) wangxiyuan 2025-08-20 10:50:21 +08:00
  • 3a384492e1 [CI] add lint block before running e2e (#2447) Mengqing Cao 2025-08-20 09:53:23 +08:00
  • 1327f9be1c Fix some ci issue and refactor modelrunner (#2445) Mengqing Cao 2025-08-20 09:01:04 +08:00
  • 955411611c Nominate Mengqing Cao as vllm-ascend maintainer (#2433) Jade Zheng 2025-08-19 14:13:54 +08:00
  • d91c6daf89 [improve] Remove redundant parentheses in pangu_moe.py (#2081) xleoken 2025-08-19 11:00:18 +08:00
  • 6335fe39ea Nominate ApsarasX as vllm-ascend maintainer (#2419) wangxiyuan 2025-08-19 10:44:35 +08:00
  • 83e0f41408 [3/N][Refactor] Move torchair_attention to torchair dir (#2017) Shanshan Shen 2025-08-19 10:25:22 +08:00
  • 2a763b8326 [Bug] Fix bug in test_chunked.py (#1992) xleoken 2025-08-19 10:23:47 +08:00
  • 27d038dc66 fix doc typo (#2407) G.O.D 2025-08-19 09:10:01 +08:00
  • 3f4a358b14 [Bugfix] Fix custom op register issue (#2409) Pleaplusone 2025-08-19 09:09:43 +08:00
  • 3648d18e67 Add Custom Kernels For LoRA Performance (#2325) liuchenbing 2025-08-19 09:09:11 +08:00
  • 8fb50a4248 Bump actions/checkout from 4 to 5 (#2420) dependabot[bot] 2025-08-19 08:54:56 +08:00
  • 9e7c168d99 Add ModelRunner_prepare_inputs doc (#1493) TaoYu Chen 2025-08-18 15:41:24 +08:00
  • 3fc31ee1cb [1/N][refactor] torchair deepseek modeling refactor (#2384) linfeng-yuan 2025-08-18 15:00:37 +08:00
  • 19fdc9a3f0 [Bugfix] Fix header include issue in rope (#2397) Pleaplusone 2025-08-18 14:33:38 +08:00
  • 03ca2b26ca [P/D] Mooncake Connector for v1 distributed (#1568) Chao Lei 2025-08-18 14:30:07 +08:00
  • 2bb7e55022 [Bugfix][PD]fix non-working disaggregated prefill (#2374) CaveNightingale 2025-08-15 16:59:52 +08:00
  • 1b40665548 [Misc] remove unused file (cache.py) (#2377) 22dimensions 2025-08-15 10:27:43 +08:00
  • 61866b8ac6 [Quickfix] update CachedRequestState as NewRequestData changed (#2367) Mengqing Cao 2025-08-15 07:35:27 +08:00
  • 2ad7e1251e [Doc] Fix quant documentation to make it reproducible (#2277) Li Wang 2025-08-14 17:19:47 +08:00
  • c721ae6042 [CustomOp] Register RMSNorm instead of overwrite forward_oot (#2284) Icey 2025-08-14 17:18:30 +08:00
  • e14f2ef669 refactor select_experts of moe module (#2150) shiyuan680 2025-08-14 11:50:53 +08:00
  • 103654ccd6 [Misc] Remove redundant imported envs, using envs_ascend instead (#2193) Shanshan Shen 2025-08-14 09:33:39 +08:00
  • 55d0790597 [2/N][Refactor] Refactor V1 attention for better extensibility (#1995) Shanshan Shen 2025-08-14 09:32:41 +08:00
  • 8914d5a4b2 [Quickfix] Add the missing apply_router_weight_on_input in FusedMoE init (#2348) Mengqing Cao 2025-08-14 09:17:50 +08:00
  • 0f7492d18e [Bugfix] fix the oom when chunkprefill with long context like 64k (#2319) zhenghaojiang 2025-08-13 17:15:59 +08:00
  • 8bfd16a145 [Doc] Add container image save/load FAQ for offline environments (#2347) jack 2025-08-13 16:00:43 +08:00
  • 992271b027 [1/N][Feat] Support MoE models with ACL Graph and refactor MoE communication logic (#2125) yiz-liu 2025-08-12 21:10:20 +08:00
  • 1a70564e7c [5/N][Refactor] torchair model runner refactor (#2216) wangxiyuan 2025-08-12 14:24:50 +08:00
  • 49ec6c98b7 [Doc] Update faq (#2334) Mengqing Cao 2025-08-12 14:12:53 +08:00
  • dc585f148a [main][prefill optimization] Optimize parallel strategies to reduce communication overhead (#2198) Wang Kunpeng 2025-08-12 14:12:12 +08:00
  • 81817908ca ut: add ci guard for ut coverage (#2317) Ronald1995 2025-08-12 08:05:01 +08:00
  • 9c6d108330 Configure Gemini (#2298) jack 2025-08-11 22:21:29 +08:00
  • c8b0f5f799 [4/N][Refactor] torchair model runner refactor (#2208) wangxiyuan 2025-08-11 21:39:24 +08:00
  • eb43a475f4 [Feat] chunkprefill mla support torchair graph (#1772) zhenghaojiang 2025-08-11 19:58:59 +08:00
  • 881e36d6a9 [3/N][Refactor] torchair model runner refactor (#2207) wangxiyuan 2025-08-11 18:03:19 +08:00
  • 29aaba5f84 [Perf][MTP] Optimize reject sampler in greedy situation. (#2137) whx 2025-08-11 17:37:49 +08:00
  • ca274001b0 Bump actions/download-artifact from 4 to 5 (#2311) dependabot[bot] 2025-08-11 16:02:12 +08:00
  • c0f0b70813 [core] Support capture custom ops into aclgraph (#2113) Pleaplusone 2025-08-11 15:59:42 +08:00
  • 1ab15414bb [2/N][Refactor] torchair model runner refactor (#2204) wangxiyuan 2025-08-11 14:06:49 +08:00
  • 9260910c8d [CI] Fix broken CI (#2302) wangxiyuan 2025-08-11 11:22:32 +08:00
  • ee6f79c44a Add ut for test_communicator.py (#2293) yangqinghao-cmss 2025-08-09 08:26:04 +08:00
  • 3e65c406b8 Fix accuracy test create PR (#2274) Icey 2025-08-08 14:12:11 +08:00