Commit Graph

  • 7bec1a9b9c qwen3_moe/qwen25 support torchair graph (#2403) Nicholas Tao 2025-08-20 11:23:50 +08:00
  • 31ae249742 [misc] remove useless envs (#2448) wangxiyuan 2025-08-20 10:50:21 +08:00
  • 3a384492e1 [CI] add lint block before running e2e (#2447) Mengqing Cao 2025-08-20 09:53:23 +08:00
  • 1327f9be1c Fix some ci issue and refactor modelrunner (#2445) Mengqing Cao 2025-08-20 09:01:04 +08:00
  • 955411611c Nominate Mengqing Cao as vllm-ascend maintainer (#2433) Jade Zheng 2025-08-19 14:13:54 +08:00
  • d91c6daf89 [improve] Remove redundant parentheses in pangu_moe.py (#2081) xleoken 2025-08-19 11:00:18 +08:00
  • 6335fe39ea Nominate ApsarasX as vllm-ascend maintainer (#2419) wangxiyuan 2025-08-19 10:44:35 +08:00
  • 83e0f41408 [3/N][Refactor] Move torchair_attention to torchair dir (#2017) Shanshan Shen 2025-08-19 10:25:22 +08:00
  • 2a763b8326 [Bug] Fix bug in test_chunked.py (#1992) xleoken 2025-08-19 10:23:47 +08:00
  • 27d038dc66 fix doc typo (#2407) G.O.D 2025-08-19 09:10:01 +08:00
  • 3f4a358b14 [Bugfix] Fix custom op register issue (#2409) Pleaplusone 2025-08-19 09:09:43 +08:00
  • 3648d18e67 Add Custom Kernels For LoRA Performance (#2325) liuchenbing 2025-08-19 09:09:11 +08:00
  • 8fb50a4248 Bump actions/checkout from 4 to 5 (#2420) dependabot[bot] 2025-08-19 08:54:56 +08:00
  • 9e7c168d99 Add ModelRunner_prepare_inputs doc (#1493) TaoYu Chen 2025-08-18 15:41:24 +08:00
  • 3fc31ee1cb [1/N][refactor] torchair deepseek modeling refactor (#2384) linfeng-yuan 2025-08-18 15:00:37 +08:00
  • 19fdc9a3f0 [Bugfix] Fix header include issue in rope (#2397) Pleaplusone 2025-08-18 14:33:38 +08:00
  • 03ca2b26ca [P/D] Mooncake Connector for v1 distributed (#1568) Chao Lei 2025-08-18 14:30:07 +08:00
  • 2bb7e55022 [Bugfix][PD]fix non-working disaggregated prefill (#2374) CaveNightingale 2025-08-15 16:59:52 +08:00
  • 1b40665548 [Misc] remove unused file (cache.py) (#2377) 22dimensions 2025-08-15 10:27:43 +08:00
  • 61866b8ac6 [Quickfix] update CachedRequestState as NewRequestData changed (#2367) Mengqing Cao 2025-08-15 07:35:27 +08:00
  • 2ad7e1251e [Doc] Fix quant documentation to make it reproducible (#2277) Li Wang 2025-08-14 17:19:47 +08:00
  • c721ae6042 [CustomOp] Register RMSNorm instead of overwrite forward_oot (#2284) Icey 2025-08-14 17:18:30 +08:00
  • e14f2ef669 refactor select_experts of moe module (#2150) shiyuan680 2025-08-14 11:50:53 +08:00
  • 103654ccd6 [Misc] Remove redundant imported envs, using envs_ascend instead (#2193) Shanshan Shen 2025-08-14 09:33:39 +08:00
  • 55d0790597 [2/N][Refactor] Refactor V1 attention for better extensibility (#1995) Shanshan Shen 2025-08-14 09:32:41 +08:00
  • 8914d5a4b2 [Quickfix] Add the missing apply_router_weight_on_input in FusedMoE init (#2348) Mengqing Cao 2025-08-14 09:17:50 +08:00
  • 0f7492d18e [Bugfix] fix the oom when chunkprefill with long context like 64k (#2319) zhenghaojiang 2025-08-13 17:15:59 +08:00
  • 8bfd16a145 [Doc] Add container image save/load FAQ for offline environments (#2347) jack 2025-08-13 16:00:43 +08:00
  • 992271b027 [1/N][Feat] Support MoE models with ACL Graph and refactor MoE communication logic (#2125) yiz-liu 2025-08-12 21:10:20 +08:00
  • 1a70564e7c [5/N][Refactor] torchair model runner refactor (#2216) wangxiyuan 2025-08-12 14:24:50 +08:00
  • 49ec6c98b7 [Doc] Update faq (#2334) Mengqing Cao 2025-08-12 14:12:53 +08:00
  • dc585f148a [main][prefill optimization] Optimize parallel strategies to reduce communication overhead (#2198) Wang Kunpeng 2025-08-12 14:12:12 +08:00
  • 81817908ca ut: add ci guard for ut coverage (#2317) Ronald1995 2025-08-12 08:05:01 +08:00
  • 9c6d108330 Configure Gemini (#2298) jack 2025-08-11 22:21:29 +08:00
  • c8b0f5f799 [4/N][Refactor] torchair model runner refactor (#2208) wangxiyuan 2025-08-11 21:39:24 +08:00
  • eb43a475f4 [Feat] chunkprefill mla support torchair graph (#1772) zhenghaojiang 2025-08-11 19:58:59 +08:00
  • 881e36d6a9 [3/N][Refactor] torchair model runner refactor (#2207) wangxiyuan 2025-08-11 18:03:19 +08:00
  • 29aaba5f84 [Perf][MTP] Optimize reject sampler in greedy situation. (#2137) whx 2025-08-11 17:37:49 +08:00
  • ca274001b0 Bump actions/download-artifact from 4 to 5 (#2311) dependabot[bot] 2025-08-11 16:02:12 +08:00
  • c0f0b70813 [core] Support capture custom ops into aclgraph (#2113) Pleaplusone 2025-08-11 15:59:42 +08:00
  • 1ab15414bb [2/N][Refactor] torchair model runner refactor (#2204) wangxiyuan 2025-08-11 14:06:49 +08:00
  • 9260910c8d [CI] Fix broken CI (#2302) wangxiyuan 2025-08-11 11:22:32 +08:00
  • ee6f79c44a Add ut for test_communicator.py (#2293) yangqinghao-cmss 2025-08-09 08:26:04 +08:00
  • 3e65c406b8 Fix accuracy test create PR (#2274) Icey 2025-08-08 14:12:11 +08:00
  • 0bd5ff5299 Fix accuracy test config and add DeepSeek-V2-Lite test (#2261) Icey 2025-08-08 11:09:16 +08:00
  • ad1083761f [CI][Quickfix] Fix AscendFusedMoE init error (#2268) Mengqing Cao 2025-08-08 10:20:23 +08:00
  • dceef080b1 [main] remove torch.cat and replace it by List[0] (#2153) huangxialu 2025-08-07 17:20:19 +08:00
  • b2598c3271 enable mm allreduce test (#2192) Ronald1995 2025-08-07 17:19:23 +08:00
  • 4604882a3e [ReleaseNote] Release note of v0.10.0rc1 (#2225) Mengqing Cao 2025-08-07 14:46:49 +08:00
  • 58c8d4fdcd Remove transformer pins for v0.9.1-dev (#2234) Yikun Jiang 2025-08-07 14:41:10 +08:00
  • 92eebc0c9b [Doc] Update user guide for supported models (#2263) zhangxinyuehfad 2025-08-07 14:39:51 +08:00
  • 440d28a138 [Tutorial] Add qwen3 8b w4a8 tutorial (#2249) 22dimensions 2025-08-07 14:39:38 +08:00
  • bcd0b532f5 [Doc] Update user guide for using lm-eval (#1325) zhangxinyuehfad 2025-08-07 14:15:49 +08:00
  • dbba3cabb0 [Doc] Update tutorials for single_npu_audio and single_npu_multimodal (#2252) zhangxinyuehfad 2025-08-07 14:08:14 +08:00
  • 205eff2b12 [Bugfix] Disable check vllm init temporary (#2250) Li Wang 2025-08-07 10:37:22 +08:00
  • c611291661 [main] SP For Qwen3 MoE (#2209) lbk-sys 2025-08-07 09:15:49 +08:00
  • 57b9f02185 [Bugfix] Fix disaggregated pd error (#2242) Li Wang 2025-08-06 19:48:10 +08:00
  • 26fc36b0e0 [V1] MTP supports torchair (#2145) xuyexiong 2025-08-06 19:37:43 +08:00
  • bf84f2dbfa [Doc] Support kimi-k2-w8a8 (#2162) Li Wang 2025-08-06 19:28:47 +08:00
  • 875a86cbe9 ut: add example and e2e test for sleepmode in external_launcher (#2152) huangxialu 2025-08-06 11:11:53 +08:00
  • 8a59367d0c [main][Feature] Support deepseek w4a8 quantization (#2172) Wang Kunpeng 2025-08-06 10:17:44 +08:00
  • e31b31f9c3 [main][Bugfix] Fix unable to load qwen3_moe quantized weights (#2219) Ruri 2025-08-06 09:08:36 +08:00
  • 54ace9e12b Add release note for v0.9.1rc2 (#2188) Yikun Jiang 2025-08-06 09:04:46 +08:00
  • 126cdfc92b [Test] add rejection sampler ut (#2084) sherie 2025-08-05 19:03:36 +08:00
  • f3b50c54e8 [main][Prefill Perf] Optimize Quantized MoE Performance by Reducing All2All Communication (#2195) Slightwind 2025-08-05 18:47:13 +08:00
  • 292fb8f696 [1/N][Refactor] torchair model runner refactor (#2205) wangxiyuan 2025-08-05 18:43:04 +08:00
  • 458ab2db12 [BugFix] Fix the bug that qwen3 moe doesn't work with aclgraph (#2183) wangxiyuan 2025-08-05 17:42:52 +08:00
  • 583ad8f347 [main][refactor] Refactor forward metadata retrieval across DP nodes to reduce redundant padding. (#2062) jinyuxin 2025-08-05 17:03:36 +08:00
  • 27c2b5c145 [Doc] Update pytorch version in README_zh doc (#2202) xleoken 2025-08-05 11:13:49 +08:00
  • 807f0895b2 Bump torch version to 2.7.1 (#1562) leo-pony 2025-08-05 08:43:24 +08:00
  • 36e450eb0f [Misc] Nit fix for disaggregated_prefill and ascend_forward_context (#2097) wangxiyuan 2025-08-05 08:39:02 +08:00
  • ad366bf908 [Bugfix] Follow vLLM Qwen-Moe/VL and KV Connector change to fix broken CI (#2181) Li Wang 2025-08-04 21:37:50 +08:00
  • e38fab011d [Doc][PD] Restore the default configuration items in examples/disaggregate_prefill_v1/README.md (#2165) hucong 2025-08-04 20:30:53 +08:00
  • 957c7f108d [Bugfix][PD] Make multiple Ps and Ds work on a single machine (#2080) CaveNightingale 2025-08-04 17:22:18 +08:00
  • a9480d5f0a [Fix] Adjust use_aclgraph logic (#2156) yiz-liu 2025-08-04 15:23:20 +08:00
  • 688350a3bb [bugfixed] fix the bug when run the inference of quantized ds-w8a8-mtp (#2134) liu 2025-08-04 15:16:42 +08:00
  • 4b3a210c33 Implementation of simple load balance routing proxy server (#1953) (#2124) Pleaplusone 2025-08-04 10:35:53 +08:00
  • af04ee9e7a [MoE][Dist] Fix Qwen MoE accuracy bug in DP scenario (#1856) Mengqing Cao 2025-08-04 10:24:18 +08:00
  • f939381c6f [Bugfix] Adopt the new changes on disaggregated pd from vllm main branch (#2122) Pleaplusone 2025-08-04 10:08:58 +08:00
  • ddaded1537 Add ut for envs.py (#2131) YuanCheng-coder 2025-08-02 16:53:44 +08:00
  • bea3d5bbb4 [Bug] Fix run bug in run_dp_server.sh (#2139) xleoken 2025-08-02 16:52:12 +08:00
  • 47f688a2f0 Change retrieving remote files to local retrieval. (#2141) yangqinghao-cmss 2025-08-02 16:51:22 +08:00
  • e48f32ec59 [CI] Update image for 310p ci (#2155) zhangxinyuehfad 2025-08-02 16:46:02 +08:00
  • e467fe1b77 Add qwen-vl model and sampling feature UT for 310I series (#2168) leo-pony 2025-08-02 11:26:12 +08:00
  • 6e00aed4d5 [main][Feature] MoE alltoallv communication optimization for unquantized RL training scenario (#2088) weijinqian0 2025-08-02 09:49:10 +08:00
  • f0c1f0c828 [Doc] Add qwen vl example in tutorials for 310I series (#2160) leo-pony 2025-08-02 08:58:56 +08:00
  • 8cf97d8310 [Misc] Add extra checking to torchair_graph_config. (#1939) 22dimensions 2025-08-01 09:24:11 +08:00
  • 2284289880 [MISC] Cherry pick #1291 from v0.9.1-dev (#1825) Li Wang 2025-08-01 09:08:45 +08:00
  • 9e65da990e [Misc] Add warning for incompatible Ray backend with ACL Graph mode (#2132) 22dimensions 2025-08-01 09:06:09 +08:00
  • 99fa0ac882 [BugFix] update the kv transfer config (#2121) yangqinghao-cmss 2025-08-01 08:56:55 +08:00
  • 968e6791d3 [Misc] Add data preprocess functions to qwen2.5_vl_without_padding (#2148) Li Wang 2025-08-01 08:54:02 +08:00
  • e3b3ffb875 [Misc] Disable quantization in mindie_turbo (#2147) Li Wang 2025-08-01 08:53:00 +08:00
  • c62f346f5d Fixed 310p failure when using the sampler feature (#2151) leo-pony 2025-08-01 08:43:08 +08:00
  • 86bdde1ca8 Enable pytest and yaml style accuracy test (#2073) Icey 2025-07-31 21:39:13 +08:00
  • 9c9a7cd90b [main] adapt usage of npu_moe_gating_top_k_softmax and remove envs.SELECT_GATING_TOPK_SOTFMAX_EXPERTS (#2112) huangxialu 2025-07-31 21:05:56 +08:00
  • e8660d7978 ut:add ut for qwen2_5_vl (#2143) Ronald1995 2025-07-31 20:46:17 +08:00
  • cb0a303080 ut:add e2e test for external launcher (#2091) Ronald1995 2025-07-31 20:37:42 +08:00
  • 4c8842da65 [BugFix] Fix a bug of running chunked-prefill with torchair. (#1378) (#1844) Mengqing Cao 2025-07-31 20:08:45 +08:00
  • db310c6ec9 add ut for device allocator/camem and multistream/layers (#2037) daniel 2025-07-31 19:17:27 +08:00
  • 2008152c48 [main][bugfix]Fix vLLM startup failure when inferring DeepSeek R1 model in DP scenario (#2020) zhanghw0354 2025-07-31 15:30:28 +08:00