Commit Graph

  • 366d2d95e8 [P/D] Add readme for PD separation (#4182) wangxiaoteng888 2025-11-28 15:17:59 +08:00
  • e52ebf8674 [MM][Model][Perf] Remove Qwen2.5-VL modeling files and add patch for VisionAttention (#4349) Shanshan Shen 2025-11-28 14:23:00 +08:00
  • bdc66972db [Quantization] Support compressed tensors w8a8 static and w8a8 dynamic weight (#4036) LHXuuu 2025-11-28 14:09:39 +08:00
  • ab37a7d5ae [main]Upgrade cann to 8.3rc2 (#4350) SILONG ZENG 2025-11-28 14:06:01 +08:00
  • 755b635844 [TEST] Add eagle proposer ut (#4447) Zhu Yi Lin 2025-11-27 21:59:31 +08:00
  • 9fdabb7b60 [feature] Add Custom Op grouped_matmul_swiglu_quant (#4431) Slightwind 2025-11-27 21:56:18 +08:00
  • 89a1a65300 [bugfix] fix ray start failed: local_world_size cannot little than visible device count error (#4457) Nengjun Ma 2025-11-27 21:18:32 +08:00
  • 1cae3e4a49 [BugFix] Adapted Qwen3-Next eager mode to v0.11.2 (#4477) drslark 2025-11-27 17:44:59 +08:00
  • b220de33e8 [CI][Nightly] Support local debugging for multi-node CI test cases (#4489) Li Wang 2025-11-27 17:20:29 +08:00
  • 1fd56b1106 chip type judgement code optimization (#4485) zzzzwwjj 2025-11-27 17:18:49 +08:00
  • 84d7f5a10d [UT] Fix ut test (#4472) zhangxinyuehfad 2025-11-26 21:37:47 +08:00
  • d252e36ae8 Change comment location (#4432) herizhen 2025-11-26 16:13:31 +08:00
  • 136ea9ff56 [refact] unified soc_version code (#4359) zzzzwwjj 2025-11-26 14:28:55 +08:00
  • a91e76cd84 [CI] clean up ci (#4452) wangxiyuan 2025-11-26 14:07:56 +08:00
  • bc69d7cfe1 upgrade to vllm 0.11.2 (#4400) wangxiyuan 2025-11-26 11:48:58 +08:00
  • d5f77f14d0 mkdir triton package and move triton files (#4420) shiyuan680 2025-11-26 11:06:12 +08:00
  • 96c362361e [0.11.0][TEST] Delete Comment (#4428) Zhu Yi Lin 2025-11-25 21:39:36 +08:00
  • 1b137d6b1b [TEST] Delete Comment (#4427) Zhu Yi Lin 2025-11-25 21:39:04 +08:00
  • 98031653df [misc] Remove useless patch_logits (#4252) wangxiyuan 2025-11-25 21:25:54 +08:00
  • a686f2962a [0.11.0][Bugfix] fix e2e full test (#4424) zhangxinyuehfad 2025-11-25 21:21:42 +08:00
  • cdaf7f4a51 [MM][Bugfix] Minor fix for VL model verification (#4385) Shanshan Shen 2025-11-25 20:36:32 +08:00
  • 4864909648 [MM][Bugfix] Minor fix for VL model verification (#4384) Shanshan Shen 2025-11-25 20:36:16 +08:00
  • 463910e686 [Bugfix] use module-level import for patched function in Qwen3Next (#4354) Zhijun Chen 2025-11-25 20:15:43 +08:00
  • 941d54a2ce [bugfix]Return the Transformer version from 4.57.2 to 4.57.1 (#4423) SILONG ZENG 2025-11-25 15:32:24 +08:00
  • 31a2c09e79 [Bugfix] fix patch typo (#4351) 欧派果奶我还要 2025-11-25 15:13:06 +08:00
  • e945e91933 Document error correction (#4422) herizhen 2025-11-25 14:21:13 +08:00
  • 06f6cc1c81 [Bugfix]Fix the hang issue of multimodal model when running with DP>1 (#4392) wujinyuan1 2025-11-25 09:33:49 +08:00
  • 386a85eccc [Bugfix]Fix the hang issue of multimodal model when running with DP>1 (#4393) wujinyuan1 2025-11-25 09:32:22 +08:00
  • a3164ac372 [v0.11.0][Bugfix][MoE] enable force_load_balance in aclgraph (#4367) weichen 2025-11-25 09:16:57 +08:00
  • 84eae97f27 Bump actions/checkout from 4 to 6 (#4380) dependabot[bot] 2025-11-25 09:05:11 +08:00
  • 00ea61ec88 [feature] vllm-ascend support msprobe (eager mode dump) (#4241) Tjh-UKN 2025-11-24 21:58:31 +08:00
  • 5b1a7514eb [Bugfix][MoE] enable force_load_balance in aclgraph (#4366) weichen 2025-11-24 20:33:56 +08:00
  • ae068a3342 [Refactor] remove moe type of multicast. (#4224) weijinqian0 2025-11-24 17:32:37 +08:00
  • 75452abe1e [Doc][v11.0-dev][cherry-pick]Add single node PD disaggregation instructions (#4370) mazhixin000 2025-11-24 17:23:11 +08:00
  • 5508a602ed [Fix] fix aclgraph e2e test. (#4131) XiaoxinWang 2025-11-24 17:22:03 +08:00
  • a1f142b7ad Drop 0.11.0 support (#4377) wangxiyuan 2025-11-24 17:08:20 +08:00
  • 41ddb06554 [TEST]Update deepseek mtpx acc cases standard (#4321) jiangyunfan1 2025-11-24 16:43:29 +08:00
  • 97999347c8 [Fix] Remove unnecessary NPU synchronization in MTP proposer (#4325) Yizhou 2025-11-24 14:07:10 +08:00
  • 8c87a3b053 Change the first letter to uppercase (#4375) herizhen 2025-11-24 12:18:24 +08:00
  • b5f7a83927 [Doc] Upgrade multi-node doc (#4365) Li Wang 2025-11-24 10:57:50 +08:00
  • b34f195cc8 [CI] Fix nightly CI for A2 series (#3825) Li Wang 2025-11-23 23:05:33 +08:00
  • ab51fcea4c [Doc]Add single node PD disaggregation instructions (#4337) mazhixin000 2025-11-22 23:33:07 +08:00
  • ea3372fb0c [Bugfix][KV Pool]fix get_ip import in mooncake_store (#4355) pz1116 2025-11-22 18:52:48 +08:00
  • 9b3a484b46 [BugFix] Fix some issues caused by the ascending order of cudagraph_capture_sizes (#4338) Angazenn 2025-11-22 17:33:12 +08:00
  • fff258bce1 [Doc] add release note for v0.11.0rc2 (#4348) wangxiyuan 2025-11-21 23:03:32 +08:00
  • a2e4c3fe78 Revert "[cherry-pick][refactor]support gatingtopk operator generalization (#4050)" (#4352) wangxiyuan 2025-11-21 23:03:20 +08:00
  • 5ad0ccdc31 [v0.11.0]Upgrade cann to 8.3.rc2 (#4332) SILONG ZENG 2025-11-21 22:48:57 +08:00
  • 0f9025cceb [EPLB] Eplb Verify Fix (#4334) LI SHENGYONG 2025-11-21 18:18:15 +08:00
  • 3955bf2908 [EPLB] Eplb Verify Fix (#4333) LI SHENGYONG 2025-11-21 18:17:46 +08:00
  • 3deeea14a0 [bugfix] bugfix for PD disaggregate (#4319) wangxiaochao 2025-11-21 18:08:56 +08:00
  • 97ffb9120f [CI] Defaultly compile vllm with multimodal audio feature in dockerfile (#4324) (#4341) Ting FU 2025-11-21 17:53:00 +08:00
  • 218bc70f6f [CI] Remove redundant workflows (#4335) Li Wang 2025-11-21 16:48:35 +08:00
  • e332e27ec3 [Test] Add ut test for torchair (#4287) CodeCat 2025-11-21 16:33:34 +08:00
  • a5554b6661 [Feat][Doc] Add a load_balance_dp_proxy in examples and external dp doc. (#4265) whx 2025-11-21 16:33:23 +08:00
  • 6c157cb75a [CI] Defaultly compile vllm with multimodal audio feature in dockerfile (#4324) Ting FU 2025-11-21 16:15:31 +08:00
  • 70f076331f [MM][Bugfix] Add error log for VL models when enabling FLASHCOMM (#4222) Shanshan Shen 2025-11-21 15:04:35 +08:00
  • 8e3b834bf7 [MM][Bugfix] Add error log for VL models when enabling FLASHCOMM (#4272) Shanshan Shen 2025-11-21 15:04:18 +08:00
  • c94b38c82e [Readme] EPLB Support Scenarios (#4315) LI SHENGYONG 2025-11-21 14:25:39 +08:00
  • 4573c855b7 [Readme] EPLB Support Scenarios (#4314) LI SHENGYONG 2025-11-21 14:24:54 +08:00
  • 019c7ded91 eplb redundant expert bugfix (#4291) LI SHENGYONG 2025-11-21 14:24:35 +08:00
  • 9c6d0b422c [v0.11.0-dev][misc]change default capture size for Qwen3-MoE when using full dp (#4205) Angazenn 2025-11-21 11:19:11 +08:00
  • 5a4e8cdeba [Feat][BugFix]Support the Qwen3-Next-80B-A3B-Instruct quantization model&Fix the NZ issue (#4245) InSec 2025-11-21 10:42:56 +08:00
  • cbb27feaf2 [Test] Add ACL graph capture/replay DP test (#4259) Yizhou 2025-11-21 08:50:46 +08:00
  • d96d5fa971 [Test] quick fix mla ut (#4318) Zhu Yi Lin 2025-11-20 23:06:12 +08:00
  • 5c9f4a40c6 [Feat] Support MTP to running in full graph mode (#3892) anon189Ty 2025-11-20 20:34:54 +08:00
  • 15c1eb025c [CI] Add mla ut (#4280) Zhu Yi Lin 2025-11-20 20:29:09 +08:00
  • 470fe05df6 [Test] Add tests for the multi-node DeepSeek-V2-Lite network in GE Graph (#4039) CodeCat 2025-11-20 17:28:32 +08:00
  • b6d59bdea2 cherry pick from pr 4270 (#4285) shaopeng-666 2025-11-19 22:32:02 +08:00
  • 3653f33878 avoid mrope fusion op when running qwen2.5-vl on a+x machine (#4270) shaopeng-666 2025-11-19 22:31:14 +08:00
  • c848da0687 [Bugfix] fix nightly multi-node EPLB tests' "DYNAMIC_EPLB=true" environment not working (#4223) 欧派果奶我还要 2025-11-19 21:31:58 +08:00
  • 277670730c [Bugfix][Aclgraph] failed to update graph task (#4282) MengLong Chen 2025-11-19 21:30:48 +08:00
  • a3e9673137 [long seq feat]GQA support long-prefill-token-threshold and fixbug (#4209) Delphine-Nic 2025-11-19 18:10:27 +08:00
  • 97daf7f78c [misc] clean up get_metadata_cls (#4276) wangxiyuan 2025-11-19 17:18:19 +08:00
  • d5fef22149 [Docs] Improve the AISBench multi-modal testing docs (#4255) Canlin Guo 2025-11-19 16:00:39 +08:00
  • d43022f3ed [doc]fix readme for kv pool user guide (#4271) pz1116 2025-11-19 15:57:50 +08:00
  • 2938bd5ad2 remove get_metadata_cls (#4087) wangxiyuan 2025-11-19 14:58:17 +08:00
  • 1cdf9ffa73 [Bugfix] fix hang in async scheduling (#4233) realliujiaxu 2025-11-19 14:47:19 +08:00
  • 91b6ba8ffe [CI] Fix kubernetes failed to resolve ip by dns name (#4240) Li Wang 2025-11-19 14:38:13 +08:00
  • df777e9faa [bugfix] pcp + mtp acl graph bugfix (#4221) zhangsicheng5 2025-11-19 11:21:46 +08:00
  • c87a77e8b4 [cherry-pick][refactor]support gatingtopk operator generalization (#4050) 1092626063 2025-11-19 10:39:28 +08:00
  • 9328f377b4 [refactor]support gatingtopk operator generalization (#2958) 1092626063 2025-11-19 10:38:56 +08:00
  • 63561d6763 [Fix] Sorts aclgraph batch sizes in ascending order (#4230) Yizhou 2025-11-19 09:36:37 +08:00
  • ddf3e75800 [Cherry-pick] [0.11.0] pd proxy support ipv6 and fix proxy (#4242) liziyu 2025-11-18 16:33:00 +08:00
  • e98543267a [bugfix] fix proxy when host ip using domain name (#4243) liziyu 2025-11-18 16:30:51 +08:00
  • a30261f779 [P/D] pd proxy support ipv6 (#4161) liziyu 2025-11-18 11:01:13 +08:00
  • 0d04ad8c8f [feature] Mooncake_connector support pcp/dcp (#4183) wangxiaochao 2025-11-18 10:17:48 +08:00
  • 10a046ddce [main][misc]change default capture size for Qwen3-MoE when using full dp (#4199) Angazenn 2025-11-18 08:41:45 +08:00
  • da1cd9c7ca [Bugfix]Fix moe error when sp chunked the hidden_states (#4212) weiguihua2 2025-11-17 22:55:17 +08:00
  • 3677202594 make vllm-ascend work well in developer mode (#4179) Ronald 2025-11-17 19:13:04 +08:00
  • 9a1cfb48d4 [TEST]Update prefixcache perf threshold for qwen3-32b-int8 (#4220) jiangyunfan1 2025-11-17 19:06:54 +08:00
  • 378e92a2a2 [Cherry-pick][0.11.0] Adapted to torch_npu.npu_fused_infer_attention_score (#4202) Icey 2025-11-17 10:56:23 +08:00
  • e38ef2c434 support FULL graph mode for GQA (#3970) XiaoxinWang 2025-11-17 10:50:35 +08:00
  • c334114f69 [CI] Fix no space left in build wheel CI. (#4215) zhangyiming 2025-11-17 10:45:58 +08:00
  • 67f2b3a031 [Test] Add deepseek v3.2 exp nightly test (#4191) zhangxinyuehfad 2025-11-14 15:46:10 +08:00
  • 1d0f13c1a3 [Misc] Add benchmark results into .gitignore (#4200) Shanshan Shen 2025-11-14 15:44:28 +08:00
  • a7eb42cf0a [v0.11.0-dev][Bugfix][cherry-pick]bugfix for weight load of kimi-k2 (#4190) zhangyiming 2025-11-14 15:43:22 +08:00
  • f10251ede0 [Platform] Add import_kernels interface (#3694) Canlin Guo 2025-11-14 11:32:51 +08:00
  • 094f32c8c9 [Feat] Adds a utility for printing from within ACL graphs (#4162) Yizhou 2025-11-14 09:41:14 +08:00
  • 01195e860c [Bugfix] fix cannot import name get_mp_context (#4174) weiguihua2 2025-11-14 09:09:14 +08:00
  • f90ed95578 [CI] Add multi-nodes EPLB configs of DeepSeek-R1-W8A8 & Qwen3-235B-W8A8 (#4144) 欧派果奶我还要 2025-11-14 08:50:29 +08:00