Commit Graph

  • 07c7131104 [Fix] Delete redundant variable (#4903) Wang Yixuan 2025-12-11 17:50:25 +08:00
  • e1bb6f47ec [doc] Add Qwen2.5 tutorials (#4636) yangxiaoman8 2025-12-11 17:30:05 +08:00
  • 332b547728 [Bugfix] support mtp kv transfer and pp partition by hand in kv transfer (#4892) lidenghui1110 2025-12-11 17:23:21 +08:00
  • a47aa4da2f [feat] apply flashcomm1 on bailing (#4868) hwhaokun 2025-12-11 17:02:21 +08:00
  • 2f965d8339 [Bugfix] Fix the bug in sfa-cp under multi-DP scenarios. (#4850) zzhxxx 2025-12-11 16:44:14 +08:00
  • 5ebb9bd8d2 【Bugfix】bugfix_for_bmm_transpose (#4899) ChrisGelhLan 2025-12-11 16:32:28 +08:00
  • 78bf211539 [OPS] support triton causal_conv1d_fn ops (#4119) QilaiZhang 2025-12-11 15:52:39 +08:00
  • eac72f5f23 [Feat] Flashcomm2 use o_shared linear (#4188) zzhxxx 2025-12-11 12:43:04 +08:00
  • bb76f7962c cleanup useless torchair logic (#4856) wangxiyuan 2025-12-11 11:21:13 +08:00
  • c12eb22cbe [feat] mlapo add bf16 no_quant support (#4852) chenjunyi 2025-12-11 11:06:56 +08:00
  • c95c271538 [E2E] Optimize nightly testcase. (#4886) zhangyiming 2025-12-11 10:15:39 +08:00
  • 66b0781840 [E2E] Refactor the e2e testcases. (#4789) zhangyiming 2025-12-11 10:15:00 +08:00
  • 11bebb518c [E2E] Remove unused PD-disaggreate scripts in E2E test. (#4837) zhangyiming 2025-12-11 09:23:38 +08:00
  • 0eefbe75b6 [Doc] Add local running multi-node nightly test case guide (#4884) Nengjun Ma 2025-12-11 08:56:27 +08:00
  • ff7d703192 [Doc]Add tutorial document for qwen-VL-Dense (#3516) SILONG ZENG 2025-12-11 08:55:23 +08:00
  • 89a8607b30 add DeepSeek-R1 tutorial. (#4666) Leaf 2025-12-11 08:52:27 +08:00
  • f917d5edcf Remove useless env (#4858) wangxiyuan 2025-12-11 06:51:07 +08:00
  • 08441baedd Remove VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION (#4860) wangxiyuan 2025-12-10 23:50:18 +08:00
  • 37db0844f5 Remove COMPILE_CUSTOM_KERNELS env (#4864) wangxiyuan 2025-12-10 23:48:03 +08:00
  • 3362be7f86 Update patch doc (#4869) wangxiyuan 2025-12-10 23:27:45 +08:00
  • 0fb1dc43a1 [BugFix][main] Adapted Qwen3-Next-MTP to chunked prefill (#4770) drslark 2025-12-10 22:54:24 +08:00
  • 490ddf536f [perf][dsv3.2][async_scheduling] improve dsv3.2 performance by eliminating HD synchronization (#4805) linfeng-yuan 2025-12-10 22:31:47 +08:00
  • dd622aa6a6 [Feature] Support npuhraph_ex backend (#4700) ChenCangtao 2025-12-10 20:48:05 +08:00
  • d7db6791e7 [Bugfix] Support for mlapo in deepseekv3.1 w4a8 (#4828) Zhu Yi Lin 2025-12-10 20:45:07 +08:00
  • 8bb028424b Fixed the performance degradation issue in post-processing in speculative decoding scenarios. (#4849) FuNanyang 2025-12-10 20:32:44 +08:00
  • 5b179c53f1 [FEAT] Support DeepSeek-V3.2 with FULL_DECODE_ONLY mode (#4706) Yizhou 2025-12-10 20:11:09 +08:00
  • 0d8c0f1a24 [Bugfix] Fix out-of-bounds access to token_id due to uninitialized logprobs (#4248) JiangWeixiang 2025-12-10 17:45:58 +08:00
  • bd8be2e759 [Kernel] Add moe normal ops (#4810) shiro-zzzz 2025-12-10 17:15:28 +08:00
  • c77dca54b2 [CI] fix lint (#4888) wangxiyuan 2025-12-10 16:57:24 +08:00
  • 1a443f2772 add multi_npu_qwen3_dense tutorials (#4543) wind-all 2025-12-10 16:09:56 +08:00
  • a82b0fa70e mooncake connector support pipeline parallel & fix pp with flashcomm1 (#4054) lidenghui1110 2025-12-10 16:01:43 +08:00
  • ce5872705e [Feat] Support native Kimi-K2-Thinking native W4A16 quantized experts weights (#4516) Ruri 2025-12-10 15:58:52 +08:00
  • c1db298f43 [CI] Use offline mode for modelscope (#4875) Li Wang 2025-12-10 15:49:34 +08:00
  • ceadc2788d Revert "[refactor]support gatingtopk operator generalization (#4356)" (#4873) 1092626063 2025-12-10 15:45:20 +08:00
  • 7132ae8532 [CI]Cleanup accurary test (#4861) SILONG ZENG 2025-12-10 14:13:56 +08:00
  • e32014ac1d [Model] Support pooling models (#3122) lianyibo 2025-12-10 11:37:57 +08:00
  • 1a7a34c5ec add e2e test for mtp async_scheduling (#4826) Ronald 2025-12-10 11:30:22 +08:00
  • 134e011896 [Test] Temporarily skips Qwen3-30B-A3B-W8A8 data parallel test case (#4857) Yizhou 2025-12-10 11:05:32 +08:00
  • 89733111fa [Nightly] Optimize nightly online test logger info (#4798) Li Wang 2025-12-10 09:24:19 +08:00
  • 835b4c8f1d Drop torchair (#4814) wangxiyuan 2025-12-10 09:20:40 +08:00
  • ba9cda9dfd [Kernel] add custom op MatmulAllreduceAddRmsnorm (#4606) Trunrain 2025-12-10 09:05:33 +08:00
  • f404c9af7f [bugfix] fix quant method validation bug (#4831) zzzzwwjj 2025-12-09 23:42:01 +08:00
  • 863a5a5a17 Add gsm8k accuracy test for multi-note Qwen3-235B-A22B (#4802) Nengjun Ma 2025-12-09 23:05:41 +08:00
  • a77045f355 [P/D][main]Offline the llmdatadist connector related parts of the code and files. (#4780) wangxiaoteng888 2025-12-09 22:36:43 +08:00
  • 848419d1ba [Bugfix] Disable the dispatch_ffn_combine kernel in MTP path (#4751) Chen Chen 2025-12-09 22:14:05 +08:00
  • cd1c69ee0b [Fix] Add extra warmup run count for MC2 on specific SoC version (#4843) Yizhou 2025-12-09 21:37:38 +08:00
  • 4813cefc58 [CI] Setup github proxy for self_hosted runners (#4841) Li Wang 2025-12-09 20:35:43 +08:00
  • c331503677 [Refactor] 2/N Unify all mask generation methods and cache mask (#4779) weijinqian0 2025-12-09 18:51:00 +08:00
  • dee00d0de3 [Usability]local_buffer_size support for units: GB, MB, KB, B (#4829) lty 2025-12-09 17:52:24 +08:00
  • 49e346c6a6 [UT]add pcp aclgraph ut (#4804) weiguihua2 2025-12-09 17:27:40 +08:00
  • c68dfa70ac [Bugfix]fix bmm_transpose ops in dsv32 (#4791) Wang Yixuan 2025-12-09 16:55:09 +08:00
  • c8b671c498 [CI] Increase HCCL_BUFFSIZE for A3 (#4838) Li Wang 2025-12-09 16:39:50 +08:00
  • 9567e5dd8c [kernel] Adapt DispatchGmmCombineDecode operator to parameters of small operators (#4790) wangqiankun13 2025-12-09 16:17:06 +08:00
  • 9a885d08d0 [Feat] Multi-stream for eplb heat collection and aggregation (#4214) dsxsteven 2025-12-09 16:16:55 +08:00
  • dda027e680 [KVPOOl]Support pp (#4761) baxingpiaochong 2025-12-09 16:15:26 +08:00
  • 9038865261 [CI] Optimize CI time (#4821) Li Wang 2025-12-09 16:09:37 +08:00
  • 9a144bc7be [Docs][0.11.0] delete AIV env variables in DSV32 documentation (#4833) linfeng-yuan 2025-12-09 15:53:53 +08:00
  • 8f45f9ce29 BugFix: Resolve shape mismatch in eplb update and calculation issues in quant_apply_mlp (#4777) Mercykid-bash 2025-12-09 15:46:58 +08:00
  • 695e5c9ebc [0.11.0][ops] npu_top_k_top_p supports k and p only (#4153) linfeng-yuan 2025-12-09 15:45:40 +08:00
  • 4588d1f215 [CI] Use arm node for unit tests (#4819) Li Wang 2025-12-09 15:45:14 +08:00
  • 56f01820e8 [Docs]fix the configuration conflicts in documentation (#4823) linfeng-yuan 2025-12-09 15:37:38 +08:00
  • e0757dc376 [0.11.0]fix the configuration conflicts in documentation (#4824) linfeng-yuan 2025-12-09 15:37:06 +08:00
  • 1c70f5c922 [CI] Skip test_suffix_correctness (#4820) Li Wang 2025-12-09 11:48:13 +08:00
  • 033e3557cc [cherry-pick]fix qwen3vl mrope op (#4484) (#4811) zhangxinyuehfad 2025-12-09 11:07:32 +08:00
  • 2b819bb35b [Bugfix] Add the check for a null VllmConfig (#4749) Canlin Guo 2025-12-09 09:21:17 +08:00
  • 9862a23985 【0.11.0-dev】optimization of kimi-k2 in cann8.3 (#4555) Levi 2025-12-09 08:49:15 +08:00
  • 0d094531b4 [bugfix] Fixed the bug in retrieving the quantization method for mlp.… (#4797) zhangxinyuehfad 2025-12-09 08:47:19 +08:00
  • 7e70da9fb7 Revert "[Kernel] add custom moe ops for prefill" (#4806) Mengqing Cao 2025-12-08 23:20:32 +08:00
  • 432b861cae Fix incorrect MLAPO weight release in PD mixex scenarios. (#4774) ZYang6263 2025-12-08 23:17:45 +08:00
  • b230e7e987 [MOE]move weight transpose to wakeup for RL secnarios (#4626) lhp-deep 2025-12-08 20:34:52 +08:00
  • 58db21f56a [DP] Fix dp padding logic in dummyrun (#4705) Mengqing Cao 2025-12-08 20:32:35 +08:00
  • 193dc1703f [Doc] Add Qwen3-235B tutorial (#4358) xuyexiong 2025-12-08 20:06:46 +08:00
  • 4e728f1f40 [Bugfix] fix qwen3-vl-moe shape ERROR during the _prepare_inputs phase under high concurrency. (#4658) Levi 2025-12-08 19:30:16 +08:00
  • d412565ec9 [Cherry-pick]bmm_transpose to v011dev (#3995) Wang Yixuan 2025-12-08 19:22:14 +08:00
  • 9766cf9128 fix qwen3vl mrope op (#4484) shaopeng-666 2025-12-08 19:19:17 +08:00
  • 3c3c9a5386 Bump actions/checkout from 6.0.0 to 6.0.1 (#4772) dependabot[bot] 2025-12-08 19:15:40 +08:00
  • 0617d7d394 [Kernel] add custom moe ops for prefill (#4194) shiro-zzzz 2025-12-08 19:11:58 +08:00
  • f0876b5d88 [Bugfix] Fix Dcp dimension mismatch when enable Mlapo (#4687) zengzengran 2025-12-08 17:19:58 +08:00
  • afe00505de [Fix] skip xlite e2e test (#4786) LuLina 2025-12-08 16:48:15 +08:00
  • 96ea0e078f [EPLB] Add log Info for moe_load Imbalance Ratio (#4482) dsxsteven 2025-12-08 14:28:13 +08:00
  • a433f3280a [Op] DeepSeekV3.2 support bmm_transpose operator (#4631) ZYang6263 2025-12-08 14:03:38 +08:00
  • 0b65ac6c4b remove useless patch (#4699) wangxiyuan 2025-12-08 11:02:42 +08:00
  • 866347a621 Deepseek Mtp model uses the lm_head and embedding from the main model (#2790) zzhxxx 2025-12-08 10:33:29 +08:00
  • 9fbcfa36af [CI] Fix ngram & suffix test oom (#4755) fluctlux 2025-12-08 09:26:29 +08:00
  • 916a9a1913 fix synchronize error of exceeds_max_model_len d2h copy (#4708) Ronald 2025-12-08 09:07:59 +08:00
  • 6391f0625f [v0.11.0-dev][bugfix] Add branch for stream up-lifting in update_attn_params (#4437) Angazenn 2025-12-08 08:54:46 +08:00
  • 2be0fe2691 [Feat] Add Euler xlite graph wrapper support (#4526) LuLina 2025-12-08 08:27:46 +08:00
  • 8fdb689a32 [BugFix] Refactor ACL graph size adjustment for speculative decoding (#4640) Yizhou 2025-12-07 17:32:45 +08:00
  • 688b1332da [P/D] check kv extra config and del hccl backend (#4547) liziyu 2025-12-07 15:19:42 +08:00
  • b91a5f0968 Support DeepSeekV3.2 with MLAPO operator (#4753) ZYang6263 2025-12-07 12:40:24 +08:00
  • a5163c8c36 [Feat]enable sfa cp for dsv3.2 (#4702) AlvisGong 2025-12-06 19:46:41 +08:00
  • 4bd1030842 [Kernel] add custom op DispatchGmmCombineDecode (#4139) GuoRen868 2025-12-06 17:33:14 +08:00
  • cb42564942 [BugFix] Fix eagle3 accuracy problem when enforce_eager=True (#4521) zhaomingyu13 2025-12-06 17:31:26 +08:00
  • 3480094d7c support async mtp (#4511) Ronald 2025-12-06 17:15:57 +08:00
  • f067623afd [Bugfix] fix mtp and eagle aclgraph bug (#4710) Zhu Yi Lin 2025-12-06 11:22:57 +08:00
  • 74033999ed mlapo add qdown output (#4707) h1074112368 2025-12-06 11:18:53 +08:00
  • 2598124e67 [Image] Correcting the vllm tag of the openeuler image on the A2 device. (#4745) Li Wang 2025-12-06 10:55:22 +08:00
  • 8378f56f53 rm vanilla attn (#4558) zzzzwwjj 2025-12-06 10:53:55 +08:00
  • e0c5073956 [Bugfix]fix bmm_transpose ops for cann version (#4653) Wang Yixuan 2025-12-06 10:52:46 +08:00
  • a78f49ea57 [Refactor] 1/N Refactor attention_v1 & extract attention_cp (#4628) weijinqian0 2025-12-06 09:33:28 +08:00