Commit Graph

  • 40c7db6559 [MM][Bugfix] Add MoE verification for multi-modal models (#3897) Shanshan Shen 2025-11-04 09:16:19 +08:00
  • 0cead5c1ee Quality enhancement: Immediately interrupt execution when allocate NPU memory OOM (#3944) leo-pony 2025-11-04 08:55:22 +08:00
  • 892f1ee30f Quality enhancement: Immediately interrupt execution when memory OOM (#3932) leo-pony 2025-11-04 08:55:09 +08:00
  • 5453033a41 revert TND modify when dcp pcp (#3948) weiguihua2 2025-11-03 22:22:17 +08:00
  • cc2cd42ad3 Upgrade CANN to 8.3.rc1 (#3945) wangxiyuan 2025-11-03 20:21:07 +08:00
  • 49d74785c4 [Test] Add new e2e test use deepseek-v2-lite in ge graph mode (#3937) CodeCat 2025-11-03 20:10:01 +08:00
  • 8f222f21f1 [CI][Nightly] Fix mooncake build (#3958) Li Wang 2025-11-03 20:07:47 +08:00
  • 7cc6208029 [0.11.0][MTP][Aclgraph] Fix the support aclgraph with MTP (#3912) Mengqing Cao 2025-11-03 14:25:37 +08:00
  • ec98320285 correct bug to fix the value of max_num_tokens (#3933) zouyida2052 2025-11-03 14:17:51 +08:00
  • 0b9b6d79fe [Feat][UT] Support Deepseekv32 FULL_DECODE_ONLY mode and add unit test of sfa_v1 (#3763) 1Fire4 2025-11-03 10:02:47 +08:00
  • d4c75088a0 [Perf] Move attention update stream out of loop to optimize performance (#3848) XiaoxinWang 2025-11-03 09:19:57 +08:00
  • d0cc9c1203 [CI][Nightly] Correct the commit hash available for mooncake (#3943) Li Wang 2025-11-01 21:52:16 +08:00
  • 8a7154001e [0.11.0]Chery pick pta upgrade change (#3940) wangxiyuan 2025-10-31 22:14:26 +08:00
  • fcc9a0eaeb Update torch-npu version to 2.7.1 (#3896) wangxiyuan 2025-10-31 17:16:31 +08:00
  • 5f6d1b3323 [Doc] Update doc for release notese (#3853) zhangxinyuehfad 2025-10-31 16:46:17 +08:00
  • 3d81ea03ed [v0.11.0-dev][bugfix] fix valueError in static_forward_context when prefix is empty (#3929) rjg-lyh 2025-10-31 15:45:06 +08:00
  • 0f70698d6d [feature] support pcp + mtp (with pd disaggregate) (#3822) zhangsicheng5 2025-10-31 15:43:22 +08:00
  • f99762eb25 [E2E][MM] Add e2e tests for InternVL model (#3796) Canlin Guo 2025-10-31 15:42:47 +08:00
  • c1a6aeab46 [main][bugfix] fix valueError in static_forward_context when prefix is empty (#3924) rjg-lyh 2025-10-31 14:55:58 +08:00
  • 9f7de45b75 [Bugfix] fix MTP support for lmhead_tensor_parallel_size (#3921) Nagisa125 2025-10-31 14:34:28 +08:00
  • ee2e55e602 [v0.11.0][Test] Add new test model for aclgraph single_request v0.11.0 (#3889) lilinsiman 2025-10-31 11:23:55 +08:00
  • 1f486b2dd1 [Test] Add new test model for aclgraph single_request (#3888) lilinsiman 2025-10-31 11:23:13 +08:00
  • 6764777f00 [Bugfix] Fix MTP support for lmhead_tensor_parallel_size (#3915) Nagisa125 2025-10-31 10:30:28 +08:00
  • 90aca84e60 fix bug when max_seqs=14 in mtp=2 scenario and raise error when cudagraph_capture_sizes can't be an integer multiple of uniform_decode_query_len (#3909) zouyida2052 2025-10-31 09:25:06 +08:00
  • 1966885be2 mfix bug when max_seqs=14 in mtp=2 scenario and raise error when cudagraph_capture_sizes can't be an integer multiple of uniform_decode_query_lentp (#3910) zouyida2052 2025-10-31 09:24:50 +08:00
  • 387ce1cc5b add new e2e tests case for aclgraph memory to v0.11.0 (#3880) lilinsiman 2025-10-31 09:17:09 +08:00
  • 35a913cf1e add new e2e tests case for aclgraph memory (#3879) lilinsiman 2025-10-31 09:16:52 +08:00
  • 38afd2c9cb [bugfix_v0.11.0]cancel tokenize for layerwise_proxy (#3913) wangxiaoteng888 2025-10-30 23:55:04 +08:00
  • a2b325ee00 [bugfix]cancel tokenize for layerwise_proxy (#3914) wangxiaoteng888 2025-10-30 23:54:46 +08:00
  • eb0a2ee2d0 [CI] Optimize nightly CI (#3898) Li Wang 2025-10-30 23:42:20 +08:00
  • af7a56550b [bugfix_v0.11.0-dev] layerwise D first plan (#3907) wangxiaoteng888 2025-10-30 22:21:11 +08:00
  • 2c291bc63f [bugfix] layerwise D first plan (#3866) wangxiaoteng888 2025-10-30 22:20:34 +08:00
  • d5a9aba03f [BugFix]Fix group list type of mc2. (#3890) offline893 2025-10-30 21:44:14 +08:00
  • 627f20ce26 [BugFix]Fix group list type of mc2. (#3864) offline893 2025-10-30 21:39:01 +08:00
  • 655a229455 [TEST]Add MALPO for aclgraph in nightly test (#3894) jiangyunfan1 2025-10-30 18:25:54 +08:00
  • 216fc0e8e4 [feature] Prompt Embeddings Support for v1 Engine (#3026) Song Zhixin 2025-10-30 17:15:57 +08:00
  • f6149f3894 [Model][3/N] Refactor sfa into mla and remove deepseek_v3_2.py (#3769) whx 2025-10-30 17:06:38 +08:00
  • eff3e5fc6f [FEAT] Refactor spec decode to support efficient padded speculation (#3528) xuyexiong 2025-10-30 16:53:05 +08:00
  • 10772d94e3 [Build] Force torch version (#3791) wangxiyuan 2025-10-30 15:53:15 +08:00
  • ff47524b88 [Doc] Remove modeling doc (#3789) wangxiyuan 2025-10-30 15:53:02 +08:00
  • 67dd3a4581 [UT] fix skip ut test for test_utils (#3803) Meihan-chen 2025-10-30 15:52:53 +08:00
  • c506ba60fb [v0.11.0] [Bugfix] [MoE]fix error in deepseek when using allgather (#3827) weichen 2025-10-30 14:59:46 +08:00
  • eed1957f03 Add FAQ for docker pull error on Kylin OS (#3870) Liwx 2025-10-30 14:10:52 +08:00
  • 14ca1e5cb2 [CI]Fix oom of deepseek-eplb nigtly test. (#3884) offline893 2025-10-30 10:18:07 +08:00
  • 211d4b9da4 [BugFix] Fix mlapo accuracy problem related with weight processing. (#3857) whx 2025-10-30 00:35:50 +08:00
  • dc960e798e [BugFix] Fix mlapo accuracy problem related with weight processing. (#3850) whx 2025-10-30 00:34:55 +08:00
  • d9249c968e bugfix for mtp in fullgraph (#3878) zouyida2052 2025-10-29 23:52:20 +08:00
  • adadd50613 bugfix for mtp fullgraph (#3845) zouyida2052 2025-10-29 23:50:13 +08:00
  • 19f49ecb5f [0.11.0][Bugfix]fix_mulit_connector_bug (#3332) (#3882) fems14 2025-10-29 23:44:52 +08:00
  • d6ef3df3b3 [Bugfix]fix_mulit_connector_bug (#3332) baxingpiaochong 2025-10-29 23:23:06 +08:00
  • e5b938c5fe [v0.11.0] [P/D] force with_prefill true after allreduce in kv producer (#3835) liziyu 2025-10-29 23:14:00 +08:00
  • 07873d9396 fix mooncake layerwise connector (#3849) liziyu 2025-10-29 23:10:51 +08:00
  • 5f176ca992 [CI]Fix eplb nightly tests. (#3863) offline893 2025-10-29 23:06:05 +08:00
  • b323be9fe4 deepseek torchair adapt for torch_npu version (#3876) Wang Yixuan 2025-10-29 22:44:44 +08:00
  • 870a3f21cb [BugFix] deepseek torchair adapt for torch_npu version (#3862) Wang Yixuan 2025-10-29 22:39:34 +08:00
  • 4a2ab13743 [CI] Optimize nightly CI (#3858) Li Wang 2025-10-29 22:30:19 +08:00
  • cba69e117e [CI]pin vllm commit id (#3861) Meihan-chen 2025-10-29 17:43:58 +08:00
  • 74191864b7 [Perf] Delete redundant operations in model_runner and forward_context (#3677) realliujiaxu 2025-10-29 15:59:55 +08:00
  • 29bd9235ed [v0.11.0][Perf] Delete redundant operations in model_runner and forward_context (#3775) realliujiaxu 2025-10-29 15:58:53 +08:00
  • 0d1859af08 [Bugfix] [MoE] fix error in deepseek when using allgather (#3824) weichen 2025-10-29 14:51:39 +08:00
  • 900086fdc6 [HybridKV][Bugfix] Fix Hybrid kvcache sharing bug in same attention type (#3760) Mengqing Cao 2025-10-29 14:18:52 +08:00
  • 75de3fa172 [v0.11.0][Doc] Update doc (#3852) zhangxinyuehfad 2025-10-29 11:32:12 +08:00
  • 789ba4c5c2 [Doc] Update doc (#3836) zhangxinyuehfad 2025-10-29 11:03:39 +08:00
  • 1e31b07fa7 fix qwen3next full graph break. (#3812) XiaoxinWang 2025-10-29 10:30:23 +08:00
  • c76db627ab [P/D] force with_prefill true after allreduce in kv producer (#3768) liziyu 2025-10-29 10:15:38 +08:00
  • f57bdb09fc [long_seq_optim] BSND to TND and FA_UPDATE replacement (#3778) pichangping 2025-10-29 09:33:35 +08:00
  • e56b0017a3 [TEST]Add aisbench log and A2 cases (#3841) jiangyunfan1 2025-10-28 23:33:15 +08:00
  • 6188450269 [v0.11.0][Bugfix]Avoid using the fusion operator in the MOE model (#3837) ZYang6263 2025-10-28 23:31:19 +08:00
  • d08401d1e7 [Main][Bugfix]Avoid using the fusion operator in the MOE model (#3834) ZYang6263 2025-10-28 23:30:27 +08:00
  • 90ae114569 [CI] Fix nightly CI (#3821) Li Wang 2025-10-28 20:40:03 +08:00
  • a7450db1bd Upgrade to 0.11.1 newest vllm commit (#3762) Icey 2025-10-28 14:55:03 +08:00
  • f846bd20e4 [CI] Add multi-node test case for a2 (#3805) Li Wang 2025-10-27 23:10:17 +08:00
  • 9030106a14 [TEST]Add 2P1D multi node cases for nightly test (#3764) jiangyunfan1 2025-10-27 23:09:15 +08:00
  • d64bdd06ae 【Bugfix】bugfix for weight load of kimi-k2 (#3798) Levi 2025-10-27 21:18:35 +08:00
  • da5f2cc1e3 [Doc] Update FAQ (#3792) wangxiyuan 2025-10-27 20:32:17 +08:00
  • 00aa0bf33e support prefill cache mode use fia op (#3696) shiyuan680 2025-10-27 19:41:07 +08:00
  • 3e5ae49160 [MM][Doc] Update online serving tutorials for Qwen2-Audio (#3606) Shanshan Shen 2025-10-27 16:58:03 +08:00
  • e48ca0b6ec [bugfix][0.11]fix proxy decode bug (#3751) Shirley125 2025-10-27 16:56:50 +08:00
  • d8ca7fee75 [bugfix][main]fix proxy decode bug (#3750) Shirley125 2025-10-27 16:56:09 +08:00
  • 43276fd822 [v0.11.0][Fix] Prevent memory leak in MLA decode graph (#3743) (#3774) Yizhou 2025-10-27 16:00:20 +08:00
  • b8796b06c8 [Doc][Example][Bugfix] Elements in local_device_ids should be casted … (#3782) yupeng 2025-10-27 14:52:47 +08:00
  • 638d8d1a47 Bump actions/upload-artifact from 4 to 5 (#3786) dependabot[bot] 2025-10-27 14:11:53 +08:00
  • 79623e0bab Bump actions/download-artifact from 5 to 6 (#3787) dependabot[bot] 2025-10-27 14:10:56 +08:00
  • e9072429fb [CI] Enable 2 jobs for nightly test (#3781) jiangyunfan1 2025-10-27 14:08:29 +08:00
  • 60ee4af6d0 [CI] Add custom op to nightly (#3765) Li Wang 2025-10-27 14:07:03 +08:00
  • 4312a92a4f [feat]dcp pcp support aclgraph (#3731) weiguihua2 2025-10-27 09:58:23 +08:00
  • 825fdfb197 [v0.11.0][Feat] Prefetching Attention QKV Linear Weight With AddRmsNormQuant Custom Op (#3649) Ruri 2025-10-27 09:42:09 +08:00
  • 8ab8111fde [Fix] Prevent memory leak in MLA decode graph (#3743) Yizhou 2025-10-25 20:37:33 +08:00
  • 1b16c01afd [v0.11.0-dev][Installation] limit opencv-python-headless version to resolve numpy version conflict (#3767) Mengqing Cao 2025-10-25 18:18:28 +08:00
  • afc58184ec [Installation] limit opencv-python-headless version to resolve numpy version conflict (#3713) 22dimensions 2025-10-25 18:07:54 +08:00
  • bb5f16d926 [BugFix] Fix Qwen3-next break (#3428) Icey 2025-10-25 18:03:36 +08:00
  • 7572939b94 add qwq testcase (#3757) ck-hw-1018 2025-10-25 17:11:35 +08:00
  • e5676fc36e [main] remove dbo code (#3712) zzzzwwjj 2025-10-25 15:53:01 +08:00
  • a58ff9e92f [Cherry-pick] Port MoE multi-stream fix to v0.11.0-dev (#3753) whx 2025-10-25 15:51:43 +08:00
  • 1bc61031e5 [v0.11.0][Fix] Cap max tokens to prevent potential OOM (#3720) (#3744) Yizhou 2025-10-25 15:46:56 +08:00
  • d9cdc65854 Upgrade to new vllm commit (#3719) Icey 2025-10-25 15:36:32 +08:00
  • 99e154dc84 [0.11.0] cherry-pick from #3747 (#3746) fems14 2025-10-25 14:21:30 +08:00
  • 226f832c0b [bugfixfix] correct _register function place for mooncacke (#3747) fems14 2025-10-25 14:20:09 +08:00
  • 11f75883be [Test] add test for prefix cache feature of deepseek (#3733) HuaJiaHeng 2025-10-25 14:08:15 +08:00
  • fed8145aea [cherry-pick][Feat] Add mrope fusion op#3708 (#3735) shaopeng-666 2025-10-25 11:41:23 +08:00