Commit Graph

  • 19f49ecb5f [0.11.0][Bugfix]fix_mulit_connector_bug (#3332) (#3882) fems14 2025-10-29 23:44:52 +08:00
  • e5b938c5fe [v0.11.0] [P/D] force with_prefill true after allreduce in kv producer (#3835) liziyu 2025-10-29 23:14:00 +08:00
  • b323be9fe4 deepseek torchair adapt for torch_npu version (#3876) Wang Yixuan 2025-10-29 22:44:44 +08:00
  • 29bd9235ed [v0.11.0][Perf] Delete redundant operations in model_runner and forward_context (#3775) realliujiaxu 2025-10-29 15:58:53 +08:00
  • 75de3fa172 [v0.11.0][Doc] Update doc (#3852) zhangxinyuehfad 2025-10-29 11:32:12 +08:00
  • 6188450269 [v0.11.0][Bugfix]Avoid using the fusion operator in the MOE model (#3837) ZYang6263 2025-10-28 23:31:19 +08:00
  • e48ca0b6ec [bugfix][0.11]fix proxy decode bug (#3751) Shirley125 2025-10-27 16:56:50 +08:00
  • 43276fd822 [v0.11.0][Fix] Prevent memory leak in MLA decode graph (#3743) (#3774) Yizhou 2025-10-27 16:00:20 +08:00
  • 825fdfb197 [v0.11.0][Feat] Prefetching Attention QKV Linear Weight With AddRmsNormQuant Custom Op (#3649) Ruri 2025-10-27 09:42:09 +08:00
  • 1b16c01afd [v0.11.0-dev][Installation] limit opencv-python-headless version to resolve numpy version conflict (#3767) Mengqing Cao 2025-10-25 18:18:28 +08:00
  • a58ff9e92f [Cherry-pick] Port MoE multi-stream fix to v0.11.0-dev (#3753) whx 2025-10-25 15:51:43 +08:00
  • 1bc61031e5 [v0.11.0][Fix] Cap max tokens to prevent potential OOM (#3720) (#3744) Yizhou 2025-10-25 15:46:56 +08:00
  • 99e154dc84 [0.11.0] cherry-pick from #3747 (#3746) fems14 2025-10-25 14:21:30 +08:00
  • fed8145aea [cherry-pick][Feat] Add mrope fusion op#3708 (#3735) shaopeng-666 2025-10-25 11:41:23 +08:00
  • 0644113c35 [BugFix] cherry-pick PR 3736 to v0.11.0-dev (#3737) whx 2025-10-25 10:35:14 +08:00
  • 5a2c5be229 [BugFix][Cherry-pick] Cherry-pick PR 3675 to v0.11.0-dev (#3732) whx 2025-10-25 09:41:51 +08:00
  • 12bc78d252 [v0.11.0][BugFix][P/D] Modify the recalculation logic to prevent waiting requests from filling up the D node KVCache (#3686) hucong 2025-10-25 09:15:42 +08:00
  • 5c0a23f98b [0.11.0][Perf] Add fused matmul/reduce-scatter kernel for performance optimization. (#3725) ZYang6263 2025-10-25 08:20:43 +08:00
  • 17dd9ae42c [0.11.0][bugfix]look up multi_tp key (#3699) (#3723) fems14 2025-10-24 18:22:45 +08:00
  • f0eb3e1d97 [v0.11.0][bugfix]kvpool sync load (#3698) (#3722) fems14 2025-10-24 18:21:46 +08:00
  • 33514a4cc2 [Bugfix] The server fails to locate the request, leading to the server hanging. (#3721) 何必问 2025-10-24 17:41:29 +08:00
  • 4e21b1537e [BugFix] Check all expert maps when using muilty instance. (#3662) offline893 2025-10-24 17:10:31 +08:00
  • b321e3846a [cherry-pick]【main】patch sched_yield (#3648) (#3687) wangxiyuan 2025-10-24 00:24:58 +08:00
  • d0086d432a fix deepseek torchair recompile (#3679) Wang Yixuan 2025-10-23 22:53:13 +08:00
  • d2d19a4c3c [v0.11.0][bugfix] Add 'layer_type' param to get_pergroup_param() for compatibility (#3684) Slightwind 2025-10-23 21:26:50 +08:00
  • f3ea657e93 [0.11.0][Bugfix] fix delay free prefill req & D node support prefix cache (#3609) liziyu 2025-10-23 20:39:35 +08:00
  • 6975d46627 [v0.11.0][Perf] Eliminating the zerolike operator through patch (#3632) ZYang6263 2025-10-23 14:49:28 +08:00
  • 74903af460 [v0.11.0][refactor] refactor SequenceRowParallelOp forward (#3654) rjg-lyh 2025-10-23 14:45:49 +08:00
  • 54bd531db8 [v0.11.0][Fix] Fix attention metadata handling for profiling and MLA (#3636) (#3643) Yizhou 2025-10-23 10:29:30 +08:00
  • 6464c97ff9 [BugFix][v0.11.0] Fix quantization related mtp bug with patch (#3619) whx 2025-10-22 23:06:09 +08:00
  • 6e72bfdc50 [v0.11.0] cherry-pick Fix performance degradation when mtp>1 (#3597) (#3630) Zetong Li 2025-10-22 22:07:39 +08:00
  • a989fef5de unify logic between aclgraph and torchair (#3602) zouyida2052 2025-10-22 21:55:06 +08:00
  • edccd46d74 fix deepseek torchair precision (#3635) Wang Yixuan 2025-10-22 20:20:32 +08:00
  • 984efdc0d0 [v0.11.0][Fix] Fixes attribute error in MLA implementation (#3617) Yizhou 2025-10-22 15:49:18 +08:00
  • a0c3b8dd2d [v0.11.0]cherry-pick fix ut (#3608) (#3614) wangxiyuan 2025-10-22 14:14:15 +08:00
  • 726bc8aa2a [CI]fix test nightly workflow. (#3604) offline893 2025-10-22 10:34:03 +08:00
  • e916265b2b [CI]Add EPLB CI. (#3568) offline893 2025-10-21 22:58:02 +08:00
  • 4c9af353ee Revert "[Feat] Shared expert dp for deepseek and deepseek_mtp (#3495)" (#3586) linfeng-yuan 2025-10-21 22:24:30 +08:00
  • bd11c0054f [BugFix] Fix torchair+mtp bug after deleting deepseek_mtp. (#3590) whx 2025-10-21 22:23:52 +08:00
  • 0c83eee9b1 fix vl float model not support NZ format weight error (#3533) shaopeng-666 2025-10-21 22:23:17 +08:00
  • 6f04b467de [CI] Upgrade manylinux image (#3587) Icey 2025-10-21 22:22:45 +08:00
  • 79821106e6 [BugFix]Fix mtp torchair bug caused by #2719 (#3566) xuyexiong 2025-10-21 22:21:44 +08:00
  • 534f32d27c [BugFix][mian] Fixed a triton kernel bug of layer_norm_fwd_kernel for Qwen3-next (#3549) drslark 2025-10-21 20:20:57 +08:00
  • 13e8e75143 [Refactor] refactor patch module (#3555) wangxiyuan 2025-10-21 20:19:46 +08:00
  • 0c6349610e [Feature] Reduce host memory usage for attention mask generation (#3048) Jade Zheng 2025-10-21 20:19:04 +08:00
  • 5f8b1699ae [Feat][quantization] Support new version w4a8 dynamic quantization for Linear layers (#3311) Anion 2025-10-21 20:18:39 +08:00
  • 11f9bccf6b Mooncake store use adxl inferface (#3350) Chao Lei 2025-10-21 20:18:17 +08:00
  • ef3fabf399 [Chore] Prevents use of ASCEND_LAUNCH_BLOCKING with ACL Graph (#3574) Yizhou 2025-10-21 20:17:33 +08:00
  • 220df60c61 [Model][2/N] Remove deepseek_mtp modeling. (#3561) whx 2025-10-21 20:17:09 +08:00
  • ffb42a8daa [BugFix] Fixed the bug that caused the transposematmul operator to report an error due to the shape being too large (#3578) Zhu Yi Lin 2025-10-21 20:16:54 +08:00
  • 3164cb663c [Bugfix] mooncake connector support external dp & update readme (#3579) liziyu 2025-10-21 20:15:24 +08:00
  • 6b290acfe1 remove redundant params in mla_preprocess kernel (#3530) Chen Chen 2025-10-21 19:20:13 +08:00
  • 80b8df881f [TEST] Add Qwen3-32b-w8a8 acc/perf A2/A3 test (#3541) jiangyunfan1 2025-10-21 17:34:48 +08:00
  • ec1d2b5c04 [Test] Temporarily skip flaky ACL graph test (#3577) Yizhou 2025-10-21 17:16:15 +08:00
  • 9830f85c42 [CI] Fix test_mla_v1 (#3570) Li Wang 2025-10-21 10:31:55 +08:00
  • 4a849df6fa [main] support cpu binding (#3546) Zhu Yi Lin 2025-10-21 09:17:03 +08:00
  • 274b708e0c [Fix] Refactor dummy attention metadata creation (#3497) Yizhou 2025-10-21 00:00:42 +08:00
  • 6b6857929d [Doc] Add --shm-size option to Docker command for qwen3 vl 235B (#3519) likeful 2025-10-20 23:37:35 +08:00
  • 0bf3f21a98 Revert "Add mrope op fusion (#3509)" (#3562) wangxiyuan 2025-10-20 20:19:24 +08:00
  • 068ed706c8 [feat][torchair] support super kernel feat for quantized dsr1 (#3485) linfeng-yuan 2025-10-20 20:04:37 +08:00
  • 70bef33f13 add new accuracy test case for aclgraph (#3390) lilinsiman 2025-10-20 20:04:04 +08:00
  • b9e2896eb1 Revert "[Perf] Add FIA interface in FA case" (#3553) ZYang6263 2025-10-20 19:56:10 +08:00
  • 34c2996ab8 [main] v_proj combining transpose and matmul (#3545) Zhu Yi Lin 2025-10-20 19:53:32 +08:00
  • e04a5e3dd3 [Bugfix] Fix race condition in d2h transfer (#3372) Jade Zheng 2025-10-20 18:24:21 +08:00
  • fdac146f71 [UT] fix skip ut test and enable ut test run normally (#3410) zhangxinyuehfad 2025-10-20 16:30:57 +08:00
  • f8b52fe950 [Model][1/N] Delete deepseek v2/v3 modeling codes. (#3189) whx 2025-10-20 15:31:34 +08:00
  • 918ded9155 [BugFix][HybridKV] Update the check logic of reinitializing inputbatch (#3540) Mengqing Cao 2025-10-20 15:29:48 +08:00
  • daa4dd0a57 [DeepSeek] Seperate deepseek v3.2 modeling form deepseek v2 (#3531) Mengqing Cao 2025-10-20 09:50:44 +08:00
  • 6c65dd891f [ModelRunner][Qwen3-Next] Fix attn_group initialization timing (#3477) Mengqing Cao 2025-10-20 09:39:40 +08:00
  • 9e59fc1510 [TEST] Add initial aisbench support and Qwen3 32B acc/perf test (#3474) jiangyunfan1 2025-10-20 09:33:17 +08:00
  • 58a37ce189 bugfix for mooncake (#3535) zouyida2052 2025-10-19 17:06:05 +08:00
  • 1e78ecbad6 [Perf] Add FIA interface in FA case (#3321) ZYang6263 2025-10-19 12:45:33 +08:00
  • 4b3bd4f397 [main][bugfix] bugfix for minicpm models (#3527) Wang Kunpeng 2025-10-19 11:00:55 +08:00
  • 6c9909c861 [Patch]patch of v1 executor when enable eplb. (#3511) offline893 2025-10-19 10:54:26 +08:00
  • 646c1db5d7 Add mrope op fusion (#3509) shaopeng-666 2025-10-18 18:08:24 +08:00
  • 0777e2f899 Optimize torchair kv_consumer padding logic (#3526) xuyexiong 2025-10-18 16:42:17 +08:00
  • b4233a2ec3 [Bugfix] Route requests requiring KVC recomputation from the decode instance to the P instance (#3448) Shirley125 2025-10-18 15:56:44 +08:00
  • 4750d45d86 [BugFix]Support redundant experts in EPLB (#3473) yechao237 2025-10-18 00:09:16 +08:00
  • 07ca1b9b78 [Refactor] Clean up w4a4_flatquant_dynamic implementation (#3440) Slightwind 2025-10-17 23:53:19 +08:00
  • 21769e8f44 [BUGFIX] Mtp torchair pd fix (#3506) xuyexiong 2025-10-17 21:57:05 +08:00
  • 9547d6f0d9 [Core]Append padding logic for Attention (#3256) Angazenn 2025-10-17 21:56:01 +08:00
  • b154a8e22c [Bugfix] fix logging and d2h bug for flash comm1 (#3505) realliujiaxu 2025-10-17 21:13:41 +08:00
  • 248ee7fa11 [Feat]Make full graph mode compalible with MTP (#3276) anon189Ty 2025-10-17 20:19:56 +08:00
  • 46e62efd44 [Feat]mtp aclgraph support (#3244) anon189Ty 2025-10-17 18:14:49 +08:00
  • 1b424fb7f1 ACLgraph enable: Test cases revisions for all features (#3388) lilinsiman 2025-10-17 17:15:19 +08:00
  • bf87606932 [Feat] Shared expert dp for deepseek and deepseek_mtp (#3495) zhaozx-cn 2025-10-17 15:06:37 +08:00
  • d9ee491f70 [BugFix]Move to_list in foward_v1 with FIA earlier to build (#3185) Angazenn 2025-10-17 11:19:41 +08:00
  • 30e3d86b0f Revert "[BUGFIX] Mtp torchair pd fix (#3449)" (#3500) xuyexiong 2025-10-17 09:42:48 +08:00
  • 3a53bbc508 [Feat]Qwen3 Moe supports npu_add_rms_norm_quant op by default, update op with bias, resolve conflict with weight prefetch (#3465) huangdong2022 2025-10-17 09:30:51 +08:00
  • 4c4a8458a5 [CI] Refator multi-node CI (#3487) Li Wang 2025-10-17 09:04:31 +08:00
  • ccb6fb9ec1 [Fix] Clears unused slot mappings and fix accuracy issue with MLA models when enabling FULL_DECODE_ONLY (#3482) Yizhou 2025-10-16 19:43:09 +08:00
  • f9535cc9e2 [BugFix] fix qwenVL quant assertion error (#3466) elilzhu 2025-10-16 17:08:00 +08:00
  • 9ff6b0b862 [CI]: Fix doctest ci for main release (#3451) menogrey 2025-10-16 14:38:11 +08:00
  • b0ae203e72 [BUGFIX] Mtp torchair pd fix (#3449) xuyexiong 2025-10-16 09:03:49 +08:00
  • 291c00a224 [Doc] pin version that can stable running 310I Duo to vllm-ascend v0.10.0rc1 (#3455) leo-pony 2025-10-16 08:54:09 +08:00
  • ff91904ee2 [Doc] Clearer corresponding relationship between configurations for multi-node guides (#3441) leo-pony 2025-10-16 08:54:03 +08:00
  • aa6154703a [BugFix]GPQA Accuracy Issue Bugfix (#3476) DreamerLeader 2025-10-15 23:28:17 +08:00
  • cec1fab509 Revert "[MoE] [Refactor] Remove manual memory cleanup (#3365)" (#3483) weichen 2025-10-15 22:25:46 +08:00
  • f69a83b7ba [Feat] Flash comm allgher ep (#3334) realliujiaxu 2025-10-15 19:36:32 +08:00
  • 8abe517870 [Refactor] Adapt deepseek-v3.2 to vllm 0.11.0 (#3432) Mengqing Cao 2025-10-15 17:48:58 +08:00