Commit Graph

  • 1f25d60870 [Fix] Cap max tokens to prevent potential OOM (#3720) Yizhou 2025-10-25 11:23:21 +08:00
  • 63c363d3de [Refactor] [MoE] Rename moe-related classes & files (#3646) weichen 2025-10-25 11:22:03 +08:00
  • 0637e8f021 [Doc] Update supported models (#3481) zhangxinyuehfad 2025-10-25 11:13:46 +08:00
  • 8f6f967028 [Test] Add e2e test and accuracy test for Qwen3-Next-80B-A3B-Instruct (#3450) zhangxinyuehfad 2025-10-25 10:57:56 +08:00
  • 0644113c35 [BugFix] cherry-pick PR 3736 to v0.11.0-dev (#3737) whx 2025-10-25 10:35:14 +08:00
  • d5609e2c48 [BugFix] Comment out newly added vlm e2e. (#3736) whx 2025-10-25 10:34:59 +08:00
  • 9e150e5009 [Refactor] optimize _prepare_inputs method in eagle_proposer (#3296) lio 2025-10-25 09:49:42 +08:00
  • d30bb95b90 [Bugfix] Fix zero attention output in qwen3-next (#3572) QilaiZhang 2025-10-25 09:47:03 +08:00
  • 5a2c5be229 [BugFix][Cherry-pick] Cherry-pick PR 3675 to v0.11.0-dev (#3732) whx 2025-10-25 09:41:51 +08:00
  • e33751ef8b [BugFix][Core] Fix a bug running multi-modal with ascend_scheduler (#3675) whx 2025-10-25 09:41:33 +08:00
  • 1a9feb3ba5 Update version doc (#3599) wangxiyuan 2025-10-25 09:37:56 +08:00
  • 07c8d4547c [CI] Skip ops test for e2e (#3665) wangxiyuan 2025-10-25 09:37:30 +08:00
  • 6922947033 [Misc] Limit ray version (#3660) wangxiyuan 2025-10-25 09:36:44 +08:00
  • 8295136575 [UT][fix] Add missing get_ascend_config mock to NPUWorker initialization tests (#3729) Canlin Guo 2025-10-25 09:33:16 +08:00
  • 7f73c28a24 [CI][Doc] Optimize multi-node CI (#3565) Li Wang 2025-10-25 09:23:47 +08:00
  • 12bc78d252 [v0.11.0][BugFix][P/D] Modify the recalculation logic to prevent waiting requests from filling up the D node KVCache (#3686) hucong 2025-10-25 09:15:42 +08:00
  • 292cf339c3 [BugFix][P/D] Modify the recalculation logic to prevent waiting requests from filling up the D node KVCache (#3641) hucong 2025-10-25 09:14:20 +08:00
  • 39b994a987 [Feat] Add mrope fusion op (#3708) shaopeng-666 2025-10-25 09:12:18 +08:00
  • 3158742a97 [Refactor] Refactor Ascend attention implementation forward (#3714) Yizhou 2025-10-25 08:58:35 +08:00
  • 5c0a23f98b [0.11.0][Perf] Add fused matmul/reduce-scatter kernel for performance optimization. (#3725) ZYang6263 2025-10-25 08:20:43 +08:00
  • 17dd9ae42c [0.11.0][bugfix]look up multi_tp key (#3699) (#3723) fems14 2025-10-24 18:22:45 +08:00
  • f0eb3e1d97 [v0.11.0][bugfix]kvpool sync load (#3698) (#3722) fems14 2025-10-24 18:21:46 +08:00
  • 0b1da24742 [Main][Perf] Add fused matmul/reduce-scatter kernel for performance optimization. (#3693) ZYang6263 2025-10-24 18:19:58 +08:00
  • 33514a4cc2 [Bugfix] The server fails to locate the request, leading to the server hanging. (#3721) 何必问 2025-10-24 17:41:29 +08:00
  • 82a4970fe9 look up multi_tp key (#3699) fems14 2025-10-24 17:23:36 +08:00
  • c83efcb9e4 kvpool sync load (#3698) fems14 2025-10-24 17:22:53 +08:00
  • 59bb16b75c [Bugfix] The server fails to locate the request, leading to the server hanging. (#3703) 何必问 2025-10-24 17:18:03 +08:00
  • d301c56d1a [TEST]Add initial multi modal cases of Qwen2.5-VL-32B-Instruct for nightly test (#3707) wangyu 2025-10-24 17:12:06 +08:00
  • 4e21b1537e [BugFix] Check all expert maps when using muilty instance. (#3662) offline893 2025-10-24 17:10:31 +08:00
  • 9b0baa1182 [BugFix] Check all expert maps when using muilty instance. (#3576) offline893 2025-10-24 17:10:14 +08:00
  • cea0755b07 [1/N][Refactor] Refactor code to adapt with vllm main (#3612) Mengqing Cao 2025-10-24 16:55:08 +08:00
  • ec9ec78b53 [TEST]Add initial prefix cache case for nightly test (#3709) jiangyunfan1 2025-10-24 16:33:18 +08:00
  • 6be321b95e remove useless code (#3685) zzzzwwjj 2025-10-24 16:29:08 +08:00
  • cd58a643c5 [UT] Fix test_sample_recovered_tokens_pytorch_autoregressive (#3434) lio 2025-10-24 11:20:57 +08:00
  • 802c574532 [Benchmark] Upgrade benchmark args for new vllm version (#3218) Li Wang 2025-10-24 11:18:19 +08:00
  • 1b270a64bd [MoE][Multistream] Avoid performing communication in extra stream. (#3582) whx 2025-10-24 10:44:38 +08:00
  • b54d44e664 support cp&dcp (#3260) LookAround0301 2025-10-24 10:32:01 +08:00
  • b321e3846a [cherry-pick]【main】patch sched_yield (#3648) (#3687) wangxiyuan 2025-10-24 00:24:58 +08:00
  • 2bcadcb9d5 【main】patch sched_yield (#3648) fems14 2025-10-24 00:06:45 +08:00
  • d0086d432a fix deepseek torchair recompile (#3679) Wang Yixuan 2025-10-23 22:53:13 +08:00
  • a7b40b09eb [BugFix]fix deepseek torchair recompile (#3678) Wang Yixuan 2025-10-23 22:53:01 +08:00
  • d2d19a4c3c [v0.11.0][bugfix] Add 'layer_type' param to get_pergroup_param() for compatibility (#3684) Slightwind 2025-10-23 21:26:50 +08:00
  • 3366d47694 [main][bugfix] Add 'layer_type' param to get_pergroup_param() for compatibility (#3682) Slightwind 2025-10-23 21:26:33 +08:00
  • 062257f624 [Test] add a new Qwen3-32b-int8 test case with feature_stack3 (#3676) HuaJiaHeng 2025-10-23 20:43:14 +08:00
  • ebfd09a075 [Doc] Update the Pangu Pro MoE tutorials. (#3651) zhangyiming 2025-10-23 20:41:47 +08:00
  • f3ea657e93 [0.11.0][Bugfix] fix delay free prefill req & D node support prefix cache (#3609) liziyu 2025-10-23 20:39:35 +08:00
  • aeddf4261a [Bugfix] fix delay free prefill req & D node support prefix cache (#3607) liziyu 2025-10-23 20:39:14 +08:00
  • e3c1ac89e5 [Structured Output] Replace apply_grammar_bitmask() method with that in vllm to avoid maintenance (#2524) Shanshan Shen 2025-10-23 17:26:27 +08:00
  • 9434f24ded [TEST]Add initial multi modal cases for nightly test and deepseek-r1 tests (#3631) jiangyunfan1 2025-10-23 17:18:49 +08:00
  • 427b17e2da [Misc] Add a model loader that utilizes HCCL for weight loading (#2888) Rui Kang 2025-10-23 15:56:07 +08:00
  • 807686dec9 perf : optimize memory for deepseek mtp (#2713) NeverRaR 2025-10-23 15:52:17 +08:00
  • 2584f97217 [BugFix] fix deepseek torchair precision (#3624) Wang Yixuan 2025-10-23 15:41:50 +08:00
  • f06a6cad1b [Doc] Update the modelslim website from gitee to gitcode. (#3615) Crazyang 2025-10-23 15:38:16 +08:00
  • 6975d46627 [v0.11.0][Perf] Eliminating the zerolike operator through patch (#3632) ZYang6263 2025-10-23 14:49:28 +08:00
  • 74903af460 [v0.11.0][refactor] refactor SequenceRowParallelOp forward (#3654) rjg-lyh 2025-10-23 14:45:49 +08:00
  • 292e213dd2 [main][refactor] refactor SequenceRowParallelOp forward (#3616) rjg-lyh 2025-10-23 14:41:15 +08:00
  • ca104ce6f0 [Doc] Upgrade docker run command (#3645) Li Wang 2025-10-23 11:17:26 +08:00
  • 54bd531db8 [v0.11.0][Fix] Fix attention metadata handling for profiling and MLA (#3636) (#3643) Yizhou 2025-10-23 10:29:30 +08:00
  • dd7a25063c [Feat] Prefetching Attention QKV Linear Weight With AddRmsNormQuant Custom Op (#3517) Ruri 2025-10-23 10:07:37 +08:00
  • 72695c97d0 [BugFix][main] Fix quantization related mtp bug with patch (#3620) whx 2025-10-23 09:54:31 +08:00
  • 4381d296e5 [Fix] Fix attention metadata handling for profiling and MLA (#3636) Yizhou 2025-10-23 09:35:18 +08:00
  • b13d22bf5a [Fix] Fixes attribute error in MLA implementation (#3618) Yizhou 2025-10-23 09:12:50 +08:00
  • 6464c97ff9 [BugFix][v0.11.0] Fix quantization related mtp bug with patch (#3619) whx 2025-10-22 23:06:09 +08:00
  • 6e72bfdc50 [v0.11.0] cherry-pick Fix performance degradation when mtp>1 (#3597) (#3630) Zetong Li 2025-10-22 22:07:39 +08:00
  • 179b897b52 [Bugfix][MTP] Fix performance degradation when mtp>1 (#3597) Zetong Li 2025-10-22 22:04:43 +08:00
  • a989fef5de unify logic between aclgraph and torchair (#3602) zouyida2052 2025-10-22 21:55:06 +08:00
  • 55a4b5ac40 unify logic between aclgraph and torchair (#3560) zouyida2052 2025-10-22 21:52:57 +08:00
  • edccd46d74 fix deepseek torchair precision (#3635) Wang Yixuan 2025-10-22 20:20:32 +08:00
  • 984efdc0d0 [v0.11.0][Fix] Fixes attribute error in MLA implementation (#3617) Yizhou 2025-10-22 15:49:18 +08:00
  • 1ad7ffd647 clean up uesless ut test (#3622) wangxiyuan 2025-10-22 15:00:08 +08:00
  • 286ae9003d [CI] Multi-Node CI scalable (#3611) Li Wang 2025-10-22 14:18:43 +08:00
  • bc30874f8b [Feat] add native kvcache offload (#3433) kx 2025-10-22 14:15:49 +08:00
  • a0c3b8dd2d [v0.11.0]cherry-pick fix ut (#3608) (#3614) wangxiyuan 2025-10-22 14:14:15 +08:00
  • 60e2be1b36 [Feat] Dynamic Batch Feature (#3490) KyrieWang 2025-10-22 14:13:32 +08:00
  • c18ca62a17 [Misc] clean up useless function (#3348) wangxiyuan 2025-10-22 11:53:40 +08:00
  • f2dd5f8d08 fix : support chunked_prefill with deepseek_mtp (#2711) NeverRaR 2025-10-22 11:52:27 +08:00
  • 2f1b9a7a64 Reapply "[MoE] [Refactor] Remove manual memory cleanup (#3365)" (#3483) (#3512) weichen 2025-10-22 11:41:30 +08:00
  • 6ef62cb427 fix ut (#3608) wangxiyuan 2025-10-22 11:30:12 +08:00
  • 5f3b798e56 [CI]Fix test nightly workflow. (#3603) offline893 2025-10-22 10:45:42 +08:00
  • 726bc8aa2a [CI]fix test nightly workflow. (#3604) offline893 2025-10-22 10:34:03 +08:00
  • e916265b2b [CI]Add EPLB CI. (#3568) offline893 2025-10-21 22:58:02 +08:00
  • 4c9af353ee Revert "[Feat] Shared expert dp for deepseek and deepseek_mtp (#3495)" (#3586) linfeng-yuan 2025-10-21 22:24:30 +08:00
  • bd11c0054f [BugFix] Fix torchair+mtp bug after deleting deepseek_mtp. (#3590) whx 2025-10-21 22:23:52 +08:00
  • 0c83eee9b1 fix vl float model not support NZ format weight error (#3533) shaopeng-666 2025-10-21 22:23:17 +08:00
  • 6f04b467de [CI] Upgrade manylinux image (#3587) Icey 2025-10-21 22:22:45 +08:00
  • 79821106e6 [BugFix]Fix mtp torchair bug caused by #2719 (#3566) xuyexiong 2025-10-21 22:21:44 +08:00
  • 534f32d27c [BugFix][mian] Fixed a triton kernel bug of layer_norm_fwd_kernel for Qwen3-next (#3549) drslark 2025-10-21 20:20:57 +08:00
  • 13e8e75143 [Refactor] refactor patch module (#3555) wangxiyuan 2025-10-21 20:19:46 +08:00
  • 0c6349610e [Feature] Reduce host memory usage for attention mask generation (#3048) Jade Zheng 2025-10-21 20:19:04 +08:00
  • 5f8b1699ae [Feat][quantization] Support new version w4a8 dynamic quantization for Linear layers (#3311) Anion 2025-10-21 20:18:39 +08:00
  • 11f9bccf6b Mooncake store use adxl inferface (#3350) Chao Lei 2025-10-21 20:18:17 +08:00
  • ef3fabf399 [Chore] Prevents use of ASCEND_LAUNCH_BLOCKING with ACL Graph (#3574) Yizhou 2025-10-21 20:17:33 +08:00
  • 220df60c61 [Model][2/N] Remove deepseek_mtp modeling. (#3561) whx 2025-10-21 20:17:09 +08:00
  • ffb42a8daa [BugFix] Fixed the bug that caused the transposematmul operator to report an error due to the shape being too large (#3578) Zhu Yi Lin 2025-10-21 20:16:54 +08:00
  • 3164cb663c [Bugfix] mooncake connector support external dp & update readme (#3579) liziyu 2025-10-21 20:15:24 +08:00
  • 6b290acfe1 remove redundant params in mla_preprocess kernel (#3530) Chen Chen 2025-10-21 19:20:13 +08:00
  • 80b8df881f [TEST] Add Qwen3-32b-w8a8 acc/perf A2/A3 test (#3541) jiangyunfan1 2025-10-21 17:34:48 +08:00
  • ec1d2b5c04 [Test] Temporarily skip flaky ACL graph test (#3577) Yizhou 2025-10-21 17:16:15 +08:00
  • 9830f85c42 [CI] Fix test_mla_v1 (#3570) Li Wang 2025-10-21 10:31:55 +08:00
  • 4a849df6fa [main] support cpu binding (#3546) Zhu Yi Lin 2025-10-21 09:17:03 +08:00