Commit Graph

  • e17006077a fix multiproc executor determine kv cache memory & update Dockerfile v0.18.0 starkwj 2026-04-24 08:31:54 +00:00
  • e4d898b245 adapt to vllm-ascend v0.18.0rc1 starkwj 2026-04-21 03:05:32 +00:00
  • 99e1ea0fe6 [v0.18.0][Misc] Upgrade torch_npu to pre-release built version (#7918) Li Wang 2026-04-01 22:41:09 +08:00
  • d3de7333dc [BugFix][v0.18.0][cherry-pick] Fix embedding prefix caching for APC (#7894) hucong 2026-04-01 16:57:33 +08:00
  • 762850fb4e [v0.18.0][Misc] Install numactl in Docker images (#7898) Frank Chen 2026-04-01 16:22:37 +08:00
  • 2cb9195ff0 [Releases/v0.18.0][CI] Updated the parameters for the single-node test to fix the OOM issue for DeepSeek-V3.2 (#7862) Nagisa125 2026-04-01 10:28:46 +08:00
  • 59a7526339 [CI][Misc] modify ds3.2+dcp ci (#7841) weiguihua2 2026-04-01 08:58:21 +08:00
  • ef9964389f [v0.18.0][BugFix][P/D]Fix layerwise connector out of memory during large buffer transfer (#7752) zxr2333 2026-03-31 22:16:53 +08:00
  • b1cc6ef6ae [v0.18.0][BugFix] Fix bug of precision when DSA-CP is enabled on GLM5 (#7843) yydyzr 2026-03-31 21:51:10 +08:00
  • 0b48ddbc8b [Bugfix][0.18.0][KV Pool]Fix KV transfer put logic (#7718) pz1116 2026-03-31 20:21:23 +08:00
  • 14411e911e [Doc][0.18.0][KV Pool]add mooncake rdma timeout (#7784) pz1116 2026-03-31 20:17:03 +08:00
  • a63dd5868d [0.18.0][cherry-pick][BugFix]Fix compilation errors for operators dispatch_gmm_combine_decode/moe_combine_normal/moe_dispatch_normal (#7844) wangyibo1005 2026-03-31 19:58:46 +08:00
  • ed4ef1f4e7 [releases/v0.18.0][Triton][Sampler] Add penalty-related Triton kernel for better performance of penalties (#7794) linfeng-yuan 2026-03-31 19:01:51 +08:00
  • 82e26b5a6e [BugFix][v0.18.0]Adjust request map pop time (#7857) wangxiaoteng888 2026-03-31 18:55:36 +08:00
  • 66db070423 [cherry-pick][Test]repair for test_compute_slot_mapping (#7836) ZT-AIA 2026-03-31 16:52:58 +08:00
  • af4278be35 [v0.18.0][CI] Close build image by pr (#7776) zhangxinyuehfad 2026-03-31 16:38:43 +08:00
  • 7314bbe2df fix(platform): reimplement MiniMax usage accounting patch (#7835) jack 2026-03-31 16:27:00 +08:00
  • 4f259d4fd8 [Performance]Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder (#7737) Wangbei25 2026-03-31 14:49:29 +08:00
  • 2a0a588311 [0.18.0][BugFix] Disable block verify to avoid incorrect verification on NPU … (#7839) liuchenbing2026 2026-03-31 09:36:48 +08:00
  • ab928ed586 [v0.18.0][P/D][Feature]Layerwise connector supports Mamba prefill prefix caching (#7796) zxr2333 2026-03-31 09:25:22 +08:00
  • cab5d73633 [releases/v0.18.0][BugFix] Fix server init error when set max_num_seqs not a multiple of tp while FLASHCOMM is on (#7832) linfeng-yuan 2026-03-30 20:24:52 +08:00
  • deceefd305 [releases/v0.18.0][bugfix][eplb] remove unnecessary weight_scale wrap behaviour (#7732) linfeng-yuan 2026-03-30 16:16:03 +08:00
  • fdd0726ae4 [v0.18.0][Triton] Fix triton-ascend version in Dockerfile (#7766) Mengqing Cao 2026-03-30 14:43:16 +08:00
  • e776d5c0f1 [Bugfix]v0.18.0 support FlashComm1 & DCP for Qwen (#7726) Yang Yuxi 2026-03-29 15:59:19 +08:00
  • 9cc41c9457 [v0.18.0][Bugfix][EAGLE] Fix FIA pad bug under max concurrency (#7754) wangbj127 2026-03-29 12:23:44 +08:00
  • 5df2ddd8db [v0.18.0][Bugfix]Fix Error "AttributeError: 'AscendCompressedTensorsConfig' object has no attribute 'enabling_fa_quant'" (#7748) Wang Kunpeng 2026-03-28 17:03:56 +08:00
  • c1cefd26de [v0.18.0][CI] Add nightly- prefix to branch/PR image tags (#7765) zhangxinyuehfad 2026-03-28 11:31:16 +08:00
  • f83cb0e6dc [Bugfix][Platform] Fix GLM47 tool-call finish backfill (#7710) jack 2026-03-28 09:15:04 +08:00
  • 6fbd0049df [v0.18.0] Apply Eagle3 to MiniMax-M2.5 (#7619) (#7714) SparrowMu 2026-03-27 18:33:29 +08:00
  • 60e88d9541 [v0.18.0][Refactor] Use forward mapping instead of reverse mapping in AscendMo… (#7716) Feng-xiaosuo 2026-03-27 18:25:42 +08:00
  • 7cca7e6990 [v0.18.0][Misc] Recompute scheduler upgrade to vLLM 0.18.0 (#7720) Angazenn 2026-03-27 18:24:53 +08:00
  • ab619e1c53 [0.18.0][profiler] profile AICore and MTE time with torch profiler (#7730) linfeng-yuan 2026-03-27 16:37:54 +08:00
  • 2c175f5ed8 [v0.18.0][Bugfix] Fix pr triggers on branches for nightly test workflows (#7695) zhangxinyuehfad 2026-03-27 15:17:06 +08:00
  • bc8e87f3db [v0.18.0][Bugfix] fix ds3.2 dcp mtp (#7681) weiguihua2 2026-03-27 14:24:53 +08:00
  • 048c8d1afe [v0.18.0][Bugfix] Fix the bug of MTP1 crashing in multiple concurrent scenarios. (#7699) bowenli 2026-03-27 14:13:12 +08:00
  • 6ce1dc162a [v0.18.0] fix(attention): reuse weight address in graph + RL scenario (#7715) Debonet 2026-03-27 14:11:20 +08:00
  • 29308ac3a9 [v0.18.0][Bugfix] Fixed wrong class attribute assignment (#7586) (#7655) Mengqing Cao 2026-03-27 11:20:59 +08:00
  • 2c2d8bb015 [cherry-pick][CI] Enforce torchaudio and torchvision compatible with pta (#7688) Li Wang 2026-03-27 11:06:13 +08:00
  • 53cc225cac [v0.18.0][Bugfix][Platform] Fix MiniMax M2 reasoning token usage accounting (#7700) jack 2026-03-27 10:45:28 +08:00
  • a40eee2ba1 [feat] support dispatch_v2/combine_v2 hierarchy communication (#7698) zzzzwwjj 2026-03-27 09:20:16 +08:00
  • 0bab629f90 [v0.18.0][bugfix]fixed block_size incorrect setting issue in dsv3.2 (#7630) (#7652) Wang Kunpeng 2026-03-26 22:38:28 +08:00
  • d6661c09b6 [v0.18.0][kernel] Recompilation optimization triggered by triton function parameter optimization (#7647) HarpsealCC 2026-03-26 19:10:45 +08:00
  • d781902ce9 [v0.18.0][CI] Fix releases/v0.18.0 ci test only support vllm v0.18.0 (#7686) zhangxinyuehfad 2026-03-26 18:36:04 +08:00
  • 124bb00158 [CI][v0.18.0] Build nightly image for releases/v0.18.0 per pr (#7662) zhangxinyuehfad 2026-03-26 16:48:51 +08:00
  • 2db33868a4 [kernel] Recompilation optimization triggered by triton function parameter optimization (#7645) cvSoldier 2026-03-26 16:31:34 +08:00
  • dba34d4915 [v0.18.0][Triton][Qwen3.5] delete expr for kernels args (#7646) Mr.WXS 2026-03-25 23:31:27 +08:00
  • dd55736ee4 fix incompatibility between fc1 and non-sp-padding (#7643) Wangbei25 2026-03-25 23:23:37 +08:00
  • 2ad0ca52a6 Qwen3.5 MoE supports flashcomm v1 (#7644) wangbj127 2026-03-25 23:09:33 +08:00
  • ff1860bd81 [CI]fix lint (#7641) Wang Kunpeng 2026-03-25 18:48:10 +08:00
  • 05a561129e [Graph][Bugfix] Set default cudagraph max capture size via platform defaults (#7572) linfeng-yuan 2026-03-25 17:57:19 +08:00
  • d452d04656 [A5][bugfix] Fix fused MoE A5 MXFP8 scale normalization, load-balance routing and gating_topk ops (#7573) linfeng-yuan 2026-03-25 17:20:28 +08:00
  • e0e585a109 [310P]: add torch chunk gated delta rule and 910b parity ut (#7594) Shaoxu Cheng 2026-03-25 16:46:43 +08:00
  • 17da96658f [ModelLoader][Feature] Add rfork support for fast model loading (#7392) Marck 2026-03-25 16:40:30 +08:00
  • 6ddfc41312 [bugfix] Fixed the error issue when overlaying MTP and full decode on DSV3.1 C8. (#7571) pichangping 2026-03-25 14:36:26 +08:00
  • 95d33f05c2 [eagle3][pcp] fix acceptance rate for eagle3 and pcp enabled (#7549) lilinsiman 2026-03-25 11:52:04 +08:00
  • 114ec75a06 [bugfix][CI] fix '_OpNamespace' 'vllm' object has no attribute 'qkv_rmsnorm_rope' (#7620) meihanc 2026-03-25 11:05:34 +08:00
  • 8e3f8bab57 [Nightly] Nightly pre-build image (#7388) Li Wang 2026-03-25 09:24:01 +08:00
  • 8977be1df3 [Bugfix]Fix deepseek 3.2 C8 precision by rotary tensor (#7537) Yaphets24 2026-03-25 09:18:00 +08:00
  • d96440924a adapt to main2main for model runner v2 (#7578) Ronald 2026-03-25 09:08:44 +08:00
  • fc3ec100bc [Patch] Fix balance scheduling (#7611) Zhu Yi Lin 2026-03-25 08:57:06 +08:00
  • 3f4087a8f0 [310P]fused recurrent gated delta rule pytorch core and ut (#7398) Shaoxu Cheng 2026-03-25 08:53:14 +08:00
  • 54879467c4 [CI] refine issue triage rules, wan regex and update stale setting (#7531) drizzlezyk 2026-03-24 20:11:31 +08:00
  • 1e3c1e76bf [Lint]Add lint hooks for clang-format, shellcheck, forbidden imports, and boolean context manager checks (#7511) SILONG ZENG 2026-03-24 20:03:01 +08:00
  • d1a83a72f7 [doc] add enable_sparse_c8 option in configuration options (#7600) rjg-lyh 2026-03-24 19:36:34 +08:00
  • 0210cc0b07 lower log level in PD Disaggregation (#7589) zouyida2052 2026-03-24 18:03:17 +08:00
  • 0e3186f07c [model_runner_v2]:optimize the performance of the _compute_slot_mappings_kernel (#7575) lhp-deep 2026-03-24 17:29:14 +08:00
  • 5d12446573 [Feat][SP] Support SP for VL MoE models (#7044) realliujiaxu 2026-03-24 17:16:00 +08:00
  • 9615bc33fd Fix Qwen3Next CI Config (#7561) LeeWenquan 2026-03-24 17:08:17 +08:00
  • d98a0727c8 [Feat] Add npugraph_ex enablement logging (#7574) panchao-hub 2026-03-24 17:04:48 +08:00
  • bdb65319a9 [UT] Align input arguments with Ascend(Yarn)RotaryEmbedding with vLLM and add ut (#7358) Angazenn 2026-03-24 16:02:56 +08:00
  • 568b6d0601 [P/D] Check wildcard address for layerwise connector (#7389) liziyu 2026-03-24 15:50:06 +08:00
  • 73cadecfb4 [P/D] [Bugfix] fix mooncake layerconnector dead when update_decoder_info fail (#7514) liziyu 2026-03-24 15:49:46 +08:00
  • 67aad1fce8 [BugFix][P/D] fix padding error on FullGraph mode && fix layerwise connector mamba accuracy (#7506) zxr2333 2026-03-24 15:15:55 +08:00
  • 475b4b0cea Revert "GMM custom operator optimization in small batch scenarios (vllm-project#7100)" (#7557) LeeWenquan 2026-03-24 14:24:44 +08:00
  • 83bd77c983 [310p]: add rmsnorm gated fallback and unit test (#7424) Shaoxu Cheng 2026-03-24 09:00:11 +08:00
  • 1de805ce0a [Ops][Misc] Refactor and optimize CausalConv1d for Ascend (#7495) jiaojiao 2026-03-24 00:07:12 +08:00
  • e942b62d74 [features]support split qkv rmsnorm rmope for qwen3.5 (#7368) ZhuQi-seu 2026-03-23 23:58:12 +08:00
  • 8e0789bb36 [CI] Recover pd disaggregated encoder test case that been incorrectly skipped (#7505) Nengjun Ma 2026-03-23 21:41:28 +08:00
  • fcba91a392 Main2main Upgrade vllm commit to 0320 17:00 (#7510) Nengjun Ma 2026-03-23 21:37:41 +08:00
  • bdd90c0088 [model_runner_v2]optimize the performance of the post_update. (#7496) weijinqian0 2026-03-23 20:29:55 +08:00
  • 170dcbda62 [Feature] Support DeepSeek for A5 (#7232) lijiahang226 2026-03-23 20:28:26 +08:00
  • 13397e9cb7 [310p] Add a PyTorch implementation of the GDN gating operator on 310P (#7430) Shaoxu Cheng 2026-03-23 20:26:39 +08:00
  • e344a53127 [bugfix][CI]Skip e2e log summary when the log file is missing or empty (#7552) meihanc 2026-03-23 20:25:59 +08:00
  • 886756aea0 [Bugfix][CI] Fix aisbench installation to avoid Gitee authentication (#7536) zhangxinyuehfad 2026-03-23 20:16:51 +08:00
  • ffd195b0fe [Bugfix]Remove conflicting triton after vllm-ascend install on x86 (#7497) SILONG ZENG 2026-03-23 20:14:42 +08:00
  • fb283b5820 [CI] Add nightly CI test cases for the GLM-5 (#7429) liuhy1213-cell 2026-03-23 19:14:19 +08:00
  • 41dadd4312 [main][bugfix] Solved the problem of the d node getting stuck in the pd-separation scenario (#7534) drslark 2026-03-23 18:53:07 +08:00
  • a253235a59 [Doc] Add note for unsupported PCP + FULL (#7559) Zetong Li 2026-03-23 17:34:51 +08:00
  • 9976e685b7 [Bugfix][eager][oom] fix rank0 load imbalance by no padding when multi dp (#7297) Levi 2026-03-23 17:05:02 +08:00
  • 8e2c59e1ee Main2main upgrade vllm commit to 03 19 17:00 (#7478) Nengjun Ma 2026-03-23 16:25:57 +08:00
  • caa71e50ca [Perf] Simplify FIA prefill context merge path (#7293) LICO67373 2026-03-23 15:47:42 +08:00
  • da866cc168 [CI] Bump docker/build-push-action from 6 to 7 (#7541) dependabot[bot] 2026-03-23 15:46:12 +08:00
  • 71df17f4e6 bugfix(MC2): refactor the comm group of MC2 to be compatible with PP (#7291) Qiu 2026-03-23 15:44:21 +08:00
  • 8527b49764 [CI] Bump docker/setup-buildx-action from 3 to 4 (#7542) dependabot[bot] 2026-03-23 15:44:14 +08:00
  • 5b60b530d6 [Bugfix][310p] the new A5 mmencoder op does not support 310p (#7518) Shaoxu Cheng 2026-03-23 15:40:34 +08:00
  • 9e2878065a [Spec-Decode] Fix spec decode proposer in 0.18.0 (#7544) Mengqing Cao 2026-03-23 15:39:24 +08:00
  • 6b7d9b76f1 [MM][Perf] Pre-compute seq_lens and put it on CPU before ViT vision blocks for better performance (#7104) Shanshan Shen 2026-03-23 15:24:26 +08:00
  • 5c0d02f689 [Bugfix] Fix multi-instance serving OOM on single card (#7427) Shanshan Shen 2026-03-23 14:22:59 +08:00
  • 44ef9a36ac [fix]: fix precision issue in dispatch_ffn_combine_bf16 and remove redundant sync (#7198) guanguan0308 2026-03-23 10:14:03 +08:00
  • e68464a1d6 [Bugfix] Fix slow hasattr in ACLGraphWrapper.__getattr__ (#7442) Canlin Guo 2026-03-23 09:26:24 +08:00