Commit Graph

  • 75fae619d5 [Misc] Refactor aclgraph accuracy test to use logprob-based comparison (#7455) Li Wang 2026-03-23 09:08:21 +08:00
  • 9bf9b4b267 [Feature] Optimize Qwen3.5/Qwen3Next GDN prefill by prebuilding chunk metadata (#7487) Qi Mao 2026-03-22 23:09:23 +08:00
  • b2e71b7930 [Bugfix] Fix get_rope_shape for Kimi-K2.5 (#7521) LoganJane 2026-03-22 21:06:31 +08:00
  • 9e2965bae2 [Feature] Support Flash Comm V1 for VL models (with MLA) (#7390) Cao Yi 2026-03-22 21:05:28 +08:00
  • 9d0b7c8e98 [Platform][BugFix] Preserve hybrid block size on Ascend (#7528) Qi Mao 2026-03-22 11:21:49 +08:00
  • cbf46fad3c fixed graph mode bug. (#7460) XiaoxinWang 2026-03-22 10:09:37 +08:00
  • 84a74f0cb1 [Bugfix] Fix padding logic in eagle proposer for kimi25 (#7348) Zetong Li 2026-03-21 16:57:22 +08:00
  • f482c314cf Upgrade vllm v0.18.0 in dockerfile (#7523) zhangxinyuehfad 2026-03-21 16:19:41 +08:00
  • bff4fbfca5 upgrade to 0.18.0 (#7502) meihanc 2026-03-21 16:05:38 +08:00
  • 80a4265717 [Feat] Support separate attention backend for target and draft model. (#7342) HongtaoYang 2026-03-21 10:48:01 +08:00
  • 88d03a783f [refactor] replace scattered business kwargs with typed request objects and explicit stage boundaries (#7024) linfeng-yuan 2026-03-20 23:23:57 +08:00
  • c860535246 [A5][Qwen VL] Qwen VL adapt for A5 (#7046) yesyue-w 2026-03-20 16:56:12 +08:00
  • f39f566e22 Refactor duplicated code into a common method to reduce redundancy (#7210) idouba 2026-03-20 16:49:02 +08:00
  • 6ad74e8c80 [CI] Add git safe repo (#7501) Li Wang 2026-03-20 16:40:24 +08:00
  • a16c99141b Adapt w8a8mxfp8 quantization for Qwen VL models (#7417) Siyuan Kong 2026-03-20 16:18:58 +08:00
  • 4e6dbe0956 [EPLB][Bugfix] Set parallel_config.enable_eplb to true to load redundant experts (#7470) LI SHENGYONG 2026-03-20 15:22:55 +08:00
  • 1e05c4908f [EPLB] Reduce the memory used for batch_isend_irecv (#7344) LI SHENGYONG 2026-03-20 12:25:58 +08:00
  • a1f321a556 [Doc]Refresh model tutorial examples and serving commands (#7426) SILONG ZENG 2026-03-20 11:34:18 +08:00
  • 7be66cec75 [Test] Add the always_check_nodes parameter to the _wait_for_multiple_servers function in conftest.py for the EPD test case. (#7410) wangyu 2026-03-20 11:33:48 +08:00
  • eb92e7d50e [Bugfix] Restore balance scheduling patch for v0.17.0 (#7479) SILONG ZENG 2026-03-19 20:12:57 +08:00
  • 95e1dc11d8 [CI]: Auto-update estimated test times in config.yaml (#7413) vllm-ascend-ci 2026-03-19 19:01:16 +08:00
  • 9d1452c74d [OPS]add split_qkv_tp_rmsnorm_rope ops (#7376) ichaoren 2026-03-19 17:19:18 +08:00
  • ee804ce23e Main2main upgrade vllm to 0318 commit (#7412) Nengjun Ma 2026-03-19 17:17:36 +08:00
  • 05afc7f8c3 [CI]repair for ci custom ops (#7461) ZT-AIA 2026-03-19 17:13:12 +08:00
  • 83a4065b4b [CI] Add pre-commit check for patch logger (#7446) Li Wang 2026-03-19 16:53:20 +08:00
  • 38e637eef5 Fix manual mapping registration and kimi_k2 layer name mapping (#7347) Feng-xiaosuo 2026-03-19 16:46:41 +08:00
  • 87d6424b2e [CI] Add nightly CI test cases for the GLM-4.7 model. (#7391) aipaes 2026-03-19 16:43:29 +08:00
  • 0261d1b1c6 [CI] add glm4.7 weights download (#7395) aipaes 2026-03-19 16:43:15 +08:00
  • 5e65062973 [doc] Fix issues in the GLM4.7 documentation (#7457) aipaes 2026-03-19 16:42:59 +08:00
  • 6fc190b44a [Doc][KV Pool]Revision KV Pool User Guide [2/2] (#7456) pz1116 2026-03-19 16:17:34 +08:00
  • 42bcad7e9b GMM custom operator optimization in small batch scenarios (#7100) chenxi-hh 2026-03-19 16:10:30 +08:00
  • 8e0ebb470a [Misc] Drop Prefetch MLP Env (#7357) wangxiyuan 2026-03-19 14:27:27 +08:00
  • ce239db4fb [CI] Add multi-hardware wheel build and release workflow (#7312) zhangxinyuehfad 2026-03-19 11:06:17 +08:00
  • 270c5cb8cd [CI] Add nightly CI test cases for the Kimi-K2.5 (#7416) LoganJane 2026-03-19 11:02:29 +08:00
  • 3effc4bc70 [Doc][KV Pool]Revision KV Pool User Guide (#7434) pz1116 2026-03-19 10:13:13 +08:00
  • ab9cd2e305 [CI]Add CI summary log (#7202) meihanc 2026-03-19 09:32:06 +08:00
  • e8f7b2e3f1 [Refactor] [310p] Support Mamba Cache and support attn_head_size larger than 128 (#7372) pu-zhe 2026-03-19 09:16:22 +08:00
  • 8b79d4de52 Main2main upgrade to vllm 0317 afternoon (#7409) Nengjun Ma 2026-03-18 23:24:27 +08:00
  • 305820f1a9 [Bugfix] fix bug about model type of qwen3_vl_8b_instruct_w8a8 (#7383) jiangmengyu18 2026-03-18 20:30:03 +08:00
  • fb8e22ec00 [DOC] MiniMax-M2.5 model intro (#7296) SparrowMu 2026-03-18 20:14:36 +08:00
  • 2916601e6c [CI] add Kimi-K2.5 weights download (#7406) LoganJane 2026-03-18 18:29:37 +08:00
  • adc57c5951 [release] Add GLM5 known issue for 2-node PD mixed deployment (#7436) SILONG ZENG 2026-03-18 18:03:18 +08:00
  • 565868a2a6 [doc] add doc for Kimi-K2.5.md (#7371) LoganJane 2026-03-18 17:16:35 +08:00
  • ec34bf0062 [Misc]fix logger which does not take effects in patches (#7402) Angazenn 2026-03-18 17:13:12 +08:00
  • 1ff9e3f25f [CI] Bump docker/login-action from 3 to 4 (#7299) dependabot[bot] 2026-03-18 17:06:48 +08:00
  • b3206cd6f6 [CI] Bump actions/setup-python from 5 to 6 (#7298) dependabot[bot] 2026-03-18 17:06:28 +08:00
  • 58725b8b24 [doc] add Prefill-Decode Disaggregation doc for GLM5.md (#7300) liuhy1213-cell 2026-03-18 17:00:31 +08:00
  • 6bc68c55d0 [doc] Refresh the documentation for DeepSeek-V3.2 (#7403) Nagisa125 2026-03-18 14:59:48 +08:00
  • c1392a6ce6 [bugfix][accuracy] Fix ds indexer accuracy problem caused by k rope (#7341) rjg-lyh 2026-03-18 14:20:21 +08:00
  • c7157af8f7 [P/D] LayerwiseConnector supports the virtual push functionality on node D. (#7361) wangxiaoteng888 2026-03-18 10:50:02 +08:00
  • 5894a27bfd [CI] Add PAT_TOKEN when checkout (#7400) Li Wang 2026-03-18 10:31:32 +08:00
  • 1c954ff264 [main2main] upgrade vllm to 0308 (#7213) zhangyiming 2026-03-18 09:24:43 +08:00
  • 79ef41a53d [CI] add scheduled stale issue management (#7354) drizzlezyk 2026-03-17 23:28:29 +08:00
  • 467c815db6 [CI] expand issue labeler rules for feature/model triage (#7356) drizzlezyk 2026-03-17 23:28:04 +08:00
  • d9ac7e8539 [Bugfix] Assertion error when decode prefix cache fully hits (#7236) Chao Lei 2026-03-17 23:17:45 +08:00
  • 3b3dd2a889 [doc] Refresh the documentation for GLM-4.7 (#7292) aipaes 2026-03-17 23:09:12 +08:00
  • 5645ca8392 [BugFix]A2 MOE method&& layerwise MTP bugfix && Mamba gdn_metadata bugfix (#7364) zxr2333 2026-03-17 23:03:45 +08:00
  • a457d0f0e8 [doc] Upload doc for qwen3.5-27B and qwen3.5-397B-A17B on Ascend (#7313) pppeng 2026-03-17 22:54:57 +08:00
  • a370dfa962 [bugfix]Enable dispatch_ffn_combine feature for qwen3.5 (#7066) asunxiao 2026-03-17 19:53:02 +08:00
  • 83ad14c74c [bugfix] fix unzip file path for fia operator (#7367) aipaes 2026-03-17 17:21:27 +08:00
  • 7669963c27 [Perf] Optimize bias handling in AscendRMSNorm (#7226) rjg-lyh 2026-03-17 16:53:28 +08:00
  • 8f278fc101 [eagle3][pcp] fix bug for eagle3 and cp enable (#7309) lilinsiman 2026-03-17 16:14:45 +08:00
  • 4e62a2ae15 [Bugfix] fix TransposeKvCacheByBlock op error report in plog (#7235) lidenghui1110 2026-03-17 10:08:32 +08:00
  • 3f39ac9c8d [Feature]Supports DSv3.1 PD separation and C8 quantization (#7222) pichangping 2026-03-16 22:49:05 +08:00
  • a6f6e919e6 [main][bugfix] Fixed the problem that eagle3 will crash in FULL_DECODE_ONLY (#7290) drslark 2026-03-16 20:41:36 +08:00
  • b1a78886a9 [xlite][Bugfix] Support mrope and deepstack features in xlite backend (#7295) LVYANGGUO 2026-03-16 17:05:52 +08:00
  • 22d0e1d3d7 [model_runner_v2]optimize the performance of the _topk_log_softmax_kernel (#7221) wangx700 2026-03-16 16:49:10 +08:00
  • 4d443b9228 [bugfix] restore pr-7029 and fix patch error (#7294) rjg-lyh 2026-03-16 15:39:42 +08:00
  • 9320365dab [Test][Feature] Add e2e test for QuaRot model with eagle3 (#7128) zhaomingyu13 2026-03-16 15:35:55 +08:00
  • 71c21f76f5 [Refactor] Replace npu_ring_mla with FIA in MLA prefill (#5704) LICO67373 2026-03-16 10:33:09 +08:00
  • e20f0b1a0d [ReleaseNote] Add release note for v0.17.0rc1 (#7240) Mengqing Cao 2026-03-15 22:47:47 +08:00
  • 7e85f2ff97 [CI] Add test_qwen3_5.py (#7133) pppeng 2026-03-15 22:19:02 +08:00
  • 0c299f79b9 Revert "[Perf][1/N] w8a8c8 support in dsv3.2/glm5 (#7029)" (#7288) Mengqing Cao 2026-03-15 20:19:09 +08:00
  • 29f195a91c [Bugfix][LoRA] Fix the bug when runs Qwen3-Reranker-0.6B with LoRA. (#7156) yupeng 2026-03-15 17:55:42 +08:00
  • 7daccf4b64 Perf(PP): support PP with async send/recv. (#7143) Qiu 2026-03-15 09:45:09 +08:00
  • ce5544bfc1 [Hybrid] support prefix cache for Qwen3.5/Next with --mamba-cache-mode align (#7103) Angazenn 2026-03-15 09:44:09 +08:00
  • c69291eefc [Doc] Add USE_MODELSCOPE_HUB=0 to lm-eval guide (#7279) bazingazhou233-hub 2026-03-14 22:41:02 +08:00
  • 9e6c547d98 [Doc] Replace deprecated full_cuda_graph with cudagraph_mode in Qwen2.5-Omni (#7286) bazingazhou233-hub 2026-03-14 22:38:36 +08:00
  • bb506a1c99 [Doc][Installation] Clarify SOC_VERSION for CPU-only source builds (#7278) NJX 2026-03-14 22:38:25 +08:00
  • 199df03524 [BugFix]Fix CI errors “ascend_transport.so: cannot open shared object file: No such file or directory” (#7242) DreamerLeader 2026-03-14 21:23:05 +08:00
  • e7aa2c285c [SpecDecode] Fix Draft model proposer (#7230) Mengqing Cao 2026-03-14 18:26:37 +08:00
  • 0ad52517a1 Revert "Refactor quantization layer name mapping to leverage vLLM built-in mappers" (#7237) Hexiang Wang 2026-03-14 00:05:54 +08:00
  • 5ec610e832 [Feature][Quant] Reapply auto-detect quantization format and support remote model ID (#7111) Cao Yi 2026-03-13 22:53:25 +08:00
  • 6852a2e267 [feat] add LMCacheAscendConnector (#6882) Junyuan 2026-03-13 17:41:35 +08:00
  • 986cd45397 [Version] Drop 0.16.0 support (#7153) Mengqing Cao 2026-03-13 16:14:15 +08:00
  • 7ed9e9de69 [Perf][1/N] w8a8c8 support in dsv3.2/glm5 (#7029) rjg-lyh 2026-03-13 14:47:42 +08:00
  • df1ee8070d [feat][spec decode]Unified draft parallel (#6766) kx 2026-03-13 14:07:35 +08:00
  • 6ee7ffb98a Add Qwen3_5 to model list (#7130) pppeng 2026-03-13 11:42:28 +08:00
  • c377e73933 Perf(PP): support PP with async scheduling. (#7136) Qiu 2026-03-13 10:27:23 +08:00
  • c980e68d40 [Feature] support aclgraph for model runner v2 (#7110) Ronald 2026-03-13 09:11:46 +08:00
  • 1f71da80eb [CI] Fix server start failure when long weight loading (#7098) Li Wang 2026-03-13 08:52:56 +08:00
  • 7fe0469e27 [CI][Misc] Use offline mode for model downloads (#7179) Li Wang 2026-03-13 08:52:24 +08:00
  • fe4cad24e9 [BugFix]fix qwen3.5 reshape_kvcache bug (#7209) zxr2333 2026-03-12 23:51:40 +08:00
  • 5fe7942bbd [CI] add action for issue labeler on issue open/edit (#7208) drizzlezyk 2026-03-12 20:16:17 +08:00
  • 0c659e91ed [MTP][Bugfix] Fix GLM5-W8A8 precision issues caused by rotary quant MTP weights (#7139) wangbj127 2026-03-12 20:01:24 +08:00
  • de93790d08 [main][bugfix] Fixed the problem of drafter crashed in FULL mode (#7158) drslark 2026-03-12 18:38:50 +08:00
  • 88c56e3bf2 [Misc] Fix main lint to make CI happy (#7204) Li Wang 2026-03-12 18:27:48 +08:00
  • 0a171b5cdd [Test][BugFix] Fix dispatch_gmm_combine_decode test stability (#7097) Li Wang 2026-03-12 17:22:44 +08:00
  • d866e6b238 [Bugfix] Fixed permission issues with the automatic PR submission workflow (#7142) Li Wang 2026-03-12 17:18:59 +08:00
  • e5343d6eb3 [310P][Bugfix]: fix ngram graph replay accuracy error (#7134) Shaoxu Cheng 2026-03-12 17:08:08 +08:00