Commit Graph

  • 72fee47cba [CI](cp) skip bad UT test_models_chunked_prefill_with_empty_kvcache temporarily (#5919) Qiu 2026-01-15 15:50:06 +08:00
  • a25209252f [CI] Add 310p e2e test back (#5797) wangxiyuan 2026-01-15 15:47:13 +08:00
  • e8bbf72867 [Bugfix] Fix XliteModelRunner init failed when aclgraph is enabled (#5899) Magnus 2026-01-15 15:40:28 +08:00
  • 80fbb1b6b1 [CI]Fix nightly clang installation following previous attempt (#5907) meihanc 2026-01-15 14:18:11 +08:00
  • efa0f64f22 [Doc] Add tutorials for Qwen3-VL-30B-A3B-Instruct (#5331) Shanshan Shen 2026-01-15 10:56:19 +08:00
  • da958ee386 [EPLB]Eplb Config Renaming (#5533) LI SHENGYONG 2026-01-15 10:26:44 +08:00
  • ea01aeaab7 [Refactor][EAGLE] 4/N extract common methods from eagle and mtp (#5870) Zetong Li 2026-01-15 10:24:35 +08:00
  • c11a05c4e1 [Main2Main] Upgrade vllm commit to 0113 (#5839) wjunLu 2026-01-15 09:48:53 +08:00
  • e67608041d [main][BugFix]Fix DispatchGmmCombineDecode acc bug when big batch (#5808) wangqiankun13 2026-01-15 09:29:34 +08:00
  • d840f153f4 [Bugfix] Fix acc bug when enbale dispatch_gmm_combine_decode and eplb (#5806) wangqiankun13 2026-01-15 09:21:18 +08:00
  • 7078dff691 [Feature] implenment set_additional_forward_context for model runner v2 (#5720) Ronald 2026-01-15 09:18:28 +08:00
  • 4811ba62e0 [Lint]Style: reformat markdown files via markdownlint (#5884) SILONG ZENG 2026-01-15 09:06:01 +08:00
  • 96edd4673f [Community] Add code owner (#5882) wangxiyuan 2026-01-15 09:01:12 +08:00
  • 295018ec0f [Refactor]Refactor of vllm_ascend/distributed module (#5719) lty 2026-01-15 08:57:40 +08:00
  • f34b3b8ee9 [nightly] Remove node tolerations for hk cluster (#5896) Li Wang 2026-01-15 08:55:06 +08:00
  • a9f730b853 [bugfix]Intermittent CI failure in the triton runtime jit (#5733) meihanc 2026-01-14 22:58:08 +08:00
  • 51415aaa2f [bugfix]support dsv3.2 enable both mtp and full_decode_only (#5849) cookieyyds 2026-01-14 22:57:38 +08:00
  • a88937f5cb [bugfix](cp) replace None with zeros/inf tensor to avoid TypeError (#5837) Qiu 2026-01-14 20:57:48 +08:00
  • d450ba24c7 Revert "[BugFix] Support setting tp=1 for the Eagle draft model to take effect (#5903) zhaomingyu13 2026-01-14 20:56:20 +08:00
  • 01805fbd7d Revert "[BugFix] Support setting tp=1 for the Eagle draft model to take effect (#5519)"(#5902) zhaomingyu13 2026-01-14 20:55:10 +08:00
  • 2a6d95c389 [Cleanup] Remove dead code make_attention_mask function (#5818) LICO67373 2026-01-14 16:52:51 +08:00
  • d31170496b [doc]index display by category (#5852) herizhen 2026-01-14 16:50:49 +08:00
  • f6a37fc549 [CI] Reduce the resource consumption of unit tests (#5891) Li Wang 2026-01-14 16:33:19 +08:00
  • e5c46bf169 [CI] Fix lint CI (#5880) wangxiyuan 2026-01-14 11:23:38 +08:00
  • e20813f441 [Feature] implement eagle spec decoding for model runner v2 (#5840) Ronald 2026-01-14 09:18:05 +08:00
  • 0415e694cd [Quantization] Support compressed tensors moe w8a8 int8 dynamic weight (#5718) LHXuuu 2026-01-14 09:17:26 +08:00
  • ecf2fa482e [EPLB][Bugfix] Get expert map from layers (#5817) LI SHENGYONG 2026-01-14 09:16:51 +08:00
  • 48ec97821a [Bugfix] Fixed an accuracy problem of sp with eagle3 (#5816) drslark 2026-01-14 09:00:37 +08:00
  • e1bed43cff [P/D] bugfix for p node force free requset (#5431) liziyu 2026-01-14 08:51:31 +08:00
  • 78d5ce3e01 [Lint]Style: Convert example to ruff format (#5863) SILONG ZENG 2026-01-13 20:46:50 +08:00
  • f7b904641e [Main2Main] Upgrade vllm commit to 0109 (#5752) zhangxinyuehfad 2026-01-13 19:14:43 +08:00
  • eed9e366a7 [Bugfix][P/D] fix layerwise connector for decoder tp size > num kv heads (#5846) liziyu 2026-01-13 17:30:33 +08:00
  • 5b95c6b03a [Test][e2e][LoRA] Add more e2e tests to cover scenarios of LoRA (#4075) yupeng 2026-01-13 16:32:28 +08:00
  • d350c2ada6 [CustomOp][Perf] Merge Q/K split to simplify AscendApplyRotaryEmb for better performance (#5799) Shanshan Shen 2026-01-13 15:47:23 +08:00
  • 523e83016b [Lint]Style: Convert root, benchmarks, tools and docs to ruff format (#5843) SILONG ZENG 2026-01-13 15:29:34 +08:00
  • 4b679984de enable ep32 for dispatch_ffn_combine (#5787) lhchg 2026-01-13 14:35:52 +08:00
  • 84d4f474c0 [CI] Unblock 4-cards test (#5831) wangxiyuan 2026-01-13 11:15:29 +08:00
  • 1ccb9acd9a [Refactor] Provide a framework to accommodate operators for different hardware devices (#5735) weijinqian0 2026-01-13 09:53:26 +08:00
  • 8d571286dd [Refactor] Modify the binding logic to allocate CPU cores for each NPU card (#5555) Rozwel-dx 2026-01-13 09:21:28 +08:00
  • d886b81971 [BugFix] Support setting tp=1 for the Eagle draft model to take effect (#5519) zhaomingyu13 2026-01-13 09:14:30 +08:00
  • 7af3b880c1 support triton of mrope (#5664) shiyuan680 2026-01-13 09:13:51 +08:00
  • db7cf9b0ca [bugfix] A2 Environment Pooling for Memcache Compatibility (#5601) DreamerLeader 2026-01-13 09:07:38 +08:00
  • fe251a2efe [Doc] Update community contributors and versioning naming to follow vLLM (#5820) Yikun Jiang 2026-01-13 08:47:11 +08:00
  • c8a324ab73 [Refactor] Add comments for Metadata classes in attention module (#5789) LICO67373 2026-01-13 08:46:50 +08:00
  • dde547e900 [Bugfix] bugfix for the order of dummy run pad and sync (#5777) LiuYi-Up 2026-01-13 08:44:10 +08:00
  • 75c92a3640 [CI] Move nightly-a2 test to hk (#5807) Li Wang 2026-01-12 22:58:35 +08:00
  • 2a010a1f0e [CI] Show disk usage for CI shared volume (#5821) Li Wang 2026-01-12 22:56:23 +08:00
  • 86c4bea116 Bump actions/checkout from 4 to 6 (#5795) dependabot[bot] 2026-01-12 20:44:23 +08:00
  • 7ab63661f5 Bump actions/github-script from 7 to 8 (#5796) dependabot[bot] 2026-01-12 20:44:02 +08:00
  • 5f4b13ab3d [bugfix](cp) align max_context_chunk to cp_virtual_block_size (#5767) Qiu 2026-01-12 20:11:46 +08:00
  • 4453c60262 [bugfix]limit graph replay sync (#5761) wangyongjun 2026-01-12 16:46:21 +08:00
  • 7a6fde80b1 [CI]Add Kimi k2 nightly test (#5682) SILONG ZENG 2026-01-12 15:56:07 +08:00
  • 451bbdc292 [Doc] add tls check to pd disaggregation readme (#5638) liziyu 2026-01-12 15:49:18 +08:00
  • 5ccd53e28a [CI] adpat v0.13.0 change (#5793) wangxiyuan 2026-01-12 14:06:56 +08:00
  • 354ee3b330 [Doc] Update doc url link (#5781) wangxiyuan 2026-01-12 11:21:31 +08:00
  • 297f6deb09 [CI] Align multi-node nightly test paramter with corresponding tutorials document (#5756) Nengjun Ma 2026-01-12 09:00:31 +08:00
  • 6880c1b383 [Feature] Support for cross-attention and whisper model (#5592) gh924 2026-01-11 11:38:45 +08:00
  • db12c1e2c8 [Perf] Supports compute-communication overlap in the forward of sfa_v1 in the Sharded-CP feature. (#5701) zzhxxx 2026-01-11 09:47:27 +08:00
  • c5744e2350 [main][bugfix] Fix fullgraph padding bug in mtp eagle refactor (#5692) lilinsiman 2026-01-10 23:07:48 +08:00
  • 78b554dda9 [P/D] layerwise connector supports DeepSeek-V3.2 sparse attention && Distribute transfer tasks to redundant kv_head cards (#5722) zxr2333 2026-01-10 23:04:16 +08:00
  • c316679e65 adapt to minimax_m2 (#5624) Feng-xiaosuo 2026-01-10 23:01:35 +08:00
  • ecd4232698 [Feat] flashcomm2+oshard Generalized (#4723) Levi 2026-01-10 22:57:57 +08:00
  • aa987ffe87 [P/D][bugfix]Fix the PCP port mapping error issue (#5706) wangxiaoteng888 2026-01-10 22:43:52 +08:00
  • ff4c1a47b3 [bugfix] Fixing KV Pool Memory Retention and Performance Degradation Issues (#5751) fems14 2026-01-09 17:46:23 +08:00
  • 3ba064f804 [Doc] Add GLM4.5 GLM4.6 doc (#5740) 1092626063 2026-01-09 16:40:49 +08:00
  • 3b997fdd32 support mxfp8 quantization (qwen dense) (#5723) wangyao-i 2026-01-09 16:26:31 +08:00
  • 09b3f9d91b [CI]Add Disaggregated PD Nightly Test for Qwen3-235B and Qwen3-VL-235B (#5502) SILONG ZENG 2026-01-09 16:25:20 +08:00
  • f63c1341d9 [Feature] GLM4.6 support mtp with fullgraph (#5460) 1092626063 2026-01-09 16:07:42 +08:00
  • 09682e0751 [Bugfix] Fix matmul allreduce precision issue by using original weight (#4939) ice_rain 2026-01-09 16:05:32 +08:00
  • 64d29875f9 [Refactor] Replace the implementations of o_proj, q_b_proj, and kv_b_proj with custom_op for sharded CP (#5698) zzhxxx 2026-01-09 15:58:40 +08:00
  • e11ff8e535 [BufFix]Fix the error when using Ascend custom operators with rank=128 (#5394) ZT-AIA 2026-01-09 15:57:43 +08:00
  • d36ca88cf4 [CI] Avoid lint and ut for PR push (#5762) wangxiyuan 2026-01-09 15:57:06 +08:00
  • dc99cfdc15 [CustomOp] support TensorList for dispatchFFNCombine (#5665) lhchg 2026-01-09 15:56:29 +08:00
  • 3ce5a34468 [BugFix] Xlite: Bypass the padding of the graph mode in non-MTP cases to obtain the correct decode num. (#5711) Wang Xiaoran 2026-01-09 15:55:30 +08:00
  • 2d713fee93 [CI] Accuracy issue of qwen3-next-w8a8 nightly test fix. (#5746) InSec 2026-01-09 15:55:13 +08:00
  • be941cab71 [BugFix] NetLoader: No backend type associated with device type npu (#5700) Rui Kang 2026-01-09 15:54:54 +08:00
  • 64904ab5b6 [CI] lint and ut use self_hosted runner (#5652) Li Wang 2026-01-09 14:26:14 +08:00
  • 36d74aba58 [Doc][fix] Fix the title of the document for the layer_sharding feature (#5759) zzhxxx 2026-01-09 14:15:22 +08:00
  • ee2ed573f1 [BugFix][DS 3.2] Fix ds indexer accuracy problem caused by rope. (#4641) whx 2026-01-09 14:11:44 +08:00
  • 2a571d8bc8 support multi npu partially starkwj 2026-01-08 06:54:33 +00:00
  • 98c788a65a [Doc] add PaddleOCR-VL tutorials guide (#5556) zyz111222 2026-01-09 11:01:25 +08:00
  • a3a74d6984 [CI] Add qwen3 next ci (#5395) LeeWenquan 2026-01-09 10:29:09 +08:00
  • 40eb3e1836 [OP] Enable custom op aclnnMoeInitRoutingCustom (#5332) Chenxi Qian 2026-01-09 09:35:18 +08:00
  • 595d3484c4 [Nightly] Move ops to the correct path (#5642) Li Wang 2026-01-09 09:23:36 +08:00
  • 1ff1c96d13 [CI] Remove workflow_dispatch way for image build (#5742) wangxiyuan 2026-01-09 09:20:30 +08:00
  • 97f6be8108 [feature]dcp&pcp support mlapo (#5672) zhenwenqi2024 2026-01-08 23:49:23 +08:00
  • 6315a31399 [CI] Add triton ascend in nightly CI (#5716) meihanc 2026-01-08 21:17:32 +08:00
  • f4605c2b3c [Fix] Fixes speculative decode indexing and unpad condition for attention metadata (#5626) Yizhou 2026-01-08 19:41:08 +08:00
  • 503822c56c [Doc] Add Qwen3-Omni-30B-A3B-Thinking Tutorials (#3991) meihanc 2026-01-08 16:57:20 +08:00
  • 8b3a7a9e87 [bugfix] Support dsv3.2 enable both mtp and full_decode_only (#5679) cookieyyds 2026-01-08 15:47:31 +08:00
  • ccbc5e2ba1 [Feat][Bugfix][main] Adapted SP to eagle3 (#5562) drslark 2026-01-08 15:33:52 +08:00
  • d03cc9c456 [CI] Fix image build workflow_dispatch error (#5717) wangxiyuan 2026-01-08 15:07:33 +08:00
  • 920bbe932f [CI] Drop outdated cases (#5709) Li Wang 2026-01-08 11:23:44 +08:00
  • b69db4ce55 [EPLB][CI] EPLB add aclgraph and redundant expert ci (#5625) LI SHENGYONG 2026-01-08 09:51:48 +08:00
  • 264cc254cc [CI] fix image build tag (#5703) wangxiyuan 2026-01-08 09:27:45 +08:00
  • 48811bc0b8 Optimize the print info format when deprecated code is used in vllm-ascend (#5696) Nengjun Ma 2026-01-08 09:26:49 +08:00
  • 8763953f56 [Feature] add the magicmtp speculative decoding acceleration algorithm (#5542) Aoxuan Chen 2026-01-08 09:15:55 +08:00
  • 481138e1d2 [bugfix] adapt to new implemented get_kv_cache_spec in cpuoffload connector (#4311) lidenghui1110 2026-01-08 09:15:09 +08:00
  • f7db812ed7 [refactor] Refactor the interface for shard weight and remove the flashcomm2 o_shared interface. (#5181) zzhxxx 2026-01-08 09:05:02 +08:00
  • 20a8cf061b [BugFix][P/D] Fix pre-create link parameter error (#5694) zxr2333 2026-01-08 08:41:10 +08:00