Commit Graph

  • bfd049aa2c [Lint] fix typos error in epd_load_balance_proxy_layerwise_server_example.py (#7199) Ronald 2026-03-12 17:04:38 +08:00
  • 21fea86b08 feat: [CI] Introduce uv to accelerate pip install (#7127) tfhddd 2026-03-12 16:47:23 +08:00
  • 592661e787 [Doc] EPD doc and load-balance proxy example (#6221) shaopeng-666 2026-03-12 16:17:17 +08:00
  • 09d26754cd [Bugfix] Fix the issue where no exception is thrown when graph capture fails. (#5644) 无脸男 2026-03-12 16:14:45 +08:00
  • 77b43492ae improve the ttft when use mooncake (#6125) xleoken 2026-03-12 16:13:48 +08:00
  • f244f3c4a9 [BugFix] Fix problem of extra processes on rank0 device (#7107) Hexiang Wang 2026-03-12 15:59:03 +08:00
  • e5024d0264 [doc] Add Ascend PyTorch Profiler section (#7117) herizhen 2026-03-12 15:51:00 +08:00
  • 132f3c5d0a Support per-step heat collection and enhance FlashLB for multi-stage load balancing (#6477) Mercykid-bash 2026-03-12 15:49:09 +08:00
  • abe72d7cb9 Refactor quantization layer name mapping to leverage vLLM built-in mappers (#7050) Feng-xiaosuo 2026-03-12 15:48:14 +08:00
  • fb0d6dd175 [main][bugfix] Fixed the problem of speculative decoding in FULL mode (#7148) drslark 2026-03-12 14:51:12 +08:00
  • 37d1bd8c50 fixed fia pad logic in graph mode. (#7144) XiaoxinWang 2026-03-12 14:50:54 +08:00
  • bbffe58b63 [Doc] fix DSV3.1 PD configs (#7187) MengLong Chen 2026-03-12 14:24:49 +08:00
  • aa0143e55d refactor: add a check before layer_sharding logging (#7186) Qiu 2026-03-12 11:56:04 +08:00
  • 5f3826b093 [Build] Add support for Ascend950 chip (#7151) linfeng-yuan 2026-03-12 10:25:51 +08:00
  • da01a74009 Revert "[CI] fix skiped e2e test when upgrade vllm version (#6654)" (#7166) meihanc 2026-03-11 23:03:15 +08:00
  • 3b6b3c4214 [MODELRUNNERV2]fix penality ops (#7013) shiyuan680 2026-03-11 17:13:34 +08:00
  • 830f39dd70 [Bugfix][LoRA] Fix the issue when enable LoRA + tp + fully_sharded_loras (#6650) yupeng 2026-03-11 15:43:15 +08:00
  • a7f91fce71 [KV Pool]get_num_new_matched_tokens return 0 if token length < block_size (#7146) pz1116 2026-03-11 15:05:34 +08:00
  • 1a83c8e2f5 [CI] Build Image for v0.16.0rc1 (#7155) Mengqing Cao 2026-03-11 14:48:50 +08:00
  • 90aa048e60 [CI] Skip test_mooncake_layerwise_connector.py in ut (#7147) SILONG ZENG 2026-03-11 11:46:29 +08:00
  • e16009b2cc [BugFix]Fix recomputed scheduler bug (#7137) zxr2333 2026-03-11 00:32:19 +08:00
  • 54668e73c5 [Model] Support Minimax-m2.5 on NPU (#7105) SparrowMu 2026-03-11 00:12:02 +08:00
  • 239683c7a6 [P/D]Mooncake Layerwise Connector supports hybrid attention manager with multiple kvcache groups (#7022) zxr2333 2026-03-10 23:59:20 +08:00
  • 0f289fa2a8 Add patch_qwen3_5 for triton ops fused_recurrent_gated_delta_rule (#7109) pppeng 2026-03-10 23:28:58 +08:00
  • a78a00e0b1 [Doc][ReleaseNote] Add release notes for v0.16.0rc1 (#7067) Canlin Guo 2026-03-10 22:45:05 +08:00
  • 881c38d210 [Misc] Download on both hk and guiyang region (#7129) Li Wang 2026-03-10 19:22:32 +08:00
  • 6e8d3681ae [bugdix] The problem that the w4a8 weight fails to be loaded when the EP is not enabled is resolved. (#7090) shaopeng-666 2026-03-10 16:57:05 +08:00
  • a5ea699e29 [eagle][cp] fix eagle_cp enable bug2 (#7079) lilinsiman 2026-03-10 16:32:49 +08:00
  • 67d40f23fd [CI]Upgrade niglty multi-node-tests max-parallel to 2 (#7035) zhangxinyuehfad 2026-03-10 16:25:51 +08:00
  • 5df450bca4 [Feat] [310p] Support w8a8sc quantization method (#7075) pu-zhe 2026-03-10 16:13:20 +08:00
  • 14c71b19e1 [Doc][CPU binding] Add user/developer guide for CPU binding (#7045) Frank Chen 2026-03-10 15:59:31 +08:00
  • 33234aa0c5 Revert "[Feature][Quant] Auto-detect quantization format from model f… (#6873) Li Wang 2026-03-10 11:27:32 +08:00
  • 40f7d93f1a [bugfix][LoRA] Fix the lora accuracy issue introduced by the upstream vLLM changed. (#6958) yupeng 2026-03-10 10:43:18 +08:00
  • a398fa6a0b [Bugfix]: correct streaming content-type in load balance proxy server (#6985) ZRJ026 2026-03-10 10:11:35 +08:00
  • bb7ed759d4 [Doc] Fix broken chunked-prefill URL in supported features (#6963) NJX 2026-03-10 10:10:07 +08:00
  • 9b30d4e774 [Doc][Misc] Add metrics usage documentation and example (#6962) NJX 2026-03-10 10:09:50 +08:00
  • 326fd359aa [Docs] add and publish llms.txt for LLM discovery (#6886) Yikun Jiang 2026-03-10 10:06:27 +08:00
  • bdad11e9a8 [doc] Update GLM4.x.md, add GLM4.x multi-node deploy tutorial (#6872) ZKSU 2026-03-10 10:01:53 +08:00
  • 146b9d2a83 [BugFix] fix metadata execute error: integer modulo by zero (#6521) xleoken 2026-03-10 09:58:06 +08:00
  • f6db47f103 [CI] fix skiped e2e test when upgrade vllm version (#6654) meihanc 2026-03-10 09:55:35 +08:00
  • 43df2cb2fc [Lint]Style: Convert test/ to ruff format(Batch #1) (#6738) SILONG ZENG 2026-03-10 09:52:50 +08:00
  • 9216e1b050 [fix] Add support for Qwen3.5 Dense and MoE on Ascend (#6933) xmpp777 2026-03-10 09:09:31 +08:00
  • 3b25ded8b7 [CI] Bump docker/metadata-action from 5 to 6 (#7069) dependabot[bot] 2026-03-10 09:06:04 +08:00
  • 2325bbe79b [CI] Bump actions/checkout from 4 to 6 (#7070) dependabot[bot] 2026-03-10 09:05:22 +08:00
  • ee5347e824 [qwen3 next ]add ascend c casual_conv1d_fn (#6661) ZT-AIA 2026-03-09 23:29:49 +08:00
  • 48b624e4cc [BugFix] Fix implementation bug of triton rope_siso (#7082) Hexiang Wang 2026-03-09 23:08:43 +08:00
  • 542258ac9d [feat] parameterize hardcoded MLA dimensions to support GLM5-W8A8 (#6902) liuchen2026fly 2026-03-09 20:17:21 +08:00
  • 13adcbe44b feat(attention_cp): support chunked prefill for Qwen3Next with PCP&DCP (#6900) Qiu 2026-03-09 17:55:09 +08:00
  • a76a509fae [MOE][Bugfix] Cancel H2D for expert_map (#7000) LI SHENGYONG 2026-03-09 17:53:54 +08:00
  • 82fdd40d49 [Feat]Xlite Qwen3 MoE Support Data Parallel (#6715) 王远 2026-03-09 17:53:35 +08:00
  • ba1c82e758 [DOC] Add explaination of 310p special param: max-model-len (#7065) Shaoxu Cheng 2026-03-09 16:54:43 +08:00
  • dec04ec8d8 [Bugfix] Fix incorrect layer count for MTP models in update_aclgraph_sizes (#7064) wanghuanjun2113 2026-03-09 16:14:51 +08:00
  • 4b4961ba5f [fix]Resolve compilation errors that occur when building versions subsequent to b020 (#7059) guanguan0308 2026-03-09 16:09:35 +08:00
  • eb648f7398 [Bugfix] Support quant config in glm46v (#7062) LoganJane 2026-03-09 16:07:16 +08:00
  • 57c554a23f [bugfix]Fix parameter ordering bug in _merge_multimodal_embeddings (#7068) tanhaoan333 2026-03-09 16:05:52 +08:00
  • cb4c7de856 [Perf] Optimize MTP execution by reordering state update operation (#6844) Cao Yi 2026-03-09 15:55:27 +08:00
  • d39d80830c [KVCache]Qwen3.5 supports contiguous tensor hybrid-attn kv-cache (#6887) zxr2333 2026-03-09 15:28:40 +08:00
  • 482d39c1b0 [commuinty]update contributor and refresh tool (#7072) wangxiyuan 2026-03-09 15:19:35 +08:00
  • aef9d4249d [Perf] Avoid CPU sync in mrope_positions copy by using full tensor copy (#7014) Cao Yi 2026-03-09 14:46:37 +08:00
  • 65eae6de7b Add Ascend Ops recurrent_gated_delta_rule (#6725) LeeWenquan 2026-03-09 14:14:14 +08:00
  • 23bf5d4d48 [EPLB][bugfix] Bugfix for fused mc2 (#6794) JIACHENG XU 2026-03-09 11:26:57 +08:00
  • 06ec136f08 [Bugfix] Obtain kernel block size for computing slot mapping correctly (#7019) Zetong Li 2026-03-09 11:05:01 +08:00
  • a3f4f6b10b [P/D][Bugfix] Layerwise stacking MTP error. (#7036) wangxiaoteng888 2026-03-09 10:55:43 +08:00
  • 675387f1fd [P/D][KVPool]Mooncake Layerwise Connector supports kv_pool (#7032) zxr2333 2026-03-09 10:49:04 +08:00
  • 6a7115fa0d [main][feature] Support quarot for eagle3 without embedding (#7038) drslark 2026-03-09 10:43:06 +08:00
  • 737dfcf638 [MOE] commit GMM custom operator (#7010) chenxi-hh 2026-03-09 09:56:31 +08:00
  • 01d3515dcf [eagle][cp][bugfix] Fix the bug in eagle and cp enabled (#6981) lilinsiman 2026-03-06 20:49:49 +08:00
  • 1c0ecf806a [bugfix] fix pass bug: pass really rope dim for npu_rotary_embedding (#6880) aipaes 2026-03-06 19:35:17 +08:00
  • 094eb0eff9 [bugfix]Qwen-Omni quantization bugfix (#7042) tanhaoan333 2026-03-06 17:24:22 +08:00
  • a51d6366b9 [Bugfix] Qwen3Next support FlashComm1 (#6830) ZhaoJiangJiang 2026-03-06 17:14:08 +08:00
  • a2696006d1 [Refactor][EAGLE] 8/N delete mtp_proposer (re-pull) (#7033) Zetong Li 2026-03-06 17:11:22 +08:00
  • c5dfa8d645 [OPS]add split_qkv_rmsnorm_mrope ops (#6730) Fager10086 2026-03-06 16:18:37 +08:00
  • bc0fd7ca72 [Feat]Adapt the graph mode (piecewise and full_decode_only) of PCP and DCP for DeepSeek v3.2. (#6940) xiaocongtou6 2026-03-06 16:10:24 +08:00
  • a813eadd2d [MM][Perf] Enable 2.7x faster for convolution computation with aclnn BatchMatMulV2 (#7017) Shanshan Shen 2026-03-06 14:26:37 +08:00
  • c49ce18ea5 [Test] Add e2e test cases for the Qwen-VL model adaptation to Ascend 310p (#6977) wanghengkang 2026-03-06 14:25:10 +08:00
  • 620076b76a [bugs] fix install FIA sh (#6989) aipaes 2026-03-06 11:42:32 +08:00
  • 16c3b0b822 Revert "[Refactor][EAGLE] 8/N delete mtp_proposer" (#7030) wangxiyuan 2026-03-06 11:24:05 +08:00
  • 8c2c82f3e1 [Bugfix] Fix the moe_forward error when setting enable_static_kernel … (#6964) panchao-hub 2026-03-06 10:36:10 +08:00
  • a7820d20f4 [Doc][KV Pool]Update Memcache local service config example: increase default world size to 256 and update description (#7025) pz1116 2026-03-06 10:23:55 +08:00
  • a838a89630 [v0.16.0][P/D][Bugfix] Support ALL D-Nodes in fullgraph when running MTP in PD (#6948) MengLong Chen 2026-03-06 10:01:33 +08:00
  • ccd00798f3 [EPLB] Display the expert hotness comparison before and after eplb. (#6877) LI SHENGYONG 2026-03-06 09:53:29 +08:00
  • 18b52afe2b [Ops][Misc] Optimize split_qkv_rmsnorm_rope op (#6827) frank 2026-03-06 09:30:31 +08:00
  • a60e179c7f [Refactor][EAGLE] 8/N delete mtp_proposer (#7016) Zetong Li 2026-03-06 09:10:57 +08:00
  • bd571cf6d6 [Main2Main] Upgrade vLLM to 0303 (#6944) SILONG ZENG 2026-03-06 09:08:52 +08:00
  • 640ecd1b77 [BugFix] Fix muls_add fusion not working for GLM5 models (#6928) liuchen2026fly 2026-03-05 22:35:54 +08:00
  • ae394767d4 【main】ADXL/HIXL supports FabricMem Mode (#6806) fems14 2026-03-05 21:04:11 +08:00
  • 50441e4650 [BugFix][MTP] Fix prefill misclassified as decode when prompt tokens == num_spec_tokens + 1 (#6835) Cao Yi 2026-03-05 17:33:10 +08:00
  • 91c39ebae6 [BugFix] [dcp] Fix GQA Model Error when Enable both DP and DCP (#7012) dsxsteven 2026-03-05 16:51:08 +08:00
  • 1e4017e3fa [CI] support nightly ci for per pr by labels (#6483) zhangxinyuehfad 2026-03-05 16:46:37 +08:00
  • a6745b8577 [CI] fix test_qwen3_moe_external_launcher_ep_tp2 (#6951) zhangxinyuehfad 2026-03-05 16:43:45 +08:00
  • 1f2a083597 [bugfix]Qwen-Omni quantization model_type bugfix (#7007) tanhaoan333 2026-03-05 16:34:34 +08:00
  • 1a7f845696 [Feat][Worker] NPUWorker Profiler profile_prefix full adaptation (RFC #6954) (#6968) realliujiaxu 2026-03-05 16:18:34 +08:00
  • 3047b724b3 Add GemmaRmsNorm ACLGraph Support (#6473) LeeWenquan 2026-03-05 16:15:07 +08:00
  • 5a3744c542 [EPLB] The profiling can collect the time required for adjusting the eplb. (#7001) LI SHENGYONG 2026-03-05 16:10:57 +08:00
  • 43c8da3574 [Feat]fused_qkvzba_split_reshape supports token number greater than 65536 (#6740) songjianquan 2026-03-05 14:41:38 +08:00
  • 13777bf3f0 [Spec Decode]clean up spec decode interface (#6947) wangxiyuan 2026-03-05 14:30:10 +08:00
  • 2bd9c35788 [perf][refactor] Refactor and optimize sfa_v1.py for dsv3.2/glm5 (#6874) rjg-lyh 2026-03-05 14:27:11 +08:00
  • 77e009d9fc [Feature] Add docs of batch invariance and make some extra operators patch (#6910) Ronald 2026-03-05 09:12:40 +08:00
  • f8315f5717 [bugfix]Qwen2.5VL accurate question (#6975) tanhaoan333 2026-03-04 22:02:29 +08:00
  • 566c367a10 [CI] Add DeepSeek-V3.2 large EP nightly ci (#6378) zhangxinyuehfad 2026-03-04 16:15:56 +08:00