Commit Graph

  • 4173255c0c [main][Bugfix] fix kv pcp+pooling+pd separation bug (#6153) weiguihua2 2026-01-23 16:15:04 +08:00
  • ff63626874 [Bugfix] Fix the issue of the acceptance rate decline for Qwen3-30B-A3B-EAGLE3 (#6138) zhaomingyu13 2026-01-23 16:12:56 +08:00
  • a3079cd253 [Tests] Skip unstable eagle cases to keep CI success (#6180) wjunLu 2026-01-23 15:33:53 +08:00
  • 78af0c30a3 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #12) (#6177) SILONG ZENG 2026-01-23 14:59:19 +08:00
  • 193acc2c19 [CI] Add nightly ci test for deepseek v3.1 (#5386) zhangxinyuehfad 2026-01-23 14:36:49 +08:00
  • 8210a62a44 [EPLB][Bugfix]Reduce unnecessary video memory usage (#6020) LI SHENGYONG 2026-01-23 14:21:13 +08:00
  • 749e24f81e [bugfix] align max_num_batched_tokens with tp*pcp when using FLASHCOMM1 (#6000) Qiu 2026-01-23 14:19:49 +08:00
  • f8d03d21f1 Add Medusa speculative decoding support for vllm_ascend (#5668) simplzyu 2026-01-23 14:14:23 +08:00
  • a69ef10c3a [Refactor] Quantization Module Refactor (#5738) Cao Yi 2026-01-23 14:13:47 +08:00
  • 8378bc28b0 [Misc] Remove CP Redundant Variables after FIA operator enables for CANN 8.5 (#6013) dsxsteven 2026-01-23 14:13:12 +08:00
  • 418a43e2a2 [Bugfix] Fix seq_lens reset issue causing performance degradation (#6158) ZYang6263 2026-01-23 11:29:54 +08:00
  • 739d074b0c update other platforms' Dockerfile starkwj 2026-01-22 12:07:03 +00:00
  • 82a2b3bcc7 [P/D]Add ssl cert for metaserver proxy (#5875) wangxiaoteng888 2026-01-23 11:11:44 +08:00
  • f4a361fcc3 [CI] Re-open skipped cases due to PTA upgrading and update the golden results (#6144) wjunLu 2026-01-23 10:46:31 +08:00
  • 4d780a8b01 [Misc] Revert "[Misc] Bump mooncake version to v0.3.8.post1 (#6110)" (#6164) Li Wang 2026-01-23 09:53:32 +08:00
  • 72ffc00b86 [Bugfix] Fix structured outputs errors: TypeError: apply_token_bitmask_inplace_cpu() (#6151) wjunLu 2026-01-23 09:52:55 +08:00
  • 08a45e6053 [Doc] update supported features (#6165) zhangxinyuehfad 2026-01-23 09:50:11 +08:00
  • 819a4459ce Drop vLLM 0.13.0 support (#6069) zhangxinyuehfad 2026-01-23 09:45:08 +08:00
  • 27a513b672 [BugFix]hccl bufferSize check for dispatch_ffn_combine (#6130) lhchg 2026-01-23 08:41:40 +08:00
  • 7725314b26 [Feat] Merge the multi eagle graphs to one graph (#5940) anon189Ty 2026-01-23 08:37:02 +08:00
  • 63d3921208 [Bugfix] Remove use_aclgraph in mtp_proposer and use use_cuda_graph (#6032) Zetong Li 2026-01-22 21:08:07 +08:00
  • 176bfc36bc [BugFix] fix 3vl dense model load quant weight (#6100) shaopeng-666 2026-01-22 20:05:25 +08:00
  • 7f91ac2649 [CP&SP] Integrate FIA operator in mla_cp._forward_decode (#5641) Bai Yongbin 2026-01-22 20:02:30 +08:00
  • 88632cf976 [CI][Doc] Upgrade wheel building's CANN to 8.5.0 and update the Docs (#6145) wjunLu 2026-01-22 19:50:54 +08:00
  • e54d294df3 [CI]Install clang in Dockerfile for triton ascend (#4409) meihanc 2026-01-22 19:01:28 +08:00
  • a7d781f135 [Main] Upgrade PTA to 2.9.0 (#6112) wjunLu 2026-01-22 17:59:06 +08:00
  • 1402cf6874 [Graph][Fusion] Add QKVNormRope and QKVNormRopeWithBias (#5721) CodeCat 2026-01-22 17:22:41 +08:00
  • f2c0ced06d [P/D][PCP]bugfix pcp force free twice caused logger error (#6124) wangxiaoteng888 2026-01-22 16:24:33 +08:00
  • 1d3544c887 [BugFix]converting pa get_workspace back to capturing (#5833) Angazenn 2026-01-22 15:49:22 +08:00
  • 484e7c59dc [CI] optimize lint term (#5986) Li Wang 2026-01-22 15:46:59 +08:00
  • 9bba0a2a68 [Bugfix] Fix Triton operator usage for multimodal models based on the mrope_interleaved parameter (#6042) zhangxinyuehfad 2026-01-22 15:46:05 +08:00
  • 38edfd585a [bugfix][npugraph_ex]fix the model output type issue caused by manually modify FX graph (#6015) ChenCangtao 2026-01-22 12:35:06 +08:00
  • 34fb628248 [BugFix] Support setting tp=1 for the Eagle draft model to take effect (#6097) zhaomingyu13 2026-01-22 11:36:23 +08:00
  • 37a9cf818a [Misc] Bump mooncake version to v0.3.8.post1 (#6110) Li Wang 2026-01-22 11:03:16 +08:00
  • 08d7014874 [Feature]Enable DispatchGmmCombineDecode when eagle is moe with w8a8 or not moe [RFC: issue 5476] (#5758) wangqiankun13 2026-01-22 10:51:02 +08:00
  • cef04b3555 [bugfix] adapt_remote_request_id (#6051) JiangWeixiang 2026-01-22 10:48:40 +08:00
  • ef9d8367f5 [Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method (#5143) maxmgrdv 2026-01-22 05:34:58 +03:00
  • dd8571860d [Feature] Support DSA-CP for Hybrid scenario (#5702) zzhxxx 2026-01-22 10:12:09 +08:00
  • 69740039b7 [CI] Upgrade CANN to 8.5.0 (#6070) wangxiyuan 2026-01-22 09:29:50 +08:00
  • ab676413e6 Default enable MLAPO (#5952) Nengjun Ma 2026-01-22 09:26:39 +08:00
  • a15a5f6aa5 [Doc] Supplement PD separation parameters of DeepSeek V3.1 (#6053) MengLong Chen 2026-01-22 08:53:44 +08:00
  • 8900e3398b [Ascend] perf: optimize rope embedding with triton kernel for huge performance gain (#5918) ZCG12345 2026-01-21 22:01:22 +08:00
  • 2a618d2454 [Ops] update causal_conv1d_update (#5984) LeeWenquan 2026-01-21 16:33:52 +08:00
  • 53bfb38192 [CI]Update triton ascend version in 3.2.0 (#6067) meihanc 2026-01-21 16:02:23 +08:00
  • 58ff465821 [bugfix] fix the complex and potentially problematic generate_kv_idx. (#5957) Qiu 2026-01-21 14:21:02 +08:00
  • 12a668b1d9 [Refactor] AttentionBuilder inherit from base class in vllm (#5916) LICO67373 2026-01-21 10:45:45 +08:00
  • 839e03cbc9 [Nightly] Use Qwen repo for qwen3-next (#6064) Li Wang 2026-01-21 10:39:12 +08:00
  • 1ed9524763 add dispath_ffn_combine_bf16 (#5866) guanguan0308 2026-01-21 09:30:30 +08:00
  • bec8641876 [BugFix] Fix input parameter bug of dispatch_gmm_combine_decode[RFC: issue 5476] (#5932) wangqiankun13 2026-01-21 09:26:40 +08:00
  • 5b129cf0a1 [1/N][Feat] Xlite Qwen3 MoE Support (#5951) Magnus 2026-01-21 09:26:03 +08:00
  • 1ab6cd4935 [Bugfix] Fix setting of speculative_config.enforce_eager for dsv32 (#5945) Zetong Li 2026-01-21 09:24:33 +08:00
  • 936d81a258 [bugfix][mm] change get_num_encoder_tokens to get_num_encoder_embeds in recompute_schedule.py (#5132) kx 2026-01-21 09:13:52 +08:00
  • b399117e89 [Bugfix] fix pcp qwen full graph FIA bug (#6037) weiguihua2 2026-01-21 08:49:05 +08:00
  • b6d55fc48e [Bugfix]Fixed precision issues caused by pooled request pooling (#6049) DreamerLeader 2026-01-20 23:51:31 +08:00
  • 8b98d7a4e8 [main][bugfix] Resolved memory deallocation failure in the pooling layer under re-computation workloads. (#6045) fems14 2026-01-20 22:56:04 +08:00
  • b2475099a0 [main][Bugfix] Fixed a problem related to embeddings sharing (#5967) drslark 2026-01-20 21:34:28 +08:00
  • 6c30f8bf87 [Feature]refactor the npugraph_ex config, support online-infer with static kernel (#5775) ChenCangtao 2026-01-20 21:31:38 +08:00
  • 0c0514579f [CI][Lint] Show lint diff on failure (#5956) Li Wang 2026-01-20 21:07:01 +08:00
  • 8cf1e8d8a7 [CI] Add wait logic for each individual case (#6036) Li Wang 2026-01-20 21:05:44 +08:00
  • 750c06c78a [CI] Add DeepSeek-V3.2-W8A8 nightly ci test (#4633) zhangxinyuehfad 2026-01-20 21:05:15 +08:00
  • cea48c2a34 model runner v2 support triton of penalty (#5854) shiyuan680 2026-01-20 20:26:05 +08:00
  • afabb49f00 [Docs][Model] Support Qwen3-VL-Embedding & Qwen3-VL-Reranker (#6034) Canlin Guo 2026-01-20 17:36:31 +08:00
  • 402872050a [Tests] move qwen3 performance test from nightly to e2e (#5980) Icey 2026-01-20 17:08:43 +08:00
  • 5892455f43 [Bugfix] fix bug of pcp+mtp+async scheduler (#5994) weiguihua2 2026-01-20 15:24:05 +08:00
  • ea57e3e7a4 [Main2Main] Upgrade vllm commit to releases/v0.14.0 (#5988) meihanc 2026-01-20 15:10:40 +08:00
  • 55b20ac63b [Ops] Add layernorm for qwen3Next (#5765) LeeWenquan 2026-01-20 14:43:14 +08:00
  • 0664c6e67a [Doc] Add layer_sharding additional config for DeepSeek-V3.2-W8A8 (#5921) starmountain1997 2026-01-20 12:40:54 +08:00
  • a5b099c73d [Bugfix] Reset incompatible config (#6005) zhangxinyuehfad 2026-01-20 11:02:38 +08:00
  • a8576ec610 [Refactor][EAGLE] 5/N Update attn_metadata by common_attn_metadata (#5869) lilinsiman 2026-01-20 10:06:00 +08:00
  • f58e110afe [feat] switch for fusion ops gmmswigluquant (#5992) aipaes 2026-01-19 21:19:25 +08:00
  • 38cfcd572a [doc](cp) correct the prefill of GQA and adjust desc of block table. (#5697) Qiu 2026-01-19 18:53:48 +08:00
  • f0d41199a6 [Performance] Remove index operation when VLLM_ASCEND_FLASHCOMM2_PARALLEL_SIZE=1 (#5936) Levi 2026-01-19 17:12:13 +08:00
  • bc486d9530 [main][bugfix] fix mooncake kv cache transfer when one P has multi nodes (#5960) wangxiaochao6 2026-01-19 16:35:13 +08:00
  • ebb940691f [Feature] Adapt DispathGmmCombineDecode operator to align with weight scale dtype of small operators. [RFC: issue 5476] (#5755) wangqiankun13 2026-01-19 16:10:43 +08:00
  • 687df88151 [Refactor] Move AttentionSpec initialization to Attention module (#5834) LICO67373 2026-01-19 14:22:18 +08:00
  • 83de5385b4 [EPLB][Bugfix] policy_swift_balancer bugfix and renaming (#5897) LI SHENGYONG 2026-01-19 13:47:40 +08:00
  • b27774dbd6 [CI]fix for lint CI (#5982) SILONG ZENG 2026-01-19 09:49:28 +08:00
  • c929bd1e8d [Fusion] [Graph]Add Matmul Allreduce Rmsnorm fusion Pass (#5034) Icey 2026-01-19 09:28:07 +08:00
  • 9cad1a8349 [Refactor] Migrate profiler config from env vars to explicit ProfilerConfig (#5928) meihanc 2026-01-19 09:27:55 +08:00
  • bc1f6713e7 [EPLB][Bugfix] Dispatch Allgather use log2phy if enable eplb (#5933) LI SHENGYONG 2026-01-19 09:24:25 +08:00
  • 9fed2636cb [EPLB][Nightly][Bugfix] Get expert from moe layer only (#5908) LI SHENGYONG 2026-01-19 09:23:28 +08:00
  • ad3a1eaf70 [Bugfix][MM] Fix multi-modal inference OOM issues by setting expandable_segments:True (#5855) Shanshan Shen 2026-01-19 09:17:31 +08:00
  • 0eafed9bd6 [doc]Table split (#5929) herizhen 2026-01-19 09:15:04 +08:00
  • c4fde5c064 [Doc] Upgrade outdated ut doc (#5937) Li Wang 2026-01-19 09:12:46 +08:00
  • 329961b375 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977) SILONG ZENG 2026-01-19 08:59:46 +08:00
  • 2b6dc100b5 Eagle3 mm support, enablement on qwen3vl (#4848) Song Zhixin 2026-01-19 08:58:07 +08:00
  • 05e69b99e5 [Doc] Remove Chinese characters from the icons in the doc. (#5959) zzhxxx 2026-01-18 07:22:57 +08:00
  • fff5df3efe [P/D]The issue of solving the force-free secondary release request, which causes the node to crash. (#5968) wangxiaoteng888 2026-01-17 18:49:27 +08:00
  • 22f253142a [Feature] Support fine-grained shared expert overlap (#5482) Jade Zheng 2026-01-17 11:53:22 +08:00
  • 48e10de8c9 [Bugfix] fix cpu offload hang with tp=1 (#5963) lidenghui1110 2026-01-17 11:50:13 +08:00
  • 1ffca8673f [Feature]: Support 310P device run qwen2.5/3 dense and qwen2.5vl models (#5776) Shaoxu Cheng 2026-01-17 11:49:18 +08:00
  • 7feb74590b Revert "[bugfix]limit graph replay sync (#5761)" (#5965) Angazenn 2026-01-16 23:29:35 +08:00
  • 52086394ae [Lint]Style: Convert vllm-ascend/compilation to ruff format (#5912) SILONG ZENG 2026-01-16 20:57:46 +08:00
  • 3af91e5ac4 [Bugfix] Fix the input constraints checks for the mlapo and bmm_transpose operators (#5764) rjg-lyh 2026-01-16 17:52:48 +08:00
  • 4f446aec4c [CI] Add DeepSeek-V3.2-W8A8-Pruning e2e test (#5922) zhangxinyuehfad 2026-01-16 15:49:57 +08:00
  • 69b170b8b5 [CI] skip 310 test for full test (#5943) wangxiyuan 2026-01-16 10:36:20 +08:00
  • 73a3f822c7 [Main2Main] Upgrade vllm commit to releases/v0.14.0 (#5911) wjunLu 2026-01-15 23:22:43 +08:00
  • 372f979aa5 [CI] Add DeepSeek R1 W8A8 HMB nightly ci (#5874) zhangxinyuehfad 2026-01-15 20:48:20 +08:00
  • 44d3b4d61a [Misc] Cleanup useless file and code (#5877) wangxiyuan 2026-01-15 20:32:47 +08:00
  • 3cb0af0bcf [Refactor]Refactor of vllm_ascend/distributed module (#5910) lty 2026-01-15 16:26:53 +08:00