Commit Graph

  • 4230bc8646 [Bugfix]Modify NPU rotary encoding parameter fields,fix RopeOperation setup failed in condition of self.rotary_dim < self.head_size (#6310) wubin58 2026-01-30 21:25:04 +08:00
  • 77ea873224 fix: resolve sync bug in DispathFFNCombine when expert num per card is 32 (#6416) xulei 2026-01-30 21:21:20 +08:00
  • 56f5d3bd49 [Fix] Pads query_start_loc to satisfy FIA/TND constraint (#6357) Yizhou 2026-01-30 16:41:44 +08:00
  • f2990f7741 [e2e Test][npugraph_ex]add static kernel e2e test case (#6320) ChenCangtao 2026-01-30 16:24:48 +08:00
  • 8969b94a14 [Nightly] Correct nightly image build ref (#6420) Li Wang 2026-01-30 15:55:58 +08:00
  • d252e4f5ec [P/D] Using the cache load operator to replace the index select operator. (#6295) liziyu 2026-01-30 14:27:53 +08:00
  • 70cc5f7969 [bugfix]fix rope_forward_triton error (#6404) Wang Kunpeng 2026-01-30 14:09:00 +08:00
  • 46cee945b3 [doc][npugraph_ex]add npugraph_ex introduction doc (#6306) ChenCangtao 2026-01-30 11:21:37 +08:00
  • 1d661bb279 [Bugfix] Specify tensorflow version in accuracy test to avoid segmentation fault (#6292) zhangxinyuehfad 2026-01-30 09:28:24 +08:00
  • b2857de43f [ST]Add e2e test for Npugraphex_pass (#6388) CodeCat 2026-01-30 09:14:07 +08:00
  • 4970de4242 [CI] Enable the skipped cases when HDK is upgraded to 25.5.0 (#6195) wjunLu 2026-01-29 22:41:41 +08:00
  • e35f304419 [CI] Auto partition for test cases (#6379) Li Wang 2026-01-29 20:28:10 +08:00
  • 14bd55f30c [P/D][BugFix] Fix layerwise P/D request_id error (#6360) zxr2333 2026-01-29 20:19:05 +08:00
  • feab047084 [bugfix](pcp,gqa) set kv_inverse_idx_for_chunk and cp_kv_recover_idx_for_chunk to None when dcp only (#6317) Qiu 2026-01-29 19:35:52 +08:00
  • 50e0e87646 [bugfix](CP,MLA) fix wrong slot_mapping of decode for mixed p/d batch (#6344) Qiu 2026-01-29 16:48:37 +08:00
  • 6a7b3bc29c Qwen3-VL-MoE EAGLE support for vLLM-Ascend (#6327) Sergey-Zlobin 2026-01-29 11:44:30 +03:00
  • 41a52beb26 [bugfix] resolve kv cache leak on P-side due to incorrect req_id (#6325) JiangWeixiang 2026-01-29 16:05:56 +08:00
  • 597091be9f [Doc] Reranker guide remove deprecated task option (#6385) Nengjun Ma 2026-01-29 16:00:26 +08:00
  • 7a5b345dc4 [Misc] Drop deepseek patch (#6288) wangxiyuan 2026-01-29 14:45:50 +08:00
  • 39f8af9d96 [Main2Main][BugFix] Add shared_experts check for AscendSharedFusedMoE (#6335) whx 2026-01-29 08:47:20 +08:00
  • f0ff2cc22d [CI] hot fix for nightly image build tag (#6367) Li Wang 2026-01-28 23:29:50 +08:00
  • 86b6ecac4c [CI][BugFix] Import error fix. (#6293) InSec 2026-01-28 22:07:47 +08:00
  • df588ed488 [BugFix] Disable enable_shared_expert_dp by default if tensor_parallel_size=1 (#6361) hucong 2026-01-28 22:01:01 +08:00
  • 8b0a7b6d80 [CI] Nightly tests use releases/v0.13.0 (#6355) Li Wang 2026-01-28 21:46:13 +08:00
  • 501bb395b1 [CI] Fix image build (#6333) Li Wang 2026-01-28 21:36:44 +08:00
  • 245c1ca241 [0.14.1][bugfix][sched] fix incompatibility of RecomputeScheduler with vllm v0.14.1 (#6286) linfeng-yuan 2026-01-28 20:16:58 +08:00
  • e25ee65729 [Misc][Test] add e2e test for apply_top_k_top_p_custom kernel (#6348) linfeng-yuan 2026-01-28 17:25:57 +08:00
  • 857c533e27 [CI]: add production safeguards for 300I (#6343) Shaoxu Cheng 2026-01-28 16:43:48 +08:00
  • 9fadc8df4f [Fixbugs]: fix refactor cause to 310p chunkprefill error (#6340) Shaoxu Cheng 2026-01-28 16:41:32 +08:00
  • 325cb16e3f [BugFix][CI]Fix DeepSeek-R1-W8A8-longseq nightly CI (#6297) dsxsteven 2026-01-28 16:36:24 +08:00
  • ac963f1519 [Fix] Adds CUDA graph stats to execution state (#6331) Yizhou 2026-01-28 16:34:20 +08:00
  • 379ce599d0 [Bugfix] Add missing draft_attn_metadatas parameter to fix MTP test (#6232) LICO67373 2026-01-28 14:41:18 +08:00
  • f8e76a49fa [CI] Upgrade trasnformers version (#6307) wangxiyuan 2026-01-28 14:06:39 +08:00
  • c498cea22d [refactor] refactor excute_model and _dymmy_run method (#6043) Wang Kunpeng 2026-01-27 22:27:01 +08:00
  • 41eb71d665 [Refactor] profiler config optimze (#6141) TMC 2026-01-27 22:09:50 +08:00
  • 54e8389f8e [Graph][Fusion] Add MatmulAllReduceAddRMSNorm graph fusion for npugraph_ex. (#6006) CodeCat 2026-01-27 16:41:48 +08:00
  • 21b6779a33 [UT]: refactoring 310p ops ut (#6296) pu-zhe 2026-01-27 16:31:51 +08:00
  • 57fd6e4bd9 [Refact.]: refactoring 310p-kv cache allocator, align with main branch (#6270) pu-zhe 2026-01-27 16:26:48 +08:00
  • 5e34c70ffc [Misc] Removes unnecessary graph size re-initialization (#6280) Angazenn 2026-01-27 14:38:07 +08:00
  • fea197ad50 [Main2Main] Upgrade vllm commit to 0123 (#6169) meihanc 2026-01-27 08:44:36 +08:00
  • 9780a995e1 [BugFix] Fix wheel package build workflow (#6276) Icey 2026-01-26 20:42:17 +08:00
  • 595b57c4d4 [CI][BugFix] Qwen3-Next nightly test fix. (#6247) InSec 2026-01-26 19:53:53 +08:00
  • d9979f4d13 [Doc] quick fix for vllm-ascend version (#6278) wangxiyuan 2026-01-26 19:33:18 +08:00
  • cb553f8eee [Community] Nominate whx-sjtu as maintainer (#6268) wangxiyuan 2026-01-26 19:22:26 +08:00
  • 43be004379 [Lint] Fix mypy issue to make CI happy (#6272) Li Wang 2026-01-26 17:54:00 +08:00
  • 29fb27d3bb BugFix: Fix moe_load accumulation error in ACL graph mode (#6182) Mercykid-bash 2026-01-26 17:18:46 +08:00
  • 2d3b8a51f9 [Patch] Remove the patch of ECExampleConnector (#5976) Canlin Guo 2026-01-26 17:10:03 +08:00
  • b390e0ef78 [Bugfix] Fix PP+PCP and PP+flashcomm1 bugs (#5416) Jingchun Gao 2026-01-26 16:53:07 +08:00
  • 7d119df2a9 [Feat] proxy delay to remove instances (#5934) yuxinshan 2026-01-26 16:29:45 +08:00
  • de095c5fed [CI] Add workfolw_dispatch for nightly image build (#6269) Li Wang 2026-01-26 15:56:38 +08:00
  • 1645546661 [bugfix][npugraph_ex]fix static kernel uninstall issue (#6128) ChenCangtao 2026-01-26 15:03:18 +08:00
  • f910cebe04 [Doc] 310P Documents update (#6246) Nengjun Ma 2026-01-26 14:33:21 +08:00
  • 0bb1f91c2c [Feature] Mooncake connector get remote ptp size (#5822) yuxinshan 2026-01-26 14:28:33 +08:00
  • 611e223b7d [EPLB][Bugfix] EPLB support fp/bf16 (#5531) LI SHENGYONG 2026-01-26 14:28:16 +08:00
  • 52d4acfa51 [Doc] add release note for v0.14.0rc1 (#6225) wangxiyuan 2026-01-26 14:22:40 +08:00
  • 1f26f83e34 [CI] Bump actions/checkout from 4 to 6 (#6255) dependabot[bot] 2026-01-26 14:21:00 +08:00
  • ae71c4237e [CI] Bump actions/setup-python from 6.1.0 to 6.2.0 (#6256) dependabot[bot] 2026-01-26 14:20:14 +08:00
  • c26ad78f86 [CI][lint] Add rule codespell back (#6236) Li Wang 2026-01-26 14:12:33 +08:00
  • f4abd9b7b5 [CI] Fix 310p image build (#6259) wangxiyuan 2026-01-26 14:11:56 +08:00
  • 65289676b4 [Refactor] Separate _prepare_inputs to _prepare_inputs and _preprocess (#6191) Canlin Guo 2026-01-26 14:05:23 +08:00
  • e3eefdecbd [Doc] Update max_tokens to max_completion_tokens in all docs (#6248) Shanshan Shen 2026-01-26 11:57:40 +08:00
  • 418fccf0bc [310P]: fix 310p image cannot build (#6238) Shaoxu Cheng 2026-01-26 11:37:19 +08:00
  • 76ac688388 [MM][Perf] Parallelize Q/K/V padding in AscendMMEncoderAttention for better performance (#6204) Shanshan Shen 2026-01-26 10:20:24 +08:00
  • ce11fd49f3 [Feature] Batch invariant torch.compile (#6107) huangning1995 2026-01-26 09:15:06 +08:00
  • 96309e2b79 [ops] support advanced apply_top_k_top_p without top_k constraint (#6098) linfeng-yuan 2026-01-26 09:08:42 +08:00
  • 4e3919e965 Reapply "[Refactor] Unify full-graph parameter update logic (#6041)" (#6227) (#6231) wangxiyuan 2026-01-26 09:04:54 +08:00
  • c38c838d03 [CI] Decrease Qwen3 dense model output throughput baseline to make ci happy (#6233) Li Wang 2026-01-26 09:04:13 +08:00
  • 63adbedb7a [Worker] Implement update max_model_len interface for NPUWorker (#6193) Li Wang 2026-01-26 09:03:33 +08:00
  • ca297eb57f [CI] Migrate e2e test runner to hk (#5344) Li Wang 2026-01-26 09:00:51 +08:00
  • 99bdd7363c [CI] update vLLM to 0.14.1 (#6222) wangxiyuan 2026-01-25 17:52:16 +08:00
  • 384d84c7ef [Bugfix] Avoided a bug of drafter when dp and sp are enabled (#6226) drslark 2026-01-25 17:45:29 +08:00
  • b45bd92c2b [Bugfix] Add defensive check for multimodal_config (#6230) Canlin Guo 2026-01-25 17:39:19 +08:00
  • 2928ae2af5 [Image] fix 310p image build (#6228) wangxiyuan 2026-01-25 16:07:13 +08:00
  • 95649344aa Revert "[Refactor] Unify full-graph parameter update logic (#6041)" (#6227) wangxiyuan 2026-01-25 15:25:38 +08:00
  • 7799c4ca3b [Fusion] change fusion env variable (#6201) Icey 2026-01-24 22:49:33 +08:00
  • 6ccccad102 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #5) (#5996) SILONG ZENG 2026-01-24 22:45:38 +08:00
  • 7faa6878a6 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #3) (#5978) SILONG ZENG 2026-01-24 22:10:18 +08:00
  • 4e53c1d900 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #6) (#6001) SILONG ZENG 2026-01-24 22:08:33 +08:00
  • 153da1a669 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #4) (#6200) SILONG ZENG 2026-01-24 20:40:48 +08:00
  • fbae41697e [310P]: refactoring for 310p kvcache and some ops class (#6117) Shaoxu Cheng 2026-01-24 20:34:29 +08:00
  • 5b746f3e83 [Inductor]change pass to adapt to new addrmsnormBias operator (#6094) Angazenn 2026-01-24 20:16:44 +08:00
  • 8966a99710 [Refactor] Unify full-graph parameter update logic (#6041) LICO67373 2026-01-24 20:12:57 +08:00
  • 8129c429ef [Doc] Improved English grammar and integrated the DeepWiki badge for Ask AI (#6216) Zeng haolong 2026-01-24 20:11:18 +08:00
  • 4fcacca8a6 [BugFix] Fix build wheel (#6218) Icey 2026-01-24 20:08:20 +08:00
  • fc26260d84 [BugFix] buildwheel dependency install (#6212) Icey 2026-01-24 17:11:55 +08:00
  • 21833a4321 [Doc] Add release note for 0.13.0rc2 (#6207) wangxiyuan 2026-01-24 12:51:47 +08:00
  • f66bcdfb29 [P/D] Mooncake connector add zmq socket fail log (#6155) liziyu 2026-01-24 12:06:42 +08:00
  • 14bef9af6f [P/D] Remove restrictions on mooncake for IPv6 (#5946) liziyu 2026-01-24 11:30:22 +08:00
  • 019a2fe6e6 [Eagle3]enhance skipping dp allreduce and add it into eagle proposer (#6192) Angazenn 2026-01-24 11:29:42 +08:00
  • 56d8f088dd [Doc] Update DeepSeek-V3.2 tutorail, add single-node and multi-node deployment (#6196) zhangyiming 2026-01-24 11:29:07 +08:00
  • 2dd68652bc [Doc] Add the setting description of cudagraph_capture_sizes in speculative decoding user guide (#5637) zhaomingyu13 2026-01-23 23:22:44 +08:00
  • a2f022f9b6 [UCMConnector]Add has_connector_metadata (#6172) UnifiedCacheManager 2026-01-23 21:16:48 +08:00
  • 717d299ae5 [BugFix]bug fix for dispatch_ffn_combine (#6156) lhchg 2026-01-23 21:14:18 +08:00
  • 44a4ff6960 [main][BugFix] Avoided a bug of torch_npu.npu_mm_reduce_scatter_base when sp size >= 16 (#6168) drslark 2026-01-23 21:12:23 +08:00
  • e90b14140b [feature] add_rms_norm support bias (#5790) yjmyl 2026-01-23 21:09:54 +08:00
  • 6c73b88dd6 [CI] Enable FLASHCOMM1 with layer_sharding and FULL_DECODE_ONLY in ds32 testing (#6115) starmountain1997 2026-01-23 19:48:37 +08:00
  • 8786412f5c [Bugfix]KV pool rank 0 consumes more HBM (#6113) baxingpiaochong 2026-01-23 19:47:33 +08:00
  • bdf65e6bd3 [TEST]Add mooncake common method for tests (#6194) jiangyunfan1 2026-01-23 17:14:15 +08:00
  • 1e116829ac [doc]update --max-num-seqs in Qwen3-235b tutorial (#6197) Angazenn 2026-01-23 17:11:10 +08:00
  • af4dbb6b26 [CI] Use nginx for package cache to speed up CI (#6170) Li Wang 2026-01-23 16:56:16 +08:00