Commit Graph

  • c3c265648f [Ops][BugFix] Fix RoPE shape mismatch for mtp models with flashcomm v1 enabled (#6939) Zhujiyang2 2026-03-04 16:02:08 +08:00
  • 95b44d7b73 [bugfix]fix file not found error in nightly of single-node (#6976) SILONG ZENG 2026-03-04 11:47:26 +08:00
  • 52d9086f64 [Bugfix] Fix the acceptance rates drop issue when applying eagle3 to QuaRot model (#6914) zhaomingyu13 2026-03-04 11:29:49 +08:00
  • d431d7d526 [CI] Enable auto upgrade e2e estimated time for auto-partition suites (#6840) Li Wang 2026-03-04 10:38:34 +08:00
  • c7fd7a25f7 [Doc][Misc] Fix msprobe_guide.md documentation issues (#6965) NJX 2026-03-04 10:28:31 +08:00
  • 859f2c25b9 [Nightly][Refactor]Migrate nightly single-node model tests from .py to .yaml (#6503) SILONG ZENG 2026-03-03 20:13:43 +08:00
  • a0a904a3d4 [BugFix] Improve GDN layer detection for multimodal models (#6941) Cao Yi 2026-03-03 20:08:39 +08:00
  • 5b05b3a090 [feat]ds3.2 pcp support mtp and chunkprefill (#6917) weiguihua2 2026-03-03 19:03:50 +08:00
  • b771ca9a47 [CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs (#6945) Frank Chen 2026-03-03 17:20:52 +08:00
  • 700423156f [Triton] Centralize Ascend extension op dispatch in triton_utils (#6937) linfeng-yuan 2026-03-03 17:10:30 +08:00
  • cb893bcdb0 [csrc][bugfix] Add compile-time Ascend950/910_95 compatibility for custom ops between CANN8.5 and 9.0 (#6936) linfeng-yuan 2026-03-03 17:08:22 +08:00
  • 2064afe380 [300I][Bugfix] fix unquant model weight nd2nz error (#6851) Shaoxu Cheng 2026-03-03 15:57:26 +08:00
  • f19f7b1fe2 [doc] fix supported_models (#6930) zzzzwwjj 2026-03-03 09:47:50 +08:00
  • 248d07566f [CI] nightly test timeout (#6912) starmountain1997 2026-03-03 09:31:46 +08:00
  • f7a8befc20 [CI] Upgrade CANN to 8.5.1 (#6897) Xiaoshuang Wang 2026-03-03 09:02:42 +08:00
  • 15f6564976 [Model]Add Qwen3-Omni quantization Ascend NPU adaptation and optimization (#6828) tanhaoan333 2026-03-03 00:07:23 +08:00
  • dfa9ff7f2a [P/D][v0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0 (#6898) wangxiaoteng888 2026-03-02 23:24:03 +08:00
  • 5899438a86 [Feat][310p] 310P support w8a8s quantization and saving w8a8sc state (#6878) pu-zhe 2026-03-02 20:09:15 +08:00
  • 68d8d20ca2 [misc] move mxfp_compat into device to decouple from quantization init chain (#6918) linfeng-yuan 2026-03-02 18:17:01 +08:00
  • 632801b0ad [CI][310P] Add 310p tracked files in CI light. (#6923) pu-zhe 2026-03-02 18:03:46 +08:00
  • 16c879cdf7 [Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518) whx 2026-03-02 17:54:25 +08:00
  • 8547520726 [Doc][Misc] Update AGENTS.md with sign-off and PR template requirements (#6892) realliujiaxu 2026-03-02 16:44:59 +08:00
  • 9180dd6c51 [BugFix][PCP] Fix precision bugs for pcp/dcp in PD disaggregate (#6876) Yuzhou Tong 2026-03-02 16:11:00 +08:00
  • ddc78dbade [300I] support decode-only aclgraph mode (#6849) Shaoxu Cheng 2026-03-02 14:15:14 +08:00
  • 86c9109d16 Bump actions/upload-artifact from 6 to 7 (#6906) dependabot[bot] 2026-03-02 14:08:28 +08:00
  • 002ec24dd8 Bump actions/download-artifact from 7 to 8 (#6907) dependabot[bot] 2026-03-02 14:07:59 +08:00
  • 3c66a970f2 add mxfp8 moe quantization (#6670) Eric-dot 2026-03-02 11:04:06 +08:00
  • c324053b44 [CI] Revert speedup image building and CI Installation related PRs (#6891) wjunLu 2026-03-02 08:53:10 +08:00
  • a77fe932e4 [Platform] Fix CPU binding logic (#6889) Frank Chen 2026-03-01 20:30:43 +08:00
  • 5e24b26a54 [Bugfix] rename enable_flash_comm_v1 back to enable_sp (#6883) realliujiaxu 2026-03-01 20:22:50 +08:00
  • 8835236181 [Image] Fix docker image merge tag settings (#6884) wjunLu 2026-03-01 12:20:57 +08:00
  • 9d09488b4a [Feat] support basic pcp&dcp for qwen3next (#6091) Bai Yongbin 2026-02-28 21:44:08 +08:00
  • 64fba51275 [Bugfix] Fix openEuler dockerfile error (#6871) wjunLu 2026-02-28 20:55:18 +08:00
  • 5ffae03156 [bugfix] fix capture shape in sp_eagle_fullgraph (#6846) starmountain1997 2026-02-28 17:30:02 +08:00
  • 81fb7d5779 [Doc] add 310P3 guidance of PaddleOCR-VL (#6837) zyz111222 2026-02-28 16:03:07 +08:00
  • 3cc8bf15da Support platform.get_device_uuid function (#6777) luomin2005 2026-02-28 14:17:12 +08:00
  • 263c2f8e8d [CI] Revert auto rebase (#6867) wjunLu 2026-02-28 11:54:31 +08:00
  • 3d563292f3 clean 0.15.0 support (#6852) wangxiyuan 2026-02-28 09:20:57 +08:00
  • 84b00695f8 [CI] Refactor to speedup image building and CI Installation (#6708) wjunLu 2026-02-28 09:06:00 +08:00
  • 5666ce03f5 [bugfix] Fixed an accuracy problem of gdn layer in graph (#6822) drslark 2026-02-28 08:57:53 +08:00
  • 9cd0d6c33d [Doc][Misc] Update release notes for v0.15.0rc1 (#6859) wangxiyuan 2026-02-27 22:35:09 +08:00
  • b60b991005 [CI] Add nightly test for Qwen3-235B-A22B with mooncake layerwise connector (#5441) wjunLu 2026-02-27 16:31:02 +08:00
  • c13d90b766 [Refactor][EAGLE] 7/N Merged PCP and disable_padded interface (#6811) lilinsiman 2026-02-27 16:06:56 +08:00
  • e4458b2d2b [Main2Main] Upgrade vLLM to 0226 (#6813) Canlin Guo 2026-02-27 16:05:21 +08:00
  • 80316c5824 [DOC] enable both flashcomm1 and cudagraph (#6807) starmountain1997 2026-02-27 14:52:55 +08:00
  • 3d43ed997e add release note for 0.15.0rc1 (#6839) wangxiyuan 2026-02-27 11:55:55 +08:00
  • a95c0b8b82 [Doc] fix the nit in docs (#6826) wangxiyuan 2026-02-27 11:50:27 +08:00
  • 981d803cb7 [CI] Fix doc test fail when load model with error information: 'Stale file handle' (#6832) Nengjun Ma 2026-02-27 09:14:42 +08:00
  • 5def28dcd3 [Feat]support sequence parallelism by pass for VL models (#5632) realliujiaxu 2026-02-27 08:27:41 +08:00
  • ed175d6d92 [Doc][Release] Add release note skill (#6824) Yikun Jiang 2026-02-26 21:01:21 +08:00
  • 2d49f9079a [BugFix] Support ALL D-Nodes in fullgraph when running MTP in PD (#5472) MengLong Chen 2026-02-26 19:09:05 +08:00
  • 532f7a82f2 [Patch][Misc] Cleanup and update patches (#6802) wangxiyuan 2026-02-26 14:45:33 +08:00
  • c9d05d10aa [Doc][Misc] Refactor skill documentation and add Claude support instructions (#6817) wangxiyuan 2026-02-26 14:42:59 +08:00
  • e76b69b9ef [BugFix] [310p] Fix attention accuracy issue (#6803) pu-zhe 2026-02-26 14:30:39 +08:00
  • 9f8b84e5fc [Misc] Drop patch_rope.py (#6291) Canlin Guo 2026-02-26 14:04:53 +08:00
  • 3953dcf784 [Feature][Quant] Auto-detect quantization format from model files (#6645) Cao Yi 2026-02-26 10:59:25 +08:00
  • bc1622338c [CI] Add long and short prompt tests for DeepSeek-V3.2 (#6536) starmountain1997 2026-02-26 10:58:50 +08:00
  • 169e434f78 [CI] Fix EAGLE CI problems (#6702) Dijurido 2026-02-26 10:26:01 +08:00
  • 2870f7c8ad [Feat] Support routing replay (#6696) Li-Yongwen 2026-02-26 10:22:47 +08:00
  • a9cca0c5c4 [Refactor] Modify the binding logic, added memory migration and interrupt core binding functions. (#6785) Rozwel-dx 2026-02-26 08:49:50 +08:00
  • 3a4292e5b7 [MM][Perf] Use seq_lens CPU cache to avoid frequent d2h copy for better performance (#6448) Shanshan Shen 2026-02-26 08:49:36 +08:00
  • 29e3cdde20 [Doc][Skill] Introduce AI-assisted model-adaptation workflow for vllm-ascend (#6731) jack 2026-02-26 08:48:15 +08:00
  • 3b59d0ebe9 [Doc][Feature] Add vLLM Ascend development guidelines AGENTS.md (#6797) wangxiyuan 2026-02-26 08:47:46 +08:00
  • aa7fb5d707 [Bugfix] Fix DeepseekV3.1 Accuracy issue (#6805) Zhu Yi Lin 2026-02-25 23:02:00 +08:00
  • e3927cc8f5 [Bugfix] fix bug for mtp (#6514) bowenli 2026-02-25 17:50:57 +08:00
  • ed051737e9 [Bugfix] Support Kimi-K2.5 models (#6755) LoganJane 2026-02-25 14:51:46 +08:00
  • 4efd362bac [fix]change num_commmon_tokens to num_common_tokens (#6792) kx 2026-02-25 14:48:54 +08:00
  • 2260af405f [DOC] add request forwarding (#6780) starmountain1997 2026-02-25 14:43:51 +08:00
  • ad9d9569ea [Bugfix] Add the missing parentheses to @torch.inference_mode (#6757) Canlin Guo 2026-02-25 14:37:53 +08:00
  • 957804df56 [Refactor][Bugfix] Use upstream mem_utils for profiling and correct non-torch memory recorded during profiling (#6625) Shanshan Shen 2026-02-25 14:28:08 +08:00
  • 812c722cfb [KVPool][BugFix] Correctly initialize head_or_tp_rank for mooncake backend (#6498) DreamerLeader 2026-02-25 14:22:00 +08:00
  • 3da2ba22eb [Platform] Enable ARM-only CPU binding with NUMA-balanced A3 policy and update docs/tests (#6686) Frank Chen 2026-02-25 11:15:14 +08:00
  • ac9a7d1301 [Nightly] Increase VLLM_ENGINE_READY_TIMEOUT_S to avoid nightly failure (#6778) Li Wang 2026-02-25 10:14:51 +08:00
  • db51a1b9b6 [Feat]ds3.2 support pcp (#6733) weiguihua2 2026-02-25 09:46:57 +08:00
  • ee59429015 upgrade main to 0212 (#6712) Icey 2026-02-25 09:17:29 +08:00
  • 0331f16a50 [EPLB] Reduce the memory used for heat aggregation (#6729) LI SHENGYONG 2026-02-24 18:02:24 +08:00
  • 5c8ab7af39 [main]update release note & support matrix (#6759) zzzzwwjj 2026-02-24 17:39:35 +08:00
  • a8e951e6f5 [Feat] 310p supports PrefillCacheHit State (#6756) pu-zhe 2026-02-24 16:48:05 +08:00
  • 62ea664aa7 [Lint]Style: Convert test/ to ruff format(Batch #5) (#6747) SILONG ZENG 2026-02-24 15:50:00 +08:00
  • 747484cb64 [Bugfix] Fix wrong computed_tokens when meet exception. (#6522) xleoken 2026-02-24 15:29:30 +08:00
  • ff29e029de [EPLB][Bugfix] Bugfix for ineffective dynamic eplb (#6653) LI SHENGYONG 2026-02-24 14:43:04 +08:00
  • f41eeeb11e Refactor the ops PyTorch adapter,cleanup for csrc/torch_binding.cpp (#6732) luomin2005 2026-02-24 09:12:43 +08:00
  • f0caeeadcb [CI] unlock when load model (#6771) Nengjun Ma 2026-02-14 18:54:04 +08:00
  • 70e26551cf [Doc] modify glm doc (#6770) yydyzr 2026-02-14 16:47:23 +08:00
  • e2237819a9 [CI]Fixed the spell check function in typos.toml (#6753) SILONG ZENG 2026-02-14 11:57:26 +08:00
  • 64aea60f2e [EPLB][Nightly] Refactor UT (#6543) JIACHENG XU 2026-02-14 10:56:29 +08:00
  • 1e77077788 [Bugfix][DispatchFFNCombine] resolve vec error caused by unaligned UB access (#6707) xulei 2026-02-14 10:32:50 +08:00
  • e2175d9c7e [Lint] Adapt lint tools for windows (#6727) whx 2026-02-13 15:53:16 +08:00
  • 6de207de88 [main][Docs] Fix typos across documentation (#6728) Cao Yi 2026-02-13 15:50:05 +08:00
  • b6bc3d2f9d [Feat.][310P]: weightNZ feature with quant or unquant. (#6705) Shaoxu Cheng 2026-02-13 15:41:02 +08:00
  • f40256b697 [Feat.][310P] addrmsnorm for 300I DUO (#6704) Shaoxu Cheng 2026-02-13 15:40:49 +08:00
  • 7164990904 [Graph][Fusion] Integrating inductor pass and npugraph ex pass (#6354) Icey 2026-02-13 15:34:55 +08:00
  • 87a0b7b7c7 [bugfix] adapt bugfix for norm_quant_fusion_pass to npugraph_ex (#6726) iiiklw 2026-02-13 10:10:39 +08:00
  • 41d056f947 [doc] add A2 series doc for GLM5.md (#6717) taoyao1221 2026-02-12 16:08:17 +08:00
  • b881fab416 [P/D][PCP] mooncake layerwise support pcp function (#6627) wangxiaoteng888 2026-02-12 11:02:25 +08:00
  • 8b23554741 [Misc] gen kv events in ascendconnector (#6593) yejj 2026-02-12 11:01:09 +08:00
  • 7221045777 [Attention] add gpt-oss support (#5901) jiahao.quan 2026-02-12 10:55:34 +08:00
  • f71812011d [Feature] DispatchGmmCombineDecode support bf16/float16 gmm1/gmm2 weight and support gmm weight with ND format (#6393) lih827 2026-02-12 10:37:41 +08:00
  • f1ffb5fb19 [Feature] adapt to uva buffer and main2main (#6657) Ronald 2026-02-12 10:36:31 +08:00
  • 56269eae0e [BugFix] Fix AddRMSNormQuant not taking effect (#6620) ZYang6263 2026-02-12 09:26:05 +08:00