Commit Graph

  • 389030a8f8 add env vars & misc main starkwj 2026-02-11 06:27:58 +00:00
  • 739d074b0c update other platforms' Dockerfile starkwj 2026-01-22 12:07:03 +00:00
  • 2a571d8bc8 support multi npu partially starkwj 2026-01-08 06:54:33 +00:00
  • fa0fb46853 fix reload return value starkwj 2026-01-07 07:42:30 +00:00
  • 074ae28d6e Update README.md lumian 2026-01-05 20:33:31 +08:00
  • caf0289e1a add Dockerfile and readme starkwj 2026-01-05 09:10:56 +00:00
  • 135cc0a505 vllm-ascend vnpu v1 starkwj 2025-12-26 07:37:35 +00:00
  • 2f1aed98cc [Doc] Update version policy to the latest. (#5071) zhangyiming 2025-12-16 15:24:46 +08:00
  • 8c41770f1f [bugfix] fix fp32 trans nz (#5068) zzzzwwjj 2025-12-16 15:04:31 +08:00
  • 11e6d6c291 [doc] update developer guide (#5060) wangxiyuan 2025-12-16 14:09:52 +08:00
  • e07abfaa75 [Doc] Add new contributors. (#5066) zhangyiming 2025-12-16 12:47:40 +08:00
  • ca0823f238 [0.11.0][Bugfix] fix fastapi version (#5052) zhangxinyuehfad 2025-12-16 11:34:11 +08:00
  • 303c08aec9 [Doc] Update structured output doc with upstream link (#5058) Shanshan Shen 2025-12-16 11:32:53 +08:00
  • 2b5b309133 [Bugfix]Fix precision issues in moe_mlp (vllm-ascend v0.11.0-dev) (#5023) Clorist33 2025-12-16 08:40:03 +08:00
  • 87c0cfafa3 [0.11.0][Bugfix] fix fastapi version (#5048) zhangxinyuehfad 2025-12-15 23:51:38 +08:00
  • 01a13a9b77 fix nz for quantization (#4943) wangxiyuan 2025-12-12 14:54:41 +08:00
  • 5932abc446 [Bugfix] Fix the Eagle3 inference failure issue. (#4721) sunchendd 2025-12-12 14:52:29 +08:00
  • 4f0dddc9ee [Bugfix] bugfix for moe_mlp in vllm-ascend/v0.11.0-dev (#4885) Clorist33 2025-12-12 14:51:47 +08:00
  • 9c0ad46c1a [0.11.0][Bugfix] Remove the ZMQ communication setup on the D node (#4916) Slightwind 2025-12-12 14:37:49 +08:00
  • ceadc2788d Revert "[refactor]support gatingtopk operator generalization (#4356)" (#4873) 1092626063 2025-12-10 15:45:20 +08:00
  • 9a144bc7be [Docs][0.11.0] delete AIV env variables in DSV32 documentation (#4833) linfeng-yuan 2025-12-09 15:53:53 +08:00
  • 8f45f9ce29 BugFix: Resolve shape mismatch in eplb update and calculation issues in quant_apply_mlp (#4777) Mercykid-bash 2025-12-09 15:46:58 +08:00
  • 695e5c9ebc [0.11.0][ops] npu_top_k_top_p supports k and p only (#4153) linfeng-yuan 2025-12-09 15:45:40 +08:00
  • 4588d1f215 [CI] Use arm node for unit tests (#4819) Li Wang 2025-12-09 15:45:14 +08:00
  • e0757dc376 [0.11.0]fix the configuration conflicts in documentation (#4824) linfeng-yuan 2025-12-09 15:37:06 +08:00
  • 033e3557cc [cherry-pick]fix qwen3vl mrope op (#4484) (#4811) zhangxinyuehfad 2025-12-09 11:07:32 +08:00
  • 9862a23985 【0.11.0-dev】optimization of kimi-k2 in cann8.3 (#4555) Levi 2025-12-09 08:49:15 +08:00
  • 0d094531b4 [bugfix] Fixed the bug in retrieving the quantization method for mlp.… (#4797) zhangxinyuehfad 2025-12-09 08:47:19 +08:00
  • 4e728f1f40 [Bugfix] fix qwen3-vl-moe shape ERROR during the _prepare_inputs phase under high concurrency. (#4658) Levi 2025-12-08 19:30:16 +08:00
  • d412565ec9 [Cherry-pick]bmm_transpose to v011dev (#3995) Wang Yixuan 2025-12-08 19:22:14 +08:00
  • 6391f0625f [v0.11.0-dev][bugfix] Add branch for stream up-lifting in update_attn_params (#4437) Angazenn 2025-12-08 08:54:46 +08:00
  • 2598124e67 [Image] Correcting the vllm tag of the openeuler image on the A2 device. (#4745) Li Wang 2025-12-06 10:55:22 +08:00
  • 350999c4ef [Bugfix]Fix eplb enable when using mtp float weights. (#4576) offline893 2025-12-05 21:15:32 +08:00
  • c4a11a745a [refactor]support gatingtopk operator generalization (#4356) 1092626063 2025-12-04 20:10:13 +08:00
  • 593a96056c 【EPLB】Eplb Redundant Experts Bugfix (#4232) LI SHENGYONG 2025-12-03 12:00:05 +08:00
  • b6d63bbd52 [v0.11.0-dev][CI] Fix ngram lacking of input arg dummy_compute_logits error (#4648) Mengqing Cao 2025-12-03 09:22:07 +08:00
  • 865f1f7fc8 [Bugfix] Resolve the interface compatibility issue of get_input_embeddings in MM (#4638) Levi 2025-12-02 22:21:47 +08:00
  • 3b4cb23616 [Bugfix] fix qwen2.5-vl-72b shape ERROR during the _prepare_inputs phase under high concurrency. (#4553) Levi 2025-12-02 14:20:45 +08:00
  • 52abd47f8c [Bugfix][SHM] Use writer lock by default and remove redundant env (#4117) Zetong Li 2025-12-01 22:27:01 +08:00
  • 76d0ba4342 [Image][Build] Cherry pick #4062 from main (#4506) Li Wang 2025-12-01 11:39:40 +08:00
  • 2b4f7a5016 [cherry-pick pr-4254] bugfix for mtp>1 when lm_head_tp>1 (#4360) zouyida2052 2025-12-01 11:11:15 +08:00
  • cd9f5c0611 [bugfix] dep ineffective (#4416) LI SHENGYONG 2025-11-29 15:19:11 +08:00
  • 71acc8ddeb For nz unset in bf16&fp16 (#4495) henryxuxu0716 2025-11-28 17:32:25 +08:00
  • 96c362361e [0.11.0][TEST] Delete Comment (#4428) Zhu Yi Lin 2025-11-25 21:39:36 +08:00
  • a686f2962a [0.11.0][Bugfix] fix e2e full test (#4424) zhangxinyuehfad 2025-11-25 21:21:42 +08:00
  • cdaf7f4a51 [MM][Bugfix] Minor fix for VL model verification (#4385) Shanshan Shen 2025-11-25 20:36:32 +08:00
  • 386a85eccc [Bugfix]Fix the hang issue of multimodal model when running with DP>1 (#4393) wujinyuan1 2025-11-25 09:32:22 +08:00
  • a3164ac372 [v0.11.0][Bugfix][MoE] enable force_load_balance in aclgraph (#4367) weichen 2025-11-25 09:16:57 +08:00
  • 75452abe1e [Doc][v11.0-dev][cherry-pick]Add single node PD disaggregation instructions (#4370) mazhixin000 2025-11-24 17:23:11 +08:00
  • a2e4c3fe78 Revert "[cherry-pick][refactor]support gatingtopk operator generalization (#4050)" (#4352) wangxiyuan 2025-11-21 23:03:20 +08:00
  • 5ad0ccdc31 [v0.11.0]Upgrade cann to 8.3.rc2 (#4332) SILONG ZENG 2025-11-21 22:48:57 +08:00
  • 0f9025cceb [EPLB] Eplb Verify Fix (#4334) LI SHENGYONG 2025-11-21 18:18:15 +08:00
  • 97ffb9120f [CI] Defaultly compile vllm with multimodal audio feature in dockerfile (#4324) (#4341) Ting FU 2025-11-21 17:53:00 +08:00
  • 218bc70f6f [CI] Remove redundant workflows (#4335) Li Wang 2025-11-21 16:48:35 +08:00
  • 70f076331f [MM][Bugfix] Add error log for VL models when enabling FLASHCOMM (#4222) Shanshan Shen 2025-11-21 15:04:35 +08:00
  • c94b38c82e [Readme] EPLB Support Scenarios (#4315) LI SHENGYONG 2025-11-21 14:25:39 +08:00
  • 9c6d0b422c [v0.11.0-dev][misc]change default capture size for Qwen3-MoE when using full dp (#4205) Angazenn 2025-11-21 11:19:11 +08:00
  • b6d59bdea2 cherry pick from pr 4270 (#4285) shaopeng-666 2025-11-19 22:32:02 +08:00
  • 277670730c [Bugfix][Aclgraph] failed to update graph task (#4282) MengLong Chen 2025-11-19 21:30:48 +08:00
  • c87a77e8b4 [cherry-pick][refactor]support gatingtopk operator generalization (#4050) 1092626063 2025-11-19 10:39:28 +08:00
  • ddf3e75800 [Cherry-pick] [0.11.0] pd proxy support ipv6 and fix proxy (#4242) liziyu 2025-11-18 16:33:00 +08:00
  • 378e92a2a2 [Cherry-pick][0.11.0] Adapted to torch_npu.npu_fused_infer_attention_score (#4202) Icey 2025-11-17 10:56:23 +08:00
  • a7eb42cf0a [v0.11.0-dev][Bugfix][cherry-pick]bugfix for weight load of kimi-k2 (#4190) zhangyiming 2025-11-14 15:43:22 +08:00
  • 51e5806d76 [0.11.0-dev][Bugfix][EPLB] Quick fix for missing log2phy conversion (#4150) weichen 2025-11-13 14:32:40 +08:00
  • cd652acb65 [BugFix] Fix kv_no_split not contiguous (#3711) zhaozx-cn 2025-11-13 11:29:37 +08:00
  • 28a15299ea [cherry-pick][v0.11.0-dev][bugfix] Change seq_lens in dummy attn_metadata to max_query_len (#4099) Angazenn 2025-11-12 20:32:50 +08:00
  • 7732a89fd9 [v0.11.0][UT][Fixbug] Fix UT test (#4151) zhangxinyuehfad 2025-11-12 16:55:18 +08:00
  • 650ce8ad19 [0.11.0][Bugfix] Fix ngram precision issue and open e2e ngram test (#4092) zhaomingyu13 2025-11-11 09:58:03 +08:00
  • 2069bef449 [v0.11.0-dev][bugfix] Fix a bug in wrongly set npu_stream (#4106) Angazenn 2025-11-11 09:16:41 +08:00
  • c5fe179cef [0.11.0] [Cherry-pick #4058] Fixes Qwen3-Next enable nz accuracy problem (#4056) Icey 2025-11-10 20:56:39 +08:00
  • ebd45b6596 [V0.11.0][Core] Restore scheduling logic under default configuration (#4094) rjg-lyh 2025-11-10 20:02:23 +08:00
  • c3c9138719 [Perf] Move attention update stream out of loop to optimize performance (#3985) XiaoxinWang 2025-11-10 17:18:45 +08:00
  • d913f9474b [0.11.0][Fix] Fix Qwen2-Audio-7B-Instruct accuracy test (#4018) zhangxinyuehfad 2025-11-10 11:54:30 +08:00
  • 7ea17fbee3 [0.11.0][BugFix] Improve the performance of prefixcache features (#4021) hucong 2025-11-10 11:51:34 +08:00
  • c2d58c0655 [P/D][BugFix][v0.11.0-dev]Fix proxy format processing errors & Layerwise connector performance optimization (#4069) wangxiaoteng888 2025-11-09 09:55:10 +08:00
  • 55e37f5041 [v0.11.0][Bugfix] fix sleepmode level2 e2e test (#4023) wangx700 2025-11-08 14:11:15 +08:00
  • f9842560cb [0.11.0][Perf] Add padding vision tower for Qwen2_5_Omni (#4041) tingfu 2025-11-08 13:56:05 +08:00
  • d4e2a44307 [Cherry Pick from pr#3981][0.11.0][P/D]Make kv-transfer env variable take effect & Fix load-balance proxy (#3983) zxr2333 2025-11-08 13:52:33 +08:00
  • 8e72758645 [BugFix]Fix grouplist type of mc2. (#4049) offline893 2025-11-07 17:43:23 +08:00
  • 016337eaec [v0.11.0][UT] Add new ut case for aclgraph enable (#4038) lilinsiman 2025-11-07 11:35:24 +08:00
  • f9494d978a [cherry-pick][v0.11.0-dev][bugfix] Fix a rare bug triggered by _npu_paged_attention in FULL_DECODE_ONLY mode (#3987) Angazenn 2025-11-06 23:08:57 +08:00
  • 27547a10e6 [MM][Bugfix] Add MoE verification for multi-modal models (#3897) (#4027) Shanshan Shen 2025-11-06 20:30:40 +08:00
  • 3db53d117e [0.11.0][doc] add aclgraph developer guide (#3947) zzzzwwjj 2025-11-06 09:54:38 +08:00
  • 7ee0b0b5d8 [cherry-pick]Upgrade CANN to 8.3.rc1 (#3945) (#3962) wangxiyuan 2025-11-06 09:05:08 +08:00
  • 66b67f9cf2 [Bugfix][SHM] Fix weak memory ordering problem in share memory (#3988) Zetong Li 2025-11-04 23:07:23 +08:00
  • 954dab64fb [v0.11.0][P/D]Set adxl as default backend and update readme (#3771) zxr2333 2025-11-04 16:06:58 +08:00
  • 0cead5c1ee Quality enhancement: Immediately interrupt execution when allocate NPU memory OOM (#3944) leo-pony 2025-11-04 08:55:22 +08:00
  • 7cc6208029 [0.11.0][MTP][Aclgraph] Fix the support aclgraph with MTP (#3912) Mengqing Cao 2025-11-03 14:25:37 +08:00
  • 8a7154001e [0.11.0]Chery pick pta upgrade change (#3940) wangxiyuan 2025-10-31 22:14:26 +08:00
  • 3d81ea03ed [v0.11.0-dev][bugfix] fix valueError in static_forward_context when prefix is empty (#3929) rjg-lyh 2025-10-31 15:45:06 +08:00
  • 9f7de45b75 [Bugfix] fix MTP support for lmhead_tensor_parallel_size (#3921) Nagisa125 2025-10-31 14:34:28 +08:00
  • ee2e55e602 [v0.11.0][Test] Add new test model for aclgraph single_request v0.11.0 (#3889) lilinsiman 2025-10-31 11:23:55 +08:00
  • 90aca84e60 fix bug when max_seqs=14 in mtp=2 scenario and raise error when cudagraph_capture_sizes can't be an integer multiple of uniform_decode_query_len (#3909) zouyida2052 2025-10-31 09:25:06 +08:00
  • 387ce1cc5b add new e2e tests case for aclgraph memory to v0.11.0 (#3880) lilinsiman 2025-10-31 09:17:09 +08:00
  • 38afd2c9cb [bugfix_v0.11.0]cancel tokenize for layerwise_proxy (#3913) wangxiaoteng888 2025-10-30 23:55:04 +08:00
  • af7a56550b [bugfix_v0.11.0-dev] layerwise D first plan (#3907) wangxiaoteng888 2025-10-30 22:21:11 +08:00
  • d5a9aba03f [BugFix]Fix group list type of mc2. (#3890) offline893 2025-10-30 21:44:14 +08:00
  • c506ba60fb [v0.11.0] [Bugfix] [MoE]fix error in deepseek when using allgather (#3827) weichen 2025-10-30 14:59:46 +08:00
  • 211d4b9da4 [BugFix] Fix mlapo accuracy problem related with weight processing. (#3857) whx 2025-10-30 00:35:50 +08:00
  • d9249c968e bugfix for mtp in fullgraph (#3878) zouyida2052 2025-10-29 23:52:20 +08:00