Commit Graph

  • 5ec96fd46c [long_seq_Feat] support chunk prefill (#4158) LookAround0301 2025-11-14 08:43:37 +08:00
  • 7294f89e43 [CI] Add daily images build for nightly ci (#3989) Li Wang 2025-11-13 20:10:12 +08:00
  • f7d1f73b98 [CI] Remove unsupported python 3.9 format check (#4172) Nengjun Ma 2025-11-13 16:47:24 +08:00
  • 49818dbbed [Test]Add ut test qwen3_moe and sfa (#4121) CodeCat 2025-11-13 16:09:22 +08:00
  • adee9dd3b1 [Info][main] Correct the mistake in information documents (#4157) lilinsiman 2025-11-13 15:53:58 +08:00
  • 51e5806d76 [0.11.0-dev][Bugfix][EPLB] Quick fix for missing log2phy conversion (#4150) weichen 2025-11-13 14:32:40 +08:00
  • cd652acb65 [BugFix] Fix kv_no_split not contiguous (#3711) zhaozx-cn 2025-11-13 11:29:37 +08:00
  • fdd2db097a [BugFix] Fix kv_no_split not contiguous (#3594) zhaozx-cn 2025-11-13 11:28:09 +08:00
  • 9d84172359 [BugFix] adapted e2e tests for Qwen3-next-mtp (#4160) drslark 2025-11-13 11:08:35 +08:00
  • 5093192769 [Bugfix] fix mtp profile run error where main model and mtp model use different quantization (#4102) realliujiaxu 2025-11-13 11:02:31 +08:00
  • 17259cb265 [Perf] [MoE] optimize all2allv (#3738) weichen 2025-11-13 09:38:11 +08:00
  • 6bc770cd78 [Perf] fix async copy for async scheduling (#4113) realliujiaxu 2025-11-13 09:11:26 +08:00
  • c272747d13 Upgrade to 0.11.1 newest vllm commit (#3982) 22dimensions 2025-11-12 23:01:19 +08:00
  • 3ca11d5a7c [CI] Fix nightly-ci (#4159) Li Wang 2025-11-12 22:06:49 +08:00
  • 28a15299ea [cherry-pick][v0.11.0-dev][bugfix] Change seq_lens in dummy attn_metadata to max_query_len (#4099) Angazenn 2025-11-12 20:32:50 +08:00
  • fc7e5cd9dc [main][bugfix] Change seq_lens in dummy attn_metadata to max_query_len (#4097) Angazenn 2025-11-12 17:31:39 +08:00
  • a123f355e9 [feature] support pcp + mtp (in pd co-locate scenario) (#4098) zhangsicheng5 2025-11-12 17:22:21 +08:00
  • 7732a89fd9 [v0.11.0][UT][Fixbug] Fix UT test (#4151) zhangxinyuehfad 2025-11-12 16:55:18 +08:00
  • 1b4ce63ec9 fix fullgraph in ds. (#4016) XiaoxinWang 2025-11-12 10:11:43 +08:00
  • c9e5b90f53 [Doc] Fix DeepSeek-3.2-Exp doc, remove v0.11.0rc0 outdated infos. (#4095) zhangyiming 2025-11-12 09:11:31 +08:00
  • 638dbcdb32 [Perf] Remove D2H operations to improve performance (#4063) Yizhou 2025-11-12 09:08:55 +08:00
  • e38fe92f40 [Misc][Doc] Add service profiling feature with user guide (#3756) thonean 2025-11-12 09:07:14 +08:00
  • 1c677c3b87 [Test][Accuracy] Add accuracy evaluation config for InternVL3_5-8B (#3964) Canlin Guo 2025-11-12 09:05:55 +08:00
  • 46a41b26d3 oproj TP support acl graph (#4073) zzhxxx 2025-11-11 19:39:06 +08:00
  • 0e6e08e939 [TEST]Update nightly cases and add mtpx (#4111) jiangyunfan1 2025-11-11 17:39:58 +08:00
  • 9cc42226d5 [CI] Integrate mooncake to vllm-ascend base image (#4062) Li Wang 2025-11-11 15:51:16 +08:00
  • f811a24bf0 Remove VLLM_USE_V1 (#4086) wangxiyuan 2025-11-11 15:43:39 +08:00
  • d5567680a2 [Fixbug] Fix ut test (#4116) zhangxinyuehfad 2025-11-11 15:31:00 +08:00
  • fae1c59a79 [Fix] Refactor and fix dist test to e2e full test (#3808) zhangxinyuehfad 2025-11-11 10:36:05 +08:00
  • b77b4f1abf [Test] Add nightly test for DeepSeek-V3.2-Exp (#3908) zhangxinyuehfad 2025-11-11 10:29:57 +08:00
  • 650ce8ad19 [0.11.0][Bugfix] Fix ngram precision issue and open e2e ngram test (#4092) zhaomingyu13 2025-11-11 09:58:03 +08:00
  • e384755ce1 [Doc] Recover installation doc to use pip install (#4109) Yikun Jiang 2025-11-11 09:25:44 +08:00
  • 71866d5311 [feature] chunkprefill support pcp&dcp (#3801) Apocalypse 2025-11-11 09:18:02 +08:00
  • 2069bef449 [v0.11.0-dev][bugfix] Fix a bug in wrongly set npu_stream (#4106) Angazenn 2025-11-11 09:16:41 +08:00
  • 7ffbe73d54 [main][Bugfix] Fix ngram precision issue and open e2e ngram test (#4090) zhaomingyu13 2025-11-11 09:06:24 +08:00
  • 64220c68c5 [Doc] Add release note for v0.11.0rc1 (#3931) wangxiyuan 2025-11-10 21:01:50 +08:00
  • c5fe179cef [0.11.0] [Cherry-pick #4058] Fixes Qwen3-Next enable nz accuracy problem (#4056) Icey 2025-11-10 20:56:39 +08:00
  • e04a87f4be [BugFix] Fixes Qwen3-Next enable nz accuracy problem (#4058) Icey 2025-11-10 20:54:57 +08:00
  • e6625bb582 [Doc] add qwen3 w4a4 tutorial (#4076) 22dimensions 2025-11-10 20:30:07 +08:00
  • ebd45b6596 [V0.11.0][Core] Restore scheduling logic under default configuration (#4094) rjg-lyh 2025-11-10 20:02:23 +08:00
  • a1558b99c2 [Core] Restore scheduling logic under default configuration (#3967) rjg-lyh 2025-11-10 17:48:56 +08:00
  • c3c9138719 [Perf] Move attention update stream out of loop to optimize performance (#3985) XiaoxinWang 2025-11-10 17:18:45 +08:00
  • 75c3f9a780 [Typo] LLama has been changed to Llama (#4089) herizhen 2025-11-10 16:22:52 +08:00
  • d913f9474b [0.11.0][Fix] Fix Qwen2-Audio-7B-Instruct accuracy test (#4018) zhangxinyuehfad 2025-11-10 11:54:30 +08:00
  • d40ba52454 [Fix] fix Qwen2-Audio-7B-Instruct accuracy test (#4017) zhangxinyuehfad 2025-11-10 11:54:18 +08:00
  • 7ea17fbee3 [0.11.0][BugFix] Improve the performance of prefixcache features (#4021) hucong 2025-11-10 11:51:34 +08:00
  • de49fb3deb [Feature][Build] Upgrade the minimum version to 3.10 (#3926) Canlin Guo 2025-11-10 11:50:12 +08:00
  • 0a62e671fb [Feat] flashcomm_v2 optim solution (#3232) Levi 2025-11-10 11:01:45 +08:00
  • b1a00e0512 [docs] [P/D] add feature guide for disaggregated-prefill (#3950) wangxiaoteng888 2025-11-10 09:31:30 +08:00
  • a74e76b02d [Doc] Remove extra MLAPO installation step for DeepSeek-V3.2. (#4024) zhangyiming 2025-11-10 09:09:59 +08:00
  • c2d58c0655 [P/D][BugFix][v0.11.0-dev]Fix proxy format processing errors & Layerwise connector performance optimization (#4069) wangxiaoteng888 2025-11-09 09:55:10 +08:00
  • c116524379 [TEST]Add qwen3-235b-w8a8 and qwen3-30b-w8a8 nightly test (#3973) jiangyunfan1 2025-11-08 18:49:28 +08:00
  • a3ff765c65 [Info][main] Corrected the errors in the information (#4055) lilinsiman 2025-11-08 18:48:59 +08:00
  • 1d7cb5880a [Bugfix]fix pcp dcp attn aclgraph (#4066) weiguihua2 2025-11-08 18:47:12 +08:00
  • 48094148f8 [BugFix] Improve the performance of prefixcache features (#4022) hucong 2025-11-08 18:45:31 +08:00
  • 1d81a289d0 [P/D][BugFix]Fix proxy format processing errors & Layerwise connector performance optimization (#4043) zxr2333 2025-11-08 18:44:06 +08:00
  • 24d6314718 [Bugfix] fix sleepmode level2 e2e test (#4019) wangx700 2025-11-08 14:11:55 +08:00
  • 55e37f5041 [v0.11.0][Bugfix] fix sleepmode level2 e2e test (#4023) wangx700 2025-11-08 14:11:15 +08:00
  • f9842560cb [0.11.0][Perf] Add padding vision tower for Qwen2_5_Omni (#4041) tingfu 2025-11-08 13:56:05 +08:00
  • d4e2a44307 [Cherry Pick from pr#3981][0.11.0][P/D]Make kv-transfer env variable take effect & Fix load-balance proxy (#3983) zxr2333 2025-11-08 13:52:33 +08:00
  • f7ca3bc0fa [CI]Fix eplb ci. (#4052) offline893 2025-11-07 23:53:35 +08:00
  • 8e72758645 [BugFix]Fix grouplist type of mc2. (#4049) offline893 2025-11-07 17:43:23 +08:00
  • e687d6af85 [BugFix]Fix group list type of mc2. (#4047) offline893 2025-11-07 17:41:56 +08:00
  • 23b785fdfb [Feat] Adapted mtp function to Qwen3-next (#3918) drslark 2025-11-07 16:39:03 +08:00
  • 016337eaec [v0.11.0][UT] Add new ut case for aclgraph enable (#4038) lilinsiman 2025-11-07 11:35:24 +08:00
  • 46ef280105 [Doc] Add model feature matrix table. (#4040) zhangyiming 2025-11-07 11:28:05 +08:00
  • 22286fc67d [UT] Add new ut case for aclgraph in auto enable (#4031) lilinsiman 2025-11-07 10:39:11 +08:00
  • 79e536d939 [Feat] update op for mla (#4000) LookAround0301 2025-11-07 09:48:39 +08:00
  • f8610b7d67 [long_seq] fix A2 accuracy problem (#4030) LookAround0301 2025-11-07 09:29:33 +08:00
  • f9494d978a [cherry-pick][v0.11.0-dev][bugfix] Fix a rare bug triggered by _npu_paged_attention in FULL_DECODE_ONLY mode (#3987) Angazenn 2025-11-06 23:08:57 +08:00
  • e0d58d543b [main][bugfix] Fix a rare bug triggered by _npu_paged_attention in FULL_DECODE_ONLY mode (#3986) Angazenn 2025-11-06 23:08:07 +08:00
  • 1804b60ec8 [BugFix][main] Adapted to torch_npu.npu_fused_infer_attention_score (#4025) drslark 2025-11-06 22:00:24 +08:00
  • 27547a10e6 [MM][Bugfix] Add MoE verification for multi-modal models (#3897) (#4027) Shanshan Shen 2025-11-06 20:30:40 +08:00
  • 22005c64c1 [Bugfix] Add constraints for sequence parallelism (#4014) realliujiaxu 2025-11-06 20:02:03 +08:00
  • 259eb25f88 [CI] Quick fix mooncake for nightly-ci (#4028) Li Wang 2025-11-06 18:46:00 +08:00
  • 34b278a339 [TEST]Update nightly acc test standard (#4032) jiangyunfan1 2025-11-06 16:58:38 +08:00
  • 2eebe1dc0a [feat]decode convert bsnd to tnd and fix bug when pcp and dcp (#3980) weiguihua2 2025-11-06 14:58:24 +08:00
  • 25b24c02ea [Feat](Mooncake) Supports multiple input suffixes for global_segment_size (#3690) Liziqi-77 2025-11-06 14:48:15 +08:00
  • b206e831e9 [P/D]Make kv-transfer env variable take effect & Fix load-balance proxy (#3981) zxr2333 2025-11-06 12:02:47 +08:00
  • 3db53d117e [0.11.0][doc] add aclgraph developer guide (#3947) zzzzwwjj 2025-11-06 09:54:38 +08:00
  • 737cad2b6b [Test] Refactor accuracy test to nightly test (#3814) zhangxinyuehfad 2025-11-06 09:06:59 +08:00
  • 7ee0b0b5d8 [cherry-pick]Upgrade CANN to 8.3.rc1 (#3945) (#3962) wangxiyuan 2025-11-06 09:05:08 +08:00
  • b1488ecdb1 [main][doc][kv_pool]Add adxl timeout parameter in kv pool user guide (#4012) pz1116 2025-11-05 18:39:35 +08:00
  • 5cff3069f4 [Doc]Add developer guide of eplb. (#3759) offline893 2025-11-05 18:35:41 +08:00
  • e0c23cb011 [docs] Add kv pool developer guide (#3752) pz1116 2025-11-05 18:03:36 +08:00
  • 1ba158567c [Doc] add mtp doc (#3770) zouyida2052 2025-11-05 16:38:35 +08:00
  • 3ac76fdccc [Doc] Update version policy (#3999) wangxiyuan 2025-11-05 14:55:54 +08:00
  • 46d5a77688 [docs] add aclgraph developer guide (#3683) zzzzwwjj 2025-11-05 10:34:28 +08:00
  • 738bf2b720 support qwen3-next full_decode_only mode. (#3949) XiaoxinWang 2025-11-05 08:46:05 +08:00
  • 66b67f9cf2 [Bugfix][SHM] Fix weak memory ordering problem in share memory (#3988) Zetong Li 2025-11-04 23:07:23 +08:00
  • 5f08e07208 [Doc] Refactor the DeepSeek-V3.2-Exp tutorial. (#3871) zhangyiming 2025-11-04 18:58:33 +08:00
  • 49e6983b3b [Test] Add accuracy test for qwen3-30b-a3b-w8a8 (#3807) zhangxinyuehfad 2025-11-04 18:56:31 +08:00
  • 5fed166a99 [ModelRunner][Refactor] Refactor kv cache tensor initialization logic (#3106) Mengqing Cao 2025-11-04 17:26:54 +08:00
  • bedf223771 [Perf] move quant before allgather in Allgather EP (#3420) realliujiaxu 2025-11-04 16:49:58 +08:00
  • 44b58b8665 [TEST]Add full graph for multimodal nightly tests (#3968) jiangyunfan1 2025-11-04 16:47:48 +08:00
  • 954dab64fb [v0.11.0][P/D]Set adxl as default backend and update readme (#3771) zxr2333 2025-11-04 16:06:58 +08:00
  • 15bb5098ad [PD Disaggregation]Set adxl engine as default backend and update README (#3761) zxr2333 2025-11-04 16:06:39 +08:00
  • dc1a6cb503 [Test]Add accuracy test for multiple models (#3823) ZengSilong 2025-11-04 14:46:39 +08:00
  • e9bb4491ec [BugFix] Fix deepseek v3.2 mtp bug. (#3900) whx 2025-11-04 14:06:59 +08:00
  • 646fbac7a9 [Test] Add accuracy test for qwen3-8b-w8a8 (#3799) zhangxinyuehfad 2025-11-04 09:23:11 +08:00