Commit Graph

  • 6b094a2bd4 [ModelRunner]Add profile execute duration observation (#1013) depeng1994 2025-06-06 09:29:34 +08:00
  • 78431b3469 [perf]Support MOE Multi-stream in Deepseek (#947) David9857 2025-06-05 23:39:38 +08:00
  • 908a851a77 optimize the function of computing topk and topp in sampler. (#970) sherie 2025-06-05 16:42:18 +08:00
  • e1ab6d318e [Misc] Refactor additional_config (#1029) wangxiyuan 2025-06-05 16:28:01 +08:00
  • 7737aaa40f [CI] Add accuracy test for Qwen2.5-VL-3B-Instruct (#766) zhangxinyuehfad 2025-06-05 15:09:20 +08:00
  • b4cb0eecb6 [CI] Hotfix on benchmark results path (#1076) Li Wang 2025-06-05 12:53:46 +08:00
  • fd136e6762 Add vLLM Ascend project governance docs (#1070) Yikun Jiang 2025-06-05 11:56:51 +08:00
  • 31dd471574 [CI] Add workflow_dispatch and use main benchmarks directly (#1071) Li Wang 2025-06-05 10:29:30 +08:00
  • 9e855b70be Adjust concurrency group for each npu workflow (#1068) Yikun Jiang 2025-06-05 09:17:04 +08:00
  • afc4c0cd03 [Bugfix] Fix deepseek precision issue and add acc ci for it (#905) Mengqing Cao 2025-06-04 20:26:44 +08:00
  • da9acfca60 feat: support data parallel for deepseek (#1012) NeverRaR 2025-06-04 18:31:41 +08:00
  • 517811449e [CI] Re-enable sleep mode test and skip failure breaking CI (#990) Li Wang 2025-06-04 16:24:16 +08:00
  • eb2701e0b2 [CI] Remove workflow_dispatch and change schedule time (#1056) Li Wang 2025-06-04 01:19:20 +08:00
  • 06fb5a8d81 [CI][Bugfix] Upgrade escli to v0.2.1 to fix benchmark deps (#1055) Li Wang 2025-06-04 01:03:56 +08:00
  • 76dacf3fa0 [CI][Benchmark] Optimize performance benchmark workflow (#1039) Li Wang 2025-06-03 23:38:34 +08:00
  • 543380ceae [CI] Add merge conflict label job (#1050) wangxiyuan 2025-06-03 17:32:31 +08:00
  • f24375f318 Enable accuracy test for PR labeled with "*accuracy-test" (#1040) Yikun Jiang 2025-06-03 15:38:13 +08:00
  • 068c3a0167 [Bugfix] Add verification for quant_action.choices to avoid TypeError (#1046) Shanshan Shen 2025-06-03 11:44:45 +08:00
  • 93860574bb [ModelRunner][MultiModal] Remove legacy input mapper/processor from V0 (#951) Shanshan Shen 2025-06-03 11:32:03 +08:00
  • 6ec64a3f96 [bugfix] some bugs maybe fail to run (#896) NINGBENZHE 2025-06-03 11:07:33 +08:00
  • 92bc5576d8 Skip benchmarks/** in vllm ascend test (#1041) Yikun Jiang 2025-06-01 19:01:26 +08:00
  • 507ae627ca feat: support compile torchair graph while warming up (#839) NeverRaR 2025-05-31 06:03:03 +08:00
  • d9fb027068 [CI] Add benchmark workflows (#1014) Li Wang 2025-05-30 22:42:44 +08:00
  • 5a1689fc64 [Fix] Fix update_aclgraph_sizes when running MoE models (#913) yiz-liu 2025-05-30 15:17:11 +08:00
  • 3442fbdb23 [1/N][UT][v1 MTP] add basic v1 mtp features (#890) XWFAlone 2025-05-30 08:59:58 +08:00
  • 5903547d09 [doc] add 0.7.3.post1 release note (#1008) wangxiyuan 2025-05-29 17:38:34 +08:00
  • c464c32b81 add doc for offline quantization inference (#1009) 22dimensions 2025-05-29 17:32:42 +08:00
  • 05a471001b bugfix for qwen2_5_vl (#805) zouyida2052 2025-05-29 17:20:39 +08:00
  • a93bed4535 [aclgraph] implement NPUPiecewiseBackend to enable aclgraph (#836) Mengqing Cao 2025-05-29 11:58:26 +08:00
  • cc74b97f74 [Bugfix][V1] Fix deepseek with v1 (#958) Mengqing Cao 2025-05-29 11:57:43 +08:00
  • e3c7f71462 [Perf] Refactor tensor disposal logic to reduce memory usage (#966) ApsarasX 2025-05-29 11:48:26 +08:00
  • 6eddbd2521 [CI/UT][PD Disaggregate] Initialize PD Disaggregate UT (#889) Mengqing Cao 2025-05-29 10:17:12 +08:00
  • f6e5decc10 [CI] upgrade to vllm 0.9.0 (#959) wangxiyuan 2025-05-28 21:18:41 +08:00
  • e2a0c19cea [CI] Refactor CI (#952) wangxiyuan 2025-05-28 06:31:35 +08:00
  • 9f5ab59e30 [WIP][BugFix]Fix accuracy issues caused by wrong etp_size passed into FusedMoEParallelConfig when using vLLM 0.9.0 (#961) Angazenn 2025-05-27 15:16:17 +08:00
  • 01e3d59eae add workflow to build and release wheel (#775) Shuqiao Li 2025-05-26 14:18:26 +08:00
  • a0c3e9ba50 [Bugfix] Adjust inputbatch to be compatible with latest vllm (#945) Mengqing Cao 2025-05-26 10:33:28 +08:00
  • 1f9fb869ad [BugFix] Fix accuracy bugs for unquantized deepseekv3 models (#897) Angazenn 2025-05-24 14:29:36 +08:00
  • 17f05b1089 [Feature] Add CustomQwen3MoeForCausalLM model (#925) yiz-liu 2025-05-23 15:50:48 +08:00
  • df58fb80ee Spec decode support for V1 Engine (#874) jiangpeng 2025-05-23 14:25:46 +08:00
  • a970b27e2d [WIP][Perf]remove unnecessary padding before MLA V1 prefill (#917) Angazenn 2025-05-23 14:14:06 +08:00
  • dc6172efd3 update attention nz and mla nz(Improve TPOP 6ms performance) (#909) ttanzhiqiang 2025-05-23 10:18:10 +08:00
  • 7153d8890b [Feature] Impl v1 disaggregated prefill in ascend scheduler (#852) Jade Zheng 2025-05-23 10:15:29 +08:00
  • b434f37b46 [V1] Revert the default value of enable_chunked_prefill in additional… (#935) rjg-lyh 2025-05-23 10:06:50 +08:00
  • 46df67a5e9 [bugfix] Improve log level and info for custom ops build (#937) yangpuPKU 2025-05-23 10:05:57 +08:00
  • 8ddc0a1002 [DOC] mark v1 multi-lora functional (#932) yupeng 2025-05-22 19:53:14 +08:00
  • 0f53b138f6 [V1][LoRA][Test] V1 Engine LoRA support & e2e test (#893) yupeng 2025-05-22 19:20:51 +08:00
  • 7aa4f85f10 [Bugfix][kvcache] revert multiple kv cache groups (#923) Mengqing Cao 2025-05-22 15:15:33 +08:00
  • b4d6672d01 [BugFix] Fix chunked prefill bugs in engine v1 (#844) rjg-lyh 2025-05-22 10:33:50 +08:00
  • a73bd6caf4 [Fix] Set div_mode to False and fix view_as position (#912) yiz-liu 2025-05-22 09:57:25 +08:00
  • 58b413752b [Doc] Support XLM-RoBERTa-based and MiniCPM3 model (#820) hfadzxy 2025-05-21 15:44:54 +08:00
  • d5401a08be [DOC] update modelslim version (#908) 22dimensions 2025-05-21 09:12:02 +08:00
  • 5cf9ff18e9 [Performance]: Custom AscendC Kernel of Multi-Step Prepare Input (#814) Wan_Danfeng 2025-05-20 09:31:30 +08:00
  • 00e0243561 enable online serving quantization (#877) 22dimensions 2025-05-17 17:36:04 +08:00
  • a8730e7a3c [Doc] update quantization docs with QwQ-32B-W8A8 example (#835) 22dimensions 2025-05-17 15:25:17 +08:00
  • 7326644513 [CI] Fix qwen2.5 vl CI failure (#888) wangxiyuan 2025-05-17 05:13:32 +08:00
  • df16c4f2bc [CI/UT] Ignore vllm/tests/test_vllm_port.py (#887) Mengqing Cao 2025-05-16 18:52:59 +08:00
  • 7a325b2e2d [Bugfix][Model] Fix fusedmoe and make modelrunner_v1 compatible with latest vllm (#867) Mengqing Cao 2025-05-16 12:14:55 +08:00
  • fd515cd60b [Doc][BugFix]Fix Release Compatibility Matrix (#865) hfadzxy 2025-05-15 15:38:38 +08:00
  • 1e67089bc9 [BugFix]add all2all when dp_size > 1 && downgrade npu_dequant_swiglu_quant (#819) Angazenn 2025-05-15 09:19:55 +08:00
  • 68fb63428b [CI] Patch torch.library.infer_schema for fused moe ops to fix CI (#854) wangxiyuan 2025-05-14 19:49:09 +08:00
  • 508242425c [CI][1/N] Add basic ci for PD disaggregation (#830) Yikun Jiang 2025-05-14 18:04:16 +08:00
  • 59e02502b1 [CI] Add e2e test framework and doctest (#730) Yikun Jiang 2025-05-14 09:27:54 +08:00
  • 857f489cbf [CI] Patch torch.library.infer_schema for torch 2.5 backward compatibility (#837) wangxiyuan 2025-05-14 09:20:55 +08:00
  • e564470338 [Attention][Kernel]moe support for llama4 and mllama4 (#740) cxcxflying 2025-05-13 19:12:40 +08:00
  • 217211d8a3 [Misc][Doc] Add the latest stable release url (#826) hfadzxy 2025-05-13 12:53:23 +08:00
  • c6ac399091 [Bugfix] Fix the method of importing environment variables in DeepSee… (#817) rjg-lyh 2025-05-13 12:52:30 +08:00
  • 6193ba679b [CI] add codespell CI and fix format.sh (#827) wangxiyuan 2025-05-12 22:04:48 +08:00
  • 5998704c08 [BugFix] Fix ascend scheduler bugs. (#822) whx 2025-05-12 21:15:17 +08:00
  • 701b0fd95e [Enhancement] Add padding for ACL Graph (#803) yiz-liu 2025-05-12 20:26:22 +08:00
  • efabd722eb feat: support torchair graph mode in v1 engine (#789) NeverRaR 2025-05-12 19:14:07 +08:00
  • 4a2505f81f [accuracy test]Update cann version and huggingface-hub version for Qwen3 (#823) hfadzxy 2025-05-12 19:12:48 +08:00
  • 5305a2ccf9 [Bugfix] Tweak distributed process group initialization and add dummy… (#816) yiz-liu 2025-05-12 17:31:29 +08:00
  • 4df1e99614 [CI] Re-enable vllm-empty/tests/benchmarks (#812) Li Wang 2025-05-12 15:50:48 +08:00
  • 8e4e791fcd [CI] Add deepseek-v2-lite test (#631) Li Wang 2025-05-12 14:59:17 +08:00
  • cdece86f2c [Bugfix] Add max_num_batched_tokens to InputBatch to make main CI pass (#806) Li Wang 2025-05-12 00:36:56 +08:00
  • 218f21de21 [Benchmarks] Add qwen2.5-7b test (#763) Li Wang 2025-05-10 09:47:42 +08:00
  • 19c8e134e4 [CI/UT] fix spec ut in vllm-ascend main and vllm main (#759) wemaster 2025-05-10 09:45:56 +08:00
  • 58d2f85c4a [CI] Fix schedule trigger bug (#757) Li Wang 2025-05-10 09:45:07 +08:00
  • 804ebb17bd [Doc] Move Release Compatibility Matrix to top and remove v0.7.x rc info (#799) Yikun Jiang 2025-05-09 16:41:50 +08:00
  • fa99f89e93 [Core] Support the features of prefix cache and chunked prefill in v0/v1 (#782) rjg-lyh 2025-05-09 16:39:28 +08:00
  • 324f819b92 [Perf] Optimize fused_experts quantization code to save npu memory (#784) ApsarasX 2025-05-09 15:09:37 +08:00
  • 2c685e3b61 [Bugfix] Correct method call for _set_cos_sin_cache (#774) Jade Zheng 2025-05-09 12:55:57 +08:00
  • 5301649108 [Doc] Add notes for OOM in FAQs (#786) zzzzwwjj 2025-05-08 16:28:29 +08:00
  • 6c020883a8 [WIP]Add Func: aclgraph_batch_size auto-adjust to different model (#771) chris668899 2025-05-08 16:23:33 +08:00
  • 2e3520e285 [Bugfix] Fix output tensor shape in vanilla_chunked_prefill and update import paths for model_loader (#773) yiz-liu 2025-05-08 14:19:26 +08:00
  • ec27af346a [Doc] Add 0.8.5rc1 release note (#756) Yikun Jiang 2025-05-06 23:46:35 +08:00
  • 2cd036ee8e [Bugfix] fix accuracy problem for quantized deepseek models (#768) linfeng-yuan 2025-05-06 22:09:56 +08:00
  • d6e9417652 [Bugfix] Fix masked_fill_ function typo (#769) ApsarasX 2025-05-06 21:54:52 +08:00
  • afe1767c17 [Core] Cleanup triton patch which has been fixed in vllm (#764) Yikun Jiang 2025-05-06 18:52:15 +08:00
  • b0dbe5f8e1 [Bug fix] fix a typo in setup.py (#762) linfeng-yuan 2025-05-06 17:01:26 +08:00
  • 5897dc5bbe [Build] Bump vLLM version to v0.8.5.post1 (#755) Yikun Jiang 2025-05-06 11:44:12 +08:00
  • d6bfae8eee support 32K model len on deepseek r1 W8A8 (#728) sunbaosong 2025-05-06 10:12:07 +08:00
  • 79538b5d73 Upgrade CANN version to 8.1.rc1 (#747) Yikun Jiang 2025-05-06 05:44:18 +08:00
  • d7e1110c8e Re-patch TritonPlaceholder on main to make CI happy (#753) Yikun Jiang 2025-05-05 23:22:24 +08:00
  • d2ead057ae Re-enable Speculative Decode test for vLLM v0.8.5 (#749) Yikun Jiang 2025-05-02 14:44:48 +08:00
  • 8b194ad12e [Disaggregated Prefill] P2P Disaggregated Prefill based on llm_datadist (#694) whx 2025-05-01 22:31:36 +08:00
  • 84e2ed898b performance optimization, usability optimization and API compatibility adjustments for deepseek with npu graph mode (#731) linfeng-yuan 2025-05-01 13:51:42 +08:00
  • 399b03830d [Build][Bugfix] Fix source code path to avoid reference error (#726) Mengqing Cao 2025-04-30 17:38:13 +08:00
  • 3a628891ab [Feature] Add quant description file for new quant model generated by modelslim (#719) Pleaplusone 2025-04-30 16:51:56 +08:00