Commit Graph

  • 46df67a5e9 [bugfix] Improve log level and info for custom ops build (#937) yangpuPKU 2025-05-23 10:05:57 +08:00
  • 8ddc0a1002 [DOC] mark v1 multi-lora functional (#932) yupeng 2025-05-22 19:53:14 +08:00
  • 0f53b138f6 [V1][LoRA][Test] V1 Engine LoRA support & e2e test (#893) yupeng 2025-05-22 19:20:51 +08:00
  • 7aa4f85f10 [Bugfix][kvcache] revert multiple kv cache groups (#923) Mengqing Cao 2025-05-22 15:15:33 +08:00
  • b4d6672d01 [BugFix] Fix chunked prefill bugs in engine v1 (#844) rjg-lyh 2025-05-22 10:33:50 +08:00
  • a73bd6caf4 [Fix] Set div_mode to False and fix view_as position (#912) yiz-liu 2025-05-22 09:57:25 +08:00
  • 58b413752b [Doc] Support XLM-RoBERTa-based and MiniCPM3 model (#820) hfadzxy 2025-05-21 15:44:54 +08:00
  • d5401a08be [DOC] update modelslim version (#908) 22dimensions 2025-05-21 09:12:02 +08:00
  • 5cf9ff18e9 [Performance]: Custom AscendC Kernel of Multi-Step Prepare Input (#814) Wan_Danfeng 2025-05-20 09:31:30 +08:00
  • 00e0243561 enable online serving quantization (#877) 22dimensions 2025-05-17 17:36:04 +08:00
  • a8730e7a3c [Doc] update quantization docs with QwQ-32B-W8A8 example (#835) 22dimensions 2025-05-17 15:25:17 +08:00
  • 7326644513 [CI] Fix qwen2.5 vl CI failure (#888) wangxiyuan 2025-05-17 05:13:32 +08:00
  • df16c4f2bc [CI/UT] Ignore vllm/tests/test_vllm_port.py (#887) Mengqing Cao 2025-05-16 18:52:59 +08:00
  • 7a325b2e2d [Bugfix][Model] Fix fusedmoe and make modelrunner_v1 compatible with latest vllm (#867) Mengqing Cao 2025-05-16 12:14:55 +08:00
  • fd515cd60b [Doc][BugFix]Fix Release Compatibility Matrix (#865) hfadzxy 2025-05-15 15:38:38 +08:00
  • 1e67089bc9 [BugFix]add all2all when dp_size > 1 && downgrade npu_dequant_swiglu_quant (#819) Angazenn 2025-05-15 09:19:55 +08:00
  • 68fb63428b [CI] Patch torch.library.infer_schema for fused moe ops to fix CI (#854) wangxiyuan 2025-05-14 19:49:09 +08:00
  • 508242425c [CI][1/N] Add basic ci for PD disaggregation (#830) Yikun Jiang 2025-05-14 18:04:16 +08:00
  • 59e02502b1 [CI] Add e2e test framework and doctest (#730) Yikun Jiang 2025-05-14 09:27:54 +08:00
  • 857f489cbf [CI] Patch torch.library.infer_schema for torch 2.5 backward compatibility (#837) wangxiyuan 2025-05-14 09:20:55 +08:00
  • e564470338 [Attention][Kernel]moe support for llama4 and mllama4 (#740) cxcxflying 2025-05-13 19:12:40 +08:00
  • 217211d8a3 [Misc][Doc] Add the latest stable release url (#826) hfadzxy 2025-05-13 12:53:23 +08:00
  • c6ac399091 [Bugfix] Fix the method of importing environment variables in DeepSee… (#817) rjg-lyh 2025-05-13 12:52:30 +08:00
  • 6193ba679b [CI] add codespell CI and fix format.sh (#827) wangxiyuan 2025-05-12 22:04:48 +08:00
  • 5998704c08 [BugFix] Fix ascend scheduler bugs. (#822) whx 2025-05-12 21:15:17 +08:00
  • 701b0fd95e [Enhancement] Add padding for ACL Graph (#803) yiz-liu 2025-05-12 20:26:22 +08:00
  • efabd722eb feat: support torchair graph mode in v1 engine (#789) NeverRaR 2025-05-12 19:14:07 +08:00
  • 4a2505f81f [accuracy test]Update cann version and huggingface-hub version for Qwen3 (#823) hfadzxy 2025-05-12 19:12:48 +08:00
  • 5305a2ccf9 [Bugfix] Tweak distributed process group initialization and add dummy… (#816) yiz-liu 2025-05-12 17:31:29 +08:00
  • 4df1e99614 [CI] Re-enable vllm-empty/tests/benchmarks (#812) Li Wang 2025-05-12 15:50:48 +08:00
  • 8e4e791fcd [CI] Add deepseek-v2-lite test (#631) Li Wang 2025-05-12 14:59:17 +08:00
  • cdece86f2c [Bugfix] Add max_num_batched_tokens to InputBatch to make main CI pass (#806) Li Wang 2025-05-12 00:36:56 +08:00
  • 218f21de21 [Benchmarks] Add qwen2.5-7b test (#763) Li Wang 2025-05-10 09:47:42 +08:00
  • 19c8e134e4 [CI/UT] fix spec ut in vllm-ascend main and vllm main (#759) wemaster 2025-05-10 09:45:56 +08:00
  • 58d2f85c4a [CI] Fix schedule trigger bug (#757) Li Wang 2025-05-10 09:45:07 +08:00
  • 804ebb17bd [Doc] Move Release Compatibility Matrix to top and remove v0.7.x rc info (#799) Yikun Jiang 2025-05-09 16:41:50 +08:00
  • fa99f89e93 [Core] Support the features of prefix cache and chunked prefill in v0/v1 (#782) rjg-lyh 2025-05-09 16:39:28 +08:00
  • 324f819b92 [Perf] Optimize fused_experts quantization code to save npu memory (#784) ApsarasX 2025-05-09 15:09:37 +08:00
  • 2c685e3b61 [Bugfix] Correct method call for _set_cos_sin_cache (#774) Jade Zheng 2025-05-09 12:55:57 +08:00
  • 5301649108 [Doc] Add notes for OOM in FAQs (#786) zzzzwwjj 2025-05-08 16:28:29 +08:00
  • 6c020883a8 [WIP]Add Func: aclgraph_batch_size auto-adjust to different model (#771) chris668899 2025-05-08 16:23:33 +08:00
  • 2e3520e285 [Bugfix] Fix output tensor shape in vanilla_chunked_prefill and update import paths for model_loader (#773) yiz-liu 2025-05-08 14:19:26 +08:00
  • ec27af346a [Doc] Add 0.8.5rc1 release note (#756) Yikun Jiang 2025-05-06 23:46:35 +08:00
  • 2cd036ee8e [Bugfix] fix accuracy problem for quantized deepseek models (#768) linfeng-yuan 2025-05-06 22:09:56 +08:00
  • d6e9417652 [Bugfix] Fix masked_fill_ function typo (#769) ApsarasX 2025-05-06 21:54:52 +08:00
  • afe1767c17 [Core] Cleanup triton patch which has been fixed in vllm (#764) Yikun Jiang 2025-05-06 18:52:15 +08:00
  • b0dbe5f8e1 [Bug fix] fix a typo in setup.py (#762) linfeng-yuan 2025-05-06 17:01:26 +08:00
  • 5897dc5bbe [Build] Bump vLLM version to v0.8.5.post1 (#755) Yikun Jiang 2025-05-06 11:44:12 +08:00
  • d6bfae8eee support 32K model len on deepseek r1 W8A8 (#728) sunbaosong 2025-05-06 10:12:07 +08:00
  • 79538b5d73 Upgrade CANN version to 8.1.rc1 (#747) Yikun Jiang 2025-05-06 05:44:18 +08:00
  • d7e1110c8e Re-patch TritonPlaceholder on main to make CI happy (#753) Yikun Jiang 2025-05-05 23:22:24 +08:00
  • d2ead057ae Re-enable Speculative Decode test for vLLM v0.8.5 (#749) Yikun Jiang 2025-05-02 14:44:48 +08:00
  • 8b194ad12e [Disaggregated Prefill] P2P Disaggregated Prefill based on llm_datadist (#694) whx 2025-05-01 22:31:36 +08:00
  • 84e2ed898b performance optimization, usability optimization and API compatibility adjustments for deepseek with npu graph mode (#731) linfeng-yuan 2025-05-01 13:51:42 +08:00
  • 399b03830d [Build][Bugfix] Fix source code path to avoid reference error (#726) Mengqing Cao 2025-04-30 17:38:13 +08:00
  • 3a628891ab [Feature] Add quant description file for new quant model generated by modelslim (#719) Pleaplusone 2025-04-30 16:51:56 +08:00
  • affca6f348 [Test] Add accuracy test report workflow (#542) hfadzxy 2025-04-30 14:53:58 +08:00
  • ba9714ccee Optimize qwen2_vl and qwen2_5_vl (#701) zouyida2052 2025-04-30 14:22:38 +08:00
  • 90aabaeb2e [Doc] Add benchmark guide (#635) Li Wang 2025-04-30 09:17:59 +08:00
  • f8350569e6 [CI] upgrade vllm to 0.8.5 (#715) wangxiyuan 2025-04-30 09:15:50 +08:00
  • 95e7aa4736 [Platform] format platform to make it more clear (#610) wangxiyuan 2025-04-30 09:03:10 +08:00
  • b917361ca5 [MISC] Clean up torch_npu (#688) wangxiyuan 2025-04-29 18:03:38 +08:00
  • 0329fad927 [Perf] Deepseekv3 performance optimization for eager mode (#598) Pleaplusone 2025-04-29 17:12:03 +08:00
  • 87975fa058 [Bugfix] Fix early return in CustomDeepseekV2MoE.forward during profile_run (#682) ApsarasX 2025-04-29 17:06:19 +08:00
  • 7aee9228f0 [CI] Add nightly CI (#668) Li Wang 2025-04-29 16:35:52 +08:00
  • d6be63e11d [CI] Add Qwen3-0.6B-Base test (#717) Li Wang 2025-04-29 14:35:19 +08:00
  • 0dae55a9a3 [MISC] fix format check error (#654) wangxiyuan 2025-04-29 11:14:19 +08:00
  • 1fce70a2fb [Model] Support common fused moe ops for moe model, such as Qwen3Moe (#709) wangxiyuan 2025-04-28 21:57:01 +08:00
  • 40bd602485 [Feature] Use reshape_and_cache fused op (#706) Jade Zheng 2025-04-28 21:54:42 +08:00
  • d39855b075 Update installation and tutorial doc (#711) Yikun Jiang 2025-04-28 21:52:17 +08:00
  • 5995d23532 [Doc] Add 0.8.4rc2 release note (#705) wangxiyuan 2025-04-28 21:51:35 +08:00
  • 54c0e63df7 [MTP] follow custom deepseek modeling changes to support graph mode (#636) wemaster 2025-04-28 21:18:53 +08:00
  • be9e3e8545 [Bugfix] Fix triton placeholder patch period (#704) Mengqing Cao 2025-04-28 18:52:03 +08:00
  • 58f9d932d3 [Doc] Update faqs (#699) Li Wang 2025-04-28 18:48:23 +08:00
  • d0a0c81ced [Doc] Add deepseek-v2-lite w8a8 quantization tutorial (#630) Li Wang 2025-04-28 17:14:26 +08:00
  • 5de3646522 [MISC] Make vllm version configurable (#651) wangxiyuan 2025-04-28 14:19:06 +08:00
  • 8849cf1eda Bump actions/setup-python from 5.5.0 to 5.6.0 (#697) dependabot[bot] 2025-04-28 14:06:38 +08:00
  • ee7a0e2cd4 Update openEuler dockerfile for COMPILE_CUSTOM_KERNELS=1 (#689) Icey 2025-04-28 11:45:46 +08:00
  • 38f34e359f [Fix] fix deepseek v0 attention eager mode (#671) Pleaplusone 2025-04-28 08:53:06 +08:00
  • 413657ae43 [FOLLOWUP][DOC] Fix pip install cmd in installation.md (#680) Yikun Jiang 2025-04-27 18:37:25 +08:00
  • 2e20797934 [BUILD] Upgrade torch-npu to 2.5.1 (#661) Yikun Jiang 2025-04-27 17:28:29 +08:00
  • fa4a5d980e [Bugfix] Remove redundant tensor creation and unused code (#656) Jade Zheng 2025-04-27 14:09:16 +08:00
  • ba3d8aae94 [Model][MiniCPM] support MiniCPM (#645) Mengqing Cao 2025-04-27 11:27:24 +08:00
  • 742f679c7d Remove prompt string from engine core data structures (#663) Yikun Jiang 2025-04-26 23:15:58 +08:00
  • c99c4c8c70 [Doc] Update feature support list (#650) wangxiyuan 2025-04-26 10:27:29 +08:00
  • 3879d9cad9 [CI] Fix sample backward compatibility problem (#648) wangxiyuan 2025-04-25 11:53:26 +08:00
  • d785e78563 [V1] Make V1 engine backward compatible (#637) yiz-liu 2025-04-24 17:20:11 +08:00
  • bd70ce828c [CI] Add qwen2.5-vl test (#643) Li Wang 2025-04-24 17:12:12 +08:00
  • a9c6b52205 [Bugfix] Fix qwen2.5-vl position input bug (#639) Li Wang 2025-04-24 15:21:57 +08:00
  • 866ce7168c [Benchmark] Download model from modelscope (#634) Li Wang 2025-04-24 14:48:24 +08:00
  • 05bdcbeae4 support aclgraph (#426) Bug Hunter Yan 2025-04-23 20:56:24 +08:00
  • 5c6d05a59e support deepseek quant & mix-parallel with graphmode (#585) zzzzwwjj 2025-04-23 16:23:25 +08:00
  • e74331a1ed Add dp initialize patch with hccl backend (#626) Pleaplusone 2025-04-23 15:47:51 +08:00
  • 848e041a54 Using EvalScope evaluation (#611) RongRongStudio 2025-04-23 00:50:09 +08:00
  • 4a0ce3660e [Misc] Remove some parts of metrics patch (#603) Shanshan Shen 2025-04-22 18:45:21 +08:00
  • cf6ab42ee2 [CI]Add guided decoding test (#422) Li Wang 2025-04-22 17:50:06 +08:00
  • 538a69c145 [Patch] format patch module to make it more clear (#601) wangxiyuan 2025-04-22 14:13:00 +08:00
  • ad845bfe82 fix doc to mention env setting for v0.7.3-dev (#602) Shuqiao Li 2025-04-22 14:11:41 +08:00
  • d12a057df8 Add note for deepseek related docs and remove unnecessary comments (#590) Pleaplusone 2025-04-22 09:59:09 +08:00
  • c5850d302d [Doc] Update installation (#596) Mengqing Cao 2025-04-22 09:04:20 +08:00