Commit Graph

  • 7beb4339dc [Doc]Add developer guide for using OpenCompass (#368) hfadzxy 2025-03-31 00:24:25 +08:00
  • b6499ed97d [CI] Use CI pool (#428) wangxiyuan 2025-03-29 12:42:59 +08:00
  • ca8b1c3e47 [Doc] Add 0.7.3rc2 release note (#419) wangxiyuan 2025-03-29 09:02:08 +08:00
  • 31f29b9f30 [Core] Make V1 work and enable V1 engine test (#389) wangxiyuan 2025-03-28 19:34:23 +08:00
  • 57a84bb7be [Bug Fix] Fix bug of platform for parameter checking (#411) wuhuikx 2025-03-28 16:31:27 +08:00
  • b1557abab6 fix multistep bug,remove uselesscodes (#355) Tony 2025-03-28 09:55:35 +08:00
  • 1864c40520 Add vLLM Ascend Weekly meeting link (#400) Yikun Jiang 2025-03-27 09:00:21 +08:00
  • 4804b74e95 Update 110-user-story.yml (#402) Zhenyu Zheng 2025-03-27 08:58:57 +08:00
  • 0b5a9643fd Add an example for user stories (#399) Zhenyu Zheng 2025-03-26 16:25:57 +08:00
  • 122505208f FastPatch: Optimized Patch Embedding for Qwen2VL (#345) BAI Fan 2025-03-26 14:28:20 +08:00
  • d4accf4ec2 [Doc][Model] update LLaVA 1.6 support (#373) Mengqing Cao 2025-03-26 09:07:55 +08:00
  • 6295d2e9bc [CI/Build][Doc] upgrade torch-npu to 0320 (#392) Mengqing Cao 2025-03-26 09:04:12 +08:00
  • 3fb3b5cf75 [Doc] Update model support doc (add QwQ-32B) (#388) Shanshan Shen 2025-03-25 11:40:50 +08:00
  • 8996733307 [CI] fix vllm test (#365) Mengqing Cao 2025-03-24 16:09:06 +08:00
  • 89ca63a2c2 [Bugfix] Disable torch.compile() (#370) Shanshan Shen 2025-03-21 15:55:51 +08:00
  • 9a175ca0fc [Doc]Add benchmark scripts (#74) Li Wang 2025-03-21 15:54:34 +08:00
  • befbee5883 Update README and add collect_env info (#369) wangxiyuan 2025-03-21 15:43:43 +08:00
  • 243ed4da69 Add vLLM forum info and update readme (#366) Yikun Jiang 2025-03-21 09:32:42 +08:00
  • c06af8b2e0 [V1][Core] Add support for V1 Engine (#295) Shanshan Shen 2025-03-20 19:34:44 +08:00
  • 663dca7578 [CI] fix race condition problem (#353) wangxiyuan 2025-03-19 17:04:36 +08:00
  • 441a62e937 [Doc] Fix bugs of installation doc and format tool (#330) Shanshan Shen 2025-03-14 10:21:35 +08:00
  • ac1ba1d8d2 [Build] Fix x86 image build (#327) wangxiyuan 2025-03-14 09:41:57 +08:00
  • c25631ec7b [Doc] Add the release note for 0.7.3rc1 (#285) wangxiyuan 2025-03-13 17:57:06 +08:00
  • 41aba1cfc1 [Doc]Fix tutorial doc expression (#319) Li Wang 2025-03-13 15:24:05 +08:00
  • 59ea23d0d3 [Doc] Add Single NPU (Qwen2.5-VL-7B) tutorial (#311) xiemingda 2025-03-12 20:37:12 +08:00
  • 7330416de3 [BugFix] Fix bugs when using ascend quantization (#275) Angazenn 2025-03-12 11:33:21 +08:00
  • 5c7a95b01d [Attn] Support encoder-only attention with torch sdpa (#290) Mengqing Cao 2025-03-12 08:57:29 +08:00
  • 12aa7115b5 bugfix for qwen2_vl (#301) zouyida2002 2025-03-12 08:39:50 +08:00
  • 9450e9811b [CI] Uninstall triton in dockerfile (#298) wangxiyuan 2025-03-12 07:14:57 +08:00
  • 0db6670bfa [Feature] Implement EP-compatible fused_moe (#121) yiz-liu 2025-03-11 21:08:02 +08:00
  • 4c9d78a035 support multistep decode (#299) Tony 2025-03-11 19:20:06 +08:00
  • feb6bdb12e [Platform][Model Runner] Add hash of request_ids; Change blocksize back to 128. (#293) whx 2025-03-11 18:50:28 +08:00
  • 007aeaa48b [Doc] Change distributed_executor_backend to mp (#287) Yikun Jiang 2025-03-10 11:27:26 +08:00
  • 38334f5daa [Docs] Re-arch on doc and make QwQ doc work (#271) Yikun Jiang 2025-03-10 09:27:48 +08:00
  • 18bb8d1f52 Adapt vLLM requirements changes to fix main CI (#279) Yikun Jiang 2025-03-09 16:07:45 +08:00
  • 268da28961 Pin modelscope<1.23.0 on vLLM v0.7.3 (#272) Yikun Jiang 2025-03-09 15:59:42 +08:00
  • be58d5f3d8 Bump torch_npu version to dev20250308.3 (#276) Yikun Jiang 2025-03-09 15:59:15 +08:00
  • 91f7d8115d [CI/Build] Bump torch_npu to dev20250307.3 (#265) Mengqing Cao 2025-03-07 20:34:07 +08:00
  • faf8cd89cb register qwen2_vl to rewrite qwen2_vl forwad (#241) zouyida2002 2025-03-07 15:41:47 +08:00
  • 35cb7b5234 [CI] Add dispatch job to leverage dynamic devices (#251) Yikun Jiang 2025-03-07 09:47:13 +08:00
  • 3217f0d10f [Feature] Modify description and api for ascend quantization (#243) Angazenn 2025-03-06 15:17:25 +08:00
  • cff08f9df8 [Doc] Add initial FAQs (#247) Yikun Jiang 2025-03-06 10:42:42 +08:00
  • dcd0005058 [Fix] Remove npu_group_topk before CANN version update (#242) HongtaoYang 2025-03-06 09:02:46 +08:00
  • 0d3463400a [Performance] Change the shape of kv_cache to avoid view of k_cache and v_cache. (#204) whx 2025-03-05 10:51:07 +08:00
  • 562fa673e5 [Bugfix] Exclude collect_env.py from CODESPELL check in format.sh (#240) Shanshan Shen 2025-03-04 17:14:00 +08:00
  • 503f5045ff [ModelRunner] Remove redundant profile_run() in model runner (#224) Shanshan Shen 2025-03-04 16:58:33 +08:00
  • ae49bfd13a [Core] Support pooling (#229) wangxiyuan 2025-03-04 15:59:34 +08:00
  • 8fda31cafe [Doc] Update Feature Support doc (#234) Shanshan Shen 2025-03-04 14:18:32 +08:00
  • b9f0e25c16 [Misc] Add collect_env.py scripts for bug reporting (#175) Shanshan Shen 2025-03-04 14:14:37 +08:00
  • 839dac8d60 Install wget to fix image build (#231) Yikun Jiang 2025-03-04 09:01:23 +08:00
  • b64ee7d346 [Dist] Set device as rank (#202) Mengqing Cao 2025-03-03 09:23:13 +08:00
  • ebe14f20cf Recover vllm-ascend dev image (#209) Yikun Jiang 2025-03-03 09:08:41 +08:00
  • 6e358c4bef Add Document Branch Policy (#217) Yikun Jiang 2025-03-03 09:07:39 +08:00
  • 46740958f2 Add ray to docker image (#197) Yikun Jiang 2025-02-28 15:23:18 +08:00
  • 81dfaae88b Bump docker/setup-buildx-action from 2 to 3 (#191) dependabot[bot] 2025-02-28 09:06:46 +08:00
  • a710a7563a Bump docker/setup-qemu-action from 2 to 3 (#192) dependabot[bot] 2025-02-28 09:06:13 +08:00
  • a5564ed5d8 Bump actions/setup-python from 5.3.0 to 5.4.0 (#193) dependabot[bot] 2025-02-27 20:05:15 +08:00
  • 14bca9911a [CI] Fix unsolved bugs caused by pta api change. (#190) whx 2025-02-27 19:52:28 +08:00
  • 6aed83335c [CI] Add dependabot support and labeler workflow (#162) Yuanhao Ji 2025-02-27 19:46:31 +08:00
  • 03dc5c01fd [Doc] update multinode doc (#181) Mengqing Cao 2025-02-27 19:29:49 +08:00
  • 1715230867 [CI] Upgrade to newest pta.(MLA and FusedMoE) (#189) HongtaoYang 2025-02-27 18:50:52 +08:00
  • c131e43e7d [Worker]Lazy import torch_npu (#184) Li Wang 2025-02-27 16:52:11 +08:00
  • 6042c210bc [CI] upgrade to newest pta (#187) wangxiyuan 2025-02-27 16:40:23 +08:00
  • fd18ae6494 [MOE] fix #176 (#179) Mengqing Cao 2025-02-27 14:21:08 +08:00
  • ee43179767 [ModelRunner] Fix cuda hard code in model runner (#155) Shanshan Shen 2025-02-27 14:16:46 +08:00
  • 94cd66bba7 [CI][UT]enable multimodal ut (#158) zouyida2002 2025-02-27 14:14:43 +08:00
  • 94483775e1 [CI] fix hf_token (#180) Mengqing Cao 2025-02-26 17:29:31 +08:00
  • 1c238b930d [worker] remove unused assertion (#161) Mengqing Cao 2025-02-26 16:11:36 +08:00
  • 78530c0667 [CI/Build] add HF_TOKEN for model downloading (#173) Mengqing Cao 2025-02-26 15:35:03 +08:00
  • 7776f2e6a4 [ModelRunner] remove padding for vlm inputs (#150) Mengqing Cao 2025-02-26 10:26:39 +08:00
  • 79fbb20b4d [ModelRunner] remove unused args (follow vllm changes) (#159) Mengqing Cao 2025-02-25 17:51:09 +08:00
  • 51ae37b22a [Doc] update readme (#147) wangxiyuan 2025-02-25 11:00:58 +08:00
  • 3a7882208f [CI] enable test if pytest.ini changes (#151) Mengqing Cao 2025-02-24 16:47:05 +08:00
  • d0b3cb4fa7 modify:Eliminate redundant operations in the code to improve performance (#137) Yaphets24 2025-02-22 17:43:42 +08:00
  • 202b39a38c Ray Worker Ops Optimization (#136) Chenguang Li 2025-02-21 22:45:15 +08:00
  • 386817b4d1 [Model Runner][Performance] Cache the jugement result of is_encoder_decoder to decrease framework overhead (#138) whx 2025-02-21 22:43:11 +08:00
  • d21b3be685 Mark v0.7.1 as unmaintained and v0.7.3 as maintained (#139) Yikun Jiang 2025-02-21 22:41:44 +08:00
  • 72a43a61d8 [Docs] Add issue template (#113) Yikun Jiang 2025-02-21 17:20:21 +08:00
  • dd425d68f8 [Platform] add dispatch key (#17) Mengqing Cao 2025-02-21 17:10:30 +08:00
  • 5f465010de [Core] Cherry pick from 0.7.1 to keep the main code newest (#127) wangxiyuan 2025-02-21 17:07:37 +08:00
  • 36991b2052 [CI] enable CI on all branch (#124) Mengqing Cao 2025-02-21 16:16:48 +08:00
  • fd2cc1b883 [Docs] Add Tutorials for Online Serving on Multi Machine (#120) HongtaoYang 2025-02-21 11:03:00 +08:00
  • 3a4ce2aa15 [Docs] Fix vllm and vllm-ascend version (#107) Yikun Jiang 2025-02-20 11:05:35 +08:00
  • cff03a4913 [CI] change to quay.io (#102) wangxiyuan 2025-02-19 17:04:46 +08:00
  • fafd70e91c [Doc] Update doc to work with release (#85) wangxiyuan 2025-02-19 09:51:43 +08:00
  • 17de078d83 [Docs] Add dynamic version in docs (#90) Yikun Jiang 2025-02-19 08:57:27 +08:00
  • c18fb09b55 [MISC] set default model to qwen in example (#87) Mengqing Cao 2025-02-18 17:09:59 +08:00
  • 8ea8523744 reset default block_size from 16 to 128 (#84) Huazhong Ji 2025-02-18 14:19:38 +08:00
  • 7606977739 [Doc] Add release note (#59) wangxiyuan 2025-02-18 11:20:06 +08:00
  • 7cc024a2d3 [Docs] Refeactor installation doc (#78) Yikun Jiang 2025-02-17 22:12:07 +08:00
  • 7c8bdc3a18 [Doc] Update tutorials (#79) Shanshan Shen 2025-02-17 22:11:04 +08:00
  • 2a678141d4 [Doc] Add vllm-ascend usage doc & fix doc format (#53) Shanshan Shen 2025-02-17 18:37:29 +08:00
  • c935b7006c [doc] fix feature support (#70) Mengqing Cao 2025-02-17 15:43:37 +08:00
  • 36ea38fde5 [CI]add file to pytest.ini (#61) Niuya 2025-02-17 14:26:04 +08:00
  • a6f91f70b7 [Doc] Add versioning_policy doc (#62) Yikun Jiang 2025-02-17 14:13:28 +08:00
  • 4544e99d88 [dist] revert communicator patch (#66) Mengqing Cao 2025-02-17 11:42:33 +08:00
  • bfbfbce184 [CI] Add container image build ci (#64) Yikun Jiang 2025-02-17 09:07:35 +08:00
  • c1ac822642 [CI] Switch to cann latest version (#63) Yikun Jiang 2025-02-16 13:38:01 +08:00
  • b88443b6c6 [dist] fix communicator patch (#58) Mengqing Cao 2025-02-14 10:45:49 +08:00
  • e264987af2 [Doc] Add install doc (#49) wangxiyuan 2025-02-14 10:22:15 +08:00