Commit Graph

  • ab5d110fcc vllm-ascend support chunked prefill (#1172) fems14 2025-06-14 22:31:16 +08:00
  • a3b5af8307 [CI/UT][Graph] Add ut for torchair graph mode (#1103) Mengqing Cao 2025-06-14 16:59:00 +08:00
  • 94a52cf577 Add ShouJian Zheng (@jianzs) as vLLM Ascend maintainer (#1203) Yikun Jiang 2025-06-13 18:25:50 +08:00
  • 47b507b180 [CI] Recover ut for ascend scheduler only in ci of v1. (#1180) whx 2025-06-13 07:51:23 +08:00
  • e72f94e38f Support multistream of MLA vector operations (#1135) sdmyzlp 2025-06-12 21:42:09 +08:00
  • 55c0e68883 [Doc] Add Referer header for CANN package download url. (#1192) Wan_Danfeng 2025-06-12 21:22:23 +08:00
  • c6e2a5fb40 [fix] fix bug in 1p1d disaggregated_prefill example (#1184) wangyanhui-cmss 2025-06-12 19:40:58 +08:00
  • 37f4469a03 [CI][Benchmark] Add qwen2.5-7b test (#1104) Li Wang 2025-06-12 10:47:30 +08:00
  • dd207cb261 [CI][Benchmark] Add new model and v1 test to perf benchmarks (#1099) Li Wang 2025-06-12 10:46:41 +08:00
  • 2498d297ae add custom ascendc kernel vocabparallelembedding (#796) ttanzhiqiang 2025-06-12 10:44:33 +08:00
  • 3393d53b36 [Scheduler][MTP] Add support for speculative decoding in AsecendScheduler. (#943) whx 2025-06-11 20:55:44 +08:00
  • 4f5964420e [CI] Upgrade vllm to 0.9.1 (#1165) wangxiyuan 2025-06-11 16:33:11 +08:00
  • e46dc142bf Enable kvcache_nz for the decode process in torchair graph mode (#1098) chenwaner 2025-06-11 14:09:28 +08:00
  • 4153a5091b [Doc] Fix the config parameter name "enable" in graph_mode.md. (#1159) yz 2025-06-11 11:03:37 +08:00
  • 980cd81466 etp best a2 (#1101) ttanzhiqiang 2025-06-11 10:40:50 +08:00
  • 860a5ef7fd provide an e2e guide for execute duration profiling (#1113) depeng1994 2025-06-11 10:02:11 +08:00
  • 7bdc606677 Support multistream of shared experts in FusedMoE (#997) sdmyzlp 2025-06-11 09:18:38 +08:00
  • 04abfd8721 [CI] Skip test_v1_spec_decode.py::test_ngram_correctness to make longterm CI pass (#1163) Mengqing Cao 2025-06-11 07:31:13 +08:00
  • 8b48daaa44 [CI] rename Qwen2.5-0.5B-Instruct-W8A8 model (#1145) 22dimensions 2025-06-11 06:18:32 +08:00
  • 8dd686dfa2 [MLA][Graph] Improve assertion on Graph mode with MLA (#933) Mengqing Cao 2025-06-10 22:26:53 +08:00
  • 291c216898 fix torchair execute issue on padding data, and mtp padding logic (#1160) Pleaplusone 2025-06-10 22:20:40 +08:00
  • 95414bae70 [CI] Run e2e after pre check pass (#1132) wangxiyuan 2025-06-10 17:18:09 +08:00
  • b75cb788dd [Bugfix] add compilation/__init__.py to fix import error (#1152) wangxiyuan 2025-06-10 17:14:25 +08:00
  • e68e81f2ce [CI] Make accuracy CI and report work (#1078) zhangxinyuehfad 2025-06-10 14:35:44 +08:00
  • 71aee6f97d Update 0.9.0rc1 contributors info (#1148) Yikun Jiang 2025-06-10 13:29:09 +08:00
  • 5cd5d64242 [CI] remove old quantization model (#1003) 22dimensions 2025-06-10 10:07:36 +08:00
  • 706de02317 [fix] fix compatibility for non-EPLB scenarios (#1142) linfeng-yuan 2025-06-10 08:39:24 +08:00
  • 571f88f85e [Doc] Update 0.9.0rc1 release date (#1139) wangxiyuan 2025-06-09 22:51:02 +08:00
  • cd2f14a1b3 [MTP][V1] Adapt mtp with graph mode in v1. (#1023) whx 2025-06-09 22:21:42 +08:00
  • 5ac4872f5e [Doc] Add 0.9.0rc1 release note (#1106) wangxiyuan 2025-06-09 19:39:21 +08:00
  • 6b853f15fe Add static EPLB (#1116) Yuxiao-Xu 2025-06-09 19:28:11 +08:00
  • cb341c7bcd [CI] Fix PD job (#1129) wangxiyuan 2025-06-09 16:34:41 +08:00
  • e63fc6f280 Init vLLM Ascend maintainers info (#1124) Yikun Jiang 2025-06-09 16:32:58 +08:00
  • d2f87ed9cc [Patch] Remove spec_decode.metrics patch (#1016) Shanshan Shen 2025-06-09 15:05:11 +08:00
  • 6003afa6d2 [BugFix] Fix data parallel (#940) yiz-liu 2025-06-09 14:08:18 +08:00
  • eec6068187 [Bugfix] Set ACL_OP_INIT_MODE env var default to 0 (#1123) Shanshan Shen 2025-06-09 14:07:37 +08:00
  • 4976b48b98 [Build] Move numba/quart to requirements and update DS baseline and sync graph typo fix (#1121) Yikun Jiang 2025-06-08 22:33:37 +08:00
  • f1543d5e0d [bugfix] fix deepseek accuracy (#1118) zzzzwwjj 2025-06-07 21:11:36 +08:00
  • c8742146d3 [CherryPick] Add unpadded Qwen2.5-VL for verl scenario (#1095) wangxiyuan 2025-06-07 19:45:46 +08:00
  • b80a484864 Fix typo of VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE (#1112) linfeng-yuan 2025-06-07 19:45:33 +08:00
  • 20dedba5d1 Add qwen2.5 vl multimodal feature for vllm-ascend v1 (#736) TaoYu Chen 2025-06-07 16:53:19 +08:00
  • 87ebaef4e4 [perf]: support dual-batch overlap(dbo) for deepseek (#941) zxdukki 2025-06-07 16:46:58 +08:00
  • 3640c60b0e Avoid unfused Transpose in DeepSeekV3 EP256 MoE layer (#1091) sdmyzlp 2025-06-07 14:28:20 +08:00
  • 8d00775fce [SpecDecode][CI] Set default values to fix spec decode and fix multicard CI (#1109) Yikun Jiang 2025-06-07 11:23:30 +08:00
  • e9ada685ec [CI]Moe alltoall communication optimization (#1067) weijinqian0 2025-06-07 10:15:56 +08:00
  • a2552e10e4 [Worker][V1] Support sleep mode for v1 (#1084) Li Wang 2025-06-06 21:54:02 +08:00
  • 0395ab30be [Doc] Add graph mode user doc (#1083) wangxiyuan 2025-06-06 21:14:34 +08:00
  • 9a4eb94ca9 [Misc] Adjust the default profiler configuration (#1097) ApsarasX 2025-06-06 20:25:59 +08:00
  • 5d0e9fd19a [Misc] Add ACL_OP_INIT_MODE env var and set default to 1 (#597) Shanshan Shen 2025-06-06 20:22:51 +08:00
  • 11a7df4270 [ModelRunner] Support embedding inputs (#916) Li Wang 2025-06-06 20:21:13 +08:00
  • c7f1c59911 feat: support compile multiple batch graph (#1085) NeverRaR 2025-06-06 20:17:51 +08:00
  • c46632439a [Bugfix][DP] Add with_prefill_across_dp to AscendMetadata to fix dp (#1094) Mengqing Cao 2025-06-06 19:20:33 +08:00
  • 0b12c2acf7 [Kernel] Remove cumsum in groupedmatmul (#987) hahazhky 2025-06-06 19:17:27 +08:00
  • dab19d5dca [BugFix] Fix ascend config check (#1092) wangxiyuan 2025-06-06 18:54:37 +08:00
  • 973f993a13 [Misc] fix initialize_kv_cache (#1102) wangxiyuan 2025-06-06 16:46:23 +08:00
  • c94afd79ce [Doc] Update the description for env (#1079) wangxiyuan 2025-06-06 09:48:43 +08:00
  • 6b094a2bd4 [ModelRunner]Add profile execute duration observation (#1013) depeng1994 2025-06-06 09:29:34 +08:00
  • 78431b3469 [perf]Support MOE Multi-stream in Deepseek (#947) David9857 2025-06-05 23:39:38 +08:00
  • 908a851a77 optimize the function of computing topk and topp in sampler. (#970) sherie 2025-06-05 16:42:18 +08:00
  • e1ab6d318e [Misc] Refactor additional_config (#1029) wangxiyuan 2025-06-05 16:28:01 +08:00
  • 7737aaa40f [CI] Add accuracy test for Qwen2.5-VL-3B-Instruct (#766) zhangxinyuehfad 2025-06-05 15:09:20 +08:00
  • b4cb0eecb6 [CI] Hotfix on benchmark results path (#1076) Li Wang 2025-06-05 12:53:46 +08:00
  • fd136e6762 Add vLLM Ascend project governance docs (#1070) Yikun Jiang 2025-06-05 11:56:51 +08:00
  • 31dd471574 [CI] Add workflow_dispatch and use main benchmarks directly (#1071) Li Wang 2025-06-05 10:29:30 +08:00
  • 9e855b70be Adjust concurrency group for each npu workflow (#1068) Yikun Jiang 2025-06-05 09:17:04 +08:00
  • afc4c0cd03 [Bugfix] Fix deepseek precision issue and add acc ci for it (#905) Mengqing Cao 2025-06-04 20:26:44 +08:00
  • da9acfca60 feat: support data parallel for deepseek (#1012) NeverRaR 2025-06-04 18:31:41 +08:00
  • 517811449e [CI] Re-enable sleep mode test and skip failure breaking CI (#990) Li Wang 2025-06-04 16:24:16 +08:00
  • eb2701e0b2 [CI] Remove workflow_dispatch and change schedule time (#1056) Li Wang 2025-06-04 01:19:20 +08:00
  • 06fb5a8d81 [CI][Bugfix] Upgrade escli to v0.2.1 to fix benchmark deps (#1055) Li Wang 2025-06-04 01:03:56 +08:00
  • 76dacf3fa0 [CI][Benchmark] Optimize performance benchmark workflow (#1039) Li Wang 2025-06-03 23:38:34 +08:00
  • 543380ceae [CI] Add merge conflict label job (#1050) wangxiyuan 2025-06-03 17:32:31 +08:00
  • f24375f318 Enable accuracy test for PR labeled with "*accuracy-test" (#1040) Yikun Jiang 2025-06-03 15:38:13 +08:00
  • 068c3a0167 [Bugfix] Add verification for quant_action.choices to avoid TypeError (#1046) Shanshan Shen 2025-06-03 11:44:45 +08:00
  • 93860574bb [ModelRunner][MultiModal] Remove legacy input mapper/processor from V0 (#951) Shanshan Shen 2025-06-03 11:32:03 +08:00
  • 6ec64a3f96 [bugfix] some bugs maybe fail to run (#896) NINGBENZHE 2025-06-03 11:07:33 +08:00
  • 92bc5576d8 Skip benchmarks/** in vllm ascend test (#1041) Yikun Jiang 2025-06-01 19:01:26 +08:00
  • 507ae627ca feat: support compile torchair graph while warming up (#839) NeverRaR 2025-05-31 06:03:03 +08:00
  • d9fb027068 [CI] Add benchmark workflows (#1014) Li Wang 2025-05-30 22:42:44 +08:00
  • 5a1689fc64 [Fix] Fix update_aclgraph_sizes when running MoE models (#913) yiz-liu 2025-05-30 15:17:11 +08:00
  • 3442fbdb23 [1/N][UT][v1 MTP] add basic v1 mtp features (#890) XWFAlone 2025-05-30 08:59:58 +08:00
  • 5903547d09 [doc] add 0.7.3.post1 release note (#1008) wangxiyuan 2025-05-29 17:38:34 +08:00
  • c464c32b81 add doc for offline quantization inference (#1009) 22dimensions 2025-05-29 17:32:42 +08:00
  • 05a471001b bugfix for qwen2_5_vl (#805) zouyida2052 2025-05-29 17:20:39 +08:00
  • a93bed4535 [aclgraph] implement NPUPiecewiseBackend to enable aclgraph (#836) Mengqing Cao 2025-05-29 11:58:26 +08:00
  • cc74b97f74 [Bugfix][V1] Fix deepseek with v1 (#958) Mengqing Cao 2025-05-29 11:57:43 +08:00
  • e3c7f71462 [Perf] Refactor tensor disposal logic to reduce memory usage (#966) ApsarasX 2025-05-29 11:48:26 +08:00
  • 6eddbd2521 [CI/UT][PD Disaggregate] Initialize PD Disaggregate UT (#889) Mengqing Cao 2025-05-29 10:17:12 +08:00
  • f6e5decc10 [CI] upgrade to vllm 0.9.0 (#959) wangxiyuan 2025-05-28 21:18:41 +08:00
  • e2a0c19cea [CI] Refactor CI (#952) wangxiyuan 2025-05-28 06:31:35 +08:00
  • 9f5ab59e30 [WIP][BugFix]Fix accuracy issues caused by wrong etp_size passed into FusedMoEParallelConfig when using vLLM 0.9.0 (#961) Angazenn 2025-05-27 15:16:17 +08:00
  • 01e3d59eae add workflow to build and release wheel (#775) Shuqiao Li 2025-05-26 14:18:26 +08:00
  • a0c3e9ba50 [Bugfix] Adjust inputbatch to be compatible with latest vllm (#945) Mengqing Cao 2025-05-26 10:33:28 +08:00
  • 1f9fb869ad [BugFix] Fix accuracy bugs for unquantized deepseekv3 models (#897) Angazenn 2025-05-24 14:29:36 +08:00
  • 17f05b1089 [Feature] Add CustomQwen3MoeForCausalLM model (#925) yiz-liu 2025-05-23 15:50:48 +08:00
  • df58fb80ee Spec decode support for V1 Engine (#874) jiangpeng 2025-05-23 14:25:46 +08:00
  • a970b27e2d [WIP][Perf]remove unnecessary padding before MLA V1 prefill (#917) Angazenn 2025-05-23 14:14:06 +08:00
  • dc6172efd3 update attention nz and mla nz(Improve TPOP 6ms performance) (#909) ttanzhiqiang 2025-05-23 10:18:10 +08:00
  • 7153d8890b [Feature] Impl v1 disaggregated prefill in ascend scheduler (#852) Jade Zheng 2025-05-23 10:15:29 +08:00
  • b434f37b46 [V1] Revert the default value of enable_chunked_prefill in additional… (#935) rjg-lyh 2025-05-23 10:06:50 +08:00