Commit Graph

  • e4e0b7af05 [Doc] Add patch doc (#1414) wangxiyuan 2025-06-25 12:00:45 +08:00
  • 52317f92cb [DP] Tiny fix of dp and update example (#1273) Mengqing Cao 2025-06-25 11:03:04 +08:00
  • c1c5d56255 [Doc] Update FAQ and add test guidance (#1360) Mengqing Cao 2025-06-25 09:59:23 +08:00
  • 5f5800ba42 [Bugfix] Sync MRotaryEmbedding interface change to recover CI (#1399) Li Wang 2025-06-24 22:56:39 +08:00
  • 6ed3f00427 [Doc] remove environment variable VLLM_ENABLE_MC2 (#1406) liziyu 2025-06-24 21:18:10 +08:00
  • 20767a043c [CI/UT] Fix disaggregated prefill ci (#1313) Mengqing Cao 2025-06-24 17:11:00 +08:00
  • 9cbce423ce [MISC] Remove useless patch (#1366) wangxiyuan 2025-06-24 10:05:59 +08:00
  • 5177bef87a support fused_moe_allgather_ep (#1335) lyj-jjj 2025-06-23 22:03:38 +08:00
  • 917c6b71af [TEST][DOC] Fix doctest and add system package installation (#1375) Yikun Jiang 2025-06-23 20:50:33 +08:00
  • 08cfc7cb4b Modify installation.md for adding pip extra index of torch-npu (#1272) Icey 2025-06-23 15:37:50 +08:00
  • e1123172d1 [Doc] Add reinstall instructions doc (#1303) weiguihua2 2025-06-23 14:06:27 +08:00
  • 15592c0d48 [bugfix] fix accuracy problem for deepseek V3/R1 models with torchair graph in long sequence predictions (#1331) linfeng-yuan 2025-06-23 09:52:27 +08:00
  • f04c6763d8 [Bugfix] fix env variable in dbo (#1284) zxdukki 2025-06-23 09:07:57 +08:00
  • 21fb68a03a [CI] Update guided decoding ut (#1312) Shanshan Shen 2025-06-23 09:06:20 +08:00
  • 339d6894f6 [CI/UT][bugfix] fix v0 spec decode (#1321) wemaster 2025-06-23 09:05:13 +08:00
  • 7e6efbf2a9 update torch-npu to 2.5.1.post1.dev20250619 (#1347) Pleaplusone 2025-06-23 09:02:09 +08:00
  • 4447e53d7a [Doc] Change not to no in faqs.md (#1357) xleoken 2025-06-23 09:01:00 +08:00
  • a95afc011e [CI] Enable merge trigger unit test and accuracy test schedule job (#1345) Yikun Jiang 2025-06-22 17:21:57 +08:00
  • 2e5f312530 Cleanup unused doc (#1352) Yikun Jiang 2025-06-22 15:05:30 +08:00
  • c30ddb8331 Bump v0.9.1rc1 release (#1349) Yikun Jiang 2025-06-22 13:15:36 +08:00
  • 097e7149f7 [Platform] Add initial experimental support for Atlas 300I series (#1333) Yikun Jiang 2025-06-21 09:00:16 +08:00
  • 2009fdb8da [Test] Enable code cov for V1 and enable push trigger (#1164) Yikun Jiang 2025-06-21 00:01:05 +08:00
  • 2f1266d451 Support Pangu Pro MoE model (#1204) Angazenn 2025-06-20 23:59:59 +08:00
  • 00ae250f3c [V1][eagle3] Support eagle3 proposer for v1 (#1032) yuancaoyaoHW 2025-06-20 17:19:54 +08:00
  • 45be1aac0c [CI] Add codespell check for doc (#1314) wangxiyuan 2025-06-20 16:48:14 +08:00
  • 761bd3d9d7 Add user guide for quantization (#1206) 22dimensions 2025-06-20 15:53:25 +08:00
  • 2c7dd85fd8 [Fix] Fix the token-wise padding mechanism (#1300) yiz-liu 2025-06-20 14:46:17 +08:00
  • b350edae9a [UT] refactor test_expert_load_balancer and fix broken CI (#1293) wangxiyuan 2025-06-20 01:02:52 +08:00
  • ebb2a70dbb static EPLB fix bug, add unit test (#1186) songshanhu07 2025-06-18 19:46:56 +08:00
  • 2cd8ecdc4f [Bugfix][Spec Decode] Enable ACL_OP_INIT_MODE=1 directly only when using V0 spec decode (#1258) Shanshan Shen 2025-06-18 17:50:20 +08:00
  • db2f630aeb [bugfix] fix deepseek with mc2 (#1268) zzzzwwjj 2025-06-18 00:58:38 +08:00
  • d7e19ed57a [BugFix] fix length of sin/cos cache in rope (#1266) whx 2025-06-17 23:14:25 +08:00
  • afc8edb046 [Bugfix]: Pass scaling args to mc2 (#1202) Jade Zheng 2025-06-17 22:16:44 +08:00
  • f8029945c3 [Bugfix] Remove cuda related lines and add additional pip mirror (#1252) Li Wang 2025-06-17 21:25:40 +08:00
  • 23ca68d0c8 [refactor] Refactoring AscendFusedMoE (#1229) zzzzwwjj 2025-06-17 17:49:03 +08:00
  • 05dec7eda9 [Doc] Refactor and init user story page (#1224) Yikun Jiang 2025-06-17 09:36:35 +08:00
  • 9d3cbc0953 [Doctest] add installation doctest (#1179) Yikun Jiang 2025-06-17 08:52:26 +08:00
  • 96fa7ff63b [DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 (#1235) Mengqing Cao 2025-06-16 23:09:53 +08:00
  • f5404dc650 Fix the device error when using ray as vllm-ascend backend (#884) zhuo97 2025-06-16 21:03:16 +08:00
  • 69b817ed65 [CI] Add unit test framework (#1201) wangxiyuan 2025-06-16 18:32:28 +08:00
  • 966557a2a3 [Build] Speedup image build (#1216) Yikun Jiang 2025-06-16 09:02:53 +08:00
  • 4ce860a2be [CI] Make e2e test to be preemptible and simple (#1217) Yikun Jiang 2025-06-15 22:07:43 +08:00
  • 4270682383 Waiting for BMM NZ support(Improve TPOP 2ms performance) (#1131) ttanzhiqiang 2025-06-15 19:57:02 +08:00
  • 0d2074a1ec [Doc] fix VLLM_USE_V1 value in graph mode docs (#1226) 22dimensions 2025-06-15 15:41:11 +08:00
  • ab5d110fcc vllm-ascend support chunked prefill (#1172) fems14 2025-06-14 22:31:16 +08:00
  • a3b5af8307 [CI/UT][Graph] Add ut for torchair graph mode (#1103) Mengqing Cao 2025-06-14 16:59:00 +08:00
  • 94a52cf577 Add ShouJian Zheng (@jianzs) as vLLM Ascend maintainer (#1203) Yikun Jiang 2025-06-13 18:25:50 +08:00
  • 47b507b180 [CI] Recover ut for ascend scheduler only in ci of v1. (#1180) whx 2025-06-13 07:51:23 +08:00
  • e72f94e38f Support multistream of MLA vector operations (#1135) sdmyzlp 2025-06-12 21:42:09 +08:00
  • 55c0e68883 [Doc] Add Referer header for CANN package download url. (#1192) Wan_Danfeng 2025-06-12 21:22:23 +08:00
  • c6e2a5fb40 [fix] fix bug in 1p1d disaggregated_prefill example (#1184) wangyanhui-cmss 2025-06-12 19:40:58 +08:00
  • 37f4469a03 [CI][Benchmark] Add qwen2.5-7b test (#1104) Li Wang 2025-06-12 10:47:30 +08:00
  • dd207cb261 [CI][Benchmark] Add new model and v1 test to perf benchmarks (#1099) Li Wang 2025-06-12 10:46:41 +08:00
  • 2498d297ae add custom ascendc kernel vocabparallelembedding (#796) ttanzhiqiang 2025-06-12 10:44:33 +08:00
  • 3393d53b36 [Scheduler][MTP] Add support for speculative decoding in AscendScheduler. (#943) whx 2025-06-11 20:55:44 +08:00
  • 4f5964420e [CI] Upgrade vllm to 0.9.1 (#1165) wangxiyuan 2025-06-11 16:33:11 +08:00
  • e46dc142bf Enable kvcache_nz for the decode process in torchair graph mode (#1098) chenwaner 2025-06-11 14:09:28 +08:00
  • 4153a5091b [Doc] Fix the config parameter name "enable" in graph_mode.md. (#1159) yz 2025-06-11 11:03:37 +08:00
  • 980cd81466 etp best a2 (#1101) ttanzhiqiang 2025-06-11 10:40:50 +08:00
  • 860a5ef7fd provide an e2e guide for execute duration profiling (#1113) depeng1994 2025-06-11 10:02:11 +08:00
  • 7bdc606677 Support multistream of shared experts in FusedMoE (#997) sdmyzlp 2025-06-11 09:18:38 +08:00
  • 04abfd8721 [CI] Skip test_v1_spec_decode.py::test_ngram_correctness to make longterm CI pass (#1163) Mengqing Cao 2025-06-11 07:31:13 +08:00
  • 8b48daaa44 [CI] rename Qwen2.5-0.5B-Instruct-W8A8 model (#1145) 22dimensions 2025-06-11 06:18:32 +08:00
  • 8dd686dfa2 [MLA][Graph] Improve assertion on Graph mode with MLA (#933) Mengqing Cao 2025-06-10 22:26:53 +08:00
  • 291c216898 fix torchair execute issue on padding data, and mtp padding logic (#1160) Pleaplusone 2025-06-10 22:20:40 +08:00
  • 95414bae70 [CI] Run e2e after pre check pass (#1132) wangxiyuan 2025-06-10 17:18:09 +08:00
  • b75cb788dd [Bugfix] add compilation/__init__.py to fix import error (#1152) wangxiyuan 2025-06-10 17:14:25 +08:00
  • e68e81f2ce [CI] Make accuracy CI and report work (#1078) zhangxinyuehfad 2025-06-10 14:35:44 +08:00
  • 71aee6f97d Update 0.9.0rc1 contributors info (#1148) Yikun Jiang 2025-06-10 13:29:09 +08:00
  • 5cd5d64242 [CI] remove old quantization model (#1003) 22dimensions 2025-06-10 10:07:36 +08:00
  • 706de02317 [fix] fix compatibility for non-EPLB scenarios (#1142) linfeng-yuan 2025-06-10 08:39:24 +08:00
  • 571f88f85e [Doc] Update 0.9.0rc1 release date (#1139) wangxiyuan 2025-06-09 22:51:02 +08:00
  • cd2f14a1b3 [MTP][V1] Adapt mtp with graph mode in v1. (#1023) whx 2025-06-09 22:21:42 +08:00
  • 5ac4872f5e [Doc] Add 0.9.0rc1 release note (#1106) wangxiyuan 2025-06-09 19:39:21 +08:00
  • 6b853f15fe Add static EPLB (#1116) Yuxiao-Xu 2025-06-09 19:28:11 +08:00
  • cb341c7bcd [CI] Fix PD job (#1129) wangxiyuan 2025-06-09 16:34:41 +08:00
  • e63fc6f280 Init vLLM Ascend maintainers info (#1124) Yikun Jiang 2025-06-09 16:32:58 +08:00
  • d2f87ed9cc [Patch] Remove spec_decode.metrics patch (#1016) Shanshan Shen 2025-06-09 15:05:11 +08:00
  • 6003afa6d2 [BugFix] Fix data parallel (#940) yiz-liu 2025-06-09 14:08:18 +08:00
  • eec6068187 [Bugfix] Set ACL_OP_INIT_MODE env var default to 0 (#1123) Shanshan Shen 2025-06-09 14:07:37 +08:00
  • 4976b48b98 [Build] Move numba/quart to requirements and update DS baseline and sync graph typo fix (#1121) Yikun Jiang 2025-06-08 22:33:37 +08:00
  • f1543d5e0d [bugfix] fix deepseek accuracy (#1118) zzzzwwjj 2025-06-07 21:11:36 +08:00
  • c8742146d3 [CherryPick] Add unpadded Qwen2.5-VL for verl scenario (#1095) wangxiyuan 2025-06-07 19:45:46 +08:00
  • b80a484864 Fix typo of VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE (#1112) linfeng-yuan 2025-06-07 19:45:33 +08:00
  • 20dedba5d1 Add qwen2.5 vl multimodal feature for vllm-ascend v1 (#736) TaoYu Chen 2025-06-07 16:53:19 +08:00
  • 87ebaef4e4 [perf]: support dual-batch overlap(dbo) for deepseek (#941) zxdukki 2025-06-07 16:46:58 +08:00
  • 3640c60b0e Avoid unfused Transpose in DeepSeekV3 EP256 MoE layer (#1091) sdmyzlp 2025-06-07 14:28:20 +08:00
  • 8d00775fce [SpecDecode][CI] Set default values to fix spec decode and fix multicard CI (#1109) Yikun Jiang 2025-06-07 11:23:30 +08:00
  • e9ada685ec [CI]Moe alltoall communication optimization (#1067) weijinqian0 2025-06-07 10:15:56 +08:00
  • a2552e10e4 [Worker][V1] Support sleep mode for v1 (#1084) Li Wang 2025-06-06 21:54:02 +08:00
  • 0395ab30be [Doc] Add graph mode user doc (#1083) wangxiyuan 2025-06-06 21:14:34 +08:00
  • 9a4eb94ca9 [Misc] Adjust the default profiler configuration (#1097) ApsarasX 2025-06-06 20:25:59 +08:00
  • 5d0e9fd19a [Misc] Add ACL_OP_INIT_MODE env var and set default to 1 (#597) Shanshan Shen 2025-06-06 20:22:51 +08:00
  • 11a7df4270 [ModelRunner] Support embedding inputs (#916) Li Wang 2025-06-06 20:21:13 +08:00
  • c7f1c59911 feat: support compile multiple batch graph (#1085) NeverRaR 2025-06-06 20:17:51 +08:00
  • c46632439a [Bugfix][DP] Add with_prefill_across_dp to AscendMetadata to fix dp (#1094) Mengqing Cao 2025-06-06 19:20:33 +08:00
  • 0b12c2acf7 [Kernel] Remove cumsum in groupedmatmul (#987) hahazhky 2025-06-06 19:17:27 +08:00
  • dab19d5dca [BugFix] Fix ascend config check (#1092) wangxiyuan 2025-06-06 18:54:37 +08:00
  • 973f993a13 [Misc] fix initialize_kv_cache (#1102) wangxiyuan 2025-06-06 16:46:23 +08:00
  • c94afd79ce [Doc] Update the description for env (#1079) wangxiyuan 2025-06-06 09:48:43 +08:00