xc-llm-ascend

EngineX/xc-llm-ascend

1025344912 Doc Enhancement: Single NPU(Qwen3-8B) aclgraph mode + eager mode (#1374) leo-pony 2025-06-26 16:52:54 +08:00
53c2d58ae1 Handle with_prefill_across_dp for multistream mla (#1322) sdmyzlp 2025-06-26 09:32:07 +08:00
2690697caa [Bugfix] Reset all unused positions to prevent out-of-bounds in GatherV3 (#1416) yiz-liu 2025-06-26 09:27:43 +08:00
06ccce1ddf [FOLLOWUP] fix name and format in accuracy test (#1288) (#1435) zhangxinyuehfad 2025-06-26 00:26:54 +08:00
2fda60464c [Perf] Use fused ops npu_top_k_top_p (#1308) Pr0Wh1teGivee 2025-06-25 20:59:06 +08:00
e7efc7e7e7 [BugFix] Remove not using patch_eagle.py for CI. (#1385) yuancaoyaoHW 2025-06-25 20:36:05 +08:00
941269a6c5 adjusting the communication method in graph mode (#1194) sharonyunyun 2025-06-25 19:56:49 +08:00
205cb85a1e [Doc] Fix doc typo (#1424) wangxiyuan 2025-06-25 19:28:26 +08:00
ca884ef86d [Misc] Clean up uesless code for LLM initialize (#1373) wangxiyuan 2025-06-25 16:20:14 +08:00
0060886a37 [CI]Update accuracy report test (#1288) zhangxinyuehfad 2025-06-25 14:10:34 +08:00
15df8be937 [Doc] Add sleep mode doc (#1295) Li Wang 2025-06-25 14:07:14 +08:00
e4e0b7af05 [Doc] Add patch doc (#1414) wangxiyuan 2025-06-25 12:00:45 +08:00
52317f92cb [DP] Tiny fix of dp and update example (#1273) Mengqing Cao 2025-06-25 11:03:04 +08:00
c1c5d56255 [Doc] Update FAQ and add test guidance (#1360) Mengqing Cao 2025-06-25 09:59:23 +08:00
5f5800ba42 [Bugfix] Sync MRotaryEmbedding interface change to recover CI (#1399) Li Wang 2025-06-24 22:56:39 +08:00
6ed3f00427 [Doc] remove environment variable VLLM_ENABLE_MC2 (#1406) liziyu 2025-06-24 21:18:10 +08:00
20767a043c [CI/UT] Fix disaggregated prefill ci (#1313) Mengqing Cao 2025-06-24 17:11:00 +08:00
9cbce423ce [MISC] Remove useless patch (#1366) wangxiyuan 2025-06-24 10:05:59 +08:00
5177bef87a support fused_moe_allgather_ep (#1335) lyj-jjj 2025-06-23 22:03:38 +08:00
917c6b71af [TEST][DOC] Fix doctest and add system package installation (#1375) Yikun Jiang 2025-06-23 20:50:33 +08:00
08cfc7cb4b Modify installation.md for adding pip extra index of torch-npu (#1272) Icey 2025-06-23 15:37:50 +08:00
e1123172d1 [Doc] Add reinstall instructions doc (#1303) weiguihua2 2025-06-23 14:06:27 +08:00
15592c0d48 [bugfix] fix accuracy prolem for deepseek V3/R1 models with torchair graph in long sequence predictions (#1331) linfeng-yuan 2025-06-23 09:52:27 +08:00
f04c6763d8 [Bugfix] fix env variable in dbo (#1284) zxdukki 2025-06-23 09:07:57 +08:00
21fb68a03a [CI] Update guided decoding ut (#1312) Shanshan Shen 2025-06-23 09:06:20 +08:00
339d6894f6 [CI/UT][bugfix] fix v0 spec decode (#1321) wemaster 2025-06-23 09:05:13 +08:00
7e6efbf2a9 update torch-npu to 2.5.1.post1.dev20250619 (#1347) Pleaplusone 2025-06-23 09:02:09 +08:00
4447e53d7a [Doc] Change not to no in faqs.md (#1357) xleoken 2025-06-23 09:01:00 +08:00
a95afc011e [CI] Enable merge trigger unit test and accuracy test schedule job (#1345) Yikun Jiang 2025-06-22 17:21:57 +08:00
2e5f312530 Cleanup ununsed doc (#1352) Yikun Jiang 2025-06-22 15:05:30 +08:00
c30ddb8331 Bump v0.9.1rc1 release (#1349) Yikun Jiang 2025-06-22 13:15:36 +08:00
097e7149f7 [Platform] Add initial experimental support for Altlas 300I series (#1333) Yikun Jiang 2025-06-21 09:00:16 +08:00
2009fdb8da [Test] Enable code cov for V1 and enable push trigger (#1164) Yikun Jiang 2025-06-21 00:01:05 +08:00
2f1266d451 Support Pangu Pro MoE model (#1204) Angazenn 2025-06-20 23:59:59 +08:00
00ae250f3c [V1][eagle3] Support eagle3 proposer for v1 (#1032) yuancaoyaoHW 2025-06-20 17:19:54 +08:00
45be1aac0c [CI] Add codespell check for doc (#1314) wangxiyuan 2025-06-20 16:48:14 +08:00
761bd3d9d7 Add user guide for quantization (#1206) 22dimensions 2025-06-20 15:53:25 +08:00
2c7dd85fd8 [Fix] Fix the token-wise padding mechanism (#1300) yiz-liu 2025-06-20 14:46:17 +08:00
b350edae9a [UT] refactor test_expert_load_balancer and fix broken CI (#1293) wangxiyuan 2025-06-20 01:02:52 +08:00
ebb2a70dbb static EPLB fix bug, add unit test (#1186) songshanhu07 2025-06-18 19:46:56 +08:00
2cd8ecdc4f [Bugfix][Spec Decode] Enable ACL_OP_INIT_MODE=1 directly only when using V0 spec decode (#1258) Shanshan Shen 2025-06-18 17:50:20 +08:00
db2f630aeb [bugfix] fix deepseek with mc2 (#1268) zzzzwwjj 2025-06-18 00:58:38 +08:00
d7e19ed57a [BugFix] fix length of sin/cos cache in rope (#1266) whx 2025-06-17 23:14:25 +08:00
afc8edb046 [Bugfix]: Pass scaling args to mc2 (#1202) Jade Zheng 2025-06-17 22:16:44 +08:00
f8029945c3 [Bugfix] Remove cuda related lines and add additional pip mirror (#1252) Li Wang 2025-06-17 21:25:40 +08:00
23ca68d0c8 [refactor] Refactoring AscendFusedMoE (#1229) zzzzwwjj 2025-06-17 17:49:03 +08:00
05dec7eda9 [Doc] Refactor and init user story page (#1224) Yikun Jiang 2025-06-17 09:36:35 +08:00
9d3cbc0953 [Doctest] add installation doctest (#1179) Yikun Jiang 2025-06-17 08:52:26 +08:00
96fa7ff63b [DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 (#1235) Mengqing Cao 2025-06-16 23:09:53 +08:00
f5404dc650 Fix the device error when using ray as vllm-acend backend (#884) zhuo97 2025-06-16 21:03:16 +08:00
69b817ed65 [CI] Add unit test framework (#1201) wangxiyuan 2025-06-16 18:32:28 +08:00
966557a2a3 [Build] Speedup image build (#1216) Yikun Jiang 2025-06-16 09:02:53 +08:00
4ce860a2be [CI] Make e2e test to be preemptible and simple (#1217) Yikun Jiang 2025-06-15 22:07:43 +08:00
4270682383 Waiting for BMM NZ support(Improve TPOP 2ms performance) (#1131) ttanzhiqiang 2025-06-15 19:57:02 +08:00
0d2074a1ec [Doc] fix VLLM_USE_V1 value in graph mode docs (#1226) 22dimensions 2025-06-15 15:41:11 +08:00
ab5d110fcc vllm-ascend support chunked prefill (#1172) fems14 2025-06-14 22:31:16 +08:00
a3b5af8307 [CI/UT][Graph] Add ut for torchair graph mode (#1103) Mengqing Cao 2025-06-14 16:59:00 +08:00
94a52cf577 Add ShouJian Zheng (@jianzs) as vLLM Ascend maintainer (#1203) Yikun Jiang 2025-06-13 18:25:50 +08:00
47b507b180 [CI] Recover ut for ascend scheduler only in ci of v1. (#1180) whx 2025-06-13 07:51:23 +08:00
e72f94e38f Support multistream of MLA vector operations (#1135) sdmyzlp 2025-06-12 21:42:09 +08:00
55c0e68883 [Doc] Add Referer header for CANN package download url. (#1192) Wan_Danfeng 2025-06-12 21:22:23 +08:00
c6e2a5fb40 [fix] fix bug in 1p1d disaggregated_prefill example (#1184) wangyanhui-cmss 2025-06-12 19:40:58 +08:00
37f4469a03 [CI][Benchmark] Add qwen2.5-7b test (#1104) Li Wang 2025-06-12 10:47:30 +08:00
dd207cb261 [CI][Benchmark] Add new model and v1 test to perf benchmarks (#1099) Li Wang 2025-06-12 10:46:41 +08:00
2498d297ae add custom ascendc kernel vocabparallelembedding (#796) ttanzhiqiang 2025-06-12 10:44:33 +08:00
3393d53b36 [Scheduler][MTP] Add support for speculative decoding in AsecendScheduler. (#943) whx 2025-06-11 20:55:44 +08:00
4f5964420e [CI] Upgrade vllm to 0.9.1 (#1165) wangxiyuan 2025-06-11 16:33:11 +08:00
e46dc142bf Enable kvcache_nz for the decode process in torchair graph mode (#1098) chenwaner 2025-06-11 14:09:28 +08:00
4153a5091b [Doc] Fix the config parameter name "enable" in graph_mode.md. (#1159) yz 2025-06-11 11:03:37 +08:00
980cd81466 etp best a2 (#1101) ttanzhiqiang 2025-06-11 10:40:50 +08:00
860a5ef7fd provide an e2e guide for execute duration profiling (#1113) depeng1994 2025-06-11 10:02:11 +08:00
7bdc606677 Support multistream of shared experts in FusedMoE (#997) sdmyzlp 2025-06-11 09:18:38 +08:00
04abfd8721 [CI] Skip test_v1_spec_decode.py::test_ngram_correctness to make longterm CI pass (#1163) Mengqing Cao 2025-06-11 07:31:13 +08:00
8b48daaa44 [CI] rename Qwen2.5-0.5B-Instruct-W8A8 model (#1145) 22dimensions 2025-06-11 06:18:32 +08:00
8dd686dfa2 [MLA][Graph] Improve assertion on Graph mode with MLA (#933) Mengqing Cao 2025-06-10 22:26:53 +08:00
291c216898 fix torchair execute issue on padding data, and mtp padding logic (#1160) Pleaplusone 2025-06-10 22:20:40 +08:00
95414bae70 [CI] Run e2e after pre check pass (#1132) wangxiyuan 2025-06-10 17:18:09 +08:00
b75cb788dd [Bugfix] add compilation/__init__.py to fix import error (#1152) wangxiyuan 2025-06-10 17:14:25 +08:00
e68e81f2ce [CI] Make accuarcy CI and report work (#1078) zhangxinyuehfad 2025-06-10 14:35:44 +08:00
71aee6f97d Update 0.9.0rc1 contributors info (#1148) Yikun Jiang 2025-06-10 13:29:09 +08:00
5cd5d64242 [CI] remove old quantization model (#1003) 22dimensions 2025-06-10 10:07:36 +08:00
706de02317 [fix] fix compatibility for non-EPLB scenarios (#1142) linfeng-yuan 2025-06-10 08:39:24 +08:00
571f88f85e [Doc] Update 0.9.0rc1 release date (#1139) wangxiyuan 2025-06-09 22:51:02 +08:00
cd2f14a1b3 [MTP][V1] Adapt mtp with graph mode in v1. (#1023) whx 2025-06-09 22:21:42 +08:00
5ac4872f5e [Doc] Add 0.9.0rc1 release note (#1106) wangxiyuan 2025-06-09 19:39:21 +08:00
6b853f15fe Add static EPLB (#1116) Yuxiao-Xu 2025-06-09 19:28:11 +08:00
cb341c7bcd [CI] Fix PD job (#1129) wangxiyuan 2025-06-09 16:34:41 +08:00
e63fc6f280 Init vLLM Ascend maintainers info (#1124) Yikun Jiang 2025-06-09 16:32:58 +08:00
d2f87ed9cc [Patch] Remove spec_decode.metrics patch (#1016) Shanshan Shen 2025-06-09 15:05:11 +08:00
6003afa6d2 [BugFix] Fix data parallel (#940) yiz-liu 2025-06-09 14:08:18 +08:00
eec6068187 [Bugfix] Set ACL_OP_INIT_MODE env var default to 0 (#1123) Shanshan Shen 2025-06-09 14:07:37 +08:00
4976b48b98 [Build] Move numba/quart to requirments and update DS baseline and sync graph typo fix (#1121) Yikun Jiang 2025-06-08 22:33:37 +08:00
f1543d5e0d [bugfix] fix deeepseek accuracy (#1118) zzzzwwjj 2025-06-07 21:11:36 +08:00
c8742146d3 [CherryPick] Add unpadded Qwen2.5-VL for verl scenario (#1095) wangxiyuan 2025-06-07 19:45:46 +08:00
b80a484864 Fix typo of VLLM_ASCEND_ENABLE_TOPK_OPTIMIZE (#1112) linfeng-yuan 2025-06-07 19:45:33 +08:00
20dedba5d1 Add qwen2.5 vl multimodal feature for vllm-ascend v1 (#736) TaoYu Chen 2025-06-07 16:53:19 +08:00
87ebaef4e4 [perf]: support dual-batch overlap(dbo) for deepseek (#941) zxdukki 2025-06-07 16:46:58 +08:00
3640c60b0e Avoid unfused Transpose in DeepSeekV3 EP256 MoE layer (#1091) sdmyzlp 2025-06-07 14:28:20 +08:00
8d00775fce [SpecDecode][CI] Set default values to fix spec decode and fix multicard CI (#1109) Yikun Jiang 2025-06-07 11:23:30 +08:00
e9ada685ec [CI]Moe alltoall communication optimization (#1067) weijinqian0 2025-06-07 10:15:56 +08:00
