xc-llm-ascend

EngineX/xc-llm-ascend

afe1767c17 [Core] Cleanup triton patch which has been fixed in vllm (#764) Yikun Jiang 2025-05-06 18:52:15 +08:00
b0dbe5f8e1 [Bug fix] fix a typo in setup.py (#762) linfeng-yuan 2025-05-06 17:01:26 +08:00
5897dc5bbe [Build] Bump vLLM version to v0.8.5.post1 (#755) Yikun Jiang 2025-05-06 11:44:12 +08:00
d6bfae8eee support 32K model len on deepseek r1 W8A8 (#728) sunbaosong 2025-05-06 10:12:07 +08:00
79538b5d73 Upgrade CANN version to 8.1.rc1 (#747) Yikun Jiang 2025-05-06 05:44:18 +08:00
d7e1110c8e Re-patch TritonPlaceholder on main to make CI happy (#753) Yikun Jiang 2025-05-05 23:22:24 +08:00
d2ead057ae Re-enable Speculative Decode test for vLLM v0.8.5 (#749) Yikun Jiang 2025-05-02 14:44:48 +08:00
8b194ad12e [Disaggregated Prefill] P2P Disaggregated Prefill based on llm_datadist (#694) whx 2025-05-01 22:31:36 +08:00
84e2ed898b performance optimization, usability optimization and API compatibility adjustments for deepseek with npu graph mode (#731) linfeng-yuan 2025-05-01 13:51:42 +08:00
399b03830d [Build][Bugfix] Fix source code path to avoid reference error (#726) Mengqing Cao 2025-04-30 17:38:13 +08:00
3a628891ab [Feature] Add quant description file for new quant model generated by modelslim (#719) Pleaplusone 2025-04-30 16:51:56 +08:00
affca6f348 [Test] Add accuracy test report workflow (#542) hfadzxy 2025-04-30 14:53:58 +08:00
ba9714ccee Optimize qwen2_vl and qwen2_5_vl (#701) zouyida2052 2025-04-30 14:22:38 +08:00
90aabaeb2e [Doc] Add benchmark guide (#635) Li Wang 2025-04-30 09:17:59 +08:00
f8350569e6 [CI] upgrade vllm to 0.8.5 (#715) wangxiyuan 2025-04-30 09:15:50 +08:00
95e7aa4736 [Platform] format platform to make it more clear (#610) wangxiyuan 2025-04-30 09:03:10 +08:00
b917361ca5 [MISC] Clean up torch_npu (#688) wangxiyuan 2025-04-29 18:03:38 +08:00
0329fad927 [Perf] Deepseekv3 performance optimization for eager mode (#598) Pleaplusone 2025-04-29 17:12:03 +08:00
87975fa058 [Bugfix] Fix early return in CustomDeepseekV2MoE.forward during profile_run (#682) ApsarasX 2025-04-29 17:06:19 +08:00
7aee9228f0 [CI] Add nightly CI (#668) Li Wang 2025-04-29 16:35:52 +08:00
d6be63e11d [CI] Add Qwen3-0.6B-Base test (#717) Li Wang 2025-04-29 14:35:19 +08:00
0dae55a9a3 [MISC] fix format check error (#654) wangxiyuan 2025-04-29 11:14:19 +08:00
1fce70a2fb [Model] Support common fused moe ops for moe model, such as Qwen3Moe (#709) wangxiyuan 2025-04-28 21:57:01 +08:00
40bd602485 [Feature] Use reshape_and_cache fused op (#706) Jade Zheng 2025-04-28 21:54:42 +08:00
d39855b075 Update installation and tutorial doc (#711) Yikun Jiang 2025-04-28 21:52:17 +08:00
5995d23532 [Doc] Add 0.8.4rc2 release note (#705) wangxiyuan 2025-04-28 21:51:35 +08:00
54c0e63df7 [MTP] follow custom deepseek modeling changes to support graph mode (#636) wemaster 2025-04-28 21:18:53 +08:00
be9e3e8545 [Bugfix] Fix triton placeholder patch period (#704) Mengqing Cao 2025-04-28 18:52:03 +08:00
58f9d932d3 [Doc] Update faqs (#699) Li Wang 2025-04-28 18:48:23 +08:00
d0a0c81ced [Doc] Add deepseek-v2-lite w8a8 quantization tutorial (#630) Li Wang 2025-04-28 17:14:26 +08:00
5de3646522 [MISC] Make vllm version configurable (#651) wangxiyuan 2025-04-28 14:19:06 +08:00
8849cf1eda Bump actions/setup-python from 5.5.0 to 5.6.0 (#697) dependabot[bot] 2025-04-28 14:06:38 +08:00
ee7a0e2cd4 Update openEuler dockerfile for COMPILE_CUSTOM_KERNELS=1 (#689) Icey 2025-04-28 11:45:46 +08:00
38f34e359f [Fix] fix deepseek v0 attention eager mode (#671) Pleaplusone 2025-04-28 08:53:06 +08:00
413657ae43 [FOLLOWUP][DOC] Fix pip install cmd in installation.md (#680) Yikun Jiang 2025-04-27 18:37:25 +08:00
2e20797934 [BUILD] Upgrade torch-npu to 2.5.1 (#661) Yikun Jiang 2025-04-27 17:28:29 +08:00
fa4a5d980e [Bugfix] Remove redundant tensor creation and unused code (#656) Jade Zheng 2025-04-27 14:09:16 +08:00
ba3d8aae94 [Model][MiniCPM] support MiniCPM (#645) Mengqing Cao 2025-04-27 11:27:24 +08:00
742f679c7d Remove prompt string from engine core data structures (#663) Yikun Jiang 2025-04-26 23:15:58 +08:00
c99c4c8c70 [Doc] Update feature support list (#650) wangxiyuan 2025-04-26 10:27:29 +08:00
3879d9cad9 [CI] Fix sample backward compatibility problem (#648) wangxiyuan 2025-04-25 11:53:26 +08:00
d785e78563 [V1] Make V1 engine backward compatible (#637) yiz-liu 2025-04-24 17:20:11 +08:00
bd70ce828c [CI] Add qwen2.5-vl test (#643) Li Wang 2025-04-24 17:12:12 +08:00
a9c6b52205 [Bugfix] Fix qwen2.5-vl position input bug (#639) Li Wang 2025-04-24 15:21:57 +08:00
866ce7168c [Benchmark] Download model from modelscope (#634) Li Wang 2025-04-24 14:48:24 +08:00
05bdcbeae4 support aclgraph (#426) Bug Hunter Yan 2025-04-23 20:56:24 +08:00
5c6d05a59e support deepseek quant & mix-parallel with graphmode (#585) zzzzwwjj 2025-04-23 16:23:25 +08:00
e74331a1ed Add dp initialize patch with hccl backend (#626) Pleaplusone 2025-04-23 15:47:51 +08:00
848e041a54 Using EvalScope evaluation (#611) RongRongStudio 2025-04-23 00:50:09 +08:00
4a0ce3660e [Misc] Remove some parts of metrics patch (#603) Shanshan Shen 2025-04-22 18:45:21 +08:00
cf6ab42ee2 [CI]Add guided decoding test (#422) Li Wang 2025-04-22 17:50:06 +08:00
538a69c145 [Patch] format patch module to make it more clear (#601) wangxiyuan 2025-04-22 14:13:00 +08:00
ad845bfe82 fix doc to mention env setting for v0.7.3-dev (#602) Shuqiao Li 2025-04-22 14:11:41 +08:00
d12a057df8 Add note for deepseek related docs and remove unnecessary comments (#590) Pleaplusone 2025-04-22 09:59:09 +08:00
c5850d302d [Doc] Update installation (#596) Mengqing Cao 2025-04-22 09:04:20 +08:00
a8d633f629 [Bugfix] fix import error (#600) paulyu12 2025-04-22 08:57:25 +08:00
0ae9ee0f8a [BUGFIX] main-sd-bugfix && [UT] add mtp UT (#593) wemaster 2025-04-21 19:25:51 +08:00
5442b463fd add doc for patch_config (#574) Shuqiao Li 2025-04-21 10:33:38 +08:00
96d6fa7c90 [Docker] Fix openEuler image suffix (#586) Yikun Jiang 2025-04-21 08:55:26 +08:00
12cae04db9 [quantization] Support w8a8 quantization (#580) Yikun Jiang 2025-04-20 18:14:05 +08:00
1a1f9a6d89 port deepseekv2 and mtp to main branch (#429) Pleaplusone 2025-04-19 17:38:18 +08:00
086423dc35 [Docker] Bump Dockerfile version to v0.8.4 (#577) Yikun Jiang 2025-04-18 19:15:17 +08:00
a127cc83f8 catch ImportError when C code not compiled (#575) Shuqiao Li 2025-04-18 18:11:49 +08:00
985b0548b0 [Doc] Update v0.8.4 release note, add contents for structured output feature (#576) Shanshan Shen 2025-04-18 17:44:16 +08:00
65c1f4579f [V1][Structured Output] Add apply_grammar_bitmask() method to model runner (#555) Shanshan Shen 2025-04-18 16:47:55 +08:00
2c903bc7ac [Doc] Update doc for custom ops build (#570) Mengqing Cao 2025-04-18 15:35:10 +08:00
b91f9a5afd [Doc][Build] Update build doc and faq (#568) Mengqing Cao 2025-04-18 14:16:41 +08:00
e66ded5679 [Doc] Add release note for 0.8.4rc1 (#557) wangxiyuan 2025-04-18 13:24:36 +08:00
7eeff60715 [Doc] Update FAQ doc (#561) Shanshan Shen 2025-04-18 13:13:13 +08:00
84563fc65d Add sleep mode feature for Ascend NPU (#513) Shuqiao Li 2025-04-18 13:11:39 +08:00
42c7fbb10e [Misc] Fix import error and address nits to make CI happy (#563) wangxiyuan 2025-04-18 12:23:32 +08:00
66a0837963 adopt rope in vllm-ascend (#530) Pleaplusone 2025-04-18 08:56:05 +08:00
23f85e3f74 [BugFix] Fix scheduler problems in last PR. (#558) whx 2025-04-18 08:49:48 +08:00
6ee7f5cf71 [SpecDecode] Add spec decode support (#500) Mengqing Cao 2025-04-17 20:16:32 +08:00
b71f193cb0 [Model][Doc] Update model support list (#552) Mengqing Cao 2025-04-17 19:32:20 +08:00
20dff4deff [Scheduler] Add AscendScheduler. (#543) whx 2025-04-17 19:31:50 +08:00
697908f5cd [Platform][Worker][ModelRunner] Add LoRA & Multi-LoRA support (#521) paulyu12 2025-04-17 16:48:46 +08:00
9935d45728 [CI]Add model basic accuracy test(Qwen2.5-0.5B-Instruct) (#460) hfadzxy 2025-04-17 14:59:56 +08:00
c3d1a3782a Add pyhccl (#503) Huazhong Ji 2025-04-17 14:57:52 +08:00
64fdf4cbef [Doc]Update faq (#536) Li Wang 2025-04-17 14:56:51 +08:00
6061f33670 [Bugfix][Model] Fix api in DeepSeek model (#545) Mengqing Cao 2025-04-17 11:56:05 +08:00
9859e7313f [CI]Add global env to runner (#537) Li Wang 2025-04-17 10:08:00 +08:00
00de2ee6ad [Doc] update faq about progress bar display issue (#538) hfadzxy 2025-04-16 16:07:08 +08:00
fe13cd9ea5 [Doc] update faq about w8a8 (#534) Mengqing Cao 2025-04-16 09:37:21 +08:00
415ed027fa [V1][Platform] Remove supports_structured_output() in platform (#531) Shanshan Shen 2025-04-16 09:30:33 +08:00
bbe7ccd366 [MISC] Add patch module (#526) wangxiyuan 2025-04-16 09:28:58 +08:00
434749d299 [CI] update 0.8.3 to 0.8.4 (#528) wangxiyuan 2025-04-16 09:26:30 +08:00
13480d1238 [CI]Fix workflow (#532) Li Wang 2025-04-15 19:55:41 +08:00
bcbc04f92b [Doc] Add environment variables doc (#519) Shanshan Shen 2025-04-15 16:09:36 +08:00
44a8301424 [Feature] Add PD separation feature (#432) eeethenQ 2025-04-15 15:11:35 +08:00
c7f6584d75 [V1] clean up V1 code (#505) wangxiyuan 2025-04-15 10:24:02 +08:00
f6af1d2471 [MISC] fix logger (#515) wangxiyuan 2025-04-15 10:18:05 +08:00
5c6d79687c [Doc] Update FAQ (#518) wangxiyuan 2025-04-15 10:17:56 +08:00
5fa70b6393 [Build] Update doc (#509) wangxiyuan 2025-04-14 14:38:50 +08:00
11ecbfdb31 [Doc] Update FAQ doc (#504) Shanshan Shen 2025-04-14 11:11:40 +08:00
9c7428b3d5 [CI] enable custom ops build (#466) wangxiyuan 2025-04-12 10:24:53 +08:00
d05ea17427 Add openEuler based container image for vLLM Ascend (#489) Icey 2025-04-10 14:30:49 +08:00
afdbf77483 [CI] Add new runner and enable QwQ multinpu test (#417) Li Wang 2025-04-08 16:52:45 +08:00
5d6239306b [DOC] Update multi_node.md (#468) jinyuxin 2025-04-08 14:19:57 +08:00
f6cf92e7d5 [quant][bugfix] fix deepseek quant bug (#478) Mengqing Cao 2025-04-08 09:15:56 +08:00
