xc-llm-ascend

EngineX/xc-llm-ascend

66b67f9cf2 [Bugfix][SHM] Fix weak memory ordering problem in share memory (#3988) Zetong Li 2025-11-04 23:07:23 +08:00
5f08e07208 [Doc] Refactor the DeepSeek-V3.2-Exp tutorial. (#3871) zhangyiming 2025-11-04 18:58:33 +08:00
49e6983b3b [Test] Add accuracy test for qwen3-30b-a3b-w8a8 (#3807) zhangxinyuehfad 2025-11-04 18:56:31 +08:00
5fed166a99 [ModelRunner][Refactor] Refactor kv cache tensor initialization logic (#3106) Mengqing Cao 2025-11-04 17:26:54 +08:00
bedf223771 [Perf] move quant before allgather in Allgather EP (#3420) realliujiaxu 2025-11-04 16:49:58 +08:00
44b58b8665 [TEST]Add full graph for multimodal nightly tests (#3968) jiangyunfan1 2025-11-04 16:47:48 +08:00
954dab64fb [v0.11.0][P/D]Set adxl as default backend and update readme (#3771) zxr2333 2025-11-04 16:06:58 +08:00
15bb5098ad [PD Disaggregation]Set adxl engine as default backend and update README (#3761) zxr2333 2025-11-04 16:06:39 +08:00
dc1a6cb503 [Test]Add accuracy test for multiple models (#3823) ZengSilong 2025-11-04 14:46:39 +08:00
e9bb4491ec [BugFix] Fix deepseek v3.2 mtp bug. (#3900) whx 2025-11-04 14:06:59 +08:00
646fbac7a9 [Test] Add accuracy test for qwen3-8b-w8a8 (#3799) zhangxinyuehfad 2025-11-04 09:23:11 +08:00
40c7db6559 [MM][Bugfix] Add MoE verification for multi-modal models (#3897) Shanshan Shen 2025-11-04 09:16:19 +08:00
0cead5c1ee Quality enhancement: Immediately interrupt execution when allocate NPU memory OOM (#3944) leo-pony 2025-11-04 08:55:22 +08:00
892f1ee30f Quality enhancement: Immediately interrupt execution when memory OOM (#3932) leo-pony 2025-11-04 08:55:09 +08:00
5453033a41 revert TND modify when dcp pcp (#3948) weiguihua2 2025-11-03 22:22:17 +08:00
cc2cd42ad3 Upgrade CANN to 8.3.rc1 (#3945) wangxiyuan 2025-11-03 20:21:07 +08:00
49d74785c4 [Test] Add new e2e test use deepseek-v2-lite in ge graph mode (#3937) CodeCat 2025-11-03 20:10:01 +08:00
8f222f21f1 [CI][Nightly] Fix mooncake build (#3958) Li Wang 2025-11-03 20:07:47 +08:00
7cc6208029 [0.11.0][MTP][Aclgraph] Fix the support aclgraph with MTP (#3912) Mengqing Cao 2025-11-03 14:25:37 +08:00
ec98320285 correct bug to fix the value of max_num_tokens (#3933) zouyida2052 2025-11-03 14:17:51 +08:00
0b9b6d79fe [Feat][UT] Support Deepseekv32 FULL_DECODE_ONLY mode and add unit test of sfa_v1 (#3763) 1Fire4 2025-11-03 10:02:47 +08:00
d4c75088a0 [Perf] Move attention update stream out of loop to optimize performance (#3848) XiaoxinWang 2025-11-03 09:19:57 +08:00
d0cc9c1203 [CI][Nightly] Correct the commit hash available for mooncake (#3943) Li Wang 2025-11-01 21:52:16 +08:00
8a7154001e [0.11.0]Chery pick pta upgrade change (#3940) wangxiyuan 2025-10-31 22:14:26 +08:00
fcc9a0eaeb Update torch-npu version to 2.7.1 (#3896) wangxiyuan 2025-10-31 17:16:31 +08:00
5f6d1b3323 [Doc] Update doc for release notese (#3853) zhangxinyuehfad 2025-10-31 16:46:17 +08:00
3d81ea03ed [v0.11.0-dev][bugfix] fix valueError in static_forward_context when prefix is empty (#3929) rjg-lyh 2025-10-31 15:45:06 +08:00
0f70698d6d [feature] support pcp + mtp (with pd disaggregate) (#3822) zhangsicheng5 2025-10-31 15:43:22 +08:00
f99762eb25 [E2E][MM] Add e2e tests for InternVL model (#3796) Canlin Guo 2025-10-31 15:42:47 +08:00
c1a6aeab46 [main][bugfix] fix valueError in static_forward_context when prefix is empty (#3924) rjg-lyh 2025-10-31 14:55:58 +08:00
9f7de45b75 [Bugfix] fix MTP support for lmhead_tensor_parallel_size (#3921) Nagisa125 2025-10-31 14:34:28 +08:00
ee2e55e602 [v0.11.0][Test] Add new test model for aclgraph single_request v0.11.0 (#3889) lilinsiman 2025-10-31 11:23:55 +08:00
1f486b2dd1 [Test] Add new test model for aclgraph single_request (#3888) lilinsiman 2025-10-31 11:23:13 +08:00
6764777f00 [Bugfix] Fix MTP support for lmhead_tensor_parallel_size (#3915) Nagisa125 2025-10-31 10:30:28 +08:00
90aca84e60 fix bug when max_seqs=14 in mtp=2 scenario and raise error when cudagraph_capture_sizes can't be an integer multiple of uniform_decode_query_len (#3909) zouyida2052 2025-10-31 09:25:06 +08:00
1966885be2 mfix bug when max_seqs=14 in mtp=2 scenario and raise error when cudagraph_capture_sizes can't be an integer multiple of uniform_decode_query_lentp (#3910) zouyida2052 2025-10-31 09:24:50 +08:00
387ce1cc5b add new e2e tests case for aclgraph memory to v0.11.0 (#3880) lilinsiman 2025-10-31 09:17:09 +08:00
35a913cf1e add new e2e tests case for aclgraph memory (#3879) lilinsiman 2025-10-31 09:16:52 +08:00
38afd2c9cb [bugfix_v0.11.0]cancel tokenize for layerwise_proxy (#3913) wangxiaoteng888 2025-10-30 23:55:04 +08:00
a2b325ee00 [bugfix]cancel tokenize for layerwise_proxy (#3914) wangxiaoteng888 2025-10-30 23:54:46 +08:00
eb0a2ee2d0 [CI] Optimize nightly CI (#3898) Li Wang 2025-10-30 23:42:20 +08:00
af7a56550b [bugfix_v0.11.0-dev] layerwise D first plan (#3907) wangxiaoteng888 2025-10-30 22:21:11 +08:00
2c291bc63f [bugfix] layerwise D first plan (#3866) wangxiaoteng888 2025-10-30 22:20:34 +08:00
d5a9aba03f [BugFix]Fix group list type of mc2. (#3890) offline893 2025-10-30 21:44:14 +08:00
627f20ce26 [BugFix]Fix group list type of mc2. (#3864) offline893 2025-10-30 21:39:01 +08:00
655a229455 [TEST]Add MALPO for aclgraph in nightly test (#3894) jiangyunfan1 2025-10-30 18:25:54 +08:00
216fc0e8e4 [feature] Prompt Embeddings Support for v1 Engine (#3026) Song Zhixin 2025-10-30 17:15:57 +08:00
f6149f3894 [Model][3/N] Refactor sfa into mla and remove deepseek_v3_2.py (#3769) whx 2025-10-30 17:06:38 +08:00
eff3e5fc6f [FEAT] Refactor spec decode to support efficient padded speculation (#3528) xuyexiong 2025-10-30 16:53:05 +08:00
10772d94e3 [Build] Force torch version (#3791) wangxiyuan 2025-10-30 15:53:15 +08:00
ff47524b88 [Doc] Remove modeling doc (#3789) wangxiyuan 2025-10-30 15:53:02 +08:00
67dd3a4581 [UT] fix skip ut test for test_utils (#3803) Meihan-chen 2025-10-30 15:52:53 +08:00
c506ba60fb [v0.11.0] [Bugfix] [MoE]fix error in deepseek when using allgather (#3827) weichen 2025-10-30 14:59:46 +08:00
eed1957f03 Add FAQ for docker pull error on Kylin OS (#3870) Liwx 2025-10-30 14:10:52 +08:00
14ca1e5cb2 [CI]Fix oom of deepseek-eplb nigtly test. (#3884) offline893 2025-10-30 10:18:07 +08:00
211d4b9da4 [BugFix] Fix mlapo accuracy problem related with weight processing. (#3857) whx 2025-10-30 00:35:50 +08:00
dc960e798e [BugFix] Fix mlapo accuracy problem related with weight processing. (#3850) whx 2025-10-30 00:34:55 +08:00
d9249c968e bugfix for mtp in fullgraph (#3878) zouyida2052 2025-10-29 23:52:20 +08:00
adadd50613 bugfix for mtp fullgraph (#3845) zouyida2052 2025-10-29 23:50:13 +08:00
19f49ecb5f [0.11.0][Bugfix]fix_mulit_connector_bug (#3332) (#3882) fems14 2025-10-29 23:44:52 +08:00
d6ef3df3b3 [Bugfix]fix_mulit_connector_bug (#3332) baxingpiaochong 2025-10-29 23:23:06 +08:00
e5b938c5fe [v0.11.0] [P/D] force with_prefill true after allreduce in kv producer (#3835) liziyu 2025-10-29 23:14:00 +08:00
07873d9396 fix mooncake layerwise connector (#3849) liziyu 2025-10-29 23:10:51 +08:00
5f176ca992 [CI]Fix eplb nightly tests. (#3863) offline893 2025-10-29 23:06:05 +08:00
b323be9fe4 deepseek torchair adapt for torch_npu version (#3876) Wang Yixuan 2025-10-29 22:44:44 +08:00
870a3f21cb [BugFix] deepseek torchair adapt for torch_npu version (#3862) Wang Yixuan 2025-10-29 22:39:34 +08:00
4a2ab13743 [CI] Optimize nightly CI (#3858) Li Wang 2025-10-29 22:30:19 +08:00
cba69e117e [CI]pin vllm commit id (#3861) Meihan-chen 2025-10-29 17:43:58 +08:00
74191864b7 [Perf] Delete redundant operations in model_runner and forward_context (#3677) realliujiaxu 2025-10-29 15:59:55 +08:00
29bd9235ed [v0.11.0][Perf] Delete redundant operations in model_runner and forward_context (#3775) realliujiaxu 2025-10-29 15:58:53 +08:00
0d1859af08 [Bugfix] [MoE] fix error in deepseek when using allgather (#3824) weichen 2025-10-29 14:51:39 +08:00
900086fdc6 [HybridKV][Bugfix] Fix Hybrid kvcache sharing bug in same attention type (#3760) Mengqing Cao 2025-10-29 14:18:52 +08:00
75de3fa172 [v0.11.0][Doc] Update doc (#3852) zhangxinyuehfad 2025-10-29 11:32:12 +08:00
789ba4c5c2 [Doc] Update doc (#3836) zhangxinyuehfad 2025-10-29 11:03:39 +08:00
1e31b07fa7 fix qwen3next full graph break. (#3812) XiaoxinWang 2025-10-29 10:30:23 +08:00
c76db627ab [P/D] force with_prefill true after allreduce in kv producer (#3768) liziyu 2025-10-29 10:15:38 +08:00
f57bdb09fc [long_seq_optim] BSND to TND and FA_UPDATE replacement (#3778) pichangping 2025-10-29 09:33:35 +08:00
e56b0017a3 [TEST]Add aisbench log and A2 cases (#3841) jiangyunfan1 2025-10-28 23:33:15 +08:00
6188450269 [v0.11.0][Bugfix]Avoid using the fusion operator in the MOE model (#3837) ZYang6263 2025-10-28 23:31:19 +08:00
d08401d1e7 [Main][Bugfix]Avoid using the fusion operator in the MOE model (#3834) ZYang6263 2025-10-28 23:30:27 +08:00
90ae114569 [CI] Fix nightly CI (#3821) Li Wang 2025-10-28 20:40:03 +08:00
a7450db1bd Upgrade to 0.11.1 newest vllm commit (#3762) Icey 2025-10-28 14:55:03 +08:00
f846bd20e4 [CI] Add multi-node test case for a2 (#3805) Li Wang 2025-10-27 23:10:17 +08:00
9030106a14 [TEST]Add 2P1D multi node cases for nightly test (#3764) jiangyunfan1 2025-10-27 23:09:15 +08:00
d64bdd06ae 【Bugfix】bugfix for weight load of kimi-k2 (#3798) Levi 2025-10-27 21:18:35 +08:00
da5f2cc1e3 [Doc] Update FAQ (#3792) wangxiyuan 2025-10-27 20:32:17 +08:00
00aa0bf33e support prefill cache mode use fia op (#3696) shiyuan680 2025-10-27 19:41:07 +08:00
3e5ae49160 [MM][Doc] Update online serving tutorials for Qwen2-Audio (#3606) Shanshan Shen 2025-10-27 16:58:03 +08:00
e48ca0b6ec [bugfix][0.11]fix proxy decode bug (#3751) Shirley125 2025-10-27 16:56:50 +08:00
d8ca7fee75 [bugfix][main]fix proxy decode bug (#3750) Shirley125 2025-10-27 16:56:09 +08:00
43276fd822 [v0.11.0][Fix] Prevent memory leak in MLA decode graph (#3743) (#3774) Yizhou 2025-10-27 16:00:20 +08:00
b8796b06c8 [Doc][Example][Bugfix] Elements in local_device_ids should be casted … (#3782) yupeng 2025-10-27 14:52:47 +08:00
638d8d1a47 Bump actions/upload-artifact from 4 to 5 (#3786) dependabot[bot] 2025-10-27 14:11:53 +08:00
79623e0bab Bump actions/download-artifact from 5 to 6 (#3787) dependabot[bot] 2025-10-27 14:10:56 +08:00
e9072429fb [CI] Enable 2 jobs for nightly test (#3781) jiangyunfan1 2025-10-27 14:08:29 +08:00
60ee4af6d0 [CI] Add custom op to nightly (#3765) Li Wang 2025-10-27 14:07:03 +08:00
4312a92a4f [feat]dcp pcp support aclgraph (#3731) weiguihua2 2025-10-27 09:58:23 +08:00
825fdfb197 [v0.11.0][Feat] Prefetching Attention QKV Linear Weight With AddRmsNormQuant Custom Op (#3649) Ruri 2025-10-27 09:42:09 +08:00
8ab8111fde [Fix] Prevent memory leak in MLA decode graph (#3743) Yizhou 2025-10-25 20:37:33 +08:00
1b16c01afd [v0.11.0-dev][Installation] limit opencv-python-headless version to resolve numpy version conflict (#3767) Mengqing Cao 2025-10-25 18:18:28 +08:00
