xc-llm-ascend

EngineX/xc-llm-ascend

9a1cfb48d4 [TEST]Update prefixcache perf threshold for qwen3-32b-int8 (#4220) jiangyunfan1 2025-11-17 19:06:54 +08:00
378e92a2a2 [Cherry-pick][0.11.0] Adapted to torch_npu.npu_fused_infer_attention_score (#4202) Icey 2025-11-17 10:56:23 +08:00
e38ef2c434 support FULL graph mode for GQA (#3970) XiaoxinWang 2025-11-17 10:50:35 +08:00
c334114f69 [CI] Fix no space left in build wheel CI. (#4215) zhangyiming 2025-11-17 10:45:58 +08:00
67f2b3a031 [Test] Add deepseek v3.2 exp nightly test (#4191) zhangxinyuehfad 2025-11-14 15:46:10 +08:00
1d0f13c1a3 [Misc] Add benchmark results into .gitignore (#4200) Shanshan Shen 2025-11-14 15:44:28 +08:00
a7eb42cf0a [v0.11.0-dev][Bugfix][cherry-pick]bugfix for weight load of kimi-k2 (#4190) zhangyiming 2025-11-14 15:43:22 +08:00
f10251ede0 [Platform] Add import_kernels interface (#3694) Canlin Guo 2025-11-14 11:32:51 +08:00
094f32c8c9 [Feat] Adds a utility for printing from within ACL graphs (#4162) Yizhou 2025-11-14 09:41:14 +08:00
01195e860c [Bugfix] fix cannot import name get_mp_context (#4174) weiguihua2 2025-11-14 09:09:14 +08:00
f90ed95578 [CI] Add multi-nodes EPLB configs of DeepSeek-R1-W8A8 & Qwen3-235B-W8A8 (#4144) 欧派果奶我还要 2025-11-14 08:50:29 +08:00
5ec96fd46c [long_seq_Feat] support chunk prefill (#4158) LookAround0301 2025-11-14 08:43:37 +08:00
7294f89e43 [CI] Add daily images build for nightly ci (#3989) Li Wang 2025-11-13 20:10:12 +08:00
f7d1f73b98 [CI] Remove unsupported python 3.9 format check (#4172) Nengjun Ma 2025-11-13 16:47:24 +08:00
49818dbbed [Test]Add ut test qwen3_moe and sfa (#4121) CodeCat 2025-11-13 16:09:22 +08:00
adee9dd3b1 [Info][main] Correct the mistake in information documents (#4157) lilinsiman 2025-11-13 15:53:58 +08:00
51e5806d76 [0.11.0-dev][Bugfix][EPLB] Quick fix for missing log2phy conversion (#4150) weichen 2025-11-13 14:32:40 +08:00
cd652acb65 [BugFix] Fix kv_no_split not contiguous (#3711) zhaozx-cn 2025-11-13 11:29:37 +08:00
fdd2db097a [BugFix] Fix kv_no_split not contiguous (#3594) zhaozx-cn 2025-11-13 11:28:09 +08:00
9d84172359 [BugFix] adapted e2e tests for Qwen3-next-mtp (#4160) drslark 2025-11-13 11:08:35 +08:00
5093192769 [Bugfix] fix mtp profile run error where main model and mtp model use different quantization (#4102) realliujiaxu 2025-11-13 11:02:31 +08:00
17259cb265 [Perf] [MoE] optimize all2allv (#3738) weichen 2025-11-13 09:38:11 +08:00
6bc770cd78 [Perf] fix async copy for async scheduling (#4113) realliujiaxu 2025-11-13 09:11:26 +08:00
c272747d13 Upgrade to 0.11.1 newest vllm commit (#3982) 22dimensions 2025-11-12 23:01:19 +08:00
3ca11d5a7c [CI] Fix nightly-ci (#4159) Li Wang 2025-11-12 22:06:49 +08:00
28a15299ea [cherry-pick][v0.11.0-dev][bugfix] Change seq_lens in dummy attn_metadata to max_query_len (#4099) Angazenn 2025-11-12 20:32:50 +08:00
fc7e5cd9dc [main][bugfix] Change seq_lens in dummy attn_metadata to max_query_len (#4097) Angazenn 2025-11-12 17:31:39 +08:00
a123f355e9 [feature] support pcp + mtp (in pd co-locate scenario) (#4098) zhangsicheng5 2025-11-12 17:22:21 +08:00
7732a89fd9 [v0.11.0][UT][Fixbug] Fix UT test (#4151) zhangxinyuehfad 2025-11-12 16:55:18 +08:00
1b4ce63ec9 fix fullgraph in ds. (#4016) XiaoxinWang 2025-11-12 10:11:43 +08:00
c9e5b90f53 [Doc] Fix DeepSeek-3.2-Exp doc, remove v0.11.0rc0 outdated infos. (#4095) zhangyiming 2025-11-12 09:11:31 +08:00
638dbcdb32 [Perf] Remove D2H operations to improve performance (#4063) Yizhou 2025-11-12 09:08:55 +08:00
e38fe92f40 [Misc][Doc] Add service profiling feature with user guide (#3756) thonean 2025-11-12 09:07:14 +08:00
1c677c3b87 [Test][Accuracy] Add accuracy evaluation config for InternVL3_5-8B (#3964) Canlin Guo 2025-11-12 09:05:55 +08:00
46a41b26d3 oproj TP support acl graph (#4073) zzhxxx 2025-11-11 19:39:06 +08:00
0e6e08e939 [TEST]Update nightly cases and add mtpx (#4111) jiangyunfan1 2025-11-11 17:39:58 +08:00
9cc42226d5 [CI] Integrate mooncake to vllm-ascend base image (#4062) Li Wang 2025-11-11 15:51:16 +08:00
f811a24bf0 Remove VLLM_USE_V1 (#4086) wangxiyuan 2025-11-11 15:43:39 +08:00
d5567680a2 [Fixbug] Fix ut test (#4116) zhangxinyuehfad 2025-11-11 15:31:00 +08:00
fae1c59a79 [Fix] Refactor and fix dist test to e2e full test (#3808) zhangxinyuehfad 2025-11-11 10:36:05 +08:00
b77b4f1abf [Test] Add nightly test for DeepSeek-V3.2-Exp (#3908) zhangxinyuehfad 2025-11-11 10:29:57 +08:00
650ce8ad19 [0.11.0][Bugfix] Fix ngram precision issue and open e2e ngram test (#4092) zhaomingyu13 2025-11-11 09:58:03 +08:00
e384755ce1 [Doc] Recover installation doc to use pip install (#4109) Yikun Jiang 2025-11-11 09:25:44 +08:00
71866d5311 [feature] chunkprefill support pcp&dcp (#3801) Apocalypse 2025-11-11 09:18:02 +08:00
2069bef449 [v0.11.0-dev][bugfix] Fix a bug in wrongly set npu_stream (#4106) Angazenn 2025-11-11 09:16:41 +08:00
7ffbe73d54 [main][Bugfix] Fix ngram precision issue and open e2e ngram test (#4090) zhaomingyu13 2025-11-11 09:06:24 +08:00
64220c68c5 [Doc] Add release note for v0.11.0rc1 (#3931) wangxiyuan 2025-11-10 21:01:50 +08:00
c5fe179cef [0.11.0] [Cherry-pick #4058] Fixes Qwen3-Next enable nz accuracy problem (#4056) Icey 2025-11-10 20:56:39 +08:00
e04a87f4be [BugFix] Fixes Qwen3-Next enable nz accuracy problem (#4058) Icey 2025-11-10 20:54:57 +08:00
e6625bb582 [Doc] add qwen3 w4a4 tutorial (#4076) 22dimensions 2025-11-10 20:30:07 +08:00
ebd45b6596 [V0.11.0][Core] Restore scheduling logic under default configuration (#4094) rjg-lyh 2025-11-10 20:02:23 +08:00
a1558b99c2 [Core] Restore scheduling logic under default configuration (#3967) rjg-lyh 2025-11-10 17:48:56 +08:00
c3c9138719 [Perf] Move attention update stream out of loop to optimize performance (#3985) XiaoxinWang 2025-11-10 17:18:45 +08:00
75c3f9a780 [Typo] LLama has been changed to Llama (#4089) herizhen 2025-11-10 16:22:52 +08:00
d913f9474b [0.11.0][Fix] Fix Qwen2-Audio-7B-Instruct accuracy test (#4018) zhangxinyuehfad 2025-11-10 11:54:30 +08:00
d40ba52454 [Fix] fix Qwen2-Audio-7B-Instruct accuracy test (#4017) zhangxinyuehfad 2025-11-10 11:54:18 +08:00
7ea17fbee3 [0.11.0][BugFix] Improve the performance of prefixcache features (#4021) hucong 2025-11-10 11:51:34 +08:00
de49fb3deb [Feature][Build] Upgrade the minimum version to 3.10 (#3926) Canlin Guo 2025-11-10 11:50:12 +08:00
0a62e671fb [Feat] flashcomm_v2 optim solution (#3232) Levi 2025-11-10 11:01:45 +08:00
b1a00e0512 [docs] [P/D] add feature guide for disaggregated-prefill (#3950) wangxiaoteng888 2025-11-10 09:31:30 +08:00
a74e76b02d [Doc] Remove extra MLAPO installation step for DeepSeek-V3.2. (#4024) zhangyiming 2025-11-10 09:09:59 +08:00
c2d58c0655 [P/D][BugFix][v0.11.0-dev]Fix proxy format processing errors & Layerwise connector performance optimization (#4069) wangxiaoteng888 2025-11-09 09:55:10 +08:00
c116524379 [TEST]Add qwen3-235b-w8a8 and qwen3-30b-w8a8 nightly test (#3973) jiangyunfan1 2025-11-08 18:49:28 +08:00
a3ff765c65 [Info][main] Corrected the errors in the information (#4055) lilinsiman 2025-11-08 18:48:59 +08:00
1d7cb5880a [Bugfix]fix pcp dcp attn aclgraph (#4066) weiguihua2 2025-11-08 18:47:12 +08:00
48094148f8 [BugFix] Improve the performance of prefixcache features (#4022) hucong 2025-11-08 18:45:31 +08:00
1d81a289d0 [P/D][BugFix]Fix proxy format processing errors & Layerwise connector performance optimization (#4043) zxr2333 2025-11-08 18:44:06 +08:00
24d6314718 [Bugfix] fix sleepmode level2 e2e test (#4019) wangx700 2025-11-08 14:11:55 +08:00
55e37f5041 [v0.11.0][Bugfix] fix sleepmode level2 e2e test (#4023) wangx700 2025-11-08 14:11:15 +08:00
f9842560cb [0.11.0][Perf] Add padding vision tower for Qwen2_5_Omni (#4041) tingfu 2025-11-08 13:56:05 +08:00
d4e2a44307 [Cherry Pick from pr#3981][0.11.0][P/D]Make kv-transfer env variable take effect & Fix load-balance proxy (#3983) zxr2333 2025-11-08 13:52:33 +08:00
f7ca3bc0fa [CI]Fix eplb ci. (#4052) offline893 2025-11-07 23:53:35 +08:00
8e72758645 [BugFix]Fix grouplist type of mc2. (#4049) offline893 2025-11-07 17:43:23 +08:00
e687d6af85 [BugFix]Fix group list type of mc2. (#4047) offline893 2025-11-07 17:41:56 +08:00
23b785fdfb [Feat] Adapted mtp function to Qwen3-next (#3918) drslark 2025-11-07 16:39:03 +08:00
016337eaec [v0.11.0][UT] Add new ut case for aclgraph enable (#4038) lilinsiman 2025-11-07 11:35:24 +08:00
46ef280105 [Doc] Add model feature matrix table. (#4040) zhangyiming 2025-11-07 11:28:05 +08:00
22286fc67d [UT] Add new ut case for aclgraph in auto enable (#4031) lilinsiman 2025-11-07 10:39:11 +08:00
79e536d939 [Feat] update op for mla (#4000) LookAround0301 2025-11-07 09:48:39 +08:00
f8610b7d67 [long_seq] fix A2 accuracy problem (#4030) LookAround0301 2025-11-07 09:29:33 +08:00
f9494d978a [cherry-pick][v0.11.0-dev][bugfix] Fix a rare bug triggered by _npu_paged_attention in FULL_DECODE_ONLY mode (#3987) Angazenn 2025-11-06 23:08:57 +08:00
e0d58d543b [main][bugfix] Fix a rare bug triggered by _npu_paged_attention in FULL_DECODE_ONLY mode (#3986) Angazenn 2025-11-06 23:08:07 +08:00
1804b60ec8 [BugFix][main] Adapted to torch_npu.npu_fused_infer_attention_score (#4025) drslark 2025-11-06 22:00:24 +08:00
27547a10e6 [MM][Bugfix] Add MoE verification for multi-modal models (#3897) (#4027) Shanshan Shen 2025-11-06 20:30:40 +08:00
22005c64c1 [Bugfix] Add constraints for sequence parallelism (#4014) realliujiaxu 2025-11-06 20:02:03 +08:00
259eb25f88 [CI] Quick fix mooncake for nightly-ci (#4028) Li Wang 2025-11-06 18:46:00 +08:00
34b278a339 [TEST]Update nightly acc test standard (#4032) jiangyunfan1 2025-11-06 16:58:38 +08:00
2eebe1dc0a [feat]decode convert bsnd to tnd and fix bug when pcp and dcp (#3980) weiguihua2 2025-11-06 14:58:24 +08:00
25b24c02ea [Feat](Mooncake) Supports multiple input suffixes for global_segment_size (#3690) Liziqi-77 2025-11-06 14:48:15 +08:00
b206e831e9 [P/D]Make kv-transfer env variable take effect & Fix load-balance proxy (#3981) zxr2333 2025-11-06 12:02:47 +08:00
3db53d117e [0.11.0][doc] add aclgraph developer guide (#3947) zzzzwwjj 2025-11-06 09:54:38 +08:00
737cad2b6b [Test] Refactor accuracy test to nightly test (#3814) zhangxinyuehfad 2025-11-06 09:06:59 +08:00
7ee0b0b5d8 [cherry-pick]Upgrade CANN to 8.3.rc1 (#3945) (#3962) wangxiyuan 2025-11-06 09:05:08 +08:00
b1488ecdb1 [main][doc][kv_pool]Add adxl timeout parameter in kv pool user guide (#4012) pz1116 2025-11-05 18:39:35 +08:00
5cff3069f4 [Doc]Add developer guide of eplb. (#3759) offline893 2025-11-05 18:35:41 +08:00
e0c23cb011 [docs] Add kv pool developer guide (#3752) pz1116 2025-11-05 18:03:36 +08:00
1ba158567c [Doc] add mtp doc (#3770) zouyida2052 2025-11-05 16:38:35 +08:00
3ac76fdccc [Doc] Update version policy (#3999) wangxiyuan 2025-11-05 14:55:54 +08:00
46d5a77688 [docs] add aclgraph developer guide (#3683) zzzzwwjj 2025-11-05 10:34:28 +08:00
738bf2b720 support qwen3-next full_decode_only mode. (#3949) XiaoxinWang 2025-11-05 08:46:05 +08:00
