xc-llm-ascend

EngineX/xc-llm-ascend


5f8b1699ae [Feat][quantization] Support new version w4a8 dynamic quantization for Linear layers (#3311) Anion 2025-10-21 20:18:39 +08:00
11f9bccf6b Mooncake store use adxl interface (#3350) Chao Lei 2025-10-21 20:18:17 +08:00
ef3fabf399 [Chore] Prevents use of ASCEND_LAUNCH_BLOCKING with ACL Graph (#3574) Yizhou 2025-10-21 20:17:33 +08:00
220df60c61 [Model][2/N] Remove deepseek_mtp modeling. (#3561) whx 2025-10-21 20:17:09 +08:00
ffb42a8daa [BugFix] Fixed the bug that caused the transposematmul operator to report an error due to the shape being too large (#3578) Zhu Yi Lin 2025-10-21 20:16:54 +08:00
3164cb663c [Bugfix] mooncake connector support external dp & update readme (#3579) liziyu 2025-10-21 20:15:24 +08:00
6b290acfe1 remove redundant params in mla_preprocess kernel (#3530) Chen Chen 2025-10-21 19:20:13 +08:00
80b8df881f [TEST] Add Qwen3-32b-w8a8 acc/perf A2/A3 test (#3541) jiangyunfan1 2025-10-21 17:34:48 +08:00
ec1d2b5c04 [Test] Temporarily skip flaky ACL graph test (#3577) Yizhou 2025-10-21 17:16:15 +08:00
9830f85c42 [CI] Fix test_mla_v1 (#3570) Li Wang 2025-10-21 10:31:55 +08:00
4a849df6fa [main] support cpu binding (#3546) Zhu Yi Lin 2025-10-21 09:17:03 +08:00
274b708e0c [Fix] Refactor dummy attention metadata creation (#3497) Yizhou 2025-10-21 00:00:42 +08:00
6b6857929d [Doc] Add --shm-size option to Docker command for qwen3 vl 235B (#3519) likeful 2025-10-20 23:37:35 +08:00
0bf3f21a98 Revert "Add mrope op fusion (#3509)" (#3562) wangxiyuan 2025-10-20 20:19:24 +08:00
068ed706c8 [feat][torchair] support super kernel feat for quantized dsr1 (#3485) linfeng-yuan 2025-10-20 20:04:37 +08:00
70bef33f13 add new accuracy test case for aclgraph (#3390) lilinsiman 2025-10-20 20:04:04 +08:00
b9e2896eb1 Revert "[Perf] Add FIA interface in FA case" (#3553) ZYang6263 2025-10-20 19:56:10 +08:00
34c2996ab8 [main] v_proj combining transpose and matmul (#3545) Zhu Yi Lin 2025-10-20 19:53:32 +08:00
e04a5e3dd3 [Bugfix] Fix race condition in d2h transfer (#3372) Jade Zheng 2025-10-20 18:24:21 +08:00
fdac146f71 [UT] fix skip ut test and enable ut test run normally (#3410) zhangxinyuehfad 2025-10-20 16:30:57 +08:00
f8b52fe950 [Model][1/N] Delete deepseek v2/v3 modeling codes. (#3189) whx 2025-10-20 15:31:34 +08:00
918ded9155 [BugFix][HybridKV] Update the check logic of reinitializing inputbatch (#3540) Mengqing Cao 2025-10-20 15:29:48 +08:00
daa4dd0a57 [DeepSeek] Separate deepseek v3.2 modeling from deepseek v2 (#3531) Mengqing Cao 2025-10-20 09:50:44 +08:00
6c65dd891f [ModelRunner][Qwen3-Next] Fix attn_group initialization timing (#3477) Mengqing Cao 2025-10-20 09:39:40 +08:00
9e59fc1510 [TEST] Add initial aisbench support and Qwen3 32B acc/perf test (#3474) jiangyunfan1 2025-10-20 09:33:17 +08:00
58a37ce189 bugfix for mooncake (#3535) zouyida2052 2025-10-19 17:06:05 +08:00
1e78ecbad6 [Perf] Add FIA interface in FA case (#3321) ZYang6263 2025-10-19 12:45:33 +08:00
4b3bd4f397 [main][bugfix] bugfix for minicpm models (#3527) Wang Kunpeng 2025-10-19 11:00:55 +08:00
6c9909c861 [Patch]patch of v1 executor when enable eplb. (#3511) offline893 2025-10-19 10:54:26 +08:00
646c1db5d7 Add mrope op fusion (#3509) shaopeng-666 2025-10-18 18:08:24 +08:00
0777e2f899 Optimize torchair kv_consumer padding logic (#3526) xuyexiong 2025-10-18 16:42:17 +08:00
b4233a2ec3 [Bugfix] Route requests requiring KVC recomputation from the decode instance to the P instance (#3448) Shirley125 2025-10-18 15:56:44 +08:00
4750d45d86 [BugFix]Support redundant experts in EPLB (#3473) yechao237 2025-10-18 00:09:16 +08:00
07ca1b9b78 [Refactor] Clean up w4a4_flatquant_dynamic implementation (#3440) Slightwind 2025-10-17 23:53:19 +08:00
21769e8f44 [BUGFIX] Mtp torchair pd fix (#3506) xuyexiong 2025-10-17 21:57:05 +08:00
9547d6f0d9 [Core]Append padding logic for Attention (#3256) Angazenn 2025-10-17 21:56:01 +08:00
b154a8e22c [Bugfix] fix logging and d2h bug for flash comm1 (#3505) realliujiaxu 2025-10-17 21:13:41 +08:00
248ee7fa11 [Feat]Make full graph mode compatible with MTP (#3276) anon189Ty 2025-10-17 20:19:56 +08:00
46e62efd44 [Feat]mtp aclgraph support (#3244) anon189Ty 2025-10-17 18:14:49 +08:00
1b424fb7f1 ACLgraph enable: Test cases revisions for all features (#3388) lilinsiman 2025-10-17 17:15:19 +08:00
bf87606932 [Feat] Shared expert dp for deepseek and deepseek_mtp (#3495) zhaozx-cn 2025-10-17 15:06:37 +08:00
d9ee491f70 [BugFix]Move to_list in forward_v1 with FIA earlier to build (#3185) Angazenn 2025-10-17 11:19:41 +08:00
30e3d86b0f Revert "[BUGFIX] Mtp torchair pd fix (#3449)" (#3500) xuyexiong 2025-10-17 09:42:48 +08:00
3a53bbc508 [Feat]Qwen3 Moe supports npu_add_rms_norm_quant op by default, update op with bias, resolve conflict with weight prefetch (#3465) huangdong2022 2025-10-17 09:30:51 +08:00
4c4a8458a5 [CI] Refactor multi-node CI (#3487) Li Wang 2025-10-17 09:04:31 +08:00
ccb6fb9ec1 [Fix] Clears unused slot mappings and fix accuracy issue with MLA models when enabling FULL_DECODE_ONLY (#3482) Yizhou 2025-10-16 19:43:09 +08:00
f9535cc9e2 [BugFix] fix qwenVL quant assertion error (#3466) elilzhu 2025-10-16 17:08:00 +08:00
9ff6b0b862 [CI]: Fix doctest ci for main release (#3451) menogrey 2025-10-16 14:38:11 +08:00
b0ae203e72 [BUGFIX] Mtp torchair pd fix (#3449) xuyexiong 2025-10-16 09:03:49 +08:00
291c00a224 [Doc] pin version that can stable running 310I Duo to vllm-ascend v0.10.0rc1 (#3455) leo-pony 2025-10-16 08:54:09 +08:00
ff91904ee2 [Doc] Clearer corresponding relationship between configurations for multi-node guides (#3441) leo-pony 2025-10-16 08:54:03 +08:00
aa6154703a [BugFix]GPQA Accuracy Issue Bugfix (#3476) DreamerLeader 2025-10-15 23:28:17 +08:00
cec1fab509 Revert "[MoE] [Refactor] Remove manual memory cleanup (#3365)" (#3483) weichen 2025-10-15 22:25:46 +08:00
f69a83b7ba [Feat] Flash comm allgather ep (#3334) realliujiaxu 2025-10-15 19:36:32 +08:00
8abe517870 [Refactor] Adapt deepseek-v3.2 to vllm 0.11.0 (#3432) Mengqing Cao 2025-10-15 17:48:58 +08:00
099255e933 [bugfix] fix pipeline parallel for mla & sfa attention backend (#3459) linfeng-yuan 2025-10-15 17:13:27 +08:00
5a3082cd15 [EPLB]Record expert map without dynamic eplb. (#3409) offline893 2025-10-15 14:21:15 +08:00
4f937f561d [MoE] [Refactor] Remove manual memory cleanup (#3365) weichen 2025-10-15 12:36:24 +08:00
4e720936d8 Fix warning msg print (#3421) LeeWenquan 2025-10-15 11:30:30 +08:00
16cb3cc45d adapt the mla_v1 with the mla_preprocess kernel (#3397) Chen Chen 2025-10-15 10:34:25 +08:00
15b2e5c995 Remove unused row_idx in token_dispatcher (#3442) CaranLic 2025-10-15 09:08:31 +08:00
3642b64afc bugfix for mtp with multistream_moe (#3419) zouyida2052 2025-10-15 08:59:58 +08:00
c2c1db78a7 [Bugfix] fix ZeroDivisionError when prefill_tp_size > num_kv_head and fix tp_resharding README (#3437) zxr2333 2025-10-15 08:45:44 +08:00
02c26dcfc7 [Feat] Supports Aclgraph for bge-m3 (#3171) xuyexiong 2025-10-14 23:07:45 +08:00
434059e417 [BugFix] Fix multimodal model support fullgraph error (#3425) fan2956 2025-10-14 21:51:09 +08:00
223cc34085 [KVCache] Refactor KVCache as page_size_bytes is ineffective (#3438) Mengqing Cao 2025-10-14 21:28:41 +08:00
c55d99d13e [bugfix][torchair] fix missing weight nz cast for w13_weight in torchair_w8a8_dynamic.py (#3446) linfeng-yuan 2025-10-14 21:11:05 +08:00
5fe883fa43 fix the title of modelrunner's prepare inputs docs (#3457) TaoYu Chen 2025-10-14 20:35:58 +08:00
78777237a9 [2/N][Feat] Attention and MoE weight prefetch in Qwen3MoE models (#3203) yuzhup 2025-10-14 20:16:33 +08:00
07e39620ea [Feat] Unquantized Linear to nz and control all nz-cast (#3356) anon189Ty 2025-10-14 17:39:26 +08:00
5c45c227dc [BugFix] fix qwen2.5vl quant bug (#3426) elilzhu 2025-10-14 17:31:26 +08:00
ee25a517d1 [BugFix] Fix the port conflict bug of running external dp with disaggregated-prefill. (#3416) whx 2025-10-14 16:37:10 +08:00
9eb62935b8 fix pagedattention to support fullgraph. (#3436) XiaoxinWang 2025-10-14 16:10:09 +08:00
22a1d91cf5 [CI] Add single request test case for aclgraph (#3392) lilinsiman 2025-10-14 11:13:44 +08:00
4536123341 [Fix] Fix mc2_tokens_capacity-related issues (#3411) Yizhou 2025-10-14 10:56:12 +08:00
19b85ef1bc [Bugfix] multi_node_pd_disaggregation_mooncake.md update (#3400) wangxiaoteng888 2025-10-14 09:29:35 +08:00
49b850270f [Community] Nominate new maintainers: @yiz-liu @paulyu12 @weijinqian0 @nalinaly (#3406) wangxiyuan 2025-10-14 08:51:58 +08:00
657c08cfb2 [UT] fix skipped test_utils ut test. (#3422) menogrey 2025-10-14 08:31:13 +08:00
4f6d60eb06 [Feature] Add W4A4 Flat Quantization support (#3427) Slightwind 2025-10-13 23:20:16 +08:00
6972df5951 [Feature] optimize sp & qwen3 next support sp. (#3225) weijinqian0 2025-10-13 23:02:12 +08:00
31682961af [Feat] enable hierarchical communication for mc2 ops on A2 (#3015) realliujiaxu 2025-10-13 16:13:17 +08:00
0563106477 [Feature] mooncake connector support GQA transport (#2947) lidenghui1110 2025-10-13 15:48:37 +08:00
847d12a389 [BugFix]Fix moe load problems in torchair when using dynamic eplb (#3381) dsxsteven 2025-10-13 11:38:57 +08:00
cd69385dab Add models test and add several new models yaml (#3394) Yikun Jiang 2025-10-12 17:27:50 +08:00
d05d29ff0e Enable nightly test and add qwen3 32b test case (#3370) jiangyunfan1 2025-10-12 15:46:28 +08:00
0d59a3c317 [CI] Make the test_pipeline_parallel run normally in full test (#3391) leo-pony 2025-10-12 15:43:13 +08:00
bcc313e8f2 add mla_preprocess kernel (#3226) Chen Chen 2025-10-12 07:39:45 +08:00
1b1207e3c3 [Bugfix] Add quantization param for multi-node CI (#3383) Li Wang 2025-10-11 19:25:16 +08:00
e8c871ed0a [Test] enable external launcher and add e2e test for sleep mode in level2 (#3344) huangxialu 2025-10-11 17:29:38 +08:00
ecb1713dfc Bugfix: Expose the user policy type interface (#3336) Mercykid-bash 2025-10-11 16:28:57 +08:00
e4acb2dfc7 [feat] support customized and separated hccl_buffer_size for process group initialization (#3073) linfeng-yuan 2025-10-11 15:55:22 +08:00
9eb103607f [1/N][CI] Add multi node test (#3359) Li Wang 2025-10-11 14:50:46 +08:00
82b6c846ca [BugFix]Fix eplb problems when using dynamic eplb. (#3364) offline893 2025-10-11 14:04:02 +08:00
ca05f7d632 [Bugfix] TP size larger than KV cache head causes accuracy issues (#3366) wangxiaoteng888 2025-10-11 11:22:23 +08:00
ace300a549 [Bugfix] Fix the abnormal NPU memory usage in full graph mode. (#3331) 无脸男 2025-10-11 10:20:10 +08:00
866f5e7283 [Bugfix] Fix weight prefetching AssertionError in W8A8 MTP scene (#3361) Ruri 2025-10-11 09:24:02 +08:00
8c1a4dedf3 [Bugfix]modify the enable range of _merge_multimodal_embeddings patch (#3360) Peipei 2025-10-11 08:37:07 +08:00
27e0f2c035 [Perf]Add YaRN custom op (#3355) Angazenn 2025-10-11 08:36:20 +08:00
ee0a95e47f bugfix for mtp when running torchair in a2 (#3354) zouyida2052 2025-10-10 23:07:24 +08:00
90e00deaa9 [Bugfix] Optimized exception throwing when stream captures exception (#3322) lilinsiman 2025-10-10 17:09:28 +08:00
