xc-llm-ascend

EngineX/xc-llm-ascend

7ff1db4b84 [Refactor]5/N Extract common code of mla_v1.py & extract mla_cp (#5097) wujinyuan1 2025-12-24 10:25:19 +08:00
2a2d527e96 fix transformer version to 4.57.3 (#5250) shaopeng-666 2025-12-23 23:55:40 +08:00
3b59f20a28 update to vllm 12-19 (#5223) Nengjun Ma 2025-12-23 23:52:11 +08:00
e14514e2fd [Bugfix] quick fix balance scheduling patch (#5281) Zhu Yi Lin 2025-12-23 21:23:05 +08:00
ffe51eedd6 [Refactor][MoE] Reuse vLLM's all_reduce logic (#5189) weichen 2025-12-23 18:53:48 +08:00
8ae7fca947 [CI] refect e2e ci test (#5246) zhangxinyuehfad 2025-12-23 18:42:35 +08:00
5d1f6daef6 [CI] Mock spawn for vlm tests (#5279) Li Wang 2025-12-23 18:35:06 +08:00
cb963c53a5 [Doc] Added deploying on k8s with kthena (#4674) Tiger Xu / Zhonghu Xu 2025-12-23 17:46:04 +08:00
22138e2727 [main][Refactor] Remove with_prefill parameter from set_ascend_forward_context (#5094) Slightwind 2025-12-23 14:30:50 +08:00
fa0c212bfa [test]Corrected the Qwen3-Omni-30B-A3B-Instruct accuracy test configuration in nightly tests. (#5195) SILONG ZENG 2025-12-23 14:17:27 +08:00
29a93daa82 [CI]refactor: standardize test case naming convention (#5243) SILONG ZENG 2025-12-23 14:13:42 +08:00
592cfb6a6f [CI] Add Triton Ascend in CI (#4921) meihanc 2025-12-23 12:47:35 +08:00
2e010e12dd [EPLB][CI] Add dynamic EPLB CI for qwen3-moe (#5179) LI SHENGYONG 2025-12-23 11:31:00 +08:00
449f8f65a7 [KV-Sharing] Support KV-Sharing feature in CLA models (#4138) Mengqing Cao 2025-12-23 10:48:31 +08:00
9a79cbaecb [ModelRunner] Add hunyuan-vl basic support (#5151) Li Wang 2025-12-23 10:46:54 +08:00
c9b5881bcd [Doc] fix docs set rope_theta value is 10e6 in qwen3-235b model (#5258) rongfu.leng 2025-12-23 10:21:46 +08:00
6c478531f8 [CustomOp] Register AscendApplyRotaryEmb CustomOp and remove related patch (#4667) Shanshan Shen 2025-12-23 10:04:37 +08:00
35dbdbb398 [Doc] Add new contributors and relative scripts. (#5070) zhangyiming 2025-12-23 10:01:45 +08:00
3d04ae8e7d [Main] [Patch] support balance scheduling patch (#5212) Zhu Yi Lin 2025-12-23 09:04:38 +08:00
f883a2edb9 [Doc] Update the weight download URL. (#5238) zhangyiming 2025-12-23 08:53:30 +08:00
c3a8d13ca7 [refactor] Remove unnecessary attributes from set_ascend_forward_context (#5204) Wang Kunpeng 2025-12-23 08:49:52 +08:00
95e8a52156 [Refactor] move the metadata from attention_v1 to util(ready for extract common_cp) & realize Ascendmetadata inherit from the parent class. (#5203) weijinqian0 2025-12-23 00:10:52 +08:00
3d9954eff0 [Bugfix] Use hf_text_config instead of hf_config to support multimodal PD-Disaggregated (#5205) ApsarasX 2025-12-22 20:21:45 +08:00
3ba920a65b [TEST]Update mm param --mm-processor-cache-gb (#5242) jiangyunfan1 2025-12-22 18:54:03 +08:00
61efaffcaf [Bugfix] Implement multimodal_cpu_fields in model runner (#5196) zhangxinyuehfad 2025-12-22 18:39:45 +08:00
052e472453 [bugfix] fix w8a8dynamic fused_moe trans nz (#5199) zzzzwwjj 2025-12-22 17:45:34 +08:00
55beac9c91 [Feat]Xlite Qwen3-vl Support (#5228) lvjunqi 2025-12-22 16:30:52 +08:00
78aa7f2693 [feature] support pcp + mtp in full graph (#4572) zhangsicheng5 2025-12-22 16:13:39 +08:00
12d581605b [Triton]support swiglu_quant triton in w4a8 (#5161) Zhu Yi Lin 2025-12-22 16:01:58 +08:00
60d9398f6d [1/N][Eagle3] Aligns auxiliary hidden state usage for eagle3 models (#5162) Yizhou 2025-12-22 15:24:54 +08:00
b62b2ebd9b [Doc] Update readme (#5226) wangxiyuan 2025-12-22 15:21:16 +08:00
4861484b68 Bump actions/checkout from 4 to 6 (#5234) dependabot[bot] 2025-12-22 15:16:43 +08:00
11a25497ce Bump actions/upload-artifact from 4 to 6 (#5233) dependabot[bot] 2025-12-22 15:15:45 +08:00
64669c4243 [misc][FlashComm1][ACLGraph] Incompatibility between Flashcomm1 and FULL_DECODE_ONLY. (#5200) Qiu 2025-12-22 03:33:32 -03:00
b84ad8c5d8 [CustomOp] Register AscendMMEncoderAttention CustomOp and remove related patch (#4750) Shanshan Shen 2025-12-22 14:32:53 +08:00
b2c121637f [task] Add fused gdn gating triton kernel (#4304) Ascendyh 2025-12-22 14:09:19 +08:00
ea6206bb18 [bugfix][ACLGraph][MTP] deletes cudagraph_batch_sizes in MtpProposer (#5183) Qiu 2025-12-22 03:08:27 -03:00
dc047489c7 [Doc] Fix DeepSeek-V3.2 tutorial. (#5190) zhangyiming 2025-12-22 11:30:17 +08:00
492173cf89 [Misc] Cleanup useless print and logger (#5220) wangxiyuan 2025-12-22 11:28:26 +08:00
e117b3d693 [Perf] vectorize PCP/DCP loops in mla_v1.py (#5003) Feng Liu 2025-12-22 11:06:30 +08:00
49838d4bec [Perf] vectorize PCP/DCP loops in attention_cp.py (#4944) Feng Liu 2025-12-22 11:06:19 +08:00
904c18f929 [Feature]Use DispatchGmmCombineDecode operator to replace MC2(Optional) (#5040) wangqiankun13 2025-12-21 15:23:59 +08:00
67a0325cf2 [BugFix]Fix wrong _cos, _sin instantiation (#5154) Angazenn 2025-12-20 22:52:50 +08:00
5d02eed16f [Performance] Add async exponential while model executing (#4501) YuhanBai 2025-12-20 21:23:21 +08:00
58773af708 [Fix] Delete pooling redundant code (#4940) lianyibo 2025-12-20 20:47:30 +08:00
21745221a3 [lint]clean code (#5218) weiguihua2 2025-12-20 18:24:04 +08:00
bbde0f9743 [CI] fix lint (#5216) wangxiyuan 2025-12-20 17:03:25 +08:00
74aa968a9f [e2e] add pcp e2e (#5141) weiguihua2 2025-12-20 16:56:46 +08:00
5d59bf8ca0 [CI] unblock CI on suffix spec decoding (#4813) Mengqing Cao 2025-12-20 14:54:49 +08:00
758d81dcb1 Drop 0.12.0 support (#5146) wangxiyuan 2025-12-20 09:38:53 +08:00
243ab7d720 [CI] Use offline mode for nightly test (#5187) Li Wang 2025-12-19 21:21:42 +08:00
14931d2a86 [CI] Fix image merge bug (#5197) Li Wang 2025-12-19 17:30:48 +08:00
141bd913e1 restore matmul_allreduce_add_rmsnrom aclnn interface (#5119) Trunrain 2025-12-19 17:06:59 +08:00
17f2eead99 [Doc]Add the user_guide doc file regarding fine-grained TP. (#5084) zzhxxx 2025-12-19 16:37:25 +08:00
0cc3fc357f [pref] qwen3_next add triton ops : fused_sigmoid_gating_delta_rule_update (#4818) XiaoxinWang 2025-12-19 16:34:11 +08:00
118b0ed346 [Feature] Add token mask for DispatchGmmCombineDecode operator (#5171) wangqiankun13 2025-12-19 16:31:48 +08:00
636265be6d [CI] Improve CI (#5078) wangxiyuan 2025-12-19 15:34:35 +08:00
35ad11b637 [Refactor] remove some metadata variables in attention_v1. (#5160) weijinqian0 2025-12-19 14:57:09 +08:00
bc05a81bf2 Add Qwen3-VL-235B-A22B-Instruct tutorials (#5167) luluxiu520 2025-12-19 14:56:17 +08:00
5ab6d124e5 [Doc] Add a perf tune section (#5127) Li Wang 2025-12-19 14:52:52 +08:00
a6eaf816f1 [Image] Refactor image build (#5175) Li Wang 2025-12-19 14:35:51 +08:00
cc23067f1e [refactor] refactor weight trans nz and transpose (#4878) zzzzwwjj 2025-12-19 14:27:24 +08:00
ea8f544ce7 [BugFix]Fix precision issue for LoRA feature (#4141) hukongyi 2025-12-19 14:22:06 +08:00
f952de93df 【Doc】Deepseekv3.1/R1 doc enhancement (#4827) 1092626063 2025-12-19 10:52:33 +08:00
76e58d66be support basic long_seq feature st (#5140) LookAround0301 2025-12-19 10:50:01 +08:00
cee9b715b5 [Bugfix] install trition for test_custom_op (#5112) zhangxinyuehfad 2025-12-19 10:40:46 +08:00
ca6f631cba [2/N][Pangu][MoE] Remove Pangu Related Code (#5130) weichen 2025-12-19 09:00:07 +08:00
1b47fca0e8 [bugfix] Use FUSED_MC2 MoE comm path for the op dispatch_ffn_combine (#5156) Chen Chen 2025-12-18 23:34:31 +08:00
73e4b4f496 [BugFix] Fix top_p,top_k issue with EAGLE and add top_p,top_k in EAGLE e2e (#5131) zhaomingyu13 2025-12-18 23:07:14 +08:00
073a3a6e6c [Doc][P/D] Fix MooncakeConnector's name (#5172) zxr2333 2025-12-18 22:29:19 +08:00
2304218f90 [Bugfix] Fix in_profile_run in mtp_proposer dummy_run (#5165) Zetong Li 2025-12-18 22:27:47 +08:00
7d32371b7e [Doc] Refact benchmark doc (#5173) Li Wang 2025-12-18 22:26:13 +08:00
6cb76ecd02 [Nightly] Avoid max_model_len being smaller than the decoder prompt to prevent single-node-accuray-tests from failing (#5174) ZT-AIA 2025-12-18 22:25:45 +08:00
632eab28b7 [BugFix]Fix incorrect get_current_vllm_config (#5121) Angazenn 2025-12-18 22:21:36 +08:00
fd9a47c04d fix vl pd smoke error (#5103) shaopeng-666 2025-12-18 22:20:45 +08:00
ff3914e31a [Fix] Refines decode mode padding condition for uniform queries (#5164) Yizhou 2025-12-18 21:09:23 +08:00
acc3578f58 [Graph][Fusion]Add new pattern for AddRmsnormQuant with SP. (#5077) Angazenn 2025-12-18 20:25:44 +08:00
a74a1196c5 [Feat] Support MLP_TP feature, exclude MOE layer (#4999) zzhxxx 2025-12-18 20:06:53 +08:00
5a88e3333b feat: implement high-performance Triton kernels for rejection sampling (#4830) yuxingcyx 2025-12-18 19:42:10 +08:00
0f571c347b Nominate new maintainers @zzzzwwjj @realliujiaxu @LCAIZJ (#5152) wangxiyuan 2025-12-18 18:49:07 +08:00
9fcaf66646 fix: use batch_matmul_transpose operator in MLA _v_up_proj for better performance (#5142) LICO67373 2025-12-18 16:48:55 +08:00
b69b04d3a9 implement model runner v2 basic framework (#5051) Ronald 2025-12-18 15:51:54 +08:00
1c8c23de58 [Bugfix] fix pipeline parallelism bug introduced by async-scheduling refactor work (#4973) lidenghui1110 2025-12-18 15:27:55 +08:00
9268ad11e3 Qwen3-Next：Update the gpu-memory-utilization parameter to 0.7 (#5129) ming1212 2025-12-18 15:16:33 +08:00
ef8157a5f2 fixed fused alltoall execute all reduce (#5109) AlvisGong 2025-12-18 15:07:40 +08:00
78602eab4f [UT] Add mooncake ut test (#5080) Yuzhou Tong 2025-12-18 15:07:14 +08:00
9045843c90 [UT]Ut for function cumsum_group_list in moe_mlp (ref #5025) (#5036) Clorist33 2025-12-18 15:00:16 +08:00
543f122101 [Fix] Fix DeepSeek V3.2 "no attr" error (#5147) Yizhou 2025-12-18 14:46:41 +08:00
b0376abd4c [feat] proxy support elastic scaling (#5063) yuxinshan 2025-12-18 14:29:53 +08:00
71e544e259 [test] add w4a8 accuracy case (#5110) ck-hw-1018 2025-12-18 14:10:14 +08:00
39fb9e7c83 qwen3_next add triton ops : fused_qkvzba_split_reshape (#4788) ZT-AIA 2025-12-18 11:31:04 +08:00
07014e2101 [UT] Add model_runner pcp related UTs (#4951) zhangsicheng5 2025-12-18 10:54:57 +08:00
879ec2d1c4 [Doc] add qwen3 reranker (#5086) TingW09 2025-12-18 10:54:07 +08:00
8069442b41 enable npugraph_ex (#5120) panchao-hub 2025-12-18 09:08:40 +08:00
39bdd4cfaa fix profile run for vl model (#5136) shaopeng-666 2025-12-17 23:51:31 +08:00
43d974c6f7 [Fix] Synchronize the host query_start_loc with device values to prevent shape mismatches (#5134) Yizhou 2025-12-17 23:50:12 +08:00
950570f8d1 [Bugfix]delele profile_run in model_runner (#5122) zhenwenqi2024 2025-12-17 23:48:34 +08:00
98e6e57622 [Refactor] 4/N Distinguish the branches based on the applicable scenarios of PA and FIA Ops. (#5081) weijinqian0 2025-12-17 23:14:02 +08:00
7671ce1bf1 Fix a data conversion bug introduced by commit 3b7eb51 in main#4655 (#5115) Yuzhou Tong 2025-12-17 20:19:02 +08:00
7f1e93f185 [Bugfix][MoE] Remove All2All in w4a8_dynamic (#4977) weichen 2025-12-17 17:39:57 +08:00
