xc-llm-ascend

EngineX/xc-llm-ascend

a6ef3ac4e4 [Performance] Pre-issued exponential distribution operator. (#4908) weijinqian0 2025-12-11 23:02:51 +08:00
0fbe0831ec [bugfix][refactor] fix recompute_scheduler break with vllm 0.12.0 & support async scheduling & refactor recompute_scheduler.py (#4895) linfeng-yuan 2025-12-11 22:24:49 +08:00
e538fa6f9c [Doc] Update tutorial index (#4920) wangxiyuan 2025-12-11 20:53:13 +08:00
e56dba9b0d [CI]cleanup e2e test (#4800) SILONG ZENG 2025-12-11 20:35:32 +08:00
3349f61769 [CI] Cancel whl build when submitting a new commit (#4925) Li Wang 2025-12-11 19:54:52 +08:00
c30b51e764 Refactor CI workflow (#4912) wangxiyuan 2025-12-11 19:34:43 +08:00
551069e53a [Doc] Update structured output doc with upstream link (#4015) Shanshan Shen 2025-12-11 19:14:29 +08:00
06a66939cd Remove mindie_turbo (#4896) wangxiyuan 2025-12-11 18:46:12 +08:00
b89763f1ed [CI] speed up ut (#4901) wangxiyuan 2025-12-11 18:45:43 +08:00
3fade30275 [Bugfix] Prevent engine hang during KVCacheSendingThread startup (#4754) Jade Zheng 2025-12-11 18:39:25 +08:00
18221c0e1d [Fusion] normalize fusion naming and enable e2e test (#4693) Icey 2025-12-11 17:53:43 +08:00
07c7131104 [Fix] Delete redundant variable (#4903) Wang Yixuan 2025-12-11 17:50:25 +08:00
e1bb6f47ec [doc] Add Qwen2.5 tutorials (#4636) yangxiaoman8 2025-12-11 17:30:05 +08:00
332b547728 [Bugfix] support mtp kv transfer and pp partition by hand in kv transfer (#4892) lidenghui1110 2025-12-11 17:23:21 +08:00
a47aa4da2f [feat] apply flashcomm1 on bailing (#4868) hwhaokun 2025-12-11 17:02:21 +08:00
2f965d8339 [Bugfix] Fix the bug in sfa-cp under multi-DP scenarios. (#4850) zzhxxx 2025-12-11 16:44:14 +08:00
5ebb9bd8d2 【Bugfix】bugfix_for_bmm_transpose (#4899) ChrisGelhLan 2025-12-11 16:32:28 +08:00
78bf211539 [OPS] support triton causal_conv1d_fn ops (#4119) QilaiZhang 2025-12-11 15:52:39 +08:00
eac72f5f23 [Feat] Flashcomm2 use o_shared linear (#4188) zzhxxx 2025-12-11 12:43:04 +08:00
bb76f7962c cleanup useless torchair logic (#4856) wangxiyuan 2025-12-11 11:21:13 +08:00
c12eb22cbe [feat] mlapo add bf16 no_quant support (#4852) chenjunyi 2025-12-11 11:06:56 +08:00
c95c271538 [E2E] Optimize nightly testcase. (#4886) zhangyiming 2025-12-11 10:15:39 +08:00
66b0781840 [E2E] Refactor the e2e testcases. (#4789) zhangyiming 2025-12-11 10:15:00 +08:00
11bebb518c [E2E] Remove unused PD-disaggreate scripts in E2E test. (#4837) zhangyiming 2025-12-11 09:23:38 +08:00
0eefbe75b6 [Doc] Add local running multi-node nightly test case guide (#4884) Nengjun Ma 2025-12-11 08:56:27 +08:00
ff7d703192 [Doc]Add tutorial document for qwen-VL-Dense (#3516) SILONG ZENG 2025-12-11 08:55:23 +08:00
89a8607b30 add DeepSeek-R1 tutorial. (#4666) Leaf 2025-12-11 08:52:27 +08:00
f917d5edcf Remove useless env (#4858) wangxiyuan 2025-12-11 06:51:07 +08:00
08441baedd Remove VLLM_ASCEND_ENABLE_TOPK_TOPP_OPTIMIZATION (#4860) wangxiyuan 2025-12-10 23:50:18 +08:00
37db0844f5 Remove COMPILE_CUSTOM_KERNELS env (#4864) wangxiyuan 2025-12-10 23:48:03 +08:00
3362be7f86 Update patch doc (#4869) wangxiyuan 2025-12-10 23:27:45 +08:00
0fb1dc43a1 [BugFix][main] Adapted Qwen3-Next-MTP to chunked prefill (#4770) drslark 2025-12-10 22:54:24 +08:00
490ddf536f [perf][dsv3.2][async_scheduling] improve dsv3.2 performance by eliminating HD synchronization (#4805) linfeng-yuan 2025-12-10 22:31:47 +08:00
dd622aa6a6 [Feature] Support npuhraph_ex backend (#4700) ChenCangtao 2025-12-10 20:48:05 +08:00
d7db6791e7 [Bugfix] Support for mlapo in deepseekv3.1 w4a8 (#4828) Zhu Yi Lin 2025-12-10 20:45:07 +08:00
8bb028424b Fixed the performance degradation issue in post-processing in speculative decoding scenarios. (#4849) FuNanyang 2025-12-10 20:32:44 +08:00
5b179c53f1 [FEAT] Support DeepSeek-V3.2 with FULL_DECODE_ONLY mode (#4706) Yizhou 2025-12-10 20:11:09 +08:00
0d8c0f1a24 [Bugfix] Fix out-of-bounds access to token_id due to uninitialized logprobs (#4248) JiangWeixiang 2025-12-10 17:45:58 +08:00
bd8be2e759 [Kernel] Add moe normal ops (#4810) shiro-zzzz 2025-12-10 17:15:28 +08:00
c77dca54b2 [CI] fix lint (#4888) wangxiyuan 2025-12-10 16:57:24 +08:00
1a443f2772 add multi_npu_qwen3_dense tutorials (#4543) wind-all 2025-12-10 16:09:56 +08:00
a82b0fa70e mooncake connector support pipeline parallel & fix pp with flashcomm1 (#4054) lidenghui1110 2025-12-10 16:01:43 +08:00
ce5872705e [Feat] Support native Kimi-K2-Thinking native W4A16 quantized experts weights (#4516) Ruri 2025-12-10 15:58:52 +08:00
c1db298f43 [CI] Use offline mode for modelscope (#4875) Li Wang 2025-12-10 15:49:34 +08:00
ceadc2788d Revert "[refactor]support gatingtopk operator generalization (#4356)" (#4873) 1092626063 2025-12-10 15:45:20 +08:00
7132ae8532 [CI]Cleanup accurary test (#4861) SILONG ZENG 2025-12-10 14:13:56 +08:00
e32014ac1d [Model] Support pooling models (#3122) lianyibo 2025-12-10 11:37:57 +08:00
1a7a34c5ec add e2e test for mtp async_scheduling (#4826) Ronald 2025-12-10 11:30:22 +08:00
134e011896 [Test] Temporarily skips Qwen3-30B-A3B-W8A8 data parallel test case (#4857) Yizhou 2025-12-10 11:05:32 +08:00
89733111fa [Nightly] Optimize nightly online test logger info (#4798) Li Wang 2025-12-10 09:24:19 +08:00
835b4c8f1d Drop torchair (#4814) wangxiyuan 2025-12-10 09:20:40 +08:00
ba9cda9dfd [Kernel] add custom op MatmulAllreduceAddRmsnorm (#4606) Trunrain 2025-12-10 09:05:33 +08:00
f404c9af7f [bugfix] fix quant method validation bug (#4831) zzzzwwjj 2025-12-09 23:42:01 +08:00
863a5a5a17 Add gsm8k accuracy test for multi-note Qwen3-235B-A22B (#4802) Nengjun Ma 2025-12-09 23:05:41 +08:00
a77045f355 [P/D][main]Offline the llmdatadist connector related parts of the code and files. (#4780) wangxiaoteng888 2025-12-09 22:36:43 +08:00
848419d1ba [Bugfix] Disable the dispatch_ffn_combine kernel in MTP path (#4751) Chen Chen 2025-12-09 22:14:05 +08:00
cd1c69ee0b [Fix] Add extra warmup run count for MC2 on specific SoC version (#4843) Yizhou 2025-12-09 21:37:38 +08:00
4813cefc58 [CI] Setup github proxy for self_hosted runners (#4841) Li Wang 2025-12-09 20:35:43 +08:00
c331503677 [Refactor] 2/N Unify all mask generation methods and cache mask (#4779) weijinqian0 2025-12-09 18:51:00 +08:00
dee00d0de3 [Usability]local_buffer_size support for units: GB, MB, KB, B (#4829) lty 2025-12-09 17:52:24 +08:00
49e346c6a6 [UT]add pcp aclgraph ut (#4804) weiguihua2 2025-12-09 17:27:40 +08:00
c68dfa70ac [Bugfix]fix bmm_transpose ops in dsv32 (#4791) Wang Yixuan 2025-12-09 16:55:09 +08:00
c8b671c498 [CI] Increase HCCL_BUFFSIZE for A3 (#4838) Li Wang 2025-12-09 16:39:50 +08:00
9567e5dd8c [kernel] Adapt DispatchGmmCombineDecode operator to parameters of small operators (#4790) wangqiankun13 2025-12-09 16:17:06 +08:00
9a885d08d0 [Feat] Multi-stream for eplb heat collection and aggregation (#4214) dsxsteven 2025-12-09 16:16:55 +08:00
dda027e680 [KVPOOl]Support pp (#4761) baxingpiaochong 2025-12-09 16:15:26 +08:00
9038865261 [CI] Optimize CI time (#4821) Li Wang 2025-12-09 16:09:37 +08:00
9a144bc7be [Docs][0.11.0] delete AIV env variables in DSV32 documentation (#4833) linfeng-yuan 2025-12-09 15:53:53 +08:00
8f45f9ce29 BugFix: Resolve shape mismatch in eplb update and calculation issues in quant_apply_mlp (#4777) Mercykid-bash 2025-12-09 15:46:58 +08:00
695e5c9ebc [0.11.0][ops] npu_top_k_top_p supports k and p only (#4153) linfeng-yuan 2025-12-09 15:45:40 +08:00
4588d1f215 [CI] Use arm node for unit tests (#4819) Li Wang 2025-12-09 15:45:14 +08:00
56f01820e8 [Docs]fix the configuration conflicts in documentation (#4823) linfeng-yuan 2025-12-09 15:37:38 +08:00
e0757dc376 [0.11.0]fix the configuration conflicts in documentation (#4824) linfeng-yuan 2025-12-09 15:37:06 +08:00
1c70f5c922 [CI] Skip test_suffix_correctness (#4820) Li Wang 2025-12-09 11:48:13 +08:00
033e3557cc [cherry-pick]fix qwen3vl mrope op (#4484) (#4811) zhangxinyuehfad 2025-12-09 11:07:32 +08:00
2b819bb35b [Bugfix] Add the check for a null VllmConfig (#4749) Canlin Guo 2025-12-09 09:21:17 +08:00
9862a23985 【0.11.0-dev】optimization of kimi-k2 in cann8.3 (#4555) Levi 2025-12-09 08:49:15 +08:00
0d094531b4 [bugfix] Fixed the bug in retrieving the quantization method for mlp.… (#4797) zhangxinyuehfad 2025-12-09 08:47:19 +08:00
7e70da9fb7 Revert "[Kernel] add custom moe ops for prefill" (#4806) Mengqing Cao 2025-12-08 23:20:32 +08:00
432b861cae Fix incorrect MLAPO weight release in PD mixex scenarios. (#4774) ZYang6263 2025-12-08 23:17:45 +08:00
b230e7e987 [MOE]move weight transpose to wakeup for RL secnarios (#4626) lhp-deep 2025-12-08 20:34:52 +08:00
58db21f56a [DP] Fix dp padding logic in dummyrun (#4705) Mengqing Cao 2025-12-08 20:32:35 +08:00
193dc1703f [Doc] Add Qwen3-235B tutorial (#4358) xuyexiong 2025-12-08 20:06:46 +08:00
4e728f1f40 [Bugfix] fix qwen3-vl-moe shape ERROR during the _prepare_inputs phase under high concurrency. (#4658) Levi 2025-12-08 19:30:16 +08:00
d412565ec9 [Cherry-pick]bmm_transpose to v011dev (#3995) Wang Yixuan 2025-12-08 19:22:14 +08:00
9766cf9128 fix qwen3vl mrope op (#4484) shaopeng-666 2025-12-08 19:19:17 +08:00
3c3c9a5386 Bump actions/checkout from 6.0.0 to 6.0.1 (#4772) dependabot[bot] 2025-12-08 19:15:40 +08:00
0617d7d394 [Kernel] add custom moe ops for prefill (#4194) shiro-zzzz 2025-12-08 19:11:58 +08:00
f0876b5d88 [Bugfix] Fix Dcp dimension mismatch when enable Mlapo (#4687) zengzengran 2025-12-08 17:19:58 +08:00
afe00505de [Fix] skip xlite e2e test (#4786) LuLina 2025-12-08 16:48:15 +08:00
96ea0e078f [EPLB] Add log Info for moe_load Imbalance Ratio (#4482) dsxsteven 2025-12-08 14:28:13 +08:00
a433f3280a [Op] DeepSeekV3.2 support bmm_transpose operator (#4631) ZYang6263 2025-12-08 14:03:38 +08:00
0b65ac6c4b remove useless patch (#4699) wangxiyuan 2025-12-08 11:02:42 +08:00
866347a621 Deepseek Mtp model uses the lm_head and embedding from the main model (#2790) zzhxxx 2025-12-08 10:33:29 +08:00
9fbcfa36af [CI] Fix ngram & suffix test oom (#4755) fluctlux 2025-12-08 09:26:29 +08:00
916a9a1913 fix synchronize error of exceeds_max_model_len d2h copy (#4708) Ronald 2025-12-08 09:07:59 +08:00
6391f0625f [v0.11.0-dev][bugfix] Add branch for stream up-lifting in update_attn_params (#4437) Angazenn 2025-12-08 08:54:46 +08:00
2be0fe2691 [Feat] Add Euler xlite graph wrapper support (#4526) LuLina 2025-12-08 08:27:46 +08:00
8fdb689a32 [BugFix] Refactor ACL graph size adjustment for speculative decoding (#4640) Yizhou 2025-12-07 17:32:45 +08:00
688b1332da [P/D] check kv extra config and del hccl backend (#4547) liziyu 2025-12-07 15:19:42 +08:00
