EngineX/xc-llm-ascend

c980e68d40 [Feature] support aclgraph for model runner v2 (#7110) Ronald 2026-03-13 09:11:46 +08:00
1f71da80eb [CI] Fix server start failure when long weight loading (#7098) Li Wang 2026-03-13 08:52:56 +08:00
7fe0469e27 [CI][Misc] Use offline mode for model downloads (#7179) Li Wang 2026-03-13 08:52:24 +08:00
fe4cad24e9 [BugFix]fix qwen3.5 reshape_kvcache bug (#7209) zxr2333 2026-03-12 23:51:40 +08:00
5fe7942bbd [CI] add action for issue labeler on issue open/edit (#7208) drizzlezyk 2026-03-12 20:16:17 +08:00
0c659e91ed [MTP][Bugfix] Fix GLM5-W8A8 precision issues caused by rotary quant MTP weights (#7139) wangbj127 2026-03-12 20:01:24 +08:00
de93790d08 [main][bugfix] Fixed the problem of drafter crashed in FULL mode (#7158) drslark 2026-03-12 18:38:50 +08:00
88c56e3bf2 [Misc] Fix main lint to make CI happy (#7204) Li Wang 2026-03-12 18:27:48 +08:00
0a171b5cdd [Test][BugFix] Fix dispatch_gmm_combine_decode test stability (#7097) Li Wang 2026-03-12 17:22:44 +08:00
d866e6b238 [Bugfix] Fixed permission issues with the automatic PR submission workflow (#7142) Li Wang 2026-03-12 17:18:59 +08:00
e5343d6eb3 [310P][Bugfix]: fix ngram graph replay accuracy error (#7134) Shaoxu Cheng 2026-03-12 17:08:08 +08:00
bfd049aa2c [Lint] fix typos error in epd_load_balance_proxy_layerwise_server_example.py (#7199) Ronald 2026-03-12 17:04:38 +08:00
21fea86b08 feat: [CI] Introduce uv to accelerate pip install (#7127) tfhddd 2026-03-12 16:47:23 +08:00
592661e787 [Doc] EPD doc and load-balance proxy example (#6221) shaopeng-666 2026-03-12 16:17:17 +08:00
09d26754cd [Bugfix] Fix the issue where no exception is thrown when graph capture fails. (#5644) 无脸男 2026-03-12 16:14:45 +08:00
77b43492ae improve the ttft when use mooncake (#6125) xleoken 2026-03-12 16:13:48 +08:00
f244f3c4a9 [BugFix] Fix problem of extra processes on rank0 device (#7107) Hexiang Wang 2026-03-12 15:59:03 +08:00
e5024d0264 [doc] Add Ascend PyTorch Profiler section (#7117) herizhen 2026-03-12 15:51:00 +08:00
132f3c5d0a Support per-step heat collection and enhance FlashLB for multi-stage load balancing (#6477) Mercykid-bash 2026-03-12 15:49:09 +08:00
abe72d7cb9 Refactor quantization layer name mapping to leverage vLLM built-in mappers (#7050) Feng-xiaosuo 2026-03-12 15:48:14 +08:00
fb0d6dd175 [main][bugfix] Fixed the problem of speculative decoding in FULL mode (#7148) drslark 2026-03-12 14:51:12 +08:00
37d1bd8c50 fixed fia pad logic in graph mode. (#7144) XiaoxinWang 2026-03-12 14:50:54 +08:00
bbffe58b63 [Doc] fix DSV3.1 PD configs (#7187) MengLong Chen 2026-03-12 14:24:49 +08:00
aa0143e55d refactor: add a check before layer_sharding logging (#7186) Qiu 2026-03-12 11:56:04 +08:00
5f3826b093 [Build] Add support for Ascend950 chip (#7151) linfeng-yuan 2026-03-12 10:25:51 +08:00
da01a74009 Revert "[CI] fix skiped e2e test when upgrade vllm version (#6654)" (#7166) meihanc 2026-03-11 23:03:15 +08:00
3b6b3c4214 [MODELRUNNERV2]fix penality ops (#7013) shiyuan680 2026-03-11 17:13:34 +08:00
830f39dd70 [Bugfix][LoRA] Fix the issue when enable LoRA + tp + fully_sharded_loras (#6650) yupeng 2026-03-11 15:43:15 +08:00
a7f91fce71 [KV Pool]get_num_new_matched_tokens return 0 if token length < block_size (#7146) pz1116 2026-03-11 15:05:34 +08:00
1a83c8e2f5 [CI] Build Image for v0.16.0rc1 (#7155) Mengqing Cao 2026-03-11 14:48:50 +08:00
90aa048e60 [CI] Skip test_mooncake_layerwise_connector.py in ut (#7147) SILONG ZENG 2026-03-11 11:46:29 +08:00
e16009b2cc [BugFix]Fix recomputed scheduler bug (#7137) zxr2333 2026-03-11 00:32:19 +08:00
54668e73c5 [Model] Support Minimax-m2.5 on NPU (#7105) SparrowMu 2026-03-11 00:12:02 +08:00
239683c7a6 [P/D]Mooncake Layerwise Connector supports hybrid attention manager with multiple kvcache groups (#7022) zxr2333 2026-03-10 23:59:20 +08:00
0f289fa2a8 Add patch_qwen3_5 for triton ops fused_recurrent_gated_delta_rule (#7109) pppeng 2026-03-10 23:28:58 +08:00
a78a00e0b1 [Doc][ReleaseNote] Add release notes for v0.16.0rc1 (#7067) Canlin Guo 2026-03-10 22:45:05 +08:00
881c38d210 [Misc] Download on both hk and guiyang region (#7129) Li Wang 2026-03-10 19:22:32 +08:00
6e8d3681ae [bugdix] The problem that the w4a8 weight fails to be loaded when the EP is not enabled is resolved. (#7090) shaopeng-666 2026-03-10 16:57:05 +08:00
a5ea699e29 [eagle][cp] fix eagle_cp enable bug2 (#7079) lilinsiman 2026-03-10 16:32:49 +08:00
67d40f23fd [CI]Upgrade niglty multi-node-tests max-parallel to 2 (#7035) zhangxinyuehfad 2026-03-10 16:25:51 +08:00
5df450bca4 [Feat] [310p] Support w8a8sc quantization method (#7075) pu-zhe 2026-03-10 16:13:20 +08:00
14c71b19e1 [Doc][CPU binding] Add user/developer guide for CPU binding (#7045) Frank Chen 2026-03-10 15:59:31 +08:00
33234aa0c5 Revert "[Feature][Quant] Auto-detect quantization format from model f… (#6873) Li Wang 2026-03-10 11:27:32 +08:00
40f7d93f1a [bugfix][LoRA] Fix the lora accuracy issue introduced by the upstream vLLM changed. (#6958) yupeng 2026-03-10 10:43:18 +08:00
a398fa6a0b [Bugfix]: correct streaming content-type in load balance proxy server (#6985) ZRJ026 2026-03-10 10:11:35 +08:00
bb7ed759d4 [Doc] Fix broken chunked-prefill URL in supported features (#6963) NJX 2026-03-10 10:10:07 +08:00
9b30d4e774 [Doc][Misc] Add metrics usage documentation and example (#6962) NJX 2026-03-10 10:09:50 +08:00
326fd359aa [Docs] add and publish llms.txt for LLM discovery (#6886) Yikun Jiang 2026-03-10 10:06:27 +08:00
bdad11e9a8 [doc] Update GLM4.x.md, add GLM4.x multi-node deploy tutorial (#6872) ZKSU 2026-03-10 10:01:53 +08:00
146b9d2a83 [BugFix] fix metadata execute error: integer modulo by zero (#6521) xleoken 2026-03-10 09:58:06 +08:00
f6db47f103 [CI] fix skiped e2e test when upgrade vllm version (#6654) meihanc 2026-03-10 09:55:35 +08:00
43df2cb2fc [Lint]Style: Convert test/ to ruff format(Batch #1) (#6738) SILONG ZENG 2026-03-10 09:52:50 +08:00
9216e1b050 [fix] Add support for Qwen3.5 Dense and MoE on Ascend (#6933) xmpp777 2026-03-10 09:09:31 +08:00
3b25ded8b7 [CI] Bump docker/metadata-action from 5 to 6 (#7069) dependabot[bot] 2026-03-10 09:06:04 +08:00
2325bbe79b [CI] Bump actions/checkout from 4 to 6 (#7070) dependabot[bot] 2026-03-10 09:05:22 +08:00
ee5347e824 [qwen3 next ]add ascend c casual_conv1d_fn (#6661) ZT-AIA 2026-03-09 23:29:49 +08:00
48b624e4cc [BugFix] Fix implementation bug of triton rope_siso (#7082) Hexiang Wang 2026-03-09 23:08:43 +08:00
542258ac9d [feat] parameterize hardcoded MLA dimensions to support GLM5-W8A8 (#6902) liuchen2026fly 2026-03-09 20:17:21 +08:00
13adcbe44b feat(attention_cp): support chunked prefill for Qwen3Next with PCP&DCP (#6900) Qiu 2026-03-09 17:55:09 +08:00
a76a509fae [MOE][Bugfix] Cancel H2D for expert_map (#7000) LI SHENGYONG 2026-03-09 17:53:54 +08:00
82fdd40d49 [Feat]Xlite Qwen3 MoE Support Data Parallel (#6715) 王远 2026-03-09 17:53:35 +08:00
ba1c82e758 [DOC] Add explaination of 310p special param: max-model-len (#7065) Shaoxu Cheng 2026-03-09 16:54:43 +08:00
dec04ec8d8 [Bugfix] Fix incorrect layer count for MTP models in update_aclgraph_sizes (#7064) wanghuanjun2113 2026-03-09 16:14:51 +08:00
4b4961ba5f [fix]Resolve compilation errors that occur when building versions subsequent to b020 (#7059) guanguan0308 2026-03-09 16:09:35 +08:00
eb648f7398 [Bugfix] Support quant config in glm46v (#7062) LoganJane 2026-03-09 16:07:16 +08:00
57c554a23f [bugfix]Fix parameter ordering bug in _merge_multimodal_embeddings (#7068) tanhaoan333 2026-03-09 16:05:52 +08:00
cb4c7de856 [Perf] Optimize MTP execution by reordering state update operation (#6844) Cao Yi 2026-03-09 15:55:27 +08:00
d39d80830c [KVCache]Qwen3.5 supports contiguous tensor hybrid-attn kv-cache (#6887) zxr2333 2026-03-09 15:28:40 +08:00
482d39c1b0 [commuinty]update contributor and refresh tool (#7072) wangxiyuan 2026-03-09 15:19:35 +08:00
aef9d4249d [Perf] Avoid CPU sync in mrope_positions copy by using full tensor copy (#7014) Cao Yi 2026-03-09 14:46:37 +08:00
65eae6de7b Add Ascend Ops recurrent_gated_delta_rule (#6725) LeeWenquan 2026-03-09 14:14:14 +08:00
23bf5d4d48 [EPLB][bugfix] Bugfix for fused mc2 (#6794) JIACHENG XU 2026-03-09 11:26:57 +08:00
06ec136f08 [Bugfix] Obtain kernel block size for computing slot mapping correctly (#7019) Zetong Li 2026-03-09 11:05:01 +08:00
a3f4f6b10b [P/D][Bugfix] Layerwise stacking MTP error. (#7036) wangxiaoteng888 2026-03-09 10:55:43 +08:00
675387f1fd [P/D][KVPool]Mooncake Layerwise Connector supports kv_pool (#7032) zxr2333 2026-03-09 10:49:04 +08:00
6a7115fa0d [main][feature] Support quarot for eagle3 without embedding (#7038) drslark 2026-03-09 10:43:06 +08:00
737dfcf638 [MOE] commit GMM custom operator (#7010) chenxi-hh 2026-03-09 09:56:31 +08:00
01d3515dcf [eagle][cp][bugfix] Fix the bug in eagle and cp enabled (#6981) lilinsiman 2026-03-06 20:49:49 +08:00
1c0ecf806a [bugfix] fix pass bug: pass really rope dim for npu_rotary_embedding (#6880) aipaes 2026-03-06 19:35:17 +08:00
094eb0eff9 [bugfix]Qwen-Omni quantization bugfix (#7042) tanhaoan333 2026-03-06 17:24:22 +08:00
a51d6366b9 [Bugfix] Qwen3Next support FlashComm1 (#6830) ZhaoJiangJiang 2026-03-06 17:14:08 +08:00
a2696006d1 [Refactor][EAGLE] 8/N delete mtp_proposer (re-pull) (#7033) Zetong Li 2026-03-06 17:11:22 +08:00
c5dfa8d645 [OPS]add split_qkv_rmsnorm_mrope ops (#6730) Fager10086 2026-03-06 16:18:37 +08:00
bc0fd7ca72 [Feat]Adapt the graph mode (piecewise and full_decode_only) of PCP and DCP for DeepSeek v3.2. (#6940) xiaocongtou6 2026-03-06 16:10:24 +08:00
a813eadd2d [MM][Perf] Enable 2.7x faster for convolution computation with aclnn BatchMatMulV2 (#7017) Shanshan Shen 2026-03-06 14:26:37 +08:00
c49ce18ea5 [Test] Add e2e test cases for the Qwen-VL model adaptation to Ascend 310p (#6977) wanghengkang 2026-03-06 14:25:10 +08:00
620076b76a [bugs] fix install FIA sh (#6989) aipaes 2026-03-06 11:42:32 +08:00
16c3b0b822 Revert "[Refactor][EAGLE] 8/N delete mtp_proposer" (#7030) wangxiyuan 2026-03-06 11:24:05 +08:00
8c2c82f3e1 [Bugfix] Fix the moe_forward error when setting enable_static_kernel … (#6964) panchao-hub 2026-03-06 10:36:10 +08:00
a7820d20f4 [Doc][KV Pool]Update Memcache local service config example: increase default world size to 256 and update description (#7025) pz1116 2026-03-06 10:23:55 +08:00
a838a89630 [v0.16.0][P/D][Bugfix] Support ALL D-Nodes in fullgraph when running MTP in PD (#6948) MengLong Chen 2026-03-06 10:01:33 +08:00
ccd00798f3 [EPLB] Display the expert hotness comparison before and after eplb. (#6877) LI SHENGYONG 2026-03-06 09:53:29 +08:00
18b52afe2b [Ops][Misc] Optimize split_qkv_rmsnorm_rope op (#6827) frank 2026-03-06 09:30:31 +08:00
a60e179c7f [Refactor][EAGLE] 8/N delete mtp_proposer (#7016) Zetong Li 2026-03-06 09:10:57 +08:00
bd571cf6d6 [Main2Main] Upgrade vLLM to 0303 (#6944) SILONG ZENG 2026-03-06 09:08:52 +08:00
640ecd1b77 [BugFix] Fix muls_add fusion not working for GLM5 models (#6928) liuchen2026fly 2026-03-05 22:35:54 +08:00
ae394767d4 【main】ADXL/HIXL supports FabricMem Mode (#6806) fems14 2026-03-05 21:04:11 +08:00
50441e4650 [BugFix][MTP] Fix prefill misclassified as decode when prompt tokens == num_spec_tokens + 1 (#6835) Cao Yi 2026-03-05 17:33:10 +08:00
91c39ebae6 [BugFix] [dcp] Fix GQA Model Error when Enable both DP and DCP (#7012) dsxsteven 2026-03-05 16:51:08 +08:00
1e4017e3fa [CI] support nightly ci for per pr by labels (#6483) zhangxinyuehfad 2026-03-05 16:46:37 +08:00
