xc-llm-ascend

EngineX/xc-llm-ascend


3f462d251e [v0.18.0][CI] fix acc baseline of qwen3vl 235b (#7981) jiangmengyu18 2026-04-03 17:38:17 +08:00
0d773efd70 [CI]Fix qwen3Next Nightly CI config (#7903) LeeWenquan 2026-04-03 16:46:25 +08:00
445dc7196f [v0.18.0][CI] add qwen3vl weights download (#7915) jiangmengyu18 2026-04-03 12:15:01 +08:00
902d1312d9 [v0.18.0][CI] add nightly ci test for qwen3vl (#7913) jiangmengyu18 2026-04-03 11:39:28 +08:00
3cbd6acc89 [v0.18.0][Feature] Support Flash Comm V1 for Qwen3-VL models (#7893) jiangmengyu18 2026-04-03 11:38:41 +08:00
8ce4cfdae7 [Doc][Misc][v0.18.0] Add GLM5 to supported model list and update deployment document for GLM5 (#7963) yydyzr 2026-04-03 10:15:39 +08:00
3218eb9fe1 [DOC]update Qwen3.5 user guide (#7934) shaopeng-666 2026-04-02 22:09:00 +08:00
85234d096d [v0.18.0][Feature] support qkv_rmsnorm_mrope for qwen3vl (#7852) jiangmengyu18 2026-04-02 17:46:50 +08:00
4969a0d783 [Doc][Misc][v0.18.0] Add Parameter Description, best practices and FAQs in GLM5.md (#7909) Zhujiyang2 2026-04-02 16:28:32 +08:00
829957b53f [Doc] Update docs of Kimi-K2.5 for 0.18.0rc1 (#7931) LoganJane 2026-04-02 14:15:12 +08:00
74699877c9 [v0.18.0][BugFix] fix the weightsmapper bug of qwen3-vl (#7868) jiangmengyu18 2026-04-02 12:56:08 +08:00
1225c613fb [BugFix][0.18.0][KV Pool] Fix KV Pool not putting kv cache for vllm v0.18.0 (#7874) pz1116 2026-04-02 10:57:09 +08:00
4b2f0130bc [V0.18.0][EPLB][BugFix] Fix moe_load precision in allgather (#7890) LI SHENGYONG 2026-04-02 09:20:31 +08:00
99e1ea0fe6 [v0.18.0][Misc] Upgrade torch_npu to pre-release built version (#7918) Li Wang 2026-04-01 22:41:09 +08:00
d3de7333dc [BugFix][v0.18.0][cherry-pick] Fix embedding prefix caching for APC (#7894) hucong 2026-04-01 16:57:33 +08:00
762850fb4e [v0.18.0][Misc] Install numactl in Docker images (#7898) Frank Chen 2026-04-01 16:22:37 +08:00
2cb9195ff0 [Releases/v0.18.0][CI] Updated the parameters for the single-node test to fix the OOM issue for DeepSeek-V3.2 (#7862) Nagisa125 2026-04-01 10:28:46 +08:00
59a7526339 [CI][Misc] modify ds3.2+dcp ci (#7841) weiguihua2 2026-04-01 08:58:21 +08:00
ef9964389f [v0.18.0][BugFix][P/D]Fix layerwise connector out of memory during large buffer transfer (#7752) zxr2333 2026-03-31 22:16:53 +08:00
b1cc6ef6ae [v0.18.0][BugFix] Fix bug of precision when DSA-CP is enabled on GLM5 (#7843) yydyzr 2026-03-31 21:51:10 +08:00
0b48ddbc8b [Bugfix][0.18.0][KV Pool]Fix KV transfer put logic (#7718) pz1116 2026-03-31 20:21:23 +08:00
14411e911e [Doc][0.18.0][KV Pool]add mooncake rdma timeout (#7784) pz1116 2026-03-31 20:17:03 +08:00
a63dd5868d [0.18.0][cherry-pick][BugFix]Fix compilation errors for operators dispatch_gmm_combine_decode/moe_combine_normal/moe_dispatch_normal (#7844) wangyibo1005 2026-03-31 19:58:46 +08:00
ed4ef1f4e7 [releases/v0.18.0][Triton][Sampler] Add penalty-related Triton kernel for better performance of penalties (#7794) linfeng-yuan 2026-03-31 19:01:51 +08:00
82e26b5a6e [BugFix][v0.18.0]Adjust request map pop time (#7857) wangxiaoteng888 2026-03-31 18:55:36 +08:00
66db070423 [cherry-pick][Test]repair for test_compute_slot_mapping (#7836) ZT-AIA 2026-03-31 16:52:58 +08:00
af4278be35 [v0.18.0][CI] Close build image by pr (#7776) zhangxinyuehfad 2026-03-31 16:38:43 +08:00
7314bbe2df fix(platform): reimplement MiniMax usage accounting patch (#7835) jack 2026-03-31 16:27:00 +08:00
4f259d4fd8 [Performance]Optimize DeepSeekOCR2 RelPosAttention and CustomQwen2Decoder (#7737) Wangbei25 2026-03-31 14:49:29 +08:00
2a0a588311 [0.18.0][BugFix] Disable block verify to avoid incorrect verification on NPU … (#7839) liuchenbing2026 2026-03-31 09:36:48 +08:00
ab928ed586 [v0.18.0][P/D][Feature]Layerwise connector supports Mamba prefill prefix caching (#7796) zxr2333 2026-03-31 09:25:22 +08:00
cab5d73633 [releases/v0.18.0][BugFix] Fix server init error when set max_num_seqs not a multiple of tp while FLASHCOMM is on (#7832) linfeng-yuan 2026-03-30 20:24:52 +08:00
deceefd305 [releases/v0.18.0][bugfix][eplb] remove unnecessary weight_scale wrap behaviour (#7732) linfeng-yuan 2026-03-30 16:16:03 +08:00
fdd0726ae4 [v0.18.0][Triton] Fix triton-ascend version in Dockerfile (#7766) Mengqing Cao 2026-03-30 14:43:16 +08:00
e776d5c0f1 [Bugfix]v0.18.0 support FlashComm1 & DCP for Qwen (#7726) Yang Yuxi 2026-03-29 15:59:19 +08:00
9cc41c9457 [v0.18.0][Bugfix][EAGLE] Fix FIA pad bug under max concurrency (#7754) wangbj127 2026-03-29 12:23:44 +08:00
5df2ddd8db [v0.18.0][Bugfix]Fix Error "AttributeError: 'AscendCompressedTensorsConfig' object has no attribute 'enabling_fa_quant'" (#7748) Wang Kunpeng 2026-03-28 17:03:56 +08:00
c1cefd26de [v0.18.0][CI] Add nightly- prefix to branch/PR image tags (#7765) zhangxinyuehfad 2026-03-28 11:31:16 +08:00
f83cb0e6dc [Bugfix][Platform] Fix GLM47 tool-call finish backfill (#7710) jack 2026-03-28 09:15:04 +08:00
6fbd0049df [v0.18.0] Apply Eagle3 to MiniMax-M2.5 (#7619) (#7714) SparrowMu 2026-03-27 18:33:29 +08:00
60e88d9541 [v0.18.0][Refactor] Use forward mapping instead of reverse mapping in AscendMo… (#7716) Feng-xiaosuo 2026-03-27 18:25:42 +08:00
7cca7e6990 [v0.18.0][Misc] Recompute scheduler upgrade to vLLM 0.18.0 (#7720) Angazenn 2026-03-27 18:24:53 +08:00
ab619e1c53 [0.18.0][profiler] profile AICore and MTE time with torch profiler (#7730) linfeng-yuan 2026-03-27 16:37:54 +08:00
2c175f5ed8 [v0.18.0][Bugfix] Fix pr triggers on branches for nightly test workflows (#7695) zhangxinyuehfad 2026-03-27 15:17:06 +08:00
bc8e87f3db [v0.18.0][Bugfix] fix ds3.2 dcp mtp (#7681) weiguihua2 2026-03-27 14:24:53 +08:00
048c8d1afe [v0.18.0][Bugfix] Fix the bug of MTP1 crashing in multiple concurrent scenarios. (#7699) bowenli 2026-03-27 14:13:12 +08:00
6ce1dc162a [v0.18.0] fix(attention): reuse weight address in graph + RL scenario (#7715) Debonet 2026-03-27 14:11:20 +08:00
29308ac3a9 [v0.18.0][Bugfix] Fixed wrong class attribute assignment (#7586) (#7655) Mengqing Cao 2026-03-27 11:20:59 +08:00
2c2d8bb015 [cherry-pick][CI] Enforce torchaudio and torchvision compatible with pta (#7688) Li Wang 2026-03-27 11:06:13 +08:00
53cc225cac [v0.18.0][Bugfix][Platform] Fix MiniMax M2 reasoning token usage accounting (#7700) jack 2026-03-27 10:45:28 +08:00
a40eee2ba1 [feat] support dispatch_v2/combine_v2 hierarchy communication (#7698) zzzzwwjj 2026-03-27 09:20:16 +08:00
0bab629f90 [v0.18.0][bugfix]fixed block_size incorrect setting issue in dsv3.2 (#7630) (#7652) Wang Kunpeng 2026-03-26 22:38:28 +08:00
d6661c09b6 [v0.18.0][kernel] Recompilation optimization triggered by triton function parameter optimization (#7647) HarpsealCC 2026-03-26 19:10:45 +08:00
d781902ce9 [v0.18.0][CI] Fix releases/v0.18.0 ci test only support vllm v0.18.0 (#7686) zhangxinyuehfad 2026-03-26 18:36:04 +08:00
124bb00158 [CI][v0.18.0] Build nightly image for releases/v0.18.0 per pr (#7662) zhangxinyuehfad 2026-03-26 16:48:51 +08:00
2db33868a4 [kernel] Recompilation optimization triggered by triton function parameter optimization (#7645) cvSoldier 2026-03-26 16:31:34 +08:00
dba34d4915 [v0.18.0][Triton][Qwen3.5] delete expr for kernels args (#7646) Mr.WXS 2026-03-25 23:31:27 +08:00
dd55736ee4 fix uncompatible between fc1 and non-sp-padding (#7643) Wangbei25 2026-03-25 23:23:37 +08:00
2ad0ca52a6 Qwen3.5 MoE supports flashcomm v1 (#7644) wangbj127 2026-03-25 23:09:33 +08:00
ff1860bd81 [CI]fix lint (#7641) Wang Kunpeng 2026-03-25 18:48:10 +08:00
05a561129e [Graph][Bugfix] Set default cudagraph max capture size via platform defaults (#7572) linfeng-yuan 2026-03-25 17:57:19 +08:00
d452d04656 [A5][bugfix] Fix fused MoE A5 MXFP8 scale normalization, load-balance routing and gating_topk ops (#7573) linfeng-yuan 2026-03-25 17:20:28 +08:00
e0e585a109 [310P]: add torch chunk gated delta rule and 910b parity ut (#7594) Shaoxu Cheng 2026-03-25 16:46:43 +08:00
17da96658f [ModelLoader][Feature] Add rfork support for fast model loading (#7392) Marck 2026-03-25 16:40:30 +08:00
6ddfc41312 [bugfix] Fixed the error issue when overlaying MTP and full decode on DSV3.1 C8. (#7571) pichangping 2026-03-25 14:36:26 +08:00
95d33f05c2 [eagle3][pcp] fix acceptance rate for eagle3 and pcp enabled (#7549) lilinsiman 2026-03-25 11:52:04 +08:00
114ec75a06 [bugfix][CI] fix '_OpNamespace' 'vllm' object has no attribute 'qkv_rmsnorm_rope' (#7620) meihanc 2026-03-25 11:05:34 +08:00
8e3f8bab57 [Nightly] Nightly pre-build image (#7388) Li Wang 2026-03-25 09:24:01 +08:00
8977be1df3 [Bugfix]Fix deepseek 3.2 C8 precision by rotary tensor (#7537) Yaphets24 2026-03-25 09:18:00 +08:00
d96440924a adapt to main2main for model runner v2 (#7578) Ronald 2026-03-25 09:08:44 +08:00
fc3ec100bc [Patch] Fix balance scheduling (#7611) Zhu Yi Lin 2026-03-25 08:57:06 +08:00
3f4087a8f0 [310P]fused recurrent gated delta rule pytorch core and ut (#7398) Shaoxu Cheng 2026-03-25 08:53:14 +08:00
54879467c4 [CI] refine issue triage rules, wan regex and update stale setting (#7531) drizzlezyk 2026-03-24 20:11:31 +08:00
1e3c1e76bf [Lint]Add lint hooks for clang-format, shellcheck, forbidden imports, and boolean context manager checks (#7511) SILONG ZENG 2026-03-24 20:03:01 +08:00
d1a83a72f7 [doc] add enable_sparse_c8 option in configuration options (#7600) rjg-lyh 2026-03-24 19:36:34 +08:00
0210cc0b07 lower log level in PD Disaggregation (#7589) zouyida2052 2026-03-24 18:03:17 +08:00
0e3186f07c [model_runner_v2]:optimize the performance of the _compute_slot_mappings_kernel (#7575) lhp-deep 2026-03-24 17:29:14 +08:00
5d12446573 [Feat][SP] Support SP for VL MoE models (#7044) realliujiaxu 2026-03-24 17:16:00 +08:00
9615bc33fd Fix Qwen3Next CI Config (#7561) LeeWenquan 2026-03-24 17:08:17 +08:00
d98a0727c8 [Feat] Add npugraph_ex enablement logging (#7574) panchao-hub 2026-03-24 17:04:48 +08:00
bdb65319a9 [UT] Align input arguments with Ascend(Yarn)RotaryEmbedding with vLLM and add ut (#7358) Angazenn 2026-03-24 16:02:56 +08:00
568b6d0601 [P/D] Check wildcard address for layerwise connector (#7389) liziyu 2026-03-24 15:50:06 +08:00
73cadecfb4 [P/D] [Bugfix] fix mooncake layerconnector dead when update_decoder_info fail (#7514) liziyu 2026-03-24 15:49:46 +08:00
67aad1fce8 [BugFix][P/D] fix padding error on FullGraph mode && fix layerwise connector mamba accuracy (#7506) zxr2333 2026-03-24 15:15:55 +08:00
475b4b0cea Revert "GMM custom operator optimization in small batch scenarios (vllm-project#7100)" (#7557) LeeWenquan 2026-03-24 14:24:44 +08:00
83bd77c983 [310p]: add rmsnorm gated fallback and unit test (#7424) Shaoxu Cheng 2026-03-24 09:00:11 +08:00
1de805ce0a [Ops][Misc] Refactor and optimize CausalConv1d for Ascend (#7495) jiaojiao 2026-03-24 00:07:12 +08:00
e942b62d74 [features]support split qkv rmsnorm rmope for qwen3.5 (#7368) ZhuQi-seu 2026-03-23 23:58:12 +08:00
8e0789bb36 [CI] Recover pd disaggregated encoder test case that been incorrectly skipped (#7505) Nengjun Ma 2026-03-23 21:41:28 +08:00
fcba91a392 Main2main Upgrade vllm commit to 0320 17:00 (#7510) Nengjun Ma 2026-03-23 21:37:41 +08:00
bdd90c0088 [model_runner_v2]optimize the performance of the post_update. (#7496) weijinqian0 2026-03-23 20:29:55 +08:00
170dcbda62 [Feature] Support DeepSeek for A5 (#7232) lijiahang226 2026-03-23 20:28:26 +08:00
13397e9cb7 [310p] Add a PyTorch implementation of the GDN gating operator on 310P (#7430) Shaoxu Cheng 2026-03-23 20:26:39 +08:00
e344a53127 [bugfix][CI]Skip e2e log summary when the log file is missing or empty (#7552) meihanc 2026-03-23 20:25:59 +08:00
886756aea0 [Bugfix][CI] Fix aisbench installation to avoid Gitee authentication (#7536) zhangxinyuehfad 2026-03-23 20:16:51 +08:00
ffd195b0fe [Bugfix]Remove conflicting triton after vllm-ascend install on x86 (#7497) SILONG ZENG 2026-03-23 20:14:42 +08:00
fb283b5820 [CI] Add nightly CI test cases for the GLM-5 (#7429) liuhy1213-cell 2026-03-23 19:14:19 +08:00
41dadd4312 [main][bugfix] Solved the problem of the d node getting stuck in the pd-separation scenario (#7534) drslark 2026-03-23 18:53:07 +08:00
a253235a59 [Doc] Add note for unsupported PCP + FULL (#7559) Zetong Li 2026-03-23 17:34:51 +08:00
9976e685b7 [Bugfix][eager][oom] fix rank0 load imbalance by no padding when multi dp (#7297) Levi 2026-03-23 17:05:02 +08:00
