xc-llm-ascend

EngineX/xc-llm-ascend

56d8f088dd [Doc] Update DeepSeek-V3.2 tutorial, add single-node and multi-node deployment (#6196) zhangyiming 2026-01-24 11:29:07 +08:00
2dd68652bc [Doc] Add the setting description of cudagraph_capture_sizes in speculative decoding user guide (#5637) zhaomingyu13 2026-01-23 23:22:44 +08:00
a2f022f9b6 [UCMConnector]Add has_connector_metadata (#6172) UnifiedCacheManager 2026-01-23 21:16:48 +08:00
717d299ae5 [BugFix]bug fix for dispatch_ffn_combine (#6156) lhchg 2026-01-23 21:14:18 +08:00
44a4ff6960 [main][BugFix] Avoided a bug of torch_npu.npu_mm_reduce_scatter_base when sp size >= 16 (#6168) drslark 2026-01-23 21:12:23 +08:00
e90b14140b [feature] add_rms_norm support bias (#5790) yjmyl 2026-01-23 21:09:54 +08:00
6c73b88dd6 [CI] Enable FLASHCOMM1 with layer_sharding and FULL_DECODE_ONLY in ds32 testing (#6115) starmountain1997 2026-01-23 19:48:37 +08:00
8786412f5c [Bugfix]KV pool rank 0 consumes more HBM (#6113) baxingpiaochong 2026-01-23 19:47:33 +08:00
bdf65e6bd3 [TEST]Add mooncake common method for tests (#6194) jiangyunfan1 2026-01-23 17:14:15 +08:00
1e116829ac [doc]update --max-num-seqs in Qwen3-235b tutorial (#6197) Angazenn 2026-01-23 17:11:10 +08:00
af4dbb6b26 [CI] Use nginx for package cache to speed up CI (#6170) Li Wang 2026-01-23 16:56:16 +08:00
4173255c0c [main][Bugfix] fix kv pcp+pooling+pd separation bug (#6153) weiguihua2 2026-01-23 16:15:04 +08:00
ff63626874 [Bugfix] Fix the issue of the acceptance rate decline for Qwen3-30B-A3B-EAGLE3 (#6138) zhaomingyu13 2026-01-23 16:12:56 +08:00
a3079cd253 [Tests] Skip unstable eagle cases to keep CI success (#6180) wjunLu 2026-01-23 15:33:53 +08:00
78af0c30a3 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #12) (#6177) SILONG ZENG 2026-01-23 14:59:19 +08:00
193acc2c19 [CI] Add nightly ci test for deepseek v3.1 (#5386) zhangxinyuehfad 2026-01-23 14:36:49 +08:00
8210a62a44 [EPLB][Bugfix]Reduce unnecessary video memory usage (#6020) LI SHENGYONG 2026-01-23 14:21:13 +08:00
749e24f81e [bugfix] align max_num_batched_tokens with tp*pcp when using FLASHCOMM1 (#6000) Qiu 2026-01-23 14:19:49 +08:00
f8d03d21f1 Add Medusa speculative decoding support for vllm_ascend (#5668) simplzyu 2026-01-23 14:14:23 +08:00
a69ef10c3a [Refactor] Quantization Module Refactor (#5738) Cao Yi 2026-01-23 14:13:47 +08:00
8378bc28b0 [Misc] Remove CP Redundant Variables after FIA operator enables for CANN 8.5 (#6013) dsxsteven 2026-01-23 14:13:12 +08:00
418a43e2a2 [Bugfix] Fix seq_lens reset issue causing performance degradation (#6158) ZYang6263 2026-01-23 11:29:54 +08:00
739d074b0c update other platforms' Dockerfile starkwj 2026-01-22 12:07:03 +00:00
82a2b3bcc7 [P/D]Add ssl cert for metaserver proxy (#5875) wangxiaoteng888 2026-01-23 11:11:44 +08:00
f4a361fcc3 [CI] Re-open skipped cases due to PTA upgrading and update the golden results (#6144) wjunLu 2026-01-23 10:46:31 +08:00
4d780a8b01 [Misc] Revert "[Misc] Bump mooncake version to v0.3.8.post1 (#6110)" (#6164) Li Wang 2026-01-23 09:53:32 +08:00
72ffc00b86 [Bugfix] Fix structured outputs errors: TypeError: apply_token_bitmask_inplace_cpu() (#6151) wjunLu 2026-01-23 09:52:55 +08:00
08a45e6053 [Doc] update supported features (#6165) zhangxinyuehfad 2026-01-23 09:50:11 +08:00
819a4459ce Drop vLLM 0.13.0 support (#6069) zhangxinyuehfad 2026-01-23 09:45:08 +08:00
27a513b672 [BugFix]hccl bufferSize check for dispatch_ffn_combine (#6130) lhchg 2026-01-23 08:41:40 +08:00
7725314b26 [Feat] Merge the multi eagle graphs to one graph (#5940) anon189Ty 2026-01-23 08:37:02 +08:00
63d3921208 [Bugfix] Remove use_aclgraph in mtp_proposer and use use_cuda_graph (#6032) Zetong Li 2026-01-22 21:08:07 +08:00
176bfc36bc [BugFix] fix 3vl dense model load quant weight (#6100) shaopeng-666 2026-01-22 20:05:25 +08:00
7f91ac2649 [CP&SP] Integrate FIA operator in mla_cp._forward_decode (#5641) Bai Yongbin 2026-01-22 20:02:30 +08:00
88632cf976 [CI][Doc] Upgrade wheel building's CANN to 8.5.0 and update the Docs (#6145) wjunLu 2026-01-22 19:50:54 +08:00
e54d294df3 [CI] Install clang in Dockerfile for triton ascend (#4409) meihanc 2026-01-22 19:01:28 +08:00
a7d781f135 [Main] Upgrade PTA to 2.9.0 (#6112) wjunLu 2026-01-22 17:59:06 +08:00
1402cf6874 [Graph][Fusion] Add QKVNormRope and QKVNormRopeWithBias (#5721) CodeCat 2026-01-22 17:22:41 +08:00
f2c0ced06d [P/D][PCP]bugfix pcp force free twice caused logger error (#6124) wangxiaoteng888 2026-01-22 16:24:33 +08:00
1d3544c887 [BugFix]converting pa get_workspace back to capturing (#5833) Angazenn 2026-01-22 15:49:22 +08:00
484e7c59dc [CI] optimize lint term (#5986) Li Wang 2026-01-22 15:46:59 +08:00
9bba0a2a68 [Bugfix] Fix Triton operator usage for multimodal models based on the mrope_interleaved parameter (#6042) zhangxinyuehfad 2026-01-22 15:46:05 +08:00
38edfd585a [bugfix][npugraph_ex]fix the model output type issue caused by manually modify FX graph (#6015) ChenCangtao 2026-01-22 12:35:06 +08:00
34fb628248 [BugFix] Support setting tp=1 for the Eagle draft model to take effect (#6097) zhaomingyu13 2026-01-22 11:36:23 +08:00
37a9cf818a [Misc] Bump mooncake version to v0.3.8.post1 (#6110) Li Wang 2026-01-22 11:03:16 +08:00
08d7014874 [Feature]Enable DispatchGmmCombineDecode when eagle is moe with w8a8 or not moe [RFC: issue 5476] (#5758) wangqiankun13 2026-01-22 10:51:02 +08:00
cef04b3555 [bugfix] adapt_remote_request_id (#6051) JiangWeixiang 2026-01-22 10:48:40 +08:00
ef9d8367f5 [Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method (#5143) maxmgrdv 2026-01-22 05:34:58 +03:00
dd8571860d [Feature] Support DSA-CP for Hybrid scenario (#5702) zzhxxx 2026-01-22 10:12:09 +08:00
69740039b7 [CI] Upgrade CANN to 8.5.0 (#6070) wangxiyuan 2026-01-22 09:29:50 +08:00
ab676413e6 Default enable MLAPO (#5952) Nengjun Ma 2026-01-22 09:26:39 +08:00
a15a5f6aa5 [Doc] Supplement PD separation parameters of DeepSeek V3.1 (#6053) MengLong Chen 2026-01-22 08:53:44 +08:00
8900e3398b [Ascend] perf: optimize rope embedding with triton kernel for huge performance gain (#5918) ZCG12345 2026-01-21 22:01:22 +08:00
2a618d2454 [Ops] update causal_conv1d_update (#5984) LeeWenquan 2026-01-21 16:33:52 +08:00
53bfb38192 [CI]Update triton ascend version in 3.2.0 (#6067) meihanc 2026-01-21 16:02:23 +08:00
58ff465821 [bugfix] fix the complex and potentially problematic generate_kv_idx. (#5957) Qiu 2026-01-21 14:21:02 +08:00
12a668b1d9 [Refactor] AttentionBuilder inherit from base class in vllm (#5916) LICO67373 2026-01-21 10:45:45 +08:00
839e03cbc9 [Nightly] Use Qwen repo for qwen3-next (#6064) Li Wang 2026-01-21 10:39:12 +08:00
1ed9524763 add dispatch_ffn_combine_bf16 (#5866) guanguan0308 2026-01-21 09:30:30 +08:00
bec8641876 [BugFix] Fix input parameter bug of dispatch_gmm_combine_decode[RFC: issue 5476] (#5932) wangqiankun13 2026-01-21 09:26:40 +08:00
5b129cf0a1 [1/N][Feat] Xlite Qwen3 MoE Support (#5951) Magnus 2026-01-21 09:26:03 +08:00
1ab6cd4935 [Bugfix] Fix setting of speculative_config.enforce_eager for dsv32 (#5945) Zetong Li 2026-01-21 09:24:33 +08:00
936d81a258 [bugfix][mm] change get_num_encoder_tokens to get_num_encoder_embeds in recompute_schedule.py (#5132) kx 2026-01-21 09:13:52 +08:00
b399117e89 [Bugfix] fix pcp qwen full graph FIA bug (#6037) weiguihua2 2026-01-21 08:49:05 +08:00
b6d55fc48e [Bugfix]Fixed precision issues caused by pooled request pooling (#6049) DreamerLeader 2026-01-20 23:51:31 +08:00
8b98d7a4e8 [main][bugfix] Resolved memory deallocation failure in the pooling layer under re-computation workloads. (#6045) fems14 2026-01-20 22:56:04 +08:00
b2475099a0 [main][Bugfix] Fixed a problem related to embeddings sharing (#5967) drslark 2026-01-20 21:34:28 +08:00
6c30f8bf87 [Feature]refactor the npugraph_ex config, support online-infer with static kernel (#5775) ChenCangtao 2026-01-20 21:31:38 +08:00
0c0514579f [CI][Lint] Show lint diff on failure (#5956) Li Wang 2026-01-20 21:07:01 +08:00
8cf1e8d8a7 [CI] Add wait logic for each individual case (#6036) Li Wang 2026-01-20 21:05:44 +08:00
750c06c78a [CI] Add DeepSeek-V3.2-W8A8 nightly ci test (#4633) zhangxinyuehfad 2026-01-20 21:05:15 +08:00
cea48c2a34 model runner v2 support triton of penalty (#5854) shiyuan680 2026-01-20 20:26:05 +08:00
afabb49f00 [Docs][Model] Support Qwen3-VL-Embedding & Qwen3-VL-Reranker (#6034) Canlin Guo 2026-01-20 17:36:31 +08:00
402872050a [Tests] move qwen3 performance test from nightly to e2e (#5980) Icey 2026-01-20 17:08:43 +08:00
5892455f43 [Bugfix] fix bug of pcp+mtp+async scheduler (#5994) weiguihua2 2026-01-20 15:24:05 +08:00
ea57e3e7a4 [Main2Main] Upgrade vllm commit to releases/v0.14.0 (#5988) meihanc 2026-01-20 15:10:40 +08:00
55b20ac63b [Ops] Add layernorm for qwen3Next (#5765) LeeWenquan 2026-01-20 14:43:14 +08:00
0664c6e67a [Doc] Add layer_sharding additional config for DeepSeek-V3.2-W8A8 (#5921) starmountain1997 2026-01-20 12:40:54 +08:00
a5b099c73d [Bugfix] Reset incompatible config (#6005) zhangxinyuehfad 2026-01-20 11:02:38 +08:00
a8576ec610 [Refactor][EAGLE] 5/N Update attn_metadata by common_attn_metadata (#5869) lilinsiman 2026-01-20 10:06:00 +08:00
f58e110afe [feat] switch for fusion ops gmmswigluquant (#5992) aipaes 2026-01-19 21:19:25 +08:00
38cfcd572a [doc](cp) correct the prefill of GQA and adjust desc of block table. (#5697) Qiu 2026-01-19 18:53:48 +08:00
f0d41199a6 [Performance] Remove index operation when VLLM_ASCEND_FLASHCOMM2_PARALLEL_SIZE=1 (#5936) Levi 2026-01-19 17:12:13 +08:00
bc486d9530 [main][bugfix] fix mooncake kv cache transfer when one P has multi nodes (#5960) wangxiaochao6 2026-01-19 16:35:13 +08:00
ebb940691f [Feature] Adapt DispatchGmmCombineDecode operator to align with weight scale dtype of small operators. [RFC: issue 5476] (#5755) wangqiankun13 2026-01-19 16:10:43 +08:00
687df88151 [Refactor] Move AttentionSpec initialization to Attention module (#5834) LICO67373 2026-01-19 14:22:18 +08:00
83de5385b4 [EPLB][Bugfix] policy_swift_balancer bugfix and renaming (#5897) LI SHENGYONG 2026-01-19 13:47:40 +08:00
b27774dbd6 [CI]fix for lint CI (#5982) SILONG ZENG 2026-01-19 09:49:28 +08:00
c929bd1e8d [Fusion] [Graph]Add Matmul Allreduce Rmsnorm fusion Pass (#5034) Icey 2026-01-19 09:28:07 +08:00
9cad1a8349 [Refactor] Migrate profiler config from env vars to explicit ProfilerConfig (#5928) meihanc 2026-01-19 09:27:55 +08:00
bc1f6713e7 [EPLB][Bugfix] Dispatch Allgather use log2phy if enable eplb (#5933) LI SHENGYONG 2026-01-19 09:24:25 +08:00
9fed2636cb [EPLB][Nightly][Bugfix] Get expert from moe layer only (#5908) LI SHENGYONG 2026-01-19 09:23:28 +08:00
ad3a1eaf70 [Bugfix][MM] Fix multi-modal inference OOM issues by setting expandable_segments:True (#5855) Shanshan Shen 2026-01-19 09:17:31 +08:00
0eafed9bd6 [doc]Table split (#5929) herizhen 2026-01-19 09:15:04 +08:00
c4fde5c064 [Doc] Upgrade outdated ut doc (#5937) Li Wang 2026-01-19 09:12:46 +08:00
329961b375 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #2) (#5977) SILONG ZENG 2026-01-19 08:59:46 +08:00
2b6dc100b5 Eagle3 mm support, enablement on qwen3vl (#4848) Song Zhixin 2026-01-19 08:58:07 +08:00
05e69b99e5 [Doc] Remove Chinese characters from the icons in the doc. (#5959) zzhxxx 2026-01-18 07:22:57 +08:00
fff5df3efe [P/D] Fix the node crash caused by force-free releasing a request a second time. (#5968) wangxiaoteng888 2026-01-17 18:49:27 +08:00
22f253142a [Feature] Support fine-grained shared expert overlap (#5482) Jade Zheng 2026-01-17 11:53:22 +08:00