xc-llm-ascend

EngineX/xc-llm-ascend

b6bc3d2f9d [Feat.][310P]: weightNZ feature with quant or unquant. (#6705) Shaoxu Cheng 2026-02-13 15:41:02 +08:00
f40256b697 [Feat.][310P] addrmsnorm for 300I DUO (#6704) Shaoxu Cheng 2026-02-13 15:40:49 +08:00
7164990904 [Graph][Fusion] Integrating inductor pass and npugraph ex pass (#6354) Icey 2026-02-13 15:34:55 +08:00
87a0b7b7c7 [bugfix] adapt bugfix for norm_quant_fusion_pass to npugraph_ex (#6726) iiiklw 2026-02-13 10:10:39 +08:00
41d056f947 [doc] add A2 series doc for GLM5.md (#6717) taoyao1221 2026-02-12 16:08:17 +08:00
b881fab416 [P/D][PCP] mooncake layerwise support pcp function (#6627) wangxiaoteng888 2026-02-12 11:02:25 +08:00
8b23554741 [Misc] gen kv events in ascendconnector (#6593) yejj 2026-02-12 11:01:09 +08:00
7221045777 [Attention] add gpt-oss support (#5901) jiahao.quan 2026-02-12 10:55:34 +08:00
f71812011d [Feature] DispatchGmmCombineDecode support bf16/float16 gmm1/gmm2 weight and support gmm weight with ND format (#6393) lih827 2026-02-12 10:37:41 +08:00
f1ffb5fb19 [Feature] adapt to uva buffer and main2main (#6657) Ronald 2026-02-12 10:36:31 +08:00
56269eae0e [BugFix] Fix AddRMSNormQuant not taking effect (#6620) ZYang6263 2026-02-12 09:26:05 +08:00
052cc4e61b [Docs] Fix GLM-5 deploy command (#6711) Canlin Guo 2026-02-12 08:55:48 +08:00
a0315f6697 [npugraph_ex]enable npugraph_ex by default (#6664) iiiklw 2026-02-12 08:44:06 +08:00
b86ea66b0a [doc]add GLM5.md (#6709) rika 2026-02-12 04:00:40 +08:00
ff3a50d011 [Model] GLM5 adaptation (#6642) yydyzr 2026-02-11 22:22:22 +08:00
140fcaffc3 [Bugfix] Update target probs to target logits in rejection sample (#6685) Zetong Li 2026-02-11 21:31:40 +08:00
c0c2eb614e [Main][Ops] Make triton rope support index_selecting from cos_sin_cache (#5450) Angazenn 2026-02-11 21:20:53 +08:00
6bc44bf49b [CI]fix nightly multi node test error for wait for pod ready (#6675) SILONG ZENG 2026-02-11 18:11:00 +08:00
88773bb101 [main to main] upgrade main 0210 (#6673) Icey 2026-02-11 18:10:14 +08:00
53b494b1e4 [main][Quant] Remove unused rotation functions and parameters from W4A4 LAOS quantization (#6648) Cao Yi 2026-02-11 16:38:45 +08:00
bb73478c00 [Test][BugFix] Fix torch.rand usage in triton penalty test (#6680) whx 2026-02-11 16:31:49 +08:00
0c1cfa2bac Add Worker Interface:check_health (#6681) luomin2005 2026-02-11 15:24:48 +08:00
389030a8f8 add env vars & misc v0.11.0 starkwj 2026-02-11 06:27:58 +00:00
02886e2641 [Feat] 310p support MoE W8A8 quantizaition (#6641) pu-zhe 2026-02-10 17:17:44 +08:00
1eb07986bf [TEST]add a qwen3-30b acc case with mooncake mempool (#6244) jiangyunfan1 2026-02-10 16:26:55 +08:00
7cf285a77a [MOE Refactor] Remove QuantType in prepare_finalize.py (#6534) LI SHENGYONG 2026-02-10 15:59:58 +08:00
34eecacace [EPLB] Avoiding eplb's dependency on a specified model (#6528) LI SHENGYONG 2026-02-10 15:58:44 +08:00
7d4833bce9 [Doc][Misc] Restructure tutorial documentation (#6501) wangxiyuan 2026-02-10 15:03:35 +08:00
77305df398 implement batch invariant with ascendc (#6590) Ronald 2026-02-10 14:15:26 +08:00
66b60c9440 [Refact]Refact MLA/SFA weight prefetch to consist with moe weight prefetch (#6629) Nengjun Ma 2026-02-10 14:14:37 +08:00
2a826b5fad [Misc] upgrade to vllm main (#6646) wangxiyuan 2026-02-10 14:08:59 +08:00
1c7d1163f5 [main][Docs] Fix spelling errors across documentation (#6649) Cao Yi 2026-02-10 11:14:57 +08:00
5b8e47cb68 [bugfix]Fix no attribute 'data' when MLAPO is enable (#6601) meihanc 2026-02-10 09:04:32 +08:00
905f0764e0 [DOC]Add Memcache Usage Guide (#6476) DreamerLeader 2026-02-09 21:55:00 +08:00
9564c6bb5d [main][bugfix] Fix spec acceptance rate problem in vllm_0.15.0 (#6606) lilinsiman 2026-02-09 21:33:58 +08:00
8d44ddacb0 [Test][LoRA] Add e2e test for base model inference (#6624) yupeng 2026-02-09 21:06:49 +08:00
156976b982 [refactor]Optimized the kvcache usage of Deepseek v3.2 (#6610) Wang Kunpeng 2026-02-09 18:53:56 +08:00
cb7c419bc0 [Feat](sfa,dcp) support dcp for sfa (#6563) Qiu 2026-02-09 18:52:25 +08:00
80e5812b39 [BugFix] Add support for rotary_dim parameter when using partial rope in rotary_embedding (#6581) GoCHug 2026-02-09 17:17:52 +08:00
d060c797ed [fix bug] fix tensor mismatch bug in sigmoid operate test case (#6619) lhp-deep 2026-02-09 16:43:27 +08:00
8325528368 [Kernel]: Optimize DispatchFFNCombine performance (#6468) xulei 2026-02-09 16:30:34 +08:00
9c6d031797 [MISC] Clean up useless env USE_OPTIMIZED_MODEL (#6618) wangxiyuan 2026-02-09 15:38:58 +08:00
b7aa511daa [Patch] Remove the patch of MiniCPM (#5975) Canlin Guo 2026-02-09 14:07:44 +08:00
e5f0e0eaf7 [P/D] layerwise connector support recompute scheduler (#5900) liziyu 2026-02-07 15:24:42 +08:00
d266fd7b47 [CI] Add workflow support for lint image build (#6489) wangxiyuan 2026-02-07 09:32:01 +08:00
4fa7cf6f50 [Bugfix] Fix problematic dummy_run & improper input_batch_size in eagle (#6517) Zetong Li 2026-02-07 09:30:10 +08:00
1cc225711d [Refactor]310p_e2e test case update (#6539) pu-zhe 2026-02-07 09:28:37 +08:00
c3db1aca2f [Refactor]refactor p2p connector (#6551) lty 2026-02-07 09:27:15 +08:00
4f33e25046 [Refactor]refactor 310p attention impl and add ut (#6579) pu-zhe 2026-02-07 09:26:26 +08:00
23524f2ca4 [Refactor]refactor 310p ops and add ut (#6591) pu-zhe 2026-02-07 09:25:17 +08:00
6c49f95da2 [Ops][Refactor] Remove custom rotary_embedding operator (#6523) wangxiyuan 2026-02-07 09:24:05 +08:00
06aa6036f6 [Lint]Style: Convert vllm-ascend/ to ruff format(new Batch #8) (#6604) SILONG ZENG 2026-02-07 09:16:07 +08:00
c63b7a1188 [Test] Add initial multi modal cases of Qwen2.5-VL-7B-Instruct for disaggregated encoder (#5301) wangyu 2026-02-06 17:30:17 +08:00
06c0aed124 [CI] Fix broken CI (#6599) wangxiyuan 2026-02-06 17:23:58 +08:00
19b5d44ea8 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #10) (#6173) SILONG ZENG 2026-02-06 15:35:06 +08:00
65b7f716e6 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #11) (#6176) SILONG ZENG 2026-02-06 15:28:49 +08:00
4fb3d5e1b2 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #8) (#6129) SILONG ZENG 2026-02-06 15:25:08 +08:00
99aedaff63 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #7) (#6023) SILONG ZENG 2026-02-06 14:56:53 +08:00
d0bc16859c [CI][Misc] Some improvement for github action (#6587) wangxiyuan 2026-02-06 14:06:27 +08:00
d018aeb5fa [Image] Bump mooncake version to v0.3.8.post1 (#6428) Li Wang 2026-02-06 10:54:03 +08:00
85e33941e8 [Feat.]: 310p support MOE models (#6530) pu-zhe 2026-02-06 10:30:56 +08:00
c38166eefa [Doc] backport 0.13.0 release note (#6584) wangxiyuan 2026-02-06 10:29:15 +08:00
11339eb48a [CI] Update UT CANN version to 8.5.0 for main branch (#6564) Nengjun Ma 2026-02-06 10:28:42 +08:00
81f3c09d6d [CI] Change A2 runner (#6557) zhangxinyuehfad 2026-02-05 23:43:57 +08:00
8e66299bf1 [Bugfix] Fix the incorrect use of the output parameter in _forward_fia_slidingwindow (#6469) Ruowei Zheng 2026-02-05 20:58:54 +08:00
922e5c163b [main2main] upgrade vllm main 0202 (#6560) meihanc 2026-02-05 19:31:17 +08:00
2c1608265b [CI][npugraph_ex]Fix npugraph ex e2e test (#6553) ChenCangtao 2026-02-05 14:03:10 +08:00
33b8ca4e96 [Feature]KV pool supports sparse attention (#6339) lty 2026-02-05 10:36:52 +08:00
13c4a9c78b [bugfix]Fix accuracy issue in PCP/DCP with speculative decoding (#6491) Wang Kunpeng 2026-02-05 10:06:14 +08:00
0ead5e8681 perf: adaptive block size selection in linear_persistent kernel (#6537) Zhijun Chen 2026-02-04 21:36:26 +08:00
2ee4f23f28 [ModelRunner][Fix] Pads query_start_loc to satisfy FIA/TND constraint (#6475) Yizhou 2026-02-04 21:11:08 +08:00
2dac18afea [Bugfix]Fix of Pooling Code and Update of Pooling Usage Guide (#6126) DreamerLeader 2026-02-04 16:35:41 +08:00
804a9ec4e6 [Fusion] Add rmsnorm dynamic quant fusion pass (#6274) Zhang-Bryan 2026-02-04 15:53:53 +08:00
e7a13beedb [Bugfix] Synchronize only the current stream to avoid device sync (#6432) IWantFight 2026-02-04 10:59:45 +08:00
bfcc372f75 [CI] Add long and short prompt tests for DeepSeek-V3.2 (#6499) starmountain1997 2026-02-04 09:10:50 +08:00
78fad4e348 [Refactor] MLP weight prefetch to consistency with MoE Model's prefetching in terms of code and usage (#6442) Nengjun Ma 2026-02-04 09:08:18 +08:00
fa56abea9f [bugfix][npugraph_ex]duplicate pattern issue (#6513) ChenCangtao 2026-02-04 08:49:13 +08:00
7b3921c498 [bugfix][npugraph_ex]add the extra check for allreduce rmsnorm fusion pass (#6430) ChenCangtao 2026-02-04 08:48:28 +08:00
a80e524fbc [Quant] GLM4.7-Flash Support W8A8 (#6492) dsxsteven 2026-02-03 19:49:58 +08:00
4d6444d5fd [Nightly][BugFix] Remove kv_cache nz test case for test_mla_preprocess_nq.py (#6505) whx 2026-02-03 18:26:51 +08:00
b804eb12f6 [CI]Nightly test use main (#6502) SILONG ZENG 2026-02-03 15:40:59 +08:00
41d48cb974 [CI] Update doctest from 0.9.1 to 0.13.0, and copy doc test workflow to nightly CI for better monitor. (#6452) zhangyiming 2026-02-03 15:19:03 +08:00
03a18ad6fd [E2E] add E2E for Prefix Caching cp & Chunked Prefill cp (#5149) Feng Liu 2026-02-03 15:04:14 +08:00
be5b66de6d [Doc] Contributing a Benchmark Tutorial for Suffix Speculative Decoding (#6323) zhangguinan 2026-02-03 14:52:38 +08:00
b1de6cbb31 [Bugfix][CI]Add qwen3Next MTP+Full Decode (#6047) LeeWenquan 2026-02-03 14:26:21 +08:00
39e77fb9e4 [Feat.]: support 310p w8a8 (#6454) Shaoxu Cheng 2026-02-03 14:13:06 +08:00
79803932e2 [Kernel] Add AscendC fused op transpose_kv_cache_by_block to speed up GQA transfer (#6366) lidenghui1110 2026-02-03 14:10:01 +08:00
f4a72f0d16 [CI]Disable early exit to complete all tests (#6482) SILONG ZENG 2026-02-03 11:25:51 +08:00
dffac6db73 [Refactor] Add expert processed token count output for DispatchFFNCombine/DispatchFFNCombineBF16 (#6402) guanguan0308 2026-02-03 10:41:06 +08:00
26b83f8bde [Bugfix] Improve Triton stability on Ascend for large grids (#6301) zhangxinyuehfad 2026-02-03 10:32:27 +08:00
05cc03d785 [Bugfix] fix hash conflict due to reset incompatible configuations (#6368) zhangxinyuehfad 2026-02-03 10:32:02 +08:00
b6256e8bc9 Revert "[CI] fix DS3.2 single node cudagraph_sizes config (#6241)" (#6497) starmountain1997 2026-02-03 08:42:58 +08:00
c1618a0427 [Bugfix]Fix the compatibility issue of may_reinitialize_input_batch (#6290) debuger 2026-02-02 19:16:26 +08:00
7932255c06 [Refactor][EAGLE] 6/N route mtp to eagle except pcp/dcp+mtp (#6349) lilinsiman 2026-02-02 19:15:31 +08:00
c08364f761 [Bugfix] Fix intermittent kv_port conflict with AscendDirectTransport (#6455) meihanc 2026-02-02 17:31:21 +08:00
45a573cff1 [Quantization][Feature] Support compressed tensors moe w4a8 dynamic weight (#5889) LHXuuu 2026-02-02 16:39:32 +08:00
082aa2e5b7 [Bugfix]The service fails to be started when the memcache pool is enabled (#6229) lty 2026-02-02 16:26:18 +08:00
460ea88276 [Refact.]: Refactor some leftover implementations of 300I DUO in the main branch. (#6425) Shaoxu Cheng 2026-02-02 16:12:04 +08:00
eeedf7c503 [Main2Main][Deps][Misc] Upgrade vLLM to v0.15.0 (#6470) wangxiyuan 2026-02-02 15:57:55 +08:00
d53510b26d [Misc] Print triton info in collect_env.py (#6298) zhangyiming 2026-02-02 15:53:42 +08:00