xc-llm-ascend

EngineX/xc-llm-ascend

a6745b8577 [CI] fix test_qwen3_moe_external_launcher_ep_tp2 (#6951) zhangxinyuehfad 2026-03-05 16:43:45 +08:00
1f2a083597 [bugfix]Qwen-Omni quantization model_type bugfix (#7007) tanhaoan333 2026-03-05 16:34:34 +08:00
1a7f845696 [Feat][Worker] NPUWorker Profiler profile_prefix full adaptation (RFC #6954) (#6968) realliujiaxu 2026-03-05 16:18:34 +08:00
3047b724b3 Add GemmaRmsNorm ACLGraph Support (#6473) LeeWenquan 2026-03-05 16:15:07 +08:00
5a3744c542 [EPLB] The profiling can collect the time required for adjusting the eplb. (#7001) LI SHENGYONG 2026-03-05 16:10:57 +08:00
43c8da3574 [Feat]fused_qkvzba_split_reshape supports token number greater than 65536 (#6740) songjianquan 2026-03-05 14:41:38 +08:00
13777bf3f0 [Spec Decode]clean up spec decode interface (#6947) wangxiyuan 2026-03-05 14:30:10 +08:00
2bd9c35788 [perf][refactor] Refactor and optimize sfa_v1.py for dsv3.2/glm5 (#6874) rjg-lyh 2026-03-05 14:27:11 +08:00
77e009d9fc [Feature] Add docs of batch invariance and make some extra operators patch (#6910) Ronald 2026-03-05 09:12:40 +08:00
f8315f5717 [bugfix]Qwen2.5VL accurate question (#6975) tanhaoan333 2026-03-04 22:02:29 +08:00
566c367a10 [CI] Add DeepSeek-V3.2 large EP nightly ci (#6378) zhangxinyuehfad 2026-03-04 16:15:56 +08:00
c3c265648f [Ops][BugFix] Fix RoPE shape mismatch for mtp models with flashcomm v1 enabled (#6939) Zhujiyang2 2026-03-04 16:02:08 +08:00
95b44d7b73 [bugfix]fix file not found error in nightly of single-node (#6976) SILONG ZENG 2026-03-04 11:47:26 +08:00
52d9086f64 [Bugfix] Fix the acceptance rates drop issue when applying eagle3 to QuaRot model (#6914) zhaomingyu13 2026-03-04 11:29:49 +08:00
d431d7d526 [CI] Enable auto upgrade e2e estimated time for auto-partition suites (#6840) Li Wang 2026-03-04 10:38:34 +08:00
c7fd7a25f7 [Doc][Misc] Fix msprobe_guide.md documentation issues (#6965) NJX 2026-03-04 10:28:31 +08:00
859f2c25b9 [Nightly][Refactor]Migrate nightly single-node model tests from .py to .yaml (#6503) SILONG ZENG 2026-03-03 20:13:43 +08:00
a0a904a3d4 [BugFix] Improve GDN layer detection for multimodal models (#6941) Cao Yi 2026-03-03 20:08:39 +08:00
5b05b3a090 [feat]ds3.2 pcp support mtp and chunkprefill (#6917) weiguihua2 2026-03-03 19:03:50 +08:00
b771ca9a47 [CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs (#6945) Frank Chen 2026-03-03 17:20:52 +08:00
700423156f [Triton] Centralize Ascend extension op dispatch in triton_utils (#6937) linfeng-yuan 2026-03-03 17:10:30 +08:00
cb893bcdb0 [csrc][bugfix] Add compile-time Ascend950/910_95 compatibility for custom ops between CANN8.5 and 9.0 (#6936) linfeng-yuan 2026-03-03 17:08:22 +08:00
2064afe380 [300I][Bugfix] fix unquant model weight nd2nz error (#6851) Shaoxu Cheng 2026-03-03 15:57:26 +08:00
f19f7b1fe2 [doc] fix supported_models (#6930) zzzzwwjj 2026-03-03 09:47:50 +08:00
248d07566f [CI] nightly test timeout (#6912) starmountain1997 2026-03-03 09:31:46 +08:00
f7a8befc20 [CI] Upgrade CANN to 8.5.1 (#6897) Xiaoshuang Wang 2026-03-03 09:02:42 +08:00
15f6564976 [Model]Add Qwen3-Omni quantization Ascend NPU adaptation and optimization (#6828) tanhaoan333 2026-03-03 00:07:23 +08:00
dfa9ff7f2a [P/D][v0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0 (#6898) wangxiaoteng888 2026-03-02 23:24:03 +08:00
5899438a86 [Feat][310p] 310P support w8a8s quantization and saving w8a8sc state (#6878) pu-zhe 2026-03-02 20:09:15 +08:00
68d8d20ca2 [misc] move mxfp_compat into device to decouple from quantization init chain (#6918) linfeng-yuan 2026-03-02 18:17:01 +08:00
632801b0ad [CI][310P] Add 310p tracked files in CI light. (#6923) pu-zhe 2026-03-02 18:03:46 +08:00
16c879cdf7 [Triton][Config] Add muls_add triton kernel and refactor AscendCompilationConfig (#5518) whx 2026-03-02 17:54:25 +08:00
8547520726 [Doc][Misc] Update AGENTS.md with sign-off and PR template requirements (#6892) realliujiaxu 2026-03-02 16:44:59 +08:00
9180dd6c51 [BugFix][PCP] Fix precision bugs for pcp/dcp in PD disaggregate (#6876) Yuzhou Tong 2026-03-02 16:11:00 +08:00
ddc78dbade [300I] support decode-only aclgraph mode (#6849) Shaoxu Cheng 2026-03-02 14:15:14 +08:00
86c9109d16 Bump actions/upload-artifact from 6 to 7 (#6906) dependabot[bot] 2026-03-02 14:08:28 +08:00
002ec24dd8 Bump actions/download-artifact from 7 to 8 (#6907) dependabot[bot] 2026-03-02 14:07:59 +08:00
3c66a970f2 add mxfp8 moe quantization (#6670) Eric-dot 2026-03-02 11:04:06 +08:00
c324053b44 [CI] Revert speedup image building and CI Installation related PRs (#6891) wjunLu 2026-03-02 08:53:10 +08:00
a77fe932e4 [Platform] Fix CPU binding logic (#6889) Frank Chen 2026-03-01 20:30:43 +08:00
5e24b26a54 [Bugfix] rename enable_flash_comm_v1 back to enable_sp (#6883) realliujiaxu 2026-03-01 20:22:50 +08:00
8835236181 [Image] Fix docker image merge tag settings (#6884) wjunLu 2026-03-01 12:20:57 +08:00
9d09488b4a [Feat] support basic pcp&dcp for qwen3next (#6091) Bai Yongbin 2026-02-28 21:44:08 +08:00
64fba51275 [Bugfix] Fix openEuler dockerfile error (#6871) wjunLu 2026-02-28 20:55:18 +08:00
5ffae03156 [bugfix] fix capture shape in sp_eagle_fullgraph (#6846) starmountain1997 2026-02-28 17:30:02 +08:00
81fb7d5779 [Doc] add 310P3 guidance of PaddleOCR-VL (#6837) zyz111222 2026-02-28 16:03:07 +08:00
3cc8bf15da Support platform.get_device_uuid function (#6777) luomin2005 2026-02-28 14:17:12 +08:00
263c2f8e8d [CI] Revert auto rebase (#6867) wjunLu 2026-02-28 11:54:31 +08:00
3d563292f3 clean 0.15.0 support (#6852) wangxiyuan 2026-02-28 09:20:57 +08:00
84b00695f8 [CI] Refactor to speedup image building and CI Installation (#6708) wjunLu 2026-02-28 09:06:00 +08:00
5666ce03f5 [bugfix] Fixed an accuracy problem of gdn layer in graph (#6822) drslark 2026-02-28 08:57:53 +08:00
9cd0d6c33d [Doc][Misc] Update release notes for v0.15.0rc1 (#6859) wangxiyuan 2026-02-27 22:35:09 +08:00
b60b991005 [CI] Add nightly test for Qwen3-235B-A22B with mooncake layerwise connector (#5441) wjunLu 2026-02-27 16:31:02 +08:00
c13d90b766 [Refactor][EAGLE] 7/N Merged PCP and disable_padded interface (#6811) lilinsiman 2026-02-27 16:06:56 +08:00
e4458b2d2b [Main2Main] Upgrade vLLM to 0226 (#6813) Canlin Guo 2026-02-27 16:05:21 +08:00
80316c5824 [DOC] enable both flashcomm1 and cudagraph (#6807) starmountain1997 2026-02-27 14:52:55 +08:00
3d43ed997e add release note for 0.15.0rc1 (#6839) wangxiyuan 2026-02-27 11:55:55 +08:00
a95c0b8b82 [Doc] fix the nit in docs (#6826) wangxiyuan 2026-02-27 11:50:27 +08:00
981d803cb7 [CI] Fix doc test fail when load model with error information: 'Stale file handle' (#6832) Nengjun Ma 2026-02-27 09:14:42 +08:00
5def28dcd3 [Feat]support sequence parallelism by pass for VL models (#5632) realliujiaxu 2026-02-27 08:27:41 +08:00
ed175d6d92 [Doc][Release] Add release note skill (#6824) Yikun Jiang 2026-02-26 21:01:21 +08:00
2d49f9079a [BugFix] Support ALL D-Nodes in fullgraph when running MTP in PD (#5472) MengLong Chen 2026-02-26 19:09:05 +08:00
532f7a82f2 [Patch][Misc] Cleanup and update patches (#6802) wangxiyuan 2026-02-26 14:45:33 +08:00
c9d05d10aa [Doc][Misc] Refactor skill documentation and add Claude support instructions (#6817) wangxiyuan 2026-02-26 14:42:59 +08:00
e76b69b9ef [BugFix] [310p] Fix attention accuracy issue (#6803) pu-zhe 2026-02-26 14:30:39 +08:00
9f8b84e5fc [Misc] Drop patch_rope.py (#6291) Canlin Guo 2026-02-26 14:04:53 +08:00
3953dcf784 [Feature][Quant] Auto-detect quantization format from model files (#6645) Cao Yi 2026-02-26 10:59:25 +08:00
bc1622338c [CI] Add long and short prompt tests for DeepSeek-V3.2 (#6536) starmountain1997 2026-02-26 10:58:50 +08:00
169e434f78 [CI] Fix EAGLE CI problems (#6702) Dijurido 2026-02-26 10:26:01 +08:00
2870f7c8ad [Feat] Support routing replay (#6696) Li-Yongwen 2026-02-26 10:22:47 +08:00
a9cca0c5c4 [Refactor] Modify the binding logic, added memory migration and interrupt core binding functions. (#6785) Rozwel-dx 2026-02-26 08:49:50 +08:00
3a4292e5b7 [MM][Perf] Use seq_lens CPU cache to avoid frequent d2h copy for better performance (#6448) Shanshan Shen 2026-02-26 08:49:36 +08:00
29e3cdde20 [Doc][Skill] Introduce AI-assisted model-adaptation workflow for vllm-ascend (#6731) jack 2026-02-26 08:48:15 +08:00
3b59d0ebe9 [Doc][Feature] Add vLLM Ascend development guidelines AGENTS.md (#6797) wangxiyuan 2026-02-26 08:47:46 +08:00
aa7fb5d707 [Bugfix] Fix DeepseekV3.1 Accuracy issue (#6805) Zhu Yi Lin 2026-02-25 23:02:00 +08:00
e3927cc8f5 [Bugfix] fix bug for mtp (#6514) bowenli 2026-02-25 17:50:57 +08:00
ed051737e9 [Bugfix] Support Kimi-K2.5 models (#6755) LoganJane 2026-02-25 14:51:46 +08:00
4efd362bac [fix]change num_commmon_tokens to num_common_tokens (#6792) kx 2026-02-25 14:48:54 +08:00
2260af405f [DOC] add request forwarding (#6780) starmountain1997 2026-02-25 14:43:51 +08:00
ad9d9569ea [Bugfix] Add the missing parentheses to @torch.inference_mode (#6757) Canlin Guo 2026-02-25 14:37:53 +08:00
957804df56 [Refactor][Bugfix] Use upstream mem_utils for profiling and correct non-torch memory recorded during profiling (#6625) Shanshan Shen 2026-02-25 14:28:08 +08:00
812c722cfb [KVPool][BugFix] Correctly initialize head_or_tp_rank for mooncake backend (#6498) DreamerLeader 2026-02-25 14:22:00 +08:00
3da2ba22eb [Platform] Enable ARM-only CPU binding with NUMA-balanced A3 policy and update docs/tests (#6686) Frank Chen 2026-02-25 11:15:14 +08:00
ac9a7d1301 [Nightly] Increase VLLM_ENGINE_READY_TIMEOUT_S to avoid nightly failure (#6778) Li Wang 2026-02-25 10:14:51 +08:00
db51a1b9b6 [Feat]ds3.2 support pcp (#6733) weiguihua2 2026-02-25 09:46:57 +08:00
ee59429015 upgrade main to 0212 (#6712) Icey 2026-02-25 09:17:29 +08:00
0331f16a50 [EPLB] Reduce the memory used for heat aggregation (#6729) LI SHENGYONG 2026-02-24 18:02:24 +08:00
5c8ab7af39 [main]update release note & support matrix (#6759) zzzzwwjj 2026-02-24 17:39:35 +08:00
a8e951e6f5 [Feat] 310p supports PrefillCacheHit State (#6756) pu-zhe 2026-02-24 16:48:05 +08:00
62ea664aa7 [Lint]Style: Convert test/ to ruff format(Batch #5) (#6747) SILONG ZENG 2026-02-24 15:50:00 +08:00
747484cb64 [Bugfix] Fix wrong computed_tokens when meet exception. (#6522) xleoken 2026-02-24 15:29:30 +08:00
ff29e029de [EPLB][Bugfix] Bugfix for ineffective dynamic eplb (#6653) LI SHENGYONG 2026-02-24 14:43:04 +08:00
f41eeeb11e Refactor the ops PyTorch adapter, cleanup for csrc/torch_binding.cpp (#6732) luomin2005 2026-02-24 09:12:43 +08:00
f0caeeadcb [CI] unlock when load model (#6771) Nengjun Ma 2026-02-14 18:54:04 +08:00
70e26551cf [Doc] modify glm doc (#6770) yydyzr 2026-02-14 16:47:23 +08:00
e2237819a9 [CI]Fixed the spell check function in typos.toml (#6753) SILONG ZENG 2026-02-14 11:57:26 +08:00
64aea60f2e [EPLB][Nightly] Refactor UT (#6543) JIACHENG XU 2026-02-14 10:56:29 +08:00
1e77077788 [Bugfix][DispatchFFNCombine] resolve vec error caused by unaligned UB access (#6707) xulei 2026-02-14 10:32:50 +08:00
e2175d9c7e [Lint] Adapt lint tools for windows (#6727) whx 2026-02-13 15:53:16 +08:00
6de207de88 [main][Docs] Fix typos across documentation (#6728) Cao Yi 2026-02-13 15:50:05 +08:00
