xc-llm-ascend

EngineX/xc-llm-ascend

8134146ab6 [CI] fix DS3.2 single node cudagraph_sizes config (#6241) starmountain1997 2026-02-02 11:47:32 +08:00
d1dcdfc408 [bugfix]fix some bug in dispatch_ffn_combine kernel (#6465) LQLlulu 2026-02-02 08:32:42 +08:00
347eb36a59 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #9) (#6135) SILONG ZENG 2026-02-01 23:20:20 +08:00
f7dc7d9b86 [CI] support build wheel and docker image by workflow (#6453) wangxiyuan 2026-02-01 20:06:22 +08:00
b4aafd4293 [Core][Misc] Clean up ProfileExecuteDuration (#6461) wangxiyuan 2026-02-01 20:06:01 +08:00
775fbc4cd2 【main】【bugfix】fix: restrict default MLAPO activation to Decode nodes only (#6451) fems14 2026-01-31 22:44:56 +08:00
ef02d20086 [CI] update gemini styleguide (#6463) wangxiyuan 2026-01-31 18:02:49 +08:00
5b0a6bcfe9 [ModelRunner] Revert "[Fix] Pads query_start_loc to satisfy FIA/TND constraint (#6459) Li Wang 2026-01-31 16:33:34 +08:00
96cbfebede [CI]Update gemini guide (#6458) wangxiyuan 2026-01-31 15:17:39 +08:00
e3a1586fce [CI]Update gemini config (#6447) wangxiyuan 2026-01-31 10:47:38 +08:00
638cae824d [bugfix](CP) Fix and unify the PD request discrimination logic. (#5939) Qiu 2026-01-31 10:26:02 +08:00
4230bc8646 [Bugfix]Modify NPU rotary encoding parameter fields，fix RopeOperation setup failed in condition of self.rotary_dim < self.head_size (#6310) wubin58 2026-01-30 21:25:04 +08:00
77ea873224 fix: resolve sync bug in DispathFFNCombine when expert num per card is 32 (#6416) xulei 2026-01-30 21:21:20 +08:00
56f5d3bd49 [Fix] Pads query_start_loc to satisfy FIA/TND constraint (#6357) Yizhou 2026-01-30 16:41:44 +08:00
f2990f7741 [e2e Test][npugraph_ex]add static kernel e2e test case (#6320) ChenCangtao 2026-01-30 16:24:48 +08:00
8969b94a14 [Nightly] Correct nightly image build ref (#6420) Li Wang 2026-01-30 15:55:58 +08:00
d252e4f5ec [P/D] Using the cache load operator to replace the index select operator. (#6295) liziyu 2026-01-30 14:27:53 +08:00
70cc5f7969 [bugfix]fix rope_forward_triton error (#6404) Wang Kunpeng 2026-01-30 14:09:00 +08:00
46cee945b3 [doc][npugraph_ex]add npugraph_ex introduction doc (#6306) ChenCangtao 2026-01-30 11:21:37 +08:00
1d661bb279 [Bugfix] Specify tensorflow version in accuracy test to avoid segmentation fault (#6292) zhangxinyuehfad 2026-01-30 09:28:24 +08:00
b2857de43f [ST]Add e2e test for Npugraphex_pass (#6388) CodeCat 2026-01-30 09:14:07 +08:00
4970de4242 [CI] Enable the skipped cases when HDK is upgraded to 25.5.0 (#6195) wjunLu 2026-01-29 22:41:41 +08:00
e35f304419 [CI] Auto partition for test cases (#6379) Li Wang 2026-01-29 20:28:10 +08:00
14bd55f30c [P/D][BugFix] Fix layerwise P/D request_id error (#6360) zxr2333 2026-01-29 20:19:05 +08:00
feab047084 [bugfix](pcp,gqa) set kv_inverse_idx_for_chunk and cp_kv_recover_idx_for_chunk to None when dcp only (#6317) Qiu 2026-01-29 19:35:52 +08:00
50e0e87646 [bugfix](CP,MLA) fix wrong slot_mapping of decode for mixed p/d batch (#6344) Qiu 2026-01-29 16:48:37 +08:00
6a7b3bc29c Qwen3-VL-MoE EAGLE support for vLLM-Ascend (#6327) Sergey-Zlobin 2026-01-29 11:44:30 +03:00
41a52beb26 [bugfix] resolve kv cache leak on P-side due to incorrect req_id (#6325) JiangWeixiang 2026-01-29 16:05:56 +08:00
597091be9f [Doc] Reranker guide remove deprecated task option (#6385) Nengjun Ma 2026-01-29 16:00:26 +08:00
7a5b345dc4 [Misc] Drop deepseek patch (#6288) wangxiyuan 2026-01-29 14:45:50 +08:00
39f8af9d96 [Main2Main][BugFix] Add shared_experts check for AscendSharedFusedMoE (#6335) whx 2026-01-29 08:47:20 +08:00
f0ff2cc22d [CI] hot fix for nightly image build tag (#6367) Li Wang 2026-01-28 23:29:50 +08:00
86b6ecac4c [CI][BugFix] Import error fix. (#6293) InSec 2026-01-28 22:07:47 +08:00
df588ed488 [BugFix] Disable enable_shared_expert_dp by default if tensor_parallel_size=1 (#6361) hucong 2026-01-28 22:01:01 +08:00
8b0a7b6d80 [CI] Nightly tests use releases/v0.13.0 (#6355) Li Wang 2026-01-28 21:46:13 +08:00
501bb395b1 [CI] Fix image build (#6333) Li Wang 2026-01-28 21:36:44 +08:00
245c1ca241 [0.14.1][bugfix][sched] fix incompatibility of RecomputeScheduler with vllm v0.14.1 (#6286) linfeng-yuan 2026-01-28 20:16:58 +08:00
e25ee65729 [Misc][Test] add e2e test for apply_top_k_top_p_custom kernel (#6348) linfeng-yuan 2026-01-28 17:25:57 +08:00
857c533e27 [CI]: add production safeguards for 300I (#6343) Shaoxu Cheng 2026-01-28 16:43:48 +08:00
9fadc8df4f [Fixbugs]: fix refactor cause to 310p chunkprefill error (#6340) Shaoxu Cheng 2026-01-28 16:41:32 +08:00
325cb16e3f [BugFix][CI]Fix DeepSeek-R1-W8A8-longseq nightly CI (#6297) dsxsteven 2026-01-28 16:36:24 +08:00
ac963f1519 [Fix] Adds CUDA graph stats to execution state (#6331) Yizhou 2026-01-28 16:34:20 +08:00
379ce599d0 [Bugfix] Add missing draft_attn_metadatas parameter to fix MTP test (#6232) LICO67373 2026-01-28 14:41:18 +08:00
f8e76a49fa [CI] Upgrade transformers version (#6307) wangxiyuan 2026-01-28 14:06:39 +08:00
c498cea22d [refactor] refactor excute_model and _dymmy_run method (#6043) Wang Kunpeng 2026-01-27 22:27:01 +08:00
41eb71d665 [Refactor] profiler config optimize (#6141) TMC 2026-01-27 22:09:50 +08:00
54e8389f8e [Graph][Fusion] Add MatmulAllReduceAddRMSNorm graph fusion for npugraph_ex. (#6006) CodeCat 2026-01-27 16:41:48 +08:00
21b6779a33 [UT]: refactoring 310p ops ut (#6296) pu-zhe 2026-01-27 16:31:51 +08:00
57fd6e4bd9 [Refact.]: refactoring 310p-kv cache allocator, align with main branch (#6270) pu-zhe 2026-01-27 16:26:48 +08:00
5e34c70ffc [Misc] Removes unnecessary graph size re-initialization (#6280) Angazenn 2026-01-27 14:38:07 +08:00
fea197ad50 [Main2Main] Upgrade vllm commit to 0123 (#6169) meihanc 2026-01-27 08:44:36 +08:00
9780a995e1 [BugFix] Fix wheel package build workflow (#6276) Icey 2026-01-26 20:42:17 +08:00
595b57c4d4 [CI][BugFix] Qwen3-Next nightly test fix. (#6247) InSec 2026-01-26 19:53:53 +08:00
d9979f4d13 [Doc] quick fix for vllm-ascend version (#6278) wangxiyuan 2026-01-26 19:33:18 +08:00
cb553f8eee [Community] Nominate whx-sjtu as maintainer (#6268) wangxiyuan 2026-01-26 19:22:26 +08:00
43be004379 [Lint] Fix mypy issue to make CI happy (#6272) Li Wang 2026-01-26 17:54:00 +08:00
29fb27d3bb BugFix: Fix moe_load accumulation error in ACL graph mode (#6182) Mercykid-bash 2026-01-26 17:18:46 +08:00
2d3b8a51f9 [Patch] Remove the patch of ECExampleConnector (#5976) Canlin Guo 2026-01-26 17:10:03 +08:00
b390e0ef78 [Bugfix] Fix PP+PCP and PP+flashcomm1 bugs (#5416) Jingchun Gao 2026-01-26 16:53:07 +08:00
7d119df2a9 [Feat] proxy delay to remove instances (#5934) yuxinshan 2026-01-26 16:29:45 +08:00
de095c5fed [CI] Add workflow_dispatch for nightly image build (#6269) Li Wang 2026-01-26 15:56:38 +08:00
1645546661 [bugfix][npugraph_ex]fix static kernel uninstall issue (#6128) ChenCangtao 2026-01-26 15:03:18 +08:00
f910cebe04 [Doc] 310P Documents update (#6246) Nengjun Ma 2026-01-26 14:33:21 +08:00
0bb1f91c2c [Feature] Mooncake connector get remote ptp size (#5822) yuxinshan 2026-01-26 14:28:33 +08:00
611e223b7d [EPLB][Bugfix] EPLB support fp/bf16 (#5531) LI SHENGYONG 2026-01-26 14:28:16 +08:00
52d4acfa51 [Doc] add release note for v0.14.0rc1 (#6225) wangxiyuan 2026-01-26 14:22:40 +08:00
1f26f83e34 [CI] Bump actions/checkout from 4 to 6 (#6255) dependabot[bot] 2026-01-26 14:21:00 +08:00
ae71c4237e [CI] Bump actions/setup-python from 6.1.0 to 6.2.0 (#6256) dependabot[bot] 2026-01-26 14:20:14 +08:00
c26ad78f86 [CI][lint] Add rule codespell back (#6236) Li Wang 2026-01-26 14:12:33 +08:00
f4abd9b7b5 [CI] Fix 310p image build (#6259) wangxiyuan 2026-01-26 14:11:56 +08:00
65289676b4 [Refactor] Separate _prepare_inputs to _prepare_inputs and _preprocess (#6191) Canlin Guo 2026-01-26 14:05:23 +08:00
e3eefdecbd [Doc] Update max_tokens to max_completion_tokens in all docs (#6248) Shanshan Shen 2026-01-26 11:57:40 +08:00
418fccf0bc [310P]: fix 310p image cannot build (#6238) Shaoxu Cheng 2026-01-26 11:37:19 +08:00
76ac688388 [MM][Perf] Parallelize Q/K/V padding in AscendMMEncoderAttention for better performance (#6204) Shanshan Shen 2026-01-26 10:20:24 +08:00
ce11fd49f3 [Feature] Batch invariant torch.compile (#6107) huangning1995 2026-01-26 09:15:06 +08:00
96309e2b79 [ops] support advanced apply_top_k_top_p without top_k constraint (#6098) linfeng-yuan 2026-01-26 09:08:42 +08:00
4e3919e965 Reapply "[Refactor] Unify full-graph parameter update logic (#6041)" (#6227) (#6231) wangxiyuan 2026-01-26 09:04:54 +08:00
c38c838d03 [CI] Decrease Qwen3 dense model output throughput baseline to make ci happy (#6233) Li Wang 2026-01-26 09:04:13 +08:00
63adbedb7a [Worker] Implement update max_model_len interface for NPUWorker (#6193) Li Wang 2026-01-26 09:03:33 +08:00
ca297eb57f [CI] Migrate e2e test runner to hk (#5344) Li Wang 2026-01-26 09:00:51 +08:00
99bdd7363c [CI] update vLLM to 0.14.1 (#6222) wangxiyuan 2026-01-25 17:52:16 +08:00
384d84c7ef [Bugfix] Avoided a bug of drafter when dp and sp are enabled (#6226) drslark 2026-01-25 17:45:29 +08:00
b45bd92c2b [Bugfix] Add defensive check for multimodal_config (#6230) Canlin Guo 2026-01-25 17:39:19 +08:00
2928ae2af5 [Image] fix 310p image build (#6228) wangxiyuan 2026-01-25 16:07:13 +08:00
95649344aa Revert "[Refactor] Unify full-graph parameter update logic (#6041)" (#6227) wangxiyuan 2026-01-25 15:25:38 +08:00
7799c4ca3b [Fusion] change fusion env variable (#6201) Icey 2026-01-24 22:49:33 +08:00
6ccccad102 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #5) (#5996) SILONG ZENG 2026-01-24 22:45:38 +08:00
7faa6878a6 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #3) (#5978) SILONG ZENG 2026-01-24 22:10:18 +08:00
4e53c1d900 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #6) (#6001) SILONG ZENG 2026-01-24 22:08:33 +08:00
153da1a669 [Lint]Style: Convert vllm-ascend/ to ruff format(Batch #4) (#6200) SILONG ZENG 2026-01-24 20:40:48 +08:00
fbae41697e [310P]: refactoring for 310p kvcache and some ops class (#6117) Shaoxu Cheng 2026-01-24 20:34:29 +08:00
5b746f3e83 [Inductor]change pass to adapt to new addrmsnormBias operator (#6094) Angazenn 2026-01-24 20:16:44 +08:00
8966a99710 [Refactor] Unify full-graph parameter update logic (#6041) LICO67373 2026-01-24 20:12:57 +08:00
8129c429ef [Doc] Improved English grammar and integrated the DeepWiki badge for Ask AI (#6216) Zeng haolong 2026-01-24 20:11:18 +08:00
4fcacca8a6 [BugFix] Fix build wheel (#6218) Icey 2026-01-24 20:08:20 +08:00
fc26260d84 [BugFix] buildwheel dependency install (#6212) Icey 2026-01-24 17:11:55 +08:00
21833a4321 [Doc] Add release note for 0.13.0rc2 (#6207) wangxiyuan 2026-01-24 12:51:47 +08:00
f66bcdfb29 [P/D] Mooncake connector add zmq socket fail log (#6155) liziyu 2026-01-24 12:06:42 +08:00
14bef9af6f [P/D] Remove restrictions on mooncake for IPv6 (#5946) liziyu 2026-01-24 11:30:22 +08:00
019a2fe6e6 [Eagle3]enhance skipping dp allreduce and add it into eagle proposer (#6192) Angazenn 2026-01-24 11:29:42 +08:00

Commit Graph
