xc-llm-ascend

EngineX/xc-llm-ascend

8e2c59e1ee Main2main upgrade vllm commit to 03 19 17:00 (#7478) Nengjun Ma 2026-03-23 16:25:57 +08:00
caa71e50ca [Perf] Simplify FIA prefill context merge path (#7293) LICO67373 2026-03-23 15:47:42 +08:00
da866cc168 [CI] Bump docker/build-push-action from 6 to 7 (#7541) dependabot[bot] 2026-03-23 15:46:12 +08:00
71df17f4e6 bugfix(MC2): refactor the comm group of MC2 to be compatible with PP (#7291) Qiu 2026-03-23 15:44:21 +08:00
8527b49764 [CI] Bump docker/setup-buildx-action from 3 to 4 (#7542) dependabot[bot] 2026-03-23 15:44:14 +08:00
5b60b530d6 [Bugfix][310p] the new A5 mmencoder op donot support 310p (#7518) Shaoxu Cheng 2026-03-23 15:40:34 +08:00
9e2878065a [Spec-Decode] Fix spec decode proposer in 0.18.0 (#7544) Mengqing Cao 2026-03-23 15:39:24 +08:00
6b7d9b76f1 [MM][Perf] Pre-compute seq_lens and put it on CPU before ViT vision blocks for better performance (#7104) Shanshan Shen 2026-03-23 15:24:26 +08:00
5c0d02f689 [Bugfix] Fix multi-instance serving OOM on single card (#7427) Shanshan Shen 2026-03-23 14:22:59 +08:00
44ef9a36ac [fix]: fix precision issue in dispatch_ffn_combine_bf16 and remove redundant sync (#7198) guanguan0308 2026-03-23 10:14:03 +08:00
e68464a1d6 [Bugfix] Fix slow hasattr in ACLGraphWrapper.__getattr__ (#7442) Canlin Guo 2026-03-23 09:26:24 +08:00
75fae619d5 [Misc] Refactor aclgraph accuracy test to use logprob-based comparison (#7455) Li Wang 2026-03-23 09:08:21 +08:00
9bf9b4b267 [Feature] Optimize Qwen3.5/Qwen3Next GDN prefill by prebuilding chunk metadata (#7487) Qi Mao 2026-03-22 23:09:23 +08:00
b2e71b7930 [Bugfix] Fix get_rope_shape for Kimi-K2.5 (#7521) LoganJane 2026-03-22 21:06:31 +08:00
9e2965bae2 [Feature] Support Flash Comm V1 for VL models (with MLA) (#7390) Cao Yi 2026-03-22 21:05:28 +08:00
9d0b7c8e98 [Platform][BugFix] Preserve hybrid block size on Ascend (#7528) Qi Mao 2026-03-22 11:21:49 +08:00
cbf46fad3c fixed graph mode bug. (#7460) XiaoxinWang 2026-03-22 10:09:37 +08:00
84a74f0cb1 [Bugfix] Fix padding logic in eagle proposer for kimi25 (#7348) Zetong Li 2026-03-21 16:57:22 +08:00
f482c314cf Upgrade vllm v0.18.0 in dockerfile (#7523) zhangxinyuehfad 2026-03-21 16:19:41 +08:00
bff4fbfca5 upgrade to 0.18.0 (#7502) meihanc 2026-03-21 16:05:38 +08:00
80a4265717 [Feat] Support separate attention backend for target and draft model. (#7342) HongtaoYang 2026-03-21 10:48:01 +08:00
88d03a783f [refactor] replace scattered business kwargs with typed request objects and explicit stage boundaries (#7024) linfeng-yuan 2026-03-20 23:23:57 +08:00
c860535246 【A5】【Qwen VL】Qwen VL adapt for A5 (#7046) yesyue-w 2026-03-20 16:56:12 +08:00
f39f566e22 Refactor duplicated code into a common method to reduce redundancy (#7210) idouba 2026-03-20 16:49:02 +08:00
6ad74e8c80 [CI] Add git safe repo (#7501) Li Wang 2026-03-20 16:40:24 +08:00
a16c99141b Adapt w8a8mxfp8 quantization for Qwen VL models (#7417) Siyuan Kong 2026-03-20 16:18:58 +08:00
4e6dbe0956 [EPLB][Bugfix] Set parallel_config.enable_eplb to true to load redundant experts (#7470) LI SHENGYONG 2026-03-20 15:22:55 +08:00
1e05c4908f [EPLB] Reduce the memory used for batch_isend_irecv (#7344) LI SHENGYONG 2026-03-20 12:25:58 +08:00
a1f321a556 [Doc]Refresh model tutorial examples and serving commands (#7426) SILONG ZENG 2026-03-20 11:34:18 +08:00
7be66cec75 [Test] Add the always_check_nodes parameter to the _wait_for_multiple_servers function in conftest.py for the EPD test case. (#7410) wangyu 2026-03-20 11:33:48 +08:00
eb92e7d50e [Bugfix] Restore balance scheduling patch for v0.17.0 (#7479) SILONG ZENG 2026-03-19 20:12:57 +08:00
95e1dc11d8 [CI]: Auto-update estimated test times in config.yaml (#7413) vllm-ascend-ci 2026-03-19 19:01:16 +08:00
9d1452c74d [OPS]add split_qkv_tp_rmsnorm_rope ops (#7376) ichaoren 2026-03-19 17:19:18 +08:00
ee804ce23e Main2main upgrade vllm to 0318 commit (#7412) Nengjun Ma 2026-03-19 17:17:36 +08:00
05afc7f8c3 [CI]repair for ci custom ops (#7461) ZT-AIA 2026-03-19 17:13:12 +08:00
83a4065b4b [CI] Add pre-commit check for patch logger (#7446) Li Wang 2026-03-19 16:53:20 +08:00
38e637eef5 Fix manual mapping registration and kimi_k2 layer name mapping (#7347) Feng-xiaosuo 2026-03-19 16:46:41 +08:00
87d6424b2e [CI] Add nightly CI test cases for the GLM-4.7 model. (#7391) aipaes 2026-03-19 16:43:29 +08:00
0261d1b1c6 [CI] add glm4.7 weights download (#7395) aipaes 2026-03-19 16:43:15 +08:00
5e65062973 [doc] Fix issues in the GLM4.7 documentation (#7457) aipaes 2026-03-19 16:42:59 +08:00
6fc190b44a [Doc][KV Pool]Revision KV Pool User Guide [2/2] (#7456) pz1116 2026-03-19 16:17:34 +08:00
42bcad7e9b GMM custom operator optimization in small batch scenarios (#7100) chenxi-hh 2026-03-19 16:10:30 +08:00
8e0ebb470a [Misc] Drop Prefetch MLP Env (#7357) wangxiyuan 2026-03-19 14:27:27 +08:00
ce239db4fb [CI] Add multi-hardware wheel build and release workflow (#7312) zhangxinyuehfad 2026-03-19 11:06:17 +08:00
270c5cb8cd [CI] Add nightly CI test cases for the Kimi-K2.5 (#7416) LoganJane 2026-03-19 11:02:29 +08:00
3effc4bc70 [Doc][KV Pool]Revision KV Pool User Guide (#7434) pz1116 2026-03-19 10:13:13 +08:00
ab9cd2e305 [CI]Add CI summary log (#7202) meihanc 2026-03-19 09:32:06 +08:00
e8f7b2e3f1 [Refactor] [310p] Support Mamba Cache and support attn_head_size larger than 128 (#7372) pu-zhe 2026-03-19 09:16:22 +08:00
8b79d4de52 Main2main upgrade to vllm 0317 afternoon (#7409) Nengjun Ma 2026-03-18 23:24:27 +08:00
305820f1a9 [Bugfix] fix bug about model type of qwen3_vl_8b_instruct_w8a8 (#7383) jiangmengyu18 2026-03-18 20:30:03 +08:00
fb8e22ec00 [DOC] MiniMax-M2.5 model intro (#7296) SparrowMu 2026-03-18 20:14:36 +08:00
2916601e6c [CI] add Kimi-K2.5 weights download (#7406) LoganJane 2026-03-18 18:29:37 +08:00
adc57c5951 [release] Add GLM5 known issue for 2-node PD mixed deployment (#7436) SILONG ZENG 2026-03-18 18:03:18 +08:00
565868a2a6 [doc] add doc for Kimi-K2.5.md (#7371) LoganJane 2026-03-18 17:16:35 +08:00
ec34bf0062 [Misc]fix logger which does not take effects in patches (#7402) Angazenn 2026-03-18 17:13:12 +08:00
1ff9e3f25f [CI] Bump docker/login-action from 3 to 4 (#7299) dependabot[bot] 2026-03-18 17:06:48 +08:00
b3206cd6f6 [CI] Bump actions/setup-python from 5 to 6 (#7298) dependabot[bot] 2026-03-18 17:06:28 +08:00
58725b8b24 [doc] add Prefill-Decode Disaggregation doc for GLM5.md (#7300) liuhy1213-cell 2026-03-18 17:00:31 +08:00
6bc68c55d0 [doc] Refresh the documentation for DeepSeek-V3.2 (#7403) Nagisa125 2026-03-18 14:59:48 +08:00
c1392a6ce6 [bugfix][accuracy] Fix ds indexer accuracy problem caused by k rope (#7341) rjg-lyh 2026-03-18 14:20:21 +08:00
c7157af8f7 [P/D] LayerwiseConnector supports the virtual push functionality on node D. (#7361) wangxiaoteng888 2026-03-18 10:50:02 +08:00
5894a27bfd [CI] Add PAT_TOKEN when checkout (#7400) Li Wang 2026-03-18 10:31:32 +08:00
1c954ff264 [main2main] upgrade vllm to 0308 (#7213) zhangyiming 2026-03-18 09:24:43 +08:00
79ef41a53d [CI] add scheduled stale issue management (#7354) drizzlezyk 2026-03-17 23:28:29 +08:00
467c815db6 [CI] expand issue labeler rules for feature/model triage (#7356) drizzlezyk 2026-03-17 23:28:04 +08:00
d9ac7e8539 [Bugfix] Assertion error when decode prefix cache fully hits (#7236) Chao Lei 2026-03-17 23:17:45 +08:00
3b3dd2a889 [doc] Refresh the documentation for GLM-4.7 (#7292) aipaes 2026-03-17 23:09:12 +08:00
5645ca8392 [BugFix]A2 MOE method&& layerwise MTP bugfix && Mamba gdn_metadata bugfix (#7364) zxr2333 2026-03-17 23:03:45 +08:00
a457d0f0e8 [doc] Upload doc for qwen3.5-27B and qwen3.5-397B-A17B on Ascend (#7313) pppeng 2026-03-17 22:54:57 +08:00
a370dfa962 [bugfix]Enable dispatch_ffn_combine feature for qwen3.5 (#7066) asunxiao 2026-03-17 19:53:02 +08:00
83ad14c74c [bugfix] fix unzip file path for fia operator (#7367) aipaes 2026-03-17 17:21:27 +08:00
7669963c27 [Perf] Optimize bias handling in AscendRMSNorm (#7226) rjg-lyh 2026-03-17 16:53:28 +08:00
8f278fc101 [eagle3][pcp] fix bug for eagle3 and cp enable (#7309) lilinsiman 2026-03-17 16:14:45 +08:00
4e62a2ae15 [Bugfix] fix TransposeKvCacheByBlock op error report in plog (#7235) lidenghui1110 2026-03-17 10:08:32 +08:00
3f39ac9c8d [Feature]Supports DSv3.1 PD separation and C8 quantization (#7222) pichangping 2026-03-16 22:49:05 +08:00
a6f6e919e6 [main][bugfix] Fixed the problem that eagle3 will crash in FULL_DECODE_ONLY (#7290) drslark 2026-03-16 20:41:36 +08:00
b1a78886a9 [xlite][Bugfix] Support mrope and deepstack features in xlite backend (#7295) LVYANGGUO 2026-03-16 17:05:52 +08:00
22d0e1d3d7 [model_runner_v2]optimize the performance of the _topk_log_softmax_kernel (#7221) wangx700 2026-03-16 16:49:10 +08:00
4d443b9228 [bugfix] restore pr-7029 and fix patch error (#7294) rjg-lyh 2026-03-16 15:39:42 +08:00
9320365dab [Test][Feature] Add e2e test for QuaRot model with eagle3 (#7128) zhaomingyu13 2026-03-16 15:35:55 +08:00
71c21f76f5 [Refactor] Replace npu_ring_mla with FIA in MLA prefill (#5704) LICO67373 2026-03-16 10:33:09 +08:00
e20f0b1a0d [ReleaseNote] Add release note for v0.17.0rc1 (#7240) Mengqing Cao 2026-03-15 22:47:47 +08:00
7e85f2ff97 [CI] Add test_qwen3_5.py (#7133) pppeng 2026-03-15 22:19:02 +08:00
0c299f79b9 Revert "[Perf][1/N] w8a8c8 support in dsv3.2/glm5 (#7029)" (#7288) Mengqing Cao 2026-03-15 20:19:09 +08:00
29f195a91c [Bugfix][LoRA] Fix the bug when runs Qwen3-Reranker-0.6B with LoRA. (#7156) yupeng 2026-03-15 17:55:42 +08:00
7daccf4b64 Perf(PP): support PP with async send/recv. (#7143) Qiu 2026-03-15 09:45:09 +08:00
ce5544bfc1 [Hybrid] support prefix cache for Qwen3.5/Next with --mamba-cache-mode align (#7103) Angazenn 2026-03-15 09:44:09 +08:00
c69291eefc [Doc] Add USE_MODELSCOPE_HUB=0 to lm-eval guide (#7279) bazingazhou233-hub 2026-03-14 22:41:02 +08:00
9e6c547d98 [Doc] Replace deprecated full_cuda_graph with cudagraph_mode in Qwen2.5-Omni (#7286) bazingazhou233-hub 2026-03-14 22:38:36 +08:00
bb506a1c99 [Doc][Installation] Clarify SOC_VERSION for CPU-only source builds (#7278) NJX 2026-03-14 22:38:25 +08:00
199df03524 [BugFix]Fix CI errors “ascend_transport.so: cannot open shared object file: No such file or directory” (#7242) DreamerLeader 2026-03-14 21:23:05 +08:00
e7aa2c285c [SpecDecode] Fix Draft model proposer (#7230) Mengqing Cao 2026-03-14 18:26:37 +08:00
0ad52517a1 Revert "Refactor quantization layer name mapping to leverage vLLM built-in mappers" (#7237) Hexiang Wang 2026-03-14 00:05:54 +08:00
5ec610e832 [Feature][Quant] Reapply auto-detect quantization format and support remote model ID (#7111) Cao Yi 2026-03-13 22:53:25 +08:00
6852a2e267 [feat] add LMCacheAscendConnector (#6882) Junyuan 2026-03-13 17:41:35 +08:00
986cd45397 [Version] Drop 0.16.0 support (#7153) Mengqing Cao 2026-03-13 16:14:15 +08:00
7ed9e9de69 [Perf][1/N] w8a8c8 support in dsv3.2/glm5 (#7029) rjg-lyh 2026-03-13 14:47:42 +08:00
df1ee8070d [feat][spec decode]Unified draft parallel (#6766) kx 2026-03-13 14:07:35 +08:00
6ee7ffb98a Add Qwen3_5 to model list (#7130) pppeng 2026-03-13 11:42:28 +08:00
c377e73933 Perf(PP): support PP with async scheduling. (#7136) Qiu 2026-03-13 10:27:23 +08:00
