xc-llm-ascend

EngineX/xc-llm-ascend

1756efa5fd [Feat][Graph] Support FULL_DECODE_ONLY mode for MLA models (#3125) panchao-hub 2025-10-10 16:31:20 +08:00
ba19dd3183 Revert PTA upgrade PR (#3352) wangxiyuan 2025-10-10 14:09:53 +08:00
601a37aeff [Fixbug] Fix accuracy template (#3088) zhangxinyuehfad 2025-10-10 09:03:21 +08:00
6ae75933da [Feat] Load balance of tokens across experts in dummy_run (#3184) MengLong Chen 2025-10-10 09:00:07 +08:00
60b7c936c5 [Doc] Update deepseek-v3.2 doc (#3319) Li Wang 2025-10-10 08:55:39 +08:00
579b7e5f21 add pagedattention to support FULL_DECODE_ONLY. (#3102) XiaoxinWang 2025-10-10 08:50:33 +08:00
1c2c72af8d [bugfix]change log2phy map to npu (#3339) offline893 2025-10-10 08:47:55 +08:00
55e23fabec [bugfix] fix connector register failed (#3335) fems14 2025-10-09 21:09:54 +08:00
ff37575936 [1/N][Feat] Add weight prefetch feature for Attention layers (#3146) Ruri 2025-10-09 20:38:39 +08:00
23db56a340 [Feat]Qwen3 Moe supports npu_add_rms_norm_quant op by default, update op with norm bias (#3205) huangdong2022 2025-10-09 20:18:10 +08:00
81aff9c555 bugfix for mtp (#3300) zouyida2052 2025-10-09 19:22:46 +08:00
30c5d947c3 [bugfix]fix multistream moe in torchair (#3164) Wang Yixuan 2025-10-09 19:00:32 +08:00
94dd832815 [MoE] [Refactor] Combine common_fused_moe and fused_moe (#3176) weichen 2025-10-09 14:12:46 +08:00
a36e3da78e [Misc] Drop 0102 related lines (#3323) Li Wang 2025-10-09 14:10:57 +08:00
1c5b302f0d [Misc] Clean up useless patch (#3320) wangxiyuan 2025-10-09 14:07:26 +08:00
a43e2f61e1 [CI] Update vLLM to v0.11.0 (#3315) wangxiyuan 2025-10-09 10:41:19 +08:00
f12f76d7ba Drop 0.10.2 (#3284) wangxiyuan 2025-10-09 10:28:38 +08:00
2dde1268c7 Fix doc for A2 series and cleanup note (#3307) Yikun Jiang 2025-10-01 14:39:48 +08:00
474fa737c8 [bugfix] Fix moe bug: allgather error. (#3279) weijinqian0 2025-09-30 18:45:09 +08:00
b8c58d68e1 [Doc] Add deepseek v3.2 tutorial (#3275) wangxiyuan 2025-09-30 17:54:31 +08:00
4abdcdba4e upgrade pta to 0919 (#3295) wangxiyuan 2025-09-30 17:14:23 +08:00
3a27b15ddc [bugfix] Fix Qwen3-30B-A3B dp parallel hung issue when running with the dp parallel example (#3287) leo-pony 2025-09-30 15:30:01 +08:00
a486ff8c11 KVCache Transfer via Layer-wise Strategy in Disaggregation (#2602) Chao Lei 2025-09-30 15:10:29 +08:00
f8c93d8d24 [Aclgraph][DP] Fix dp dummy run not in aclgraph error (#3208) Mengqing Cao 2025-09-30 11:14:51 +08:00
ddf4d53ca3 [bugfix] Fix bugs in _dummy_run and re-initialize kv-cache. (#3262) Angazenn 2025-09-30 10:54:14 +08:00
00ba071022 [Doc] Release note for v0.11.0rc0 (#3224) wangxiyuan 2025-09-30 03:26:18 +08:00
81bd6e4c99 Add DeepSeek V3.2 support (#3270) wangxiyuan 2025-09-30 03:25:58 +08:00
5503a3142f Bump version to v0.11.0rc3 (#3213) Yikun Jiang 2025-09-29 21:48:06 +08:00
83092d9b8b [BugFix] Fix Qwen3-Next because of vllm #24982 (#3221) Icey 2025-09-29 15:27:30 +08:00
c73dd8fecb [CI] Fix CI by addressing max_split_size_mb config (#3258) wangxiyuan 2025-09-29 14:05:12 +08:00
69cc99d004 Add restriction conditions to the ApplyTopPTopK operator (#3254) LeeWenquan 2025-09-29 14:04:58 +08:00
065486820b [Doc] add faqs:install vllm-ascend will overwrite existing torch-npu (#3245) weiguihua2 2025-09-29 12:02:23 +08:00
373f84a193 [Bugfix] Fix the error "cur batch_size is invalid" during profile_run in the torchair scenario (#3243) 无脸男 2025-09-29 11:51:07 +08:00
8870966031 [bugfix] Fix warning bug: model config is None. (#3238) weijinqian0 2025-09-29 09:44:49 +08:00
15b8aff582 [CI] Add max_split_size_mb for e2e test to avoid oom (#3252) wangxiyuan 2025-09-29 09:13:08 +08:00
050d202bb9 [Quickfix] Fix dp+ep+tp error when sp chunked the hidden_states (#3246) Mengqing Cao 2025-09-29 09:12:49 +08:00
cf445c41f9 [Doc]Add qwen3_vl series guide (#3227) Peipei 2025-09-28 21:35:52 +08:00
14d4ed5f0c [BugFix] Fix aclgraph accu problem in A2. (#3163) whx 2025-09-28 21:31:55 +08:00
c3fee66806 [Model] Optimizing gemma3 model's GemmaRMSNorm function (#3151) socrahow 2025-09-28 21:19:10 +08:00
dd56e9306b [3/N][Refactor][Qwen3-Next] Refactor model structure and fix bug by vllm #25400 (#3142) Icey 2025-09-28 21:14:36 +08:00
4ff422c730 [CI][Bugfix] Quickfix for DPMetaData (#3234) Mengqing Cao 2025-09-28 21:11:22 +08:00
f2d8493221 [BugFix] Fix ascend scheduler assert error (#3191) fan2956 2025-09-28 18:22:08 +08:00
68c5401ad6 [Eagle] Fix attn_mask index out of range in high concurrency situations (#3187) Icey 2025-09-28 18:09:26 +08:00
1705501ae2 [BugFix] Fix ACLgraph bug in Qwen3_32b_int8 case (#3204) lilinsiman 2025-09-28 17:44:04 +08:00
a86ece5e39 [Bugfix][LoRA] Fix forward error and shape mismatch when using LoRA (#3153) Zetong Li 2025-09-28 17:30:50 +08:00
3d21ed9ee8 [Bugfix]Fix quant_config input parameter bug in qwenvl series (#3220) Peipei 2025-09-28 14:08:24 +08:00
96089b5155 Add vLLM 0.11.0 release hourly job (#3215) Yikun Jiang 2025-09-27 23:15:41 +08:00
859e861d92 [main][quantization] Support deepseek w4a8 per-channel quantization (#3011) Wang Kunpeng 2025-09-27 21:01:16 +08:00
e9359bd8fa [CI] Pin vLLM to releases/v0.11.0 (#3211) wangxiyuan 2025-09-27 10:41:48 +08:00
9caf6fbaf5 [Bugfix][LoRA] Fix LoRA bug after supporting Qwen3-Next (#3044) yupeng 2025-09-26 11:12:45 +08:00
8406aafaff Add e2e test related to weight updates in RL scenarios. (#2954) XiaoxinWang 2025-09-26 11:07:10 +08:00
d8a9cb8458 [Bugfix] fix bug when tp=1 (#3193) realliujiaxu 2025-09-26 10:55:32 +08:00
b72e3327a6 bugfix for mtp>1 (#3174) zouyida2052 2025-09-26 09:04:16 +08:00
69509bcdd6 [bugfix] fix oom in aclgraph (#3158) 无脸男 2025-09-26 08:57:47 +08:00
621aa7d270 fix error async_scheduler can't be enabled (#3127) Ronald 2025-09-26 08:51:54 +08:00
14497b748d Remove qwen3 moe MC2 cumsum & cast (#3126) florenceCH 2025-09-26 08:51:30 +08:00
2930e4a6bd [CI] Upgrade vllm to newest commit (#3182) wangxiyuan 2025-09-26 06:18:15 +08:00
0794f64a18 Revert "[Disagg][Perf] Use NPU event sync instead of blocking tolist (#3194) wangxiyuan 2025-09-26 06:17:36 +08:00
31dda3f557 [Model]Add support for qwen3_vl and qwen3_vl_moe (#3103) Peipei 2025-09-25 18:50:12 +08:00
f7a3815bff [CI] Do not drop ready label when PR is merge conflict (#3173) wangxiyuan 2025-09-25 18:45:19 +08:00
5d13bbe796 [BugFix]Modify eplb feature guide. (#3183) offline893 2025-09-25 17:01:51 +08:00
07f4710216 [BugFix] Fix dummy_run memory explosion in eager mode (#3132) MengLong Chen 2025-09-25 16:09:44 +08:00
72f64c10b7 [bugFix] Correct the vllm interface e2e test Base container image name (#3179) leo-pony 2025-09-25 16:03:09 +08:00
2a9d02e080 [Bugfix] eagle and eagle3 spec decode failures and enable e2e test (#2979) Icey 2025-09-25 14:39:12 +08:00
ac1c2cd9ac [CI] Upgrade vllm version - 0925 (#3167) wangxiyuan 2025-09-25 14:20:10 +08:00
33c118c80e [core]vllm-ascend support msMonitor tool (#3123) mfyCn-1204 2025-09-25 14:15:02 +08:00
c814b32b90 [Quant][GLM] Adapt glm quant. (#3147) whx 2025-09-25 11:13:29 +08:00
a055183821 [CI] Upgrade vLLM version (#3139) wangxiyuan 2025-09-25 07:36:51 +08:00
464270e4ca Remove useless PD check in deepseek (#3161) liziyu 2025-09-24 23:25:47 +08:00
4ee58e213b [BugFix] explicitly setting the tensor shape of otp output (#3027) zzhxxx 2025-09-24 18:44:15 +08:00
360a736dfa Add OOT platform E2E test case to be run in the vllm buildkite pipeline (#3154) leo-pony 2025-09-24 17:55:58 +08:00
cd1ffbb6cd [1/N][Feat] Cut down memory usage for o_proj in DeepSeek (#2931) clrs97 2025-09-24 17:16:41 +08:00
302494c1fe [EPLB] ut for EPLB (#3035) Clorist33 2025-09-24 17:14:38 +08:00
80524f5711 [CORE] concurrent partial prefills (#2372) Csrayz 2025-09-24 17:12:55 +08:00
2d885869c5 [KVCache][Bugfix] Fix kv cache initialization error of attention layer (#3113) Mengqing Cao 2025-09-24 11:32:34 +08:00
6aa4253798 [Refactor] [SP]The sequence parallelism characteristics in the MoE and Dense models are integrated into a single solution. (#3085) weijinqian0 2025-09-24 11:29:59 +08:00
e7618d9414 [2/N][Refactor][Qwen3-Next] remove redundant methods and patch methods in Qwen3NextGatedDeltaNet (#3082) Icey 2025-09-24 11:25:42 +08:00
eb205d9f35 [P/D][BugFix]Mooncake timeout release bug fix (#2899) baxingpiaochong 2025-09-24 11:22:46 +08:00
6995a7bc5b [Disagg][Perf] Use NPU event sync instead of blocking tolist to avoid unintentional copy ops blocking across different NPU streams, improving disagg TTIT/TTFT (#2788) Song Zhixin 2025-09-24 11:21:58 +08:00
c4b976af1a [Model][VLM][Patch]Modify ascend affinity _merge_multimodal_embeddings (#3071) Peipei 2025-09-24 10:25:28 +08:00
b1380f3b87 [Doc] modify the version compatibility between vllm and vllm-ascend (#3130) weiguihua2 2025-09-23 20:31:49 +08:00
d01fd1d1c3 [misc][torchair] fix bugs around deepseek mtp, enable_shared_expert_dp and use_cached_kv_cache_bytes (#3074) linfeng-yuan 2025-09-23 14:52:42 +08:00
0f3939e5a9 [Feature]cpu offload connector (#1659) lidenghui1110 2025-09-23 14:25:05 +08:00
96eb1ed408 [CI] Bump vLLM commit hash to 0923(f225ea7) (#3110) Li Wang 2025-09-23 14:13:25 +08:00
d586255678 fix wrong --num-gpus parameter requirements, and avoid ambiguity (#3116) Jianwei Mao 2025-09-23 11:58:44 +08:00
39a85c49fa [Refactor] Rename cudagraph_support to aclgraph_support (#3104) Yizhou 2025-09-23 11:30:31 +08:00
d2399ab97b Fix VLLM_ASCEND_LLMDD_RPC_PORT renaming (#3108) wyu0-0 2025-09-23 10:33:04 +08:00
29c173ab48 FlashLB algorithm (#3042) Mercykid-bash 2025-09-23 10:27:14 +08:00
8dd53c8860 [Bugfix][PD] Auto-clear producer KV cache if no pull notification (#2174) hucong 2025-09-23 09:53:34 +08:00
704467cd9a [Bugfix][LoRA] Fix bug introduced by upstream vllm#25249 (#3095) yupeng 2025-09-22 22:26:01 +08:00
3fa7cf6345 [Refactor][Graph] Move graph parameter logic to acl_graph module (#3101) Yizhou 2025-09-22 22:23:14 +08:00
02f89d166f [CI] Update vllm version to 20250922(5aeb925) (#3091) Li Wang 2025-09-22 22:18:13 +08:00
1c9f0fe26f Fix of DeepSeek Error in KV Pool Mixed Deployment Scenario (#3087) fems14 2025-09-22 20:36:41 +08:00
37a0715eda [Refactor] Adjustments to moe_comm_method selection process (#3001) weichen 2025-09-22 19:12:58 +08:00
bb1f0d5a62 [main] remove the redundant log prints in register_custom_ops.py (#3094) rjg-lyh 2025-09-22 17:17:31 +08:00
338231acaf [Feat][Graph] Support FULL_DECODE_ONLY mode for GQA/MHA models (#2128) Yizhou 2025-09-22 17:14:28 +08:00
f39bd309b6 [Hybrid KV] Follow up UniformTypeKVCacheSpecs (#3070) Mengqing Cao 2025-09-22 15:02:41 +08:00
f1f2c8f5e5 [Perf] Add new npu_fused_infer_attention_score op to improve performance in splitfuse cases and resolve long-seq mask problems (#2962) tianyitang 2025-09-22 14:56:14 +08:00
c90a6d3658 [Test] Update the format of the accuracy report (#3081) zhangxinyuehfad 2025-09-22 14:10:03 +08:00
37a0b3f25e Bump actions/labeler from 5 to 6 (#3086) dependabot[bot] 2025-09-22 14:07:37 +08:00
