Commit Graph

  • 30c5d947c3 [bugfix]fix multistream moe in torchair (#3164) Wang Yixuan 2025-10-09 19:00:32 +08:00
  • 94dd832815 [MoE] [Refactor] Combine common_fused_moe and fused_moe (#3176) weichen 2025-10-09 14:12:46 +08:00
  • a36e3da78e [Misc] Drop 0102 related lines (#3323) Li Wang 2025-10-09 14:10:57 +08:00
  • 1c5b302f0d [Misc] Clean up useless patch (#3320) wangxiyuan 2025-10-09 14:07:26 +08:00
  • a43e2f61e1 [CI] Update vLLM to v0.11.0 (#3315) wangxiyuan 2025-10-09 10:41:19 +08:00
  • f12f76d7ba Drop 0.10.2 (#3284) wangxiyuan 2025-10-09 10:28:38 +08:00
  • 2dde1268c7 Fix doc for A2 series and cleanup note (#3307) Yikun Jiang 2025-10-01 14:39:48 +08:00
  • 474fa737c8 [bugfix] Fix moe bug: allgather error. (#3279) weijinqian0 2025-09-30 18:45:09 +08:00
  • b8c58d68e1 [Doc] Add deepseek v3.2 tutorial (#3275) wangxiyuan 2025-09-30 17:54:31 +08:00
  • 4abdcdba4e upgrade pta to 0919 (#3295) wangxiyuan 2025-09-30 17:14:23 +08:00
  • 3a27b15ddc [bugfix] Fix Qwen3-30B-A3B dp parallel hung issue when running with the dp parallel example (#3287) leo-pony 2025-09-30 15:30:01 +08:00
  • a486ff8c11 KVCache Transfer via Layer-wise Strategy in Disaggregation (#2602) Chao Lei 2025-09-30 15:10:29 +08:00
  • f8c93d8d24 [Aclgraph][DP] Fix dp dummy run not in aclgraph error (#3208) Mengqing Cao 2025-09-30 11:14:51 +08:00
  • ddf4d53ca3 [bugfix] Fix bugs in _dumm_run and re-initialize kv-cache. (#3262) Angazenn 2025-09-30 10:54:14 +08:00
  • 00ba071022 [Doc] Release note for v0.11.0rc0 (#3224) wangxiyuan 2025-09-30 03:26:18 +08:00
  • 81bd6e4c99 Add DeepSeek V3.2 support (#3270) wangxiyuan 2025-09-30 03:25:58 +08:00
  • 5503a3142f Bump version to v0.11.0rc3 (#3213) Yikun Jiang 2025-09-29 21:48:06 +08:00
  • 83092d9b8b [BugFix] Fix Qwen3-Next because of vllm #24982 (#3221) Icey 2025-09-29 15:27:30 +08:00
  • c73dd8fecb [CI] Fix CI by addressing max_split_size_mb config (#3258) wangxiyuan 2025-09-29 14:05:12 +08:00
  • 69cc99d004 Add restriction conditions to the ApplyTopPTopK operator (#3254) LeeWenquan 2025-09-29 14:04:58 +08:00
  • 065486820b [Doc] add faqs:install vllm-ascend will overwrite existing torch-npu (#3245) weiguihua2 2025-09-29 12:02:23 +08:00
  • 373f84a193 [Bugfix] Fix the error "cur batch_size is invalid" during profile_run in the torchair scenario (#3243) 无脸男 2025-09-29 11:51:07 +08:00
  • 8870966031 [bugfix] Fix warning bug: model config is None. (#3238) weijinqian0 2025-09-29 09:44:49 +08:00
  • 15b8aff582 [CI] Add max_split_size_mb for e2e test to avoid oom (#3252) wangxiyuan 2025-09-29 09:13:08 +08:00
  • 050d202bb9 [Quickfix] Fix dp+ep+tp error when sp chunked the hidden_states (#3246) Mengqing Cao 2025-09-29 09:12:49 +08:00
  • cf445c41f9 [Doc]Add qwen3_vl series guide (#3227) Peipei 2025-09-28 21:35:52 +08:00
  • 14d4ed5f0c [BugFix] Fix aclgraph accu problem in A2. (#3163) whx 2025-09-28 21:31:55 +08:00
  • c3fee66806 [Model] Optimizing gemma3 model's GemmaRMSNorm function (#3151) socrahow 2025-09-28 21:19:10 +08:00
  • dd56e9306b [3/N][Refactor][Qwen3-Next] Refactor model structure and fix bug by vllm #25400 (#3142) Icey 2025-09-28 21:14:36 +08:00
  • 4ff422c730 [CI][Bugfix] Quickfix for DPMetaData (#3234) Mengqing Cao 2025-09-28 21:11:22 +08:00
  • f2d8493221 [BugFix] Fix ascend scheduler assert error (#3191) fan2956 2025-09-28 18:22:08 +08:00
  • 68c5401ad6 [Eagle] Fix attn_mask index out of range in high concurrency situations (#3187) Icey 2025-09-28 18:09:26 +08:00
  • 1705501ae2 [BugFix] Fix ACLgraph bug in Qwen3_32b_int8 case (#3204) lilinsiman 2025-09-28 17:44:04 +08:00
  • a86ece5e39 [Bugfix][LoRA] Fix forward error and shape mismatch when using LoRA (#3153) Zetong Li 2025-09-28 17:30:50 +08:00
  • 3d21ed9ee8 [Bugfix]Fix quant_config input parameter bug in qwenvl series (#3220) Peipei 2025-09-28 14:08:24 +08:00
  • 96089b5155 Add vLLM 0.11.0 release hourly job (#3215) Yikun Jiang 2025-09-27 23:15:41 +08:00
  • 859e861d92 [main][quantization] Support deepseek w4a8 per-channel quantization (#3011) Wang Kunpeng 2025-09-27 21:01:16 +08:00
  • e9359bd8fa [CI] Pin vLLM to releases/v0.11.0 (#3211) wangxiyuan 2025-09-27 10:41:48 +08:00
  • 9caf6fbaf5 [Bugfix][LoRA] Fix LoRA bug after supporting Qwen3-Next (#3044) yupeng 2025-09-26 11:12:45 +08:00
  • 8406aafaff Add e2e test related to weight updates in RL scenarios. (#2954) XiaoxinWang 2025-09-26 11:07:10 +08:00
  • d8a9cb8458 [Bugfix] fix bug when tp=1 (#3193) realliujiaxu 2025-09-26 10:55:32 +08:00
  • b72e3327a6 bugfix for mtp>1 (#3174) zouyida2052 2025-09-26 09:04:16 +08:00
  • 69509bcdd6 [bugfix] fix oom in aclgraph (#3158) 无脸男 2025-09-26 08:57:47 +08:00
  • 621aa7d270 fix error async_scheduler can't be enabled (#3127) Ronald 2025-09-26 08:51:54 +08:00
  • 14497b748d Remove qwen3 moe MC2 cumsum & cast (#3126) florenceCH 2025-09-26 08:51:30 +08:00
  • 2930e4a6bd [CI] Upgrade vllm to newest commit (#3182) wangxiyuan 2025-09-26 06:18:15 +08:00
  • 0794f64a18 Revert "[Disagg][Perf] Use NPU event sync instead of blocking tolist (#3194) wangxiyuan 2025-09-26 06:17:36 +08:00
  • 31dda3f557 [Model]Add support for qwen3_vl and qwen3_vl_moe (#3103) Peipei 2025-09-25 18:50:12 +08:00
  • f7a3815bff [CI] Do not drop ready label when PR is merge conflict (#3173) wangxiyuan 2025-09-25 18:45:19 +08:00
  • 5d13bbe796 [BugFix]Modify eplb feature guide. (#3183) offline893 2025-09-25 17:01:51 +08:00
  • 07f4710216 [BugFix] Fix dummy_run memory explosion in eager mode (#3132) MengLong Chen 2025-09-25 16:09:44 +08:00
  • 72f64c10b7 [bugFix] Correct the vllm interface e2e test Base container image name (#3179) leo-pony 2025-09-25 16:03:09 +08:00
  • 2a9d02e080 [Bugfix] eagle and eagle3 spec decode failures and enable e2e test (#2979) Icey 2025-09-25 14:39:12 +08:00
  • ac1c2cd9ac [CI] Upgrade vllm version - 0925 (#3167) wangxiyuan 2025-09-25 14:20:10 +08:00
  • 33c118c80e [core]vllm-ascend support msMonitor tool (#3123) mfyCn-1204 2025-09-25 14:15:02 +08:00
  • c814b32b90 [Quant][GLM] Adapt glm quant. (#3147) whx 2025-09-25 11:13:29 +08:00
  • a055183821 [CI] Upgrade vLLM version (#3139) wangxiyuan 2025-09-25 07:36:51 +08:00
  • 464270e4ca Remove useless PD check in deepseek (#3161) liziyu 2025-09-24 23:25:47 +08:00
  • 4ee58e213b [BugFix] explicitly setting the tensor shape of otp output (#3027) zzhxxx 2025-09-24 18:44:15 +08:00
  • 360a736dfa Add OOT platform E2E test case to be run in the vllm buildkite pipeline (#3154) leo-pony 2025-09-24 17:55:58 +08:00
  • cd1ffbb6cd [1/N][Feat] Cut down memory usage for o_proj in DeepSeek (#2931) clrs97 2025-09-24 17:16:41 +08:00
  • 302494c1fe [EPLB] ut for EPLB (#3035) Clorist33 2025-09-24 17:14:38 +08:00
  • 80524f5711 [CORE] concurrent partial prefills (#2372) Csrayz 2025-09-24 17:12:55 +08:00
  • 2d885869c5 [KVCache][Bugfix] Fix kv cache initialization error of attention layer (#3113) Mengqing Cao 2025-09-24 11:32:34 +08:00
  • 6aa4253798 [Refactor] [SP]The sequence parallelism characteristics in the MoE and Dense models are integrated into a single solution. (#3085) weijinqian0 2025-09-24 11:29:59 +08:00
  • e7618d9414 [2/N][Refactor][Qwen3-Next] remove redundant methods and patch methods in Qwen3NextGatedDeltaNet (#3082) Icey 2025-09-24 11:25:42 +08:00
  • eb205d9f35 [P/D][BugFix]Mooncake timeout release bug fix (#2899) baxingpiaochong 2025-09-24 11:22:46 +08:00
  • 6995a7bc5b [Disagg][Perf] Use NPU event sync instead of blocking tolist to avoid unintentional copy ops blocking across different NPU streams, improving disagg TTIT/TTFT (#2788) Song Zhixin 2025-09-24 11:21:58 +08:00
  • c4b976af1a [Model][VLM][Patch]Modify ascend affinity _merge_multimodal_embeddings (#3071) Peipei 2025-09-24 10:25:28 +08:00
  • b1380f3b87 [Doc] modify the version compatibility between vllm and vllm-ascend (#3130) weiguihua2 2025-09-23 20:31:49 +08:00
  • d01fd1d1c3 [misc][torchair] fix bugs around deepseek mtp, enable_shared_expert_dp and use_cached_kv_cache_bytes (#3074) linfeng-yuan 2025-09-23 14:52:42 +08:00
  • 0f3939e5a9 [Feature]cpu offload connector (#1659) lidenghui1110 2025-09-23 14:25:05 +08:00
  • 96eb1ed408 [CI] Bump vLLM commit hash to 0923(f225ea7) (#3110) Li Wang 2025-09-23 14:13:25 +08:00
  • d586255678 fix wrong --num-gpus parameter requirements, and avoid ambiguity (#3116) Jianwei Mao 2025-09-23 11:58:44 +08:00
  • 39a85c49fa [Refactor] Rename cudagraph_support to aclgraph_support (#3104) Yizhou 2025-09-23 11:30:31 +08:00
  • d2399ab97b Fix VLLM_ASCEND_LLMDD_RPC_PORT renaming (#3108) wyu0-0 2025-09-23 10:33:04 +08:00
  • 29c173ab48 FlashLB algorithm (#3042) Mercykid-bash 2025-09-23 10:27:14 +08:00
  • 8dd53c8860 [Bugfix][PD] Auto-clear producer KV cache if no pull notification (#2174) hucong 2025-09-23 09:53:34 +08:00
  • 704467cd9a [Bugfix][LoRA] Fix bug introduced by upstream vllm#25249 (#3095) yupeng 2025-09-22 22:26:01 +08:00
  • 3fa7cf6345 [Refactor][Graph] Move graph parameter logic to acl_graph module (#3101) Yizhou 2025-09-22 22:23:14 +08:00
  • 02f89d166f [CI] Update vllm version to 20250922(5aeb925) (#3091) Li Wang 2025-09-22 22:18:13 +08:00
  • 1c9f0fe26f Fix of DeepSeek Error in KV Pool Mixed Deployment Scenario (#3087) fems14 2025-09-22 20:36:41 +08:00
  • 37a0715eda [Refactor] Adjustments to moe_comm_method selection process (#3001) weichen 2025-09-22 19:12:58 +08:00
  • bb1f0d5a62 [main] remove the redundant log prints in register_custom_ops.py (#3094) rjg-lyh 2025-09-22 17:17:31 +08:00
  • 338231acaf [Feat][Graph] Support FULL_DECODE_ONLY mode for GQA/MHA models (#2128) Yizhou 2025-09-22 17:14:28 +08:00
  • f39bd309b6 [Hybrid KV] Follow up UniformTypeKVCacheSpecs (#3070) Mengqing Cao 2025-09-22 15:02:41 +08:00
  • f1f2c8f5e5 [Perf] Add new npu_fused_infer_attention_score op to improve performance in splitfuse cases and resolve long-seq mask problems (#2962) tianyitang 2025-09-22 14:56:14 +08:00
  • c90a6d3658 [Test] Update the format of the accuracy report (#3081) zhangxinyuehfad 2025-09-22 14:10:03 +08:00
  • 37a0b3f25e Bump actions/labeler from 5 to 6 (#3086) dependabot[bot] 2025-09-22 14:07:37 +08:00
  • ffdd1a36e2 [bugfix][torchair] fix wasted NPU memory buffer allocation for quantized deepseek with unquantized MTP layer (#3068) linfeng-yuan 2025-09-22 14:06:43 +08:00
  • 14b39d3c70 [1/N][Refactor][Qwen3-Next] remove redundant Qwen3NextSparseMoeBlock and Qwen3NextAttention (#3019) Icey 2025-09-22 11:24:08 +08:00
  • 88d24cce8b [CI] Enable main based lint check and light ci matrix (#3079) wangxiyuan 2025-09-22 10:37:53 +08:00
  • 693f547ccf Refactor ci to reuse base workflow and re-enable ut coverage (#3064) Yikun Jiang 2025-09-21 13:27:08 +08:00
  • b8b68b3dfe [CI] Upgrade vLLM to 20250920 (c60e613) and address config break (#3067) Yikun Jiang 2025-09-21 09:49:17 +08:00
  • 12bcbd02bb [CI] Upgrade vLLM to 20250919 (6d8246aa) and fix some broken issue (#2907) Li Wang 2025-09-20 17:37:57 +08:00
  • 53ecd89e8f [Bugfix] Remove VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE (#2969) Lucas Kabela 2025-09-19 17:22:30 -07:00
  • e26fe1caf1 [TEST] Speed up DS V2 accuracy test and turn up accuracy baseline (#3047) zhangxinyuehfad 2025-09-20 00:40:33 +08:00
  • a22b532d38 [Fixbug] Fix shape not match when sliding_window and dynamic batch_size (#2830) zhangxinyuehfad 2025-09-19 22:35:14 +08:00
  • cf549b976d [Test]Add unit test for compilation/acl_graph.py (#3039) zhanghw0354 2025-09-19 21:31:17 +08:00
  • 0942d9aaab [3/N][Refactor][Quantization]remove packed_modules_mapping from models (#3021) 22dimensions 2025-09-19 20:50:14 +08:00