Commit Graph

  • 099255e933 [bugfix] fix pipeline parallel for mla & sfa attention backend (#3459) linfeng-yuan 2025-10-15 17:13:27 +08:00
  • 5a3082cd15 [EPLB]Record expert map without dynamic eplb. (#3409) offline893 2025-10-15 14:21:15 +08:00
  • 4f937f561d [MoE] [Refactor] Remove manual memory cleanup (#3365) weichen 2025-10-15 12:36:24 +08:00
  • 4e720936d8 Fix warning msg print (#3421) LeeWenquan 2025-10-15 11:30:30 +08:00
  • 16cb3cc45d adapt the mla_v1 with the mla_preprocess kernel (#3397) Chen Chen 2025-10-15 10:34:25 +08:00
  • 15b2e5c995 Remove unused row_idx in token_dispatcher (#3442) CaranLic 2025-10-15 09:08:31 +08:00
  • 3642b64afc bugfix for mtp with multistream_moe (#3419) zouyida2052 2025-10-15 08:59:58 +08:00
  • c2c1db78a7 [Bugfix] fix ZeroDivisionError when prefill_tp_size > num_kv_head and fix tp_resharding README (#3437) zxr2333 2025-10-15 08:45:44 +08:00
  • 02c26dcfc7 [Feat] Supports Aclgraph for bge-m3 (#3171) xuyexiong 2025-10-14 23:07:45 +08:00
  • 434059e417 [BugFix] Fix multimodal model support fullgraph error (#3425) fan2956 2025-10-14 21:51:09 +08:00
  • 223cc34085 [KVCache] Refactor KVCache as page_size_bytes is ineffective (#3438) Mengqing Cao 2025-10-14 21:28:41 +08:00
  • c55d99d13e [bugfix][torchair] fix missing weight nz cast for w13_weight in torchair_w8a8_dynamic.py (#3446) linfeng-yuan 2025-10-14 21:11:05 +08:00
  • 5fe883fa43 fix the title of modelrunner's prepare inputs docs (#3457) TaoYu Chen 2025-10-14 20:35:58 +08:00
  • 78777237a9 [2/N][Feat] Attention and MoE weight prefetch in Qwen3MoE models (#3203) yuzhup 2025-10-14 20:16:33 +08:00
  • 07e39620ea [Feat] Unquantized Linear to nz and control all nz-cast (#3356) anon189Ty 2025-10-14 17:39:26 +08:00
  • 5c45c227dc [BugFix] fix qwen2.5vl quant bug (#3426) elilzhu 2025-10-14 17:31:26 +08:00
  • ee25a517d1 [BugFix] Fix the port conflict bug of running external dp with disaggregated-prefill. (#3416) whx 2025-10-14 16:37:10 +08:00
  • 9eb62935b8 fix pagedattention to support fullgraph. (#3436) XiaoxinWang 2025-10-14 16:10:09 +08:00
  • 22a1d91cf5 [CI] Add single request test case for aclgraph (#3392) lilinsiman 2025-10-14 11:13:44 +08:00
  • 4536123341 [Fix] Fix mc2_tokens_capacity-related issues (#3411) Yizhou 2025-10-14 10:56:12 +08:00
  • 19b85ef1bc [Bugfix] multi_node_pd_disaggregation_mooncake.md update (#3400) wangxiaoteng888 2025-10-14 09:29:35 +08:00
  • 49b850270f [Community] Nominate new maintainers: @yiz-liu @paulyu12 @weijinqian0 @nalinaly (#3406) wangxiyuan 2025-10-14 08:51:58 +08:00
  • 657c08cfb2 [UT] fix skipped test_utils ut test. (#3422) menogrey 2025-10-14 08:31:13 +08:00
  • 4f6d60eb06 [Feature] Add W4A4 Flat Quantization support (#3427) Slightwind 2025-10-13 23:20:16 +08:00
  • 6972df5951 [Feature] optimize sp & qwen3 next support sp. (#3225) weijinqian0 2025-10-13 23:02:12 +08:00
  • 31682961af [Feat] enable hierarchical communication for mc2 ops on A2 (#3015) realliujiaxu 2025-10-13 16:13:17 +08:00
  • 0563106477 [Feature] mooncake connector support GQA transport (#2947) lidenghui1110 2025-10-13 15:48:37 +08:00
  • 847d12a389 [BugFix]Fix moe load problems in torchair when using dynamic eplb (#3381) dsxsteven 2025-10-13 11:38:57 +08:00
  • cd69385dab Add models test and add several new models yaml (#3394) Yikun Jiang 2025-10-12 17:27:50 +08:00
  • d05d29ff0e Enable nightly test and add qwen3 32b test case (#3370) jiangyunfan1 2025-10-12 15:46:28 +08:00
  • 0d59a3c317 [CI] Make the test_pipeline_parallel run normally in full test (#3391) leo-pony 2025-10-12 15:43:13 +08:00
  • bcc313e8f2 add mla_preprocess kernel (#3226) Chen Chen 2025-10-12 07:39:45 +08:00
  • 1b1207e3c3 [Bugfix] Add quantization param for multi-node CI (#3383) Li Wang 2025-10-11 19:25:16 +08:00
  • e8c871ed0a [Test] enable external launcher and add e2e test for sleep mode in level2 (#3344) huangxialu 2025-10-11 17:29:38 +08:00
  • ecb1713dfc Bugfix: Expose the user policy type interface (#3336) Mercykid-bash 2025-10-11 16:28:57 +08:00
  • e4acb2dfc7 [feat] support customized and separated hccl_buffer_size for process group initialization (#3073) linfeng-yuan 2025-10-11 15:55:22 +08:00
  • 9eb103607f [1/N][CI] Add multi node test (#3359) Li Wang 2025-10-11 14:50:46 +08:00
  • 82b6c846ca [BugFix]Fix eplb problems when using dynamic eplb. (#3364) offline893 2025-10-11 14:04:02 +08:00
  • ca05f7d632 [Bugfix] TP size larger than KV cache head causes accuracy issues (#3366) wangxiaoteng888 2025-10-11 11:22:23 +08:00
  • ace300a549 [Bugfix] Fix the abnormal NPU memory usage in full graph mode. (#3331) 无脸男 2025-10-11 10:20:10 +08:00
  • 866f5e7283 [Bugfix] Fix weight prefetching AssertionError in W8A8 MTP scene (#3361) Ruri 2025-10-11 09:24:02 +08:00
  • 8c1a4dedf3 [Bugfix]modify the enable range of _merge_multimodal_embeddings patch (#3360) Peipei 2025-10-11 08:37:07 +08:00
  • 27e0f2c035 [Perf]Add YaRN custom op (#3355) Angazenn 2025-10-11 08:36:20 +08:00
  • ee0a95e47f bugfix for mtp when running torchair in a2 (#3354) zouyida2052 2025-10-10 23:07:24 +08:00
  • 90e00deaa9 [Bugfix] Optimized exception throwing when stream captures exception (#3322) lilinsiman 2025-10-10 17:09:28 +08:00
  • 1756efa5fd [Feat][Graph]Support FULL_DECEDE_ONLY mode for MLA models (#3125) panchao-hub 2025-10-10 16:31:20 +08:00
  • ba19dd3183 Revert PTA upgrade PR (#3352) wangxiyuan 2025-10-10 14:09:53 +08:00
  • 601a37aeff [Fixbug] Fix accuracy template (#3088) zhangxinyuehfad 2025-10-10 09:03:21 +08:00
  • 6ae75933da [Feat] Load balance of tokens across experts in dummy_run (#3184) MengLong Chen 2025-10-10 09:00:07 +08:00
  • 60b7c936c5 [Doc] Update deepseek-v3.2 doc (#3319) Li Wang 2025-10-10 08:55:39 +08:00
  • 579b7e5f21 add pagedattention to support FULL_DECODE_ONLY. (#3102) XiaoxinWang 2025-10-10 08:50:33 +08:00
  • 1c2c72af8d [bugfix]change log2phy map to npu (#3339) offline893 2025-10-10 08:47:55 +08:00
  • 55e23fabec 【bugfix】fix connector register failed (#3335) fems14 2025-10-09 21:09:54 +08:00
  • ff37575936 [1/N][Feat] Add weight prefetch feature for Attention layers (#3146) Ruri 2025-10-09 20:38:39 +08:00
  • 23db56a340 [Feat]Qwen3 Moe supports npu_add_rms_norm_quant op by default, update op with norm bias (#3205) huangdong2022 2025-10-09 20:18:10 +08:00
  • 81aff9c555 bugfix for mtp (#3300) zouyida2052 2025-10-09 19:22:46 +08:00
  • 30c5d947c3 [bugfix]fix multistream moe in torchair (#3164) Wang Yixuan 2025-10-09 19:00:32 +08:00
  • 94dd832815 [MoE] [Refactor] Combine common_fused_moe and fused_moe (#3176) weichen 2025-10-09 14:12:46 +08:00
  • a36e3da78e [Misc] Drop 0102 related lines (#3323) Li Wang 2025-10-09 14:10:57 +08:00
  • 1c5b302f0d [Misc] Clean up useless patch (#3320) wangxiyuan 2025-10-09 14:07:26 +08:00
  • a43e2f61e1 [CI] Update vLLM to v0.11.0 (#3315) wangxiyuan 2025-10-09 10:41:19 +08:00
  • f12f76d7ba Drop 0.10.2 (#3284) wangxiyuan 2025-10-09 10:28:38 +08:00
  • 2dde1268c7 Fix doc for A2 series and cleanup note (#3307) Yikun Jiang 2025-10-01 14:39:48 +08:00
  • 474fa737c8 [bugfix] Fix moe bug: allgather error. (#3279) weijinqian0 2025-09-30 18:45:09 +08:00
  • b8c58d68e1 [Doc] Add deepseek v3.2 tutorial (#3275) wangxiyuan 2025-09-30 17:54:31 +08:00
  • 4abdcdba4e upgrade pta to 0919 (#3295) wangxiyuan 2025-09-30 17:14:23 +08:00
  • 3a27b15ddc [bugfix] Fix Qwen3-30B-A3B dp parallel hung issue when running with the dp parallel example (#3287) leo-pony 2025-09-30 15:30:01 +08:00
  • a486ff8c11 KVCache Transfer via Layer-wise Strategy in Disaggregation (#2602) Chao Lei 2025-09-30 15:10:29 +08:00
  • f8c93d8d24 [Aclgraph][DP] Fix dp dummy run not in aclgraph error (#3208) Mengqing Cao 2025-09-30 11:14:51 +08:00
  • ddf4d53ca3 [bugfix] Fix bugs in _dumm_run and re-initialize kv-cache. (#3262) Angazenn 2025-09-30 10:54:14 +08:00
  • 00ba071022 [Doc] Release note for v0.11.0rc0 (#3224) wangxiyuan 2025-09-30 03:26:18 +08:00
  • 81bd6e4c99 Add DeepSeek V3.2 support (#3270) wangxiyuan 2025-09-30 03:25:58 +08:00
  • 5503a3142f Bump version to v0.11.0rc3 (#3213) Yikun Jiang 2025-09-29 21:48:06 +08:00
  • 83092d9b8b [BugFix] Fix Qwen3-Next because of vllm #24982 (#3221) Icey 2025-09-29 15:27:30 +08:00
  • c73dd8fecb [CI] Fix CI by addressing max_split_size_mb config (#3258) wangxiyuan 2025-09-29 14:05:12 +08:00
  • 69cc99d004 Add restriction conditions to the ApplyTopPTopK operator (#3254) LeeWenquan 2025-09-29 14:04:58 +08:00
  • 065486820b [Doc] add faqs:install vllm-ascend will overwrite existing torch-npu (#3245) weiguihua2 2025-09-29 12:02:23 +08:00
  • 373f84a193 [Bugfix] Fix the error "cur batch_size is invalid" during profile_run in the torchair scenario (#3243) 无脸男 2025-09-29 11:51:07 +08:00
  • 8870966031 [bugfix] Fix warning bug: model config is None. (#3238) weijinqian0 2025-09-29 09:44:49 +08:00
  • 15b8aff582 [CI] Add max_split_size_mb for e2e test to avoid oom (#3252) wangxiyuan 2025-09-29 09:13:08 +08:00
  • 050d202bb9 [Quickfix] Fix dp+ep+tp error when sp chunked the hidden_states (#3246) Mengqing Cao 2025-09-29 09:12:49 +08:00
  • cf445c41f9 [Doc]Add qwen3_vl series guide (#3227) Peipei 2025-09-28 21:35:52 +08:00
  • 14d4ed5f0c [BugFix] Fix aclgraph accu problem in A2. (#3163) whx 2025-09-28 21:31:55 +08:00
  • c3fee66806 [Model] Optimizing gemma3 model's GemmaRMSNorm function (#3151) socrahow 2025-09-28 21:19:10 +08:00
  • dd56e9306b [3/N][Refactor][Qwen3-Next] Refactor model structure and fix bug by vllm #25400 (#3142) Icey 2025-09-28 21:14:36 +08:00
  • 4ff422c730 [CI][Bugfix] Quickfix for DPMetaData (#3234) Mengqing Cao 2025-09-28 21:11:22 +08:00
  • f2d8493221 [BugFix] Fix ascend scheduler assert error (#3191) fan2956 2025-09-28 18:22:08 +08:00
  • 68c5401ad6 [Eagle] Fix attn_mask index out of range in high concurrency situations (#3187) Icey 2025-09-28 18:09:26 +08:00
  • 1705501ae2 [BugFix] Fix ACLgraph bug in Qwen3_32b_int8 case (#3204) lilinsiman 2025-09-28 17:44:04 +08:00
  • a86ece5e39 [Bugfix][LoRA] Fix forward error and shape mismatch when using LoRA (#3153) Zetong Li 2025-09-28 17:30:50 +08:00
  • 3d21ed9ee8 [Bugfix]Fix quant_config input parameter bug in qwenvl series (#3220) Peipei 2025-09-28 14:08:24 +08:00
  • 96089b5155 Add vLLM 0.11.0 release hourly job (#3215) Yikun Jiang 2025-09-27 23:15:41 +08:00
  • 859e861d92 [main][quantization] Support deepseek w4a8 per-channel quantization (#3011) Wang Kunpeng 2025-09-27 21:01:16 +08:00
  • e9359bd8fa [CI] Pin vLLM to releases/v0.11.0 (#3211) wangxiyuan 2025-09-27 10:41:48 +08:00
  • 9caf6fbaf5 [Bugfix][LoRA] Fix LoRA bug after supporting Qwen3-Next (#3044) yupeng 2025-09-26 11:12:45 +08:00
  • 8406aafaff Add e2e test related to weight updates in RL scenarios. (#2954) XiaoxinWang 2025-09-26 11:07:10 +08:00
  • d8a9cb8458 [Bugfix] fix bug when tp=1 (#3193) realliujiaxu 2025-09-26 10:55:32 +08:00
  • b72e3327a6 bugfix for mtp>1 (#3174) zouyida2052 2025-09-26 09:04:16 +08:00
  • 69509bcdd6 [bugfix] fix oom in aclgraph (#3158) 无脸男 2025-09-26 08:57:47 +08:00
  • 621aa7d270 fix error async_scheduler can't be enabled (#3127) Ronald 2025-09-26 08:51:54 +08:00