Commit Graph

  • 274b708e0c [Fix] Refactor dummy attention metadata creation (#3497) Yizhou 2025-10-21 00:00:42 +08:00
  • 6b6857929d [Doc] Add --shm-size option to Docker command for qwen3 vl 235B (#3519) likeful 2025-10-20 23:37:35 +08:00
  • 0bf3f21a98 Revert "Add mrope op fusion (#3509)" (#3562) wangxiyuan 2025-10-20 20:19:24 +08:00
  • 068ed706c8 [feat][torchair] support super kernel feat for quantized dsr1 (#3485) linfeng-yuan 2025-10-20 20:04:37 +08:00
  • 70bef33f13 add new accuracy test case for aclgraph (#3390) lilinsiman 2025-10-20 20:04:04 +08:00
  • b9e2896eb1 Revert "[Perf] Add FIA interface in FA case" (#3553) ZYang6263 2025-10-20 19:56:10 +08:00
  • 34c2996ab8 [main] v_proj combining transpose and matmul (#3545) Zhu Yi Lin 2025-10-20 19:53:32 +08:00
  • e04a5e3dd3 [Bugfix] Fix race condition in d2h transfer (#3372) Jade Zheng 2025-10-20 18:24:21 +08:00
  • fdac146f71 [UT] fix skip ut test and enable ut test run normally (#3410) zhangxinyuehfad 2025-10-20 16:30:57 +08:00
  • f8b52fe950 [Model][1/N] Delete deepseek v2/v3 modeling codes. (#3189) whx 2025-10-20 15:31:34 +08:00
  • 918ded9155 [BugFix][HybridKV] Update the check logic of reinitializing inputbatch (#3540) Mengqing Cao 2025-10-20 15:29:48 +08:00
  • daa4dd0a57 [DeepSeek] Separate deepseek v3.2 modeling from deepseek v2 (#3531) Mengqing Cao 2025-10-20 09:50:44 +08:00
  • 6c65dd891f [ModelRunner][Qwen3-Next] Fix attn_group initialization timing (#3477) Mengqing Cao 2025-10-20 09:39:40 +08:00
  • 9e59fc1510 [TEST] Add initial aisbench support and Qwen3 32B acc/perf test (#3474) jiangyunfan1 2025-10-20 09:33:17 +08:00
  • 58a37ce189 bugfix for mooncake (#3535) zouyida2052 2025-10-19 17:06:05 +08:00
  • 1e78ecbad6 [Perf] Add FIA interface in FA case (#3321) ZYang6263 2025-10-19 12:45:33 +08:00
  • 4b3bd4f397 [main][bugfix] bugfix for minicpm models (#3527) Wang Kunpeng 2025-10-19 11:00:55 +08:00
  • 6c9909c861 [Patch]patch of v1 executor when enable eplb. (#3511) offline893 2025-10-19 10:54:26 +08:00
  • 646c1db5d7 Add mrope op fusion (#3509) shaopeng-666 2025-10-18 18:08:24 +08:00
  • 0777e2f899 Optimize torchair kv_consumer padding logic (#3526) xuyexiong 2025-10-18 16:42:17 +08:00
  • b4233a2ec3 [Bugfix] Route requests requiring KVC recomputation from the decode instance to the P instance (#3448) Shirley125 2025-10-18 15:56:44 +08:00
  • 4750d45d86 [BugFix]Support redundant experts in EPLB (#3473) yechao237 2025-10-18 00:09:16 +08:00
  • 07ca1b9b78 [Refactor] Clean up w4a4_flatquant_dynamic implementation (#3440) Slightwind 2025-10-17 23:53:19 +08:00
  • 21769e8f44 [BUGFIX] Mtp torchair pd fix (#3506) xuyexiong 2025-10-17 21:57:05 +08:00
  • 9547d6f0d9 [Core]Append padding logic for Attention (#3256) Angazenn 2025-10-17 21:56:01 +08:00
  • b154a8e22c [Bugfix] fix logging and d2h bug for flash comm1 (#3505) realliujiaxu 2025-10-17 21:13:41 +08:00
  • 248ee7fa11 [Feat]Make full graph mode compatible with MTP (#3276) anon189Ty 2025-10-17 20:19:56 +08:00
  • 46e62efd44 [Feat]mtp aclgraph support (#3244) anon189Ty 2025-10-17 18:14:49 +08:00
  • 1b424fb7f1 ACLgraph enable: Test cases revisions for all features (#3388) lilinsiman 2025-10-17 17:15:19 +08:00
  • bf87606932 [Feat] Shared expert dp for deepseek and deepseek_mtp (#3495) zhaozx-cn 2025-10-17 15:06:37 +08:00
  • d9ee491f70 [BugFix]Move to_list in foward_v1 with FIA earlier to build (#3185) Angazenn 2025-10-17 11:19:41 +08:00
  • 30e3d86b0f Revert "[BUGFIX] Mtp torchair pd fix (#3449)" (#3500) xuyexiong 2025-10-17 09:42:48 +08:00
  • 3a53bbc508 [Feat]Qwen3 Moe supports npu_add_rms_norm_quant op by default, update op with bias, resolve conflict with weight prefetch (#3465) huangdong2022 2025-10-17 09:30:51 +08:00
  • 4c4a8458a5 [CI] Refactor multi-node CI (#3487) Li Wang 2025-10-17 09:04:31 +08:00
  • ccb6fb9ec1 [Fix] Clears unused slot mappings and fix accuracy issue with MLA models when enabling FULL_DECODE_ONLY (#3482) Yizhou 2025-10-16 19:43:09 +08:00
  • f9535cc9e2 [BugFix] fix qwenVL quant assertion error (#3466) elilzhu 2025-10-16 17:08:00 +08:00
  • 9ff6b0b862 [CI]: Fix doctest ci for main release (#3451) menogrey 2025-10-16 14:38:11 +08:00
  • b0ae203e72 [BUGFIX] Mtp torchair pd fix (#3449) xuyexiong 2025-10-16 09:03:49 +08:00
  • 291c00a224 [Doc] pin version that can stable running 310I Duo to vllm-ascend v0.10.0rc1 (#3455) leo-pony 2025-10-16 08:54:09 +08:00
  • ff91904ee2 [Doc] Clearer corresponding relationship between configurations for multi-node guides (#3441) leo-pony 2025-10-16 08:54:03 +08:00
  • aa6154703a [BugFix]GPQA Accuracy Issue Bugfix (#3476) DreamerLeader 2025-10-15 23:28:17 +08:00
  • cec1fab509 Revert "[MoE] [Refactor] Remove manual memory cleanup (#3365)" (#3483) weichen 2025-10-15 22:25:46 +08:00
  • f69a83b7ba [Feat] Flash comm allgher ep (#3334) realliujiaxu 2025-10-15 19:36:32 +08:00
  • 8abe517870 [Refactor] Adapt deepseek-v3.2 to vllm 0.11.0 (#3432) Mengqing Cao 2025-10-15 17:48:58 +08:00
  • 099255e933 [bugfix] fix pipeline parallel for mla & sfa attention backend (#3459) linfeng-yuan 2025-10-15 17:13:27 +08:00
  • 5a3082cd15 [EPLB]Record expert map without dynamic eplb. (#3409) offline893 2025-10-15 14:21:15 +08:00
  • 4f937f561d [MoE] [Refactor] Remove manual memory cleanup (#3365) weichen 2025-10-15 12:36:24 +08:00
  • 4e720936d8 Fix warning msg print (#3421) LeeWenquan 2025-10-15 11:30:30 +08:00
  • 16cb3cc45d adapt the mla_v1 with the mla_preprocess kernel (#3397) Chen Chen 2025-10-15 10:34:25 +08:00
  • 15b2e5c995 Remove unused row_idx in token_dispatcher (#3442) CaranLic 2025-10-15 09:08:31 +08:00
  • 3642b64afc bugfix for mtp with multistream_moe (#3419) zouyida2052 2025-10-15 08:59:58 +08:00
  • c2c1db78a7 [Bugfix] fix ZeroDivisionError when prefill_tp_size > num_kv_head and fix tp_resharding README (#3437) zxr2333 2025-10-15 08:45:44 +08:00
  • 02c26dcfc7 [Feat] Supports Aclgraph for bge-m3 (#3171) xuyexiong 2025-10-14 23:07:45 +08:00
  • 434059e417 [BugFix] Fix multimodal model support fullgraph error (#3425) fan2956 2025-10-14 21:51:09 +08:00
  • 223cc34085 [KVCache] Refactor KVCache as page_size_bytes is ineffective (#3438) Mengqing Cao 2025-10-14 21:28:41 +08:00
  • c55d99d13e [bugfix][torchair] fix missing weight nz cast for w13_weight in torchair_w8a8_dynamic.py (#3446) linfeng-yuan 2025-10-14 21:11:05 +08:00
  • 5fe883fa43 fix the title of modelrunner's prepare inputs docs (#3457) TaoYu Chen 2025-10-14 20:35:58 +08:00
  • 78777237a9 [2/N][Feat] Attention and MoE weight prefetch in Qwen3MoE models (#3203) yuzhup 2025-10-14 20:16:33 +08:00
  • 07e39620ea [Feat] Unquantized Linear to nz and control all nz-cast (#3356) anon189Ty 2025-10-14 17:39:26 +08:00
  • 5c45c227dc [BugFix] fix qwen2.5vl quant bug (#3426) elilzhu 2025-10-14 17:31:26 +08:00
  • ee25a517d1 [BugFix] Fix the port conflict bug of running external dp with disaggregated-prefill. (#3416) whx 2025-10-14 16:37:10 +08:00
  • 9eb62935b8 fix pagedattention to support fullgraph. (#3436) XiaoxinWang 2025-10-14 16:10:09 +08:00
  • 22a1d91cf5 [CI] Add single request test case for aclgraph (#3392) lilinsiman 2025-10-14 11:13:44 +08:00
  • 4536123341 [Fix] Fix mc2_tokens_capacity-related issues (#3411) Yizhou 2025-10-14 10:56:12 +08:00
  • 19b85ef1bc [Bugfix] multi_node_pd_disaggregation_mooncake.md update (#3400) wangxiaoteng888 2025-10-14 09:29:35 +08:00
  • 49b850270f [Community] Nominate new maintainers: @yiz-liu @paulyu12 @weijinqian0 @nalinaly (#3406) wangxiyuan 2025-10-14 08:51:58 +08:00
  • 657c08cfb2 [UT] fix skipped test_utils ut test. (#3422) menogrey 2025-10-14 08:31:13 +08:00
  • 4f6d60eb06 [Feature] Add W4A4 Flat Quantization support (#3427) Slightwind 2025-10-13 23:20:16 +08:00
  • 6972df5951 [Feature] optimize sp & qwen3 next support sp. (#3225) weijinqian0 2025-10-13 23:02:12 +08:00
  • 31682961af [Feat] enable hierarchical communication for mc2 ops on A2 (#3015) realliujiaxu 2025-10-13 16:13:17 +08:00
  • 0563106477 [Feature] mooncake connector support GQA transport (#2947) lidenghui1110 2025-10-13 15:48:37 +08:00
  • 847d12a389 [BugFix]Fix moe load problems in torchair when using dynamic eplb (#3381) dsxsteven 2025-10-13 11:38:57 +08:00
  • cd69385dab Add models test and add several new models yaml (#3394) Yikun Jiang 2025-10-12 17:27:50 +08:00
  • d05d29ff0e Enable nightly test and add qwen3 32b test case (#3370) jiangyunfan1 2025-10-12 15:46:28 +08:00
  • 0d59a3c317 [CI] Make the test_pipeline_parallel run normally in full test (#3391) leo-pony 2025-10-12 15:43:13 +08:00
  • bcc313e8f2 add mla_preprocess kernel (#3226) Chen Chen 2025-10-12 07:39:45 +08:00
  • 1b1207e3c3 [Bugfix] Add quantization param for multi-node CI (#3383) Li Wang 2025-10-11 19:25:16 +08:00
  • e8c871ed0a [Test] enable external launcher and add e2e test for sleep mode in level2 (#3344) huangxialu 2025-10-11 17:29:38 +08:00
  • ecb1713dfc Bugfix: Expose the user policy type interface (#3336) Mercykid-bash 2025-10-11 16:28:57 +08:00
  • e4acb2dfc7 [feat] support customized and separated hccl_buffer_size for process group initialization (#3073) linfeng-yuan 2025-10-11 15:55:22 +08:00
  • 9eb103607f [1/N][CI] Add multi node test (#3359) Li Wang 2025-10-11 14:50:46 +08:00
  • 82b6c846ca [BugFix]Fix eplb problems when using dynamic eplb. (#3364) offline893 2025-10-11 14:04:02 +08:00
  • ca05f7d632 [Bugfix] TP size larger than KV cache head causes accuracy issues (#3366) wangxiaoteng888 2025-10-11 11:22:23 +08:00
  • ace300a549 [Bugfix] Fix the abnormal NPU memory usage in full graph mode. (#3331) 无脸男 2025-10-11 10:20:10 +08:00
  • 866f5e7283 [Bugfix] Fix weight prefetching AssertionError in W8A8 MTP scene (#3361) Ruri 2025-10-11 09:24:02 +08:00
  • 8c1a4dedf3 [Bugfix]modify the enable range of _merge_multimodal_embeddings patch (#3360) Peipei 2025-10-11 08:37:07 +08:00
  • 27e0f2c035 [Perf]Add YaRN custom op (#3355) Angazenn 2025-10-11 08:36:20 +08:00
  • ee0a95e47f bugfix for mtp when running torchair in a2 (#3354) zouyida2052 2025-10-10 23:07:24 +08:00
  • 90e00deaa9 [Bugfix] Optimized exception throwing when stream captures exception (#3322) lilinsiman 2025-10-10 17:09:28 +08:00
  • 1756efa5fd [Feat][Graph]Support FULL_DECODE_ONLY mode for MLA models (#3125) panchao-hub 2025-10-10 16:31:20 +08:00
  • ba19dd3183 Revert PTA upgrade PR (#3352) wangxiyuan 2025-10-10 14:09:53 +08:00
  • 601a37aeff [Fixbug] Fix accuracy template (#3088) zhangxinyuehfad 2025-10-10 09:03:21 +08:00
  • 6ae75933da [Feat] Load balance of tokens across experts in dummy_run (#3184) MengLong Chen 2025-10-10 09:00:07 +08:00
  • 60b7c936c5 [Doc] Update deepseek-v3.2 doc (#3319) Li Wang 2025-10-10 08:55:39 +08:00
  • 579b7e5f21 add pagedattention to support FULL_DECODE_ONLY. (#3102) XiaoxinWang 2025-10-10 08:50:33 +08:00
  • 1c2c72af8d [bugfix]change log2phy map to npu (#3339) offline893 2025-10-10 08:47:55 +08:00
  • 55e23fabec 【bugfix】fix connector register failed (#3335) fems14 2025-10-09 21:09:54 +08:00
  • ff37575936 [1/N][Feat] Add weight prefetch feature for Attention layers (#3146) Ruri 2025-10-09 20:38:39 +08:00
  • 23db56a340 [Feat]Qwen3 Moe supports npu_add_rms_norm_quant op by default, update op with norm bias (#3205) huangdong2022 2025-10-09 20:18:10 +08:00
  • 81aff9c555 bugfix for mtp (#3300) zouyida2052 2025-10-09 19:22:46 +08:00