Commit Graph

  • 4ba56716f9 Increase doctest timeout to 300s and time print (#3041) Yikun Jiang 2025-09-19 20:26:00 +08:00
  • 8326f15ecf [CustomOp] Register AscendSharedFusedMoE custom op (#2980) Shanshan Shen 2025-09-19 19:05:01 +08:00
  • 05a700d370 [Bugfix] Fix async copy bug under single expert scenario (#3005) sdmyzlp 2025-09-19 14:05:36 +08:00
  • 2a87b4cecb [Bugfix] Fix specdecoding in chunkedprefill scenario (#3025) xuyexiong 2025-09-19 14:05:08 +08:00
  • 833cd1b698 [BugFix] Async scheduling and PP compatibility with DP (#2796) Song Zhixin 2025-09-19 11:29:50 +08:00
  • 0a526768f5 [Feature] Support moe multi-stream for aclgraph. (#2946) whx 2025-09-19 11:06:45 +08:00
  • 0c04bf1e36 [Fixbug] Fix accuracy for DeepSeek-V2-Lite (#3016) zhangxinyuehfad 2025-09-18 23:58:23 +08:00
  • 367edff5af [HybridKV] Fix prefill disaggregation kvcache addr alignment & use hybrid kv cache only when running qwen3_next (#3007) Mengqing Cao 2025-09-18 21:43:22 +08:00
  • acb46f303f Fix VocabParallelEmbedding UT (#2722) Icey 2025-09-18 19:54:01 +08:00
  • 01592515b8 [Bugfix] Fix sleep mode level 2 (#1376) Li Wang 2025-09-18 19:51:52 +08:00
  • f4e3d22432 Remove chunked_prefill_for_mla and fix ring_mla bug (#2781) LeeWenquan 2025-09-18 19:43:26 +08:00
  • 79a910ef47 [bugfix][torchair] fix multistream_moe problems in torchair graph mode (#2681) linfeng-yuan 2025-09-18 17:35:04 +08:00
  • 4267f5d55f [Doc] Add multi-node ray backend tutorial (#2376) Li Wang 2025-09-18 15:30:18 +08:00
  • af2a886814 refactor linear (#2867) realliujiaxu 2025-09-18 14:09:19 +08:00
  • a7f8ed38ed [Bugfix]:replace npu_incre_flash_attention with npu_fused_infer_atten… (#2901) panchao-hub 2025-09-18 14:06:08 +08:00
  • 6681dde902 [Feat][Graph] Support MTP for ACL Graph (#2932) xuyexiong 2025-09-18 14:05:33 +08:00
  • cef43b524e [Feat] A Connector that supports Mooncake store (#2913) Chao Lei 2025-09-18 14:04:45 +08:00
  • 723d460894 [Bugfix] fix kv nz accuracy bug (#2988) realliujiaxu 2025-09-17 21:10:25 +08:00
  • 8bcc0ccd57 [bugfix] fix shared expert dp with hybrid kvcache (#2964) linfeng-yuan 2025-09-17 20:01:47 +08:00
  • 1f6465c399 Add an option of enable frozen parameter (#2869) 1Fire4 2025-09-17 12:00:44 +08:00
  • 76844eec78 Dynamic Expert Load Balance with Zero-like-overhead (#2956) offline893 2025-09-17 10:36:43 +08:00
  • ae758dda05 [Bugfix] Fix mtp torchair in pd Disaggregation scenario (#2951) xuyexiong 2025-09-17 09:07:58 +08:00
  • 6b7117dbb7 [main] addrmsnorm + quant fusion optim in Dense Models (#2772) rjg-lyh 2025-09-16 22:31:38 +08:00
  • 88ca8a051c [Feat][Graph] Support DeepSeek with ACL Graph (#2707) yiz-liu 2025-09-16 17:50:17 +08:00
  • 3e60aa5483 Bump actions/setup-python from 5.4.0 to 6.0.0 (#2926) dependabot[bot] 2025-09-16 14:15:10 +08:00
  • 1c5900327b [refactor] refactor deepseek-related files (#2849) linfeng-yuan 2025-09-16 14:13:07 +08:00
  • 18ca7861f6 [Main] [Refactor] Enable MoECommMethod in Eager Mode (#2791) weichen 2025-09-16 11:06:00 +08:00
  • 0aba644633 Update max_tokens and prompt in qwen3 online doc (#2945) Yikun Jiang 2025-09-16 09:27:50 +08:00
  • 048bfd5553 [Release] Add release note for v0.10.2rc1 (#2921) wangxiyuan 2025-09-16 01:20:05 +08:00
  • c556038ef0 [New model] Qwen3-next support (#2917) wangxiyuan 2025-09-16 01:17:42 +08:00
  • b5ccef6115 [Doc] Add doc for Qwen3 Next (#2916) Yikun Jiang 2025-09-16 01:16:06 +08:00
  • aa3c4563ce fix all cards super_pod_id same on A3 & proxy support min_tokens (#2939) liziyu 2025-09-16 01:09:18 +08:00
  • 382c29f3e1 [BugFix] Fix world size bug in model_runner (#2915) wangxiyuan 2025-09-14 12:20:25 +08:00
  • c5a502fd2e main add ascend scheduler support multimodal (#2844) fan2956 2025-09-14 09:38:51 +08:00
  • 0747a6e68c Bump vLLM version to v0.10.2 (#2914) Yikun Jiang 2025-09-14 06:57:59 +08:00
  • f97a64ba7f Bump vLLM version to v0.10.2rc3 (#2911) Yikun Jiang 2025-09-13 19:15:48 +08:00
  • 8ece6956e7 Revert "Upgrade CANN version to 8.3.rc1.alpha001 (#2903)" (#2909) Yikun Jiang 2025-09-13 16:21:54 +08:00
  • 0a27705917 fix mooncake connector adxl hostname usage (#2824) zxr2333 2025-09-13 14:38:48 +08:00
  • d2250c80b5 Enable push trigger for image job (#2906) Yikun Jiang 2025-09-13 12:31:36 +08:00
  • 339fceb89c Upgrade CANN version to 8.3.rc1.alpha001 (#2903) Yikun Jiang 2025-09-13 12:10:21 +08:00
  • e57cca971c Fix the bugs about operator registration by PyTorch Dispatcher (#2786) Jiawei Li 2025-09-13 11:58:52 +08:00
  • 138e932630 Bump vLLM version to v0.10.2rc2 (#2902) Yikun Jiang 2025-09-13 11:39:48 +08:00
  • 585a494baa [Core] Disable the chunked prefill feature in Non-MLA LLMs (#2894) rjg-lyh 2025-09-12 23:17:09 +08:00
  • 756b8a1946 Revert "[Feat] Unquantized linear nz support (#2619)" (#2896) Yikun Jiang 2025-09-12 20:51:12 +08:00
  • fc2bcbe21c [Ops] Fix bug in register_custom_ops without forward_context (#2883) rjg-lyh 2025-09-12 16:58:08 +08:00
  • 6d8bc38c7b Enable label-based image test and use free runner to run lint (#2864) Yikun Jiang 2025-09-12 10:49:42 +08:00
  • 778cb72556 fix bug when rotary_dim is not 128 (#2847) realliujiaxu 2025-09-12 09:49:36 +08:00
  • f5a97e8fa5 [Quantization] register AscendQuantRMSNorm for quantization (#2856) 22dimensions 2025-09-11 23:14:02 +08:00
  • eab3635850 [Bugfix] Retrieve num_redundant_experts from eplb_config in torchair qwen3_moe.py (#2857) wyu0-0 2025-09-11 22:15:19 +08:00
  • aeffe27b30 [Perf]set moe w2_weight default to be nz (#2842) Angazenn 2025-09-11 21:40:54 +08:00
  • 9615dea3a7 Refactor tensor_parallel and comm_utils (#2814) wuweiqiang24 2025-09-11 21:26:36 +08:00
  • 0005479b9c [main] mlp weight prefetch in Qwen Dense Models (#2816) rjg-lyh 2025-09-11 21:20:09 +08:00
  • c3c2221503 [Feat]support dynamic quantization in allgather (#2841) 无脸男 2025-09-11 18:47:20 +08:00
  • 07c58669fd [Bugfix] Update lm_eval version to remove deprecated param (#2871) Li Wang 2025-09-11 18:39:03 +08:00
  • bd3dedea61 support qwen25 vl w8a8 quantization (#2778) 6lazijiamo 2025-09-11 16:40:51 +08:00
  • 2b9269b581 [Perf][V1] Fully overlap model execution (#2783) jiangpeng 2025-09-11 16:35:36 +08:00
  • 923cdaeba3 fix ascend fused moe spelling error (#2863) zhaozx-cn 2025-09-11 14:35:46 +08:00
  • b9a0a75c78 fix qwen torchair attention PrefillCacheHit (#2787) zhaozx-cn 2025-09-11 14:26:59 +08:00
  • 7b2ecc1e9a [Feat] Unquantized linear nz support (#2619) anon189Ty 2025-09-11 11:40:00 +08:00
  • 5691104249 LLMdatadist connector adapt the distributed KV aggregation (#2718) liziyu 2025-09-11 11:37:41 +08:00
  • c2fdd4b8bc [CI/UT] Fix UTs on register customop and warm up model (#2862) Mengqing Cao 2025-09-11 11:30:16 +08:00
  • b7df04de9b debug_aclgraph_sizes_capture (#2827) lilinsiman 2025-09-10 22:50:48 +08:00
  • e75b568011 [CI] Update pre_commit runner (#2850) zhangxinyuehfad 2025-09-10 20:23:25 +08:00
  • b7ee3fdad3 [Code clean] Remove the unnecessary code (#2815) Jiawei Li 2025-09-10 17:19:39 +08:00
  • 88d7af62be [main] adjust the position of warm_up_atb (#2823) huangxialu 2025-09-10 14:06:38 +08:00
  • 22b425765a [Bugfix] Fix broken CI (#2825) Li Wang 2025-09-10 13:29:29 +08:00
  • aa4d2a91ed Refactor AscendMultiHeadLatentAttention (#2826) Icey 2025-09-10 11:26:11 +08:00
  • 168ad600b5 [main] add pd transfer for ascend scheduler (#2753) CaranLic 2025-09-10 08:46:39 +08:00
  • edf1f600ad [CI] Remove compatibility maintenance for vllm v0.10.1 and v0.10.1.1 (#2840) Mengqing Cao 2025-09-10 08:43:10 +08:00
  • 93e28e6862 add weight transpose check. (#2756) sherie 2025-09-09 20:33:43 +08:00
  • e13c4ddb42 [Fix] Fix SharedFusedMoE (#2817) yiz-liu 2025-09-09 18:19:56 +08:00
  • 7a205dbaa8 [main] Optimize rope in Qwen Models (#2571) rjg-lyh 2025-09-09 14:28:14 +08:00
  • 5bcb4c1528 [CI] Reduce CI time (#2801) wangxiyuan 2025-09-09 10:52:14 +08:00
  • 1bbb20ea13 [main] flashcomm_v1 optim in Qwen Dense Models (#2802) rjg-lyh 2025-09-08 22:52:24 +08:00
  • 4df8df5b94 [bugfix] fix deepseek rope sincoscache re-generation (#2744) zzzzwwjj 2025-09-08 22:03:34 +08:00
  • 7d6d9449a8 [Misc] Move lora patch file into lora module (#2797) wangxiyuan 2025-09-08 21:42:12 +08:00
  • 85d989a3b9 [Misc] Remove pangu model file (#2798) wangxiyuan 2025-09-08 21:30:37 +08:00
  • a041d4f328 [main] [refactor] refactor common_fused_moe.py (#2706) weichen 2025-09-08 20:09:50 +08:00
  • 1a82b16355 Remove unused code in fused_moe.py (#2805) machenglong2025 2025-09-08 20:05:19 +08:00
  • d51694a77b [2/N][Refactor][Quantization] clean quantization patch (#2785) 22dimensions 2025-09-08 17:31:53 +08:00
  • cd88f89267 Bump actions/github-script from 7 to 8 (#2803) dependabot[bot] 2025-09-08 14:53:26 +08:00
  • d3c3538ddc [Bugfix]fix bug when graph_size is not divisible by tp_size (#2719) realliujiaxu 2025-09-08 14:52:33 +08:00
  • dd087effcc Refector prepare_inputs in model_runner_v1.py (#2750) TaoYu Chen 2025-09-08 10:45:23 +08:00
  • c735bb0941 [Fix] Ensure metadata sync across DP ranks in eager mode (#2766) yiz-liu 2025-09-08 09:55:16 +08:00
  • 2693196ef8 add gatherep select. (#2740) sherie 2025-09-08 09:15:50 +08:00
  • 6666e5265d Added support for KV connector v1 (#2039) Marco Barletta 2025-09-08 03:04:22 +02:00
  • 2967e5e22a [Benchmark] Correctly kill vllm process in performance benchamrk (#2782) Li Wang 2025-09-07 10:36:34 +08:00
  • a746f8274f [DOC] Qwen3 PD disaggregation user guide (#2751) yupeng 2025-09-07 10:35:37 +08:00
  • b2f77d3aa8 [fix] prefill unsupport sliding window attention (#2758) yeyifan 2025-09-07 10:34:38 +08:00
  • 752e272a55 Add note for Ascend HDK version (#2765) Yikun Jiang 2025-09-07 10:33:41 +08:00
  • 5a7181569c [feat]: oproj tensor parallelism in pure DP and graph-mode scenarios. (#2167) lidenghui1110 2025-09-07 10:31:32 +08:00
  • a58b43b72c Remove git .extraheader and fecth all commtis in /vllm-workspace/vllm-ascend (#2746) Yikun Jiang 2025-09-05 09:45:11 +08:00
  • 51a2aec115 Delete redundant codes related to communication (#2717) henryxuxu0716 2025-09-05 09:39:39 +08:00
  • 5b3646ab21 [FEATURE][MTP] Support MTP > 1 (#2708) 1092626063 2025-09-05 09:11:22 +08:00
  • 83eb40a51c [Fix][MoE] Refine MoE communication strategy (#2734) yiz-liu 2025-09-05 09:04:04 +08:00
  • 4c90fa79ca [Misc] Remove useless PD check in deepseek (#2739) liziyu 2025-09-04 22:22:19 +08:00
  • 3a2a7d88db [Doc] Update accuracy reports for v0.10.1rc1 (#2755) vllm-ascend-ci 2025-09-04 22:17:17 +08:00
  • f86596a66c allgather use fusedop. (#2689) sherie 2025-09-04 11:56:29 +08:00
  • 7d47d8f4f6 [Fix] fix resources limit error when apply speculative decoding and aclgraph (#2472) 无脸男 2025-09-04 11:50:43 +08:00
  • 0c0789be74 [Feat] allow using aclgraph in ray backend (#2589) 无脸男 2025-09-04 11:45:56 +08:00