Commit Graph

  • 14497b748d Remove qwen3 moe MC2 cumsum & cast (#3126) florenceCH 2025-09-26 08:51:30 +08:00
  • 2930e4a6bd [CI] Upgrade vllm to newest commit (#3182) wangxiyuan 2025-09-26 06:18:15 +08:00
  • 0794f64a18 Revert "[Disagg][Perf] Use NPU event sync instead of blocking tolist (#3194) wangxiyuan 2025-09-26 06:17:36 +08:00
  • 31dda3f557 [Model]Add support for qwen3_vl and qwen3_vl_moe (#3103) Peipei 2025-09-25 18:50:12 +08:00
  • f7a3815bff [CI] Do not drop ready label when PR is merge conflict (#3173) wangxiyuan 2025-09-25 18:45:19 +08:00
  • 5d13bbe796 [BugFix]Modify eplb feature guide. (#3183) offline893 2025-09-25 17:01:51 +08:00
  • 07f4710216 [BugFix] Fix dummy_run memory explosion in eager mode (#3132) MengLong Chen 2025-09-25 16:09:44 +08:00
  • 72f64c10b7 [bugFix] Correct the vllm interface e2e test Base container image name (#3179) leo-pony 2025-09-25 16:03:09 +08:00
  • 2a9d02e080 [Bugfix] eagle and eagle3 spec decode failures and enable e2e test (#2979) Icey 2025-09-25 14:39:12 +08:00
  • ac1c2cd9ac [CI] Upgrade vllm version - 0925 (#3167) wangxiyuan 2025-09-25 14:20:10 +08:00
  • 33c118c80e [core]vllm-ascend support msMonitor tool (#3123) mfyCn-1204 2025-09-25 14:15:02 +08:00
  • c814b32b90 [Quant][GLM] Adapt glm quant. (#3147) whx 2025-09-25 11:13:29 +08:00
  • a055183821 [CI] Upgrade vLLM version (#3139) wangxiyuan 2025-09-25 07:36:51 +08:00
  • 464270e4ca Remove useless PD check in deepseek (#3161) liziyu 2025-09-24 23:25:47 +08:00
  • 4ee58e213b [BugFix] explicitly setting the tensor shape of otp output (#3027) zzhxxx 2025-09-24 18:44:15 +08:00
  • 360a736dfa Add OOT platform E2E test case to be run in the vllm buildkite pipeline (#3154) leo-pony 2025-09-24 17:55:58 +08:00
  • cd1ffbb6cd [1/N][Feat] Cut down memory usage for o_proj in DeepSeek (#2931) clrs97 2025-09-24 17:16:41 +08:00
  • 302494c1fe [EPLB] ut for EPLB (#3035) Clorist33 2025-09-24 17:14:38 +08:00
  • 80524f5711 [CORE] concurrent partial prefills (#2372) Csrayz 2025-09-24 17:12:55 +08:00
  • 2d885869c5 [KVCache][Bugfix] Fix kv cache initialization error of attention layer (#3113) Mengqing Cao 2025-09-24 11:32:34 +08:00
  • 6aa4253798 [Refactor] [SP]The sequence parallelism characteristics in the MoE and Dense models are integrated into a single solution. (#3085) weijinqian0 2025-09-24 11:29:59 +08:00
  • e7618d9414 [2/N][Refactor][Qwen3-Next] remove redundant methods and patch methods in Qwen3NextGatedDeltaNet (#3082) Icey 2025-09-24 11:25:42 +08:00
  • eb205d9f35 [P/D][BugFix]Mooncake timeout release bug fix (#2899) baxingpiaochong 2025-09-24 11:22:46 +08:00
  • 6995a7bc5b [Disagg][Perf] Use NPU event sync instead of blocking tolist to avoid unintentional copy ops blocking across different NPU streams, improving disagg TTIT/TTFT (#2788) Song Zhixin 2025-09-24 11:21:58 +08:00
  • c4b976af1a [Model][VLM][Patch]Modify ascend affinity _merge_multimodal_embeddings (#3071) Peipei 2025-09-24 10:25:28 +08:00
  • b1380f3b87 [Doc] modify the version compatibility between vllm and vllm-ascend (#3130) weiguihua2 2025-09-23 20:31:49 +08:00
  • d01fd1d1c3 [misc][torchair] fix bugs around deepseek mtp, enable_shared_expert_dp and use_cached_kv_cache_bytes (#3074) linfeng-yuan 2025-09-23 14:52:42 +08:00
  • 0f3939e5a9 [Feature]cpu offload connector (#1659) lidenghui1110 2025-09-23 14:25:05 +08:00
  • 96eb1ed408 [CI] Bump vLLM commit hash to 0923(f225ea7) (#3110) Li Wang 2025-09-23 14:13:25 +08:00
  • d586255678 fix wrong --num-gpus parameter requirements, and avoid ambiguity (#3116) Jianwei Mao 2025-09-23 11:58:44 +08:00
  • 39a85c49fa [Refactor] Rename cudagraph_support to aclgraph_support (#3104) Yizhou 2025-09-23 11:30:31 +08:00
  • d2399ab97b Fix VLLM_ASCEND_LLMDD_RPC_PORT renaming (#3108) wyu0-0 2025-09-23 10:33:04 +08:00
  • 29c173ab48 FlashLB algorithm (#3042) Mercykid-bash 2025-09-23 10:27:14 +08:00
  • 8dd53c8860 [Bugfix][PD] Auto-clear producer KV cache if no pull notification (#2174) hucong 2025-09-23 09:53:34 +08:00
  • 704467cd9a [Bugfix][LoRA] Fix bug introduced by upstream vllm#25249 (#3095) yupeng 2025-09-22 22:26:01 +08:00
  • 3fa7cf6345 [Refactor][Graph] Move graph parameter logic to acl_graph module (#3101) Yizhou 2025-09-22 22:23:14 +08:00
  • 02f89d166f [CI] Update vllm version to 20250922(5aeb925) (#3091) Li Wang 2025-09-22 22:18:13 +08:00
  • 1c9f0fe26f Fix of DeepSeek Error in KV Pool Mixed Deployment Scenario (#3087) fems14 2025-09-22 20:36:41 +08:00
  • 37a0715eda [Refactor] Adjustments to moe_comm_method selection process (#3001) weichen 2025-09-22 19:12:58 +08:00
  • bb1f0d5a62 [main] remove the redundant log prints in register_custom_ops.py (#3094) rjg-lyh 2025-09-22 17:17:31 +08:00
  • 338231acaf [Feat][Graph] Support FULL_DECODE_ONLY mode for GQA/MHA models (#2128) Yizhou 2025-09-22 17:14:28 +08:00
  • f39bd309b6 [Hybrid KV] Follow up UniformTypeKVCacheSpecs (#3070) Mengqing Cao 2025-09-22 15:02:41 +08:00
  • f1f2c8f5e5 [Perf] Add new npu_fused_infer_attention_score op to improve performance in splitfuse cases and resolve long-seq mask problems (#2962) tianyitang 2025-09-22 14:56:14 +08:00
  • c90a6d3658 [Test] Update the format of the accuracy report (#3081) zhangxinyuehfad 2025-09-22 14:10:03 +08:00
  • 37a0b3f25e Bump actions/labeler from 5 to 6 (#3086) dependabot[bot] 2025-09-22 14:07:37 +08:00
  • ffdd1a36e2 [bugfix][torchair] fix wasted NPU memory buffer allocation for quantized deepseek with unquantized MTP layer (#3068) linfeng-yuan 2025-09-22 14:06:43 +08:00
  • 14b39d3c70 [1/N][Refactor][Qwen3-Next] remove redundant Qwen3NextSparseMoeBlock and Qwen3NextAttention (#3019) Icey 2025-09-22 11:24:08 +08:00
  • 88d24cce8b [CI] Enable main based lint check and light ci matrix (#3079) wangxiyuan 2025-09-22 10:37:53 +08:00
  • 693f547ccf Refactor ci to reuse base workflow and re-enable ut coverage (#3064) Yikun Jiang 2025-09-21 13:27:08 +08:00
  • b8b68b3dfe [CI] Upgrade vLLM to 20250920 (c60e613) and address config break (#3067) Yikun Jiang 2025-09-21 09:49:17 +08:00
  • 12bcbd02bb [CI] Upgrade vLLM to 20250919 (6d8246aa) and fix some broken issue (#2907) Li Wang 2025-09-20 17:37:57 +08:00
  • 53ecd89e8f [Bugfix] Remove VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE (#2969) Lucas Kabela 2025-09-19 17:22:30 -07:00
  • e26fe1caf1 [TEST] Speed up DS V2 accuracy test and turn up accuracy baseline (#3047) zhangxinyuehfad 2025-09-20 00:40:33 +08:00
  • a22b532d38 [Fixbug] Fix shape not match when sliding_window and dynamic batch_size (#2830) zhangxinyuehfad 2025-09-19 22:35:14 +08:00
  • cf549b976d [Test]Add unit test for compilation/acl_graph.py (#3039) zhanghw0354 2025-09-19 21:31:17 +08:00
  • 0942d9aaab [3/N][Refactor][Quantization]remove packed_modules_mapping from models (#3021) 22dimensions 2025-09-19 20:50:14 +08:00
  • 4ba56716f9 Increase doctest timeout to 300s and time print (#3041) Yikun Jiang 2025-09-19 20:26:00 +08:00
  • 8326f15ecf [CustomOp] Register AscendSharedFusedMoE custom op (#2980) Shanshan Shen 2025-09-19 19:05:01 +08:00
  • 05a700d370 [Bugfix] Fix async copy bug under single expert scenario (#3005) sdmyzlp 2025-09-19 14:05:36 +08:00
  • 2a87b4cecb [Bugfix] Fix specdecoding in chunkedprefill scenario (#3025) xuyexiong 2025-09-19 14:05:08 +08:00
  • 833cd1b698 [BugFix] Async scheduling and PP compatibility with DP (#2796) Song Zhixin 2025-09-19 11:29:50 +08:00
  • 0a526768f5 [Feature] Support moe multi-stream for aclgraph. (#2946) whx 2025-09-19 11:06:45 +08:00
  • 0c04bf1e36 [Fixbug] Fix accuracy for DeepSeek-V2-Lite (#3016) zhangxinyuehfad 2025-09-18 23:58:23 +08:00
  • 367edff5af [HybridKV] Fix prefill disaggregation kvcache addr alignment & use hybrid kv cache only when running qwen3_next (#3007) Mengqing Cao 2025-09-18 21:43:22 +08:00
  • acb46f303f Fix VocabParallelEmbedding UT (#2722) Icey 2025-09-18 19:54:01 +08:00
  • 01592515b8 [Bugfix] Fix sleep mode level 2 (#1376) Li Wang 2025-09-18 19:51:52 +08:00
  • f4e3d22432 Remove chunked_prefill_for_mla and fix ring_mla bug (#2781) LeeWenquan 2025-09-18 19:43:26 +08:00
  • 79a910ef47 [bugfix][torchair] fix multistream_moe problems in torchair graph mode (#2681) linfeng-yuan 2025-09-18 17:35:04 +08:00
  • 4267f5d55f [Doc] Add multi-node ray backend tutorial (#2376) Li Wang 2025-09-18 15:30:18 +08:00
  • af2a886814 refactor linear (#2867) realliujiaxu 2025-09-18 14:09:19 +08:00
  • a7f8ed38ed [Bugfix]:replace npu_incre_flash_attention with npu_fused_infer_atten… (#2901) panchao-hub 2025-09-18 14:06:08 +08:00
  • 6681dde902 [Feat][Graph] Support MTP for ACL Graph (#2932) xuyexiong 2025-09-18 14:05:33 +08:00
  • cef43b524e [Feat] A Connector that supports Mooncake store (#2913) Chao Lei 2025-09-18 14:04:45 +08:00
  • 723d460894 [Bugfix] fix kv nz accuracy bug (#2988) realliujiaxu 2025-09-17 21:10:25 +08:00
  • 8bcc0ccd57 [bugfix] fix shared expert dp with hybrid kvcache (#2964) linfeng-yuan 2025-09-17 20:01:47 +08:00
  • 1f6465c399 Add an option of enable frozen parameter (#2869) 1Fire4 2025-09-17 12:00:44 +08:00
  • 76844eec78 Dynamic Expert Load Balance with Zero-like-overhead (#2956) offline893 2025-09-17 10:36:43 +08:00
  • ae758dda05 [Bugfix] Fix mtp torchair in pd Disaggregation scenario (#2951) xuyexiong 2025-09-17 09:07:58 +08:00
  • 6b7117dbb7 [main] addrmsnorm + quant fusion optim in Dense Models (#2772) rjg-lyh 2025-09-16 22:31:38 +08:00
  • 88ca8a051c [Feat][Graph] Support DeepSeek with ACL Graph (#2707) yiz-liu 2025-09-16 17:50:17 +08:00
  • 3e60aa5483 Bump actions/setup-python from 5.4.0 to 6.0.0 (#2926) dependabot[bot] 2025-09-16 14:15:10 +08:00
  • 1c5900327b [refactor] refactor deepseek-related files (#2849) linfeng-yuan 2025-09-16 14:13:07 +08:00
  • 18ca7861f6 [Main] [Refactor] Enable MoECommMethod in Eager Mode (#2791) weichen 2025-09-16 11:06:00 +08:00
  • 0aba644633 Update max_tokens and prompt in qwen3 online doc (#2945) Yikun Jiang 2025-09-16 09:27:50 +08:00
  • 048bfd5553 [Release] Add release note for v0.10.2rc1 (#2921) wangxiyuan 2025-09-16 01:20:05 +08:00
  • c556038ef0 [New model] Qwen3-next support (#2917) wangxiyuan 2025-09-16 01:17:42 +08:00
  • b5ccef6115 [Doc] Add doc for Qwen3 Next (#2916) Yikun Jiang 2025-09-16 01:16:06 +08:00
  • aa3c4563ce fix all cards super_pod_id same on A3 & proxy support min_tokens (#2939) liziyu 2025-09-16 01:09:18 +08:00
  • 382c29f3e1 [BugFix] Fix world size bug in model_runner (#2915) wangxiyuan 2025-09-14 12:20:25 +08:00
  • c5a502fd2e main add ascend scheduler support multimodal (#2844) fan2956 2025-09-14 09:38:51 +08:00
  • 0747a6e68c Bump vLLM version to v0.10.2 (#2914) Yikun Jiang 2025-09-14 06:57:59 +08:00
  • f97a64ba7f Bump vLLM version to v0.10.2rc3 (#2911) Yikun Jiang 2025-09-13 19:15:48 +08:00
  • 8ece6956e7 Revert "Upgrade CANN version to 8.3.rc1.alpha001 (#2903)" (#2909) Yikun Jiang 2025-09-13 16:21:54 +08:00
  • 0a27705917 fix mooncake connector adxl hostname usage (#2824) zxr2333 2025-09-13 14:38:48 +08:00
  • d2250c80b5 Enable push trigger for image job (#2906) Yikun Jiang 2025-09-13 12:31:36 +08:00
  • 339fceb89c Upgrade CANN version to 8.3.rc1.alpha001 (#2903) Yikun Jiang 2025-09-13 12:10:21 +08:00
  • e57cca971c Fix the bugs about operator registration by PyTorch Dispatcher (#2786) Jiawei Li 2025-09-13 11:58:52 +08:00
  • 138e932630 Bump vLLM version to v0.10.2rc2 (#2902) Yikun Jiang 2025-09-13 11:39:48 +08:00
  • 585a494baa [Core] Disable the chunked prefill feature in Non-MLA LLMs (#2894) rjg-lyh 2025-09-12 23:17:09 +08:00
  • 756b8a1946 Revert "[Feat] Unquantized linear nz support (#2619)" (#2896) Yikun Jiang 2025-09-12 20:51:12 +08:00