Commit Graph

  • 350999c4ef [Bugfix]Fix eplb enable when using mtp float weights. (#4576) offline893 2025-12-05 21:15:32 +08:00
  • 3740b3edfc 【main】[Doc]add 2P1D instruction for single node (#4716) mazhixin000 2025-12-05 18:35:18 +08:00
  • 4b016b98a2 [CI] Fix unit test fault no space left (#4728) Li Wang 2025-12-05 17:21:30 +08:00
  • 41fbc5ebc9 [P/D][main] Clean connector history information (#4650) wangxiaoteng888 2025-12-05 16:22:23 +08:00
  • a336543977 [Bugfix] fix quant_apply_mlp w1_scale type error & fix getting num_local_expert (#4632) 欧派果奶我还要 2025-12-05 16:04:24 +08:00
  • a7f91079b8 [BugFix][Triton] Fix ub overflow bug of sample_recover_tokens_kernel (#4673) whx 2025-12-05 15:16:19 +08:00
  • 7f33838e6e Update comment doc (#4731) Chen Chen 2025-12-05 15:07:31 +08:00
  • b32ef53b3b [long_seq] remove long_seq env (#4660) LookAround0301 2025-12-05 10:31:49 +08:00
  • ea54388e19 Drop ascend scheduler (#4623) wangxiyuan 2025-12-05 09:03:45 +08:00
  • 00b4fb80de [Doc] Update vLLM version in doc (#4691) wangxiyuan 2025-12-05 08:59:41 +08:00
  • cd8e5be7c7 [Bugfix] Quick hot fix for nightly CI (#4727) Li Wang 2025-12-04 23:51:16 +08:00
  • ad0607f900 add dispatch_gmm_combine kernel (#3532) Chen Chen 2025-12-04 23:00:59 +08:00
  • 752a55473c [Misc] Upgrade vllm commit to 2025_12_04 (#4690) Li Wang 2025-12-04 22:31:45 +08:00
  • 283bc5c7ba [Nightly] Optimize nightly CI (#4509) Li Wang 2025-12-04 22:31:07 +08:00
  • fb15fec662 [MM][Patch] Remove patch for cos/sin cache (#4672) Shanshan Shen 2025-12-04 22:30:06 +08:00
  • c4a11a745a [refactor]support gatingtopk operator generalization (#4356) 1092626063 2025-12-04 20:10:13 +08:00
  • b3e1377a92 【fix】ops gatingtopk fix nightly ci error (#4340) 1092626063 2025-12-04 20:09:21 +08:00
  • da84eb2f40 Remove ascend scheduler ut (#4684) wangxiyuan 2025-12-04 14:10:28 +08:00
  • 178ca1607e Adopt inductor fusion and define quantization fusion pass (#4168) Icey 2025-12-04 10:29:48 +08:00
  • c4a71fc6d5 Remove cancel for main to main check (#4685) Yikun Jiang 2025-12-04 09:10:27 +08:00
  • 3f4c0ea0a0 upgrade vLLM to 0.12.0 tag (#4647) wangxiyuan 2025-12-03 23:43:05 +08:00
  • 26e8e58cea [Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#4176) amy-why-3459 2025-12-03 20:48:45 +08:00
  • 6ece6660ec fix custom ops env set error (#4675) wangxiyuan 2025-12-03 19:27:38 +08:00
  • a1c0667392 [Misc] Add cann custom ops to .gitignore (#4670) Shanshan Shen 2025-12-03 18:29:10 +08:00
  • 6ac5730640 [CI] Fix ut ci: no space on the device (#4662) Icey 2025-12-03 17:35:06 +08:00
  • 15dc01f050 [Fix] Fix FIA query and query_start_loc shape mismatch error (#4518) XiaoxinWang 2025-12-03 17:33:31 +08:00
  • 7271f0d536 [Feat] MTP support DeepSeekV3.2 (#4465) ZYang6263 2025-12-03 14:24:33 +08:00
  • 38bd95229f [Model] Add qwen3Next support in Main (#4596) LeeWenquan 2025-12-03 14:17:37 +08:00
  • 593a96056c 【EPLB】Eplb Redundant Experts Bugfix (#4232) LI SHENGYONG 2025-12-03 12:00:05 +08:00
  • 3f81c4bb25 fix typo (#4657) wangxiyuan 2025-12-03 11:56:47 +08:00
  • 9a73c22b1c [Doc] add release note for v0.11.0rc3 (#4646) wangxiyuan 2025-12-03 11:49:44 +08:00
  • 18b90b501d [kernel] add AscendC op: lightning_indexer and sparse_flash_attention (#4625) Song Mingyang 2025-12-03 09:53:10 +08:00
  • b6d63bbd52 [v0.11.0-dev][CI] Fix ngram lacking of input arg dummy_compute_logits error (#4648) Mengqing Cao 2025-12-03 09:22:07 +08:00
  • 865f1f7fc8 [Bugfix] Resolve the interface compatibility issue of get_input_embeddings in MM (#4638) Levi 2025-12-02 22:21:47 +08:00
  • 7f2673ea2d upgrade vLLM to main (#4608) wangxiyuan 2025-12-02 22:10:52 +08:00
  • 4588cdac02 [Bugfix] fix custom op GmmSwigluQuantWeightNzTensorList (#4593) Chenxi Qian 2025-12-02 22:02:04 +08:00
  • b84c9afbf5 【doc fix】doc fix: deepseekv3.1 (#4645) 1092626063 2025-12-02 21:49:13 +08:00
  • 1b5513aa91 [performance] Enhance performance after enabling min_p (#4529) FuNanyang 2025-12-02 20:35:51 +08:00
  • eabedf43aa [Doc] Refactor the DeepSeek-V3.1 tutorial. (#4399) 1092626063 2025-12-02 18:46:30 +08:00
  • 874097a1de clean up model module (#4611) wangxiyuan 2025-12-02 17:35:47 +08:00
  • 96b2cdf6d8 [Ops][Triton] Add a triton kernel supporting partial rope. (#4413) whx 2025-12-02 17:10:19 +08:00
  • 8907010815 [Doc] Add tutorial for Qwen3-Coder-30B-A3B (#4391) yeyifan 2025-12-02 16:03:37 +08:00
  • cb33b09179 [Doc]clean up ascend scheduler config from doc (#4612) wangxiyuan 2025-12-02 14:22:56 +08:00
  • 3b4cb23616 [Bugfix] fix qwen2.5-vl-72b shape ERROR during the _prepare_inputs phase under high concurrency. (#4553) Levi 2025-12-02 14:20:45 +08:00
  • bb1610dc25 add hyperlink (#4588) herizhen 2025-12-02 14:09:03 +08:00
  • 400af665e6 [CI] Drop ascend scheduler from test (#4613) wangxiyuan 2025-12-02 13:18:17 +08:00
  • 6360eb1dea Revert "[Bugfix] Fix Qwen2.5-Omni-7B accuarcy test (#4556)" (#4619) wangxiyuan 2025-12-02 13:15:47 +08:00
  • e18e3067a7 Bump actions/checkout from 4.3.1 to 6.0.0 (#4592) dependabot[bot] 2025-12-02 11:59:25 +08:00
  • 2fa3945112 [Bugfix]Fix eplb enable when using mtp float weights. (#4571) offline893 2025-12-02 09:20:49 +08:00
  • 71e9b379c8 [Bugfix] Fix Qwen2.5-Omni-7B accuarcy test (#4556) zhangxinyuehfad 2025-12-02 09:20:05 +08:00
  • b4bf01ead1 [Refactor] Remove redundant attention operator branches. (#4531) weijinqian0 2025-12-02 09:13:26 +08:00
  • 981a14f8d5 [CI]enable chunked prefill by default (#4569) wangxiyuan 2025-12-02 08:54:34 +08:00
  • 6b9a997076 [MM][Model] Remove Qwen3-VL modeling files (#4577) Shanshan Shen 2025-12-02 07:33:17 +08:00
  • a9c4b8604a [main][bugfix] bugfix for qwen3 moe quantization (#4599) Wang Kunpeng 2025-12-01 23:48:57 +08:00
  • 12ca99c94e [Bugfix] Remove ModelSlim-"M4 Quantization". (#4589) Slightwind 2025-12-01 23:45:02 +08:00
  • 8813832387 [Test] Add GLM-4.5 nightly test (#4225) zhangxinyuehfad 2025-12-01 22:31:56 +08:00
  • c097790370 [Doc] Fix DeepSeek-V3.2-Exp doc, add docker command. (#4479) zhangyiming 2025-12-01 22:29:21 +08:00
  • b6afec73e1 [Test] Add accuracy nightly test for new models (#4262) zhangxinyuehfad 2025-12-01 22:28:46 +08:00
  • 52abd47f8c [Bugfix][SHM] Use writer lock by default and remove redundant env (#4117) Zetong Li 2025-12-01 22:27:01 +08:00
  • 8e7f5cff6d fix qwenvl pd smoke test error (#4597) shaopeng-666 2025-12-01 22:24:59 +08:00
  • 143e1f46d0 [Feat] shared expert dp for deepseek_mtp (#3811) MengLong Chen 2025-12-01 20:44:11 +08:00
  • 27b09ca9b9 [CI] drop ascend scheduler test (#4582) wangxiyuan 2025-12-01 20:33:50 +08:00
  • 203b4e6777 [Bug_fix] fix torchair o_proj forward parameter (#4166) zzhxxx 2025-12-01 19:57:01 +08:00
  • aa56a0f4b7 [Bugfix] PCP adaptation for VLLM v0.11.2 modifications (#4604) Slightwind 2025-12-01 19:20:32 +08:00
  • 0d14f635b4 upgrade torch npu version (#4433) wangxiyuan 2025-12-01 19:01:55 +08:00
  • f1f6370ed9 [Feature] Integrate Suffix Spec Decoding (#4045) fluctlux 2025-12-01 18:41:42 +08:00
  • 3023e15e23 add _cann_ops_custom gitignore (#4605) zzzzwwjj 2025-12-01 18:37:32 +08:00
  • f4871c6ab9 [Kernel] add triton kernels for sampling (#4550) MidnightSun 2025-12-01 17:41:58 +08:00
  • 2b82320b66 [Bugfix] Fix bug with establishing the flashcomm2 and pp communication domains. (#4458) zzhxxx 2025-12-01 15:56:22 +08:00
  • 8c65009d62 Bump actions/setup-python from 6.0.0 to 6.1.0 (#4591) dependabot[bot] 2025-12-01 14:32:08 +08:00
  • 76d0ba4342 [Image][Build] Cherry pick #4062 from main (#4506) Li Wang 2025-12-01 11:39:40 +08:00
  • 2b4f7a5016 [cherry-pick pr-4254] bugfix for mtp>1 when lm_head_tp>1 (#4360) zouyida2052 2025-12-01 11:11:15 +08:00
  • 51c8f60eb0 [Bugfix] Resolve MTP > 1 issue when lm head tp > 1 (#4254) Jade Zheng 2025-12-01 10:22:36 +08:00
  • e8e20c0bbf [BugFix] Fix Qwen2.5_Omni vision customized op attr err (#4568) Ting FU 2025-12-01 09:18:55 +08:00
  • c68ddc11ce [OPS] add bmm_transpose ops (#3990) Wang Yixuan 2025-12-01 09:09:51 +08:00
  • bc67696a02 [EPLB][Ops] Integrate grouped_matmul_swiglu_quant_weight_nz_tensor_list operator into dynamic EPLB (#4216) 欧派果奶我还要 2025-11-30 22:52:05 +08:00
  • 18eefc23c3 [feature] Support W8A8 PD-Mix Quantization (#4235) Slightwind 2025-11-30 11:57:26 +08:00
  • ff7061317f [Bugfix] Fix kvpool precision synchronization (#4574) Chao Lei 2025-11-30 09:39:07 +08:00
  • 2b3bfe432e [bugfix] Repair the problem of moe model accuracy caused by version upgrade. (#4562) weijinqian0 2025-11-30 06:12:39 +08:00
  • c84efeae25 [CI] Skip test_ngram_correctness as the oom issue block CI (#4578) Mengqing Cao 2025-11-30 01:34:50 +08:00
  • 517fd9272d Revert "drop ascend scheduler" (#4580) Mengqing Cao 2025-11-29 22:20:48 +08:00
  • 4dbe4fd123 [feature]Pooling Features and PCP Adaptation (#4143) DreamerLeader 2025-11-29 22:07:45 +08:00
  • 1eb5295a1b remove qwen3-next model file (#4573) wangxiyuan 2025-11-29 18:37:26 +08:00
  • a3041cd78c [Bugfix] fix dp parallel + tp > 1 offline inference port conflict (#4539) Nengjun Ma 2025-11-29 18:37:11 +08:00
  • 1874265074 Move mla to ops module (#4575) wangxiyuan 2025-11-29 18:36:55 +08:00
  • 2a19215e5f [MM][Model] Remove Qwen2-VL modeling files (#4534) Shanshan Shen 2025-11-29 18:07:01 +08:00
  • 6664a4e5ce improve soc version (#4522) wangxiyuan 2025-11-29 17:42:16 +08:00
  • f10acddb78 drop ascend scheduler (#4498) wangxiyuan 2025-11-29 16:18:34 +08:00
  • 53a52d6614 [P/D] [bugfix] add get_kv_connector_handshake_metadata func for 0.11.2 (#4567) liziyu 2025-11-29 16:09:45 +08:00
  • cd9f5c0611 [bugfix] dep ineffective (#4416) LI SHENGYONG 2025-11-29 15:19:11 +08:00
  • 0151022ab8 [bugfix] dep ineffective (#4417) LI SHENGYONG 2025-11-29 15:18:29 +08:00
  • 8ebbf13c1a Update triton package name (#4563) wangxiyuan 2025-11-29 15:00:40 +08:00
  • b747c95cfa [Doc] Add single NPU tutorial for Qwen2.5-Omni-7B (#4446) Ting FU 2025-11-29 11:57:29 +08:00
  • 9af34755ff [Bugfix] Fix model run _npu_flash_attention hang issue (#4410) Ting FU 2025-11-29 09:20:22 +08:00
  • 048d350f9e update triton package url (#4552) wangxiyuan 2025-11-28 21:00:49 +08:00
  • 1c4a0468ee 【OPS】qwen3-next support triton chunk_gated_delta_rule ops (#4070) shiyuan680 2025-11-28 20:55:43 +08:00
  • 5447a039b9 [Feature][main]reconstruction kvpool connector to ascend connector (#4438) fems14 2025-11-28 18:08:37 +08:00
  • 554f16ae1f [Kernel] add custom op GmmSwigluQuantWeightNzTensorList (#3804) Chenxi Qian 2025-11-28 18:06:39 +08:00
  • 71acc8ddeb For nz unset in bf16&fp16 (#4495) henryxuxu0716 2025-11-28 17:32:25 +08:00
  • 3199fe8350 [Doc]Delete equals sign (#4537) herizhen 2025-11-28 17:09:26 +08:00