Commit Graph

  • eff4b5791c Recover offline_inference_npu.py to make doctest passed (#1756) Yikun Jiang 2025-07-12 12:36:35 +08:00
  • 8b3a483269 Add recommend version and refresh readme / contribution.md (#1757) Yikun Jiang 2025-07-12 12:35:40 +08:00
  • 3c404de1b1 [Release]Update release note (#1753) wangxiyuan 2025-07-11 17:58:26 +08:00
  • b5b7e0ecc7 [Doc] Add qwen3 embedding 8b guide (#1734) wangxiyuan 2025-07-11 17:40:17 +08:00
  • 9c560b009a [Release] Add 0.9.2rc1 release note (#1725) wangxiyuan 2025-07-11 17:36:05 +08:00
  • 1b4a2f3817 [CI] Add accuracy ci for DP and EP and TP and ETP (#1140) zhangxinyuehfad 2025-07-11 17:25:17 +08:00
  • d13fb0766e [Perf] add patch to optimize apply_topk_topp (#1732) Pr0Wh1teGivee 2025-07-11 15:32:02 +08:00
  • aa4240c67f Support pipeline parallel in V1 Engine (#1700) weiguihua2 2025-07-11 15:30:51 +08:00
  • 1cd27da5fb [Test] Remove VLLM_USE_V1 in accuracy test (#1739) zhangxinyuehfad 2025-07-11 15:29:11 +08:00
  • ee40d3d850 use npu_moe_gating_top_k_softmax (#1355) ttanzhiqiang 2025-07-11 08:55:06 +08:00
  • 9d16c9982e rm router logits Improve TTOP 3ms (#1407) ttanzhiqiang 2025-07-11 08:53:17 +08:00
  • 0fc9b56d40 [Perf] Improve MLA multistream performance (#1353) ApsarasX 2025-07-11 08:51:17 +08:00
  • cc210f46e6 [AscendScheduler][Bugfix] Remove num_draft_tokens while allocating slots (#1718) Mengqing Cao 2025-07-10 18:47:45 +08:00
  • 011fd73a48 [CI] Make CI tracker more clear (#1720) wangxiyuan 2025-07-10 16:03:23 +08:00
  • 3d1e6a5929 [Doc] Update user doc index (#1581) wangxiyuan 2025-07-10 14:26:59 +08:00
  • c7446438a9 [1/N][CI] Move linting system to pre-commits hooks (#1256) Li Wang 2025-07-10 14:17:15 +08:00
  • 643e6f5486 [Bugfix] Fix accuracy problem caused by mask pollution (#1678) ApsarasX 2025-07-10 14:06:49 +08:00
  • 60519c71bd shared_experts+router_experts merge all_reduce(Improve TTOP 5ms) (#1395) ttanzhiqiang 2025-07-10 12:07:05 +08:00
  • 997f156a51 Use ci_vllm_version when recording vLLM commit (#1689) Yikun Jiang 2025-07-10 11:07:27 +08:00
  • 89c1a0f006 [Bugfix] Fix memory-leak caused by dist._functional_collectives.reduce_scatter_tensor (#1380) ApsarasX 2025-07-10 10:57:24 +08:00
  • b1c66b211f [CI] Fix lint in CI (#1712) Mengqing Cao 2025-07-10 10:47:18 +08:00
  • 0c4aa2b4f1 [Doc] Add multi node data parallel doc (#1685) Li Wang 2025-07-10 09:36:37 +08:00
  • b4b19ea588 [Doc] Add multi-npu qwen3-MoE-32B Tutorials (#1419) leo-pony 2025-07-10 09:06:51 +08:00
  • 3ef45d0cc2 feat: Improve the offline_inference npu v0/v1 scripts (#1669) xleoken 2025-07-09 17:03:53 +08:00
  • 6af35f60cc [Bugfix][CI] Remove V0 Spec Decode CI (#1656) Shanshan Shen 2025-07-09 15:53:58 +08:00
  • b979ee353d [Misc] Code clean up (#1679) wangxiyuan 2025-07-09 14:33:40 +08:00
  • 392fd7239b [Misc] Add attention mask (#1673) wangxiyuan 2025-07-09 09:12:03 +08:00
  • cc1588be50 [Misc] Code clean up (#1674) wangxiyuan 2025-07-09 08:54:12 +08:00
  • 830332ebfc Clean up v0.9.1 code (#1672) wangxiyuan 2025-07-09 08:52:24 +08:00
  • 0d4bc03946 Fix wheel glibc version incompatibility (#1582) Icey 2025-07-08 18:46:02 +08:00
  • e4e9ea02ab Upgrade vLLM version to v0.9.2 (#1652) Yikun Jiang 2025-07-08 14:18:17 +08:00
  • 71de52d3a9 feat: add kv cache memory cache and skip dynamo guard (#1549) NeverRaR 2025-07-07 22:37:14 +08:00
  • df84cceca8 perf: use multicast to avoid padding decode request to prefill size (#1555) NeverRaR 2025-07-07 22:36:03 +08:00
  • f08c4f15a2 fix spell error (#1654) wm901115nwpu 2025-07-07 20:24:42 +08:00
  • f2a20393a2 [CI] Fix mypy check in CI (#1655) Mengqing Cao 2025-07-07 20:19:16 +08:00
  • 18495f44b2 [BugFix] Fix max_num_tokens_across_dp calculation bugs in attention_v1_torchair (#1636) Angazenn 2025-07-07 20:03:02 +08:00
  • 9c886d0a1f [EPLB] support deepseek eplb strategy (#1196) Zheng Wengang 2025-07-07 17:22:08 +08:00
  • 4e29c5a808 Add ut for test_pooling_model_runner.py (#1640) wangyanhui-cmss 2025-07-07 17:12:11 +08:00
  • 493768eb30 Record vLLM commit in PR description (#1623) Yikun Jiang 2025-07-07 10:20:38 +08:00
  • 7efa4e92fe [CI] Fix oom in chunk prefill (#1622) Mengqing Cao 2025-07-07 10:14:40 +08:00
  • c58accc15e [Bugfix] Support Qwen3-MOE on aclgraph mode (#1381) ApsarasX 2025-07-06 15:29:36 +08:00
  • 14373f65d7 [Test] Remove V0 accuracy test and enable MoE and VL test on V1 (#1574) zhangxinyuehfad 2025-07-06 11:10:19 +08:00
  • 0c1d239df4 Add unit test local cpu guide and enable base testcase (#1566) Yikun Jiang 2025-07-06 10:42:27 +08:00
  • eb390545ec [Performance] Disable JIT and nd2nz to improve performance for Atlas 300I series (#1591) Vincent Yuan 2025-07-05 16:29:21 +08:00
  • dd22ac38b2 [CI/UT][Refactor] move e2e spec decode and deepseek acc test to per pr (#1136) Mengqing Cao 2025-07-04 18:05:45 +08:00
  • 343955c7ac [CI] Follow vLLM FusedMoEParallelConfig interface change and clean up unused config (#1625) wangxiyuan 2025-07-04 17:54:33 +08:00
  • 4e910186de [CI/UT] Unify model usage via ModelScope in CI (#1207) zhangxinyuehfad 2025-07-04 10:52:17 +08:00
  • a5f33590d3 [CORE]initial support for torchair with non-mla backend (#1506) Angazenn 2025-07-03 22:21:42 +08:00
  • 9fbd8017c0 [Quantization]300I Duo support w8a8 quantization (#1560) Angazenn 2025-07-03 22:12:46 +08:00
  • 6d7cb14a24 Fix lint in examples/offline_embed.py (#1618) Yikun Jiang 2025-07-03 21:40:29 +08:00
  • e511ddd67d [Bug] Fix wrong ModelScope env set order (#1611) xleoken 2025-07-03 18:50:53 +08:00
  • a45dfde283 [CI] Fix FusedMoEConfig and input batch failure to recover CI (#1602) wangxiyuan 2025-07-03 18:36:17 +08:00
  • d96da1f00c [DOC] Fix word spelling (#1595) yupeng 2025-07-02 21:42:39 +08:00
  • 9fb3d558e5 [Test]Add unit test for platform.py (#1476) zhanghw0354 2025-07-02 17:46:06 +08:00
  • 30bf7014d0 [Bugfix] Add func swap_states to fix MLA attention (#1580) Li Wang 2025-07-02 17:42:53 +08:00
  • 59237ea788 [CI/UT] Add test for chunk prefill and prefix cache on v1/AscendScheduler (#1505) Mengqing Cao 2025-07-02 16:57:03 +08:00
  • 6b80c5acba Fix W8A8 fused moe bug (#1529) Zhu Yi Lin 2025-07-02 16:40:51 +08:00
  • 7fc1a98489 add ut for kv transfer module (#1531) Agonixiaoxiao 2025-07-02 16:14:52 +08:00
  • aa5fa07478 Only enable single version for wheel pr build (#1571) Yikun Jiang 2025-07-02 14:50:34 +08:00
  • c3c8c9317c [DOC] add LoRA user guide (#1265) yupeng 2025-07-02 14:41:31 +08:00
  • f39365d2ea [Benchmark] Fix error msg upload in performance benchmark (#1559) Li Wang 2025-07-02 14:06:08 +08:00
  • 641a4e6092 [CI] Cache sampled token ids in model runner to fix CI error (#1573) wangxiyuan 2025-07-02 12:11:14 +08:00
  • 0e43813120 [ModelRunner] Use shared CachedRequestData cross request to fix ci (#1546) Pleaplusone 2025-07-02 06:05:21 +08:00
  • 6db7dc2c85 [Benchmark] Refactor perf script to use benchmark cli (#1524) Li Wang 2025-06-30 23:42:04 +08:00
  • 53ec583bbb [Docs] Update Atlas 300I series doc and fix CI lint (#1537) leo-pony 2025-06-30 23:34:00 +08:00
  • a054f0f4ca [CI] change to new ds model (#1513) wangxiyuan 2025-06-30 19:02:29 +08:00
  • 8013634e9c [Structured Output] Remove redundant check for grammar_bitmask (#1459) Shanshan Shen 2025-06-30 17:39:19 +08:00
  • ba577dfc52 [Doc] Add Structured Output guide (#1499) Shanshan Shen 2025-06-30 17:21:44 +08:00
  • f286265791 [BugFix] Address PrefillCacheHit state to fix prefix cache accuracy bug (#1498) whx 2025-06-30 16:51:20 +08:00
  • 5f8241c25c [V1][ModelRunner] Support pooling model for v1 engine (#1359) Li Wang 2025-06-30 16:31:12 +08:00
  • 790c810bf7 Bump actions/github-script from 6 to 7 (#1519) dependabot[bot] 2025-06-30 16:04:41 +08:00
  • e4df0a4395 Add Pangu MoE Pro for 300I series docs (#1516) Yikun Jiang 2025-06-30 13:37:22 +08:00
  • cad4c693c6 Add Pangu MoE Pro docs (#1512) Yikun Jiang 2025-06-30 12:15:33 +08:00
  • 75d05ee200 [Core] Fix block table shape to make Prefix cache work with Ascend scheduler (#1446) yiz-liu 2025-06-30 11:25:19 +08:00
  • b308a7a258 support pangumoe w8a8c8 and docs (#1477) Zhu Yi Lin 2025-06-28 18:51:07 +08:00
  • c59d69d9e6 [PERF]support MERRouter (#1421) Angazenn 2025-06-28 16:14:49 +08:00
  • 8fa188111d [PERF]support H2P communication optimization for PanguProMoe (#1463) Angazenn 2025-06-28 16:10:27 +08:00
  • 5c53cbaf2a [BugFix]Fix bugs when initializing communication groups with dp on 300I Duo (#1478) Angazenn 2025-06-28 16:07:52 +08:00
  • 2cf9c4c3a2 [CI/Build] Fix version conflict on transformers (#1490) Mengqing Cao 2025-06-28 15:11:04 +08:00
  • 5f4391652f [PromptLogprobs][V1] Support prompt logprobs to fix ceval accuracy in V1 (#1483) Mengqing Cao 2025-06-28 09:38:52 +08:00
  • 99e685532d [Doc] Add Qwen2.5-VL eager mode doc (#1394) Shanshan Shen 2025-06-28 09:08:51 +08:00
  • d59e7fa095 [CI] Pin transformers<4.53.0 and fix EPLB load_weights to make CI passed (#1482) Mengqing Cao 2025-06-28 00:12:43 +08:00
  • 3687676fa7 [Doc] Add guidance on how to implement and register new models (#1426) Shanshan Shen 2025-06-27 16:46:49 +08:00
  • 5571fb7118 [Misc] Add release checklist issue template (#1447) wangxiyuan 2025-06-27 09:15:36 +08:00
  • 5968dff4e0 [Build] Add build info (#1386) wangxiyuan 2025-06-27 09:14:43 +08:00
  • c563a08f0a [CI] Fix nightly benchmark (#1453) Li Wang 2025-06-26 19:39:18 +08:00
  • 192dbbcc6e Optimize Patch developer guide (#1452) Zesheng Zong 2025-06-26 19:10:16 +08:00
  • e5eea64b66 [CI/UT] Add ut for parallel_state.py (#1460) wangyanhui-cmss 2025-06-26 19:03:27 +08:00
  • 4e2daf5ab7 [Doc] Add qwen2-audio eager mode tutorial (#1371) Shanshan Shen 2025-06-26 16:56:05 +08:00
  • 1025344912 Doc Enhancement: Single NPU(Qwen3-8B) aclgraph mode + eager mode (#1374) leo-pony 2025-06-26 16:52:54 +08:00
  • 53c2d58ae1 Handle with_prefill_across_dp for multistream mla (#1322) sdmyzlp 2025-06-26 09:32:07 +08:00
  • 2690697caa [Bugfix] Reset all unused positions to prevent out-of-bounds in GatherV3 (#1416) yiz-liu 2025-06-26 09:27:43 +08:00
  • 06ccce1ddf [FOLLOWUP] fix name and format in accuracy test (#1288) (#1435) zhangxinyuehfad 2025-06-26 00:26:54 +08:00
  • 2fda60464c [Perf] Use fused ops npu_top_k_top_p (#1308) Pr0Wh1teGivee 2025-06-25 20:59:06 +08:00
  • e7efc7e7e7 [BugFix] Remove not using patch_eagle.py for CI. (#1385) yuancaoyaoHW 2025-06-25 20:36:05 +08:00
  • 941269a6c5 adjusting the communication method in graph mode (#1194) sharonyunyun 2025-06-25 19:56:49 +08:00
  • 205cb85a1e [Doc] Fix doc typo (#1424) wangxiyuan 2025-06-25 19:28:26 +08:00
  • ca884ef86d [Misc] Clean up useless code for LLM initialize (#1373) wangxiyuan 2025-06-25 16:20:14 +08:00
  • 0060886a37 [CI]Update accuracy report test (#1288) zhangxinyuehfad 2025-06-25 14:10:34 +08:00
  • 15df8be937 [Doc] Add sleep mode doc (#1295) Li Wang 2025-06-25 14:07:14 +08:00