Commit Graph

  • dd22ac38b2 [CI/UT][Refactor] move e2e spec decode and deepseek acc test to per pr (#1136) Mengqing Cao 2025-07-04 18:05:45 +08:00
  • 343955c7ac [CI] Follow vLLM FusedMoEParallelConfig interface change and clean up unused config (#1625) wangxiyuan 2025-07-04 17:54:33 +08:00
  • 4e910186de [CI/UT] Unify model usage via ModelScope in CI (#1207) zhangxinyuehfad 2025-07-04 10:52:17 +08:00
  • a5f33590d3 [CORE]initial support for torchair with non-mla backend (#1506) Angazenn 2025-07-03 22:21:42 +08:00
  • 9fbd8017c0 [Quantization]300I Duo support w8a8 quantization (#1560) Angazenn 2025-07-03 22:12:46 +08:00
  • 6d7cb14a24 Fix lint in examples/offline_embed.py (#1618) Yikun Jiang 2025-07-03 21:40:29 +08:00
  • e511ddd67d [Bug] Fix wrong modescope env set order (#1611) xleoken 2025-07-03 18:50:53 +08:00
  • a45dfde283 [CI] Fix FusedMoEConfig and input batch failure to recover CI (#1602) wangxiyuan 2025-07-03 18:36:17 +08:00
  • d96da1f00c [DOC] Fix word spelling (#1595) yupeng 2025-07-02 21:42:39 +08:00
  • 9fb3d558e5 [Test]Add unit test for platform.py (#1476) zhanghw0354 2025-07-02 17:46:06 +08:00
  • 30bf7014d0 [Bugfix] Add func swap_states to fix MLA attention (#1580) Li Wang 2025-07-02 17:42:53 +08:00
  • 59237ea788 [CI/UT] Add test for chunk prefill and prefix cache on v1/AscendScheduler (#1505) Mengqing Cao 2025-07-02 16:57:03 +08:00
  • 6b80c5acba Fix W8A8 fused moe bug (#1529) Zhu Yi Lin 2025-07-02 16:40:51 +08:00
  • 7fc1a98489 add ut for kv tansfer module (#1531) Agonixiaoxiao 2025-07-02 16:14:52 +08:00
  • aa5fa07478 Only enable single version for wheel pr build (#1571) Yikun Jiang 2025-07-02 14:50:34 +08:00
  • c3c8c9317c [DOC] add LoRA user guide (#1265) yupeng 2025-07-02 14:41:31 +08:00
  • f39365d2ea [Benchmark] Fix error msg upload in performance benchmark (#1559) Li Wang 2025-07-02 14:06:08 +08:00
  • 641a4e6092 [CI] Cache sampled token ids in model runner to fix CI error (#1573) wangxiyuan 2025-07-02 12:11:14 +08:00
  • 0e43813120 [ModelRunner] Use shared CachedRequestData cross request to fix ci (#1546) Pleaplusone 2025-07-02 06:05:21 +08:00
  • 6db7dc2c85 [Benchmark] Refactor perf script to use benchmark cli (#1524) Li Wang 2025-06-30 23:42:04 +08:00
  • 53ec583bbb [Docs] Update Altlas 300I series doc and fix CI lint (#1537) leo-pony 2025-06-30 23:34:00 +08:00
  • a054f0f4ca [CI] change to new ds model (#1513) wangxiyuan 2025-06-30 19:02:29 +08:00
  • 8013634e9c [Structured Output] Remove redundant check for grammar_bitmask (#1459) Shanshan Shen 2025-06-30 17:39:19 +08:00
  • ba577dfc52 [Doc] Add Structured Output guide (#1499) Shanshan Shen 2025-06-30 17:21:44 +08:00
  • f286265791 [BugFix] Address PrefillCacheHit state to fix prefix cache accuracy bug (#1498) whx 2025-06-30 16:51:20 +08:00
  • 5f8241c25c [V1][ModelRunner] Support pooling model for v1 engine (#1359) Li Wang 2025-06-30 16:31:12 +08:00
  • 790c810bf7 Bump actions/github-script from 6 to 7 (#1519) dependabot[bot] 2025-06-30 16:04:41 +08:00
  • e4df0a4395 Add Pangu MoE Pro for 300I series docs (#1516) Yikun Jiang 2025-06-30 13:37:22 +08:00
  • cad4c693c6 Add Pangu MoE Pro docs (#1512) Yikun Jiang 2025-06-30 12:15:33 +08:00
  • 75d05ee200 [Core] Fix block table shape to make Prefix cache work with Ascend scheduler (#1446) yiz-liu 2025-06-30 11:25:19 +08:00
  • b308a7a258 support pangumoe w8a8c8 and docs (#1477) Zhu Yi Lin 2025-06-28 18:51:07 +08:00
  • c59d69d9e6 [PERF]support MERRouter (#1421) Angazenn 2025-06-28 16:14:49 +08:00
  • 8fa188111d [PERF]support H2P communication optimization for PanguProMoe (#1463) Angazenn 2025-06-28 16:10:27 +08:00
  • 5c53cbaf2a [BugFix]Fix bugs when initializing communication groups with dp on 300I Duo (#1478) Angazenn 2025-06-28 16:07:52 +08:00
  • 2cf9c4c3a2 [CI/Build] Fix version conflict on transformers (#1490) Mengqing Cao 2025-06-28 15:11:04 +08:00
  • 5f4391652f [PromptLogprobs][V1] Support prompt logprobs to fix ceval accuracy in V1 (#1483) Mengqing Cao 2025-06-28 09:38:52 +08:00
  • 99e685532d [Doc] Add Qwen2.5-VL eager mode doc (#1394) Shanshan Shen 2025-06-28 09:08:51 +08:00
  • d59e7fa095 [CI] Pin transformers<4.53.0 and fix EPLB load_weights to make CI passed (#1482) Mengqing Cao 2025-06-28 00:12:43 +08:00
  • 3687676fa7 [Doc] Add guidance on how to implement and register new models (#1426) Shanshan Shen 2025-06-27 16:46:49 +08:00
  • 5571fb7118 [Misc] Add release checklist issue template (#1447) wangxiyuan 2025-06-27 09:15:36 +08:00
  • 5968dff4e0 [Build] Add build info (#1386) wangxiyuan 2025-06-27 09:14:43 +08:00
  • c563a08f0a [CI] Fix nightly benchmark (#1453) Li Wang 2025-06-26 19:39:18 +08:00
  • 192dbbcc6e Optimize Patch developer guide (#1452) Zesheng Zong 2025-06-26 19:10:16 +08:00
  • e5eea64b66 [CI/UT] Add ut for parallel_state.py (#1460) wangyanhui-cmss 2025-06-26 19:03:27 +08:00
  • 4e2daf5ab7 [Doc] Add qwen2-audio eager mode tutorial (#1371) Shanshan Shen 2025-06-26 16:56:05 +08:00
  • 1025344912 Doc Enhancement: Single NPU(Qwen3-8B) aclgraph mode + eager mode (#1374) leo-pony 2025-06-26 16:52:54 +08:00
  • 53c2d58ae1 Handle with_prefill_across_dp for multistream mla (#1322) sdmyzlp 2025-06-26 09:32:07 +08:00
  • 2690697caa [Bugfix] Reset all unused positions to prevent out-of-bounds in GatherV3 (#1416) yiz-liu 2025-06-26 09:27:43 +08:00
  • 06ccce1ddf [FOLLOWUP] fix name and format in accuracy test (#1288) (#1435) zhangxinyuehfad 2025-06-26 00:26:54 +08:00
  • 2fda60464c [Perf] Use fused ops npu_top_k_top_p (#1308) Pr0Wh1teGivee 2025-06-25 20:59:06 +08:00
  • e7efc7e7e7 [BugFix] Remove not using patch_eagle.py for CI. (#1385) yuancaoyaoHW 2025-06-25 20:36:05 +08:00
  • 941269a6c5 adjusting the communication method in graph mode (#1194) sharonyunyun 2025-06-25 19:56:49 +08:00
  • 205cb85a1e [Doc] Fix doc typo (#1424) wangxiyuan 2025-06-25 19:28:26 +08:00
  • ca884ef86d [Misc] Clean up uesless code for LLM initialize (#1373) wangxiyuan 2025-06-25 16:20:14 +08:00
  • 0060886a37 [CI]Update accuracy report test (#1288) zhangxinyuehfad 2025-06-25 14:10:34 +08:00
  • 15df8be937 [Doc] Add sleep mode doc (#1295) Li Wang 2025-06-25 14:07:14 +08:00
  • e4e0b7af05 [Doc] Add patch doc (#1414) wangxiyuan 2025-06-25 12:00:45 +08:00
  • 52317f92cb [DP] Tiny fix of dp and update example (#1273) Mengqing Cao 2025-06-25 11:03:04 +08:00
  • c1c5d56255 [Doc] Update FAQ and add test guidance (#1360) Mengqing Cao 2025-06-25 09:59:23 +08:00
  • 5f5800ba42 [Bugfix] Sync MRotaryEmbedding interface change to recover CI (#1399) Li Wang 2025-06-24 22:56:39 +08:00
  • 6ed3f00427 [Doc] remove environment variable VLLM_ENABLE_MC2 (#1406) liziyu 2025-06-24 21:18:10 +08:00
  • 20767a043c [CI/UT] Fix disaggregated prefill ci (#1313) Mengqing Cao 2025-06-24 17:11:00 +08:00
  • 9cbce423ce [MISC] Remove useless patch (#1366) wangxiyuan 2025-06-24 10:05:59 +08:00
  • 5177bef87a support fused_moe_allgather_ep (#1335) lyj-jjj 2025-06-23 22:03:38 +08:00
  • 917c6b71af [TEST][DOC] Fix doctest and add system package installation (#1375) Yikun Jiang 2025-06-23 20:50:33 +08:00
  • 08cfc7cb4b Modify installation.md for adding pip extra index of torch-npu (#1272) Icey 2025-06-23 15:37:50 +08:00
  • e1123172d1 [Doc] Add reinstall instructions doc (#1303) weiguihua2 2025-06-23 14:06:27 +08:00
  • 15592c0d48 [bugfix] fix accuracy prolem for deepseek V3/R1 models with torchair graph in long sequence predictions (#1331) linfeng-yuan 2025-06-23 09:52:27 +08:00
  • f04c6763d8 [Bugfix] fix env variable in dbo (#1284) zxdukki 2025-06-23 09:07:57 +08:00
  • 21fb68a03a [CI] Update guided decoding ut (#1312) Shanshan Shen 2025-06-23 09:06:20 +08:00
  • 339d6894f6 [CI/UT][bugfix] fix v0 spec decode (#1321) wemaster 2025-06-23 09:05:13 +08:00
  • 7e6efbf2a9 update torch-npu to 2.5.1.post1.dev20250619 (#1347) Pleaplusone 2025-06-23 09:02:09 +08:00
  • 4447e53d7a [Doc] Change not to no in faqs.md (#1357) xleoken 2025-06-23 09:01:00 +08:00
  • a95afc011e [CI] Enable merge trigger unit test and accuracy test schedule job (#1345) Yikun Jiang 2025-06-22 17:21:57 +08:00
  • 2e5f312530 Cleanup ununsed doc (#1352) Yikun Jiang 2025-06-22 15:05:30 +08:00
  • c30ddb8331 Bump v0.9.1rc1 release (#1349) Yikun Jiang 2025-06-22 13:15:36 +08:00
  • 097e7149f7 [Platform] Add initial experimental support for Altlas 300I series (#1333) Yikun Jiang 2025-06-21 09:00:16 +08:00
  • 2009fdb8da [Test] Enable code cov for V1 and enable push trigger (#1164) Yikun Jiang 2025-06-21 00:01:05 +08:00
  • 2f1266d451 Support Pangu Pro MoE model (#1204) Angazenn 2025-06-20 23:59:59 +08:00
  • 00ae250f3c [V1][eagle3] Support eagle3 proposer for v1 (#1032) yuancaoyaoHW 2025-06-20 17:19:54 +08:00
  • 45be1aac0c [CI] Add codespell check for doc (#1314) wangxiyuan 2025-06-20 16:48:14 +08:00
  • 761bd3d9d7 Add user guide for quantization (#1206) 22dimensions 2025-06-20 15:53:25 +08:00
  • 2c7dd85fd8 [Fix] Fix the token-wise padding mechanism (#1300) yiz-liu 2025-06-20 14:46:17 +08:00
  • b350edae9a [UT] refactor test_expert_load_balancer and fix broken CI (#1293) wangxiyuan 2025-06-20 01:02:52 +08:00
  • ebb2a70dbb static EPLB fix bug, add unit test (#1186) songshanhu07 2025-06-18 19:46:56 +08:00
  • 2cd8ecdc4f [Bugfix][Spec Decode] Enable ACL_OP_INIT_MODE=1 directly only when using V0 spec decode (#1258) Shanshan Shen 2025-06-18 17:50:20 +08:00
  • db2f630aeb [bugfix] fix deepseek with mc2 (#1268) zzzzwwjj 2025-06-18 00:58:38 +08:00
  • d7e19ed57a [BugFix] fix length of sin/cos cache in rope (#1266) whx 2025-06-17 23:14:25 +08:00
  • afc8edb046 [Bugfix]: Pass scaling args to mc2 (#1202) Jade Zheng 2025-06-17 22:16:44 +08:00
  • f8029945c3 [Bugfix] Remove cuda related lines and add additional pip mirror (#1252) Li Wang 2025-06-17 21:25:40 +08:00
  • 23ca68d0c8 [refactor] Refactoring AscendFusedMoE (#1229) zzzzwwjj 2025-06-17 17:49:03 +08:00
  • 05dec7eda9 [Doc] Refactor and init user story page (#1224) Yikun Jiang 2025-06-17 09:36:35 +08:00
  • 9d3cbc0953 [Doctest] add installation doctest (#1179) Yikun Jiang 2025-06-17 08:52:26 +08:00
  • 96fa7ff63b [DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 (#1235) Mengqing Cao 2025-06-16 23:09:53 +08:00
  • f5404dc650 Fix the device error when using ray as vllm-acend backend (#884) zhuo97 2025-06-16 21:03:16 +08:00
  • 69b817ed65 [CI] Add unit test framework (#1201) wangxiyuan 2025-06-16 18:32:28 +08:00
  • 966557a2a3 [Build] Speedup image build (#1216) Yikun Jiang 2025-06-16 09:02:53 +08:00
  • 4ce860a2be [CI] Make e2e test to be preemptible and simple (#1217) Yikun Jiang 2025-06-15 22:07:43 +08:00
  • 4270682383 Waiting for BMM NZ support(Improve TPOP 2ms performance) (#1131) ttanzhiqiang 2025-06-15 19:57:02 +08:00
  • 0d2074a1ec [Doc] fix VLLM_USE_V1 value in graph mode docs (#1226) 22dimensions 2025-06-15 15:41:11 +08:00