Commit Graph

  • ff97740b8d Use mirror images (#1912) li chaoran 2025-07-24 10:47:05 +08:00
  • ab7d5aca5d [Test] Add ut for files in /multistream (#1947) SunnyLee151064 2025-07-24 10:42:49 +08:00
  • 34571ea5ae [Test] Add ut for files in /distributed (#1951) SunnyLee151064 2025-07-24 10:36:11 +08:00
  • fa76a9b7bb [Bug] Add prefix parameter to parent class initialization (#1934) JohnJan 2025-07-24 10:28:40 +08:00
  • 2ffe051859 [Test]add ut for deepseek_v2. (#1964) Zac 2025-07-24 10:27:50 +08:00
  • 846555cdb5 [Misc] Clean up uesless code in attention (#1933) wangxiyuan 2025-07-24 10:23:34 +08:00
  • b5ad70e1a6 [Optimize]Change AI Vector core number getting function to glibc ABI free funcition (#1974) leo-pony 2025-07-24 10:00:19 +08:00
  • ac0bf133f4 add ut of fused_moe.py (#1930) shiyuan680 2025-07-23 16:24:09 +08:00
  • ac773aca43 Add UT for Patches (#1766) weichen 2025-07-23 16:07:20 +08:00
  • 326dcf2576 [Doc] Update support feature (#1828) wangxiyuan 2025-07-23 15:19:15 +08:00
  • 3aa3b46bfe [V1][PP] Support pp with ray backend in V1 (#1800) Mengqing Cao 2025-07-23 14:52:52 +08:00
  • 9a3bdf2162 [main] Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance (#1806) rjg-lyh 2025-07-22 19:03:13 +08:00
  • ce4970eee0 [Test] Add unit test for schedule_config.py (#1590) JohnJan 2025-07-22 11:43:25 +08:00
  • 5f0b42e414 [FOLLOWUP] Use base test to avoid patch everwhere (#1634) Yikun Jiang 2025-07-22 09:03:40 +08:00
  • 33e1ea4d1a [CI] Fix broken CI (#1915) Li Wang 2025-07-22 08:38:30 +08:00
  • 7265dc090d [2/4][Refactor] Refactor torchair utils (#1892) wangxiyuan 2025-07-21 19:43:30 +08:00
  • 957b0b611f [Misc][V0 Deprecation] Remove V0 Model Runner (#1823) Shanshan Shen 2025-07-21 16:35:50 +08:00
  • a66ef39bb6 [Misc][V0 Deprecation] Remove Redundant Offline Distributed Inference Example (#1899) Shanshan Shen 2025-07-21 12:01:45 +08:00
  • af56ae3ed1 [1/4][Refactor] Refactor torchair worker (#1885) wangxiyuan 2025-07-21 11:50:46 +08:00
  • c32eea96b7 [Doc]Add Chinese translation for documentation (#1870) aidoczh 2025-07-21 11:26:27 +08:00
  • 8cfd257992 [Dist][EP] Remove ETP/EP maintained in vllm-ascend (#1681) Mengqing Cao 2025-07-21 09:08:04 +08:00
  • a8b316ac5b [CI] Make AttentionBackend interface compatible to fix broken CI (#1893) wangxiyuan 2025-07-21 08:21:06 +08:00
  • 54f2b31184 [Doc] Add a doc for qwen omni (#1867) JohnJan 2025-07-20 09:05:41 +08:00
  • 2b726d8f90 [CI] Fix broken CI (#1889) wangxiyuan 2025-07-20 02:11:57 +08:00
  • 2ee90461d0 Fix e2e data parallel test: add resource release code (#1881) leo-pony 2025-07-19 11:39:48 +08:00
  • b824525be3 Move deepseek_v3 from deepseek_v2.py (#1793) xleoken 2025-07-19 11:37:03 +08:00
  • ab68d31a24 [Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878) Shanshan Shen 2025-07-19 09:42:32 +08:00
  • 53d2ea3789 [Bugfix]Fix the performance gap between 0.9.2rc1 and 0.9.1 (#1811) lianyibo 2025-07-18 23:09:54 +08:00
  • 574fe407eb [1/N][CustomOp] Register activation customop instead of overwrite forward_oot (#1841) Mengqing Cao 2025-07-18 23:07:14 +08:00
  • 8a91e6e59c [Misc][V0 Deprecation] Remove V0 Related Custom Ops (#1871) Shanshan Shen 2025-07-18 23:06:03 +08:00
  • 3e39d7234c [CI] Switching to infra cache server to reduce network pressure (#1792) li chaoran 2025-07-18 18:39:25 +08:00
  • d08ff304cd [Misc][V0 Deprecation] Remove V0 Attention (#1835) Shanshan Shen 2025-07-18 14:10:13 +08:00
  • 33ef5dc813 add unit test for func wrapper (#1863) xudongLi-cmss 2025-07-18 11:05:17 +08:00
  • f9dfde02fd [Bugfix] Fix broken CI (#1848) Li Wang 2025-07-17 20:10:12 +08:00
  • 538dd357e6 Add graph mode and improve on multi_npu_moge.md (#1849) Zhu Yi Lin 2025-07-17 17:53:37 +08:00
  • aeb5aa8b88 [Misc][V0 Deprecation] Add __main__ guard to all offline examples (#1837) Shanshan Shen 2025-07-17 14:13:30 +08:00
  • 19e37cd379 [Misc] Add fusion_result.json to .gitignore (#1836) Shanshan Shen 2025-07-17 11:54:49 +08:00
  • 875a920d4a [Platform] Add support for Altlas A3 series (#1794) Icey 2025-07-17 11:13:02 +08:00
  • ef99fe1c54 [Test] Clean up duplicate test for ascend scheduler (#1819) wangxiyuan 2025-07-16 17:57:48 +08:00
  • c66b0827a7 [Misc][V0 Deprecation] Remove Pooling Model Runner (#1824) Shanshan Shen 2025-07-16 17:48:21 +08:00
  • ba7e934b21 Remove redundant empty lines in commit msg (#1814) Yikun Jiang 2025-07-16 16:50:44 +08:00
  • 06655002c5 [Misc][V0 Deprecation] Remove V0 Worker (#1821) Shanshan Shen 2025-07-16 14:07:17 +08:00
  • b005def0a5 [Misc][V0 Deprecation] Remove Multi-Step Model Runner (#1820) Shanshan Shen 2025-07-16 14:06:49 +08:00
  • f9e2e9bb31 [Misc][V0 Deprecation] Remove Draft Model Runner Used for V0 Spec Decode (#1810) Shanshan Shen 2025-07-16 10:51:23 +08:00
  • f96100fad5 [Misc][V0 Deprecation] Remove V0 related codes of test, example, platform (#1805) Shanshan Shen 2025-07-15 19:58:55 +08:00
  • a929699e98 [Misc][V0 Deprecation] Remove multi-step worker (#1809) Shanshan Shen 2025-07-15 19:48:47 +08:00
  • bf2549856f [CI] Fix changes CI to recover codecov (#1799) wangxiyuan 2025-07-15 15:01:13 +08:00
  • 787010a637 [Test] Remove VLLM_USE_V1 in example and tests (#1733) wangxiyuan 2025-07-15 12:49:57 +08:00
  • eb921d2b6f [Doc] Fix 404 error (#1797) wangxiyuan 2025-07-15 11:52:38 +08:00
  • 7bdada58eb [Misc] Remove VLLM_USE_V1 usage in code (#1764) wangxiyuan 2025-07-15 11:52:16 +08:00
  • 494b0f474f [CI]Fix broken CI (#1773) wangxiyuan 2025-07-15 00:54:20 +08:00
  • afcfe91dfa [Doc] Fix multi node doc (#1783) Li Wang 2025-07-14 17:56:57 +08:00
  • cabfb2bc31 [Test] Resolve vllm-ascend version accuracy test (#1769) zhangxinyuehfad 2025-07-14 15:43:37 +08:00
  • d3c6dd985a [Misc] Add include dir to .gitignore (#1771) Shanshan Shen 2025-07-14 12:05:29 +08:00
  • 9cd4ac76a1 [CI] Remove benchmark patch and increase the scheduler frequency (#1762) Li Wang 2025-07-13 20:00:35 +08:00
  • d118bf8a26 Update README.zh.md to fix typo (#1758) Yikun Jiang 2025-07-12 14:01:34 +08:00
  • eff4b5791c Recover offline_inference_npu.py to make doctest passed (#1756) Yikun Jiang 2025-07-12 12:36:35 +08:00
  • 8b3a483269 Add recommend version and refresh readme / contribution.md (#1757) Yikun Jiang 2025-07-12 12:35:40 +08:00
  • 3c404de1b1 [Release]Update release note (#1753) wangxiyuan 2025-07-11 17:58:26 +08:00
  • b5b7e0ecc7 [Doc] Add qwen3 embedding 8b guide (#1734) wangxiyuan 2025-07-11 17:40:17 +08:00
  • 9c560b009a [Release] Add 0.9.2rc1 release note (#1725) wangxiyuan 2025-07-11 17:36:05 +08:00
  • 1b4a2f3817 [CI] Add accuracy ci for DP and EP and TP and ETP (#1140) zhangxinyuehfad 2025-07-11 17:25:17 +08:00
  • d13fb0766e [Perf] add patch to optimize apply_topk_topp (#1732) Pr0Wh1teGivee 2025-07-11 15:32:02 +08:00
  • aa4240c67f Support pipeline parallel in V1 Engine (#1700) weiguihua2 2025-07-11 15:30:51 +08:00
  • 1cd27da5fb [Test] Remove VLLM_USE_V1 in accuracy test (#1739) zhangxinyuehfad 2025-07-11 15:29:11 +08:00
  • ee40d3d850 use npu_moe_gating_top_k_softmax (#1355) ttanzhiqiang 2025-07-11 08:55:06 +08:00
  • 9d16c9982e rm router logits Improve TTOP 3ms (#1407) ttanzhiqiang 2025-07-11 08:53:17 +08:00
  • 0fc9b56d40 [Perf] Improve MLA multistream performance (#1353) ApsarasX 2025-07-11 08:51:17 +08:00
  • cc210f46e6 [AscendScheduler][Bugfix] Remove num_draft_tokens while allocating slots (#1718) Mengqing Cao 2025-07-10 18:47:45 +08:00
  • 011fd73a48 [CI] Make CI tracker more clear (#1720) wangxiyuan 2025-07-10 16:03:23 +08:00
  • 3d1e6a5929 [Doc] Update user doc index (#1581) wangxiyuan 2025-07-10 14:26:59 +08:00
  • c7446438a9 [1/N][CI] Move linting system to pre-commits hooks (#1256) Li Wang 2025-07-10 14:17:15 +08:00
  • 643e6f5486 [Bugfix] Fix accuracy problem caused by mask pollution (#1678) ApsarasX 2025-07-10 14:06:49 +08:00
  • 60519c71bd shared_experts+router_experts merge all_reduce(Improve TTOP 5ms) (#1395) ttanzhiqiang 2025-07-10 12:07:05 +08:00
  • 997f156a51 Use ci_vllm_version when recording vLLM commit (#1689) Yikun Jiang 2025-07-10 11:07:27 +08:00
  • 89c1a0f006 [Bugfix] Fix memory-leak caused by dist._functional_collectives.reduce_scatter_tensor (#1380) ApsarasX 2025-07-10 10:57:24 +08:00
  • b1c66b211f [CI] Fix lint in CI (#1712) Mengqing Cao 2025-07-10 10:47:18 +08:00
  • 0c4aa2b4f1 [Doc] Add multi node data parallel doc (#1685) Li Wang 2025-07-10 09:36:37 +08:00
  • b4b19ea588 [Doc] Add multi-npu qwen3-MoE-32B Tutorials (#1419) leo-pony 2025-07-10 09:06:51 +08:00
  • 3ef45d0cc2 feat: Improve the offline_inference npu v0/v1 scripts (#1669) xleoken 2025-07-09 17:03:53 +08:00
  • 6af35f60cc [Bugfix][CI] Remove V0 Spec Decode CI (#1656) Shanshan Shen 2025-07-09 15:53:58 +08:00
  • b979ee353d [Misc] Code clean up (#1679) wangxiyuan 2025-07-09 14:33:40 +08:00
  • 392fd7239b [Misc] Add attention mask (#1673) wangxiyuan 2025-07-09 09:12:03 +08:00
  • cc1588be50 [Misc] Code clean up (#1674) wangxiyuan 2025-07-09 08:54:12 +08:00
  • 830332ebfc Clean up v0.9.1 code (#1672) wangxiyuan 2025-07-09 08:52:24 +08:00
  • 0d4bc03946 Fix wheel glibc version incompatibility (#1582) Icey 2025-07-08 18:46:02 +08:00
  • e4e9ea02ab Upgrade vLLM version to v0.9.2 (#1652) Yikun Jiang 2025-07-08 14:18:17 +08:00
  • 71de52d3a9 feat: add kv cache memory cache and skip dynamo guard (#1549) NeverRaR 2025-07-07 22:37:14 +08:00
  • df84cceca8 perf: use multicast to avoid padding decode request to prefill size (#1555) NeverRaR 2025-07-07 22:36:03 +08:00
  • f08c4f15a2 fix spell error (#1654) wm901115nwpu 2025-07-07 20:24:42 +08:00
  • f2a20393a2 [CI] Fix mypy check in CI (#1655) Mengqing Cao 2025-07-07 20:19:16 +08:00
  • 18495f44b2 [BugFix] Fix max_num_tokens_across_dp calculation bugs in attention_v1_torchair (#1636) Angazenn 2025-07-07 20:03:02 +08:00
  • 9c886d0a1f [EPLB] support deepseek eplb strategy (#1196) Zheng Wengang 2025-07-07 17:22:08 +08:00
  • 4e29c5a808 Add ut for test_pooling_model_runner.py (#1640) wangyanhui-cmss 2025-07-07 17:12:11 +08:00
  • 493768eb30 Record vLLM commit in PR description (#1623) Yikun Jiang 2025-07-07 10:20:38 +08:00
  • 7efa4e92fe [CI] Fix oom in chunk prefill (#1622) Mengqing Cao 2025-07-07 10:14:40 +08:00
  • c58accc15e [Bugfix] Support Qwen3-MOE on aclgraph mode (#1381) ApsarasX 2025-07-06 15:29:36 +08:00
  • 14373f65d7 [Test] Remove V0 accuracy test and enable MoE and VL test on V1 (#1574) zhangxinyuehfad 2025-07-06 11:10:19 +08:00
  • 0c1d239df4 Add unit test local cpu guide and enable base testcase (#1566) Yikun Jiang 2025-07-06 10:42:27 +08:00
  • eb390545ec [Performance] Disable JIT and nd2nz to improve performance for Altlas 300I series (#1591) Vincent Yuan 2025-07-05 16:29:21 +08:00