Commit Graph

  • 7c90ba5fe8 [Test] add ut for decorator.py/deepseek_mtp.py (#2127) CaranLic 2025-07-31 15:21:15 +08:00
  • 6192bc95c0 [Bugfix] fix tensor not same device in qwen2_5_vl_without_padding (#2051) Joey Gao 2025-07-31 15:18:54 +08:00
  • 72eceff94d [Bugfix] grammar_bitmask IndexError caused by outdated apply_grammar_bitmask method (#2022) ApsarasX 2025-07-31 09:03:27 +08:00
  • 75e28d0356 [Build][Ray] Fix protobuf version in Dockerfile (#2028) Mengqing Cao 2025-07-30 22:49:20 +08:00
  • 3386e09a40 ut:add ut for qwen2_vl.py (#2096) Ronald1995 2025-07-30 22:31:47 +08:00
  • 936df1cb9b [Doc] Fix cann related urls (#2106) Mengqing Cao 2025-07-30 22:31:30 +08:00
  • 4fcca137a7 [main][Feature] Support Qwen3 W4A8 quantization (#2060) Ruri 2025-07-30 14:57:14 +08:00
  • 6874d666fa [CI]Add e2e test for 310p (#1879) zhangxinyuehfad 2025-07-30 14:52:16 +08:00
  • 34dd24adf2 add ut for vocab_parallel_embedding (#2067) YuanCheng-coder 2025-07-30 14:35:45 +08:00
  • d9f82ebfce [misc] Add reminder comment when PR submitted (#2092) Yikun Jiang 2025-07-30 10:14:33 +08:00
  • 1dbb888275 [Bugfix] LoRA logits einsum dimension mismatch in add_lora_logits (#1583) hongfugui 2025-07-30 09:50:36 +08:00
  • d80b0cca5d [CI] Fix test on pyhccl to 2 cards (#2094) Mengqing Cao 2025-07-30 09:08:00 +08:00
  • 9b67c87b14 [Refactor]Refactor sampler (#2050) wangxiyuan 2025-07-30 08:47:22 +08:00
  • b6a7f07c70 [Perf][MoE] Improve MoE multistream parallel performance. (#1891) whx 2025-07-29 23:53:19 +08:00
  • 4df8e0027c [e2e]Fixed the issue that pyhccl e2e cannot run continuously with other tests (#1246) leo-pony 2025-07-29 19:38:30 +08:00
  • 61fc35184b [Doc] Add performance tuning doc to main (#1392) Shanshan Shen 2025-07-29 19:36:34 +08:00
  • 540336edc9 Add Custom Kernels For LoRA Performance (#1884) taoxudonghaha 2025-07-29 19:27:50 +08:00
  • 2da281ec5a bump default python version to 3.11 (#2072) TaoYu Chen 2025-07-29 19:07:17 +08:00
  • f60bb474f9 [CI] Enable linux-aarch64-a2 (64GB) and tp2 * 2 max-parallel to speed up CI (#2065) Li Wang 2025-07-29 18:59:05 +08:00
  • ca8007f584 [Feature] Enable inference support for Deepseekr1-w8a8-MTP (#1994) curryliu 2025-07-29 18:51:57 +08:00
  • 98cadc2146 [Perf] Avoid performing index selection of sin/cos cache every layer (#1890) whx 2025-07-29 18:06:45 +08:00
  • 0190b68f51 [Misc]Remove PD v0 code (#2047) wangxiyuan 2025-07-28 19:09:22 +08:00
  • 935e9d4c9d Pin transformers to fix v0.9.1 doctest (#2048) Yikun Jiang 2025-07-28 17:51:56 +08:00
  • 1a25b0a2dd [Test] add ut for qwen3_moe.py (#2055) huangxialu 2025-07-28 17:37:13 +08:00
  • e7d32ed3f1 [BugFix] Fix the problem that torchair doesn't support tp > 4. (#1508) whx 2025-07-28 16:48:05 +08:00
  • 4a008c4dac [Misc]Clean up useless import from vllm (#2049) wangxiyuan 2025-07-28 16:01:59 +08:00
  • 34cfdf5520 [Misc] Fix logger bug (#2024) wangxiyuan 2025-07-28 15:59:09 +08:00
  • 3ad582c9a9 [Test] Add ut for files in /attention (#1944) LeeWenquan 2025-07-28 15:54:40 +08:00
  • 32a9c5f694 [Feature]: implement the fusion of allreduce and matmul in prefill phase when tp is enabled (#1926) Ronald1995 2025-07-28 15:13:37 +08:00
  • ba3dfbd59e [main][refactor] Refactoring forward_context and model_runner_v1 (#1979) zzzzwwjj 2025-07-28 14:06:20 +08:00
  • e3a2443c3a [main][Doc] add mla pertoken quantization FAQ (#2018) Wang Kunpeng 2025-07-27 08:47:51 +08:00
  • 5b579ddafe Upgrade CANN to 8.2.RC1 (A3) (#2043) Yikun Jiang 2025-07-26 23:10:27 +08:00
  • ed2ab8a197 [CI/Build] Upgrade CANN to 8.2.RC1 (#1653) Mengqing Cao 2025-07-26 22:37:46 +08:00
  • d1c640841b [Bugfix] Fix num_hidden_layers when Qwen2-Audio 7B (#1803) zhangxinyuehfad 2025-07-26 20:13:00 +08:00
  • df0ec55162 Disaggregate prefill for kv cache register style (#950) Pleaplusone 2025-07-26 17:15:47 +08:00
  • 17a430f7b8 Upgrade vLLM to v0.10.0 (#1927) Yikun Jiang 2025-07-26 15:43:29 +08:00
  • 2f50304c19 [Bugfix] Add get_supported_tasks interface to fix broken CI (#2023) Li Wang 2025-07-26 08:20:21 +08:00
  • bdfb065b5d [1/2/N] Enable pymarkdown and python __init__ for lint system (#2011) Li Wang 2025-07-25 22:16:10 +08:00
  • d629f0b2b5 [CI] Remove transformers installation (#2014) Li Wang 2025-07-25 15:20:37 +08:00
  • e561a2c6ec ut:add ut for qwen2_5_vl_without_padding.py (#1988) Ronald1995 2025-07-25 14:12:44 +08:00
  • ae560f7131 [Test] Add uts for files in /core (#1957) SunnyLee151064 2025-07-25 09:48:19 +08:00
  • 6bc82cf6a7 Enable image push CI for build file and csrc has changes (#1977) Icey 2025-07-24 21:19:41 +08:00
  • cfdd45ed00 [Bug] Fix duplicate 'torch.' prefix in qwen-vl (#1986) JohnJan 2025-07-24 20:16:00 +08:00
  • 84fc7402c3 [Misc] Refactor AscendMetaData Comments to Make It Clearer (#1967) Shanshan Shen 2025-07-24 19:31:36 +08:00
  • ff97740b8d Use mirror images (#1912) li chaoran 2025-07-24 10:47:05 +08:00
  • ab7d5aca5d [Test] Add ut for files in /multistream (#1947) SunnyLee151064 2025-07-24 10:42:49 +08:00
  • 34571ea5ae [Test] Add ut for files in /distributed (#1951) SunnyLee151064 2025-07-24 10:36:11 +08:00
  • fa76a9b7bb [Bug] Add prefix parameter to parent class initialization (#1934) JohnJan 2025-07-24 10:28:40 +08:00
  • 2ffe051859 [Test]add ut for deepseek_v2. (#1964) Zac 2025-07-24 10:27:50 +08:00
  • 846555cdb5 [Misc] Clean up useless code in attention (#1933) wangxiyuan 2025-07-24 10:23:34 +08:00
  • b5ad70e1a6 [Optimize]Change AI Vector core number getting function to glibc ABI free function (#1974) leo-pony 2025-07-24 10:00:19 +08:00
  • ac0bf133f4 add ut of fused_moe.py (#1930) shiyuan680 2025-07-23 16:24:09 +08:00
  • ac773aca43 Add UT for Patches (#1766) weichen 2025-07-23 16:07:20 +08:00
  • 326dcf2576 [Doc] Update support feature (#1828) wangxiyuan 2025-07-23 15:19:15 +08:00
  • 3aa3b46bfe [V1][PP] Support pp with ray backend in V1 (#1800) Mengqing Cao 2025-07-23 14:52:52 +08:00
  • 9a3bdf2162 [main] Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance (#1806) rjg-lyh 2025-07-22 19:03:13 +08:00
  • ce4970eee0 [Test] Add unit test for schedule_config.py (#1590) JohnJan 2025-07-22 11:43:25 +08:00
  • 5f0b42e414 [FOLLOWUP] Use base test to avoid patch everywhere (#1634) Yikun Jiang 2025-07-22 09:03:40 +08:00
  • 33e1ea4d1a [CI] Fix broken CI (#1915) Li Wang 2025-07-22 08:38:30 +08:00
  • 7265dc090d [2/4][Refactor] Refactor torchair utils (#1892) wangxiyuan 2025-07-21 19:43:30 +08:00
  • 957b0b611f [Misc][V0 Deprecation] Remove V0 Model Runner (#1823) Shanshan Shen 2025-07-21 16:35:50 +08:00
  • a66ef39bb6 [Misc][V0 Deprecation] Remove Redundant Offline Distributed Inference Example (#1899) Shanshan Shen 2025-07-21 12:01:45 +08:00
  • af56ae3ed1 [1/4][Refactor] Refactor torchair worker (#1885) wangxiyuan 2025-07-21 11:50:46 +08:00
  • c32eea96b7 [Doc]Add Chinese translation for documentation (#1870) aidoczh 2025-07-21 11:26:27 +08:00
  • 8cfd257992 [Dist][EP] Remove ETP/EP maintained in vllm-ascend (#1681) Mengqing Cao 2025-07-21 09:08:04 +08:00
  • a8b316ac5b [CI] Make AttentionBackend interface compatible to fix broken CI (#1893) wangxiyuan 2025-07-21 08:21:06 +08:00
  • 54f2b31184 [Doc] Add a doc for qwen omni (#1867) JohnJan 2025-07-20 09:05:41 +08:00
  • 2b726d8f90 [CI] Fix broken CI (#1889) wangxiyuan 2025-07-20 02:11:57 +08:00
  • 2ee90461d0 Fix e2e data parallel test: add resource release code (#1881) leo-pony 2025-07-19 11:39:48 +08:00
  • b824525be3 Move deepseek_v3 from deepseek_v2.py (#1793) xleoken 2025-07-19 11:37:03 +08:00
  • ab68d31a24 [Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878) Shanshan Shen 2025-07-19 09:42:32 +08:00
  • 53d2ea3789 [Bugfix]Fix the performance gap between 0.9.2rc1 and 0.9.1 (#1811) lianyibo 2025-07-18 23:09:54 +08:00
  • 574fe407eb [1/N][CustomOp] Register activation customop instead of overwrite forward_oot (#1841) Mengqing Cao 2025-07-18 23:07:14 +08:00
  • 8a91e6e59c [Misc][V0 Deprecation] Remove V0 Related Custom Ops (#1871) Shanshan Shen 2025-07-18 23:06:03 +08:00
  • 3e39d7234c [CI] Switching to infra cache server to reduce network pressure (#1792) li chaoran 2025-07-18 18:39:25 +08:00
  • d08ff304cd [Misc][V0 Deprecation] Remove V0 Attention (#1835) Shanshan Shen 2025-07-18 14:10:13 +08:00
  • 33ef5dc813 add unit test for func wrapper (#1863) xudongLi-cmss 2025-07-18 11:05:17 +08:00
  • f9dfde02fd [Bugfix] Fix broken CI (#1848) Li Wang 2025-07-17 20:10:12 +08:00
  • 538dd357e6 Add graph mode and improve on multi_npu_moge.md (#1849) Zhu Yi Lin 2025-07-17 17:53:37 +08:00
  • aeb5aa8b88 [Misc][V0 Deprecation] Add __main__ guard to all offline examples (#1837) Shanshan Shen 2025-07-17 14:13:30 +08:00
  • 19e37cd379 [Misc] Add fusion_result.json to .gitignore (#1836) Shanshan Shen 2025-07-17 11:54:49 +08:00
  • 875a920d4a [Platform] Add support for Atlas A3 series (#1794) Icey 2025-07-17 11:13:02 +08:00
  • ef99fe1c54 [Test] Clean up duplicate test for ascend scheduler (#1819) wangxiyuan 2025-07-16 17:57:48 +08:00
  • c66b0827a7 [Misc][V0 Deprecation] Remove Pooling Model Runner (#1824) Shanshan Shen 2025-07-16 17:48:21 +08:00
  • ba7e934b21 Remove redundant empty lines in commit msg (#1814) Yikun Jiang 2025-07-16 16:50:44 +08:00
  • 06655002c5 [Misc][V0 Deprecation] Remove V0 Worker (#1821) Shanshan Shen 2025-07-16 14:07:17 +08:00
  • b005def0a5 [Misc][V0 Deprecation] Remove Multi-Step Model Runner (#1820) Shanshan Shen 2025-07-16 14:06:49 +08:00
  • f9e2e9bb31 [Misc][V0 Deprecation] Remove Draft Model Runner Used for V0 Spec Decode (#1810) Shanshan Shen 2025-07-16 10:51:23 +08:00
  • f96100fad5 [Misc][V0 Deprecation] Remove V0 related codes of test, example, platform (#1805) Shanshan Shen 2025-07-15 19:58:55 +08:00
  • a929699e98 [Misc][V0 Deprecation] Remove multi-step worker (#1809) Shanshan Shen 2025-07-15 19:48:47 +08:00
  • bf2549856f [CI] Fix changes CI to recover codecov (#1799) wangxiyuan 2025-07-15 15:01:13 +08:00
  • 787010a637 [Test] Remove VLLM_USE_V1 in example and tests (#1733) wangxiyuan 2025-07-15 12:49:57 +08:00
  • eb921d2b6f [Doc] Fix 404 error (#1797) wangxiyuan 2025-07-15 11:52:38 +08:00
  • 7bdada58eb [Misc] Remove VLLM_USE_V1 usage in code (#1764) wangxiyuan 2025-07-15 11:52:16 +08:00
  • 494b0f474f [CI]Fix broken CI (#1773) wangxiyuan 2025-07-15 00:54:20 +08:00
  • afcfe91dfa [Doc] Fix multi node doc (#1783) Li Wang 2025-07-14 17:56:57 +08:00
  • cabfb2bc31 [Test] Resolve vllm-ascend version accuracy test (#1769) zhangxinyuehfad 2025-07-14 15:43:37 +08:00
  • d3c6dd985a [Misc] Add include dir to .gitignore (#1771) Shanshan Shen 2025-07-14 12:05:29 +08:00
  • 9cd4ac76a1 [CI] Remove benchmark patch and increase the scheduler frequency (#1762) Li Wang 2025-07-13 20:00:35 +08:00
  • d118bf8a26 Update README.zh.md to fix typo (#1758) Yikun Jiang 2025-07-12 14:01:34 +08:00