Commit Graph

  • 0bd5ff5299 Fix accuracy test config and add DeepSeek-V2-Lite test (#2261) Icey 2025-08-08 11:09:16 +08:00
  • ad1083761f [CI][Quickfix] Fix AscendFusedMoE init error (#2268) Mengqing Cao 2025-08-08 10:20:23 +08:00
  • dceef080b1 [main] remove torch.cat and replace it by List[0] (#2153) huangxialu 2025-08-07 17:20:19 +08:00
  • b2598c3271 enable mm allreduce test (#2192) Ronald1995 2025-08-07 17:19:23 +08:00
  • 4604882a3e [ReleaseNote] Release note of v0.10.0rc1 (#2225) Mengqing Cao 2025-08-07 14:46:49 +08:00
  • 58c8d4fdcd Remove transformer pins for v0.9.1-dev (#2234) Yikun Jiang 2025-08-07 14:41:10 +08:00
  • 92eebc0c9b [Doc] Update user guide for suported models (#2263) zhangxinyuehfad 2025-08-07 14:39:51 +08:00
  • 440d28a138 [Tutorial] Add qwen3 8b w4a8 tutorial (#2249) 22dimensions 2025-08-07 14:39:38 +08:00
  • bcd0b532f5 [Doc] Update user guide for using lm-eval (#1325) zhangxinyuehfad 2025-08-07 14:15:49 +08:00
  • dbba3cabb0 [Doc] Update tutorials for single_npu_audio and single_npu_multimodal (#2252) zhangxinyuehfad 2025-08-07 14:08:14 +08:00
  • 205eff2b12 [Bugfix] Disable check vllm init temporary (#2250) Li Wang 2025-08-07 10:37:22 +08:00
  • c611291661 【main】SP For Qwen3 MoE (#2209) lbk-sys 2025-08-07 09:15:49 +08:00
  • 57b9f02185 [Bugfix] Fix disaggregated pd error (#2242) Li Wang 2025-08-06 19:48:10 +08:00
  • 26fc36b0e0 [V1] MTP supports torchair (#2145) xuyexiong 2025-08-06 19:37:43 +08:00
  • bf84f2dbfa [Doc] Support kimi-k2-w8a8 (#2162) Li Wang 2025-08-06 19:28:47 +08:00
  • 875a86cbe9 ut: add example and e2e test for sleepmode in external_launcher (#2152) huangxialu 2025-08-06 11:11:53 +08:00
  • 8a59367d0c [main][Feature] Support deepseek w4a8 quantization (#2172) Wang Kunpeng 2025-08-06 10:17:44 +08:00
  • e31b31f9c3 [main][Bugfix] Fix unable to load qwen3_moe quantized weights (#2219) Ruri 2025-08-06 09:08:36 +08:00
  • 54ace9e12b Add release note for v0.9.1rc2 (#2188) Yikun Jiang 2025-08-06 09:04:46 +08:00
  • 126cdfc92b [Test] add rejection sampler ut (#2084) sherie 2025-08-05 19:03:36 +08:00
  • f3b50c54e8 [main][Prefill Perf] Optimize Quantized MoE Performance by Reducing All2All Communication (#2195) Slightwind 2025-08-05 18:47:13 +08:00
  • 292fb8f696 [1/N][Refactor] torchair model runner refactor (#2205) wangxiyuan 2025-08-05 18:43:04 +08:00
  • 458ab2db12 [BugFix] Fix the bug that qwen3 moe doesn't work with aclgraph (#2183) wangxiyuan 2025-08-05 17:42:52 +08:00
  • 583ad8f347 [main][refractor] Refractor forward metadata retrieval across DP nodes to reduce redundant padding. (#2062) jinyuxin 2025-08-05 17:03:36 +08:00
  • 27c2b5c145 [Doc] Update pytorch version in README_zh doc (#2202) xleoken 2025-08-05 11:13:49 +08:00
  • 807f0895b2 Bump torch version to 2.7.1 (#1562) leo-pony 2025-08-05 08:43:24 +08:00
  • 36e450eb0f [Misc] Nit fix for disaggregated_prefill and ascend_forward_context (#2097) wangxiyuan 2025-08-05 08:39:02 +08:00
  • ad366bf908 [Bugfix] Follow vLLM Qwen-Moe/VL and KV Connector change to fix broken CI (#2181) Li Wang 2025-08-04 21:37:50 +08:00
  • e38fab011d [Doc][PD] Restore the default configuration items in examples/disaggregate_prefill_v1/README.md (#2165) hucong 2025-08-04 20:30:53 +08:00
  • 957c7f108d [Bugfix][PD] Make multiple Ps and Ds work on a single machine (#2080) CaveNightingale 2025-08-04 17:22:18 +08:00
  • a9480d5f0a [Fix] Adjust use_aclgraph logic (#2156) yiz-liu 2025-08-04 15:23:20 +08:00
  • 688350a3bb [bugfixed] fix the bug when run the inference of quantized ds-w8a8-mtp (#2134) liu 2025-08-04 15:16:42 +08:00
  • 4b3a210c33 Implementation of simple load balance routing proxy server (#1953) (#2124) Pleaplusone 2025-08-04 10:35:53 +08:00
  • af04ee9e7a [MoE][Dist] Fix Qwen MoE accuracy bug in DP scenario (#1856) Mengqing Cao 2025-08-04 10:24:18 +08:00
  • f939381c6f [Bugfix] Adopt the new changes on disaggregated pd from vllm main branch (#2122) Pleaplusone 2025-08-04 10:08:58 +08:00
  • ddaded1537 Add ut for envs.py (#2131) YuanCheng-coder 2025-08-02 16:53:44 +08:00
  • bea3d5bbb4 [Bug] Fix run bug in run_dp_server.sh (#2139) xleoken 2025-08-02 16:52:12 +08:00
  • 47f688a2f0 Change retrieving remote files to local retrieval. (#2141) yangqinghao-cmss 2025-08-02 16:51:22 +08:00
  • e48f32ec59 [CI] Update image for 310p ci (#2155) zhangxinyuehfad 2025-08-02 16:46:02 +08:00
  • e467fe1b77 Add qwen-vl model and sampling feature UT for 310I series (#2168) leo-pony 2025-08-02 11:26:12 +08:00
  • 6e00aed4d5 [main][Feature]Moe alltoallv communication optimization for unquantized RL training sence (#2088) weijinqian0 2025-08-02 09:49:10 +08:00
  • f0c1f0c828 [Doc] Add qwen vl example in tutorials for 310I series (#2160) leo-pony 2025-08-02 08:58:56 +08:00
  • 8cf97d8310 [Misc] Add extra checking to torchair_graph_config. (#1939) 22dimensions 2025-08-01 09:24:11 +08:00
  • 2284289880 [MISC] Cherry pick #1291 from v0.9.1-dev (#1825) Li Wang 2025-08-01 09:08:45 +08:00
  • 9e65da990e [Misc] Add warning for incompatible Ray backend with ACL Graph mode (#2132) 22dimensions 2025-08-01 09:06:09 +08:00
  • 99fa0ac882 [BugFix] update the kv transfer config (#2121) yangqinghao-cmss 2025-08-01 08:56:55 +08:00
  • 968e6791d3 [Misc] Add data preprocess functions to qwen2.5_vl_without_padding (#2148) Li Wang 2025-08-01 08:54:02 +08:00
  • e3b3ffb875 [Misc] Disable quantization in mindie_turbo (#2147) Li Wang 2025-08-01 08:53:00 +08:00
  • c62f346f5d Fixed 310p failure when using the sampler feature (#2151) leo-pony 2025-08-01 08:43:08 +08:00
  • 86bdde1ca8 Enable pytest and yaml style accuracy test (#2073) Icey 2025-07-31 21:39:13 +08:00
  • 9c9a7cd90b [main] adapt usage of npu_moe_gating_top_k_softmax and remove envs.SELECT_GATING_TOPK_SOTFMAX_EXPERTS (#2112) huangxialu 2025-07-31 21:05:56 +08:00
  • e8660d7978 ut:add ut for qwen2_5_vl (#2143) Ronald1995 2025-07-31 20:46:17 +08:00
  • cb0a303080 ut:add e2e test for external launcher (#2091) Ronald1995 2025-07-31 20:37:42 +08:00
  • 4c8842da65 [BugFix] Fix a bug of running chunked-prefill with torchair. (#1378) (#1844) Mengqing Cao 2025-07-31 20:08:45 +08:00
  • db310c6ec9 add ut for device allocator/camem and mutistream/layers (#2037) daniel 2025-07-31 19:17:27 +08:00
  • 2008152c48 [main][bugfix]Fix vLLM startup failure when inferring DeepSeek R1 model in DP scenario (#2020) zhanghw0354 2025-07-31 15:30:28 +08:00
  • 7c90ba5fe8 [Test] add ut for decorator.py/deepseek_mtp.py (#2127) CaranLic 2025-07-31 15:21:15 +08:00
  • 6192bc95c0 [Bugfix] fix tensor not same device in qwen2_5_vl_without_padding (#2051) Joey Gao 2025-07-31 15:18:54 +08:00
  • 72eceff94d [Bugfix] grammar_bitmask IndexError caused by outdated apply_grammar_bitmask method (#2022) ApsarasX 2025-07-31 09:03:27 +08:00
  • 75e28d0356 [Build][Ray] Fix protobuf version in Dockerfile (#2028) Mengqing Cao 2025-07-30 22:49:20 +08:00
  • 3386e09a40 ut:add ut for qwen2_vl.py (#2096) Ronald1995 2025-07-30 22:31:47 +08:00
  • 936df1cb9b [Doc] Fix cann related urls (#2106) Mengqing Cao 2025-07-30 22:31:30 +08:00
  • 4fcca137a7 [main][Feature] Support Qwen3 W4A8 quantization (#2060) Ruri 2025-07-30 14:57:14 +08:00
  • 6874d666fa [CI]Add e2e test for 310p (#1879) zhangxinyuehfad 2025-07-30 14:52:16 +08:00
  • 34dd24adf2 add ut for vocab_parallel_embedding (#2067) YuanCheng-coder 2025-07-30 14:35:45 +08:00
  • d9f82ebfce [misc] Add reminder comment when PR submitted (#2092) Yikun Jiang 2025-07-30 10:14:33 +08:00
  • 1dbb888275 [Bugfix] LoRA logits einsum dimension mismatch in add_lora_logits (#1583) hongfugui 2025-07-30 09:50:36 +08:00
  • d80b0cca5d [CI] Fix test on pyhccl to 2 cards (#2094) Mengqing Cao 2025-07-30 09:08:00 +08:00
  • 9b67c87b14 [Refactor]Refactor sampler (#2050) wangxiyuan 2025-07-30 08:47:22 +08:00
  • b6a7f07c70 [Perf][MoE] Improve MoE multistream parallel performace. (#1891) whx 2025-07-29 23:53:19 +08:00
  • 4df8e0027c [e2e]Fixed the issue that pyhccl e2e cannot run continuously with other tests (#1246) leo-pony 2025-07-29 19:38:30 +08:00
  • 61fc35184b [Doc] Add performance tuning doc to main (#1392) Shanshan Shen 2025-07-29 19:36:34 +08:00
  • 540336edc9 Add Custom Kernels For LoRA Performance (#1884) taoxudonghaha 2025-07-29 19:27:50 +08:00
  • 2da281ec5a bump default python version to 3.11 (#2072) TaoYu Chen 2025-07-29 19:07:17 +08:00
  • f60bb474f9 [CI] Enable linux-aarch64-a2 (64GB) and tp2 * 2 max-parallel to speed up CI (#2065) Li Wang 2025-07-29 18:59:05 +08:00
  • ca8007f584 [Feature] Enable inference support for Deepseekr1-w8a8-MTP (#1994) curryliu 2025-07-29 18:51:57 +08:00
  • 98cadc2146 [Perf] Avoid performing index selection of sin/cos cache every layer (#1890) whx 2025-07-29 18:06:45 +08:00
  • 0190b68f51 [Misc]Remove PD v0 code (#2047) wangxiyuan 2025-07-28 19:09:22 +08:00
  • 935e9d4c9d Pin transformers to fix v0.9.1 doctest (#2048) Yikun Jiang 2025-07-28 17:51:56 +08:00
  • 1a25b0a2dd [Test] add ut for qwen3_moe.py (#2055) huangxialu 2025-07-28 17:37:13 +08:00
  • e7d32ed3f1 [BugFix] Fix the problem that torchair doesn't support tp > 4. (#1508) whx 2025-07-28 16:48:05 +08:00
  • 4a008c4dac [Misc]Clean up useless import from vllm (#2049) wangxiyuan 2025-07-28 16:01:59 +08:00
  • 34cfdf5520 [Misc] Fix logger bug (#2024) wangxiyuan 2025-07-28 15:59:09 +08:00
  • 3ad582c9a9 [Test] Add ut for files in /attention (#1944) LeeWenquan 2025-07-28 15:54:40 +08:00
  • 32a9c5f694 [Feature]: implement the fusion of allreduce and matmul in prefill phase when tp is enabled (#1926) Ronald1995 2025-07-28 15:13:37 +08:00
  • ba3dfbd59e [main][refactor] Refactoring forward_context and model_runner_v1 (#1979) zzzzwwjj 2025-07-28 14:06:20 +08:00
  • e3a2443c3a [main][Doc] add mla pertoken quantization FAQ (#2018) Wang Kunpeng 2025-07-27 08:47:51 +08:00
  • 5b579ddafe Upgrade CANN to 8.2.RC1 (A3) (#2043) Yikun Jiang 2025-07-26 23:10:27 +08:00
  • ed2ab8a197 [CI/Build] Upgrade CANN to 8.2.RC1 (#1653) Mengqing Cao 2025-07-26 22:37:46 +08:00
  • d1c640841b [Bugfix] Fix num_hidden_layers when Qwen2-Audio 7B (#1803) zhangxinyuehfad 2025-07-26 20:13:00 +08:00
  • df0ec55162 Disaggregate prefill for kv cache register style (#950) Pleaplusone 2025-07-26 17:15:47 +08:00
  • 17a430f7b8 Upgrade vLLM to v0.10.0 (#1927) Yikun Jiang 2025-07-26 15:43:29 +08:00
  • 2f50304c19 [Bugfix] Add get_supported_tasks interface to fix broken CI (#2023) Li Wang 2025-07-26 08:20:21 +08:00
  • bdfb065b5d [1/2/N] Enable pymarkdown and python __init__ for lint system (#2011) Li Wang 2025-07-25 22:16:10 +08:00
  • d629f0b2b5 [CI] Remove transformers installation (#2014) Li Wang 2025-07-25 15:20:37 +08:00
  • e561a2c6ec ut:add ut for qwen2_5_vl_without_padding.py (#1988) Ronald1995 2025-07-25 14:12:44 +08:00
  • ae560f7131 [Test] Add uts for files in /core (#1957) SunnyLee151064 2025-07-25 09:48:19 +08:00
  • 6bc82cf6a7 Enable image push CI for build file and csrc has changes (#1977) Icey 2025-07-24 21:19:41 +08:00
  • cfdd45ed00 [Bug] Fix duplicate 'torch.' prefix in qwen-vl (#1986) JohnJan 2025-07-24 20:16:00 +08:00
  • 84fc7402c3 [Misc] Refactor AscendMetaData Comments to Make It Clearer (#1967) Shanshan Shen 2025-07-24 19:31:36 +08:00