xc-llm-ascend

EngineX/xc-llm-ascend


99fa0ac882 [BugFix] update the kv transfer config (#2121) yangqinghao-cmss 2025-08-01 08:56:55 +08:00
968e6791d3 [Misc] Add data preprocess functions to qwen2.5_vl_without_padding (#2148) Li Wang 2025-08-01 08:54:02 +08:00
e3b3ffb875 [Misc] Disable quantization in mindie_turbo (#2147) Li Wang 2025-08-01 08:53:00 +08:00
c62f346f5d Fixed 310p failure when using the sampler feature (#2151) leo-pony 2025-08-01 08:43:08 +08:00
86bdde1ca8 Enable pytest and yaml style accuracy test (#2073) Icey 2025-07-31 21:39:13 +08:00
9c9a7cd90b [main] adapt usage of npu_moe_gating_top_k_softmax and remove envs.SELECT_GATING_TOPK_SOTFMAX_EXPERTS (#2112) huangxialu 2025-07-31 21:05:56 +08:00
e8660d7978 ut:add ut for qwen2_5_vl (#2143) Ronald1995 2025-07-31 20:46:17 +08:00
cb0a303080 ut:add e2e test for external launcher (#2091) Ronald1995 2025-07-31 20:37:42 +08:00
4c8842da65 [BugFix] Fix a bug of running chunked-prefill with torchair. (#1378) (#1844) Mengqing Cao 2025-07-31 20:08:45 +08:00
db310c6ec9 add ut for device allocator/camem and multistream/layers (#2037) daniel 2025-07-31 19:17:27 +08:00
2008152c48 [main][bugfix]Fix vLLM startup failure when inferring DeepSeek R1 model in DP scenario (#2020) zhanghw0354 2025-07-31 15:30:28 +08:00
7c90ba5fe8 [Test] add ut for decorator.py/deepseek_mtp.py (#2127) CaranLic 2025-07-31 15:21:15 +08:00
6192bc95c0 [Bugfix] fix tensor not same device in qwen2_5_vl_without_padding (#2051) Joey Gao 2025-07-31 15:18:54 +08:00
72eceff94d [Bugfix] grammar_bitmask IndexError caused by outdated apply_grammar_bitmask method (#2022) ApsarasX 2025-07-31 09:03:27 +08:00
75e28d0356 [Build][Ray] Fix protobuf version in Dockerfile (#2028) Mengqing Cao 2025-07-30 22:49:20 +08:00
3386e09a40 ut:add ut for qwen2_vl.py (#2096) Ronald1995 2025-07-30 22:31:47 +08:00
936df1cb9b [Doc] Fix cann related urls (#2106) Mengqing Cao 2025-07-30 22:31:30 +08:00
4fcca137a7 [main][Feature] Support Qwen3 W4A8 quantization (#2060) Ruri 2025-07-30 14:57:14 +08:00
6874d666fa [CI]Add e2e test for 310p (#1879) zhangxinyuehfad 2025-07-30 14:52:16 +08:00
34dd24adf2 add ut for vocab_parallel_embedding (#2067) YuanCheng-coder 2025-07-30 14:35:45 +08:00
d9f82ebfce [misc] Add reminder comment when PR submitted (#2092) Yikun Jiang 2025-07-30 10:14:33 +08:00
1dbb888275 [Bugfix] LoRA logits einsum dimension mismatch in add_lora_logits (#1583) hongfugui 2025-07-30 09:50:36 +08:00
d80b0cca5d [CI] Fix test on pyhccl to 2 cards (#2094) Mengqing Cao 2025-07-30 09:08:00 +08:00
9b67c87b14 [Refactor]Refactor sampler (#2050) wangxiyuan 2025-07-30 08:47:22 +08:00
b6a7f07c70 [Perf][MoE] Improve MoE multistream parallel performance. (#1891) whx 2025-07-29 23:53:19 +08:00
4df8e0027c [e2e]Fixed the issue that pyhccl e2e cannot run continuously with other tests (#1246) leo-pony 2025-07-29 19:38:30 +08:00
61fc35184b [Doc] Add performance tuning doc to main (#1392) Shanshan Shen 2025-07-29 19:36:34 +08:00
540336edc9 Add Custom Kernels For LoRA Performance (#1884) taoxudonghaha 2025-07-29 19:27:50 +08:00
2da281ec5a bump default python version to 3.11 (#2072) TaoYu Chen 2025-07-29 19:07:17 +08:00
f60bb474f9 [CI] Enable linux-aarch64-a2 (64GB) and tp2 * 2 max-parallel to speed up CI (#2065) Li Wang 2025-07-29 18:59:05 +08:00
ca8007f584 [Feature] Enable inference support for Deepseekr1-w8a8-MTP (#1994) curryliu 2025-07-29 18:51:57 +08:00
98cadc2146 [Perf] Avoid performing index selection of sin/cos cache every layer (#1890) whx 2025-07-29 18:06:45 +08:00
0190b68f51 [Misc]Remove PD v0 code (#2047) wangxiyuan 2025-07-28 19:09:22 +08:00
935e9d4c9d Pin transformers to fix v0.9.1 doctest (#2048) Yikun Jiang 2025-07-28 17:51:56 +08:00
1a25b0a2dd [Test] add ut for qwen3_moe.py (#2055) huangxialu 2025-07-28 17:37:13 +08:00
e7d32ed3f1 [BugFix] Fix the problem that torchair doesn't support tp > 4. (#1508) whx 2025-07-28 16:48:05 +08:00
4a008c4dac [Misc]Clean up useless import from vllm (#2049) wangxiyuan 2025-07-28 16:01:59 +08:00
34cfdf5520 [Misc] Fix logger bug (#2024) wangxiyuan 2025-07-28 15:59:09 +08:00
3ad582c9a9 [Test] Add ut for files in /attention (#1944) LeeWenquan 2025-07-28 15:54:40 +08:00
32a9c5f694 [Feature]: implement the fusion of allreduce and matmul in prefill phase when tp is enabled (#1926) Ronald1995 2025-07-28 15:13:37 +08:00
ba3dfbd59e [main][refactor] Refactoring forward_context and model_runner_v1 (#1979) zzzzwwjj 2025-07-28 14:06:20 +08:00
e3a2443c3a [main][Doc] add mla pertoken quantization FAQ (#2018) Wang Kunpeng 2025-07-27 08:47:51 +08:00
5b579ddafe Upgrade CANN to 8.2.RC1 (A3) (#2043) Yikun Jiang 2025-07-26 23:10:27 +08:00
ed2ab8a197 [CI/Build] Upgrade CANN to 8.2.RC1 (#1653) Mengqing Cao 2025-07-26 22:37:46 +08:00
d1c640841b [Bugfix] Fix num_hidden_layers when Qwen2-Audio 7B (#1803) zhangxinyuehfad 2025-07-26 20:13:00 +08:00
df0ec55162 Disaggregate prefill for kv cache register style (#950) Pleaplusone 2025-07-26 17:15:47 +08:00
17a430f7b8 Upgrade vLLM to v0.10.0 (#1927) Yikun Jiang 2025-07-26 15:43:29 +08:00
2f50304c19 [Bugfix] Add get_supported_tasks interface to fix broken CI (#2023) Li Wang 2025-07-26 08:20:21 +08:00
bdfb065b5d [1/2/N] Enable pymarkdown and python __init__ for lint system (#2011) Li Wang 2025-07-25 22:16:10 +08:00
d629f0b2b5 [CI] Remove transformers installation (#2014) Li Wang 2025-07-25 15:20:37 +08:00
e561a2c6ec ut:add ut for qwen2_5_vl_without_padding.py (#1988) Ronald1995 2025-07-25 14:12:44 +08:00
ae560f7131 [Test] Add uts for files in /core (#1957) SunnyLee151064 2025-07-25 09:48:19 +08:00
6bc82cf6a7 Enable image push CI for build file and csrc has changes (#1977) Icey 2025-07-24 21:19:41 +08:00
cfdd45ed00 [Bug] Fix duplicate 'torch.' prefix in qwen-vl (#1986) JohnJan 2025-07-24 20:16:00 +08:00
84fc7402c3 [Misc] Refactor AscendMetaData Comments to Make It Clearer (#1967) Shanshan Shen 2025-07-24 19:31:36 +08:00
ff97740b8d Use mirror images (#1912) li chaoran 2025-07-24 10:47:05 +08:00
ab7d5aca5d [Test] Add ut for files in /multistream (#1947) SunnyLee151064 2025-07-24 10:42:49 +08:00
34571ea5ae [Test] Add ut for files in /distributed (#1951) SunnyLee151064 2025-07-24 10:36:11 +08:00
fa76a9b7bb [Bug] Add prefix parameter to parent class initialization (#1934) JohnJan 2025-07-24 10:28:40 +08:00
2ffe051859 [Test]add ut for deepseek_v2. (#1964) Zac 2025-07-24 10:27:50 +08:00
846555cdb5 [Misc] Clean up useless code in attention (#1933) wangxiyuan 2025-07-24 10:23:34 +08:00
b5ad70e1a6 [Optimize]Change AI Vector core number getting function to glibc ABI free function (#1974) leo-pony 2025-07-24 10:00:19 +08:00
ac0bf133f4 add ut of fused_moe.py (#1930) shiyuan680 2025-07-23 16:24:09 +08:00
ac773aca43 Add UT for Patches (#1766) weichen 2025-07-23 16:07:20 +08:00
326dcf2576 [Doc] Update support feature (#1828) wangxiyuan 2025-07-23 15:19:15 +08:00
3aa3b46bfe [V1][PP] Support pp with ray backend in V1 (#1800) Mengqing Cao 2025-07-23 14:52:52 +08:00
9a3bdf2162 [main] Use AddRmsNormQuant ops in the custom model to optimize Qwen3's performance (#1806) rjg-lyh 2025-07-22 19:03:13 +08:00
ce4970eee0 [Test] Add unit test for schedule_config.py (#1590) JohnJan 2025-07-22 11:43:25 +08:00
5f0b42e414 [FOLLOWUP] Use base test to avoid patch everywhere (#1634) Yikun Jiang 2025-07-22 09:03:40 +08:00
33e1ea4d1a [CI] Fix broken CI (#1915) Li Wang 2025-07-22 08:38:30 +08:00
7265dc090d [2/4][Refactor] Refactor torchair utils (#1892) wangxiyuan 2025-07-21 19:43:30 +08:00
957b0b611f [Misc][V0 Deprecation] Remove V0 Model Runner (#1823) Shanshan Shen 2025-07-21 16:35:50 +08:00
a66ef39bb6 [Misc][V0 Deprecation] Remove Redundant Offline Distributed Inference Example (#1899) Shanshan Shen 2025-07-21 12:01:45 +08:00
af56ae3ed1 [1/4][Refactor] Refactor torchair worker (#1885) wangxiyuan 2025-07-21 11:50:46 +08:00
c32eea96b7 [Doc]Add Chinese translation for documentation (#1870) aidoczh 2025-07-21 11:26:27 +08:00
8cfd257992 [Dist][EP] Remove ETP/EP maintained in vllm-ascend (#1681) Mengqing Cao 2025-07-21 09:08:04 +08:00
a8b316ac5b [CI] Make AttentionBackend interface compatible to fix broken CI (#1893) wangxiyuan 2025-07-21 08:21:06 +08:00
54f2b31184 [Doc] Add a doc for qwen omni (#1867) JohnJan 2025-07-20 09:05:41 +08:00
2b726d8f90 [CI] Fix broken CI (#1889) wangxiyuan 2025-07-20 02:11:57 +08:00
2ee90461d0 Fix e2e data parallel test: add resource release code (#1881) leo-pony 2025-07-19 11:39:48 +08:00
b824525be3 Move deepseek_v3 from deepseek_v2.py (#1793) xleoken 2025-07-19 11:37:03 +08:00
ab68d31a24 [Misc][V0 Deprecation] Remove Cache Engine Used for V0 Worker (#1878) Shanshan Shen 2025-07-19 09:42:32 +08:00
53d2ea3789 [Bugfix]Fix the performance gap between 0.9.2rc1 and 0.9.1 (#1811) lianyibo 2025-07-18 23:09:54 +08:00
574fe407eb [1/N][CustomOp] Register activation customop instead of overwrite forward_oot (#1841) Mengqing Cao 2025-07-18 23:07:14 +08:00
8a91e6e59c [Misc][V0 Deprecation] Remove V0 Related Custom Ops (#1871) Shanshan Shen 2025-07-18 23:06:03 +08:00
3e39d7234c [CI] Switching to infra cache server to reduce network pressure (#1792) li chaoran 2025-07-18 18:39:25 +08:00
d08ff304cd [Misc][V0 Deprecation] Remove V0 Attention (#1835) Shanshan Shen 2025-07-18 14:10:13 +08:00
33ef5dc813 add unit test for func wrapper (#1863) xudongLi-cmss 2025-07-18 11:05:17 +08:00
f9dfde02fd [Bugfix] Fix broken CI (#1848) Li Wang 2025-07-17 20:10:12 +08:00
538dd357e6 Add graph mode and improve on multi_npu_moge.md (#1849) Zhu Yi Lin 2025-07-17 17:53:37 +08:00
aeb5aa8b88 [Misc][V0 Deprecation] Add __main__ guard to all offline examples (#1837) Shanshan Shen 2025-07-17 14:13:30 +08:00
19e37cd379 [Misc] Add fusion_result.json to .gitignore (#1836) Shanshan Shen 2025-07-17 11:54:49 +08:00
875a920d4a [Platform] Add support for Atlas A3 series (#1794) Icey 2025-07-17 11:13:02 +08:00
ef99fe1c54 [Test] Clean up duplicate test for ascend scheduler (#1819) wangxiyuan 2025-07-16 17:57:48 +08:00
c66b0827a7 [Misc][V0 Deprecation] Remove Pooling Model Runner (#1824) Shanshan Shen 2025-07-16 17:48:21 +08:00
ba7e934b21 Remove redundant empty lines in commit msg (#1814) Yikun Jiang 2025-07-16 16:50:44 +08:00
06655002c5 [Misc][V0 Deprecation] Remove V0 Worker (#1821) Shanshan Shen 2025-07-16 14:07:17 +08:00
b005def0a5 [Misc][V0 Deprecation] Remove Multi-Step Model Runner (#1820) Shanshan Shen 2025-07-16 14:06:49 +08:00
f9e2e9bb31 [Misc][V0 Deprecation] Remove Draft Model Runner Used for V0 Spec Decode (#1810) Shanshan Shen 2025-07-16 10:51:23 +08:00
f96100fad5 [Misc][V0 Deprecation] Remove V0 related codes of test, example, platform (#1805) Shanshan Shen 2025-07-15 19:58:55 +08:00
