xc-llm-ascend

EngineX/xc-llm-ascend


b0403f8d8a [CI] fix ci (#2464) Mengqing Cao 2025-08-22 07:30:48 +08:00
0ca3f48c90 [2/N][refactor] torchair deepseek mla backend refactor (#2459) linfeng-yuan 2025-08-21 14:02:30 +08:00
67a222c383 [Doc] Add feature branch policy (#2432) Yikun Jiang 2025-08-21 10:37:21 +08:00
3fb80ee356 add mlp tp optimze (#2120) sherie 2025-08-21 09:22:07 +08:00
973a7cfdf0 [DOC] update doc: LoRA with ACLGraph (#2430) yupeng 2025-08-21 08:55:55 +08:00
0dca4c6dbd refact runner model v1 (#2461) weiguihua2 2025-08-21 08:54:57 +08:00
1de16ead8e [main][bugfix] Modify the default value of the enable_shared_pert_dp to false (#2457) Wang Kunpeng 2025-08-20 20:25:53 +08:00
c40d4171bc [main][quantization] Adapt to the new format of ds w4a8 weight (#2392) Wang Kunpeng 2025-08-20 20:25:18 +08:00
eccfb715f6 [CI] Fix UT (#2452) wangxiyuan 2025-08-20 16:26:07 +08:00
3f867ee708 refactor allgather/mc2-related fused_experts (#2369) sherie 2025-08-20 14:20:46 +08:00
73acdcfc3b [PD] Correct the ip and port env (#2450) wangxiyuan 2025-08-20 11:39:05 +08:00
7bec1a9b9c qwen3_moe/qwen25 support torchair graph (#2403) Nicholas Tao 2025-08-20 11:23:50 +08:00
31ae249742 [misc] remove uesless envs (#2448) wangxiyuan 2025-08-20 10:50:21 +08:00
3a384492e1 [CI] add lint block before running e2e (#2447) Mengqing Cao 2025-08-20 09:53:23 +08:00
1327f9be1c Fix some ci issue and refactor modelrunner (#2445) Mengqing Cao 2025-08-20 09:01:04 +08:00
955411611c Nominate Mengqing Cao as vllm-ascend maintainer (#2433) Jade Zheng 2025-08-19 14:13:54 +08:00
d91c6daf89 [improve] Remove redundant parentheses in pangu_moe.py (#2081) xleoken 2025-08-19 11:00:18 +08:00
6335fe39ea Nominate ApsarasX as vllm-ascend maintainer (#2419) wangxiyuan 2025-08-19 10:44:35 +08:00
83e0f41408 [3/N][Refactor] Move torchair_attention to torchair dir (#2017) Shanshan Shen 2025-08-19 10:25:22 +08:00
2a763b8326 [Bug] Fix bug in test_chunked.py (#1992) xleoken 2025-08-19 10:23:47 +08:00
27d038dc66 fix doc typo (#2407) G.O.D 2025-08-19 09:10:01 +08:00
3f4a358b14 [Bugfix] Fix custom op register issue (#2409) Pleaplusone 2025-08-19 09:09:43 +08:00
3648d18e67 Add Custom Kernels For LoRA Performance (#2325) liuchenbing 2025-08-19 09:09:11 +08:00
8fb50a4248 Bump actions/checkout from 4 to 5 (#2420) dependabot[bot] 2025-08-19 08:54:56 +08:00
9e7c168d99 Add ModelRunner_prepare_inputs doc (#1493) TaoYu Chen 2025-08-18 15:41:24 +08:00
3fc31ee1cb [1/N][refactor] torchair deepseek modeling refactor (#2384) linfeng-yuan 2025-08-18 15:00:37 +08:00
19fdc9a3f0 [Bugfix] Fix header include issue in rope (#2397) Pleaplusone 2025-08-18 14:33:38 +08:00
03ca2b26ca [P/D] Mooncake Connector for v1 distributed (#1568) Chao Lei 2025-08-18 14:30:07 +08:00
2bb7e55022 [Bugfix][PD]fix non-working disaggregated prefill (#2374) CaveNightingale 2025-08-15 16:59:52 +08:00
1b40665548 [Misc] remove unused file (cache.py) (#2377) 22dimensions 2025-08-15 10:27:43 +08:00
61866b8ac6 [Quickfix] update CachedRequestState as NewRequestData changed (#2367) Mengqing Cao 2025-08-15 07:35:27 +08:00
2ad7e1251e [Doc] Fix quant documentation to make it reproducible (#2277) Li Wang 2025-08-14 17:19:47 +08:00
c721ae6042 [CustomOp] Register RMSNorm instead of overwrite forward_oot (#2284) Icey 2025-08-14 17:18:30 +08:00
e14f2ef669 refactor select_experts of moe module (#2150) shiyuan680 2025-08-14 11:50:53 +08:00
103654ccd6 [Misc] Remove redundant imported envs, using envs_ascend instead (#2193) Shanshan Shen 2025-08-14 09:33:39 +08:00
55d0790597 [2/N][Refactor] Refactor V1 attention for better extensibility (#1995) Shanshan Shen 2025-08-14 09:32:41 +08:00
8914d5a4b2 [Quickfix] Add the missing apply_router_weight_on_input in FusedMoE init (#2348) Mengqing Cao 2025-08-14 09:17:50 +08:00
0f7492d18e [Bugfix] fix the oom when chunkprefill with long context like 64k (#2319) zhenghaojiang 2025-08-13 17:15:59 +08:00
8bfd16a145 [Doc] Add container image save/load FAQ for offline environments (#2347) jack 2025-08-13 16:00:43 +08:00
992271b027 [1/N][Feat] Support MoE models with ACL Graph and refactor MoE communication logic (#2125) yiz-liu 2025-08-12 21:10:20 +08:00
1a70564e7c [5/N][Refactor] torchair model runner refactor (#2216) wangxiyuan 2025-08-12 14:24:50 +08:00
49ec6c98b7 [Doc] Update faq (#2334) Mengqing Cao 2025-08-12 14:12:53 +08:00
dc585f148a [main][prefill optimization] Optimize parallel strategies to reduce communication overhead (#2198) Wang Kunpeng 2025-08-12 14:12:12 +08:00
81817908ca ut: add ci guard for ut coverage (#2317) Ronald1995 2025-08-12 08:05:01 +08:00
9c6d108330 Configure Gemini (#2298) jack 2025-08-11 22:21:29 +08:00
c8b0f5f799 [4/N][Refactor] torchair model runner refactor (#2208) wangxiyuan 2025-08-11 21:39:24 +08:00
eb43a475f4 [Feat] chunkprefill mla support torchair graph (#1772) zhenghaojiang 2025-08-11 19:58:59 +08:00
881e36d6a9 [3/N][Refactor] torchair model runner refactor (#2207) wangxiyuan 2025-08-11 18:03:19 +08:00
29aaba5f84 [Perf][MTP] Optimize reject sampler in greedy situation. (#2137) whx 2025-08-11 17:37:49 +08:00
ca274001b0 Bump actions/download-artifact from 4 to 5 (#2311) dependabot[bot] 2025-08-11 16:02:12 +08:00
c0f0b70813 [core] Support capture custom ops into aclgraph (#2113) Pleaplusone 2025-08-11 15:59:42 +08:00
1ab15414bb [2/N][Refactor] torchair model runner refactor (#2204) wangxiyuan 2025-08-11 14:06:49 +08:00
9260910c8d [CI] Fix broken CI (#2302) wangxiyuan 2025-08-11 11:22:32 +08:00
ee6f79c44a Add ut for test_communicator.py (#2293) yangqinghao-cmss 2025-08-09 08:26:04 +08:00
3e65c406b8 Fix accuracy test create PR (#2274) Icey 2025-08-08 14:12:11 +08:00
0bd5ff5299 Fix accuracy test config and add DeepSeek-V2-Lite test (#2261) Icey 2025-08-08 11:09:16 +08:00
ad1083761f [CI][Quickfix] Fix AscendFusedMoE init error (#2268) Mengqing Cao 2025-08-08 10:20:23 +08:00
dceef080b1 [main] remove torch.cat and replace it by List[0] (#2153) huangxialu 2025-08-07 17:20:19 +08:00
b2598c3271 enable mm allreduce test (#2192) Ronald1995 2025-08-07 17:19:23 +08:00
4604882a3e [ReleaseNote] Release note of v0.10.0rc1 (#2225) Mengqing Cao 2025-08-07 14:46:49 +08:00
58c8d4fdcd Remove transformer pins for v0.9.1-dev (#2234) Yikun Jiang 2025-08-07 14:41:10 +08:00
92eebc0c9b [Doc] Update user guide for suported models (#2263) zhangxinyuehfad 2025-08-07 14:39:51 +08:00
440d28a138 [Tutorial] Add qwen3 8b w4a8 tutorial (#2249) 22dimensions 2025-08-07 14:39:38 +08:00
bcd0b532f5 [Doc] Update user guide for using lm-eval (#1325) zhangxinyuehfad 2025-08-07 14:15:49 +08:00
dbba3cabb0 [Doc] Update tutorials for single_npu_audio and single_npu_multimodal (#2252) zhangxinyuehfad 2025-08-07 14:08:14 +08:00
205eff2b12 [Bugfix] Disable check vllm init temporary (#2250) Li Wang 2025-08-07 10:37:22 +08:00
c611291661 【main】SP For Qwen3 MoE (#2209) lbk-sys 2025-08-07 09:15:49 +08:00
57b9f02185 [Bugfix] Fix disaggregated pd error (#2242) Li Wang 2025-08-06 19:48:10 +08:00
26fc36b0e0 [V1] MTP supports torchair (#2145) xuyexiong 2025-08-06 19:37:43 +08:00
bf84f2dbfa [Doc] Support kimi-k2-w8a8 (#2162) Li Wang 2025-08-06 19:28:47 +08:00
875a86cbe9 ut: add example and e2e test for sleepmode in external_launcher (#2152) huangxialu 2025-08-06 11:11:53 +08:00
8a59367d0c [main][Feature] Support deepseek w4a8 quantization (#2172) Wang Kunpeng 2025-08-06 10:17:44 +08:00
e31b31f9c3 [main][Bugfix] Fix unable to load qwen3_moe quantized weights (#2219) Ruri 2025-08-06 09:08:36 +08:00
54ace9e12b Add release note for v0.9.1rc2 (#2188) Yikun Jiang 2025-08-06 09:04:46 +08:00
126cdfc92b [Test] add rejection sampler ut (#2084) sherie 2025-08-05 19:03:36 +08:00
f3b50c54e8 [main][Prefill Perf] Optimize Quantized MoE Performance by Reducing All2All Communication (#2195) Slightwind 2025-08-05 18:47:13 +08:00
292fb8f696 [1/N][Refactor] torchair model runner refactor (#2205) wangxiyuan 2025-08-05 18:43:04 +08:00
458ab2db12 [BugFix] Fix the bug that qwen3 moe doesn't work with aclgraph (#2183) wangxiyuan 2025-08-05 17:42:52 +08:00
583ad8f347 [main][refractor] Refractor forward metadata retrieval across DP nodes to reduce redundant padding. (#2062) jinyuxin 2025-08-05 17:03:36 +08:00
27c2b5c145 [Doc] Update pytorch version in README_zh doc (#2202) xleoken 2025-08-05 11:13:49 +08:00
807f0895b2 Bump torch version to 2.7.1 (#1562) leo-pony 2025-08-05 08:43:24 +08:00
36e450eb0f [Misc] Nit fix for disaggregated_prefill and ascend_forward_context (#2097) wangxiyuan 2025-08-05 08:39:02 +08:00
ad366bf908 [Bugfix] Follow vLLM Qwen-Moe/VL and KV Connector change to fix broken CI (#2181) Li Wang 2025-08-04 21:37:50 +08:00
e38fab011d [Doc][PD] Restore the default configuration items in examples/disaggregate_prefill_v1/README.md (#2165) hucong 2025-08-04 20:30:53 +08:00
957c7f108d [Bugfix][PD] Make multiple Ps and Ds work on a single machine (#2080) CaveNightingale 2025-08-04 17:22:18 +08:00
a9480d5f0a [Fix] Adjust use_aclgraph logic (#2156) yiz-liu 2025-08-04 15:23:20 +08:00
688350a3bb [bugfixed] fix the bug when run the inference of quantized ds-w8a8-mtp (#2134) liu 2025-08-04 15:16:42 +08:00
4b3a210c33 Implementation of simple load balance routing proxy server (#1953) (#2124) Pleaplusone 2025-08-04 10:35:53 +08:00
af04ee9e7a [MoE][Dist] Fix Qwen MoE accuracy bug in DP scenario (#1856) Mengqing Cao 2025-08-04 10:24:18 +08:00
f939381c6f [Bugfix] Adopt the new changes on disaggregated pd from vllm main branch (#2122) Pleaplusone 2025-08-04 10:08:58 +08:00
ddaded1537 Add ut for envs.py (#2131) YuanCheng-coder 2025-08-02 16:53:44 +08:00
bea3d5bbb4 [Bug] Fix run bug in run_dp_server.sh (#2139) xleoken 2025-08-02 16:52:12 +08:00
47f688a2f0 Change retrieving remote files to local retrieval. (#2141) yangqinghao-cmss 2025-08-02 16:51:22 +08:00
e48f32ec59 [CI] Update image for 310p ci (#2155) zhangxinyuehfad 2025-08-02 16:46:02 +08:00
e467fe1b77 Add qwen-vl model and sampling feature UT for 310I series (#2168) leo-pony 2025-08-02 11:26:12 +08:00
6e00aed4d5 [main][Feature]Moe alltoallv communication optimization for unquantized RL training sence (#2088) weijinqian0 2025-08-02 09:49:10 +08:00
f0c1f0c828 [Doc] Add qwen vl example in tutorials for 310I series (#2160) leo-pony 2025-08-02 08:58:56 +08:00
8cf97d8310 [Misc] Add extra checking to torchair_graph_config. (#1939) 22dimensions 2025-08-01 09:24:11 +08:00
2284289880 [MISC] Cherry pick #1291 from v0.9.1-dev (#1825) Li Wang 2025-08-01 09:08:45 +08:00
9e65da990e [Misc] Add warning for incompatible Ray backend with ACL Graph mode (#2132) 22dimensions 2025-08-01 09:06:09 +08:00

