metal : allow ops to run concurrently (#15929)

* metal : run graphs ops concurrently

ggml-ci

* cont : add flags for debugging and disabling concurrency

ggml-ci

* cont : refactor and handle fusing

ggml-ci

* cont : simplify - no need to use GPU address

ggml-ci

* cont : prepare mem ranges for reuse + add ggml-metal-common.cpp

ggml-ci

* cont : avoid redundant keywords in cpp [no ci]

* metal : reorder graph for better concurrency

ggml-ci

* metal : fix race on mem pool buffers

ggml-ci

* cont : add env GGML_METAL_GRAPH_OPTIMIZE_DISABLE

ggml-ci

* cont : refactor, optimize, add comments

ggml-ci

* cont : refactor ggml-metal.m

ggml-ci

* minor : update logs [no ci]
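
Note on GGML_METAL_GRAPH_OPTIMIZE_DISABLE: the message names the variable but does not say how it is read. The snippet below is a minimal sketch, assuming the flag counts as set whenever the variable exists, whatever its value. The helper names `env_flag_set` and `graph_optimize_enabled` are hypothetical and are not taken from the commit.

```cpp
#include <cstdlib>

// Hypothetical helper: true when the environment variable exists at all.
// Assumption: presence alone sets the flag; the value is not inspected.
static bool env_flag_set(const char * name) {
    return std::getenv(name) != nullptr;
}

// Hypothetical wrapper: graph optimization stays on unless the
// GGML_METAL_GRAPH_OPTIMIZE_DISABLE variable named in the commit message is present.
static bool graph_optimize_enabled() {
    return !env_flag_set("GGML_METAL_GRAPH_OPTIMIZE_DISABLE");
}
```

Under that assumption, exporting the variable with any value before launching the process would turn the graph reordering off, which can help when bisecting a suspected concurrency issue.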
Georgi Gerganov
2025-09-13 13:54:28 +03:00
committed by GitHub
parent 84d7b2fca1
commit f161463a54
4 changed files with 719 additions and 38 deletions
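
The message mentions tracking "mem ranges" so that graph ops can run concurrently. The general idea can be sketched as follows: an op may join the current concurrent group only if none of its memory ranges conflicts with a range already pending in that group, where two ranges conflict when they overlap and at least one of them is a write. This is an illustrative sketch, not the code in ggml-metal-common.cpp; every type and function name below (`mem_range`, `access_type`, `concurrent_set`, and so on) is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; not the names used by the commit.
enum class access_type { read, write };

struct mem_range {
    uint64_t    begin; // first byte, inclusive
    uint64_t    end;   // one past the last byte, exclusive
    access_type type;
};

// Two half-open ranges overlap when each one starts before the other ends.
static bool ranges_overlap(const mem_range & a, const mem_range & b) {
    return a.begin < b.end && b.begin < a.end;
}

// Read/read overlap is harmless; any overlap that involves a write is a conflict.
static bool conflicts(const mem_range & a, const mem_range & b) {
    const bool both_reads = a.type == access_type::read && b.type == access_type::read;
    return !both_reads && ranges_overlap(a, b);
}

// Tracks the ranges touched by ops issued since the last barrier.
struct concurrent_set {
    std::vector<mem_range> pending;

    // True if the op (given as its source and destination ranges) can run
    // concurrently with everything already in the group.
    bool can_add(const std::vector<mem_range> & op) const {
        for (const auto & r : op) {
            for (const auto & p : pending) {
                if (conflicts(r, p)) {
                    return false;
                }
            }
        }
        return true;
    }

    // Record the op's ranges as pending.
    void add(const std::vector<mem_range> & op) {
        pending.insert(pending.end(), op.begin(), op.end());
    }

    // A barrier ends the group: earlier ops are complete, so nothing is pending.
    void barrier() {
        pending.clear();
    }
};
```

In this sketch, a caller would walk the graph in order, call `can_add` for each op, and insert a barrier (then `barrier()`) whenever it returns false. Reordering the graph so that independent ops sit next to each other would then reduce the number of barriers needed.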


@@ -6,6 +6,7 @@ message(STATUS "Metal framework found")
 ggml_add_backend_library(ggml-metal
                          ggml-metal.m
+                         ggml-metal-common.cpp
                         )
 target_link_libraries(ggml-metal PRIVATE