leejet
0a1b3982cd
ggml: add ops for WAN video model (cuda && cpu) (#15669)
* add conv3d support
* add ggml_pad_ext for cpu & cuda backend
* cuda/cpu: add im2col_3d support
* cuda: make im2col a little faster
* fix cuda pad/scale/im2col3d
* make im2col_3d faster
* gguf: support loading tensors which n_dims > GGML_MAX_DIMS
* fix cuda get_rows
* avoid ggml_conv_3d conflict
* correct GGML_OP_COUNT assertion
* avoid build failure
* avoid build failure on MacOS
* cuda: remove unnecessary MIN define
* fix cpu im2col_3d
* adjust the code style
* cuda: use simpler loop in get_rows
* add test_im2col_3d to test-backend-ops
* test-backend-ops.cpp: remove trailing whitespace
* cpu: im2col_3d support non-contiguous src
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
* fix test_im2col_3d
* remove unused variables
* cuda: get_rows: dfloat2 -> float2
* add test_pad_ext to test-backend-ops.cpp
* add gguf_init_from_file_ext impl
* Revert "gguf: support loading tensors which n_dims > GGML_MAX_DIMS"
This reverts commit d8377a0a37f314bd3713fe043b4333ad661610c1.
* Revert "add gguf_init_from_file_ext impl"
This reverts commit d9f1d13208c68ef83b3538201ac7f31614fb1994.
* update ggml_backend_vk_device_supports_op
* fix ggml_backend_vk_device_supports_op
* update other backend supports op for ggml_pad_ext
* metal/opencl/sycl/vulkan: fix GGML_OP_PAD check in supports_op
---------
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-09-04 10:38:49 +02:00
Johannes Gäßler
7a6e91ad26
CUDA: replace GGML_CUDA_F16 with CUDA arch checks (#15433)
2025-08-20 16:58:49 +02:00
uvos
5ba36f6103
HIP: Cleanup hipification header (#15285)
add explicit conversion operator to support older versions of rocm
Switch over to hip_bf16 from legacy hip_bfloat16
Simplify RDNA3 define
Reduce swap over of new hipblas api to rocm 6.5 as this version is used for rocm 7.0 previews
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-08-14 16:23:56 +02:00
Aman Gupta
b9c3eefde1
CUDA: add bf16 and i32 to getrows (#14529)
2025-07-07 21:45:43 +08:00
Johannes Gäßler
5c86c9ed3e
CUDA: fix crash on large batch size for MoE models (#13384)
2025-05-09 12:14:04 +02:00
Johannes Gäßler
e1e8e0991f
CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199)
2025-04-30 23:12:59 +02:00
Johannes Gäßler
9c8dcefe17
CUDA: backwards pass for misc. ops, add tests (#11257)
* CUDA: backwards pass for misc. ops, add tests
* remove restrict from pointers
2025-01-16 16:43:38 +01:00
slaren
2b1f616b20
ggml : reduce hash table reset cost (#8698)
* ggml : reduce hash table reset cost
* fix unreachable code warnings after GGML_ASSERT(false)
* GGML_ASSERT(false) -> GGML_ABORT("fatal error")
* GGML_ABORT use format string
2024-07-27 04:41:55 +02:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake (#8006)
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122)
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00