enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

bbfc849274 SYCL: add ops doc (#14901) Akarshan Biswas 2025-07-27 17:52:58 +05:30
ca0ef2dddb llama : clarify comment about pp and tg graphs [no ci] (#14895) Daniel Bevenius 2025-07-27 12:10:51 +02:00
89d1029559 vulkan : add fp16 support for the conv_2d kernel (#14872) Erik Scholz 2025-07-27 12:04:33 +02:00
f1a4e72de5 vulkan: skip empty set_rows to avoid invalid API usage (#14860) Jeff Bolz 2025-07-27 04:05:34 -05:00
4762ad7316 model : make rope_yarn_log_mul optional for deepseek2 (#14896) Gabriel Larson 2025-07-27 03:18:37 -05:00
1dc9614e06 llama : fix kq_scale for the attention layers of PLaMo2 (#14892) Shunta Saito 2025-07-27 16:38:44 +09:00
446595b9b3 Docs: add instructions for adding backends (#14889) Aman Gupta 2025-07-27 09:36:43 +08:00
66906cd82a HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (#14624) deepsek 2025-07-26 18:28:14 -04:00
11dd5a44eb CANN: Implement GLU ops (#14884) hipudding 2025-07-26 17:56:18 +08:00
9b8f3c6c77 musa: fix build warnings (unused variable) (#14869) R0CKSTAR 2025-07-26 10:36:02 +08:00
c7f3169cd5 ggml-cpu : disable GGML_NNPA by default due to instability (#14880) Aaron Teo 2025-07-26 01:09:03 +08:00
793c0d7f46 metal: SSM_SCAN performance (#14743) Gabe Goodhart 2025-07-25 10:47:39 -06:00
ce111d39d6 opencl: add fused rms_norm_mul (#14841) lhez 2025-07-25 08:12:13 -07:00
e7fecba934 docs : update HOWTO‑add‑model.md for ModelBase and new model classes (#14874) wooksong 2025-07-25 23:25:05 +09:00
e2b7621e7c ggml : remove invalid portPos specifiers from dot files (#14838) Oliver Simons 2025-07-25 13:29:57 +02:00
c1dbea752a context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (#14870) Georgi Gerganov 2025-07-25 14:28:06 +03:00
749e0d27f0 mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (#14503) kiwi 2025-07-25 19:08:04 +08:00
64bf1c3744 rpc : check for null buffers in get/set/copy tensor endpoints (#14868) Chris Rohlf 2025-07-25 06:17:02 -04:00
c12bbde372 sched : fix multiple evaluations of the same graph with pipeline parallelism (#14855) Diego Devesa 2025-07-25 01:07:26 -07:00
3f4fc97f1d musa: upgrade musa sdk to rc4.2.0 (#14498) R0CKSTAR 2025-07-25 03:05:37 +08:00
2df255da3c sync : ggml Georgi Gerganov 2025-07-24 18:30:33 +03:00
60f816a79d cmake : fix usage issues (ggml/1257) Kai Pastor 2025-07-22 20:13:21 +02:00
5592f278b6 ggml-cpu : remove stdlib include from repack.cpp (ggml/1276) Daniel Bevenius 2025-07-21 15:53:12 +02:00
e4868d16d2 context : perform output reorder lazily upon access after sync (#14853) Georgi Gerganov 2025-07-24 16:31:48 +03:00
820de57d4f chat : fix kimi-k2 chat template (#14852) Xuan-Son Nguyen 2025-07-24 13:59:56 +02:00
cb4a63aad6 sycl: fixed semantics of block offset calculation (#14814) Alberto Cabrera Pérez 2025-07-24 11:09:57 +01:00
86f5623d90 llama : fix MiniCPM inference after Granite Four changes (#14850) yummy 2025-07-24 17:50:51 +08:00
39cffdf188 docs: add libcurl-dev install hint for Linux distros (#14801) Pouya 2025-07-24 12:26:44 +03:00
065908cb09 metal : fix fusion across different encoders (#14849) Georgi Gerganov 2025-07-24 10:24:05 +03:00
4ec6291a24 sycl: fix undefined variable in work group size check (#14843) Donghyeon Jeong 2025-07-24 13:50:41 +09:00
a12363bbf0 convert : text-only support for GLM-4.1V-9B-Thinking (#14823) jacekpoplawski 2025-07-23 23:23:57 +02:00
a86f52b285 CUDA: fix overflow in FA, tune performance (#14840) Johannes Gäßler 2025-07-23 21:43:25 +02:00
b284197df4 CUDA: fix compilation with GGML_CUDA_F16 (#14837) Johannes Gäßler 2025-07-23 18:22:30 +02:00
221c0e0c58 ci : correct label refactor->refactoring (#14832) Sigbjørn Skjæret 2025-07-23 14:27:54 +02:00
07a19e27a2 CUDA: fix quantized KV cache + multiple sequences (#14822) Johannes Gäßler 2025-07-23 12:35:53 +02:00
18f3b5ff9e tests : add non-cont K,V FA tests Georgi Gerganov 2025-07-18 13:36:27 +03:00
7233358d29 memory : handle saving/loading null layers in recurrent memory (#14675) l3utterfly 2025-07-23 16:16:41 +08:00
6c88b3bb25 ggml: fix loongarch quantize_row_q8_1 error (#14827) lixing-star 2025-07-23 14:39:51 +08:00
14c28dfc50 CANN: weight format to NZ for Ascend310P3 (#14407) chen fan 2025-07-23 11:58:00 +08:00
8c988fa41d CUDA: add fused rms norm (#14800) Aman Gupta 2025-07-23 09:25:42 +08:00
acd6cb1c41 ggml : model card yaml tab->2xspace (#14819) Csaba Kecskemeti 2025-07-22 09:29:43 -07:00
84712b6043 vulkan: fix rms_norm_mul to handle broadcasting dim0 (#14817) Jeff Bolz 2025-07-22 10:35:21 -05:00
d4d1522b20 llama : add model type detection for rwkv7 7B&14B (#14816) Molly Sophia 2025-07-22 23:01:29 +08:00
d1aa0cc5d1 imatrix: add option to display importance score statistics for a given imatrix file (#12718) Ed Addario 2025-07-22 13:33:37 +01:00
c8ade30036 Mtmd: add a way to select device for vision encoder (#14236) stduhpf 2025-07-22 12:51:03 +02:00
e28c0b80c2 cuda : implement bf16 cpy ops and enable bf16 cont (#14763) Sigbjørn Skjæret 2025-07-22 12:33:10 +02:00
8e6f8bc875 opencl: remove unreachable return (#14806) lhez 2025-07-21 23:53:30 -07:00
adef81781a server : allow setting --reverse-prompt arg (#14799) Molly Sophia 2025-07-22 09:24:22 +08:00
48b86c4fdb cuda: remove linking to cublasLt (#14790) R0CKSTAR 2025-07-22 07:45:26 +08:00
38d3af1b73 opencl: fix im2col when KW!=KH (#14803) Sigbjørn Skjæret 2025-07-21 22:55:10 +02:00
6c9ee3b17e opencl: add conv2d kernel (#14403) rmatif 2025-07-21 19:03:19 +02:00
cd465d823c sycl: Fix im2col (#14797) Romain Biessy 2025-07-21 18:39:29 +02:00
922042601b kleidiai: add support for get_rows (#14676) Charles Xu 2025-07-21 15:49:52 +02:00
2ba1333b35 docs : fix backends table in README.md (#14796) Radoslav Gerganov 2025-07-21 15:03:49 +03:00
c2e058f1b4 vulkan/cuda: Fix im2col when KW!=KH (#14789) Jeff Bolz 2025-07-21 06:35:40 -05:00
c82d48ec23 llama : fix --reverse-prompt crashing issue (#14794) Molly Sophia 2025-07-21 17:38:36 +08:00
b4efd77f8a server : add parse_special option to /tokenize endpoint (#14783) IsaacDynamo 2025-07-21 09:24:51 +02:00
2be60cbc27 docs : fix link for tools/perplexity in README.md (#14780) Aman Gupta 2025-07-21 02:13:47 +08:00
b526ad2668 Documentation: Further revisions to the Vulkan section in build.md (#14785) rspOverflow 2025-07-20 23:55:32 +07:00
938b785764 Clang-format: local files first + fix BinPacking (#14779) Aman Gupta 2025-07-20 19:42:34 +08:00
36c153248f Contrib: add 0cc4m as codeowner for Vulkan backend (#14775) 0cc4m 2025-07-19 22:47:21 +02:00
a979ca22db ggml: adds CONV_2D op and direct GEMM Vulkan implementation (#14316) Ervin Áron Tasnádi 2025-07-19 21:59:08 +02:00
90083283ec imatrix : use GGUF to store importance matrices (#9400) compilade 2025-07-19 12:51:22 -04:00
d4b91ea7b2 vulkan: Add logging for bf16 features to ggml_vk_print_gpu_info (#13274) (#14707) Peter0x44 2025-07-19 16:58:03 +01:00
83f5872404 Vulkan: Fix fprintf format-security warning (#14770) 0cc4m 2025-07-19 17:47:53 +02:00
f0d4d176df Documentation: Update build.md's Vulkan section (#14736) rspOverflow 2025-07-19 17:18:36 +07:00
b17230917c sync : ggml Georgi Gerganov 2025-07-19 11:46:12 +03:00
bf9087f59a metal : fuse add, mul + add tests (#14596) Georgi Gerganov 2025-07-18 20:37:26 +03:00
9fb1042ce6 graph : fix graph reuse reset of params (#14760) Georgi Gerganov 2025-07-18 20:08:33 +03:00
2adf8d83ac parallel : add option for different RNG seeds (#14757) Georgi Gerganov 2025-07-18 17:33:41 +03:00
021cc28bef cuda : Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs (#14741) Oliver Simons 2025-07-18 13:35:32 +02:00
d498af3d5a graph : avoid huge warm-up graphs for MoE models (#14753) Georgi Gerganov 2025-07-18 14:31:15 +03:00
eacdeb5bfc model : fix build after merge conflict (#14754) Georgi Gerganov 2025-07-18 11:53:55 +03:00
e0cb5c5cb8 model : add EXAONE 4.0 support (#14630) lgai-exaone 2025-07-18 17:45:49 +09:00
f9a31eea06 CUDA: set_rows + cpy.cu refactor (#14712) Aman Gupta 2025-07-18 14:54:18 +08:00
8f974bc1e9 graph : refactor context to not pass gf explicitly (#14629) Georgi Gerganov 2025-07-18 08:29:28 +03:00
09651d09ff graph : Pass the graph placeholder message in debug mode (#14748) Nexes the Elder 2025-07-18 06:25:54 +02:00
349ea79fce use max work group size for device to replace the magic number (#14732) Neo Zhang Jianyu 2025-07-18 10:23:14 +08:00
670e1360cd convert : fix Ernie4.5 MoE without shared experts (#14746) Piotr Wilkin (ilintar) 2025-07-18 01:17:16 +02:00
760b4484e3 nix : use optionalAttrs for env mkDerivation attrset argument (#14726) Wroclaw 2025-07-18 00:18:16 +02:00
cb887f1bc1 model: add Ernie 4.5 MoE support (#14658) Piotr Wilkin (ilintar) 2025-07-17 23:15:32 +02:00
d6fb3f6b49 kv-cache : fix k-shift for multiple streams (#14742) Georgi Gerganov 2025-07-17 20:52:33 +03:00
01612b7409 llama : reuse compute graphs (#14482) Georgi Gerganov 2025-07-17 19:08:33 +03:00
086cf81e88 llama : fix parallel processing for lfm2 (#14705) Tarek Dakhran 2025-07-17 09:22:11 +02:00
d9b691081c kv-cache : opt mask set input (#14600) Georgi Gerganov 2025-07-17 09:49:15 +03:00
ad57d3edd2 batch : fix uninitialized has_cpl flag (#14733) Georgi Gerganov 2025-07-17 09:45:54 +03:00
1ba45d4982 ci : disable failing vulkan crossbuilds (#14723) Sigbjørn Skjæret 2025-07-17 01:52:08 +02:00
19e5943d9e convert : make hf token optional (#14717) Sigbjørn Skjæret 2025-07-16 23:17:43 +02:00
496957e1cb llama : fix parameter order for hybrid memory initialization (#14725) Diner Burger 2025-07-16 15:17:25 -04:00
21c021745d ggml: Add initial WebGPU backend (#14521) Reese Levine 2025-07-16 08:18:51 -07:00
b0f0ecc3dc model : support output bias for qwen2 (#14711) tempstudio 2025-07-16 10:02:06 -05:00
225e7a1438 llama : add high-throughput mode (#14363) Georgi Gerganov 2025-07-16 16:35:42 +03:00
ab14019821 Support diffusion models: Add Dream 7B (#14644) Aman Gupta 2025-07-16 20:03:51 +08:00
64978340b0 ggml : add asserts (#14720) Georgi Gerganov 2025-07-16 14:43:32 +03:00
6ffd4e9c44 server : pre-calculate EOG logit biases (#14721) Georgi Gerganov 2025-07-16 14:04:12 +03:00
e4841d24d3 llama : fix parallel processing for plamo2 (#14716) Shunta Saito 2025-07-16 19:12:22 +09:00
538cc77f7f server : fix handling of the ignore_eos flag (#14710) Georgi Gerganov 2025-07-16 12:13:57 +03:00
5cae766541 scripts: synthetic prompt mode for server-bench.py (#14695) Johannes Gäßler 2025-07-16 09:33:28 +02:00
4b91d6f71f convert : only check for tokenizer folder if we need it (#14704) Sigbjørn Skjæret 2025-07-16 08:52:04 +02:00
cf91f217f1 convert : add pre-computed hashes first to prevent order mishaps (#14701) Sigbjørn Skjæret 2025-07-16 08:51:12 +02:00

Commit Graph

main

b7003-full