enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

5364ae4ba5 llama : print hint when loading a model when no backends are loaded (#13589) Diego Devesa 2025-05-16 07:38:07 -07:00
7c07ac244d ci : add ppc64el to build-linux-cross (#13575) Sigbjørn Skjæret 2025-05-16 14:54:23 +02:00
0a338ed013 sycl : fixed compilation warnings (#13582) Łukasz Ślusarczyk 2025-05-16 12:15:29 +02:00
bc098c3cf0 minja: sync (qwen3) (#13573) Olivier Chafik 2025-05-15 23:29:10 +01:00
c6a2c9e741 gguf : use ggml log system (#13571) Diego Devesa 2025-05-15 10:13:11 -07:00
07ad2b6db3 gguf-py : fix disconnect-before-connect in editor-gui (#13569) Daniel Tang 2025-05-15 12:47:10 -04:00
c531edfa34 convert : fix conversion for llama 4 (#13567) Xuan-Son Nguyen 2025-05-15 17:40:07 +02:00
02cdd2d8b0 sycl: simplify bin_bcast_kernel (#13383) Atharva Dubey 2025-05-15 16:39:52 +01:00
64bb51cf90 sycl: reordered Q4_K MMVQ (#13109) Svetlozar Georgiev 2025-05-15 16:35:44 +01:00
9c404ed54c sycl: use oneDNN for matrices multiplication (#12972) Łukasz Ślusarczyk 2025-05-15 16:53:41 +02:00
6c8b91500e llama-bench : fix -ot with dl backends (#13563) Diego Devesa 2025-05-15 06:46:55 -07:00
3cc1f1f1d2 webui : handle PDF input (as text or image) + convert pasted long content to file (#13562) Xuan-Son Nguyen 2025-05-15 14:24:50 +02:00
c753d7bed0 server : proper error handling for missing elements in messages array (OpenAI compatible backend) (#13540) Piotr Wilkin (ilintar) 2025-05-15 08:40:58 +02:00
b2838049cc bench : handle decode errors (#13548) Georgi Gerganov 2025-05-15 05:57:02 +03:00
aa48e373f2 server: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802) Olivier Chafik 2025-05-15 02:39:51 +01:00
e3a9421b78 kv-cache : fix out-of-bounds view during reserve graph (#13547) Georgi Gerganov 2025-05-14 23:15:15 +03:00
5ab5d5fb25 arm64: optimize q6_k_q8_k kernel with i8mm (#13519) Yibo Cai 2025-05-15 03:53:52 +08:00
3198405e98 common: add partial regex support (#12808) Olivier Chafik 2025-05-14 19:50:57 +01:00
f5170c1d7a editorconfig : fix trailing whitespace from #13542 (#13546) Sigbjørn Skjæret 2025-05-14 20:22:49 +02:00
017f10b5fa fix: crash when calling llama_state_get_size on a context without a KV cache (#13542) Gilad S. 2025-05-14 19:18:18 +03:00
4696d56749 CUDA: fix crash on large batch size for quant. MoE (#13537) Johannes Gäßler 2025-05-14 16:41:02 +02:00
b7d2672082 llama : fix quantize with dl backends (#13539) Diego Devesa 2025-05-14 07:12:36 -07:00
6da34fa276 CUDA: faster Deepseek FA, add Turing support (#13435) Johannes Gäßler 2025-05-14 16:08:20 +02:00
5e7d95e22e fix: Move build_inp_pos to the top of the graph section for build_granite (#13538) Gabe Goodhart 2025-05-14 06:53:59 -06:00
053174436f server : passthrough the /models endpoint during loading (#13535) Georgi Gerganov 2025-05-14 15:42:10 +03:00
360a9c98e1 server : fix cache_tokens bug with no cache_prompt (#13533) Xuan-Son Nguyen 2025-05-14 13:35:07 +02:00
09d13d94fb cmake: simplify vulkan shader test logic (#13263) bandoti 2025-05-14 07:53:57 -03:00
24e86cae72 vulkan: KHR_coopmat flash attention (#13506) Jeff Bolz 2025-05-14 18:55:26 +09:00
bb1681fbd5 webui : use fflate for more deterministic gzip compress (#13525) Xuan-Son Nguyen 2025-05-14 10:26:12 +02:00
d486dd3e8e webui: Allow pasting file from clipboard (#13526) Luca Stefani 2025-05-14 10:07:31 +02:00
21ca987fba docs: Update link to ggml-org in multimodal.md (#13513) ddpasa 2025-05-14 09:59:12 +02:00
be1d4a13db scripts : fix compare-llama-bench.py show parameter (#13514) Sigbjørn Skjæret 2025-05-14 08:41:01 +02:00
ab3971f2a0 vulkan: workaround FA compile failures on macos (#13517) Jeff Bolz 2025-05-14 13:15:50 +09:00
e5c834f718 quantize : improve tensor-type pattern matching (#13033) Ed Addario 2025-05-13 18:12:31 +01:00
71bdbdb587 clip : clip.h become private API (⚠️ breaking change) (#13510) Xuan-Son Nguyen 2025-05-13 17:07:21 +02:00
f0995d28ce metal : use FA-vec kernel up to batch size 20 (#13496) Georgi Gerganov 2025-05-13 18:04:39 +03:00
c252e0c409 metal : optimize multi-sequence FA vec kernel (#13493) Georgi Gerganov 2025-05-13 18:04:00 +03:00
4f711afed5 ggml-cpu: Update KleidiAI to v1.6 and fix include directives (#13509) Dan Johansson 2025-05-13 17:02:28 +02:00
b89d605a91 batched-bench : fix pp batch contents (#13492) Georgi Gerganov 2025-05-13 18:01:53 +03:00
b4726345ac mtmd : remove libllava, remove clip-quantize-cli (⚠️ breaking change) (#13460) Xuan-Son Nguyen 2025-05-13 15:33:58 +02:00
bf79371120 scripts : support arbitrary input file formats in compare-llama-bench.py (#13455) Sigbjørn Skjæret 2025-05-13 15:31:12 +02:00
d590cd4c24 model : Granite MoE shared (#13269) Gabe Goodhart 2025-05-13 07:12:01 -06:00
1e2809bc4b sync : ggml Georgi Gerganov 2025-05-13 14:01:45 +03:00
cf0a43bb64 llama-bench : add defrag-thold, check for invalid ranges (#13487) Diego Devesa 2025-05-12 15:31:37 -07:00
f0d46ef157 opencl: remove unnecessary assert for add (#13257) lhez 2025-05-12 13:13:49 -07:00
de4c07f937 clip : cap max image size 1024 for qwen vl model (#13478) Xuan-Son Nguyen 2025-05-12 15:06:51 +02:00
10d2af0eaa llama/ggml: add LLM training support (#10544) Johannes Gäßler 2025-05-12 14:44:49 +02:00
064cc596ac context : fix state io for memory-less contexts (#13470) Georgi Gerganov 2025-05-12 15:12:27 +03:00
91159ee9df server : allow content to be null in oaicompat_completion_params_parse (#13477) Anudit Nagar 2025-05-12 18:56:42 +07:00
22cdab343b llama-bench : accept ranges for integer parameters (#13410) Diego Devesa 2025-05-12 13:08:22 +02:00
a71a4075cd ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (#13053) Dan Johansson 2025-05-12 13:06:19 +02:00
95e18884fc CUDA: fix misaligned synchronization in FA (#13469) Johannes Gäßler 2025-05-12 10:51:21 +02:00
df8491922f ggml : add mrope kernel for metal (#13457) Xuan-Son Nguyen 2025-05-12 10:29:13 +02:00
14492144c2 enable dpcpp nightly builds with libraries (#13406) Atharva Dubey 2025-05-12 06:15:32 +01:00
c104023994 mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459) City 2025-05-12 00:39:06 +02:00
9a390c4829 tools : fix uninitialized llama_batch in server (#13436) Anthony Umfer 2025-05-11 11:08:26 -04:00
09232370fc scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451) Sigbjørn Skjæret 2025-05-11 16:20:39 +02:00
7474e00b34 CUDA: fix crash with partial offloading of MoE (#13439) Johannes Gäßler 2025-05-11 16:09:33 +02:00
7f323a589f Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (#13386) David Huang 2025-05-11 20:18:39 +08:00
3eac209319 mtmd : support InternVL 3 38B and 78B mmproj (#13443) City 2025-05-11 11:35:52 +02:00
a634d75d1b mtmd : move helpers to dedicated file (#13442) Xuan-Son Nguyen 2025-05-11 11:34:23 +02:00
62d4250e52 docs : Fix typo in InternVL3 model name (#13440) Thomas Germer 2025-05-10 22:26:46 +02:00
0208355f42 CUDA: fix race conditions FlashAttention kernels (#13438) Johannes Gäßler 2025-05-10 22:22:48 +02:00
d2a4ef05c6 vocab : add ByteDance-Seed/Seed-Coder (#13423) Sigbjørn Skjæret 2025-05-10 22:08:07 +02:00
15e6125a39 mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434) Xuan-Son Nguyen 2025-05-10 19:57:54 +02:00
3b24d26c22 server : update docs (#13432) Xuan-Son Nguyen 2025-05-10 18:44:49 +02:00
43dfd741a5 llguidance : set tokenizer slices to default (#13424) Sigbjørn Skjæret 2025-05-10 17:19:52 +02:00
b064a51a4e ci: free_disk_space flag enabled for intel variant (#13426) Thammachart Chinvarapon 2025-05-10 21:34:48 +07:00
053367d149 mtmd : support InternVL 2.5 and 3 (#13422) Xuan-Son Nguyen 2025-05-10 16:26:42 +02:00
d8919424f1 CUDA: fix FlashAttention on Turing (#13415) Johannes Gäßler 2025-05-10 09:16:52 +02:00
7fef11766c arg : add env var to control mmproj (#13416) Xuan-Son Nguyen 2025-05-10 08:16:29 +02:00
dc1d2adfc0 vulkan: scalar flash attention implementation (#13324) Jeff Bolz 2025-05-09 23:07:07 -07:00
7c28a74e07 chore(llguidance): use tagged version that does not break the build (#13413) Helton Reis 2025-05-09 17:15:39 -03:00
33eff40240 server : vision support via libmtmd (#12898) Xuan-Son Nguyen 2025-05-09 19:29:37 +02:00
17512a94d6 sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858) Alberto Cabrera Pérez 2025-05-09 16:34:08 +01:00
611aa914ef metal : optimize MoE for large batches (#13388) Georgi Gerganov 2025-05-09 15:14:56 +03:00
0cf6725e9f CUDA: FA support for Deepseek (Ampere or newer) (#13306) Johannes Gäßler 2025-05-09 13:34:58 +02:00
27ebfcacba llama : do not crash if there is no CPU backend (#13395) Diego Devesa 2025-05-09 13:02:07 +02:00
5c86c9ed3e CUDA: fix crash on large batch size for MoE models (#13384) Johannes Gäßler 2025-05-09 12:14:04 +02:00
efb8b47eda imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389) Bartowski 2025-05-09 05:53:58 -04:00
0527771dd8 llama-run: add support for downloading models from ModelScope (#13370) R0CKSTAR 2025-05-09 17:25:50 +08:00
2189fd3b63 mtmd : fix batch_view for m-rope (#13397) Xuan-Son Nguyen 2025-05-09 11:18:02 +02:00
3f96aeff39 llama : one-off chat template fix for Mistral-Small-2503 (#13398) Xuan-Son Nguyen 2025-05-09 11:17:51 +02:00
b486ba05bf rpc : add rpc_msg_set_tensor_hash_req (#13353) Radoslav Gerganov 2025-05-09 10:31:07 +03:00
02115dcd9a vulkan: Allow up to 4096 elements for mul_mat_id row_ids (#13326) Jeff Bolz 2025-05-09 02:23:41 -05:00
d9c4accaff server : (webui) rename has_multimodal --> modalities (#13393) Xuan-Son Nguyen 2025-05-09 09:06:37 +02:00
15e03282bb ci : limit write permission to only the release step + fixes (#13392) Diego Devesa 2025-05-08 23:45:22 +02:00
f05a6d71a0 mtmd : Expose helper_decode_image_chunk (#13366) Matt Clayton 2025-05-08 14:25:39 -04:00
ee01d71e58 server : (webui) fix a very small misalignment (#13387) Xuan-Son Nguyen 2025-05-08 18:51:45 +02:00
8c83449cb7 server : (webui) revamp the input area, plus many small UI improvements (#13365) Xuan-Son Nguyen 2025-05-08 15:37:29 +02:00
1a844be132 convert : support rope_scaling type and rope_type (#13349) Sigbjørn Skjæret 2025-05-08 15:34:29 +02:00
0ccc121354 mtmd : fix the calculation of n_tokens for smolvlm (#13381) welix 2025-05-08 22:03:53 +09:00
6562e5a4d6 context : allow cache-less context for embeddings (#13108) Georgi Gerganov 2025-05-08 14:28:33 +03:00
51fb96b1ff context : remove logits_all flag (#13284) Georgi Gerganov 2025-05-08 14:26:50 +03:00
70a6991edf ci : move release workflow to a separate file (#13362) Diego Devesa 2025-05-08 13:15:28 +02:00
f061021206 llama : print size and type of overridden tensors (#13364) Diego Devesa 2025-05-08 13:15:15 +02:00
8733e0cf6e sycl: addressing non-contiguous src1 mul_mats (nc and batched) (#13343) Alberto Cabrera Pérez 2025-05-08 10:08:01 +01:00
814f795e06 docker : disable arm64 and intel images (#13356) Diego Devesa 2025-05-07 16:36:33 +02:00
d879433824 sync : ggml Georgi Gerganov 2025-05-07 16:39:36 +03:00
13b0a04597 whisper: remove MSVC warnings pragmas (whisper/3090) Daniel Bevenius 2025-05-05 13:09:35 +02:00
