Commit Graph

  • f52d59d771 llava : fix clip loading GGUFs with missing description (#12660) Sigbjørn Skjæret 2025-03-31 11:07:07 +02:00
  • 52de2e5949 tts : remove printfs (#12640) marcoStocchi 2025-03-31 10:20:30 +02:00
  • 2c3f8b850a llama : support BailingMoE (Ling) (#12634) Sigbjørn Skjæret 2025-03-30 22:21:03 +02:00
  • 4663bd353c metal : use constexpr in FA kernels + fix typedef (#12659) Georgi Gerganov 2025-03-30 22:04:04 +03:00
  • b3de7cac73 llama : add Trillion 7B model support (#12556) Juyoung Suk 2025-03-31 03:38:33 +09:00
  • 7242dd9675 llama-chat : Add Yandex instruct model template support (#12621) Sergei Vorobyov 2025-03-30 21:12:03 +03:00
  • 492d7f1ff7 musa: fix all warnings, re-enable -DLLAMA_FATAL_WARNINGS=ON in ci and update doc (#12611) R0CKSTAR 2025-03-30 16:59:38 +08:00
  • d3f1f0acfb sync : ggml Georgi Gerganov 2025-03-29 15:37:54 +02:00
  • 360dc22c00 cpu : rm unused variable (ggml/1166) Xuan-Son Nguyen 2025-03-29 11:59:56 +01:00
  • a62d7fa7a9 cpu: de-duplicate some of the operators and refactor (ggml/1144) cmdr2 2025-03-29 11:37:13 +05:30
  • e408d4351a ggml : add logging for native build options/vars (whisper/2935) Daniel Bevenius 2025-03-24 09:53:38 +01:00
  • 3891e183c6 examples : command.wasm updates (whisper/2904) Daniel Bevenius 2025-03-20 07:02:18 +01:00
  • af6ae1efb2 llama : fix non-causal mask for gemma 3 (#12615) Xuan-Son Nguyen 2025-03-30 00:07:37 +01:00
  • 0bb2919335 llama : change cpu_buft_list order: ACCEL -> GPU host -> CPU extra -> CPU (#12632) Djip007 2025-03-29 14:07:37 +01:00
  • a69f846351 cmake : fix ccache conflict (#12522) Jay 2025-03-29 18:04:58 +08:00
  • d07a0d7a79 CANN : remove clang-format in ggml-cann (#12607) hipudding 2025-03-29 18:03:28 +08:00
  • 3714c3ee1a llama : fix incorrect Qwen2Moe ffn_moe_out graph callback (#12631) Sigbjørn Skjæret 2025-03-28 22:13:02 +01:00
  • b4ae50810e metal : improve FA + improve MoE (#12612) Georgi Gerganov 2025-03-28 20:21:59 +02:00
  • b86f600723 vulkan: fix coopmat shader generation when cross-compiling (#12272) Icenowy Zheng 2025-03-29 01:51:06 +08:00
  • dd373dd3bf llama: fix error on bad grammar (#12628) Johannes Gäßler 2025-03-28 18:08:52 +01:00
  • 5d01670266 server : include speculative decoding stats when timings_per_token is enabled (#12603) Benson Wong 2025-03-28 01:05:44 -07:00
  • ef03229ff4 rpc : update README for cache usage (#12620) Radoslav Gerganov 2025-03-28 09:44:13 +02:00
  • 13731766db llamafile : ppc64le GEMV forwarding for FP32. (#12594) amritahs-ibm 2025-03-28 13:13:22 +05:30
  • ab6ab8f809 rpc : send hash when tensor data is above some fixed threshold (#12496) Radoslav Gerganov 2025-03-28 08:18:04 +02:00
  • 2099a9d5db server : Support listening on a unix socket (#12613) Piotr 2025-03-27 23:41:04 +01:00
  • 2969019837 media : add SVG logo [no ci] (#12616) Georgi Gerganov 2025-03-27 23:09:05 +02:00
  • 5dec47dcd4 opencl: add multi and vision rope, gelu_quick and im2col (#12600) lhez 2025-03-27 08:08:08 -07:00
  • f125b8dccf llama : add PLM GGUF Conversion & Inference Support (#12457) Si1w 2025-03-27 10:49:15 +00:00
  • 953c2a62cf model : restore support for T5Encoder (#12590) HighDoping 2025-03-27 18:43:33 +08:00
  • d5c6309d91 convert : Support Qwen2_5_VLForConditionalGeneration (#12595) Csaba Kecskemeti 2025-03-27 03:11:23 -07:00
  • 029c693fdc sync : ggml Georgi Gerganov 2025-03-27 09:36:13 +02:00
  • 771d84371c scripts : update sync + fix cmake merge Georgi Gerganov 2025-03-27 09:22:30 +02:00
  • df0665a483 sync : ggml Georgi Gerganov 2025-03-27 09:01:21 +02:00
  • 0306aad1ca cmake : sync/merge PowerPC build commands (#0) Georgi Gerganov 2025-03-27 09:00:57 +02:00
  • c7b43ab608 llamafile : ppc64le MMA implementation for Q4_0. (#12489) amritahs-ibm 2025-03-27 12:21:47 +05:30
  • 24feaec057 ggml : riscv: add 128-bit RVV support (#12530) xctan 2025-03-27 14:38:34 +08:00
  • f28bc4c286 llama : make loras compatible with repacking (#12593) Georgi Gerganov 2025-03-27 08:24:10 +02:00
  • f17a3bb4e8 SYCL: implement memset ggml backend buffer interface (#12580) Akarshan Biswas 2025-03-27 07:16:00 +05:30
  • bd40678df7 HIP: Add support for RDNA4 targets (#12372) Slobodan Josic 2025-03-26 23:46:30 +01:00
  • b3298fa47a metal : refactor mat-vec code (#12569) Georgi Gerganov 2025-03-26 21:38:38 +02:00
  • 2447ad8a98 upgrade to llguidance 0.7.10 (#12576) Michał Moskal 2025-03-26 11:06:09 -07:00
  • 02082f1519 clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566) Ivy233 2025-03-26 22:06:04 +08:00
  • df4d20cd53 convert : fix squeeze for ssm_conv tensors (#12573) Georgi Gerganov 2025-03-26 14:21:05 +02:00
  • 5ed38b6852 ggml : fix MUL_MAT_ID repack with Q8_K (#12544) Georgi Gerganov 2025-03-26 13:02:00 +02:00
  • fd7855f8f5 doc: [MUSA] minor changes (#12583) R0CKSTAR 2025-03-26 15:09:48 +08:00
  • 53af4dba42 convert: fix Mistral3/Gemma3 model hparams init (#12571) Sigbjørn Skjæret 2025-03-25 23:03:10 +01:00
  • ef19c71769 run: de-duplicate fmt and format functions and optimize (#11596) Eric Curtin 2025-03-25 17:46:11 +00:00
  • 053b3f9aae ggml-cpu : update KleidiAI to v1.5.0 (#12568) Dan Johansson 2025-03-25 12:10:18 +01:00
  • e2f560175a SYCL: disable Q4_0 reorder optimization (#12560) Akarshan Biswas 2025-03-25 16:10:18 +05:30
  • 36ee06dd2d docs : add build instructions for KleidiAI (#12563) Dan Johansson 2025-03-25 10:35:20 +01:00
  • 3cd3a39532 ci: [MUSA] add CI and update doc (#12562) R0CKSTAR 2025-03-25 15:45:08 +08:00
  • 2d77d88e70 context : fix worst-case reserve outputs (#12545) Georgi Gerganov 2025-03-25 09:19:23 +02:00
  • c95fa362b3 ci: [SYCL] ggml-ci Use main GPU and enable sysman (#12547) Akarshan Biswas 2025-03-24 23:05:38 +05:30
  • 2b65ae3029 opencl: simplify kernel embedding logic in cmakefile (#12503) lhez 2025-03-24 09:20:47 -07:00
  • 48d7021c61 CI: fix SYCL build (#12546) Akarshan Biswas 2025-03-24 18:28:32 +05:30
  • 3361e2deba docs: update: improve the Fedora CUDA guide (#12536) Tei Home 2025-03-24 19:02:26 +08:00
  • 00d53800e0 llama-vocab : add SuperBPE pre-tokenizer (#12532) compilade 2025-03-24 06:47:24 -04:00
  • 7ea75035b6 CUDA: Fix clang warnings (#12540) R0CKSTAR 2025-03-24 18:28:34 +08:00
  • c54f6b7988 mmap : skip resource limit checks on AIX (#12541) Prajwal B Mehendarkar 2025-03-24 15:47:10 +05:30
  • 9b169a4d4e vulkan: fix mul_mat_vec failure in backend tests (#12529) Jeff Bolz 2025-03-24 01:56:17 -05:00
  • 77f9c6bbe5 server : Add verbose output to OAI compatible chat endpoint. (#12246) Marius Gerdes 2025-03-23 19:30:26 +01:00
  • 18b663d8e4 install : add macports (#12518) Lars Sonchocky-Helldorf 2025-03-23 09:21:48 +01:00
  • fbdfefe74e llama : gemma3 : use output tensor if it exists in model weight (#12506) Xuan-Son Nguyen 2025-03-22 23:28:19 +01:00
  • ba932dfb50 ggml : fix quantized cpy op (#12310) Georgi Gerganov 2025-03-22 16:23:26 +02:00
  • fac63a3d78 musa: refine compute capability (#12493) R0CKSTAR 2025-03-22 17:11:37 +08:00
  • eddfb43850 vulkan: Optimize mul_mat_vec p021 and nc shaders (#12505) Jeff Bolz 2025-03-22 03:40:11 -05:00
  • 4375415b4a Vulkan: RTE rounding for cpy to quant (#12480) stduhpf 2025-03-21 20:34:50 +01:00
  • 30c42ef5cb vulkan: workaround for AMD Windows driver 16 bit unpack8 bug (#12472) Eve 2025-03-21 19:27:47 +00:00
  • af04481e6b model : do not repack if a GPU device is present (#12498) Georgi Gerganov 2025-03-21 16:14:29 +02:00
  • 960e726077 chore : cleanup llama_model_loader::TENSOR_ usage (#12492) Sigbjørn Skjæret 2025-03-21 10:21:36 +01:00
  • ea1518e839 llama-tts : avoid crashes related to bad model file paths (#12482) marcoStocchi 2025-03-21 10:12:45 +01:00
  • 1aa87ee53d [SYCL] Fix build on Windows when ccache enabled (#9954) (#9976) 蕭澧邦 2025-03-21 14:58:47 +08:00
  • 9ffcc9e374 sycl: cleanup oneDNN related code (#12097) Svetlozar Georgiev 2025-03-21 02:15:56 +00:00
  • e04643063b webui : Prevent rerendering on textarea input (#12299) Woof Dog 2025-03-20 14:57:43 +00:00
  • dbb3a4739e llama : make Qwen2MoE QKV bias optional (#12477) Sigbjørn Skjæret 2025-03-20 12:49:59 +01:00
  • 3d82dbcbce ggml : block interleaving support for Q4_K quantization for x86 AVX2 architecture (#12332) Srihari-mcw 2025-03-20 17:05:34 +05:30
  • 732b5fbf5e convert : avoid calls to tokenizer.added_tokens_decoder (#12473) Bartowski 2025-03-20 02:36:37 -04:00
  • 568013d0cd context : clear sets containing encoder output sequence ids before storing new values (#12470) fairydreaming 2025-03-19 21:01:57 +01:00
  • 517b5ddbf0 CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (#12183) Gaurav Garg 2025-03-20 01:22:06 +05:30
  • a9b59288e2 vulkan: optimize iq1 coopmat2 dequant functions (#12427) Jeff Bolz 2025-03-19 13:56:23 -05:00
  • 0fd8487b14 Fix visionOS build and add CI (#12415) Guus Waals 2025-03-19 10:15:23 +00:00
  • 108e53c2f1 llama : add support for GPT2, Bloom and CodeShell tied word embeddings (#12456) Sigbjørn Skjæret 2025-03-19 09:08:49 +01:00
  • a686171ea7 convert : Support chat_template.json (#12460) Sigbjørn Skjæret 2025-03-19 08:58:13 +01:00
  • c446b2edd2 vulkan: Submit once enough matmul work has been recorded (#12406) Jeff Bolz 2025-03-19 02:26:26 -05:00
  • d84635b1b0 opencl: improve profiling (#12442) lhez 2025-03-18 12:54:55 -07:00
  • 75422e8bc4 graph : normalize Q, K, V shapes + sync cross attention (#12449) Georgi Gerganov 2025-03-18 21:35:19 +02:00
  • bb115d2bf7 musa: override warp_size of musa device to 32 (#12445) R0CKSTAR 2025-03-19 02:28:26 +08:00
  • 29fff308c7 llama : support converting Mistral Small text-only (#12450) Xuan-Son Nguyen 2025-03-18 19:16:19 +01:00
  • c6af2161b2 speculative : fix seg fault in certain cases (#12454) Georgi Gerganov 2025-03-18 19:35:11 +02:00
  • 99aa304fb9 llama : add support for EXAONE tied word embeddings (#12451) Xuan-Son Nguyen 2025-03-18 17:24:33 +01:00
  • 8551c44d84 context : always use non-causal attention for encoder graphs (#12447) Georgi Gerganov 2025-03-18 13:05:49 +02:00
  • 35cae5ba05 SYCL: using graphs is configurable by environment variable and compile option (#12371) Łukasz Ślusarczyk 2025-03-18 11:16:31 +01:00
  • 810e0af3f5 server : fix warmup draft cache type (#12446) Georgi Gerganov 2025-03-18 12:05:42 +02:00
  • eba92d64c3 cmake : fix PowerPC build (#12241) Prajwal B Mehendarkar 2025-03-18 15:07:33 +05:30
  • d9a14523bb ggml : add SVE support for q6_K_q8_K (#12361) fj-y-saito 2025-03-18 17:14:39 +09:00
  • fd123cfead Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver issues (#12434) 0cc4m 2025-03-18 07:21:40 +01:00
  • a53f7f7b88 fixed compilation warnings in ggml-sycl (#12424) Łukasz Ślusarczyk 2025-03-18 01:51:25 +01:00
  • 7dfad387e3 llama: Add support for RWKV v7 architecture (#12412) Molly Sophia 2025-03-18 07:27:50 +08:00
  • 60c902926c docs : bring llama-cli conversation/template docs up-to-date (#12426) Sigbjørn Skjæret 2025-03-17 21:14:32 +01:00
  • b1b132efcb cuda : enable CUDA Graph on CUDA Toolkit < 12.x (#12394) Gaurav Garg 2025-03-17 23:55:13 +05:30