enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

46d9caa27a model-conversion : add mmproj conversion target (#15628) Daniel Bevenius 2025-08-28 09:26:48 +02:00
5a0e3ef6f0 cuda: Add cublasLt_static linking when GGML_STATIC is enabled (#15622) matiaslin 2025-08-27 17:32:36 -07:00
fbef0fad7a server: higher timeout for tests (#15621) Johannes Gäßler 2025-08-27 20:58:09 +02:00
da54f9f1a2 presets : add qwen3-30B-a3b FIM (#15616) Georgi Gerganov 2025-08-27 15:48:07 +03:00
47373271f9 HIP: Enable support for ggml_backend_cuda_register_host_buffer (#15615) uvos 2025-08-27 13:58:54 +02:00
1bded5a3b3 kv-cache : better estimate of n_kv for multi-sequence batches (#15610) Georgi Gerganov 2025-08-27 13:55:12 +03:00
1e7489745a CANN: refactor mask handling and improve performance in FA (#15561) Chenguang Li 2025-08-27 17:21:41 +08:00
1cf123a343 ggml-cpu : add basic RVV support for vector f32 ops (#15057) xctan 2025-08-27 16:44:22 +08:00
fcca2182a1 common : add -m to bash completion for --model [no ci] (#15591) Daniel Bevenius 2025-08-27 10:28:53 +02:00
86076f92de OpenCL: add fused group_norm/norm, mul, add (#15314) rmatif 2025-08-27 08:36:05 +02:00
bcbddcd54f tests : fix test-opt with GGML_BACKEND_DL (#15599) Diego Devesa 2025-08-26 13:14:38 -07:00
8b69686136 SYCL: fix rms_norm_mul_add for tensor dim not a multiple of sg_size (#15592) Akarshan Biswas 2025-08-27 00:27:49 +05:30
8ce3ff1d91 mtmd : fix mtmd ios build (#15579) fidoriel 2025-08-26 20:05:50 +02:00
44b1efa41a tests: add performance test for mul mat id (#15543) Eve 2025-08-26 15:42:49 +00:00
a6a58d6478 llamafile: PowerPC Sgemm Optimization (#15558) shalinib-ibm 2025-08-26 21:05:25 +05:30
0373486dbc graph : fix assert in memory-less build_attn (#15590) Georgi Gerganov 2025-08-26 17:45:17 +03:00
62cef26ac5 model-conversion : add qat-q4 quantization targets (#15588) Daniel Bevenius 2025-08-26 16:12:29 +02:00
8f5afa94c4 CUDA: return -1 for nonexistent compiled arch (#15587) Johannes Gäßler 2025-08-26 16:01:20 +02:00
b3964c1e89 metal : optimize FA vec for large sequences and BS <= 8 (#15566) Georgi Gerganov 2025-08-26 14:22:14 +03:00
79a546220c mtmd : support Kimi VL model (#15458) Xuan-Son Nguyen 2025-08-26 12:54:19 +02:00
85cc1ae998 context : print graph stats for memory-less contexts (#15586) Georgi Gerganov 2025-08-26 12:47:00 +03:00
1d8d83deaa metal : improve MUL_MAT_ID (#15541) Georgi Gerganov 2025-08-26 12:46:15 +03:00
c4e9239064 model : support MiniCPM-V 4.5 (#15575) tc-mb 2025-08-26 16:05:55 +08:00
39842a7f73 gguf-py : remove erroneous FFN_GATE entry (#15583) Sigbjørn Skjæret 2025-08-26 09:08:08 +02:00
0fd90db585 metal : remove contiguous assertion for src0 in IM2COL (#15577) Sigbjørn Skjæret 2025-08-26 08:51:43 +02:00
4c37636b3e Add a warning for special devices (#15563) Yoshi_likes_e4 2025-08-26 13:15:33 +07:00
34bdbbd7c2 vulkan: Remove splitting for mul_mat_id (#15568) Jeff Bolz 2025-08-25 23:42:44 -05:00
74f52f77f2 CUDA: Accelerate MXFP4 table lookup using __byte_perm (#15451) Qeeweew 2025-08-26 05:21:22 +08:00
f7207b0415 opencl: fix support ops condition for rms_norm (#15560) lhez 2025-08-25 14:18:09 -07:00
4d917cd4f6 vulkan: fix min subgroup 16 condition for mmid subgroup optimization (#15565) Ruben Ortlam 2025-08-25 17:56:59 +02:00
886b97a5d6 tests: Generate unique input values for count_equal (#15487) Jeff Bolz 2025-08-25 10:47:16 -05:00
111f8d06f0 metal: fix regression when no metal devices are present (#15531) Ihar Hrachyshka 2025-08-25 11:27:34 -04:00
5eff6ec9b1 CUDA: MoE helper in device code, better tile sizes (#15525) Johannes Gäßler 2025-08-25 17:23:40 +02:00
dfd9b5f6c7 model-conversion : set pooling type to none in logits.cpp (#15564) Daniel Bevenius 2025-08-25 15:00:43 +02:00
5a6bc6b1a6 model-conversion : add model card template for embeddings [no ci] (#15557) Daniel Bevenius 2025-08-25 14:25:25 +02:00
6b64f74b55 batched-bench : fix unified KV cache handling + pp timing (#15562) Georgi Gerganov 2025-08-25 13:56:43 +03:00
0d5a470223 convert : update Ernie 4.5 dense architecture name (#15555) Weizhao Ouyang 2025-08-25 17:15:06 +08:00
b0ba31f525 metal : add FA kernels for HS=40 (#15559) Georgi Gerganov 2025-08-25 10:14:48 +03:00
7da9fed0d6 convert : support interns1-mini (#15412) RunningLeon 2025-08-25 14:32:16 +08:00
c247d06f38 CANN: ROPE cache sin/cos repeat (#15501) Chenguang Li 2025-08-25 10:32:21 +08:00
043fb27d38 vulkan: apply MUL_MAT_ID subgroup optimization to non-coopmat devices (#15524) Ruben Ortlam 2025-08-24 19:36:36 +02:00
b730706a49 kv-cache : support layer reuse (#15504) Georgi Gerganov 2025-08-24 13:07:07 +03:00
c9a24fb932 vulkan: Support FA with any multiple of 8 head sizes (#15537) Jeff Bolz 2025-08-24 04:24:25 -05:00
a9c6ffcbfa vulkan: enable Conv2D for Apple after MoltenVK fixed the bug (#15526) Ruben Ortlam 2025-08-24 10:48:53 +02:00
e78cf0d4b1 vulkan: workaround MoltenVK compile failure in multi_add (#15506) Jeff Bolz 2025-08-24 03:48:21 -05:00
710dfc465a CUDA: fix half2 -> half conversion for HIP (#15529) Johannes Gäßler 2025-08-23 21:37:06 +02:00
611f419cff vulkan: optimize rms_norm, and allow the work to spread across multiple SMs (#15281) Jeff Bolz 2025-08-23 13:16:17 -05:00
b1afcab804 model : add support for Seed-OSS (#15490) Piotr Wilkin (ilintar) 2025-08-23 15:21:52 +02:00
9ef536907d scripts: fix compare-llama-bench.py (#15521) Johannes Gäßler 2025-08-23 12:58:58 +02:00
21dc4ddaf2 chat : fix debug build assertion in trim function (#15520) LaffeyNyaa 2025-08-23 16:38:30 +08:00
289bf4113e vulkan: Rewrite synchronization to allow some overlap between nodes (#15489) Jeff Bolz 2025-08-23 02:33:36 -05:00
b55f06e1aa vulkan.Dockerfile: install vulkan SDK using tarball (#15282) R0CKSTAR 2025-08-23 14:58:57 +08:00
0a9b43e507 vulkan : support ggml_mean (#15393) Acly 2025-08-23 08:35:21 +02:00
330c3d2d21 vulkan: optimize mul_mat_id loading row ids into shared memory (#15427) Jeff Bolz 2025-08-23 01:31:54 -05:00
e92734d51b test-opt: allow slight inprecision (#15503) Johannes Gäßler 2025-08-22 23:47:01 +02:00
45363632cb ggml WebGPU: add support for quantization types (#15440) Reese Levine 2025-08-22 11:28:03 -07:00
32732f2459 model : gpt-oss add response_format support (#15494) Aldehir Rojas 2025-08-22 11:04:08 -05:00
92f7f0a53c ggml: add conv3d op (#15182) rmatif 2025-08-22 15:33:15 +02:00
b1ab91821f cuda : add Pad Reflect 1D support (#14659) Yavor Ivanov 2025-08-22 14:06:29 +03:00
9ebebef62f llama : remove KV cache defragmentation logic (#15473) Georgi Gerganov 2025-08-22 12:22:13 +03:00
ad5c975c2d ggml-cpu: Support Q5_0 and Q5_1 on s390x (#15486) Aaron Teo 2025-08-22 16:11:04 +08:00
4afb0a746f server : Support multimodal completion and embeddings prompts in JSON format (#15108) 65a 2025-08-22 08:10:14 +00:00
e288693669 readme : model : mtdm : lfm2 improvements (#15476) Tarek Dakhran 2025-08-22 09:29:08 +02:00
a0f98dd604 CANN: Optimize RMS_NORM using cache (#15419) Chenguang Li 2025-08-22 14:12:07 +08:00
54a241f505 sched : fix possible use of wrong ids tensor when offloading moe prompt processing (#15488) Diego Devesa 2025-08-21 14:09:32 -07:00
cd36b5e5c7 llama : remove deprecated llama_kv_self API (#15472) Georgi Gerganov 2025-08-21 19:13:45 +03:00
3f196be84b graph : remove build_attn_with_sinks overload (#15469) Georgi Gerganov 2025-08-21 18:44:45 +03:00
97ae5961a4 vulkan : support conv_2d_dw with f16 weights (#15392) Acly 2025-08-21 17:01:51 +02:00
20c2dac8c6 vulkan: add exp operation (#15456) Dong Won Kim 2025-08-22 00:00:16 +09:00
96452a3fa4 vulkan: Reuse conversion results in prealloc_y (#15410) Jeff Bolz 2025-08-21 09:55:00 -05:00
9ad5e60dba examples : fix some typos in examples/model-conversion/README.md (#15477) Jie Fu (傅杰) 2025-08-21 22:53:13 +08:00
715a6db02c kv-cache : drop the "unified" prefix (#15467) Georgi Gerganov 2025-08-21 17:00:33 +03:00
ad294df03f examples : install torch-cpu for model conversion tool/example (#15475) Jie Fu (傅杰) 2025-08-21 21:42:34 +08:00
029bb39eb1 ci : enable RVV1.0 native build (#15386) Ali Tariq 2025-08-21 17:52:16 +05:00
30649cab65 ci : continue file download with wget (#15471) Georgi Gerganov 2025-08-21 13:42:55 +03:00
2758fa10da examples : add model conversion tool/example (#15455) Daniel Bevenius 2025-08-21 12:16:54 +02:00
b108e42904 ci : fix -Werror=return-type in clip.cpp so ci/run.sh can run without issue (#15221) Michael Giba 2025-08-21 05:06:46 -05:00
245be739df ci : add copilot-instructions.md (#15286) Copilot 2025-08-21 11:47:52 +02:00
b2caf67db1 convert : make Mistral community chat templates optional via parameter (#15420) Julien Denize 2025-08-21 11:19:50 +02:00
2f3dbffb17 common : fix incorrect print of non-ascii characters in the logging (#15466) Jie Fu (傅杰) 2025-08-21 16:54:34 +08:00
945e1f12a6 ggml : fix condition of im2col on Metal backend (#15460) Xuan-Son Nguyen 2025-08-21 07:32:26 +02:00
1b0db8f6e0 server : fix webui (#15462) stduhpf 2025-08-21 07:19:22 +02:00
29f538ac63 examples : remove references to make in examples [no ci] (#15457) Daniel Bevenius 2025-08-21 06:12:28 +02:00
8ad038c0fd musa: add GGML_UNUSED_VARS (#15446) R0CKSTAR 2025-08-21 11:06:05 +08:00
5682a3745f sched : copy only the used experts when offloading prompt processing (#15346) Diego Devesa 2025-08-20 16:35:28 -07:00
1bc664a26a server: fix OpenAI API compatibility for usage statistics in chat streams (#15444) teo 2025-08-21 07:10:08 +09:00
13aeb7aef2 CUDA: refactor FA support/selection code (#15454) Johannes Gäßler 2025-08-20 23:14:14 +02:00
7a6e91ad26 CUDA: replace GGML_CUDA_F16 with CUDA arch checks (#15433) Johannes Gäßler 2025-08-20 16:58:49 +02:00
fec9519802 vulkan: shorten pipeline name strings (#15431) Jeff Bolz 2025-08-20 09:33:14 -05:00
657b8a77bd chat: handle gpt-oss return/end token inconsistency (#15421) Daniel Bevenius 2025-08-20 14:26:01 +02:00
ec5ab1a36c common : fix context shift help message (#15448) Jie Fu (傅杰) 2025-08-20 18:33:30 +08:00
1a99c2d948 cmake : fix target include directories (#15450) xiaobing318 2025-08-20 18:32:05 +08:00
37f10f955f make : remove make in favor of CMake (#15449) Daniel Bevenius 2025-08-20 12:31:16 +02:00
2f37014073 lookahead : add sample command to readme (#15447) Georgi Gerganov 2025-08-20 13:30:46 +03:00
a094f38143 musa: fix build warnings (#15258) R0CKSTAR 2025-08-20 10:17:37 +08:00
fb22dd07a6 opencl: mark argsort unsupported if cols exceed workgroup limit (#15375) lhez 2025-08-20 02:25:51 +08:00
9ef6b0b835 model : add gpt-oss type strings (#15424) Georgi Gerganov 2025-08-19 19:58:28 +03:00
1e19f5d462 common : Add top-nsigma sampler to help globally (#15428) Gian-Carlo Pascutto 2025-08-19 18:58:14 +02:00
d2fcd91cf9 server : disable context shift by default (#15416) Georgi Gerganov 2025-08-19 16:46:37 +03:00
a6d3cfe7fa CANN: optimize rope operator (#15335) SHUAI YANG 2025-08-19 21:28:22 +08:00
