Commit Graph

  • 0827b2c1da ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027) Srihari-mcw 2024-12-31 19:53:33 +05:30
  • 45095a61bf server : clean up built-in template detection (#11026) Xuan Son Nguyen 2024-12-31 15:22:01 +01:00
  • 5896c65232 server : add OAI compat for /v1/completions (#10974) Xuan Son Nguyen 2024-12-31 12:34:13 +01:00
  • bc7b1f8632 convert : fix Llama-3_1-Nemotron-51B rope settings (#11008) ymcki 2024-12-31 19:04:48 +08:00
  • 6e1531aca5 common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013) Peter 2024-12-31 11:46:06 +11:00
  • 716bd6dec3 vulkan: optimize mul_mat for small values of N (#10991) Jeff Bolz 2024-12-30 11:27:11 -06:00
  • c250ecb315 android : fix llama_batch free (#11014) ag2s20150909 2024-12-30 20:35:13 +08:00
  • a813badbbd vulkan: im2col and matmul optimizations for stable diffusion (#10942) Jeff Bolz 2024-12-29 03:16:34 -06:00
  • fdd2188912 vulkan: Use push constant offset to handle misaligned descriptors (#10987) Jeff Bolz 2024-12-29 02:35:11 -06:00
  • f865ea149d server: added more docs for response_fields field (#10995) Isaac McFadyen 2024-12-28 10:09:19 -05:00
  • 16cdce7b68 server : fix token duplication when streaming with stop strings (#10997) Alexey Parfenov 2024-12-28 15:08:54 +00:00
  • d79d8f39b4 vulkan: multi-row k quants (#10846) Eve 2024-12-26 10:54:44 -05:00
  • d283d02bf2 examples, ggml : fix GCC compiler warnings (#10983) Peter 2024-12-27 00:59:11 +11:00
  • 9ba399dfa7 server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967) Reza Kakhki 2024-12-24 21:33:04 +01:00
  • 2cd43f4900 ggml : more performance with llamafile tinyblas on x86_64 (#10714) Djip007 2024-12-24 18:54:49 +01:00
  • 09fe2e7613 server: allow filtering llama server response fields (#10940) NeverLucky 2024-12-24 19:39:49 +03:00
  • 30caac3a68 llama : the WPM vocabs use the CLS token as BOS (#10930) Georgi Gerganov 2024-12-24 09:44:20 +02:00
  • 60cfa728e2 ggml : use wstring for backend search paths (#10960) Diego Devesa 2024-12-24 04:05:27 +01:00
  • 3327bb0f8d ggml : fix arm enabled features check (#10961) Diego Devesa 2024-12-24 04:05:17 +01:00
  • 32d6ee6385 ggml : fix const usage in SSE path (#10962) Diego Devesa 2024-12-23 20:25:52 +01:00
  • 14b699ecde server : fix missing model id in /model endpoint (#10957) Xuan Son Nguyen 2024-12-23 12:52:25 +01:00
  • 485dc01214 server : add system_fingerprint to chat/completion (#10917) Xuan Son Nguyen 2024-12-23 12:02:44 +01:00
  • 86bf31cfe6 rpc-server : add support for the SYCL backend (#10934) Radoslav Gerganov 2024-12-23 10:39:30 +02:00
  • b92a14a841 llama : support InfiniAI Megrez 3b (#10893) Yun Dou 2024-12-23 08:35:44 +08:00
  • 6f0c9e034b llama : support for Llama-3_1-Nemotron-51B (#10669) ymcki 2024-12-23 08:22:33 +08:00
  • dab76c92cc llama-run : include temperature option (#10899) Eric Curtin 2024-12-23 00:21:40 +00:00
  • 7024d59e6a ggml : fix run-time on FreeBSD in get_executable_path() (#10948) yuri@FreeBSD 2024-12-22 16:20:11 -08:00
  • 7c0e285858 devops : add docker-multi-stage builds (#10832) Rudi Servo 2024-12-22 21:22:58 -01:00
  • 7ae33a616f llama : add Falcon3 support (#10883) Billel Mokeddem 2024-12-23 01:09:58 +03:00
  • ebdee9478c vulkan: build fixes for 32b (#10927) Jeff Bolz 2024-12-22 03:44:01 -06:00
  • 5cd85b5e00 convert : add BertForMaskedLM (#10919) Georgi Gerganov 2024-12-21 10:10:18 +02:00
  • a91a41364b vulkan: optimize coopmat2 dequant functions (#10855) Jeff Bolz 2024-12-21 01:04:45 -06:00
  • e34c5af43f ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() (#10874) Adrien Gallouët 2024-12-21 00:33:37 +01:00
  • eb5c3dc64b SYCL: Migrate away from deprecated ggml_tensor->backend (#10840) Akarshan Biswas 2024-12-20 21:01:28 +05:30
  • 0ca416c91a server : (UI) fix copy to clipboard function (#10916) Xuan Son Nguyen 2024-12-20 14:12:06 +01:00
  • 21ae3b9be8 ggml : add test for SVE and disable when it fails (#10906) Diego Devesa 2024-12-20 13:31:28 +01:00
  • 0a11f8b7b5 convert : fix RWKV v6 model conversion (#10913) Molly Sophia 2024-12-20 17:44:58 +08:00
  • d408bb9268 clip : disable GPU support (#10896) Georgi Gerganov 2024-12-19 18:47:15 +02:00
  • 5cab3e4aaa llama : minor grammar refactor (#10897) Georgi Gerganov 2024-12-19 17:42:13 +02:00
  • 36319dec5d tts : small QoL for easy model fetch (#10903) Georgi Gerganov 2024-12-19 17:35:15 +02:00
  • 57bb2c40cd server : fix logprobs, make it OAI-compatible (#10783) Xuan Son Nguyen 2024-12-19 15:40:08 +01:00
  • a3c33b1dce ggml: fix arm build with gcc (#10895) Adrien Gallouët 2024-12-19 14:20:41 +01:00
  • 2fffc52b50 llama : fix Roberta embeddings (#10856) Sukriti Sharma 2024-12-19 06:04:51 -07:00
  • 7585edbdeb convert : Add support for Microsoft Phi-4 model (#10817) fairydreaming 2024-12-19 10:37:12 +01:00
  • cd920d0ac3 tests: disable GGUF test for bad value size (#10886) Johannes Gäßler 2024-12-19 08:53:58 +01:00
  • 7909e8588d llama-run : improve progress bar (#10821) Eric Curtin 2024-12-19 02:58:00 +00:00
  • 9177484f58 ggml : fix arm build (#10890) Diego Devesa 2024-12-18 23:21:42 +01:00
  • 0bf2d10c55 tts : add OuteTTS support (#10784) Georgi Gerganov 2024-12-18 19:27:21 +02:00
  • 7bbb5acf12 server: avoid overwriting Authorization header (#10878) Gaetan Bisson 2024-12-18 04:00:07 -10:00
  • 152610eda9 server : output embeddings for all tokens when pooling = none (#10861) Georgi Gerganov 2024-12-18 13:01:41 +02:00
  • 0e70ba686e server : add "tokens" output (#10853) Georgi Gerganov 2024-12-18 11:05:29 +02:00
  • 46828872c3 server : (embeddings) using same format for "input" and "content" (#10872) Xuan Son Nguyen 2024-12-18 09:55:09 +01:00
  • 6b064c92b4 docs: Fix HIP (née hipBLAS) in README (#10880) redbeard 2024-12-18 00:35:00 -08:00
  • 4da69d1abd Revert "llama : add Falcon3 support (#10864)" (#10876) Diego Devesa 2024-12-18 01:36:46 +01:00
  • d62b532c52 Use model->gguf_kv for loading the template instead of using the C API. (#10868) DAN™ 2024-12-17 17:24:22 -05:00
  • 081b29bd2a tests: add tests for GGUF (#10830) Johannes Gäßler 2024-12-17 19:09:35 +01:00
  • 5437d4aaf5 sync : ggml Georgi Gerganov 2024-12-17 18:36:02 +02:00
  • 78f766768d cmake : fix "amd64" processor string (whisper/2638) Georgi Gerganov 2024-12-17 18:34:32 +02:00
  • 8dd19a4812 vulkan : fix soft_max.comp division by zero (whisper/2633) gn64 2024-12-16 19:34:38 +09:00
  • 130d0c90bd ggml : remove return from ggml_gallocr_allocate_node (ggml/1048) Daniel Bevenius 2024-12-14 03:23:08 +01:00
  • 3919da8e33 ggml : add check for grad_accs (ggml/1046) Daniel Bevenius 2024-12-13 08:19:38 +01:00
  • 0006f5a74a ggml : update ggml_backend_cpu_device_supports_op (#10867) Georgi Gerganov 2024-12-17 18:35:42 +02:00
  • 05c3a444b8 server : fill usage info in embeddings and rerank responses (#10852) krystiancha 2024-12-17 16:00:24 +00:00
  • 382bc7f2e8 llama : add Falcon3 support (#10864) Billel Mokeddem 2024-12-17 19:24:56 +04:00
  • 4f51968aca readme : update typos (#10863) Ruan 2024-12-17 17:47:20 +08:00
  • 227d7c5a7f server : (UI) fix missing async generator on safari (#10857) Xuan Son Nguyen 2024-12-17 09:52:09 +01:00
  • 7b1ec53f56 vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10809) Eve 2024-12-17 05:52:55 +00:00
  • 160bc039c8 rwkv6: add wkv6 support for Vulkan backend (#10829) Zhiyuan Li 2024-12-17 05:00:46 +08:00
  • 08ea539df2 unicode : improve naming style (#10838) Georgi Gerganov 2024-12-16 12:31:45 +02:00
  • 644fd71b44 sampling : refactor + optimize penalties sampler (#10803) Georgi Gerganov 2024-12-16 12:31:14 +02:00
  • 4ddd199f6f llava : Allow locally downloaded models for QwenVL (#10833) Bartowski 2024-12-15 15:43:25 -05:00
  • a0974156f3 llama : add Deepseek MoE v1 & GigaChat models (#10827) Valentin Mamedov 2024-12-16 00:02:46 +07:00
  • 87cf323cef scripts : change build path to "build-bench" for compare-commits.sh (#10836) Georgi Gerganov 2024-12-15 18:44:47 +02:00
  • 5478bbcd17 server: (UI) add syntax highlighting and latex math rendering (#10808) Vinesh Janarthanan 2024-12-15 05:55:54 -06:00
  • b5ae1ddff9 gguf-py : bump to v0.13.0 Georgi Gerganov 2024-12-15 13:16:42 +02:00
  • 89d604f2c8 server: Fix has_next_line in JSON response (#10818) Michelle Tan 2024-12-14 22:29:45 +00:00
  • e52aba537a nix: allow to override rocm gpu targets (#10794) Evgeny Kurnevsky 2024-12-14 18:17:36 +00:00
  • ba1cb19cdd llama : add Qwen2VL support + multimodal RoPE (#10361) HimariO 2024-12-14 20:43:46 +08:00
  • 56eea0781c Removes spurious \r in output that causes logging in journalctl to treat lines as binary and therefore hidden by default (#10771) cduk 2024-12-13 23:21:49 +01:00
  • a76c56fa1a Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (#10693) lhez 2024-12-13 12:23:52 -08:00
  • c27ac678dd Opt class for positional argument handling (#10508) Eric Curtin 2024-12-13 18:34:25 +00:00
  • 11e07fd63b fix: graceful shutdown for Docker images (#10815) Corentin REGAL 2024-12-13 18:23:50 +01:00
  • 4601a8bb67 gguf-py : numpy 2 newbyteorder fix (#9772) Jett Janiak 2024-12-13 15:48:44 +01:00
  • 9f35e44592 Fix crash caused by ggml_backend_load_all when launching on Android Activity (#10812) 谢乃闻 2024-12-13 12:56:07 +00:00
  • 64ae065511 vulkan: small mul_mat_vec optimizations (#10665) Eve 2024-12-13 08:42:04 +00:00
  • 83ed24a97b SYCL: Reduce most of the compiler warnings (#10748) Akarshan Biswas 2024-12-13 12:12:15 +05:30
  • d583cd03f6 ggml : Fix compilation issues on ARM platform when building without fp16 (#10811) Karol Kontny 2024-12-13 01:04:19 +01:00
  • adffa6ffd5 common : improve -ctv -ctk CLI arguments (#10806) Xuan Son Nguyen 2024-12-12 22:53:05 +01:00
  • 274ec65af6 contrib : add ngxson as codeowner (#10804) Xuan Son Nguyen 2024-12-12 20:52:28 +01:00
  • 8faa1d4dd4 CUDA: faster non-contiguous concat (#10760) a3sh 2024-12-13 02:09:50 +08:00
  • cb13ef85a4 remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) Diego Devesa 2024-12-12 19:02:49 +01:00
  • 4064c0e3b6 Vulkan: Use improved q4_k and q5_k dequant code in dequant shaders (#10798) 0cc4m 2024-12-12 18:36:00 +01:00
  • dc5301d565 Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats (#10721) 0cc4m 2024-12-12 18:35:37 +01:00
  • 9fdb124304 common : add missing env var for speculative (#10801) Xuan Son Nguyen 2024-12-12 16:57:32 +01:00
  • 5555c0c1f6 docs: update server streaming mode documentation (#9519) CentricStorm 2024-12-11 22:40:40 +00:00
  • 973f328b1e Merge pull request #10788 from ggerganov/gg/gguf-py-0.11.0 Georgi Gerganov 2024-12-11 23:14:46 +02:00
  • fb18934a97 gguf-py : bump version to 0.11.0 Georgi Gerganov 2024-12-11 23:13:31 +02:00
  • 235f6e14bf server : (UI) add tok/s, get rid of completion.js (#10786) Xuan Son Nguyen 2024-12-11 20:52:14 +01:00
  • 1a31d0dc00 Update README.md (#10772) qingy1337 2024-12-11 07:16:32 -08:00
  • 92f77a640f ci : pin nodejs to 22.11.0 (#10779) Xuan Son Nguyen 2024-12-11 14:59:41 +01:00