enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

0fff7fd798 docs : vulkan build instructions to use git bash mingw64 (#10303) FirstTimeEZ 2024-11-17 12:29:18 +13:00
4e54be0ec6 llama/ex: remove --logdir argument (#10339) Johannes Gäßler 2024-11-16 23:00:41 +01:00
db4cfd5dbc llamafile : fix include path (#0) Georgi Gerganov 2024-11-16 17:58:56 +02:00
8ee0d09ae6 make : auto-determine dependencies (#0) Georgi Gerganov 2024-11-16 17:58:32 +02:00
bcdb7a2386 server: (web UI) Add samplers sequence customization (#10255) MaggotHATE 2024-11-16 18:26:54 +05:00
f245cc28d4 scripts : fix missing key in compare-llama-bench.py (#10332) Georgi Gerganov 2024-11-16 10:32:50 +02:00
772703c8ff vulkan: Optimize some mat-vec mul quant shaders (#10296) Jeff Bolz 2024-11-16 00:26:57 -06:00
dd3a6ce9f8 vulkan : add cmake preset debug/release (#10306) FirstTimeEZ 2024-11-16 14:59:33 +13:00
1e58ee1318 ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324) Dan Johansson 2024-11-16 01:53:37 +01:00
89e4caaaf0 llama : save number of parameters and the size in llama_model (#10286) FirstTimeEZ 2024-11-16 13:42:13 +13:00
74d73dc85c Make updates to fix issues with clang-cl builds while using AVX512 flags (#10314) Srihari-mcw 2024-11-16 02:57:00 +05:30
4047be74da scripts: update compare-llama-bench.py (#10319) Johannes Gäßler 2024-11-15 21:19:03 +01:00
883d206fbd ggml : fix some build issues slaren 2024-11-15 20:20:54 +01:00
09ecbcb596 cmake : fix ppc64 check (whisper/0) Georgi Gerganov 2024-11-15 15:35:22 +02:00
3225008973 ggml : vulkan logs (whisper/2547) thewh1teagle 2024-11-15 15:33:53 +02:00
cbf5541a82 sync : ggml Georgi Gerganov 2024-11-15 15:31:16 +02:00
18429220bd AVX BF16 and single scale quant optimizations (#10212) Eve 2024-11-15 11:47:58 +00:00
f0204a0ec7 ci: build test musa with cmake (#10298) R0CKSTAR 2024-11-15 19:47:25 +08:00
57f8355b29 sycl: Update Intel docker images to use DPC++ 2025.0 (#10305) Romain Biessy 2024-11-15 12:10:45 +01:00
9901068ac7 server : (web UI) add copy button for code block, fix api key (#10242) Xuan Son Nguyen 2024-11-15 05:48:49 -04:00
231f9360d9 cann: dockerfile and doc adjustment (#10302) Chenguang Li 2024-11-15 15:09:35 +08:00
4802ad350b scripts : fix regex in sync [no ci] Georgi Gerganov 2024-11-15 08:38:43 +02:00
5a54af4d4f sycl: Use syclcompat::dp4a (#10267) Romain Biessy 2024-11-15 04:09:12 +01:00
1607a5e5b0 backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921) Charles Xu 2024-11-15 01:28:50 +01:00
ae8de6d50a ggml : build backends as libraries (#10256) Diego Devesa 2024-11-14 18:04:35 +01:00
4a8ccb37ad CUDA: no -sm row for very small matrices (#10185) Johannes Gäßler 2024-11-14 13:00:15 +01:00
2a82891a85 speculative : fix out-of-bounds access (#10289) Georgi Gerganov 2024-11-14 11:44:15 +02:00
af148c9386 vulkan: Optimize binary ops (#10270) Jeff Bolz 2024-11-13 23:22:55 -06:00
66798e42fb vulkan: Use macros to make the mat mul pipeline creation more concise (#10259) Jeff Bolz 2024-11-13 14:59:47 -06:00
fb4a0ec083 llama : propagate the results of graph_compute (#9525) Michael Podvitskiy 2024-11-13 20:00:35 +02:00
5ea926dad7 sync : ggml Georgi Gerganov 2024-11-13 18:11:54 +02:00
1ee9eea094 docs : update bindings list (#10261) Small Grass Forest 2024-11-13 19:17:10 +08:00
ff7fb670d0 server : add missing docs (#10269) Alexey Parfenov 2024-11-13 11:16:30 +00:00
0e712a5acb server : fix incorrect res in validate_model_chat_template (#10272) Jhen-Jie Hong 2024-11-13 19:15:23 +08:00
a0ec17b32e metadata: Detailed Dataset Authorship Metadata (#8875) Brian 2024-11-13 21:10:38 +11:00
2e82ffa4af sycl : Fixes to broken builds and test-backend-ops (#10257) Alberto Cabrera Pérez 2024-11-13 09:40:57 +00:00
80dd7ff22f vulkan: Optimize contiguous copies (#10254) Jeff Bolz 2024-11-13 00:58:57 -06:00
54ef9cfc72 vulkan: Throttle the number of shader compiles during the build step. (#10222) Jeff Bolz 2024-11-11 11:13:51 -06:00
b0cefea58a metal : more precise Q*K in FA vec kernel (#10247) Georgi Gerganov 2024-11-11 08:39:13 +02:00
b141e5f6ef server : enable KV cache defrag by default (#10233) Georgi Gerganov 2024-11-11 08:38:43 +02:00
4b3a9212b6 flake.lock: Update (#10243) Georgi Gerganov 2024-11-10 21:45:25 +02:00
505f33274d server : (web UI) Add back sampler settings (#10239) MaggotHATE 2024-11-11 00:42:25 +05:00
160687b3ed vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (#10226) Jeff Bolz 2024-11-10 05:37:56 -06:00
6423c65aa8 metal : reorder write loop in mul mat kernel + style (#10231) Georgi Gerganov 2024-11-09 11:53:13 +02:00
39a334a9aa metal : fix build and some more comments (#10229) Georgi Gerganov 2024-11-09 11:53:02 +02:00
bb38cdd8ba metal : fix F32 accumulation in FA vec kernel (#10232) Georgi Gerganov 2024-11-09 11:52:45 +02:00
f018acba22 llama : fix Qwen model type strings Georgi Gerganov 2024-11-09 11:26:34 +02:00
46323fa9ef metal : hide debug messages from normal log Georgi Gerganov 2024-11-09 11:21:49 +02:00
5b359bb1e3 ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne’ is small (#10213) SXX 2024-11-09 15:35:46 +08:00
e89213492d ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156) amritahs-ibm 2024-11-09 12:47:50 +05:30
8fc393f246 scripts : fix pattern and get n_tokens in one go (#10221) haopeng 2024-11-09 15:06:54 +08:00
ec450d3bbf metal : opt-in compile flag for BF16 (#10218) Georgi Gerganov 2024-11-08 21:59:46 +02:00
695ad752b2 metal : improve clarity (minor) (#10171) Georgi Gerganov 2024-11-08 18:37:41 +02:00
841f27abdb metal : optimize FA kernels (#10171) Georgi Gerganov 2024-11-08 13:47:22 +02:00
d05b3127bd swift : exclude ggml-metal-embed.metal (#10211) Jhen-Jie Hong 2024-11-08 17:34:06 +08:00
76c6e7f105 server : minor UI fix (#10207) Xuan Son Nguyen 2024-11-07 18:44:38 -04:00
a71d81cf8c server : revamp chat UI with vuejs and daisyui (#10175) Xuan Son Nguyen 2024-11-07 17:31:10 -04:00
eec4d71737 scripts : add amx to sync-ggml.sh [no ci] Georgi Gerganov 2024-11-07 23:11:36 +02:00
3b08828674 sync : ggml Georgi Gerganov 2024-11-07 23:08:24 +02:00
a2c6fd747c scripts : sync update Georgi Gerganov 2024-11-07 23:07:55 +02:00
97404c4a03 ggml : add ggml-cpu.h to the public headers (#10204) Diego Devesa 2024-11-07 18:16:08 +01:00
60e17ce23c Remove identical wte/etw logic for jais (#10203) Faisal Zaghloul 2024-11-07 11:46:12 -05:00
5107e8cea3 DRY: Fixes clone functionality (#10192) wwoodsTM 2024-11-07 08:20:25 -07:00
2319126a70 fix q4_0_8_8 format for corrupted tokens issue (#10198) snadampal 2024-11-07 02:02:08 -06:00
3bcd40b3c5 Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133) Zhiyuan Li 2024-11-07 18:19:10 +11:00
5c333e0140 metal : add BF16 support (#8439) Georgi Gerganov 2024-11-06 19:53:51 +02:00
b11f9ba9b8 server : remove hack for extra parallel slot (#10187) Georgi Gerganov 2024-11-06 13:29:01 +02:00
94d8cb8be1 metal : fix from ptr buffer name (#10189) Diego Devesa 2024-11-06 12:10:07 +01:00
1dc04b2dee ggml : adjust is_first_call init value (#10193) Georgi Gerganov 2024-11-06 11:20:10 +02:00
a1eaf6a960 metal : add quantized FA support (#10149) Georgi Gerganov 2024-11-06 10:24:23 +02:00
b8deef0ec0 llama : add <|tool_call|> formatting to Granite template (#10177) Gabe Goodhart 2024-11-05 05:23:04 -07:00
a9e8a9a030 ggml : fix arch check in bf16_to_fp32 (#10164) Diego Devesa 2024-11-04 23:17:01 +01:00
3407364776 Q6_K AVX improvements (#10118) Eve 2024-11-04 22:06:31 +00:00
d5a409e57f ggml : fix gelu tables initialization (#10172) Diego Devesa 2024-11-04 20:06:58 +01:00
401558b7ba ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167) Diego Devesa 2024-11-04 17:34:08 +01:00
9e0ecfb697 server : clarify /slots endpoint, add is_processing (#10162) Xuan Son Nguyen 2024-11-04 16:33:29 +01:00
6a066b9978 fix build break on arm64 linux (#10166) snadampal 2024-11-04 09:08:33 -06:00
ea02c753eb cuda : clear error after changing peer access (#10153) Diego Devesa 2024-11-04 13:10:23 +01:00
05697f670b metal : simplify f16 and f32 dequant kernels (#0) Georgi Gerganov 2024-11-04 13:49:34 +02:00
f8e58135cf metal : move dequantize templates to beginning of MSL source (#0) Georgi Gerganov 2024-11-04 13:43:32 +02:00
329ed914c9 CANN: adjust backend registry refactor. (#10158) leo-pony 2024-11-04 19:08:22 +08:00
ce027adfb3 sync : ggml Georgi Gerganov 2024-11-04 10:33:37 +02:00
284e5b0275 cmake : make it possible linking ggml as external lib (ggml/1003) Yuri Khrustalev 2024-11-02 05:09:12 -04:00
e2292aaa17 metal : fix minor string leaks (ggml/1004) Plamen Minev 2024-11-01 16:55:10 +02:00
9f40989351 ggml : move CPU backend to a separate file (#10144) Diego Devesa 2024-11-03 19:34:08 +01:00
08828a6d7d metal : minor fixup in FA kernel (#10143) Georgi Gerganov 2024-11-03 15:18:40 +02:00
1839f69130 flake.lock: Update (#10146) Georgi Gerganov 2024-11-03 15:14:15 +02:00
9830b6923b Add apple arm to presets (#10134) Christian Köhnenkamp 2024-11-02 23:35:31 +01:00
42cadc74bd server : fix slot selection by lru (#10126) sasha0552 2024-11-02 16:34:56 +00:00
45950415ed server : fix endpoint checks (#10135) Georgi Gerganov 2024-11-02 18:34:00 +02:00
1926d6e39d llama : adjust default context size + print warnings (#10136) Georgi Gerganov 2024-11-02 15:18:56 +02:00
b634f8a26f simple-chat : only add bos on first prompt (#10129) Diego Devesa 2024-11-02 13:08:53 +01:00
7554aa4655 convert-lora : make --base optional (#10110) Xuan Son Nguyen 2024-11-02 12:53:17 +01:00
a6744e43e8 llama : add simple-chat example (#10124) Diego Devesa 2024-11-01 23:50:59 +01:00
e991e3127f llama : use smart pointers for ggml resources (#10117) Diego Devesa 2024-11-01 23:48:26 +01:00
418f5eef26 vulkan : improve ggml_vk_create_buffer error handling (#9898) Shupei Fan 2024-11-02 02:33:14 +08:00
ba6f62eb79 readme : update hot topics Georgi Gerganov 2024-11-01 17:31:51 +02:00
d865d1478c server : fix smart selection of available slot (#10120) sasha0552 2024-11-01 13:33:14 +00:00
1804adb0cf ggml : remove ggml_scratch (#10121) Georgi Gerganov 2024-11-01 12:58:45 +02:00
815fe72adc sync : ggml Georgi Gerganov 2024-11-01 10:28:24 +02:00
