enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

81713121ee kv-cells : track min/max used cells and per-sequence positions (#13808) Georgi Gerganov 2025-05-27 13:49:41 +03:00
f9cd68398b sampling : make sure samplers return at least 1 token (#13822) Georgi Gerganov 2025-05-27 12:07:52 +03:00
4f81b33e32 llama : validate seq id batch input (#13809) Georgi Gerganov 2025-05-27 09:40:59 +03:00
cdf94a1802 server: --offline mode (#13804) Olivier Chafik 2025-05-26 14:34:27 -07:00
a26c4cc11e scripts : add option to compare commits in Debug (#13806) Georgi Gerganov 2025-05-26 22:24:01 +03:00
4265a87b59 cuda : avoid cuGetErrorString (#13791) Georgi Gerganov 2025-05-26 22:14:52 +03:00
6f180b915c SYCL: Add non contiguous support in RMS_NORM and NORM kernels (#13611) Akarshan Biswas 2025-05-26 21:10:36 +05:30
03f582ae8f server: fix streaming crashes (#13786) Olivier Chafik 2025-05-26 08:03:57 -07:00
88c125f2ac examples/training: Fix file name in README (#13803) standby24x7 2025-05-26 23:55:24 +09:00
d74e94c1b3 server: fix format of streamed tool call deltas (diff name, fix id location) (#13800) Olivier Chafik 2025-05-26 06:56:49 -07:00
f13847cfb5 server: fix regression on streamed non-chat completion w/ stops (#13785) Olivier Chafik 2025-05-26 06:16:37 -07:00
79c137f776 examples : allow extracting embeddings from decoder contexts (#13797) Georgi Gerganov 2025-05-26 14:03:54 +03:00
22229314fc llama : clarify deprecation message (#13794) Georgi Gerganov 2025-05-26 12:57:50 +03:00
9012eb9b45 sycl: Add more debug prints (#13640) Romain Biessy 2025-05-26 10:28:53 +02:00
fef693dc6b vulkan: mark IM2COL as supporting non-contig (#13783) Jeff Bolz 2025-05-25 23:02:07 -05:00
2d38b6e400 CANN: Add the basic supports of Flash Attention kernel (#13627) Bizhao Shi 2025-05-26 10:20:18 +08:00
e121edc432 server: add --reasoning-budget 0 to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771) Olivier Chafik 2025-05-26 00:30:51 +01:00
2f099b510f webui : bump max upload file size to 500MB (#13779) Xuan-Son Nguyen 2025-05-25 19:02:18 +02:00
aa50ba462f tests : improve UGM tokenizer test coverage (#13773) Sigbjørn Skjæret 2025-05-25 16:22:29 +02:00
de2ef53a4b kv-cache : rework kv_cell (#13706) Georgi Gerganov 2025-05-25 16:34:36 +03:00
c508256db2 rpc : Fix build on OpenBSD (#13541) Percy Piper 2025-05-25 13:35:53 +01:00
40aaa8a403 mtmd : add support for Qwen2-Audio and SeaLLM-Audio (#13760) Xuan-Son Nguyen 2025-05-25 14:06:32 +02:00
a08c1d2845 docs : add Moondream2 pre-quantized link (#13745) ddpasa 2025-05-25 14:04:49 +02:00
d785f9c1fd server: fix/test add_generation_prompt (#13770) Olivier Chafik 2025-05-25 10:45:49 +01:00
4032ca4066 llama : add support for Qwen3 MoE tied word embeddings (#13768) Piotr Jasiukajtis 2025-05-25 10:29:43 +02:00
515fdbf7ed SYCL: revert "sycl: simplify bin_bcast_kernel (#13383)" (#13752) Akarshan Biswas 2025-05-25 12:38:37 +05:30
f5cd27b71d server: streaming of tool calls and thoughts when --jinja is on (#12379) Olivier Chafik 2025-05-25 01:48:08 +01:00
a2d02d5793 releases : bundle llvm omp library in windows release (#13763) Diego Devesa 2025-05-24 15:55:16 -07:00
17fc817b58 releases : enable openmp in windows cpu backend build (#13756) Diego Devesa 2025-05-24 13:27:03 -07:00
2bd1b30f69 ggml-cpu : set openmp wait time if not set (#13758) Diego Devesa 2025-05-24 13:26:47 -07:00
259469c4b5 Move GLM4 f32 attention fix to the correct function (#13750) 0cc4m 2025-05-24 16:49:12 +02:00
4c32832c59 ggml : add ggml_gelu_erf() CUDA kernel (#13719) Xuan-Son Nguyen 2025-05-24 13:06:47 +02:00
c3a2624339 vocab : fix ugm tokenizer precision (#13743) Sigbjørn Skjæret 2025-05-24 12:29:09 +02:00
ffd0eae60b CUDA: fix race condition in FA vector kernels (#13742) Johannes Gäßler 2025-05-24 11:46:19 +02:00
b775345d78 ci : enable winget package updates (#13734) Diego Devesa 2025-05-23 13:14:00 -07:00
a70a8a69c2 ci : add winget package updater (#13732) Diego Devesa 2025-05-23 13:09:38 -07:00
d13d0f6135 hparams : initialize arrays (#13728) Georgi Gerganov 2025-05-23 20:16:13 +03:00
8a2afb7520 llama : allow custom list of swa_layers (#13726) Xuan-Son Nguyen 2025-05-23 17:07:04 +02:00
9ecf3e66a3 server : support audio input (#13714) Xuan-Son Nguyen 2025-05-23 11:03:47 +02:00
faaaff5f94 CANN: Support MUL_MAT_ID for q8_0 and q4_0 (#13705) Chenguang Li 2025-05-23 16:47:53 +08:00
e16c4731c7 ggml : fix the order of ggml_unary_op (#13718) Xuan-Son Nguyen 2025-05-23 08:12:48 +02:00
1dcd01960c vulkan: support CPY from any type to itself (#13695) Jeff Bolz 2025-05-23 00:45:02 -04:00
c10ed6cbcc vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't support it (#13696) Jeff Bolz 2025-05-23 00:33:45 -04:00
a127ff1780 use LOG_WARN to replace std::cerr (#13657) Judd 2025-05-23 12:33:08 +08:00
3079e9ac8e release : fix windows hip release (#13707) Diego Devesa 2025-05-22 15:21:37 -07:00
8a1d206f1d tts : fix n_ubatch + make WavTokenizer cache-less (#13713) Georgi Gerganov 2025-05-22 22:21:07 +03:00
797990c4bc mtmd : add ultravox audio input (#13623) Xuan-Son Nguyen 2025-05-22 20:42:48 +02:00
ab86335760 common: Include torch package for s390x (#13699) Aaron Teo 2025-05-23 02:31:29 +08:00
cc74d5be99 server : pad small embedding batches (#13692) Georgi Gerganov 2025-05-22 16:33:39 +03:00
5be24af73d gguf-py : correct charsmap parameter typing (#13701) Sigbjørn Skjæret 2025-05-22 14:25:05 +02:00
d394a9aedc sycl : Remove waits from function calls (#13702) Nicolò Scipione 2025-05-22 13:54:43 +02:00
6b56a64690 SYCL: Avoid using with SYCL-Graph for unsupported nodes (#13587) Ewan Crawford 2025-05-22 09:24:09 +01:00
a4e8912dfd opencl: Add support for multiple devices (#12622) Henry Linjamäki 2025-05-22 02:21:45 +03:00
edbf42edfd opencl: fix couple crashes (#12795) Henry Linjamäki 2025-05-21 23:21:17 +03:00
d643bb2c79 releases : build CPU backend separately (windows) (#13642) Diego Devesa 2025-05-21 13:09:57 -07:00
8e186ef0e7 hparams : support models for which all layers use SWA (#13682) Georgi Gerganov 2025-05-21 20:00:49 +03:00
5fbfe384d4 server : improve error reporting (#13680) Georgi Gerganov 2025-05-21 19:46:56 +03:00
c76532e7ba convert : add qwen2vl support for unsloth merges (#13686) antichristHater 2025-05-21 19:40:35 +03:00
2aa777d86d examples : switch retrieval to llama_encode (#13685) Sigbjørn Skjæret 2025-05-21 16:57:38 +02:00
eb0f5c28d3 gguf-py : display the invalid gguf type (#13687) Emmanuel Ferdman 2025-05-21 17:33:54 +03:00
cf4cb59e64 ggml : add ggml_gelu_erf() (#13667) Xuan-Son Nguyen 2025-05-21 16:26:33 +02:00
0d5c742161 server : Add the endpoints /api/tags and /api/chat (#13659) Robin Davidsson 2025-05-21 15:15:27 +02:00
42158ae2e8 server : fix first message identification (#13634) Dorin-Andrei Geman 2025-05-21 16:07:57 +03:00
797f2ac062 kv-cache : simplify the interface (#13660) Georgi Gerganov 2025-05-21 15:11:13 +03:00
b44890df2e model : disable SWA for Phi models (#13676) Georgi Gerganov 2025-05-21 13:09:21 +03:00
33983057d0 musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647) R0CKSTAR 2025-05-21 09:58:49 +08:00
fb1cab201c vulkan: fix warnings (#13626) Eve 2025-05-20 21:35:16 +00:00
b7a17463ec mtmd-helper : bug fix to token batching in mtmd (#13650) l3utterfly 2025-05-21 00:55:30 +08:00
be0239693c model : fix llama4 graph (#13663) Georgi Gerganov 2025-05-20 19:21:04 +03:00
a4090d1174 llama : remove llama_kv_cache_view API + remove deprecated (#13653) Georgi Gerganov 2025-05-20 16:13:16 +03:00
b69f1647f9 CUDA: skip fully masked-out KV in FA vec kernel (#13584) Johannes Gäßler 2025-05-20 14:45:07 +02:00
759e37b0d8 tests : avoid github urls due to throttling (#13654) Sigbjørn Skjæret 2025-05-20 12:03:17 +02:00
4245e622e0 sycl: disable reorder for sycl mulmat (#13536) Svetlozar Georgiev 2025-05-20 10:34:15 +01:00
c9c64dee57 Set GLM4 blk.*.attn_output.weight, kqv_out-* matmul to GGML_PREC_F32 to fix infinity values in output (#13639) 0cc4m 2025-05-20 10:11:56 +02:00
c00a2634be metal : fix typo in FA kernel comments (#13651) Georgi Gerganov 2025-05-20 10:41:40 +03:00
e298d2fbd0 kv-cache : add SWA support (#13194) Georgi Gerganov 2025-05-20 08:05:46 +03:00
f0adb80bf7 CANN: Update CANN model support (#13162) Xinpeng Dou 2025-05-20 11:43:43 +08:00
f7c9429c85 sycl : Overcoming workaround for mmap() allocation on Windows (#13482) Nicolò Scipione 2025-05-20 02:54:43 +02:00
1dfbf2cf3a common : add load_progress_callback (#13617) psocolovsky 2025-05-19 21:17:36 +02:00
8960efd0a6 Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 32B incoherence (#13607) 0cc4m 2025-05-19 17:54:08 +02:00
725f23f1f3 sycl : backend documentation review (#13544) Alberto Cabrera Pérez 2025-05-19 14:38:20 +01:00
92ecdcc06a mtmd : add vision support for llama 4 (#13282) Xuan-Son Nguyen 2025-05-19 13:04:14 +02:00
f71f40a284 ci : upgraded oneAPI version in SYCL workflows and dockerfile (#13532) Alberto Cabrera Pérez 2025-05-19 11:46:09 +01:00
d30cb5a7fa sync : ggml Georgi Gerganov 2025-05-19 12:50:29 +03:00
6c35981a64 mnist: fix segmentation fault (ggml/1227) Johannes Gäßler 2025-05-19 09:33:35 +02:00
8b5e19aea6 ggml : fix apple OS check in ggml_print_backtrace (ggml/1229) Diego Devesa 2025-05-18 18:30:13 -07:00
60aea028b5 ggml : Fix missing backtrace on Linux (ggml/1228) Daniel Tang 2025-05-17 19:06:26 -04:00
9c55e5c5c2 fix: check model pointer validity before use (#13631) Nick 2025-05-19 18:25:41 +08:00
33d7aed4a8 CANN: Support MOE Model MUL_MAT_ID (#13042) Chenguang Li 2025-05-19 14:21:17 +08:00
6a2bc8bfb7 server : added --no-prefill-assistant flag (#13608) Isaac McFadyen 2025-05-17 17:59:48 -04:00
e3a7cf6c5b cmake: use the current build config for vulkan-shaders-gen (#13595) Gilad S. 2025-05-17 21:26:43 +03:00
518329b2d4 parallel : add option for non-shared and larger prompts (#13598) Georgi Gerganov 2025-05-17 12:58:55 +03:00
2f5a4e1e09 vulkan: move common FA code to flash_attn_base.comp (#13556) Jeff Bolz 2025-05-17 16:14:55 +09:00
4f41ee11d6 vulkan: use scalar FA rather than coopmat2 when N==1 (#13554) Jeff Bolz 2025-05-17 15:35:47 +09:00
3e0be1cace llguidance : official v0.7.20 release (no actual changes) [noci] (#13594) Z 2025-05-16 14:56:28 -06:00
6aa892ec2a server : do not return error out of context (with ctx shift disabled) (#13577) Xuan-Son Nguyen 2025-05-16 21:50:00 +02:00
aea9f8b4e7 webui : improve accessibility for visually impaired people (#13551) Xuan-Son Nguyen 2025-05-16 21:49:01 +02:00
06c1e4abc1 readme : add list of dependencies and their license (#13591) Xuan-Son Nguyen 2025-05-16 20:04:18 +02:00
415e40a357 releases : use arm version of curl for arm releases (#13592) Diego Devesa 2025-05-16 10:36:51 -07:00
654a67794f metal : add FA-vec kernel for head size 64 (#13583) Georgi Gerganov 2025-05-16 20:32:58 +03:00
