Commit Graph

  • 5783575c9d Fix chatml fallback for unsupported builtin templates (when --jinja not enabled) (#11533) Olivier Chafik 2025-01-31 08:24:29 +00:00
  • 4a2b196d03 server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531) Olivier Chafik 2025-01-31 08:12:40 +00:00
  • 1bd3047a93 common: Add missing va_end (#11529) Steve Grubb 2025-01-31 00:58:55 -05:00
  • a2df2787b3 server : update help metrics processing/deferred (#11512) Daniel Bevenius 2025-01-31 06:04:53 +01:00
  • 553f1e46e9 ci: ccache for all github workflows (#11516) Olivier Chafik 2025-01-30 22:01:06 +00:00
  • 8b576b6c55 Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) Olivier Chafik 2025-01-30 19:13:58 +00:00
  • 27d135c970 HIP: require at least HIP 5.5 uvos 2025-01-29 19:36:00 +01:00
  • 6af1ca48cb HIP: Prepare reduction operators for wave 64 uvos 2025-01-29 19:12:42 +01:00
  • c300e68ef4 CUDA/HIP: add warp_size to cuda_device_info uvos 2025-01-29 17:46:23 +01:00
  • 3d804dec76 sync: minja (#11499) Olivier Chafik 2025-01-30 10:30:27 +00:00
  • ffd0821c57 vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496) mgroeber9110 2025-01-30 11:10:59 +01:00
  • 4314e56c4f server : use lambda instead of std::bind (#11507) Daniel Bevenius 2025-01-30 11:05:00 +01:00
  • 496e5bf46b server : (docs) added response format for /apply-template [no ci] (#11503) Isaac McFadyen 2025-01-30 04:11:53 -05:00
  • 7919256c57 readme : reference examples relative links (#11505) Guspan Tanadi 2025-01-30 12:58:02 +07:00
  • e0449763a4 server : update json snippets in README.md [no ci] (#11492) Daniel Bevenius 2025-01-30 05:48:14 +01:00
  • eb7cf15a80 server : add /apply-template endpoint for additional use cases of Minja functionality (#11489) Nigel Bosch 2025-01-29 12:45:44 -06:00
  • 66ee4f297c vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360) Rémy Oudompheng 2025-01-29 18:29:39 +01:00
  • e51c47b401 server : update auto gen files comments [no ci] (#11484) Daniel Bevenius 2025-01-29 16:34:18 +01:00
  • 2711d0215f vulkan: Catch pipeline creation failure and print an error message (#11436) Jeff Bolz 2025-01-29 09:26:50 -06:00
  • f0d4b29edf Parse https://ollama.com/library/ syntax (#11480) Eric Curtin 2025-01-29 12:23:10 +01:00
  • 815857791d sync : ggml Georgi Gerganov 2025-01-29 11:25:29 +02:00
  • 1a0e87d291 ggml : add option to not print stack on abort (ggml/1081) William Tambellini 2025-01-23 11:59:08 -08:00
  • d2e518e9b4 ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065) issixx 2025-01-17 21:29:08 +09:00
  • b636228c0a embedding : enable --no-warmup option (#11475) Daniel Bevenius 2025-01-29 09:38:54 +01:00
  • 325afb370a llama: fix missing k_cache store for rwkv6qwen2 (#11445) Molly Sophia 2025-01-29 12:07:21 +08:00
  • 794fe23f29 cmake: add hints for locating ggml on Windows using Llama find-package (#11466) Emreerdog 2025-01-29 02:22:06 +03:00
  • cf8cc856d7 server : Fixed wrong function name in llamacpp server unit test (#11473) peidaqi 2025-01-28 16:03:42 -07:00
  • d0c08040b6 ci : fix build CPU arm64 (#11472) Xuan-Son Nguyen 2025-01-29 00:02:56 +01:00
  • be5ef7963f HIP: Suppress transformation warning in softmax.cu uvos 2025-01-28 23:06:32 +01:00
  • cae9fb4361 HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (#11080) Nikita Sarychev 2025-01-28 07:42:20 -08:00
  • 7fee2889e6 Add github protocol pulling and http:// (#11465) Eric Curtin 2025-01-28 15:45:41 +01:00
  • d7d1eccacc docker: allow installing pip packages system-wide (#11437) Nuno 2025-01-28 15:17:25 +01:00
  • 4bf3119d61 cmake : don't fail on GGML_CPU=OFF (#11457) someone13574 2025-01-28 09:15:34 -05:00
  • f643120bad docker: add perplexity and bench commands to full image (#11438) Nuno 2025-01-28 11:42:32 +01:00
  • 6e84b0ab8e SYCL : SOFTMAX F16 mask support and other fixes (#11261) Akarshan Biswas 2025-01-28 15:26:58 +05:30
  • 2b8525d5c8 Handle missing model in CLI parameters for llama-run (#11399) Michael Engel 2025-01-28 09:32:40 +01:00
  • a4417ddda9 Add new hf protocol for ollama (#11449) Eric Curtin 2025-01-27 19:36:10 +01:00
  • d6d24cd9ed AMD: parse the architecture as supplied by gcnArchName (#11244) Haus1 2025-01-27 08:58:17 -05:00
  • a5203b4465 llama : minor fixes to speed up llama model loading (#11448) lexasub 2025-01-27 17:42:09 +04:00
  • df984e0147 llama: refactor llama_decode_impl (#11381) Johannes Gäßler 2025-01-27 12:07:12 +01:00
  • acd38efee3 metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441) Ihar Hrachyshka 2025-01-27 02:41:59 -05:00
  • caf773f249 docker : fix ARM build and Vulkan build (#11434) Xuan Son Nguyen 2025-01-26 22:45:32 +01:00
  • 178a7eb952 metal : use residency sets (#11427) Georgi Gerganov 2025-01-26 20:06:16 +02:00
  • 6f53d8a6b4 docker: add missing vulkan library to base layer and update to 24.04 (#11422) Nuno 2025-01-26 18:22:43 +01:00
  • 19f65187cb cmake: add ggml find package (#11369) bandoti 2025-01-26 12:07:48 -04:00
  • 1d8ee06000 rpc: fix register position (#11424) Frank Mai 2025-01-26 23:20:34 +08:00
  • 2cc9b8c32c readme : update hot topics Georgi Gerganov 2025-01-26 14:30:15 +02:00
  • f35726c2fb build: apply MSVC /bigobj option to c/cpp files only (#11423) Jeff Bolz 2025-01-25 20:10:03 -06:00
  • 4a75d19376 vulkan: compile shaders on-demand (#11406) Jeff Bolz 2025-01-25 15:29:57 -06:00
  • 26771a1491 Hip: disable VMM on hip as it seems that it doesn't work in some configurations (#11420) uvos 2025-01-25 21:01:12 +01:00
  • ca6baf76c1 build: add /bigobj to MSVC build (#11407) Jeff Bolz 2025-01-25 11:26:37 -06:00
  • 6e264a905b docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for (#11419) Diego Devesa 2025-01-25 17:22:41 +01:00
  • 49b0e3cec4 server : fix cleaning up stream task (#11418) Xuan Son Nguyen 2025-01-25 16:36:44 +01:00
  • 20a758155b docker : fix CPU ARM build (#11403) Diego Devesa 2025-01-25 15:22:29 +01:00
  • 00c24acb2a ci : fix line breaks on windows builds (#11409) Georgi Gerganov 2025-01-25 13:36:48 +02:00
  • 466ea66f33 CANN: Add Ascend CANN build ci (#10217) jiahao su 2025-01-25 07:26:01 +08:00
  • 5f0db9522f hip : Add hipGraph and VMM support to ROCM (#11362) uvos 2025-01-25 00:02:23 +01:00
  • c5d9effb49 CUDA: fix FP16 cuBLAS GEMM (#11396) Johannes Gäßler 2025-01-24 21:02:43 +01:00
  • 9fbadaef4f rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (#11356) uvos 2025-01-24 17:50:49 +01:00
  • 9755129c27 release : pack /lib in the packages (#11392) Georgi Gerganov 2025-01-24 18:41:30 +02:00
  • a07c2c8a52 docs : Update readme to build targets for local docker build (#11368) Jafar Uruç 2025-01-24 13:30:13 +00:00
  • 8137b4bb2b CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380) Johannes Gäßler 2025-01-24 12:38:31 +01:00
  • 1af6945eb0 cmake : avoid -march=native when reproducible build is wanted (#11366) Bernhard M. Wiedemann 2025-01-24 12:21:35 +01:00
  • 01f37edf1a Update llama-run README.md (#11386) Eric Curtin 2025-01-24 09:39:24 +00:00
  • c07e87f38b server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364) stduhpf 2025-01-24 09:02:38 +01:00
  • 564804b79b tests: fix some mul_mat test gaps (#11375) Jeff Bolz 2025-01-23 14:51:24 -06:00
  • 05f63cc9ee Update documentation (#11373) Eric Curtin 2025-01-23 20:04:31 +00:00
  • f7fb43cd0b Add -ngl (#11372) Eric Curtin 2025-01-23 16:16:18 +00:00
  • 5845661640 server : add more clean up when cancel_tasks is called (#11340) Xuan Son Nguyen 2025-01-23 13:56:05 +01:00
  • f211d1dc10 Treat hf.co/ prefix the same as hf:// (#11350) Eric Curtin 2025-01-23 10:38:20 +00:00
  • 955a6c2d91 Vulkan-run-test: fix mmq_wg_denoms (#11343) amd-dwang 2025-01-23 15:14:28 +08:00
  • 1971adf55e vulkan: sort shaders for more deterministic binary (#11315) Jeff Bolz 2025-01-23 01:07:50 -06:00
  • 5245729e33 vulkan: fix diag_mask_inf (#11323) Jeff Bolz 2025-01-23 01:01:17 -06:00
  • 6152129d05 main : update README documentation for batch size (#11353) Diego Devesa 2025-01-22 19:22:20 +01:00
  • 16d3df7ab0 readme : add plugin links (#11355) Georgi Gerganov 2025-01-22 19:44:26 +02:00
  • 12c2bdf2de server : fix draft context not being released (#11354) Diego Devesa 2025-01-22 17:44:40 +01:00
  • c64d2becb1 minja: sync at 0f5f7f2b37 (#11352) Olivier Chafik 2025-01-22 16:16:27 +00:00
  • 96f4053934 Adding logprobs to /v1/completions (#11344) Jiří Podivín 2025-01-22 12:51:32 +01:00
  • a94f3b2727 common: utils to split / join / repeat strings (from json converter) (#11342) Olivier Chafik 2025-01-22 09:51:44 +00:00
  • 3e3357fd77 llava : support Minicpm-omni (#11289) tc-mb 2025-01-22 15:35:48 +08:00
  • 6171c9d258 Add Jinja template support (#11016) Olivier Chafik 2025-01-21 13:18:51 +00:00
  • e28245f35f export-lora : fix tok_embd tensor (#11330) Xuan Son Nguyen 2025-01-21 14:07:12 +01:00
  • 6da5bec81c rpc : better caching of the base buffer pointer (#11331) Radoslav Gerganov 2025-01-21 15:06:41 +02:00
  • 2e2f8f093c linenoise.cpp refactoring (#11301) Eric Curtin 2025-01-21 09:32:35 +00:00
  • 2139667ec4 metal : fix out-of-bounds write (#11314) Georgi Gerganov 2025-01-21 08:48:13 +02:00
  • 80d0d6b4b7 common : add -hfd option for the draft model (#11318) Georgi Gerganov 2025-01-20 22:29:43 +02:00
  • aea8ddd516 vulkan: fix coopmat2 validation failures (#11284) Jeff Bolz 2025-01-20 10:38:32 -06:00
  • 9f7add1cde examples : fix add_special conditions (#11311) Georgi Gerganov 2025-01-20 16:36:08 +02:00
  • 90d987b105 mmap: add include for cerrno (#11296) Christopher Nielsen 2025-01-20 09:02:43 -05:00
  • a4251edd6f cmake: fix shell command quoting in build-info script (#11309) Michael Podvitskiy 2025-01-20 15:02:15 +01:00
  • ec7f3ac9ab llama : add support for Deepseek-R1-Qwen distill model (#11310) Xuan Son Nguyen 2025-01-20 14:35:07 +01:00
  • ef6dada60c cont : fix whitespaces (#11305) Georgi Gerganov 2025-01-20 09:29:32 +02:00
  • ae3c1db2f9 llama : re-add LLM_ARCH_PHIMOE (#11305) Kyle Bruene 2025-01-20 01:21:01 -06:00
  • 92bc493917 tests : increase timeout when sanitizers are enabled (#11300) Georgi Gerganov 2025-01-19 20:22:30 +02:00
  • b9daaffe02 simple-chat : fix BOS being added to each message (#11278) Georgi Gerganov 2025-01-19 18:12:09 +02:00
  • 99487b57d4 SYCL: Introducing memory host pool (#11251) Nicolò Scipione 2025-01-19 14:33:34 +01:00
  • a1649cc13f Adding linenoise.cpp to llama-run (#11252) Eric Curtin 2025-01-18 14:42:31 +00:00
  • 4dd34ff831 cmake : add sanitizer flags for llama.cpp (#11279) Georgi Gerganov 2025-01-18 16:18:15 +02:00
  • f30f099228 server : implement cancellable request (#11285) Xuan Son Nguyen 2025-01-18 14:12:05 +01:00
  • f26c874179 scripts : restore hf.sh (#11288) Georgi Gerganov 2025-01-18 13:18:32 +02:00