enginex-ascend-910-llama.cpp

EngineX-Ascend/enginex-ascend-910-llama.cpp

bba9d945c1 cmake : removed stdc++fs (whisper/3097) Jared Tweed 2025-05-02 02:41:35 -07:00
bc4e1128f7 llama : deci : support ffn-free with attention (#13296) Sigbjørn Skjæret 2025-05-07 12:49:27 +02:00
39e73ae0d6 common : Add a warning when we can't match samplers from a string or char. (#13330) Ycros 2025-05-07 18:23:28 +10:00
1f73301b63 cuda : remove nrows_x in mul_mat_q_process_tile (#13325) R0CKSTAR 2025-05-07 15:48:23 +08:00
4773d7a02f examples : remove infill (#13283) Georgi Gerganov 2025-05-07 10:28:02 +03:00
6c7fd67b64 llama : support tie embedding for chatglm models (#13328) piDack 2025-05-07 15:23:11 +08:00
141a908a59 CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (#13135) Johannes Gäßler 2025-05-06 23:35:51 +02:00
32916a4907 clip : refactor graph builder (#13321) Xuan-Son Nguyen 2025-05-06 22:40:24 +02:00
ffc727203a sampling : make top_n_sigma no-op at <=0 or a single candidate (#13345) DocShotgun 2025-05-06 13:36:24 -07:00
91a86a6f35 sampling : don't consider -infinity values in top_n_sigma (#13344) oobabooga 2025-05-06 15:24:15 -03:00
f4ed10b69c cmake : remove arm64 msvc presets (#13342) Diego Devesa 2025-05-06 20:15:31 +02:00
1e333d5bba SYCL: Disable reorder optimize by default and stop setting tensor extras when optimize is disabled (#13254) Akarshan Biswas 2025-05-06 20:27:06 +05:30
2f54e348ad llama : fix build_ffn without gate (#13336) Xuan-Son Nguyen 2025-05-06 14:25:40 +02:00
2356fb1d53 CUDA: fix bad asserts for partial offload (#13337) Johannes Gäßler 2025-05-06 13:58:51 +02:00
764b85627b convert : qwen2/3moe : set yarn metadata if present (#13331) Sigbjørn Skjæret 2025-05-06 11:12:06 +02:00
15a28ec8c7 CUDA: fix --split-mode row for MMQ (#13323) Johannes Gäßler 2025-05-06 08:36:46 +02:00
a7366faa5b gguf-py : avoid requiring pyside6 for other scripts (#13036) compilade 2025-05-05 22:27:31 -04:00
9070365020 CUDA: fix logic for clearing padding with -ngl 0 (#13320) Johannes Gäßler 2025-05-05 22:32:13 +02:00
233461f812 sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (#13264) oobabooga 2025-05-05 17:12:19 -03:00
b34c859146 server : Webui - change setText command from parent window to also send the message. (#13309) igardev 2025-05-05 17:03:31 +03:00
9b61acf060 mtmd : rename llava directory to mtmd (#13311) Xuan-Son Nguyen 2025-05-05 16:02:55 +02:00
5215b91e93 clip : fix confused naming ffn_up and ffn_down (#13290) Xuan-Son Nguyen 2025-05-05 12:54:44 +02:00
ae803bfc3d convert : bailingmoe : set yarn metadata if present (#13312) Sigbjørn Skjæret 2025-05-05 12:34:26 +02:00
66645a5285 SYCL: Disable mul_mat kernels for noncontiguous tensor b (#13308) Akarshan Biswas 2025-05-05 13:39:10 +05:30
27aa259532 mtmd : add C public API (#13184) Xuan-Son Nguyen 2025-05-04 23:43:42 +02:00
9fdfcdaedd rpc : use backend registry, support dl backends (#13304) Diego Devesa 2025-05-04 21:25:43 +02:00
6eb7d25c70 ggml : activate s390x simd for Q3_K (#13301) Aaron Teo 2025-05-05 01:49:12 +08:00
86bd60d3fe llava/mtmd : fixes to fully support dl backends (#13303) Diego Devesa 2025-05-04 17:05:20 +02:00
9f2da5871f llama : build windows releases with dl backends (#13220) Diego Devesa 2025-05-04 14:20:49 +02:00
93c4e23905 CUDA: fix race condition in MMQ stream-k fixup (#13299) Johannes Gäßler 2025-05-04 14:16:39 +02:00
8afbd96818 CUDA: fix race condition in MMQ ids_dst (#13294) Johannes Gäßler 2025-05-04 13:58:38 +02:00
8ae5ebcf85 vulkan: Additional type support for unary, binary, and copy (#13266) Jeff Bolz 2025-05-04 00:17:16 -05:00
3e959f0976 imatrix: fix oob writes if src1 is not contiguous (#13286) Johannes Gäßler 2025-05-04 00:50:37 +02:00
36667c8edc clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking change) (#13259) Xuan-Son Nguyen 2025-05-03 20:07:54 +02:00
3bf785f3ef llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843) ymcki 2025-05-03 23:39:51 +08:00
1d36b3670b llama : move end-user examples to tools directory (#13249) Diego Devesa 2025-05-02 20:27:13 +02:00
b34443923c sync : ggml (#13268) Georgi Gerganov 2025-05-02 20:54:30 +03:00
a75cb30dc9 context : fix reorder logic (#13267) Georgi Gerganov 2025-05-02 20:54:13 +03:00
3f3769ba76 ggml : Enable MMA for BF16 in llamafile_sgemm (#13148) shalinib-ibm 2025-05-02 22:23:12 +05:30
2f567611c0 llama-model : support Qwen2 embedding models and pooling_mode_lasttoken (#13245) Jared Van Bortel 2025-05-02 11:42:30 -04:00
7d2123484e convert : use correct context length for nomic-embed-text-v2 (#13216) Jared Van Bortel 2025-05-02 11:41:54 -04:00
074e42ab31 convert : converting mmproj for Qwen2/2.5VL from convert_hf_to_gguf (#13209) Xuan-Son Nguyen 2025-05-02 17:17:15 +02:00
c642bc014c kv-cache : separate recurrent vs non-recurrent impl (#12799) Georgi Gerganov 2025-05-02 17:48:36 +03:00
cb06a3c363 llama : orion rope type is neox (#13261) Sigbjørn Skjæret 2025-05-02 12:44:24 +02:00
626083faf7 llama : plamo rope type is neox (#13260) Sigbjørn Skjæret 2025-05-02 12:40:56 +02:00
2af6880178 llama-chat : reset glmedge chat template (#13253) piDack 2025-05-02 17:06:09 +08:00
e84773ab60 mtmd-cli : fix out_of_range when input image path is empty (#13244) Shakil Ahmed 2025-05-02 14:20:27 +06:00
fab647e884 server : add cache reuse card link to help (#13230) Georgi Gerganov 2025-05-02 09:48:31 +03:00
dcf886007d convert : explicitly disable trust_remote_code for AutoConfig (#13246) Xuan-Son Nguyen 2025-05-02 08:45:10 +02:00
d24d592808 ci: fix cross-compile sync issues (#12804) bandoti 2025-05-01 19:06:39 -03:00
8efbdadc61 rpc : avoid uninitialized memory in serialize_tensor (#13210) Justin Santa Barbara 2025-05-01 17:32:11 -04:00
f057808ffa ggml: Don't assert fail when tensor data changes (#13222) Jesse Gross 2025-05-01 13:46:10 -07:00
d7a14c42a1 build : fix build info on windows (#13239) Diego Devesa 2025-05-01 21:48:08 +02:00
b6e4ff69b8 clip : (minicpmv) Re-enable upscaling of images smaller than the CLIP image size (#13237) Loïc Carrère 2025-05-01 21:32:21 +02:00
e0f572c846 llama-chat : update GLM4 chat template (#13238) matteo 2025-05-01 21:16:38 +02:00
79f26e9e12 vulkan: Add bfloat16 support (#12554) Jeff Bolz 2025-05-01 13:49:39 -05:00
fc727bcdd5 vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul shader (#13191) Jeff Bolz 2025-05-01 13:19:31 -05:00
b0ecbd434b test: non-cont. b in test-backend-ops -o MUL_MAT (#13187) Johannes Gäßler 2025-05-01 20:18:56 +02:00
b1dd4d08e8 sync : ggml Georgi Gerganov 2025-05-01 17:07:13 +03:00
99881f77d8 whisper : add check that target name exists (whisper/3103) Daniel Bevenius 2025-05-01 10:05:24 +02:00
b5769d92b4 ggml : suppress Windows compiler warnings (whisper/3075) Daniel Bevenius 2025-04-29 15:47:55 +02:00
8936784f7a mtmd : add **vision** support for Mistral Small 3.1 (#13231) Xuan-Son Nguyen 2025-05-01 17:05:42 +02:00
13c9a3319b arg : remove CURLINFO_EFFECTIVE_METHOD (#13228) Xuan-Son Nguyen 2025-05-01 10:23:25 +02:00
a70183eb00 llama-model : fix the reported size class for nomic-embed-text-v2-moe (#13223) Jared Van Bortel 2025-05-01 03:09:41 -04:00
8d33d740c3 sync : ggml Georgi Gerganov 2025-05-01 09:59:02 +03:00
4254bb4951 ggml : fix ggml_gallocr_ptr type (ggml/1205) Diego Devesa 2025-04-30 15:20:40 +02:00
9998540149 cuda : fix unused variable compile warning (whisper/0) Georgi Gerganov 2025-04-24 18:59:06 +03:00
e1e8e0991f CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199) Johannes Gäßler 2025-04-30 23:12:59 +02:00
6f67cf1f48 arg : -hf do not fail if url mismatch (#13219) Xuan-Son Nguyen 2025-04-30 22:29:15 +02:00
16a457facd fix typo: n_ctx_pre_seq -> n_ctx_per_seq (#13221) ddh0 2025-04-30 15:28:43 -05:00
3e168bede4 convert : improve model arch handling (#13122) Xuan-Son Nguyen 2025-04-30 16:56:24 +02:00
ceda28ef8e llava : remove duplicate include (#13207) Tatsuya Tanaka 2025-04-30 22:25:20 +09:00
3b127c7385 common : add -jf / --json-schema-file flag (#12011) Olivier Chafik 2025-04-30 13:52:35 +01:00
e5007a5edf vulkan: use uint array index to avoid glslang bug (#13193) Jeff Bolz 2025-04-30 07:38:37 -05:00
416313773b ggml : fix ppc64le build (#13176) shalinib-ibm 2025-04-30 16:47:08 +05:30
07c2e2f76c convert : correct typo image_mean --> image_std (#13208) Xuan-Son Nguyen 2025-04-30 13:06:15 +02:00
44cd8d91ff feat(ggml-cpu): enable z17 compile (#13182) Aaron Teo 2025-04-30 17:47:35 +08:00
5933e6fdc9 arg : allow using -hf offline (#13202) Xuan-Son Nguyen 2025-04-30 10:46:32 +02:00
da84c04d8f docker : do not build tests (#13204) Xuan-Son Nguyen 2025-04-30 10:44:07 +02:00
a0f7016d17 rpc : fix cache directory initialization (#13188) xiaofei 2025-04-30 14:29:22 +08:00
19e899ce21 scripts: n_depth for compare-llama-bench [no ci] (#13201) Johannes Gäßler 2025-04-29 23:32:04 +02:00
e2e1ddb93a server : Prefilling assistant message in openai compatible API (#13174) matteo 2025-04-29 20:33:10 +02:00
d9d398f84f sampling : when top-k <= 0 -> noop (#13173) Georgi Gerganov 2025-04-29 20:22:57 +03:00
5a63980117 llama-bench: fixed size of fields to correctly map to values (#13183) Alberto Cabrera Pérez 2025-04-29 16:24:36 +01:00
cdf76586b2 CUDA: fix non-cont. inputs for batched mat mul (#13155) Johannes Gäßler 2025-04-29 16:00:27 +02:00
7d3af70b08 llama : llm_type order by size (#13177) Sigbjørn Skjæret 2025-04-29 13:25:53 +02:00
00e3e5a194 mtmd : add qwen2vl and qwen2.5vl (#13141) Xuan-Son Nguyen 2025-04-29 11:47:04 +02:00
e98b3692be llama : set qwen3 model type sizes (#13175) Sigbjørn Skjæret 2025-04-29 11:00:31 +02:00
b6ce7430b7 llama-graph : fix text position for mrope (#13159) Xuan-Son Nguyen 2025-04-29 08:45:49 +02:00
5f5e39e1ba model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466) AT 2025-04-28 15:52:15 -04:00
eaea325324 clip : fix model size display (#13153) Xuan-Son Nguyen 2025-04-28 21:23:19 +02:00
43ddab6eee fix(rpc): Improve input validation and error handling (#13069) Ville Vesilehto 2025-04-28 21:00:20 +03:00
1831f538f7 llama-bench: add -d depth arg (#13096) Vishal Agarwal 2025-04-28 20:20:39 +05:30
4e87962e34 mtmd : fix glm-edge redundant token count (#13139) Xuan-Son Nguyen 2025-04-28 16:12:56 +02:00
fb0471d175 context : do not clear output buffer on reserve (#13152) pockers21 2025-04-28 06:45:40 -07:00
d2b2031e5f llama : (mrope) allow using normal 1D position for text token (#13138) Xuan-Son Nguyen 2025-04-28 14:20:56 +02:00
5fa9e63be8 clip : refactor set input for cgraph + fix qwen2.5vl input (#13136) Xuan-Son Nguyen 2025-04-28 12:18:59 +02:00
a4c340f974 SYCL: Add all missing unary kernels (#13074) Akarshan Biswas 2025-04-28 15:03:25 +05:30
d0a417f3c7 readme : update hot topics (#13150) Georgi Gerganov 2025-04-28 12:10:18 +03:00
43f2b07193 common : fix noreturn compile warning (#13151) Georgi Gerganov 2025-04-28 11:57:19 +03:00

Branches: main, b7003-full