Commit Graph

  • af0a5b6163 server: fix incorrectly reported token probabilities (#7125) Johannes Gäßler 2024-05-07 23:07:58 +02:00
  • b6aa670203 Fix OLMo HF to GGUF conversion (#6910) nopperl 2024-05-07 19:39:43 +00:00
  • 260b7c6529 server : update readme with undocumented options (#7013) Kyle Mistele 2024-05-07 13:44:29 -05:00
  • 53d6c52e22 readme : update hot topics Georgi Gerganov 2024-05-07 21:43:13 +03:00
  • 3af34c1d1b main : update log text (EOS to EOG) (#7104) RhinoDevel 2024-05-07 19:51:31 +02:00
  • 04976db7a8 docs: fix typos (#7124) omahs 2024-05-07 17:20:33 +02:00
  • 947d3ad27d ci : add GG_BUILD_EXTRA_TESTS_0 env (#7098) Georgi Gerganov 2024-05-07 11:08:49 +03:00
  • 858f6b73f6 Add an option to build without CUDA VMM (#7067) William Tambellini 2024-05-06 11:12:14 -07:00
  • b3a995b416 flake.lock: Update (#7079) Georgi Gerganov 2024-05-06 18:36:06 +03:00
  • bcdee0daa7 minor : fix trailing whitespace Georgi Gerganov 2024-05-06 09:31:30 +03:00
  • 628b299106 Adding support for the --numa argument for llama-bench. (#7080) kunnis 2024-05-05 07:17:47 -05:00
  • 8f8acc8683 Disable benchmark on forked repo (#7034) Sigbjørn Skjæret 2024-05-05 13:38:55 +02:00
  • ca36326020 readme : add note that LLaMA 3 is not supported with convert.py (#7065) Lyle Dean 2024-05-05 06:21:46 +01:00
  • 889bdd7686 command-r : add BPE pre-tokenization (#7063) DAN™ 2024-05-05 01:19:30 -04:00
  • 6fbd432211 py : logging and flake8 suppression refactoring (#7081) Brian 2024-05-05 15:07:48 +10:00
  • 842500144e gguf-split: add --no-tensor-first-split (#7072) Xuan Son Nguyen 2024-05-04 18:56:22 +02:00
  • cf768b7e71 Tidy Android Instructions README.md (#7016) Jeximo 2024-05-04 13:10:15 -03:00
  • fcd84a0f5a Fix Linux /sys cpu path to guess number of cores (#7064) viric 2024-05-04 15:26:53 +02:00
  • 03fb8a002d If first token generated from the server is the stop word the server will crash (#7038) maor-ps 2024-05-04 12:06:40 +03:00
  • 92139b90af tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) Georgi Gerganov 2024-05-04 08:32:32 +03:00
  • a2ac89d6ef convert.py : add python logging instead of print() (#6511) Brian 2024-05-04 05:36:41 +10:00
  • 433def286e llama : rename ctx to user_data in progress_callback (#7045) Daniel Bevenius 2024-05-03 15:24:30 +02:00
  • 60325fa56f Remove .attention from skipped tensors to match more accurately (#7051) Bartowski 2024-05-02 19:49:09 -04:00
  • 6ecf3189e0 chore: fix typo in llama.cpp (#7032) alwqx 2024-05-02 23:56:41 +08:00
  • b0d943de17 Update LOG_IMPL and LOG_TEE_IMPL (#7029) Andrew Downing 2024-05-01 17:31:30 -04:00
  • 8d608a81b7 main : fix off by one error for context shift (#6921) l3utterfly 2024-05-02 04:27:41 +09:00
  • 3ea0d36000 Server: add tests for batch size, different seeds (#6950) Johannes Gäßler 2024-05-01 17:52:55 +02:00
  • 1613ef8d8e CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019) Johannes Gäßler 2024-05-01 14:46:37 +02:00
  • c4ec9c0d3d ci : exempt confirmed bugs from being tagged as stale (#7014) slaren 2024-05-01 07:13:59 +02:00
  • a8f9b07631 perplexity: more statistics, added documentation (#6936) Johannes Gäßler 2024-04-30 23:36:27 +02:00
  • f364eb6fb5 switch to using localizedDescription (#7010) Kevin Gibbons 2024-04-30 08:14:02 -07:00
  • 77e15bec62 metal : remove deprecated error code (#7008) Georgi Gerganov 2024-04-30 15:52:21 +03:00
  • a68a1e7ed0 metal : log more info on error (#6987) Kevin Gibbons 2024-04-30 02:34:50 -07:00
  • 9c67c2773d ggml : add Flash Attention (#5021) Georgi Gerganov 2024-04-30 12:16:08 +03:00
  • 952d03dbea convert : use utf8 encoding (#7000) Georgi Gerganov 2024-04-30 11:05:25 +03:00
  • 8843a98c2b Improve usability of --model-url & related flags (#6930) Olivier Chafik 2024-04-30 00:52:50 +01:00
  • b8c1476e44 Extending grammar integration tests (#6644) Clint Herron 2024-04-29 14:40:14 -04:00
  • 5539e6fdd1 main : fix typo in comment in main.cpp (#6985) Daniel Bevenius 2024-04-29 19:56:59 +02:00
  • b8a7a5a90f build(cmake): simplify instructions (cmake -B build && cmake --build build ...) (#6964) Olivier Chafik 2024-04-29 17:02:45 +01:00
  • d2c898f746 ci : tmp disable gguf-split (#6983) Georgi Gerganov 2024-04-29 18:36:39 +03:00
  • 544f1f10ad ggml : fix __MSC_VER -> _MSC_VER (#6977) Georgi Gerganov 2024-04-29 17:55:02 +03:00
  • ffe666572f llava-cli : multiple images (#6969) cpumaxx 2024-04-29 07:34:24 -07:00
  • 24affa7db3 readme : update hot topics Georgi Gerganov 2024-04-29 17:06:19 +03:00
  • f4ab2a4147 llama : fix BPE pre-tokenization (#6920) Georgi Gerganov 2024-04-29 16:58:41 +03:00
  • 3f167476b1 sampling : use std::random_device{}() for default random seed (#6962) David Renshaw 2024-04-29 09:35:45 -04:00
  • 3055a41805 convert : fix conversion of some BERT embedding models (#6937) Christian Zhou-Zheng 2024-04-29 09:34:41 -04:00
  • 577277ffd2 make : change GNU make default CXX from g++ to c++ (#6966) Przemysław Pawełczyk 2024-04-29 15:08:20 +02:00
  • ca7f29f568 ci : add building in MSYS2 environments (Windows) (#6967) Przemysław Pawełczyk 2024-04-29 14:59:47 +02:00
  • c4f708a93f llama : fix typo LAMMAFILE -> LLAMAFILE (#6974) Johannes Gäßler 2024-04-29 14:36:22 +02:00
  • e00b4a8f81 Fix more int overflow during quant (PPL/CUDA). (#6563) DAN™ 2024-04-28 18:38:44 -04:00
  • 7bb36ccf91 gguf : enforce that tensor names are unique (#6905) Xuan Son Nguyen 2024-04-28 17:36:18 +02:00
  • ce023f6f2f add device version in device list (#6959) Neo Zhang 2024-04-28 22:40:31 +08:00
  • 6e472f58e4 flake.lock: Update github-actions[bot] 2024-04-28 00:18:27 +00:00
  • 4dba7e8114 Replace "alternative" boolean operator in conditional compilation directive (#6949) mgroeber9110 2024-04-27 21:02:06 +02:00
  • b7368332e2 ci: server: tests python env on github container ubuntu latest / fix n_predict (#6935) Pierrick Hymbert 2024-04-27 17:50:48 +02:00
  • 928e0b7013 Reset schedule earlier to allow overlap with ggml graph computation on device (#6933) agray3 2024-04-26 19:08:30 +01:00
  • 0c4d489e29 quantize: add imatrix and dataset metadata in GGUF (#6658) Pierrick Hymbert 2024-04-26 20:06:33 +02:00
  • 017e6999b5 add basic tensor data validation function (#6884) slaren 2024-04-26 18:39:58 +02:00
  • e2764cd7ca gguf : fix mismatch between alloc and free functions (#6929) slaren 2024-04-26 17:07:42 +02:00
  • 4b1c3c98b4 llamafile : use 64-bit integers in sgemm (#6928) Justine Tunney 2024-04-26 10:05:33 -04:00
  • bbe3c6e761 ci: server: fix python installation (#6925) Pierrick Hymbert 2024-04-26 12:27:25 +02:00
  • 7f5ff558ee server: stop generation at n_ctx_train if n_predict is not set (#6638) Pierrick Hymbert 2024-04-26 12:15:30 +02:00
  • 9e4e077ec5 ci: server: fix python installation (#6922) Pierrick Hymbert 2024-04-26 11:11:51 +02:00
  • 83b72cb086 Merge pull request from GHSA-p5mv-gjc5-mwqv Georgi Gerganov 2024-04-26 10:41:53 +03:00
  • d4a9afc100 ci: server: fix python installation (#6918) Pierrick Hymbert 2024-04-26 09:27:49 +02:00
  • 7d641c26ac ci: fix concurrency for pull_request_target (#6917) Pierrick Hymbert 2024-04-26 09:26:59 +02:00
  • 5790c8dac1 bench: server add stop word for PHI-2 (#6916) Pierrick Hymbert 2024-04-26 09:26:16 +02:00
  • 46e12c4692 llava : add support for moondream vision language model (#6899) vik 2024-04-25 12:38:31 -07:00
  • dba497e0c1 cmake : restore LLAMA_LLAMAFILE_DEFAULT Georgi Gerganov 2024-04-25 21:31:17 +03:00
  • fa0b4ad252 cmake : remove obsolete ANDROID check Georgi Gerganov 2024-04-25 18:59:51 +03:00
  • d6e1d44f16 llama : synchronize before get/set session data (#6911) slaren 2024-04-25 17:59:03 +02:00
  • 853d06ffe2 ci : tmp disable slow tests Georgi Gerganov 2024-04-25 17:06:27 +03:00
  • 3fe0596c18 readme : update model list (#6908) BarfingLemurs 2024-04-25 09:52:28 -04:00
  • 0ead1f1072 llama : check that all the tensor data is in the model file (#6885) slaren 2024-04-25 15:23:47 +02:00
  • 51543729ff ggml : fix redefinition of vaddvq_f32 for 32-bit ARM (#6906) Georgi Gerganov 2024-04-25 15:48:25 +03:00
  • 4ab99d8d47 clip : rename lerp function to avoid conflict (#6894) Daniel Bevenius 2024-04-25 14:38:14 +02:00
  • 54770413c4 ggml : fix MIN / MAX macros (#6904) Georgi Gerganov 2024-04-25 15:12:28 +03:00
  • aa750c1ede tests : minor bash stuff (#6902) Georgi Gerganov 2024-04-25 14:27:20 +03:00
  • 1966eb2615 quantize : add '--keep-split' to quantize model into shards (#6688) jiez 2024-04-25 18:29:35 +08:00
  • 784e11dea1 README: add graphic for matrix multiplication (#6881) Johannes Gäßler 2024-04-24 21:29:13 +02:00
  • b4e4b8a935 llama : add llama_get_pooling_type function (#6862) Douglas Hanley 2024-04-24 08:10:07 -05:00
  • 3fe847b574 server : do not apply Markdown formatting in code sections (#6850) mgroeber9110 2024-04-24 12:54:24 +02:00
  • 37246b1031 common : revert showing control tokens by default for server (#6860) Kyle Mistele 2024-04-24 05:15:29 -05:00
  • 28103f4832 Server: fix seed for multiple slots (#6835) Johannes Gäßler 2024-04-24 11:08:36 +02:00
  • c0d1b3e03e ggml : move 32-bit arm compat in ggml-impl.h (#6865) Georgi Gerganov 2024-04-24 12:00:07 +03:00
  • abd3314064 llama : add phi 3 chat template (#6857) Tristan Druyen 2024-04-24 10:52:37 +02:00
  • 3fec68be4e convert : add support of codeqwen due to tokenizer (#6707) Junyang Lin 2024-04-24 15:16:21 +08:00
  • c8297c6af5 llama : add phi3 support (#6852) liuwei-git 2024-04-24 15:00:37 +08:00
  • 4e96a812b3 [SYCL] Windows default build instructions without -DLLAMA_SYCL_F16 flag activated (#6767) Anas Ahouzi 2024-04-23 02:53:18 +02:00
  • 192090bae4 llamafile : improve sgemm.cpp (#6796) Justine Tunney 2024-04-22 15:00:36 -04:00
  • e931888d50 ggml : fix calloc argument ordering. (#6820) Dave Airlie 2024-04-23 00:05:06 +10:00
  • 8960fe86ae llama : fix typo in <|im_end|> token text (#6745) Georgi Gerganov 2024-04-22 15:41:11 +03:00
  • c0956b09ba ci: fix job are cancelling each other (#6781) Pierrick Hymbert 2024-04-22 13:22:54 +02:00
  • e9b4a1bf68 flake.lock: Update github-actions[bot] 2024-04-21 00:17:47 +00:00
  • 5cf5e7d490 build: generate hex dump of server assets during build (#6661) Olivier Chafik 2024-04-21 18:48:53 +01:00
  • 40f74e4d73 llama : add option to render special/control tokens (#6807) Georgi Gerganov 2024-04-21 18:36:45 +03:00
  • b9cc76d87e ggml : fix ggml_backend_cpu_supports_op() for CPY (#0) Georgi Gerganov 2024-04-21 16:47:57 +03:00
  • 7dbdba5690 llama : add llama-3 chat template (#6751) Wouter 2024-04-21 15:03:39 +02:00
  • c1386c936e gguf-py : add IQ1_M to GGML_QUANT_SIZES (#6761) pmysl 2024-04-21 14:49:30 +02:00
  • e8d35f47cb doc : add link to falcon (#6789) Jan Boon 2024-04-21 20:35:40 +08:00