Commit Graph
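
A one-line history like the listing below can be produced directly from a checkout of the repository. This is a minimal sketch, not necessarily how this graph was generated; the `--abbrev=10` width and the demo repository (`Jane` / `demo: initial commit`) are illustrative assumptions:

```shell
# Sketch: print commits as "<abbreviated hash> <subject> <author> <author date>".
# Normally you would run the final `git log` inside an existing checkout;
# a throwaway repo is built here only so the command is demonstrable anywhere.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.name="Jane" -c user.email="jane@example.com" \
    commit -q --allow-empty -m "demo: initial commit"
# --abbrev=10 shortens %h to 10 hex characters, matching the hashes below.
git -C "$repo" log --abbrev=10 --pretty=format:'%h %s %an %ai'
```

`%ai` prints the author date in ISO-like format with the author's own UTC offset, which is why the offsets in the listing vary from commit to commit.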

  • 44ca159faf 1.5 bit: we can do even better (#5999) Kawrakow 2024-03-11 16:53:15 +01:00
  • 05b06210c9 llama : more consistent names of count variables (#5994) Georgi Gerganov 2024-03-11 17:49:47 +02:00
  • 83796e62bc llama : refactor unicode stuff (#5992) Georgi Gerganov 2024-03-11 17:47:47 +02:00
  • 828defefb6 Update server docker image URLs (#5997) Jakub N 2024-03-11 14:40:42 +01:00
  • caa106d4e0 Server: format error to json (#5961) Xuan Son Nguyen 2024-03-11 10:56:41 +01:00
  • 3202361c5b ggml, ci : Windows ARM runner and build fixes (#5979) Michael Podvitskiy 2024-03-11 10:28:51 +01:00
  • 332bdfd798 server : maintain chat completion id for streaming responses (#5988) Minsoo Cheong 2024-03-11 17:09:32 +09:00
  • ecab1c75de cmake : fix subdir for LLAMA_METAL_EMBED_LIBRARY (#5985) Gilad S 2024-03-11 10:00:08 +02:00
  • ee35600b90 llama : fix F16/F32 downcast + improve names (#5980) Georgi Gerganov 2024-03-11 09:56:47 +02:00
  • be858f6205 Better 1.5 bit quantization (#5971) Kawrakow 2024-03-11 07:51:49 +01:00
  • ef3ced26a3 [SYCL] Add q3_s and q1_s (#5886) Abhilash Majumder 2024-03-11 10:27:56 +05:30
  • 3814a07392 [SYCL] Add support for SYCL Nvidia target (#5738) AidanBeltonS 2024-03-11 01:13:57 +00:00
  • bb6d00bbf9 metal : move mm_id indices to shared mem (#5982) Georgi Gerganov 2024-03-10 23:12:48 +02:00
  • 7ab7b733bb android : fix utf8 decoding error (#5935) Dean 2024-03-11 04:03:17 +08:00
  • d9f65c97c3 readme : update hot topics Georgi Gerganov 2024-03-10 20:58:26 +02:00
  • b838b53ad6 sync : ggml Georgi Gerganov 2024-03-10 20:10:46 +02:00
  • df4dc3e7cb ggml : try fix 32-bit arm compat (whisper/1938) Georgi Gerganov 2024-03-08 23:45:07 +02:00
  • bf47a5eefc ggml : remove __constant__ specifier for CUDA tables (#5940) Georgi Gerganov 2024-03-10 20:09:24 +02:00
  • fa8a809a91 server: ci: windows build and tests (#5968) Pierrick Hymbert 2024-03-10 18:17:47 +01:00
  • bcebd7dbf6 llama : add support for GritLM (#5959) DAN™ 2024-03-10 11:56:30 -04:00
  • 2960eae847 grammar : verify parsed state (#5950) Clint Herron 2024-03-10 11:17:43 -04:00
  • c78541479c nix: update flake.lock (#5969) Georgi Gerganov 2024-03-10 16:43:08 +02:00
  • 621e86b331 server: benchmark: chat/completions scenario and other llm servers comparison (#5941) Pierrick Hymbert 2024-03-09 23:41:49 +01:00
  • 77d1ac7e00 server : print chat template info Georgi Gerganov 2024-03-09 22:04:00 +02:00
  • d894f352bf perplexity : support using multiple sequences to allow larger batch sizes (#5946) slaren 2024-03-09 19:55:54 +01:00
  • 098dbaab44 readme : update hot topics Georgi Gerganov 2024-03-09 18:14:13 +02:00
  • 8380ecfb21 ggml : fix unnecessary f32 -> f16 -> f32 casts (mmla) (#5951) Georgi Gerganov 2024-03-09 17:36:20 +02:00
  • 58308a0ecc server : fix metrics init (#5964) Georgi Gerganov 2024-03-09 17:34:15 +02:00
  • 5b09797321 ggml : remove old quantization functions (#5942) Georgi Gerganov 2024-03-09 15:53:59 +02:00
  • 97c09585d6 server : clarify some items in the readme (#5957) Georgi Gerganov 2024-03-09 15:47:47 +02:00
  • fb215c3832 server : normalize embeddings (#5956) SeungWon Jeong 2024-03-09 21:27:58 +09:00
  • 2c4f566c88 tests : gitignore ggml-common.h Georgi Gerganov 2024-03-09 14:17:11 +02:00
  • 0db32beaf0 server : fix passing prompt as tokens (#5955) Alexey Parfenov 2024-03-09 11:16:53 +00:00
  • 8a3012a4ad ggml : add ggml-common.h to deduplicate shared code (#5940) Georgi Gerganov 2024-03-09 12:47:57 +02:00
  • 9674aaf35c server : simplify logic for empty prompts (#5953) Georgi Gerganov 2024-03-09 12:34:18 +02:00
  • 950ba1ab84 Server: reorganize some http logic (#5939) Xuan Son Nguyen 2024-03-09 11:27:53 +01:00
  • e1fa9569ba server : add SSL support (#5926) Gabe Goodhart 2024-03-09 02:57:09 -07:00
  • fd72d2d2a5 server: tests: add truncated prompt tests, better kv cache size (#5933) Pierrick Hymbert 2024-03-09 10:30:04 +01:00
  • c2101a2e90 llama : support Mamba Selective State Space Models (#5328) compilade 2024-03-08 17:31:00 -05:00
  • 515f7d0d4f llama : fix quantization of shared token_embd (#5944) compilade 2024-03-08 10:53:37 -05:00
  • 76e868821a server: metrics: add llamacpp:prompt_seconds_total and llamacpp:tokens_predicted_seconds_total, reset bucket only on /metrics. Fix values cast to int. Add Process-Start-Time-Unix header. (#5937) Pierrick Hymbert 2024-03-08 12:25:04 +01:00
  • e457fb3540 llama : assume tied weights if lm_head/output weights is missing (#5824) Don Mahurin 2024-03-08 02:41:50 -08:00
  • af37fd8b30 server : fix EOS token detection with disabled cache (#5938) Georgi Gerganov 2024-03-08 12:40:02 +02:00
  • 581ed5c4fe log : fix MSVC compile errors (#5643) UEXTM.com 2024-03-08 04:35:04 -05:00
  • 6cdabe6526 llama-bench : add embeddings option (#5924) Georgi Gerganov 2024-03-07 16:32:38 +02:00
  • 89fb735fcf Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" (#5918) Neo Zhang Jianyu 2024-03-07 19:14:49 +08:00
  • 55a2a900ff server : add /v1/completions endpoint (#5914) Minsoo Cheong 2024-03-07 19:42:39 +09:00
  • 2002bc96bf server : refactor (#5882) Georgi Gerganov 2024-03-07 11:41:53 +02:00
  • ceca1aef07 [SYCL] fix error when set main gpu to non-zero (#5901) Neo Zhang Jianyu 2024-03-07 16:34:31 +08:00
  • e04e04f8fa ggml : use SYS_get_cpu if SYS_getcpu is not defined (#5906) Jared Van Bortel 2024-03-06 15:42:23 -05:00
  • e25fb4b18f ggml : use uint8x16_t return type for ggml_vqtbl1q_u8 (#5894) bobqianic 2024-03-06 07:35:07 +00:00
  • 1e35d619a6 convert : remove AWQ remnants (#5768) Georgi Gerganov 2024-03-06 09:12:25 +02:00
  • 8ced9f7e32 add wait() to make code stable (#5895) Neo Zhang Jianyu 2024-03-06 12:08:32 +08:00
  • 652ca2bded compare-llama-bench.py : remove mul_mat_q (#5892) slaren 2024-03-05 22:27:29 +01:00
  • bd836944f8 quants : use MM256_SET_M128I consistently to fix gcc 7 build (#5889) Jared Van Bortel 2024-03-05 11:56:37 -05:00
  • 3de31677d3 grammars : blacklists character control set (#5888) ExtReMLapin 2024-03-05 17:33:08 +01:00
  • 82cb31eb93 Revert "grammars : don't allow to output unescaped new line in string (#5885)" Georgi Gerganov 2024-03-05 15:56:24 +02:00
  • b1a4e994fd grammars : don't allow to output unescaped new line in string (#5885) ExtReMLapin 2024-03-05 14:44:29 +01:00
  • 61d1c88e15 Vulkan Improvements (#5835) 0cc4m 2024-03-05 13:33:42 +01:00
  • 21b0867433 [SYCL] fix mul_mat fault in CI/unit-test (#5862) Neo Zhang Jianyu 2024-03-05 16:08:35 +08:00
  • 6a87ac3a52 fix editorconfig check break (#5879) Minsoo Cheong 2024-03-05 15:12:23 +09:00
  • 29eee40474 fix speculative decoding build on windows (#5874) Jeffrey Quesnelle 2024-03-04 19:23:06 -08:00
  • 1d41d6f7c2 nix: static build (#5814) hutli 2024-03-05 02:33:08 +01:00
  • 29ae62d2ae llama : fix embeddings (#5796) Georgi Gerganov 2024-03-04 22:31:20 +02:00
  • e0843afe1b flake : fix Georgi Gerganov 2024-03-04 21:50:50 +02:00
  • a1c6d96ed8 ggml : fix unknown status (#0) Georgi Gerganov 2024-03-04 20:53:27 +02:00
  • efd8533ef8 sync : ggml Georgi Gerganov 2024-03-04 11:06:39 +02:00
  • 9fa2627347 ggml : introduce ggml_status (ggml/750) Michael Podvitskiy 2024-03-04 10:05:42 +01:00
  • fe52be11e3 cmake : handle cases where git index is not found in .git (#5844) Dane Madsen 2024-03-05 05:26:55 +11:00
  • 6d341ab6c5 speculative : implement stochastic speculative sampling (#5625) Minsoo Cheong 2024-03-05 03:24:00 +09:00
  • 4ffcdce2ff add alias for chat template (#5858) Xuan Son Nguyen 2024-03-04 12:22:08 +01:00
  • a0fc62661f sync : ggml Georgi Gerganov 2024-03-04 10:40:04 +02:00
  • 7d43c585dc add some new ops, fix some operators and add batch operations to certain operators. (ggml/747) leejet 2024-03-03 20:23:52 +08:00
  • 82f3e668ad common : use LLAMA_DEFAULT_SEED (#5855) DAN™ 2024-03-04 03:08:19 -05:00
  • 5a51cc1bb4 main : support special tokens as reverse/anti prompt (#5847) DAN™ 2024-03-04 02:57:20 -05:00
  • 67be2ce101 cuda : fix data race in soft max (#5853) slaren 2024-03-03 14:26:18 +01:00
  • 231ae28f07 readme : add API changes section Georgi Gerganov 2024-03-03 12:44:03 +02:00
  • 475df1d6cf llama : allow for user specified embedding pooling type (#5849) Douglas Hanley 2024-03-03 04:40:27 -06:00
  • 87c2e8b279 gguf-dump : support i-quants (#5841) Nindaleth 2024-03-03 09:43:42 +01:00
  • de9692a7d2 llama : fix llama_copy_state_data with fragmented KV cache (#5840) compilade 2024-03-03 03:41:55 -05:00
  • e6029348e8 ci : schedule slow server tests only on Release or on demand (#5839) Pierrick Hymbert 2024-03-03 09:35:23 +01:00
  • 8ef969afce server : init http requests thread pool with --parallel if set (#5836) Pierrick Hymbert 2024-03-03 08:48:36 +01:00
  • fa974646e1 flake.lock: Update (#5842) Georgi Gerganov 2024-03-03 06:11:31 +02:00
  • 9731134296 server: tests: passkey challenge / self-extend with context shift demo (#5832) Pierrick Hymbert 2024-03-02 22:00:14 +01:00
  • 4a6e2d6142 llama : add abort_callback to interrupt computation (#5409) Michael Podvitskiy 2024-03-02 20:52:25 +01:00
  • 494c870326 ggml : fix IQ3_S AVX implementation (#5834) Georgi Gerganov 2024-03-02 20:00:49 +02:00
  • 4d4d2366fc convert : automatically fall back to HfVocab if tokenizer.model doesn't exist (#5821) Jared Van Bortel 2024-03-02 12:27:26 -05:00
  • c7a0ad8ec9 convert-hf : make model class definitions self-contained (#5825) Jared Van Bortel 2024-03-02 12:21:47 -05:00
  • bbde6eb256 ggml : IQ3_S improvements (#5829) Kawrakow 2024-03-02 17:00:51 +02:00
  • ef2cd694c4 scripts : add pod-llama.sh Georgi Gerganov 2024-03-02 16:54:08 +02:00
  • 6c32d8c7ad llama : refactor internal quantization functions (#5830) Xuan Son Nguyen 2024-03-02 15:19:09 +01:00
  • 802da0091b llama : fix segfault from unknown model arch name (#5820) compilade 2024-03-02 08:42:56 -05:00
  • 715641391d Support multiple GPUs (split mode) on SYCL backend (#5806) Neo Zhang Jianyu 2024-03-02 19:49:30 +08:00
  • 9bf297a02b workflows : remove nocleanup arg for check-requirements.sh (#5826) crasm 2024-03-02 00:11:06 -05:00
  • cb5e8f7fc4 build(nix): Introduce flake.formatter for nix fmt (#5687) Tushar 2024-03-02 04:48:26 +05:30
  • da3b9ba2b7 convert-hf-to-gguf : require einops for InternLM2ForCausalLM (#5792) nold 2024-03-01 22:51:12 +01:00
  • c29af7e225 llama : add StarCoder2 support (#5795) Sourab Mangrulkar 2024-03-02 01:00:46 +05:30
  • 38d16b1426 server : remove api_like_OAI.py proxy script (#5808) Georgi Gerganov 2024-03-01 20:00:58 +02:00
  • c2224f003b ggml-vulkan: fix VULKAN_CHECK_RESULTS flag, which was previously broken (#5813) ddpasa 2024-03-01 18:00:00 +01:00
  • e743386728 gemma : fix bfloat16 -> float16 conversion issue (#5810) kunal-vaishnavi 2024-03-01 06:08:08 -08:00