Xuan Son Nguyen
958367bf53
server : refactor slot input data, move tokenizer to HTTP thread (#10023)
* server : refactor slot input data, move tokenizer to HTTP thread
* move prompt_tokens.empty() check
* fix incorrect if branch
* fix infinite generation loop
* bring back infill validation
* add infill test
* try fixing format_infill
* fix test
* remove redundant code
* rename completion to inference
* update docs
* use llama_tokens everywhere
2024-10-24 21:51:22 +02:00