sglang

Author	SHA1	Message	Date
Ying Sheng	5be9eb8a8c	Add PUT for generate api (#448)	2024-05-17 02:35:15 -07:00
Lianmin Zheng	c05956e534	Simplify port allocation (#447)	2024-05-16 18:07:30 -07:00
Matthias Gerstgrasser	d75dc20fae	Add finish_reason to OpenAI API (#446)	2024-05-16 14:55:05 -07:00
Liangsheng Yin	690d162d97	Format code (#441)	2024-05-14 22:40:46 +08:00
Kaichen Zhang - NTU	664287b2a7	[Feat] Add llava qwen, llava mistral (#419) Co-authored-by: Bo Li <drluodian@gmail.com>	2024-05-13 22:17:50 -07:00
Lianmin Zheng	e0ae5d42ec	Update version to 0.1.16 (#438)	2024-05-13 17:29:17 -07:00
Lianmin Zheng	32de16ce2f	Fix streaming (#437)	2024-05-13 17:26:18 -07:00
Yuanhan Zhang	0992d85f92	support llava video (#426)	2024-05-13 16:57:00 -07:00
Lianmin Zheng	5dc55a5f02	Handle truncation errors (#436)	2024-05-13 15:56:00 -07:00
Lianmin Zheng	4231a42fa8	Fix import of global_config	2024-05-13 12:11:55 -07:00
Liangsheng Yin	39191c8515	Cache optimizations (#418)	2024-05-13 12:47:13 +08:00
Lianmin Zheng	562b8857d8	Improve error handling (#433)	2024-05-12 20:49:04 -07:00
Shannon Shen	04c0b21488	Allow `input_ids` in the input of the `/generate` endpoint (#363)	2024-05-12 15:29:00 -07:00
Lianmin Zheng	6e09cf6a15	Misc fixes (#432)	2024-05-12 15:05:40 -07:00
Lianmin Zheng	72bb344388	Update version to 0.1.15 (#431)	2024-05-12 14:22:33 -07:00
Lianmin Zheng	2d580e7a89	Fix flashinfer (#430)	2024-05-12 08:18:53 -07:00
Lianmin Zheng	3fc97f6709	Move openai api server into a separate file (#429)	2024-05-12 06:41:32 -07:00
Lianmin Zheng	abc548c707	Minor fix for the import path (#428)	2024-05-12 05:10:35 -07:00
Lianmin Zheng	aee4f523cf	Fix logit processor bugs (#427)	2024-05-12 04:54:07 -07:00
Lianmin Zheng	7023f413c6	Clean up (#422)	2024-05-11 20:55:00 -07:00
Lianmin Zheng	09deb20dee	Optimize the memory usage of logits processor (#420)	2024-05-11 16:56:42 -07:00
Qubitium	33b242df30	Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 (#380) Co-authored-by: ZX <zx@lbx.dev> Co-authored-by: ZhouXingg <165115237+ZhouXingg@users.noreply.github.com>	2024-05-11 16:37:49 -07:00
Lianmin Zheng	a511a2d089	restrict vllm version	2024-05-09 15:49:29 -07:00
Liangsheng Yin	6ec65f4555	Make public APIs more standard. (#416)	2024-05-09 15:39:22 +08:00
Enrique Shockwave	e2c31fca5c	Include finish reason in meta info response (#415)	2024-05-09 15:14:01 +08:00
Liangsheng Yin	d5de20a3ee	Fix `sync()` when `fork(1)` (#412)	2024-05-08 15:15:18 +08:00
YoungJoong Noah Kim	4a1c6ae2ce	Add Cohere Command R chat template (#411)	2024-05-07 15:18:15 +08:00
Liangsheng Yin	14522e6a26	Organize Benchmark (#381)	2024-05-05 16:14:17 +08:00
ZhouXingg	183df47282	SamplingParams add "spaces_between_special_tokens" argument (#392)	2024-04-30 16:17:12 -07:00
Joschka Braun	5c5aba5900	Adding RAG tracing & eval cookbook using Parea (#390)	2024-04-30 16:13:28 -07:00
Lianmin Zheng	ba67101f99	Fix chatml template (#406)	2024-04-30 15:53:39 -07:00
Liangsheng Yin	19818b9c2f	Minor: style improvement of radix_cache and memory_pool (#395)	2024-04-26 01:01:36 +08:00
Liangsheng Yin	9216b10678	Improve performance when running with full parallel (#394)	2024-04-25 17:29:07 +08:00
Liangsheng Yin	150d7020ed	Revert removing the unused imports (#385)	2024-04-23 22:36:33 +08:00
Liangsheng Yin	9acc6e3504	add `.isort.cfg` (#378)	2024-04-22 22:38:09 +08:00
Enrique Shockwave	cf9d8efdd3	llama3 instruct template (#372)	2024-04-21 09:40:12 -07:00
Liangsheng Yin	1bf1cf1953	Reduce overhead when `fork(1)` (#375)	2024-04-21 17:25:14 +08:00
Ke Bao	e822e5900b	Optimize radix tree matching (#364)	2024-04-17 09:47:37 -07:00
Fronx	2b6d999191	Fix issue #367 – System message not supported for Anthropic (anthropic.BadRequestError) (#368) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-04-16 11:18:24 -07:00
Lianmin Zheng	65501a9cf1	Fix commandr import; format code	2024-04-16 18:10:12 +00:00
ZhouXingg	db611066ad	support `command-r` (#369)	2024-04-16 10:36:51 -07:00
Liangsheng Yin	62b3812b69	Time cost utils (#355)	2024-04-09 23:27:31 +08:00
Tom Dörr	550a4f78f3	Fix typos in infer_batch.py (#354)	2024-04-09 15:10:05 +08:00
SimoneRaponi	ff99c38a07	Add timeout to get_meta_info (#346) Co-authored-by: simone <simone.raponi@equixely.com>	2024-04-03 22:22:06 +08:00
Qubitium	c9de3e169c	Eliminate 2 gpu ops during sampling when logit_bias is zero (#338) Co-authored-by: hnyls2002 <hnyls2002@gmail.com>	2024-04-03 13:56:06 +08:00
Liangsheng Yin	ed27a6b992	Revert "Eliminate 2 gpu ops during sampling when logit_bias is zero" (#345)	2024-04-03 12:45:01 +08:00
Liangsheng Yin	463c6632a8	Eliminate 2 gpu ops during sampling when logit_bias is zero (#343) Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com>	2024-04-02 19:14:55 +08:00
Ying Sheng	b0890631a0	fix gemma import error	2024-04-01 07:36:52 +00:00
Junlong Li	cb389c91bc	Fix llava parallelism/fork bug (#315)	2024-03-28 19:24:54 -07:00
Qubitium	eddaa2b599	Add support for new autogptq quant_config.checkpoint_format (#332)	2024-03-28 19:24:16 -07:00

186 Commits