sglang

Author	SHA1	Message	Date
Liangsheng Yin	19818b9c2f	Minor: style improvement of radix_cache and memory_pool (#395)	2024-04-26 01:01:36 +08:00
Liangsheng Yin	9216b10678	Improve performance when running with full parallel (#394)	2024-04-25 17:29:07 +08:00
Liangsheng Yin	150d7020ed	Revert removing the unused imports (#385)	2024-04-23 22:36:33 +08:00
Liangsheng Yin	9acc6e3504	add `.isort.cfg` (#378)	2024-04-22 22:38:09 +08:00
Enrique Shockwave	cf9d8efdd3	llama3 instruct template (#372)	2024-04-21 09:40:12 -07:00
Liangsheng Yin	1bf1cf1953	Reduce overhead when `fork(1)` (#375)	2024-04-21 17:25:14 +08:00
Ke Bao	e822e5900b	Optimize radix tree matching (#364)	2024-04-17 09:47:37 -07:00
Fronx	2b6d999191	Fix issue #367 – System message not supported for Anthropic (anthropic.BadRequestError) (#368) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-04-16 11:18:24 -07:00
Lianmin Zheng	65501a9cf1	Fix commandr import; format code	2024-04-16 18:10:12 +00:00
ZhouXingg	db611066ad	support `command-r` (#369)	2024-04-16 10:36:51 -07:00
Liangsheng Yin	62b3812b69	Time cost utils (#355)	2024-04-09 23:27:31 +08:00
Tom Dörr	550a4f78f3	Fix typos in infer_batch.py (#354)	2024-04-09 15:10:05 +08:00
SimoneRaponi	ff99c38a07	Add timeout to get_meta_info (#346) Co-authored-by: simone <simone.raponi@equixely.com>	2024-04-03 22:22:06 +08:00
Qubitium	c9de3e169c	Eliminate 2 gpu ops during sampling when logit_bias is zero (#338) Co-authored-by: hnyls2002 <hnyls2002@gmail.com>	2024-04-03 13:56:06 +08:00
Liangsheng Yin	ed27a6b992	Revert "Eliminate 2 gpu ops during sampling when logit_bias is zero" (#345)	2024-04-03 12:45:01 +08:00
Liangsheng Yin	463c6632a8	Eliminate 2 gpu ops during sampling when logit_bias is zero (#343) Co-authored-by: Qubitium <417764+Qubitium@users.noreply.github.com>	2024-04-02 19:14:55 +08:00
Ying Sheng	b0890631a0	fix gemma import error	2024-04-01 07:36:52 +00:00
Junlong Li	cb389c91bc	Fix llava parallelism/fork bug (#315)	2024-03-28 19:24:54 -07:00
Qubitium	eddaa2b599	Add support for new autogptq quant_config.checkpoint_format (#332)	2024-03-28 19:24:16 -07:00
Liangsheng Yin	2af565b3bb	[model] DBRX-instruct support (#337)	2024-03-28 10:05:19 -07:00
Liangsheng Yin	3842eba5fa	Logprobs Refractor (#331)	2024-03-28 14:34:49 +08:00
Liangsheng Yin	24e59f5350	`model_runner` simplify (#329)	2024-03-24 19:48:37 +08:00
Liangsheng Yin	7523541962	`model_rpc` style improvement (#293)	2024-03-24 15:41:24 +08:00
Jani Monoses	30d17840fc	Update dependencies (#326)	2024-03-23 10:15:58 -07:00
Qubitium	ce216c80dc	Cleanup codebase: removed unnecessary code/logic (#298)	2024-03-23 10:15:16 -07:00
Lianmin Zheng	51104cd405	Update version to v0.1.14 (#324)	2024-03-22 13:42:22 -07:00
Lianmin Zheng	e2b2f0a213	Support oai in benchmark/mmlu (#323)	2024-03-22 13:37:57 -07:00
Jani Monoses	b57abe1663	Add StableLM model. (#301)	2024-03-22 13:24:08 -07:00
Jani Monoses	e57f079275	Use Anthropic messages API (#304)	2024-03-22 13:23:31 -07:00
Li Bo	08df63a6f8	[Fix/Potential Bugs] Can not correctly import models in python/sglang/srt/models (#311)	2024-03-22 12:19:58 -07:00
ZhouGongZaiShi	77835756a7	Fix outlines-0.0.35 incompatibility (#291) Co-authored-by: ZX <zx@lbx.dev>	2024-03-22 12:19:11 -07:00
Liurl	ed31579971	Fix marlin model loading compat with autogptq (#290) Co-authored-by: LRL <lrl@lbx.dev>	2024-03-13 13:15:43 +08:00
Qubitium	92e2d74fd0	Fix env (docker) compat due to __file__ usage (#288)	2024-03-13 13:02:48 +08:00
Enrique Shockwave	d9b3b01883	enable marlin kernels (#286)	2024-03-12 22:10:12 -04:00
Qubitium	ad1dd74673	Fix flashinfer >= 0.0.3 compat (#282)	2024-03-12 21:45:58 +08:00
Qubitium	b2eb080501	Fix Runtime missing some ServerArgs options (#281)	2024-03-11 22:32:15 +08:00
Lianmin Zheng	4aa5dd2c5f	Update version to v0.1.13 (#280)	2024-03-11 05:49:27 -07:00
Lianmin Zheng	13662fd533	Fix RuntimeEndpoint (#279)	2024-03-11 05:24:24 -07:00
Alessio Dalla Piazza	d5ae2ebaa2	Add Support for API Key Authentication (#230)	2024-03-11 05:16:10 -07:00
Liangsheng Yin	1b35547927	Organize `server_args` (#277)	2024-03-11 20:06:52 +08:00
Lianmin Zheng	faba293a0d	Improve gemma and documentations (#278)	2024-03-11 04:43:39 -07:00
Liangsheng Yin	89885b31ef	Gemma Support (#256)	2024-03-11 12:14:27 +08:00
Geary.Z	64fe311593	replace skip_embed with input_embeds (#222)	2024-03-10 19:04:52 -07:00
Liangsheng Yin	a7ace9c88d	Fix qwen config (#261)	2024-03-10 18:54:18 -07:00
Lin Tianchuan	30d67b2bca	Add `set_var` to interpreter.py (#263)	2024-03-07 23:20:11 +08:00
Xinwei Xiong	b0b722ee8e	Refactor ChatTemplate for Enhanced Clarity and Efficiency (#201)	2024-03-03 17:52:36 +08:00
Srinivas Billa	01b07ea3ac	Add SSL Cert Functionality (#224)	2024-03-03 17:41:41 +08:00
Liangsheng Yin	dfb13ac455	Fix addr reuse in check_port (#253)	2024-03-03 17:09:16 +08:00
Enrique Shockwave	9759d927cf	fix chatml template (#195)	2024-02-24 16:34:22 +08:00
Zhang Wenbin	8d0a7fae3b	Fix interpreter.py `get_var(var_name)` in text iter when `stream` is not enabled (#198)	2024-02-24 16:27:34 +08:00
155 Commits