sglang

Author	SHA1	Message	Date
Liangsheng Yin	73cf6834f2	Support `stop_token_ids` in sglang API (#1092)	2024-08-15 00:31:39 +00:00
Ying Sheng	96a2093ef0	[Fix] Compatibility of window attention and cuda graph (#1090)	2024-08-14 10:37:01 -07:00
Liangsheng Yin	a34dd86a7d	Use `dtype` to control generate (#1082) Co-authored-by: zhyncs <me@zhyncs.com>	2024-08-14 15:58:07 +00:00
Ying Sheng	0909bb0d2f	[Feat] Add window attention for gemma-2 (#1056)	2024-08-13 17:01:26 -07:00
Yineng Zhang	f7fb68d292	ci: add moe test (#1053)	2024-08-13 18:43:23 +10:00
Yineng Zhang	65e89baea9	fix: not use the default port (#1068)	2024-08-13 15:12:56 +10:00
Lianmin Zheng	0c1c72a0b4	Fix accuracy test (#1051)	2024-08-12 19:48:40 +10:00
Lianmin Zheng	41598e0d8e	Add longer accuracy test on CI (#1049)	2024-08-12 09:21:38 +00:00
Ying Sheng	32f6144323	fix: Fix returned prefill logits and add output str test (#1046)	2024-08-12 06:13:45 +00:00
Lianmin Zheng	8207637029	Improve end-to-end throughput test and its coverage (#1039)	2024-08-11 18:27:33 -07:00
Lianmin Zheng	d84c5e70f7	Test the case when max_new_tokens is very large (#1038)	2024-08-11 16:41:03 -07:00
Yineng Zhang	94752ac811	feat: use FlashInfer rmsnorm and silu (#907)	2024-08-11 14:57:13 +10:00
Lianmin Zheng	54fb1c80c0	Clean up unit tests (#1020)	2024-08-10 15:09:03 -07:00
Ying Sheng	7599badeaf	Support embedding input as a list (#1014)	2024-08-10 08:39:05 -07:00
Ying Sheng	e040a2450b	Add e5-mistral embedding model - step 3/3 (#988)	2024-08-08 16:31:19 -07:00
Juwan Yoo	ab7875941b	feat: frequency, min_new_tokens, presence, and repetition penalties (#973)	2024-08-08 04:21:08 -07:00
Ying Sheng	3bc99e6fe4	Test openai vision api (#925)	2024-08-05 13:51:55 +10:00
Liangsheng Yin	bb66cc4c52	Fix CI && python3.8 compatible (#920)	2024-08-04 16:02:05 -07:00
Ying Sheng	0d4f3a9fcd	Make API Key OpenAI-compatible (#917)	2024-08-04 13:35:44 -07:00
Ying Sheng	995af5a54b	Improve the structure of CI (#911)	2024-08-03 23:09:21 -07:00
Ying Sheng	70cc0749ce	Add model accuracy test - step 1 (#866)	2024-08-03 18:20:50 -07:00
Ying Sheng	3cadecf0c4	Increase openai client limit (#886)	2024-08-02 00:47:23 -07:00
Ying Sheng	e90e3a50d4	Add benchmark: HumanEval (#889)	2024-08-02 00:46:41 -07:00
Ying Sheng	ae7ee01a8e	Add accuracy test to CI: MMLU (#882)	2024-08-01 21:20:17 -07:00
Ying Sheng	72b6ea88b4	Make scripts under `/test/srt` as unit tests (#875)	2024-08-01 14:34:55 -07:00
Ying Sheng	6f221d4ca0	Fix unit tests for the frontend language part (#872)	2024-08-01 12:39:12 -07:00
Lianmin Zheng	0736b27020	[Minor] Improve the code style in TokenizerManager (#767)	2024-07-27 05:05:15 -07:00
Mingyi	e3046ea3a8	Update OpenAI API (#667)	2024-07-19 23:20:54 -07:00
Ying Sheng	51fda1439f	Update Readme (#660) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-07-19 09:54:01 -07:00
Ying Sheng	fb9296f0ed	Higher priority for user input of max_prefill_tokens & format (#540)	2024-06-12 21:48:40 -07:00
Lianmin Zheng	f6dbd24043	Improve doc strings (#518)	2024-06-08 02:39:32 -07:00
Lianmin Zheng	3bc01ac137	[Minor] improve code style	2024-06-03 18:11:34 -07:00
Lianmin Zheng	09de730dee	Improve benchmark scripts & add more models (#484)	2024-05-27 14:13:26 -07:00
Lianmin Zheng	55c1643627	Improve benchmark scripts & rename some scripts (#477)	2024-05-26 12:51:45 -07:00
Lianmin Zheng	ced77c6626	Rename api_num_spec_tokens -> num_api_spec_tokens (#458)	2024-05-20 18:44:23 -07:00
Ying Sheng	3e684be7a3	Fix openai speculative execution (#456)	2024-05-20 17:01:13 -07:00
Lianmin Zheng	8210ec60f4	Improve error handling & abort disconnected requests (#449)	2024-05-17 05:49:31 -07:00
Lianmin Zheng	aee4f523cf	Fix logit processor bugs (#427)	2024-05-12 04:54:07 -07:00
Qubitium	33b242df30	Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 (#380) Co-authored-by: ZX <zx@lbx.dev> Co-authored-by: ZhouXingg <165115237+ZhouXingg@users.noreply.github.com>	2024-05-11 16:37:49 -07:00
Liangsheng Yin	14522e6a26	Organize Benchmark (#381)	2024-05-05 16:14:17 +08:00
Liangsheng Yin	9acc6e3504	add `.isort.cfg` (#378)	2024-04-22 22:38:09 +08:00
Fronx	2b6d999191	Fix issue #367 – System message not supported for Anthropic (anthropic.BadRequestError) (#368) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-04-16 11:18:24 -07:00
Lianmin Zheng	e2b2f0a213	Support oai in benchmark/mmlu (#323)	2024-03-22 13:37:57 -07:00
Lianmin Zheng	c51020cf0c	Fix the chat template for llava-v1.6-34b & format code (#177)	2024-02-11 05:50:13 -08:00
Lianmin Zheng	74b3bfaaf8	format code	2024-01-30 16:36:10 +00:00
Keith Stevens	1d0fbe8e43	[Feature] Adds basic support for image content in OpenAI chat routes (#113)	2024-01-30 06:12:33 -08:00
Lianmin Zheng	97aa9b3284	Improve docs & Add JSON decode example (#121)	2024-01-30 05:45:27 -08:00
shiyi.c_98	fd7c479239	Gemini Backend (#9) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-01-16 22:29:37 -08:00
Lianmin Zheng	70359bf31a	Update benchmark scripts (#8)	2024-01-15 16:12:57 -08:00
Lianmin Zheng	4bd8233f2c	Fix test cases (#6)	2024-01-15 01:15:53 -08:00

53 Commits