sglang

Author	SHA1	Message	Date
Lianmin Zheng	d84c5e70f7	Test the case when max_new_tokens is very large (#1038 )	2024-08-11 16:41:03 -07:00
Lianmin Zheng	d785412077	Fix the case when max_new_tokens is too large (#1025 )	2024-08-11 15:20:18 -07:00
Liangsheng Yin	7b6a5332ca	Fix triton args init (#1034 )	2024-08-11 12:11:26 -07:00
Lianmin Zheng	4080e82244	Fix the case where r.prefix_indices is None (#1031 )	2024-08-11 04:53:51 -07:00
Yineng Zhang	c245b78973	hotfix: add CustomOp abstraction (#1027 )	2024-08-11 02:45:59 -07:00
Lianmin Zheng	9dae407812	Improve type annotation (#1029 )	2024-08-11 02:44:59 -07:00
Liangsheng Yin	fcc0f5ed99	Fix wrong assert (#1028 )	2024-08-11 09:22:16 +00:00
Lianmin Zheng	a97df79124	Clean up readme and arguments of chunked prefill (#1022 )	2024-08-11 01:18:52 -07:00
Yineng Zhang	94752ac811	feat: use FlashInfer rmsnorm and silu (#907 )	2024-08-11 14:57:13 +10:00
Liangsheng Yin	43fbb6d919	Fix `input_ids` && rename to `fill_ids` (#1021 )	2024-08-10 16:24:12 -07:00
Lianmin Zheng	54fb1c80c0	Clean up unit tests (#1020 )	2024-08-10 15:09:03 -07:00
Ying Sheng	b68c4c073b	fix: force max new tokens to be 1 for embedding request (#1019 )	2024-08-10 13:46:42 -07:00
Yineng Zhang	e712837d38	misc: update test config (#990 )	2024-08-11 04:20:30 +10:00
Ying Sheng	7599badeaf	Support embedding input as a list (#1014 )	2024-08-10 08:39:05 -07:00
Liangsheng Yin	62757db6f0	Reduce the overhead when cache is disabled (#1010 )	2024-08-09 16:36:57 -07:00
Liangsheng Yin	73fa2d49d5	Some warnings to crash when CI (#1009 )	2024-08-09 15:16:23 -07:00
Mingyi	61728884d7	Fix benchmark latency (#1007 )	2024-08-09 13:18:58 -07:00
gryffindor-rr	9cf0a5bada	Add skip_tokenizer_init args. (#959 ) Co-authored-by: lzhang <zhanglei@modelbest.cn>	2024-08-09 12:14:13 -07:00
Ying Sheng	b16e856f11	Add openai embedding API (#997 )	2024-08-09 11:19:18 -07:00
Roger Wang	05c50a82b8	Minor bugfix on benchmark serving (#1005 )	2024-08-10 02:53:50 +10:00
Yineng Zhang	b568df5d03	fix: resolve correctness_test issue (#1002 )	2024-08-09 23:21:42 +10:00
Juwan Yoo	10bca45bc6	bugfix: penalizers to be merged before reqs (#1001 )	2024-08-09 21:46:24 +10:00
liuyhwangyh	b91a4cb1b1	support models from www.modelscope.cn (#994 ) Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>	2024-08-09 02:52:14 -07:00
Ying Sheng	e040a2450b	Add e5-mistral embedding model - step 3/3 (#988 )	2024-08-08 16:31:19 -07:00
Ying Sheng	9f662501a3	Move torch.compile configs into cuda_graph_runner.py (#993 )	2024-08-08 13:20:30 -07:00
Juwan Yoo	ab7875941b	feat: frequency, min_new_tokens, presence, and repetition penalties (#973 )	2024-08-08 04:21:08 -07:00
yichuan~	3a79613c28	support more options about usage in stream mode (#985 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-08 09:41:57 +00:00
Liangsheng Yin	1ac304eeb4	Adjust `InputeMetadata` and `ScheduleBatch` (#981 )	2024-08-08 01:11:22 -07:00
Ying Sheng	20a4f927dc	Add io struct for embedding models [unreachable code] - step 2/3 (#987 )	2024-08-08 07:52:31 +00:00
Ying Sheng	0de7c2d09e	Add e5-mistral modules [unreachable code] - step 1/3 (#983 )	2024-08-08 00:04:15 -07:00
Liangsheng Yin	6ed4e3b8fb	Fix chunked prefill (#984 )	2024-08-07 22:28:42 -07:00
Ying Sheng	00023d622a	[minor] Update type annotation in tokenizer_manager.py (#982 )	2024-08-08 01:48:45 +00:00
foszto	c62d560c03	#590 Increase default , track changes in examples and documentation (#971 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-08 00:54:46 +00:00
Liangsheng Yin	2b8257f325	Adjust max prefix len (#980 )	2024-08-08 00:41:26 +00:00
Liangsheng Yin	7623091d97	RadixCache method adjust (#977 )	2024-08-07 15:52:24 -07:00
Liangsheng Yin	f724f1f1e9	PrefillAdder abstraction (#968 )	2024-08-07 13:47:28 -07:00
Zhiqiang Xie	6db27f7b3b	misc: correct the int data type for token ids and indices (#969 )	2024-08-08 04:40:07 +08:00
Yineng Zhang	dc9d06d886	chore: bump v0.2.11 (#970 )	2024-08-07 20:47:53 +08:00
Yineng Zhang	c31f084c71	chore: update vllm to 0.5.4 (#966 )	2024-08-07 21:15:41 +10:00
Liangsheng Yin	a01ddd9605	misc: fix the req_to_token member change (#967 )	2024-08-07 01:52:10 -07:00
Liangsheng Yin	7fa54a1ab3	Make `req_pool_indices` on CPU (#960 )	2024-08-07 01:41:25 -07:00
Yineng Zhang	05abd1261c	misc: add compute capability in check_env (#965 )	2024-08-07 18:39:36 +10:00
Ying Sheng	ff68ae857a	Show more error messages for warmup errors (#932 )	2024-08-06 23:57:06 -07:00
yichuan~	795eab6dda	Add support for Batch API test (#936 )	2024-08-06 23:52:10 -07:00
Liangsheng Yin	87e8c090e9	Organize code (rename, movement) (#953 )	2024-08-06 20:50:32 -07:00
Liangsheng Yin	ad56e68495	Fix stuck in `get_new_prefill_batch` (#948 )	2024-08-06 01:05:58 -07:00
yichuan~	ffb15744b5	Support multiple args options (#941 )	2024-08-06 04:12:53 +10:00
Ke Bao	a9c833d580	Fix union operator (#940 )	2024-08-06 00:46:34 +08:00
Aidan Cooper	94e0115186	Feat: add alternative choices selection methods (#835 )	2024-08-05 03:27:49 -07:00
Aidan Cooper	b216a545b3	Remove leftover auth_token (#934 )	2024-08-05 03:25:48 -07:00

494 Commits