sglang

Author	SHA1	Message	Date
Juwan Yoo	10bca45bc6	bugfix: penalizers to be merged before reqs (#1001)	2024-08-09 21:46:24 +10:00
liuyhwangyh	b91a4cb1b1	support models from www.modelscope.cn (#994) Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>	2024-08-09 02:52:14 -07:00
Ying Sheng	e040a2450b	Add e5-mistral embedding model - step 3/3 (#988)	2024-08-08 16:31:19 -07:00
Ying Sheng	9f662501a3	Move torch.compile configs into cuda_graph_runner.py (#993)	2024-08-08 13:20:30 -07:00
Juwan Yoo	ab7875941b	feat: frequency, min_new_tokens, presence, and repetition penalties (#973)	2024-08-08 04:21:08 -07:00
yichuan~	3a79613c28	support more options about usage in stream mode (#985) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-08 09:41:57 +00:00
Liangsheng Yin	1ac304eeb4	Adjust `InputeMetadata` and `ScheduleBatch` (#981)	2024-08-08 01:11:22 -07:00
Ying Sheng	20a4f927dc	Add io struct for embedding models [unreachable code] - step 2/3 (#987)	2024-08-08 07:52:31 +00:00
Ying Sheng	0de7c2d09e	Add e5-mistral modules [unreachable code] - step 1/3 (#983)	2024-08-08 00:04:15 -07:00
Liangsheng Yin	6ed4e3b8fb	Fix chunked prefill (#984)	2024-08-07 22:28:42 -07:00
Ying Sheng	00023d622a	[minor] Update type annotation in tokenizer_manager.py (#982)	2024-08-08 01:48:45 +00:00
foszto	c62d560c03	#590 Increase default , track changes in examples and documentation (#971) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-08 00:54:46 +00:00
Liangsheng Yin	2b8257f325	Adjust max prefix len (#980)	2024-08-08 00:41:26 +00:00
Liangsheng Yin	7623091d97	RadixCache method adjust (#977)	2024-08-07 15:52:24 -07:00
Liangsheng Yin	f724f1f1e9	PrefillAdder abstraction (#968)	2024-08-07 13:47:28 -07:00
Zhiqiang Xie	6db27f7b3b	misc: correct the int data type for token ids and indices (#969)	2024-08-08 04:40:07 +08:00
Yineng Zhang	dc9d06d886	chore: bump v0.2.11 (#970)	2024-08-07 20:47:53 +08:00
Yineng Zhang	c31f084c71	chore: update vllm to 0.5.4 (#966)	2024-08-07 21:15:41 +10:00
Liangsheng Yin	a01ddd9605	misc: fix the req_to_token member change (#967)	2024-08-07 01:52:10 -07:00
Liangsheng Yin	7fa54a1ab3	Make `req_pool_indices` on CPU (#960)	2024-08-07 01:41:25 -07:00
Yineng Zhang	05abd1261c	misc: add compute capability in check_env (#965)	2024-08-07 18:39:36 +10:00
Ying Sheng	ff68ae857a	Show more error messages for warmup errors (#932)	2024-08-06 23:57:06 -07:00
yichuan~	795eab6dda	Add support for Batch API test (#936)	2024-08-06 23:52:10 -07:00
Liangsheng Yin	87e8c090e9	Organize code (rename, movement) (#953)	2024-08-06 20:50:32 -07:00
Liangsheng Yin	ad56e68495	Fix stuck in `get_new_prefill_batch` (#948)	2024-08-06 01:05:58 -07:00
yichuan~	ffb15744b5	Support multiple args options (#941)	2024-08-06 04:12:53 +10:00
Ke Bao	a9c833d580	Fix union operator (#940)	2024-08-06 00:46:34 +08:00
Aidan Cooper	94e0115186	Feat: add alternative choices selection methods (#835)	2024-08-05 03:27:49 -07:00
Aidan Cooper	b216a545b3	Remove leftover auth_token (#934)	2024-08-05 03:25:48 -07:00
yichuan~	fd7926e46e	Fix prompt len in parallel sampling (#928)	2024-08-05 00:56:08 -07:00
Ying Sheng	3bc99e6fe4	Test openai vision api (#925)	2024-08-05 13:51:55 +10:00
min-xu-et	ebf69964cd	latency test enhancement - final part (#921)	2024-08-04 18:15:23 -07:00
Ying Sheng	141e8c71a3	Bump version to 0.2.10 (#923)	2024-08-04 16:52:51 -07:00
yichuan~	d53dcf9c98	Support more OpenAI API test (#916)	2024-08-04 16:43:09 -07:00
Liangsheng Yin	bb66cc4c52	Fix CI && python3.8 compatible (#920)	2024-08-04 16:02:05 -07:00
Ying Sheng	0d4f3a9fcd	Make API Key OpenAI-compatible (#917)	2024-08-04 13:35:44 -07:00
min-xu-et	afd411d09f	enhance latency test - part 2 (#915)	2024-08-04 12:27:25 -07:00
Ke Bao	e1eae1fd15	Support MLA for DeepSeek-V2 with Triton - step 1 (#905)	2024-08-05 03:40:33 +10:00
Yineng Zhang	f4d9953d9d	misc: add triton in check_env PACKAGE_LIST (#914)	2024-08-04 23:20:59 +10:00
Ying Sheng	995af5a54b	Improve the structure of CI (#911)	2024-08-03 23:09:21 -07:00
min-xu-et	539856455d	latency test enhancement - part 1 (#909)	2024-08-03 22:44:58 -07:00
Ying Sheng	70cc0749ce	Add model accuracy test - step 1 (#866)	2024-08-03 18:20:50 -07:00
min-xu-et	7dd8a7e6d9	fixed an error handling in bench_latency.py (#904)	2024-08-03 17:42:17 -07:00
Ying Sheng	b906c01592	Bump version to 0.2.9.post1 (#899)	2024-08-02 12:08:00 -07:00
Yineng Zhang	046c2b339e	chore: add multipart dep for fastapi (#895)	2024-08-03 00:50:19 +10:00
Yineng Zhang	6b8f66efe1	misc: update cuda graph capture exception log (#894)	2024-08-03 00:40:52 +10:00
Ying Sheng	30a9b2ef20	Bump version to v0.2.9 (#890)	2024-08-02 01:45:48 -07:00
Ying Sheng	3cadecf0c4	Increase openai client limit (#886)	2024-08-02 00:47:23 -07:00
Ying Sheng	e90e3a50d4	Add benchmark: HumanEval (#889)	2024-08-02 00:46:41 -07:00
Ying Sheng	fbd6b94d69	Fix the double BOS problem in the HF chat template (#888)	2024-08-02 00:30:50 -07:00

473 Commits