Commit Graph

340 Commits

Author SHA1 Message Date
Ying Sheng      0de7c2d09e  Add e5-mistral modules [unreachable code] - step 1/3 (#983)  2024-08-08 00:04:15 -07:00
Liangsheng Yin  6ed4e3b8fb  Fix chunked prefill (#984)  2024-08-07 22:28:42 -07:00
Ying Sheng      00023d622a  [minor] Update type annotation in tokenizer_manager.py (#982)  2024-08-08 01:48:45 +00:00
foszto          c62d560c03  #590 Increase default , track changes in examples and documentation (#971)  2024-08-08 00:54:46 +00:00
                            Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Liangsheng Yin  2b8257f325  Adjust max prefix len (#980)  2024-08-08 00:41:26 +00:00
Liangsheng Yin  7623091d97  RadixCache method adjust (#977)  2024-08-07 15:52:24 -07:00
Liangsheng Yin  f724f1f1e9  PrefillAdder abstraction (#968)  2024-08-07 13:47:28 -07:00
Zhiqiang Xie    6db27f7b3b  misc: correct the int data type for token ids and indices (#969)  2024-08-08 04:40:07 +08:00
Liangsheng Yin  a01ddd9605  misc: fix the req_to_token member change (#967)  2024-08-07 01:52:10 -07:00
Liangsheng Yin  7fa54a1ab3  Make req_pool_indices on CPU (#960)  2024-08-07 01:41:25 -07:00
Ying Sheng      ff68ae857a  Show more error messages for warmup errors (#932)  2024-08-06 23:57:06 -07:00
yichuan~        795eab6dda  Add support for Batch API test (#936)  2024-08-06 23:52:10 -07:00
Liangsheng Yin  87e8c090e9  Organize code (rename, movement) (#953)  2024-08-06 20:50:32 -07:00
Liangsheng Yin  ad56e68495  Fix stuck in get_new_prefill_batch (#948)  2024-08-06 01:05:58 -07:00
yichuan~        ffb15744b5  Support multiple args options (#941)  2024-08-06 04:12:53 +10:00
yichuan~        fd7926e46e  Fix prompt len in parallel sampling (#928)  2024-08-05 00:56:08 -07:00
Ying Sheng      3bc99e6fe4  Test openai vision api (#925)  2024-08-05 13:51:55 +10:00
yichuan~        d53dcf9c98  Support more OpenAI API test (#916)  2024-08-04 16:43:09 -07:00
Liangsheng Yin  bb66cc4c52  Fix CI && python3.8 compatible (#920)  2024-08-04 16:02:05 -07:00
Ying Sheng      0d4f3a9fcd  Make API Key OpenAI-compatible (#917)  2024-08-04 13:35:44 -07:00
Ke Bao          e1eae1fd15  Support MLA for DeepSeek-V2 with Triton - step 1 (#905)  2024-08-05 03:40:33 +10:00
Ying Sheng      70cc0749ce  Add model accuracy test - step 1 (#866)  2024-08-03 18:20:50 -07:00
min-xu-et       7dd8a7e6d9  fixed an error handling in bench_latency.py (#904)  2024-08-03 17:42:17 -07:00
Yineng Zhang    6b8f66efe1  misc: update cuda graph capture exception log (#894)  2024-08-03 00:40:52 +10:00
Ying Sheng      fbd6b94d69  Fix the double BOS problem in the HF chat template (#888)  2024-08-02 00:30:50 -07:00
任嘉            4013a4e1b0  Implement served_model_name to customize model id when use local mode… (#749)  2024-08-01 17:13:51 -07:00
                            Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Ying Sheng      60340a3643  Improve the coverage of the openai api server test (#878)  2024-08-01 16:01:30 -07:00
Ying Sheng      72b6ea88b4  Make scripts under /test/srt as unit tests (#875)  2024-08-01 14:34:55 -07:00
Ying Sheng      6f221d4ca0  Fix unit tests for the frontend language part (#872)  2024-08-01 12:39:12 -07:00
Liangsheng Yin  c020f9ceda  Support chunked prefill when radix cache is disabled (#811)  2024-08-01 00:29:01 -07:00
yichuan~        ca600e8cd6  Add support for logprobs in OpenAI chat API (#852)  2024-08-01 00:08:21 -07:00
Ying Sheng      5e7dd984fe  Fix llama for classification (#855)  2024-07-31 15:48:31 -07:00
Yineng Zhang    bc3eaac2b8  chore: update flashinfer to v0.1.3 (#850)  2024-08-01 04:37:05 +10:00
Liangsheng Yin  a6c7ebbbcb  Add req slots leaking check (#842)  2024-07-30 18:29:01 -07:00
yichuan~        bb0501c0d9  Fix List input bug (#838)  2024-07-30 13:40:51 -07:00
Liangsheng Yin  6b0f2e9088  Add --max-total-tokens (#840)  2024-07-30 13:33:55 -07:00
Ying Sheng      b579ecf028  Add awq_marlin (#826)  2024-07-30 02:04:51 -07:00
Ying Sheng      e7487b08bc  Adjust default mem fraction to avoid OOM (#823)  2024-07-30 01:58:31 -07:00
Liangsheng Yin  cdcbde5fc3  Code structure refactor (#807)  2024-07-29 23:04:48 -07:00
Liangsheng Yin  3520f75fb1  Remove inf value for chunked prefill size (#812)  2024-07-29 18:34:25 -07:00
yichuan~        084fa54d37  Add support for OpenAI API : offline batch(file) processing (#699)  2024-07-29 13:07:18 -07:00
                            Co-authored-by: hnyls2002 <hnyls2002@gmail.com>
Ying Sheng      eba458bd19  Revert "Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118"" (#806)  2024-07-29 12:20:42 -07:00
Yineng Zhang    3d1cb0af83  feat: add chat template for internlm2-chat (#802)  2024-07-30 03:18:03 +08:00
Ying Sheng      7d352b4fdd  Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118" (#805)  2024-07-29 11:39:12 -07:00
Yineng Zhang    87064015d9  fix: update flashinfer to 0.1.2 to fix sampling for cu118 (#803)  2024-07-29 11:00:52 -07:00
Liangsheng Yin  7cd4f244a4  Chunked prefill (#800)  2024-07-29 03:32:58 -07:00
Ying Sheng      98111fbe3e  Revert "Chunked prefill support" (#799)  2024-07-29 02:38:31 -07:00
Liangsheng Yin  2ec39ab712  Chunked prefill support (#797)  2024-07-29 02:21:50 -07:00
Ying Sheng      325a06c2de  Fix logging (#796)  2024-07-28 23:01:45 -07:00
Ying Sheng      8d908a937c  Fix echo + logprob for OpenAI API when the prompt is a list (#791)  2024-07-28 17:09:16 -07:00