yichuan~ | fd7926e46e | Fix prompt len in parallel sampling (#928) | 2024-08-05 00:56:08 -07:00
Ying Sheng | 3bc99e6fe4 | Test openai vision api (#925) | 2024-08-05 13:51:55 +10:00
min-xu-et | ebf69964cd | latency test enhancement - final part (#921) | 2024-08-04 18:15:23 -07:00
Ying Sheng | 141e8c71a3 | Bump version to 0.2.10 (#923) | 2024-08-04 16:52:51 -07:00
yichuan~ | d53dcf9c98 | Support more OpenAI API test (#916) | 2024-08-04 16:43:09 -07:00
Liangsheng Yin | bb66cc4c52 | Fix CI && python3.8 compatible (#920) | 2024-08-04 16:02:05 -07:00
Ying Sheng | 0d4f3a9fcd | Make API Key OpenAI-compatible (#917) | 2024-08-04 13:35:44 -07:00
min-xu-et | afd411d09f | enhance latency test - part 2 (#915) | 2024-08-04 12:27:25 -07:00
Ke Bao | e1eae1fd15 | Support MLA for DeepSeek-V2 with Triton - step 1 (#905) | 2024-08-05 03:40:33 +10:00
Yineng Zhang | f4d9953d9d | misc: add triton in check_env PACKAGE_LIST (#914) | 2024-08-04 23:20:59 +10:00
Ying Sheng | 995af5a54b | Improve the structure of CI (#911) | 2024-08-03 23:09:21 -07:00
min-xu-et | 539856455d | latency test enhancement - part 1 (#909) | 2024-08-03 22:44:58 -07:00
Ying Sheng | 70cc0749ce | Add model accuracy test - step 1 (#866) | 2024-08-03 18:20:50 -07:00
min-xu-et | 7dd8a7e6d9 | fixed an error handling in bench_latency.py (#904) | 2024-08-03 17:42:17 -07:00
Ying Sheng | b906c01592 | Bump version to 0.2.9.post1 (#899) | 2024-08-02 12:08:00 -07:00
Yineng Zhang | 046c2b339e | chore: add multipart dep for fastapi (#895) | 2024-08-03 00:50:19 +10:00
Yineng Zhang | 6b8f66efe1 | misc: update cuda graph capture exception log (#894) | 2024-08-03 00:40:52 +10:00
Ying Sheng | 30a9b2ef20 | Bump version to v0.2.9 (#890) | 2024-08-02 01:45:48 -07:00
Ying Sheng | 3cadecf0c4 | Increase openai client limit (#886) | 2024-08-02 00:47:23 -07:00
Ying Sheng | e90e3a50d4 | Add benchmark: HumanEval (#889) | 2024-08-02 00:46:41 -07:00
Ying Sheng | fbd6b94d69 | Fix the double BOS problem in the HF chat template (#888) | 2024-08-02 00:30:50 -07:00
Ying Sheng | ae7ee01a8e | Add accuracy test to CI: MMLU (#882) | 2024-08-01 21:20:17 -07:00
任嘉 | 4013a4e1b0 | Implement served_model_name to customize model id when use local mode… (#749) (Co-authored-by: Ying Sheng <sqy1415@gmail.com>) | 2024-08-01 17:13:51 -07:00
Ying Sheng | 60340a3643 | Improve the coverage of the openai api server test (#878) | 2024-08-01 16:01:30 -07:00
Ying Sheng | 72b6ea88b4 | Make scripts under /test/srt as unit tests (#875) | 2024-08-01 14:34:55 -07:00
Ying Sheng | e4d3333c6c | bump to 0.2.8 (#877) | 2024-08-01 14:18:26 -07:00
Ying Sheng | 6f221d4ca0 | Fix unit tests for the frontend language part (#872) | 2024-08-01 12:39:12 -07:00
Yineng Zhang | 7f6c690b67 | misc: use pip cache purge and add unit test ci (#871) | 2024-08-02 05:12:20 +10:00
Liangsheng Yin | c020f9ceda | Support chunked prefill when radix cache is disabled (#811) | 2024-08-01 00:29:01 -07:00
yichuan~ | ca600e8cd6 | Add support for logprobs in OpenAI chat API (#852) | 2024-08-01 00:08:21 -07:00
Kai Fronsdal | 0c0c81372e | Fix #857 (#858) | 2024-08-01 00:05:39 -07:00
Ying Sheng | 5e7dd984fe | Fix llama for classification (#855) | 2024-07-31 15:48:31 -07:00
Yineng Zhang | bc3eaac2b8 | chore: update flashinfer to v0.1.3 (#850) | 2024-08-01 04:37:05 +10:00
Liangsheng Yin | a6c7ebbbcb | Add req slots leaking check (#842) | 2024-07-30 18:29:01 -07:00
yichuan~ | bb0501c0d9 | Fix List input bug (#838) | 2024-07-30 13:40:51 -07:00
Liangsheng Yin | 6b0f2e9088 | Add --max-total-tokens (#840) | 2024-07-30 13:33:55 -07:00
Yineng Zhang | 1edd4e07d6 | chore: bump v0.2.7 (#830) | 2024-07-30 20:41:10 +10:00
Yineng Zhang | f52eda35ea | misc: update e2e test benchmark config (#825) | 2024-07-30 19:19:23 +10:00
Ying Sheng | b579ecf028 | Add awq_marlin (#826) | 2024-07-30 02:04:51 -07:00
Ying Sheng | e7487b08bc | Adjust default mem fraction to avoid OOM (#823) | 2024-07-30 01:58:31 -07:00
Ying Sheng | ae5c0fc442 | Support disable_ignore_eos in bench_serving.py (#824) | 2024-07-30 01:42:07 -07:00
ObjectNotFound | daf593a385 | Fix streaming bug (#820) | 2024-07-30 00:32:07 -07:00
Liangsheng Yin | cdcbde5fc3 | Code structure refactor (#807) | 2024-07-29 23:04:48 -07:00
Enrique Shockwave | 21e22b9e96 | Fix LiteLLM kwargs (#817) | 2024-07-29 22:38:02 -07:00
Ying Sheng | db6089e6f3 | Revert "Organize public APIs" (#815) | 2024-07-29 19:40:28 -07:00
Liangsheng Yin | 3520f75fb1 | Remove inf value for chunked prefill size (#812) | 2024-07-29 18:34:25 -07:00
Liangsheng Yin | c8e9fed87a | Organize public APIs (#809) | 2024-07-29 15:34:16 -07:00
yichuan~ | 084fa54d37 | Add support for OpenAI API : offline batch(file) processing (#699) (Co-authored-by: hnyls2002 <hnyls2002@gmail.com>) | 2024-07-29 13:07:18 -07:00
Ying Sheng | eba458bd19 | Revert "Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118"" (#806) | 2024-07-29 12:20:42 -07:00
Yineng Zhang | 3d1cb0af83 | feat: add chat template for internlm2-chat (#802) | 2024-07-30 03:18:03 +08:00