Author | Commit | Message | Date
Liangsheng Yin | c020f9ceda | Support chunked prefill when radix cache is disabled (#811) | 2024-08-01 00:29:01 -07:00
yichuan~ | ca600e8cd6 | Add support for logprobs in OpenAI chat API (#852) | 2024-08-01 00:08:21 -07:00
Kai Fronsdal | 0c0c81372e | Fix #857 (#858) | 2024-08-01 00:05:39 -07:00
Ying Sheng | 5e7dd984fe | Fix llama for classification (#855) | 2024-07-31 15:48:31 -07:00
Yineng Zhang | bc3eaac2b8 | chore: update flashinfer to v0.1.3 (#850) | 2024-08-01 04:37:05 +10:00
Liangsheng Yin | a6c7ebbbcb | Add req slots leaking check (#842) | 2024-07-30 18:29:01 -07:00
yichuan~ | bb0501c0d9 | Fix List input bug (#838) | 2024-07-30 13:40:51 -07:00
Liangsheng Yin | 6b0f2e9088 | Add --max-total-tokens (#840) | 2024-07-30 13:33:55 -07:00
Yineng Zhang | 1edd4e07d6 | chore: bump v0.2.7 (#830) | 2024-07-30 20:41:10 +10:00
Yineng Zhang | f52eda35ea | misc: update e2e test benchmark config (#825) | 2024-07-30 19:19:23 +10:00
Ying Sheng | b579ecf028 | Add awq_marlin (#826) | 2024-07-30 02:04:51 -07:00
Ying Sheng | e7487b08bc | Adjust default mem fraction to avoid OOM (#823) | 2024-07-30 01:58:31 -07:00
Ying Sheng | ae5c0fc442 | Support disable_ignore_eos in bench_serving.py (#824) | 2024-07-30 01:42:07 -07:00
ObjectNotFound | daf593a385 | Fix streaming bug (#820) | 2024-07-30 00:32:07 -07:00
Liangsheng Yin | cdcbde5fc3 | Code structure refactor (#807) | 2024-07-29 23:04:48 -07:00
Enrique Shockwave | 21e22b9e96 | Fix LiteLLM kwargs (#817) | 2024-07-29 22:38:02 -07:00
Ying Sheng | db6089e6f3 | Revert "Organize public APIs" (#815) | 2024-07-29 19:40:28 -07:00
Liangsheng Yin | 3520f75fb1 | Remove inf value for chunked prefill size (#812) | 2024-07-29 18:34:25 -07:00
Liangsheng Yin | c8e9fed87a | Organize public APIs (#809) | 2024-07-29 15:34:16 -07:00
yichuan~ | 084fa54d37 | Add support for OpenAI API : offline batch(file) processing (#699) (Co-authored-by: hnyls2002 <hnyls2002@gmail.com>) | 2024-07-29 13:07:18 -07:00
Ying Sheng | eba458bd19 | Revert "Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118"" (#806) | 2024-07-29 12:20:42 -07:00
Yineng Zhang | 3d1cb0af83 | feat: add chat template for internlm2-chat (#802) | 2024-07-30 03:18:03 +08:00
Ying Sheng | 7d352b4fdd | Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118" (#805) | 2024-07-29 11:39:12 -07:00
Yineng Zhang | 87064015d9 | fix: update flashinfer to 0.1.2 to fix sampling for cu118 (#803) | 2024-07-29 11:00:52 -07:00
Liangsheng Yin | 7cd4f244a4 | Chunked prefill (#800) | 2024-07-29 03:32:58 -07:00
Ying Sheng | 98111fbe3e | Revert "Chunked prefill support" (#799) | 2024-07-29 02:38:31 -07:00
Liangsheng Yin | 2ec39ab712 | Chunked prefill support (#797) | 2024-07-29 02:21:50 -07:00
ObjectNotFound | 8f6274c82b | Add role documentation, add system begin & end tokens (#793) | 2024-07-28 23:02:49 -07:00
Ying Sheng | 325a06c2de | Fix logging (#796) | 2024-07-28 23:01:45 -07:00
Ying Sheng | 79f816292e | Fix lazy import location (#795) | 2024-07-28 22:09:50 -07:00
Eric Yoon | b688fd858d | Lazy-import third-party backends (#794) | 2024-07-28 21:57:41 -07:00
Ying Sheng | 8d908a937c | Fix echo + logprob for OpenAI API when the prompt is a list (#791) | 2024-07-28 17:09:16 -07:00
Yineng Zhang | dd7e8b9421 | chore: add copyright for srt (#790) | 2024-07-28 23:07:12 +10:00
Ying Sheng | c71880f896 | Vectorize logprobs computation (#787) | 2024-07-28 05:22:14 -07:00
Yineng Zhang | 948625799e | docs: init readthedocs support (#783) | 2024-07-28 16:50:31 +10:00
Yineng Zhang | 68e5262699 | fix: replace pillow with PIL in PACKAGE_LIST (#781) | 2024-07-28 14:06:24 +10:00
Lianmin Zheng | bc1154c399 | Bump version to 0.2.6 (#779) | 2024-07-27 20:29:33 -07:00
Lianmin Zheng | 752e643007 | Allow disabling flashinfer sampling kernel (#778) | 2024-07-27 20:18:56 -07:00
Lianmin Zheng | 30db99b3d9 | Rename prefill_token_logprobs -> input_token_logprobs; decode_token_logprobs -> output_token_logprobs (#776) | 2024-07-27 19:50:34 -07:00
Lianmin Zheng | 0a409bd438 | Fix return_log_probs with cuda graph (#775) | 2024-07-27 19:15:09 -07:00
Mingyi | e4db4e5ba5 | minor refactor: move check server args to server_args.py (#774) | 2024-07-27 19:03:40 -07:00
Lianmin Zheng | bbc07c4197 | Move sampling logits to float32 (#773) | 2024-07-27 17:30:12 -07:00
Lianmin Zheng | a036d41980 | Fix max new tokens (#772) | 2024-07-27 17:22:18 -07:00
Lianmin Zheng | f95e661757 | Fix max_tokens for OpenAI chat completion API (#766) | 2024-07-27 15:44:27 -07:00
Lianmin Zheng | 0736b27020 | [Minor] Improve the code style in TokenizerManager (#767) | 2024-07-27 05:05:15 -07:00
Ke Bao | 3fdab91912 | Fix TransformerTokenizer init for chatglm2 & 3 (#761) | 2024-07-27 02:44:46 -07:00
Liangsheng Yin | d9fccfefe2 | Fix context length (#757) | 2024-07-26 18:13:13 -07:00
Liangsheng Yin | 679ebcbbdc | Deepseek v2 support (#693) | 2024-07-26 17:10:07 -07:00
Yineng Zhang | 5bd06b4599 | fix: use REPO_TOKEN (#755) | 2024-07-27 05:56:30 +10:00
Yineng Zhang | 9a61182732 | fix: add release tag workflow (#754) | 2024-07-27 05:48:38 +10:00