sglang

Author	SHA1	Message	Date
Ying Sheng	98111fbe3e	Revert "Chunked prefill support" (#799)	2024-07-29 02:38:31 -07:00
Liangsheng Yin	2ec39ab712	Chunked prefill support (#797)	2024-07-29 02:21:50 -07:00
Ying Sheng	325a06c2de	Fix logging (#796)	2024-07-28 23:01:45 -07:00
Yineng Zhang	dd7e8b9421	chore: add copyright for srt (#790)	2024-07-28 23:07:12 +10:00
Lianmin Zheng	30db99b3d9	Rename prefill_token_logprobs -> input_token_logprobs; decode_token_logprobs -> output_token_logprobs (#776)	2024-07-27 19:50:34 -07:00
Lianmin Zheng	a036d41980	Fix max new tokens (#772)	2024-07-27 17:22:18 -07:00
Lianmin Zheng	f95e661757	Fix max_tokens for OpenAI chat completion API (#766)	2024-07-27 15:44:27 -07:00
Ying Sheng	30d8e130e7	Improve benchmark scripts (#717)	2024-07-24 14:44:14 -07:00
Ying Sheng	83d2b30d75	format	2024-07-24 10:53:07 +00:00
Ying Sheng	4367f4bb8d	Fix prefill size (#711)	2024-07-24 03:41:15 -07:00
Liangsheng Yin	4cd64b8ee6	Auto adjust new ratio (#708)	2024-07-23 22:06:02 -07:00
Liangsheng Yin	268684439b	Use min new token ratio at start (#701)	2024-07-23 11:52:50 -07:00
Ying Sheng	06487f126e	refactor model loader: initial refactor (#664)	2024-07-20 02:18:22 -07:00
Liangsheng Yin	7620cd37dd	Fix jump forward when streaming (#665)	2024-07-19 16:42:06 -07:00
Liangsheng Yin	a9ef49c12c	Detokenize incrementally when streaming (#653)	2024-07-18 17:57:40 -07:00
Mingyi	d774acad5c	Remove the dependency of rpyc (#646)	2024-07-18 02:13:54 -07:00
Liangsheng Yin	3de2f30a27	Flashinfer sample kernel (#617)	2024-07-17 13:24:43 -07:00
zhyncs	2e341cd493	misc: add pre-commit config (#637)	2024-07-17 11:55:39 -07:00
Ying Sheng	a470e60c97	clean up step function (#635)	2024-07-16 20:15:24 -07:00
Ying Sheng	6a2941f4d0	Improve tensor parallel performance (#625) Co-authored-by: Mingyi <wisclmy0611@gmail.com>	2024-07-15 07:10:51 -07:00
Mingyi	5ac8b80677	Simplify mem state (#623)	2024-07-15 02:01:09 -07:00
Liangsheng Yin	564a898ad9	Optimize mem indices mangement (#619)	2024-07-13 23:39:37 -07:00
Ying Sheng	5949b1ca0e	Fix memory pool index error (#616)	2024-07-13 16:45:11 -07:00
Lianmin Zheng	665815969a	Enable cuda graph by default (#612)	2024-07-13 05:29:46 -07:00
Ying Sheng	dc1b8bcfaa	Format (#593)	2024-07-05 10:06:17 -07:00
Ying Sheng	75b31a2a88	Update run_batch interface and max_prefill_tokens (#574)	2024-06-30 18:26:04 -07:00
sglang	11616fc6bd	Minor fix in compiler & format (#545)	2024-06-29 23:42:14 -07:00
Lianmin Zheng	badf3fa020	Expose dtype argument (#569)	2024-06-27 23:30:39 -07:00
Lianmin Zheng	2e6e62e156	Increase the number of thread limitation for tp worker managers. (#567)	2024-06-26 09:33:45 -07:00
Lianmin Zheng	a385ee27bd	Warmup cublas (#566)	2024-06-25 12:46:00 -07:00
Lianmin Zheng	303ef8883e	Clean up logits processor (#558)	2024-06-22 00:25:24 -07:00
Ying Sheng	09593e9bc9	Multi-node Tensor Parallelism (#550) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-06-17 20:41:24 -07:00
Qubitium-modelcloud	bbec01c9aa	Fix tp worker only checking req[0] for stream (#546)	2024-06-14 22:56:10 -07:00
Ying Sheng	fb9296f0ed	Higher priority for user input of max_prefill_tokens & format (#540)	2024-06-12 21:48:40 -07:00
Liangsheng Yin	9c902b1954	Decode Incrementally (#517)	2024-06-11 23:39:12 -07:00
Lianmin Zheng	f6dbd24043	Improve doc strings (#518)	2024-06-08 02:39:32 -07:00
Lianmin Zheng	91f93f141f	Crash the server when error or OOM happens (#514)	2024-06-07 19:22:34 -07:00
Qubitium	f70f72586a	Fix rid state map leak + Refractor .finished (#505) Co-authored-by: ZX <zx@lbx.dev>	2024-06-07 13:20:40 -07:00
Lianmin Zheng	c0ae70c8ed	Improve logging & fix litellm dependency. (#512)	2024-06-07 13:10:32 -07:00
Ying Sheng	83525a1df2	Revert "Make the server random by default" (#492)	2024-05-31 12:00:21 -07:00
Lianmin Zheng	80a33ce8b0	Do not set the default value of global random seed (#488)	2024-05-29 18:41:18 -04:00
Ying Sheng	0463f7fb52	Support data parallelism (static) (#480) Co-authored-by: Ying Sheng <ying.sheng@databricks.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2024-05-27 21:24:10 -07:00

42 Commits