sglang

Author	SHA1	Message	Date
yichuan~	49c5e0eca9	Add support for OpenAI API parallel sampling (#640)	2024-07-19 23:10:01 -07:00
Ke Bao	ec2150b294	Fix kill process util (#666)	2024-07-19 21:43:11 -07:00
Liangsheng Yin	7620cd37dd	Fix jump forward when streaming (#665)	2024-07-19 16:42:06 -07:00
Ying Sheng	11c8efff73	Add benchmark instructions (#663)	2024-07-19 11:12:23 -07:00
Ying Sheng	e87c7fd501	Improve docs (#662)	2024-07-19 10:58:03 -07:00
zhyncs	630479c3a6	feat: update check env (#661)	2024-07-19 09:54:15 -07:00
Ying Sheng	51fda1439f	Update Readme (#660) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-07-19 09:54:01 -07:00
zhyncs	dc4e4a6acc	misc: update SGLang package description (#659)	2024-07-19 09:27:39 -07:00
Ying Sheng	2d96da813e	refactor model loader [unreachable code]: initial refactor (#655)	2024-07-19 09:27:06 -07:00
zhyncs	c126a6ccba	feat: add benchmark serving (#657)	2024-07-19 09:15:21 -07:00
zhyncs	ac971ff633	perf: reduce ttft and itl with stream_interval 1 (#658)	2024-07-19 09:14:22 -07:00
Lianmin Zheng	e1792cca24	Remove cached triton launcher (#656)	2024-07-18 23:28:40 -07:00
shrirajh	1b7adbb5a0	`TokenizerManager.context_len` should inherit from `server_args.conte… (#654)	2024-07-18 21:55:29 -07:00
Liangsheng Yin	a9ef49c12c	Detokenize incrementally when streaming (#653)	2024-07-18 17:57:40 -07:00
Ying Sheng	21ba3a88a1	Remove useless variables in infer_batch.py (#651)	2024-07-18 05:31:44 -07:00
zhyncs	9c5cac2450	fix: resolve lint error (#650)	2024-07-18 03:33:21 -07:00
zhyncs	b050d9283f	fix: set ulimit -n 65535 (#647)	2024-07-18 02:35:45 -07:00
zhyncs	6a4dc99697	misc: rm rpyc from PACKAGE_LIST (#649)	2024-07-18 02:35:38 -07:00
Mingyi	d774acad5c	Remove the dependency of rpyc (#646)	2024-07-18 02:13:54 -07:00
zhyncs	d93388da3e	feat: add check_env (#645)	2024-07-17 21:39:28 -07:00
Ying Sheng	476584cb6e	Increase the capacity of the memory pool (#643)	2024-07-17 15:44:41 -07:00
Liangsheng Yin	abd5385ac5	Move `global_server_args_dict` (#642)	2024-07-17 13:49:15 -07:00
Liangsheng Yin	3de2f30a27	Flashinfer sample kernel (#617)	2024-07-17 13:24:43 -07:00
zhyncs	2e341cd493	misc: add pre-commit config (#637)	2024-07-17 11:55:39 -07:00
zhyncs	a8552cb18b	feat: support internlm2 (#636)	2024-07-16 22:40:03 -07:00
Ying Sheng	a470e60c97	clean up step function (#635)	2024-07-16 20:15:24 -07:00
Liangsheng Yin	5ff60eda78	Fix vertexai (#633)	2024-07-16 16:07:19 -07:00
Aidan Cooper	c193002297	Add support for VertexAI safety settings (#624)	2024-07-16 11:54:42 -07:00
ylying	fe3be1595d	Add qwen2 tie word embedding (#630)	2024-07-16 11:48:49 -07:00
Ying Sheng	0aa189f150	Disable NCCL_NVLS by default (#631)	2024-07-16 09:05:10 -07:00
Liangsheng Yin	c9ee3d3559	Fix model forward grad (#628)	2024-07-15 22:09:09 -07:00
Lianmin Zheng	41d1f67704	Fix flush cache (#627)	2024-07-15 20:44:04 -07:00
Ying Sheng	56f5fc4ab5	Bump version to 0.1.21 (#626)	2024-07-15 13:10:53 -07:00
Ying Sheng	6a2941f4d0	Improve tensor parallel performance (#625) Co-authored-by: Mingyi <wisclmy0611@gmail.com>	2024-07-15 07:10:51 -07:00
Mingyi	5ac8b80677	Simplify mem state (#623)	2024-07-15 02:01:09 -07:00
Liangsheng Yin	a56858ba67	Unify index operations (#620)	2024-07-14 12:55:55 -07:00
Liangsheng Yin	564a898ad9	Optimize mem indices mangement (#619)	2024-07-13 23:39:37 -07:00
Lianmin Zheng	5d264a90ac	Bump version to 0.1.20 (#618)	2024-07-13 17:27:55 -07:00
Ying Sheng	5949b1ca0e	Fix memory pool index error (#616)	2024-07-13 16:45:11 -07:00
Lianmin Zheng	0feca02dd9	Improve benchmark scripts (#615)	2024-07-13 15:59:04 -07:00
Liangsheng Yin	10143e1a5f	Memorypool chunked prefetch (#614)	2024-07-13 15:24:03 -07:00
Lianmin Zheng	65c6577696	Improve benchmark scripts & fix llava (#613)	2024-07-13 15:00:26 -07:00
Lianmin Zheng	665815969a	Enable cuda graph by default (#612)	2024-07-13 05:29:46 -07:00
Lianmin Zheng	396a69240f	Cleanup attention backend: flashinfer and triton (#611)	2024-07-12 18:21:11 -07:00
Lianmin Zheng	af4e7910e7	Clean up the usage of flashinfer (#610)	2024-07-12 13:00:03 -07:00
Lianmin Zheng	519e20cfda	Code clean up: Remove deprecated prefill move InputMetadata to infer_batch.py (#609)	2024-07-12 12:28:09 -07:00
Lianmin Zheng	d9a6902986	Fix bench latency (#607)	2024-07-11 14:37:01 -07:00
Lianmin Zheng	ad872feb14	bump version to 0.1.19	2024-07-09 02:23:14 -07:00
Lianmin Zheng	da2e5d6546	Fix the default argument of OpenAI Chat completion (#605)	2024-07-09 02:04:43 -07:00
胡译文	02b7258658	[Feat] Expose logprob options to `sgl.gen` API (#503) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-07-09 00:35:39 -07:00

311 Commits