Commit Graph

311 Commits

Author SHA1 Message Date
yichuan~
49c5e0eca9 Add support for OpenAI API parallel sampling (#640) 2024-07-19 23:10:01 -07:00
Ke Bao
ec2150b294 Fix kill process util (#666) 2024-07-19 21:43:11 -07:00
Liangsheng Yin
7620cd37dd Fix jump forward when streaming (#665) 2024-07-19 16:42:06 -07:00
Ying Sheng
11c8efff73 Add benchmark instructions (#663) 2024-07-19 11:12:23 -07:00
Ying Sheng
e87c7fd501 Improve docs (#662) 2024-07-19 10:58:03 -07:00
zhyncs
630479c3a6 feat: update check env (#661) 2024-07-19 09:54:15 -07:00
Ying Sheng
51fda1439f Update Readme (#660) 2024-07-19 09:54:01 -07:00
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
zhyncs
dc4e4a6acc misc: update SGLang package description (#659) 2024-07-19 09:27:39 -07:00
Ying Sheng
2d96da813e refactor model loader [unreachable code]: initial refactor (#655) 2024-07-19 09:27:06 -07:00
zhyncs
c126a6ccba feat: add benchmark serving (#657) 2024-07-19 09:15:21 -07:00
zhyncs
ac971ff633 perf: reduce ttft and itl with stream_interval 1 (#658) 2024-07-19 09:14:22 -07:00
Lianmin Zheng
e1792cca24 Remove cached triton launcher (#656) 2024-07-18 23:28:40 -07:00
shrirajh
1b7adbb5a0 TokenizerManager.context_len should inherit from `server_args.conte… (#654) 2024-07-18 21:55:29 -07:00
Liangsheng Yin
a9ef49c12c Detokenize incrementally when streaming (#653) 2024-07-18 17:57:40 -07:00
Ying Sheng
21ba3a88a1 Remove useless variables in infer_batch.py (#651) 2024-07-18 05:31:44 -07:00
zhyncs
9c5cac2450 fix: resolve lint error (#650) 2024-07-18 03:33:21 -07:00
zhyncs
b050d9283f fix: set ulimit -n 65535 (#647) 2024-07-18 02:35:45 -07:00
zhyncs
6a4dc99697 misc: rm rpyc from PACKAGE_LIST (#649) 2024-07-18 02:35:38 -07:00
Mingyi
d774acad5c Remove the dependency of rpyc (#646) 2024-07-18 02:13:54 -07:00
zhyncs
d93388da3e feat: add check_env (#645) 2024-07-17 21:39:28 -07:00
Ying Sheng
476584cb6e Increase the capacity of the memory pool (#643) 2024-07-17 15:44:41 -07:00
Liangsheng Yin
abd5385ac5 Move global_server_args_dict (#642) 2024-07-17 13:49:15 -07:00
Liangsheng Yin
3de2f30a27 Flashinfer sample kernel (#617) 2024-07-17 13:24:43 -07:00
zhyncs
2e341cd493 misc: add pre-commit config (#637) 2024-07-17 11:55:39 -07:00
zhyncs
a8552cb18b feat: support internlm2 (#636) 2024-07-16 22:40:03 -07:00
Ying Sheng
a470e60c97 clean up step function (#635) 2024-07-16 20:15:24 -07:00
Liangsheng Yin
5ff60eda78 Fix vertexai (#633) 2024-07-16 16:07:19 -07:00
Aidan Cooper
c193002297 Add support for VertexAI safety settings (#624) 2024-07-16 11:54:42 -07:00
ylying
fe3be1595d Add qwen2 tie word embedding (#630) 2024-07-16 11:48:49 -07:00
Ying Sheng
0aa189f150 Disable NCCL_NVLS by default (#631) 2024-07-16 09:05:10 -07:00
Liangsheng Yin
c9ee3d3559 Fix model forward grad (#628) 2024-07-15 22:09:09 -07:00
Lianmin Zheng
41d1f67704 Fix flush cache (#627) 2024-07-15 20:44:04 -07:00
Ying Sheng
56f5fc4ab5 Bump version to 0.1.21 (#626) 2024-07-15 13:10:53 -07:00
Ying Sheng
6a2941f4d0 Improve tensor parallel performance (#625) 2024-07-15 07:10:51 -07:00
Co-authored-by: Mingyi <wisclmy0611@gmail.com>
Mingyi
5ac8b80677 Simplify mem state (#623) 2024-07-15 02:01:09 -07:00
Liangsheng Yin
a56858ba67 Unify index operations (#620) 2024-07-14 12:55:55 -07:00
Liangsheng Yin
564a898ad9 Optimize mem indices mangement (#619) 2024-07-13 23:39:37 -07:00
Lianmin Zheng
5d264a90ac Bump version to 0.1.20 (#618) 2024-07-13 17:27:55 -07:00
Ying Sheng
5949b1ca0e Fix memory pool index error (#616) 2024-07-13 16:45:11 -07:00
Lianmin Zheng
0feca02dd9 Improve benchmark scripts (#615) 2024-07-13 15:59:04 -07:00
Liangsheng Yin
10143e1a5f Memorypool chunked prefetch (#614) 2024-07-13 15:24:03 -07:00
Lianmin Zheng
65c6577696 Improve benchmark scripts & fix llava (#613) 2024-07-13 15:00:26 -07:00
Lianmin Zheng
665815969a Enable cuda graph by default (#612) 2024-07-13 05:29:46 -07:00
Lianmin Zheng
396a69240f Cleanup attention backend: flashinfer and triton (#611) 2024-07-12 18:21:11 -07:00
Lianmin Zheng
af4e7910e7 Clean up the usage of flashinfer (#610) 2024-07-12 13:00:03 -07:00
Lianmin Zheng
519e20cfda Code clean up: Remove deprecated prefill move InputMetadata to infer_batch.py (#609) 2024-07-12 12:28:09 -07:00
Lianmin Zheng
d9a6902986 Fix bench latency (#607) 2024-07-11 14:37:01 -07:00
Lianmin Zheng
ad872feb14 bump version to 0.1.19 2024-07-09 02:23:14 -07:00
Lianmin Zheng
da2e5d6546 Fix the default argument of OpenAI Chat completion (#605) 2024-07-09 02:04:43 -07:00
胡译文
02b7258658 [Feat] Expose logprob options to sgl.gen API (#503) 2024-07-09 00:35:39 -07:00
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>