Liangsheng Yin
|
8f4b1559e7
|
Temporary fix invalid sample results (#668)
|
2024-07-20 00:51:05 -07:00 |
|
Mingyi
|
e3046ea3a8
|
Update OpenAI API (#667)
|
2024-07-19 23:20:54 -07:00 |
|
yichuan~
|
49c5e0eca9
|
Add support for OpenAI API parallel sampling (#640)
|
2024-07-19 23:10:01 -07:00 |
|
Ke Bao
|
ec2150b294
|
Fix kill process util (#666)
|
2024-07-19 21:43:11 -07:00 |
|
Liangsheng Yin
|
7620cd37dd
|
Fix jump forward when streaming (#665)
|
2024-07-19 16:42:06 -07:00 |
|
Ying Sheng
|
11c8efff73
|
Add benchmark instructions (#663)
|
2024-07-19 11:12:23 -07:00 |
|
Ying Sheng
|
e87c7fd501
|
Improve docs (#662)
|
2024-07-19 10:58:03 -07:00 |
|
zhyncs
|
630479c3a6
|
feat: update check env (#661)
|
2024-07-19 09:54:15 -07:00 |
|
Ying Sheng
|
51fda1439f
|
Update Readme (#660)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2024-07-19 09:54:01 -07:00 |
|
zhyncs
|
dc4e4a6acc
|
misc: update SGLang package description (#659)
|
2024-07-19 09:27:39 -07:00 |
|
Ying Sheng
|
2d96da813e
|
refactor model loader: initial refactor (#655)
|
2024-07-19 09:27:06 -07:00 |
|
zhyncs
|
c126a6ccba
|
feat: add benchmark serving (#657)
|
2024-07-19 09:15:21 -07:00 |
|
zhyncs
|
ac971ff633
|
perf: reduce ttft and itl with stream_interval 1 (#658)
|
2024-07-19 09:14:22 -07:00 |
|
Lianmin Zheng
|
e1792cca24
|
Remove cached triton launcher (#656)
|
2024-07-18 23:28:40 -07:00 |
|
shrirajh
|
1b7adbb5a0
|
TokenizerManager.context_len should inherit from `server_args.conte… (#654)
|
2024-07-18 21:55:29 -07:00 |
|
Liangsheng Yin
|
a9ef49c12c
|
Detokenize incrementally when streaming (#653)
|
2024-07-18 17:57:40 -07:00 |
|
Ying Sheng
|
21ba3a88a1
|
Remove useless variables in infer_batch.py (#651)
|
2024-07-18 05:31:44 -07:00 |
|
zhyncs
|
9c5cac2450
|
fix: resolve lint error (#650)
|
2024-07-18 03:33:21 -07:00 |
|
zhyncs
|
b050d9283f
|
fix: set ulimit -n 65535 (#647)
|
2024-07-18 02:35:45 -07:00 |
|
zhyncs
|
6a4dc99697
|
misc: rm rpyc from PACKAGE_LIST (#649)
|
2024-07-18 02:35:38 -07:00 |
|
Mingyi
|
d774acad5c
|
Remove the dependency of rpyc (#646)
|
2024-07-18 02:13:54 -07:00 |
|
zhyncs
|
d93388da3e
|
feat: add check_env (#645)
|
2024-07-17 21:39:28 -07:00 |
|
Ying Sheng
|
476584cb6e
|
Increase the capacity of the memory pool (#643)
|
2024-07-17 15:44:41 -07:00 |
|
Liangsheng Yin
|
abd5385ac5
|
Move global_server_args_dict (#642)
|
2024-07-17 13:49:15 -07:00 |
|
Liangsheng Yin
|
3de2f30a27
|
Flashinfer sample kernel (#617)
|
2024-07-17 13:24:43 -07:00 |
|
zhyncs
|
2e341cd493
|
misc: add pre-commit config (#637)
|
2024-07-17 11:55:39 -07:00 |
|
zhyncs
|
a8552cb18b
|
feat: support internlm2 (#636)
|
2024-07-16 22:40:03 -07:00 |
|
Ying Sheng
|
a470e60c97
|
clean up step function (#635)
|
2024-07-16 20:15:24 -07:00 |
|
Liangsheng Yin
|
5ff60eda78
|
Fix vertexai (#633)
|
2024-07-16 16:07:19 -07:00 |
|
Aidan Cooper
|
c193002297
|
Add support for VertexAI safety settings (#624)
|
2024-07-16 11:54:42 -07:00 |
|
ylying
|
fe3be1595d
|
Add qwen2 tie word embedding (#630)
|
2024-07-16 11:48:49 -07:00 |
|
Ying Sheng
|
0aa189f150
|
Disable NCCL_NVLS by default (#631)
|
2024-07-16 09:05:10 -07:00 |
|
Liangsheng Yin
|
c9ee3d3559
|
Fix model forward grad (#628)
|
2024-07-15 22:09:09 -07:00 |
|
Lianmin Zheng
|
41d1f67704
|
Fix flush cache (#627)
|
2024-07-15 20:44:04 -07:00 |
|
Ying Sheng
|
56f5fc4ab5
|
Bump version to 0.1.21 (#626)
|
2024-07-15 13:10:53 -07:00 |
|
Ying Sheng
|
6a2941f4d0
|
Improve tensor parallel performance (#625)
Co-authored-by: Mingyi <wisclmy0611@gmail.com>
|
2024-07-15 07:10:51 -07:00 |
|
Mingyi
|
5ac8b80677
|
Simplify mem state (#623)
|
2024-07-15 02:01:09 -07:00 |
|
Liangsheng Yin
|
a56858ba67
|
Unify index operations (#620)
|
2024-07-14 12:55:55 -07:00 |
|
Liangsheng Yin
|
564a898ad9
|
Optimize mem indices management (#619)
|
2024-07-13 23:39:37 -07:00 |
|
Lianmin Zheng
|
5d264a90ac
|
Bump version to 0.1.20 (#618)
|
2024-07-13 17:27:55 -07:00 |
|
Ying Sheng
|
5949b1ca0e
|
Fix memory pool index error (#616)
|
2024-07-13 16:45:11 -07:00 |
|
Lianmin Zheng
|
0feca02dd9
|
Improve benchmark scripts (#615)
|
2024-07-13 15:59:04 -07:00 |
|
Liangsheng Yin
|
10143e1a5f
|
Memorypool chunked prefetch (#614)
|
2024-07-13 15:24:03 -07:00 |
|
Lianmin Zheng
|
65c6577696
|
Improve benchmark scripts & fix llava (#613)
|
2024-07-13 15:00:26 -07:00 |
|
Lianmin Zheng
|
665815969a
|
Enable cuda graph by default (#612)
|
2024-07-13 05:29:46 -07:00 |
|
Lianmin Zheng
|
396a69240f
|
Cleanup attention backend: flashinfer and triton (#611)
|
2024-07-12 18:21:11 -07:00 |
|
Lianmin Zheng
|
af4e7910e7
|
Clean up the usage of flashinfer (#610)
|
2024-07-12 13:00:03 -07:00 |
|
Lianmin Zheng
|
519e20cfda
|
Code clean up: Remove deprecated prefill move InputMetadata to infer_batch.py (#609)
|
2024-07-12 12:28:09 -07:00 |
|
Lianmin Zheng
|
d9a6902986
|
Fix bench latency (#607)
|
2024-07-11 14:37:01 -07:00 |
|
Lianmin Zheng
|
ad872feb14
|
bump version to 0.1.19
|
2024-07-09 02:23:14 -07:00 |
|