| Ke Bao | 5303c1ed22 | Support Mistral-Nemo (#691) | 2024-07-22 03:36:53 +10:00 |
| zhyncs | 65bd13386b | misc: recommend to use chat model for benchmark (#690) | 2024-07-22 00:13:33 +10:00 |
| Liangsheng Yin | eedc12e12e | Support Deepseek MoE Model (#689) | 2024-07-21 03:09:29 -07:00 |
| zhyncs | 6a846bb1fd | misc: update output file logic (#686) | 2024-07-21 18:07:30 +10:00 |
| zhyncs | 0fdb3127a1 | feat: update bench serving (#685) | 2024-07-21 16:46:58 +10:00 |
| Max Shawabkeh | 5ad033a070 | Fix StreamExecutor.fork() losing the current role start index. (#684) | 2024-07-20 23:32:11 -07:00 |
| Lianmin Zheng | 77e592e8e0 | support non-streaming benchmark (#682) | 2024-07-20 18:36:42 -07:00 |
| Liangsheng Yin | caaad53b52 | Support gpt-bigcode model class (#681) | 2024-07-20 18:34:37 -07:00 |
| Liangsheng Yin | 69d19188fc | Decouple kv (#679) | 2024-07-20 14:16:45 -07:00 |
| zhyncs | 4b4a67f814 | feat: support TRT LLM benchmark and multiple benchmarks (#670) | 2024-07-20 11:05:35 -07:00 |
| Ke Bao | 0ac94c36cb | Fallback when sampling failed (#678) | 2024-07-20 10:44:54 -07:00 |
| Ying Sheng | 2b4c646277 | Update version to 0.1.22 (#677) | 2024-07-20 03:39:50 -07:00 |
| Liangsheng Yin | f424e76d96 | Fix illegal tokens during sampling (#676) | 2024-07-20 03:11:15 -07:00 |
| Lianmin Zheng | 490a1f39dd | Fix cuda graph with flashinfer (#675) | 2024-07-20 02:43:55 -07:00 |
| Ying Sheng | 06487f126e | refactor model loader: initial refactor (#664) | 2024-07-20 02:18:22 -07:00 |
| Liangsheng Yin | 39c57317e1 | Revert "Temporary fix invalid sample results" (#673) | 2024-07-20 02:06:31 -07:00 |
| Lianmin Zheng | 9592a1f3bd | Fix random dataset (#671) | 2024-07-20 01:57:43 -07:00 |
| Lianmin Zheng | 35759efa91 | Support random dataset in bench_serving.py (#669) | 2024-07-20 01:06:43 -07:00 |
| Liangsheng Yin | 8f4b1559e7 | Temporary fix invalid sample results (#668) | 2024-07-20 00:51:05 -07:00 |
| Mingyi | e3046ea3a8 | Update OpenAI API (#667) | 2024-07-19 23:20:54 -07:00 |
| yichuan~ | 49c5e0eca9 | Add support for OpenAI API parallel sampling (#640) | 2024-07-19 23:10:01 -07:00 |
| Ke Bao | ec2150b294 | Fix kill process util (#666) | 2024-07-19 21:43:11 -07:00 |
| Liangsheng Yin | 7620cd37dd | Fix jump forward when streaming (#665) | 2024-07-19 16:42:06 -07:00 |
| Ying Sheng | 11c8efff73 | Add benchmark instructions (#663) | 2024-07-19 11:12:23 -07:00 |
| Ying Sheng | e87c7fd501 | Improve docs (#662) | 2024-07-19 10:58:03 -07:00 |
| zhyncs | 630479c3a6 | feat: update check env (#661) | 2024-07-19 09:54:15 -07:00 |
| Ying Sheng | 51fda1439f | Update Readme (#660) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> | 2024-07-19 09:54:01 -07:00 |
| zhyncs | dc4e4a6acc | misc: update SGLang package description (#659) | 2024-07-19 09:27:39 -07:00 |
| Ying Sheng | 2d96da813e | refactor model loader [unreachable code]: initial refactor (#655) | 2024-07-19 09:27:06 -07:00 |
| zhyncs | c126a6ccba | feat: add benchmark serving (#657) | 2024-07-19 09:15:21 -07:00 |
| zhyncs | ac971ff633 | perf: reduce ttft and itl with stream_interval 1 (#658) | 2024-07-19 09:14:22 -07:00 |
| Lianmin Zheng | e1792cca24 | Remove cached triton launcher (#656) | 2024-07-18 23:28:40 -07:00 |
| shrirajh | 1b7adbb5a0 | TokenizerManager.context_len should inherit from `server_args.conte… (#654) | 2024-07-18 21:55:29 -07:00 |
| Liangsheng Yin | a9ef49c12c | Detokenize incrementally when streaming (#653) | 2024-07-18 17:57:40 -07:00 |
| Ying Sheng | 21ba3a88a1 | Remove useless variables in infer_batch.py (#651) | 2024-07-18 05:31:44 -07:00 |
| zhyncs | 9c5cac2450 | fix: resolve lint error (#650) | 2024-07-18 03:33:21 -07:00 |
| zhyncs | b050d9283f | fix: set ulimit -n 65535 (#647) | 2024-07-18 02:35:45 -07:00 |
| zhyncs | 6a4dc99697 | misc: rm rpyc from PACKAGE_LIST (#649) | 2024-07-18 02:35:38 -07:00 |
| Mingyi | d774acad5c | Remove the dependency of rpyc (#646) | 2024-07-18 02:13:54 -07:00 |
| zhyncs | d93388da3e | feat: add check_env (#645) | 2024-07-17 21:39:28 -07:00 |
| Ying Sheng | 476584cb6e | Increase the capacity of the memory pool (#643) | 2024-07-17 15:44:41 -07:00 |
| Liangsheng Yin | abd5385ac5 | Move global_server_args_dict (#642) | 2024-07-17 13:49:15 -07:00 |
| Liangsheng Yin | 3de2f30a27 | Flashinfer sample kernel (#617) | 2024-07-17 13:24:43 -07:00 |
| zhyncs | 2e341cd493 | misc: add pre-commit config (#637) | 2024-07-17 11:55:39 -07:00 |
| zhyncs | a8552cb18b | feat: support internlm2 (#636) | 2024-07-16 22:40:03 -07:00 |
| Ying Sheng | a470e60c97 | clean up step function (#635) | 2024-07-16 20:15:24 -07:00 |
| Liangsheng Yin | 5ff60eda78 | Fix vertexai (#633) | 2024-07-16 16:07:19 -07:00 |
| Aidan Cooper | c193002297 | Add support for VertexAI safety settings (#624) | 2024-07-16 11:54:42 -07:00 |
| ylying | fe3be1595d | Add qwen2 tie word embedding (#630) | 2024-07-16 11:48:49 -07:00 |
| Ying Sheng | 0aa189f150 | Disable NCCL_NVLS by default (#631) | 2024-07-16 09:05:10 -07:00 |
|