Yineng Zhang | 5bd06b4599 | fix: use REPO_TOKEN (#755) | 2024-07-27 05:56:30 +10:00
Yineng Zhang | 9a61182732 | fix: add release tag workflow (#754) | 2024-07-27 05:48:38 +10:00
Yineng Zhang | eeb2482186 | feat: add release tag workflow (#753) | 2024-07-27 05:37:02 +10:00
Yineng Zhang | 8628ab9c8b | feat: add docker workflow (#751) | 2024-07-27 03:54:51 +10:00
Yineng Zhang | 1b77670f39 | chore: bump v0.2.1 (#740) | 2024-07-26 21:27:41 +10:00
Yineng Zhang | 768e05d08f | fix benchmark (#743) Co-authored-by: hnyls2002 <hnyls2002@gmail.com> Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2024-07-26 21:26:13 +10:00
Yineng Zhang | 6b32bb1c0b | misc: format (#741) | 2024-07-26 21:00:51 +10:00
Toshiki Kataoka | 40facad5f1 | feat: support token ids in /v1/completions (#736) | 2024-07-26 02:53:17 -07:00
Toshiki Kataoka | da504445dc | fix /generate without sampling_params (#734) | 2024-07-26 01:27:56 -07:00
Ying Sheng | 252e0f7bbd | fix: small bug for llama-405b fp16 (#733) | 2024-07-25 21:14:54 -07:00
Ying Sheng | 1a491d00cb | Bump version to 0.2.0 (#730) | 2024-07-25 08:03:36 -07:00
Ying Sheng | 8fbba3de3d | Fix bugs (fp8 checkpoints, triton cache manager) (#729) | 2024-07-25 07:42:00 -07:00
Ying Sheng | ae0f6130cb | Revert "fix: fp8 config" (#728) | 2024-07-25 07:25:33 -07:00
Yineng Zhang | 926ac01b64 | fix: resolve the logo display issue on the PyPI page (#726) | 2024-07-25 20:47:46 +10:00
Yineng Zhang | 25c881a005 | chore: bump v0.1.25 (#725) | 2024-07-25 20:04:35 +10:00
Liangsheng Yin | 04ec6ba2ac | Fix dockerfile and triton cache manager (#720) | 2024-07-25 03:04:21 -07:00
Ying Sheng | d63f13c13b | fix: fp8 config (#723) | 2024-07-25 02:01:56 -07:00
Ying Sheng | 459abad261 | Bump version to 0.1.24 (#718) | 2024-07-24 15:55:01 -07:00
Ying Sheng | 30d8e130e7 | Improve benchmark scripts (#717) | 2024-07-24 14:44:14 -07:00
Yineng Zhang | e17deb27b5 | fix: llama 3.1 405b fp8 (#714) | 2024-07-24 09:37:41 -07:00
Ying Sheng | 83d2b30d75 | format | 2024-07-24 10:53:07 +00:00
Ying Sheng | 4367f4bb8d | Fix prefill size (#711) | 2024-07-24 03:41:15 -07:00
Lianmin Zheng | 00e4baa728 | Update schedule_heuristic.py | 2024-07-24 01:22:30 -07:00
Liangsheng Yin | 4cd64b8ee6 | Auto adjust new ratio (#708) | 2024-07-23 22:06:02 -07:00
Lianmin Zheng | 01d66ae2e8 | Fix multi-node deadlock (#709) | 2024-07-23 21:53:36 -07:00
Mingyi | a523a3c13a | Reduce hardcoded logic of kernel usage (#707) | 2024-07-23 16:42:21 -07:00
Ying Sheng | 9f94728f5a | bump version to 0.1.23 (#706) | 2024-07-23 13:53:19 -07:00
Ying Sheng | 444a02441a | Update vllm version to support llama3.1 (#705) | 2024-07-23 13:49:34 -07:00
zhyncs | fa7ccb3316 | feat: add e2e latency (#704) | 2024-07-24 05:51:10 +10:00
Liangsheng Yin | 268684439b | Use min new token ratio at start (#701) | 2024-07-23 11:52:50 -07:00
Ke Bao | 824a77d04d | Fix hf config loading (#702) | 2024-07-23 11:39:08 -07:00
Ying Sheng | cf99eab7d5 | Fix flashinfer (#700) | 2024-07-23 01:27:01 -07:00
zhyncs | 9fdea29d05 | misc: fix typo (#698) | 2024-07-23 02:00:27 +10:00
Ying Sheng | df7c4c19b4 | Fix trt benchmark (#697) | 2024-07-22 23:32:41 +10:00
Ying Sheng | c3f1aac811 | Tune params (#696) | 2024-07-22 03:19:24 -07:00
zhyncs | d198791fe8 | misc: update output token logic (#695) | 2024-07-22 19:34:05 +10:00
zhyncs | c07526e46c | fix: update bench serving (#694) | 2024-07-22 18:23:33 +10:00
Ke Bao | 5303c1ed22 | Support Mistral-Nemo (#691) | 2024-07-22 03:36:53 +10:00
zhyncs | 65bd13386b | misc: recommend to use chat model for benchmark (#690) | 2024-07-22 00:13:33 +10:00
Liangsheng Yin | eedc12e12e | Support Deepseek MoE Model (#689) | 2024-07-21 03:09:29 -07:00
zhyncs | 6a846bb1fd | misc: update output file logic (#686) | 2024-07-21 18:07:30 +10:00
zhyncs | 0fdb3127a1 | feat: update bench serving (#685) | 2024-07-21 16:46:58 +10:00
Max Shawabkeh | 5ad033a070 | Fix StreamExecutor.fork() losing the current role start index. (#684) | 2024-07-20 23:32:11 -07:00
Lianmin Zheng | 77e592e8e0 | support non-streaming benchmark (#682) | 2024-07-20 18:36:42 -07:00
Liangsheng Yin | caaad53b52 | Support gpt-bigcode model class (#681) | 2024-07-20 18:34:37 -07:00
Liangsheng Yin | 69d19188fc | Decouple kv (#679) | 2024-07-20 14:16:45 -07:00
zhyncs | 4b4a67f814 | feat: support TRT LLM benchmark and multiple benchmarks (#670) | 2024-07-20 11:05:35 -07:00
Ke Bao | 0ac94c36cb | Fallback when sampling failed (#678) | 2024-07-20 10:44:54 -07:00
Ying Sheng | 2b4c646277 | Update version to 0.1.22 (#677) | 2024-07-20 03:39:50 -07:00
Liangsheng Yin | f424e76d96 | Fix illegal tokens during sampling (#676) | 2024-07-20 03:11:15 -07:00