| Author | Commit | Message | Date |
|---|---|---|---|
| Liangsheng Yin | 679ebcbbdc | Deepseek v2 support (#693) | 2024-07-26 17:10:07 -07:00 |
| Liangsheng Yin | 268684439b | Use min new token ratio at start (#701) | 2024-07-23 11:52:50 -07:00 |
| Ying Sheng | c3f1aac811 | Tune params (#696) | 2024-07-22 03:19:24 -07:00 |
| Liangsheng Yin | caaad53b52 | Support gpt-bigcode model class (#681) | 2024-07-20 18:34:37 -07:00 |
| Ying Sheng | 06487f126e | refactor model loader: initial refactor (#664) | 2024-07-20 02:18:22 -07:00 |
| Ying Sheng | 51fda1439f | Update Readme (#660); Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> | 2024-07-19 09:54:01 -07:00 |
| zhyncs | ac971ff633 | perf: reduce ttft and itl with stream_interval 1 (#658) | 2024-07-19 09:14:22 -07:00 |
| Mingyi | d774acad5c | Remove the dependency of rpyc (#646) | 2024-07-18 02:13:54 -07:00 |
| Ying Sheng | 6a2941f4d0 | Improve tensor parallel performance (#625); Co-authored-by: Mingyi <wisclmy0611@gmail.com> | 2024-07-15 07:10:51 -07:00 |
| Lianmin Zheng | 665815969a | Enable cuda graph by default (#612) | 2024-07-13 05:29:46 -07:00 |
| Lianmin Zheng | af4e7910e7 | Clean up the usage of flashinfer (#610) | 2024-07-12 13:00:03 -07:00 |
| Liangsheng Yin | 5304b4ef58 | Add --enable-p2p-check option (#599) | 2024-07-06 23:34:10 -07:00 |
| Ying Sheng | dc1b8bcfaa | Format (#593) | 2024-07-05 10:06:17 -07:00 |
| Lianmin Zheng | 63fbef9876 | fix flashinfer & http log level | 2024-07-03 23:19:33 -07:00 |
| Lianmin Zheng | c7709d3abe | Update install commands (#583) | 2024-07-03 02:10:59 -07:00 |
| Ying Sheng | 9380f50ff9 | Turn on flashinfer by default (#578) | 2024-07-02 02:25:07 -07:00 |
| Lianmin Zheng | badf3fa020 | Expose dtype argument (#569) | 2024-06-27 23:30:39 -07:00 |
| Lianmin Zheng | 2187f36237 | Add a new arguments log_level_http to control the HTTP logging (#563) | 2024-06-25 01:16:20 -07:00 |
| Ying Sheng | 09593e9bc9 | Multi-node Tensor Parallelism (#550); Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> | 2024-06-17 20:41:24 -07:00 |
| Lianmin Zheng | 159cc741e4 | Make the server random by default (#493) | 2024-05-31 23:33:34 -07:00 |
| Ying Sheng | 83525a1df2 | Revert "Make the server random by default" (#492) | 2024-05-31 12:00:21 -07:00 |
| Lianmin Zheng | 80a33ce8b0 | Do not set the default value of global random seed (#488) | 2024-05-29 18:41:18 -04:00 |
| Ying Sheng | 0463f7fb52 | Support data parallelism (static) (#480); Co-authored-by: Ying Sheng <ying.sheng@databricks.com>; Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>; Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>; Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu> | 2024-05-27 21:24:10 -07:00 |
| Lianmin Zheng | 55c1643627 | Improve benchmark scripts & rename some scripts (#477) | 2024-05-26 12:51:45 -07:00 |
| Lianmin Zheng | 0fafc5606b | port fp8 mixtral (#460) | 2024-05-21 11:46:35 -07:00 |
| Yuanhan Zhang | 0992d85f92 | support llava video (#426) | 2024-05-13 16:57:00 -07:00 |
| Liangsheng Yin | 39191c8515 | Cache optimizations (#418) | 2024-05-13 12:47:13 +08:00 |
| Lianmin Zheng | 3fc97f6709 | Move openai api server into a separate file (#429) | 2024-05-12 06:41:32 -07:00 |
| Lianmin Zheng | aee4f523cf | Fix logit processor bugs (#427) | 2024-05-12 04:54:07 -07:00 |
| Lianmin Zheng | 7023f413c6 | Clean up (#422) | 2024-05-11 20:55:00 -07:00 |
| Liangsheng Yin | 62b3812b69 | Time cost utils (#355) | 2024-04-09 23:27:31 +08:00 |
| Alessio Dalla Piazza | d5ae2ebaa2 | Add Support for API Key Authentication (#230) | 2024-03-11 05:16:10 -07:00 |
| Liangsheng Yin | 1b35547927 | Organize server_args (#277) | 2024-03-11 20:06:52 +08:00 |
| Lianmin Zheng | faba293a0d | Improve gemma and documentations (#278) | 2024-03-11 04:43:39 -07:00 |
| Liangsheng Yin | 89885b31ef | Gemma Support (#256) | 2024-03-11 12:14:27 +08:00 |
| psych0v0yager | 9de9a46815 | Added the ability to Modify the Context Length (#210) | 2024-02-20 16:22:56 -08:00 |
| Liangsheng Yin | b1a3a454ee | add --disable-disk-cache (#160); Co-authored-by: Ja1Zhou <50169346+Ja1Zhou@users.noreply.github.com> | 2024-02-08 00:50:12 +08:00 |
| Lianmin Zheng | 23f05005fd | Format code & move functions (#155) | 2024-02-06 13:27:46 -08:00 |
| Liangsheng Yin | 26f0bedc8f | jump-forward rename (#144) | 2024-02-05 16:50:37 +08:00 |
| Ying Sheng | e095b16236 | Add max_prefill_num_token into server arguments (#133) | 2024-02-03 02:35:54 -08:00 |
| Jay Zhou | 4a634cf646 | [Feature] Allow specifying all ports to use in advance (#116) | 2024-01-30 08:34:51 -08:00 |
| Lianmin Zheng | 6f560c761b | Improve the control of streaming and improve the first token latency in streaming (#117) | 2024-01-29 17:05:42 -08:00 |
| Liangsheng Yin | 01ee0fbc05 | fast regex decode; Auto-detect constant str path in regex FSM, then extend instead. | 2024-01-25 01:16:25 +08:00 |
| Liangsheng Yin | 40ab1f0129 | Fix the possible bug of decode out of memory (#36) | 2024-01-19 11:01:15 -08:00 |
| Cody Yu | 23471f9aa3 | Support v1/chat/completions (#50) | 2024-01-18 23:43:09 -08:00 |
| Lianmin Zheng | 22ec7bc2a1 | Expose more arguments to control the scheduling policy (#32) | 2024-01-17 18:37:02 -08:00 |
| Lianmin Zheng | 8024fc5eec | Fix streaming (#30) | 2024-01-17 16:38:20 -08:00 |
| Lianmin Zheng | f9d723816a | Teak mem fraction (#20) | 2024-01-17 04:43:17 -08:00 |
| Lianmin Zheng | bf51ddc6e5 | Improve docs & Rename Gemini -> VertexAI (#19) | 2024-01-17 02:54:41 -08:00 |
| Lianmin Zheng | 70359bf31a | Update benchmark scripts (#8) | 2024-01-15 16:12:57 -08:00 |