Author | Commit | Message | Date
Cody Yu | 26c3494152 | [Submodule] Change FlashInfer to import (#156) | 2024-02-06 19:28:29 -08:00
Lianmin Zheng | 23f05005fd | Format code & move functions (#155) | 2024-02-06 13:27:46 -08:00
Cody Yu | a7334aeea1 | Support decode token logprobs (#130) | 2024-02-06 12:24:55 -08:00
Arcmoon | 3ae78a09b3 | Add gptq quantization model support (#141) | 2024-02-06 11:35:04 -08:00
Cody Yu | ccbe1e67d8 | Temporary fix OpenAI API for Pydantic v1/v2 (#153) | 2024-02-06 11:34:15 -08:00
LiviaSun | e2bf732bc3 | add openai error handler with retry and logger (#148) | 2024-02-05 20:38:41 -08:00
Cody Yu | 322421fae3 | Add warmup to SRT server (#146) | 2024-02-05 14:21:16 -08:00
Liangsheng Yin | 26f0bedc8f | jump-forward rename (#144) | 2024-02-05 16:50:37 +08:00
Yaya Sy | 82fa69b3cc | fix undfined variable (#142) | 2024-02-04 14:27:52 -08:00
Liangsheng Yin | bb3a3b6675 | Support Faster JSON decoding for llava (#137) | 2024-02-03 23:32:05 +08:00
    When sending fast-forwarded reqs to model_rpc, re-calculate `pad_input_ids`
Ying Sheng | 45d6592d40 | Fix no-cache mode (#136) | 2024-02-03 04:59:06 -08:00
Ying Sheng | f6bfe3aaff | Release 0.1.11 (#134) | 2024-02-03 02:50:13 -08:00
Ying Sheng | e095b16236 | Add max_prefill_num_token into server arguments (#133) | 2024-02-03 02:35:54 -08:00
Ying Sheng | 67be11c790 | fix bug of race condition in copy() | 2024-02-03 01:38:00 -08:00
Liangsheng Yin | cd8c3ccd95 | Fix is_multimodal_model judge (#132) | 2024-02-03 11:48:01 +08:00
Christopher Chou | 864425300f | Yi-VL Model (#112) | 2024-02-01 08:33:22 -08:00
Lianmin Zheng | c7af9f7393 | Fix a bug in llava-hd | 2024-01-31 18:52:15 +00:00
Lianmin Zheng | ad82bac6f5 | Fix model loading & format code (#125) | 2024-01-30 23:49:52 -08:00
Cody Yu | 71b54eea7d | Add cache metrics (#119) | 2024-01-30 22:13:14 -08:00
Lianmin Zheng | 74b3bfaaf8 | format code | 2024-01-30 16:36:10 +00:00
Jay Zhou | 4a634cf646 | [Feature] Allow specifying all ports to use in advance (#116) | 2024-01-30 08:34:51 -08:00
Lianmin Zheng | a49dc52bfa | release v0.1.10 | 2024-01-30 15:37:52 +00:00
Lianmin Zheng | 873d0e8537 | Ignore detokenization error | 2024-01-30 14:52:06 +00:00
Keith Stevens | 1d0fbe8e43 | [Feature] Adds basic support for image content in OpenAI chat routes (#113) | 2024-01-30 06:12:33 -08:00
Lianmin Zheng | 97aa9b3284 | Improve docs & Add JSON decode example (#121) | 2024-01-30 05:45:27 -08:00
Lianmin Zheng | 0617528632 | Update quick start examples (#120) | 2024-01-30 04:29:32 -08:00
Lianmin Zheng | 4ea92f8307 | Format code (#118) | 2024-01-29 17:08:12 -08:00
Junyang Lin | 6b0af2853c | Add qwen2 (#114) | 2024-01-29 17:06:02 -08:00
Lianmin Zheng | 6f560c761b | Improve the control of streaming and improve the first token latency in streaming (#117) | 2024-01-29 17:05:42 -08:00
Cody Yu | cd6872334e | Fix Mistral model loading (#108) | 2024-01-26 09:38:43 -08:00
    Co-authored-by: johndun <dunavent.jm@gmail.com>
Liangsheng Yin | 81561f8e2d | Flush Cache API (#103) | 2024-01-25 21:32:59 -08:00
Cody Yu | 3a581e9949 | Dynamic model class loading (#101) | 2024-01-25 15:29:07 -08:00
shiyi.c_98 | 0147f940dd | fix batch error for llava-hd (#98) | 2024-01-25 07:56:25 -08:00
parasol-aser | 23950056f0 | support speculative execution for openai API (#48) | 2024-01-25 01:57:06 -08:00
    Co-authored-by: Ying Sheng <sqy1415@gmail.com>
isaac-vidas | 0c457bae8f | Handle grayscale images in expand2square (#97) | 2024-01-24 16:23:11 -08:00
Haotian Liu | d3fc86a43e | Improve Chinese character streaming when the last char is half Chinese word. (#95) | 2024-01-24 12:23:27 -08:00
Liangsheng Yin | 01ee0fbc05 | fast regex decode | 2024-01-25 01:16:25 +08:00
    Auto-detect constant str path in regex FSM, then extend instead.
Lianmin Zheng | 6dceab4d17 | bump version to 0.1.9 | 2024-01-24 11:37:25 +00:00
Lianmin Zheng | c70b3cfa9e | Bump the version to v0.1.8 (#93) | 2024-01-24 03:33:34 -08:00
Ying Sheng | 489796c7ea | minor performance fix | 2024-01-24 10:45:44 +00:00
Lianmin Zheng | fa7a696d04 | Fix max_new_tokens for limited memory | 2024-01-24 10:44:32 +00:00
Lianmin Zheng | bef0b35902 | Fix llava & Fix multiprocessing | 2024-01-24 10:35:31 +00:00
shiyi.c_98 | c6576e820c | Llava-hd Support (#92) | 2024-01-24 01:51:21 -08:00
    Co-authored-by: Haotian Liu <liuhaotian.cn@gmail.com>
Lianmin Zheng | 99258181c6 | set start method to spawn | 2024-01-24 08:55:38 +00:00
isaac-vidas | 3de54a1b55 | Add health endpoint to SGLang runtime server (#90) | 2024-01-23 19:00:28 -08:00
Lianmin Zheng | 7358fa64f7 | Fix a bug in runtime backend | 2024-01-23 22:10:17 +00:00
Lianmin Zheng | 9a16fea012 | Return logprob for choices (#87) | 2024-01-23 05:07:30 -08:00
Lianmin Zheng | 959c4174b2 | Fix the chat template for QWen (#83) | 2024-01-22 21:46:47 -08:00
Lianmin Zheng | 94e05770db | Fix after QWen support (#82) | 2024-01-22 21:17:05 -08:00
Arcmoon | 63e97e5e4c | Suppport qwen model and solve some problems (#75) | 2024-01-22 20:14:51 -08:00