65 Commits

Author SHA1 Message Date
Yineng Zhang
948625799e docs: init readthedocs support (#783) 2024-07-28 16:50:31 +10:00
Lianmin Zheng
30db99b3d9 Rename prefill_token_logprobs -> input_token_logprobs; decode_token_logprobs -> output_token_logprobs (#776) 2024-07-27 19:50:34 -07:00
Max Shawabkeh
5ad033a070 Fix StreamExecutor.fork() losing the current role start index. (#684) 2024-07-20 23:32:11 -07:00
Ying Sheng
51fda1439f Update Readme (#660)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2024-07-19 09:54:01 -07:00
Liangsheng Yin
564a898ad9 Optimize mem indices mangement (#619) 2024-07-13 23:39:37 -07:00
胡译文
02b7258658 [Feat] Expose logprob options to sgl.gen API (#503)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
2024-07-09 00:35:39 -07:00
prophe
d557e9f3b7 Update chat template for qwen and yi-1.5. (#530) 2024-07-08 23:55:44 -07:00
Mingyi
c0982ac553 Fix Llava model (#594) 2024-07-06 00:58:46 -07:00
Ying Sheng
dc1b8bcfaa Format (#593) 2024-07-05 10:06:17 -07:00
Ying Sheng
75b31a2a88 Update run_batch interface and max_prefill_tokens (#574) 2024-06-30 18:26:04 -07:00
sglang
11616fc6bd Minor fix in compiler & format (#545) 2024-06-29 23:42:14 -07:00
Ying Sheng
fb9296f0ed Higher priority for user input of max_prefill_tokens & format (#540) 2024-06-12 21:48:40 -07:00
胡译文
87260b7bfd Litellm Backend (#502) 2024-06-07 12:24:28 -07:00
Lianmin Zheng
ced77c6626 Rename api_num_spec_tokens -> num_api_spec_tokens (#458) 2024-05-20 18:44:23 -07:00
Lianmin Zheng
8dbdc018a3 Abort disconnected requests (#457) 2024-05-20 18:41:21 -07:00
Ying Sheng
3e684be7a3 Fix openai speculative execution (#456) 2024-05-20 17:01:13 -07:00
LiviaSun
ec380dfd30 openai chat speculative execution (#250)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2024-05-18 22:23:53 -07:00
Lianmin Zheng
8210ec60f4 Improve error handling & abort disconnected requests (#449) 2024-05-17 05:49:31 -07:00
Liangsheng Yin
690d162d97 Format code (#441) 2024-05-14 22:40:46 +08:00
Yuanhan Zhang
0992d85f92 support llava video (#426) 2024-05-13 16:57:00 -07:00
Lianmin Zheng
5dc55a5f02 Handle truncation errors (#436) 2024-05-13 15:56:00 -07:00
Lianmin Zheng
562b8857d8 Improve error handling (#433) 2024-05-12 20:49:04 -07:00
Qubitium
33b242df30 Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 (#380)
Co-authored-by: ZX <zx@lbx.dev>
Co-authored-by: ZhouXingg <165115237+ZhouXingg@users.noreply.github.com>
2024-05-11 16:37:49 -07:00
Liangsheng Yin
d5de20a3ee Fix sync() when fork(1) (#412) 2024-05-08 15:15:18 +08:00
YoungJoong Noah Kim
4a1c6ae2ce Add Cohere Command R chat template (#411) 2024-05-07 15:18:15 +08:00
Joschka Braun
5c5aba5900 Adding RAG tracing & eval cookbook using Parea (#390) 2024-04-30 16:13:28 -07:00
Liangsheng Yin
150d7020ed Revert removing the unused imports (#385) 2024-04-23 22:36:33 +08:00
Liangsheng Yin
9acc6e3504 add .isort.cfg (#378) 2024-04-22 22:38:09 +08:00
Enrique Shockwave
cf9d8efdd3 llama3 instruct template (#372) 2024-04-21 09:40:12 -07:00
Liangsheng Yin
1bf1cf1953 Reduce overhead when fork(1) (#375) 2024-04-21 17:25:14 +08:00
SimoneRaponi
ff99c38a07 Add timeout to get_meta_info (#346)
Co-authored-by: simone <simone.raponi@equixely.com>
2024-04-03 22:22:06 +08:00
Junlong Li
cb389c91bc Fix llava parallelism/fork bug (#315) 2024-03-28 19:24:54 -07:00
Liangsheng Yin
2af565b3bb [model] DBRX-instruct support (#337) 2024-03-28 10:05:19 -07:00
Liangsheng Yin
3842eba5fa Logprobs Refractor (#331) 2024-03-28 14:34:49 +08:00
Jani Monoses
e57f079275 Use Anthropic messages API (#304) 2024-03-22 13:23:31 -07:00
Liangsheng Yin
89885b31ef Gemma Support (#256) 2024-03-11 12:14:27 +08:00
Lin Tianchuan
30d67b2bca Add set_var to interpreter.py (#263) 2024-03-07 23:20:11 +08:00
Xinwei Xiong
b0b722ee8e Refactor ChatTemplate for Enhanced Clarity and Efficiency (#201) 2024-03-03 17:52:36 +08:00
Enrique Shockwave
9759d927cf fix chatml template (#195) 2024-02-24 16:34:22 +08:00
Zhang Wenbin
8d0a7fae3b Fix interpreter.py get_var(var_name) in text iter when stream is not enabled (#198) 2024-02-24 16:27:34 +08:00
Liangsheng Yin
c4e9ebe3a4 Fix stop str merging (#225)
Co-authored-by: Enrique Shockwave <33002121+qeternity@users.noreply.github.com>
2024-02-24 16:05:21 +08:00
Lianmin Zheng
c51020cf0c Fix the chat template for llava-v1.6-34b & format code (#177) 2024-02-11 05:50:13 -08:00
Lianmin Zheng
23f05005fd Format code & move functions (#155) 2024-02-06 13:27:46 -08:00
Ying Sheng
67be11c790 fix bug of race condition in copy() 2024-02-03 01:38:00 -08:00
Christopher Chou
864425300f Yi-VL Model (#112) 2024-02-01 08:33:22 -08:00
Lianmin Zheng
0617528632 Update quick start examples (#120) 2024-01-30 04:29:32 -08:00
parasol-aser
23950056f0 support speculative execution for openai API (#48)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
2024-01-25 01:57:06 -08:00
Liangsheng Yin
01ee0fbc05 fast regex decode
Auto-detect constant str path in regex FSM, then extend instead.
2024-01-25 01:16:25 +08:00
Lianmin Zheng
7358fa64f7 Fix a bug in runtime backend 2024-01-23 22:10:17 +00:00
Lianmin Zheng
9a16fea012 Return logprob for choices (#87) 2024-01-23 05:07:30 -08:00