Commit Graph

  • 4d303c4fa3 Fix token usage with jump forward (#174) Cody Yu 2024-02-09 20:06:15 -08:00
  • 37b42297f8 import outlines (#168) Liangsheng Yin 2024-02-09 10:13:02 +08:00
  • cba5027332 Fix BaseCache metric (#170) Cody Yu 2024-02-08 17:23:09 -08:00
  • a6aa46dd3f minor Ying Sheng 2024-02-08 04:35:25 +00:00
  • 405f26b00b Add Auth Token to RuntimeEndpoint (#162) Srinivas Billa 2024-02-08 04:07:31 +00:00
  • b1a3a454ee add --disable-disk-cache (#160) Liangsheng Yin 2024-02-08 00:50:12 +08:00
  • 79e6b84bec Update README.md Ying Sheng 2024-02-06 23:14:59 -08:00
  • 26c3494152 [Submodule] Change FlashInfer to import (#156) Cody Yu 2024-02-06 19:28:29 -08:00
  • cb8e1982f8 Update README.md Ying Sheng 2024-02-06 18:44:37 -08:00
  • 23f05005fd Format code & move functions (#155) Lianmin Zheng 2024-02-06 13:27:46 -08:00
  • a7334aeea1 Support decode token logprobs (#130) Cody Yu 2024-02-06 12:24:55 -08:00
  • ee1df26a77 Update README.md Lianmin Zheng 2024-02-06 11:35:42 -08:00
  • 3ae78a09b3 Add gptq quantization model support (#141) Arcmoon 2024-02-07 03:35:04 +08:00
  • ccbe1e67d8 Temporary fix OpenAI API for Pydantic v1/v2 (#153) Cody Yu 2024-02-06 11:34:15 -08:00
  • e2bf732bc3 add openai error handler with retry and logger (#148) LiviaSun 2024-02-06 12:38:41 +08:00
  • 322421fae3 Add warmup to SRT server (#146) Cody Yu 2024-02-05 14:21:16 -08:00
  • 8ff870bf3e improve docs Lianmin Zheng 2024-02-05 11:22:06 +00:00
  • 26f0bedc8f jump-forward rename (#144) Liangsheng Yin 2024-02-05 16:50:37 +08:00
  • 82fa69b3cc fix undefined variable (#142) Yaya Sy 2024-02-04 23:27:52 +01:00
  • 8fb7459e08 update json decoding docs Lianmin Zheng 2024-02-03 17:42:01 -08:00
  • bb3a3b6675 Support Faster JSON decoding for llava (#137) Liangsheng Yin 2024-02-03 23:32:05 +08:00
  • 45d6592d40 Fix no-cache mode (#136) Ying Sheng 2024-02-03 04:59:06 -08:00
  • f6bfe3aaff Release 0.1.11 (#134) Ying Sheng 2024-02-03 02:50:13 -08:00
  • e095b16236 Add max_prefill_num_token into server arguments (#133) Ying Sheng 2024-02-03 02:35:54 -08:00
  • 67be11c790 fix bug of race condition in copy() Ying Sheng 2024-02-03 01:38:00 -08:00
  • cd8c3ccd95 Fix is_multimodal_model judge (#132) Liangsheng Yin 2024-02-03 11:48:01 +08:00
  • 9c121f2a45 minor fix: result dump format hnyls2002 2024-02-02 09:58:24 +00:00
  • 03e04b2331 update docs for Yi-VL Lianmin Zheng 2024-02-01 22:44:05 +00:00
  • 864425300f Yi-VL Model (#112) Christopher Chou 2024-02-01 08:33:22 -08:00
  • 79cb018e4b Add city doc benchmark mode (#129) Liangsheng Yin 2024-02-01 13:38:47 +08:00
  • c7af9f7393 Fix a bug in llava-hd Lianmin Zheng 2024-01-31 18:52:15 +00:00
  • 876db8dc7a Update sampling_params.md Lianmin Zheng 2024-01-31 10:18:43 -08:00
  • ad82bac6f5 Fix model loading & format code (#125) Lianmin Zheng 2024-01-30 23:49:52 -08:00
  • 71b54eea7d Add cache metrics (#119) Cody Yu 2024-01-30 22:13:14 -08:00
  • 74b3bfaaf8 format code Lianmin Zheng 2024-01-30 16:36:10 +00:00
  • 4a634cf646 [Feature] Allow specifying all ports to use in advance (#116) Jay Zhou 2024-01-30 08:34:51 -08:00
  • a49dc52bfa release v0.1.10 Lianmin Zheng 2024-01-30 15:37:43 +00:00
  • 873d0e8537 Ignore detokenization error Lianmin Zheng 2024-01-30 14:52:06 +00:00
  • 1d0fbe8e43 [Feature] Adds basic support for image content in OpenAI chat routes (#113) Keith Stevens 2024-01-30 23:12:33 +09:00
  • 97aa9b3284 Improve docs & Add JSON decode example (#121) Lianmin Zheng 2024-01-30 05:45:27 -08:00
  • 0617528632 Update quick start examples (#120) Lianmin Zheng 2024-01-30 04:29:32 -08:00
  • 4ea92f8307 Format code (#118) Lianmin Zheng 2024-01-29 17:08:12 -08:00
  • 6b0af2853c Add qwen2 (#114) Junyang Lin 2024-01-30 09:06:02 +08:00
  • 6f560c761b Improve the control of streaming and improve the first token latency in streaming (#117) Lianmin Zheng 2024-01-29 17:05:42 -08:00
  • cd6872334e Fix Mistral model loading (#108) Cody Yu 2024-01-26 09:38:43 -08:00
  • 81561f8e2d Flush Cache API (#103) Liangsheng Yin 2024-01-26 13:32:59 +08:00
  • 3a581e9949 Dynamic model class loading (#101) Cody Yu 2024-01-25 15:29:07 -08:00
  • 0147f940dd fix batch error for llava-hd (#98) shiyi.c_98 2024-01-25 07:56:25 -08:00
  • 23950056f0 support speculative execution for openai API (#48) parasol-aser 2024-01-25 03:57:06 -06:00
  • 93414c8238 Add a link to HF paper page Lianmin Zheng 2024-01-24 22:25:33 -08:00
  • ed7c7eca0e Update README.md Lianmin Zheng 2024-01-24 16:52:21 -08:00
  • 0c457bae8f Handle grayscale images in expand2square (#97) isaac-vidas 2024-01-24 19:23:11 -05:00
  • d3fc86a43e Improve Chinese character streaming when the last char is half a Chinese word (#95) Haotian Liu 2024-01-24 14:23:27 -06:00
  • 01ee0fbc05 fast regex decode Liangsheng Yin 2024-01-25 01:16:25 +08:00
  • 711d343530 add a batch llava example Lianmin Zheng 2024-01-24 11:44:07 +00:00
  • 6dceab4d17 bump version to 0.1.9 Lianmin Zheng 2024-01-24 11:37:25 +00:00
  • c70b3cfa9e Bump the version to v0.1.8 (#93) Lianmin Zheng 2024-01-24 03:33:34 -08:00
  • 489796c7ea minor performance fix Ying Sheng 2024-01-24 10:45:44 +00:00
  • fa7a696d04 Fix max_new_tokens for limited memory Lianmin Zheng 2024-01-24 10:44:32 +00:00
  • bef0b35902 Fix llava & Fix multiprocessing Lianmin Zheng 2024-01-24 10:35:31 +00:00
  • c6576e820c Llava-hd Support (#92) shiyi.c_98 2024-01-24 01:51:21 -08:00
  • 99258181c6 set start method to spawn Lianmin Zheng 2024-01-24 08:55:38 +00:00
  • 3de54a1b55 Add health endpoint to SGLang runtime server (#90) isaac-vidas 2024-01-23 22:00:28 -05:00
  • 7358fa64f7 Fix a bug in runtime backend Lianmin Zheng 2024-01-23 22:10:17 +00:00
  • 9a16fea012 Return logprob for choices (#87) Lianmin Zheng 2024-01-23 05:07:30 -08:00
  • 9e037c822c Update README.md Lianmin Zheng 2024-01-23 03:43:19 -08:00
  • 9076386d90 Fix SRT endpoint api json syntax (#84) 0xWe11es.eth 2024-01-23 16:25:26 +08:00
  • 959c4174b2 Fix the chat template for QWen (#83) Lianmin Zheng 2024-01-22 21:46:47 -08:00
  • 94e05770db Fix after QWen support (#82) Lianmin Zheng 2024-01-22 21:17:05 -08:00
  • 63e97e5e4c Support qwen model and solve some problems (#75) Arcmoon 2024-01-23 12:14:51 +08:00
  • e08bca2840 Support load fine-tuned LLaVA model (#80) isaac-vidas 2024-01-22 21:15:48 -05:00
  • cd3ccb2ed7 Add a note about triton version for older GPUs (#72) Lianmin Zheng 2024-01-21 16:51:45 -08:00
  • 3f5c2f4c4a Add an async example (#37) Ying Sheng 2024-01-21 15:17:30 -08:00
  • 007eeb4eb9 Fix the error message and dependency of openai backend (#71) Lianmin Zheng 2024-01-21 14:56:25 -08:00
  • e8f2b155fe Update README.md Ying Sheng 2024-01-21 02:45:58 -08:00
  • 723f042163 release v0.1.7 & fix bugs Lianmin Zheng 2024-01-21 10:31:02 +00:00
  • 585eababa1 Improve error message of openai Lianmin Zheng 2024-01-21 10:13:45 +00:00
  • cc3ada983f Bump version to 0.1.6 (#68) Lianmin Zheng 2024-01-21 01:45:02 -08:00
  • a837166e6f Fix select and normalized logprobs (#67) Lianmin Zheng 2024-01-21 01:39:23 -08:00
  • 11f3cca64f Fix select (#64) Lianmin Zheng 2024-01-20 23:20:35 -08:00
  • ca13f3b8c5 Disk FSM cache and adjust code. (#63) Liangsheng Yin 2024-01-21 13:26:11 +08:00
  • 0b2efc2adc Update README.md (#58) Ikko Eltociear Ashimine 2024-01-20 14:00:29 +09:00
  • f30abd090a Improve error message & Add vicuna template (#57) Lianmin Zheng 2024-01-19 17:03:33 -08:00
  • 40ab1f0129 Fix the possible bug of decode out of memory (#36) Liangsheng Yin 2024-01-20 03:01:15 +08:00
  • 199e82a15d Format code & Improve readme (#52) Lianmin Zheng 2024-01-18 23:51:19 -08:00
  • 23471f9aa3 Support v1/chat/completions (#50) Cody Yu 2024-01-18 23:43:09 -08:00
  • 61d4c93962 Support stream=True in v1/completions (#49) Cody Yu 2024-01-18 17:00:56 -08:00
  • 98a3e8ef78 Add a llava example (#47) Lianmin Zheng 2024-01-18 13:46:38 -08:00
  • 2b079f8931 Increase interpreter parallelism (#46) Lianmin Zheng 2024-01-18 13:30:10 -08:00
  • 05b4c398df Document sampling parameters (#45) Lianmin Zheng 2024-01-18 11:49:27 -08:00
  • dafafe5b11 Use HTTP link in 3rdparty module (#42) Cody Yu 2024-01-18 11:18:22 -08:00
  • b240f75100 Add a parallel sampling case (#34) Lianmin Zheng 2024-01-17 22:26:32 -08:00
  • 501f944445 Bump version to 0.1.5 (#33) Lianmin Zheng 2024-01-17 21:14:31 -08:00
  • 22ec7bc2a1 Expose more arguments to control the scheduling policy (#32) Lianmin Zheng 2024-01-17 18:37:02 -08:00
  • c0454b323c Add option to return metadata in async streaming (#18) Christopher Chou 2024-01-17 18:15:02 -08:00
  • 8024fc5eec Fix streaming (#30) Lianmin Zheng 2024-01-17 16:38:20 -08:00
  • 70528762bf update readme Lianmin Zheng 2024-01-17 10:42:55 -08:00
  • 71d30d6ddc Update README.md Ying Sheng 2024-01-17 09:49:53 -08:00
  • f9d723816a Tweak mem fraction (#20) Lianmin Zheng 2024-01-17 04:43:17 -08:00
  • bf51ddc6e5 Improve docs & Rename Gemini -> VertexAI (#19) Lianmin Zheng 2024-01-17 02:54:41 -08:00
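
A one-line listing in this shape (abbreviated hash, subject, author, date with timezone) can be regenerated from a local clone of the repository; this is a minimal sketch using `git log` format placeholders, not the exact command used to produce the list above:

```shell
# One line per commit: abbreviated hash, subject, author name, ISO-style date.
# Run inside a clone of the repository whose history you want to list.
git log --date=format:'%Y-%m-%d %H:%M:%S %z' \
        --pretty=format:'%h %s %an %ad'
```

The abbreviated-hash length (`%h`) is chosen by git for uniqueness and may differ from the 10-character hashes shown here; `--abbrev=10` pins it if an exact match is needed.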