Commit Graph

  • 0992d85f92 support llava video (#426) Yuanhan Zhang 2024-05-14 07:57:00 +08:00
  • 5dc55a5f02 Handle truncation errors (#436) Lianmin Zheng 2024-05-13 15:56:00 -07:00
  • 4231a42fa8 Fix import of global_config Lianmin Zheng 2024-05-13 12:11:55 -07:00
  • 455c9ccc4a Update readme (#434) Lianmin Zheng 2024-05-13 00:17:02 -07:00
  • 39191c8515 Cache optimizations (#418) Liangsheng Yin 2024-05-13 12:47:13 +08:00
  • 562b8857d8 Improve error handling (#433) Lianmin Zheng 2024-05-12 20:49:04 -07:00
  • 04c0b21488 Allow input_ids in the input of the /generate endpoint (#363) Shannon Shen 2024-05-12 12:29:00 -10:00
  • 6e09cf6a15 Misc fixes (#432) Lianmin Zheng 2024-05-12 15:05:40 -07:00
  • 72bb344388 Update version to 0.1.15 (#431) Lianmin Zheng 2024-05-12 14:22:33 -07:00
  • 2d580e7a89 Fix flashinfer (#430) Lianmin Zheng 2024-05-12 08:18:53 -07:00
  • 3fc97f6709 Move openai api server into a separate file (#429) Lianmin Zheng 2024-05-12 06:41:32 -07:00
  • abc548c707 Minor fix for the import path (#428) Lianmin Zheng 2024-05-12 05:10:35 -07:00
  • aee4f523cf Fix logit processor bugs (#427) Lianmin Zheng 2024-05-12 04:54:07 -07:00
  • 7023f413c6 Clean up (#422) Lianmin Zheng 2024-05-11 20:55:00 -07:00
  • 09deb20dee Optimize the memory usage of logits processor (#420) Lianmin Zheng 2024-05-11 16:56:42 -07:00
  • 33b242df30 Compat with latest VLLM 0.4.2 main + fork.number rename + Flashinfer 0.0.4 (#380) Qubitium 2024-05-12 07:37:49 +08:00
  • a511a2d089 restrict vllm version Lianmin Zheng 2024-05-09 15:49:29 -07:00
  • 6ec65f4555 Make public APIs more standard. (#416) Liangsheng Yin 2024-05-09 15:39:22 +08:00
  • e2c31fca5c Include finish reason in meta info response (#415) Enrique Shockwave 2024-05-09 08:14:01 +01:00
  • d5de20a3ee Fix sync() when fork(1) (#412) Liangsheng Yin 2024-05-08 15:15:18 +08:00
  • 4a1c6ae2ce Add Cohere Command R chat template (#411) YoungJoong Noah Kim 2024-05-07 16:18:15 +09:00
  • 14522e6a26 Organize Benchmark (#381) Liangsheng Yin 2024-05-05 16:14:17 +08:00
  • 183df47282 SamplingParams add "spaces_between_special_tokens" argument (#392) ZhouXingg 2024-05-01 07:17:12 +08:00
  • 5c5aba5900 Adding RAG tracing & eval cookbook using Parea (#390) Joschka Braun 2024-04-30 19:13:28 -04:00
  • ba67101f99 Fix chatml template (#406) Lianmin Zheng 2024-04-30 15:53:39 -07:00
  • 95c4e0dfac Format Benchmark Code (#399) Liangsheng Yin 2024-04-28 21:06:22 +08:00
  • 19818b9c2f Minor: style improvement of radix_cache and memory_pool (#395) Liangsheng Yin 2024-04-26 01:01:36 +08:00
  • 9216b10678 Improve performance when running with full parallel (#394) Liangsheng Yin 2024-04-25 17:29:07 +08:00
  • da19434c2f Benchmark Updates (#382) Liangsheng Yin 2024-04-24 02:23:01 +08:00
  • 150d7020ed Revert removing the unused imports (#385) Liangsheng Yin 2024-04-23 22:36:33 +08:00
  • 9acc6e3504 add .isort.cfg (#378) Liangsheng Yin 2024-04-22 22:38:09 +08:00
  • cf9d8efdd3 llama3 instruct template (#372) Enrique Shockwave 2024-04-21 17:40:12 +01:00
  • 1bf1cf1953 Reduce overhead when fork(1) (#375) Liangsheng Yin 2024-04-21 17:25:14 +08:00
  • e822e5900b Optimize radix tree matching (#364) Ke Bao 2024-04-18 00:47:37 +08:00
  • ca4f1ab89c Update model support in readme (#370) Ying Sheng 2024-04-17 00:16:32 -07:00
  • 2b6d999191 Fix issue #367 – System message not supported for Anthropic (anthropic.BadRequestError) (#368) Fronx 2024-04-16 20:18:24 +02:00
  • 65501a9cf1 Fix commandr import; format code Lianmin Zheng 2024-04-16 18:10:12 +00:00
  • db611066ad support command-r (#369) ZhouXingg 2024-04-17 01:36:51 +08:00
  • c93293c57e Update README.md (#358) Ikko Eltociear Ashimine 2024-04-10 00:39:30 +09:00
  • 62b3812b69 Time cost utils (#355) Liangsheng Yin 2024-04-09 23:27:31 +08:00
  • 550a4f78f3 Fix typos in infer_batch.py (#354) Tom Dörr 2024-04-09 09:10:05 +02:00
  • ff99c38a07 Add timeout to get_meta_info (#346) SimoneRaponi 2024-04-03 16:22:06 +02:00
  • c9de3e169c Eliminate 2 gpu ops during sampling when logit_bias is zero (#338) Qubitium 2024-04-03 13:56:06 +08:00
  • ed27a6b992 Revert "Eliminate 2 gpu ops during sampling when logit_bias is zero" (#345) Liangsheng Yin 2024-04-03 12:45:01 +08:00
  • 463c6632a8 Eliminate 2 gpu ops during sampling when logit_bias is zero (#343) Liangsheng Yin 2024-04-02 19:14:55 +08:00
  • b0890631a0 fix gemma import error Ying Sheng 2024-04-01 07:35:58 +00:00
  • cb389c91bc Fix llava parallelism/fork bug (#315) Junlong Li 2024-03-29 10:24:54 +08:00
  • eddaa2b599 Add support for new autogptq quant_config.checkpoint_format (#332) Qubitium 2024-03-29 10:24:16 +08:00
  • 2af565b3bb [model] DBRX-instruct support (#337) Liangsheng Yin 2024-03-29 01:05:19 +08:00
  • 3842eba5fa Logprobs Refractor (#331) Liangsheng Yin 2024-03-28 14:34:49 +08:00
  • 24e59f5350 model_runner simplify (#329) Liangsheng Yin 2024-03-24 19:48:37 +08:00
  • 7523541962 model_rpc style improvement (#293) Liangsheng Yin 2024-03-24 15:41:24 +08:00
  • 64ee9c030e Openrouter usage example (#327) Jani Monoses 2024-03-23 19:16:24 +02:00
  • 30d17840fc Update dependencies (#326) Jani Monoses 2024-03-23 19:15:58 +02:00
  • ce216c80dc Cleanup codebase: removed unnecessary code/logic (#298) Qubitium 2024-03-24 01:15:16 +08:00
  • 51104cd405 Update version to v0.1.14 (#324) Lianmin Zheng 2024-03-22 13:42:22 -07:00
  • e2b2f0a213 Support oai in benchmark/mmlu (#323) Lianmin Zheng 2024-03-22 13:37:57 -07:00
  • b57abe1663 Add StableLM model. (#301) Jani Monoses 2024-03-22 22:24:08 +02:00
  • e57f079275 Use Anthropic messages API (#304) Jani Monoses 2024-03-22 22:23:31 +02:00
  • 08df63a6f8 [Fix/Potential Bugs] Can not correctly import models in python/sglang/srt/models (#311) Li Bo 2024-03-23 03:19:58 +08:00
  • 77835756a7 Fix outlines-0.0.35 incompatibility (#291) ZhouGongZaiShi 2024-03-23 03:19:11 +08:00
  • ed31579971 Fix marlin model loading compat with autogptq (#290) Liurl 2024-03-13 13:15:43 +08:00
  • 92e2d74fd0 Fix env (docker) compat due to __file__ usage (#288) Qubitium 2024-03-13 13:02:48 +08:00
  • d9b3b01883 enable marlin kernels (#286) Enrique Shockwave 2024-03-13 02:10:12 +00:00
  • 745ea007ac Fix Incorrect CURL Request Example in README (#287) Arsalan 2024-03-12 22:09:38 -04:00
  • ad1dd74673 Fix flashinfer >= 0.0.3 compat (#282) Qubitium 2024-03-12 21:45:58 +08:00
  • eb4308c4c9 adding the triton docker build minimal example (#242) Arsalan 2024-03-12 03:16:06 -04:00
  • b2eb080501 Fix Runtime missing some ServerArgs options (#281) Qubitium 2024-03-11 22:32:15 +08:00
  • 4aa5dd2c5f Update version to v0.1.13 (#280) Lianmin Zheng 2024-03-11 05:49:27 -07:00
  • 13662fd533 Fix RuntimeEndpoint (#279) Lianmin Zheng 2024-03-11 05:24:24 -07:00
  • d5ae2ebaa2 Add Support for API Key Authentication (#230) Alessio Dalla Piazza 2024-03-11 13:16:10 +01:00
  • 1b35547927 Organize server_args (#277) Liangsheng Yin 2024-03-11 20:06:52 +08:00
  • faba293a0d Improve gemma and documentations (#278) Lianmin Zheng 2024-03-11 04:43:39 -07:00
  • 89885b31ef Gemma Support (#256) Liangsheng Yin 2024-03-11 12:14:27 +08:00
  • 64fe311593 replace skip_embed with input_embeds (#222) Geary.Z 2024-03-11 10:04:52 +08:00
  • a7ace9c88d Fix qwen config (#261) Liangsheng Yin 2024-03-11 09:54:18 +08:00
  • a833de05d3 Add logo (#275) Lianmin Zheng 2024-03-10 18:51:47 -07:00
  • 30d67b2bca Add set_var to interpreter.py (#263) Lin Tianchuan 2024-03-07 23:20:11 +08:00
  • b0b722ee8e Refactor ChatTemplate for Enhanced Clarity and Efficiency (#201) Xinwei Xiong 2024-03-03 17:52:36 +08:00
  • 01b07ea3ac Add SSL Cert Functionality (#224) Srinivas Billa 2024-03-03 09:41:41 +00:00
  • dfb13ac455 Fix addr reuse in check_port (#253) Liangsheng Yin 2024-03-03 17:09:16 +08:00
  • ec90b9c054 Upload agent_calls.jsonl download link (#226) Liangsheng Yin 2024-02-24 19:03:46 +08:00
  • 9759d927cf fix chatml template (#195) Enrique Shockwave 2024-02-24 08:34:22 +00:00
  • 8d0a7fae3b Fix interpreter.py get_var(var_name) in text iter when stream is not enabled (#198) Zhang Wenbin 2024-02-24 16:27:34 +08:00
  • c4e9ebe3a4 Fix stop str merging (#225) Liangsheng Yin 2024-02-24 16:05:21 +08:00
  • 3c2c5869ad Support outlines > 0.0.31 (#219) Cody Yu 2024-02-23 23:06:17 -08:00
  • 4cb9aaedf3 Fix logprobs with logprob_start_len (#193) Cody Yu 2024-02-22 10:33:03 -08:00
  • 9de9a46815 Added the ability to Modify the Context Length (#210) psych0v0yager 2024-02-20 18:22:56 -06:00
  • ce3b261053 Update README.md (#207) Ikko Eltociear Ashimine 2024-02-20 02:09:03 +09:00
  • 91e036334f Adjust outlines version. (#200) Liangsheng Yin 2024-02-17 13:40:39 +08:00
  • 2a74748b2f Pin outlines version (#196) Cody Yu 2024-02-16 13:01:40 -08:00
  • 63ba630bbb Refactor decoding logprob and add completion_tokens_wo_jump_forward (#189) Cody Yu 2024-02-15 10:54:20 -08:00
  • 6493256b7d improve print Lianmin Zheng 2024-02-12 12:43:48 +00:00
  • 06008bc295 Fix server launch for jupyter notebook (#186) Lianmin Zheng 2024-02-12 04:43:14 -08:00
  • bb824da41a Add Together and AzureOpenAI examples (#184) Lianmin Zheng 2024-02-12 01:06:38 -08:00
  • 931213245c correct reference dtype openai.py (#181) Yaya Sy 2024-02-11 22:26:20 +01:00
  • c97fdae4aa correct a mistake on the README.md (#182) Yaya Sy 2024-02-11 22:25:57 +01:00
  • 624b21e742 Update version to 0.1.12 (#178) Lianmin Zheng 2024-02-11 06:43:45 -08:00
  • c51020cf0c Fix the chat template for llava-v1.6-34b & format code (#177) Lianmin Zheng 2024-02-11 05:50:13 -08:00
  • 50afed4eaa Support extra field regex in OpenAI API (#172) Cody Yu 2024-02-10 17:21:33 -08:00