Commit Graph

  • c020f9ceda Support chunked prefill when radix cache is disabled (#811) Liangsheng Yin 2024-08-01 00:29:01 -07:00
  • ca600e8cd6 Add support for logprobs in OpenAI chat API (#852) yichuan~ 2024-08-01 15:08:21 +08:00
  • 0c0c81372e Fix #857 (#858) Kai Fronsdal 2024-08-01 00:05:39 -07:00
  • 90286d8576 Add troubleshooting doc (#856) Ying Sheng 2024-08-01 00:05:26 -07:00
  • 5e7dd984fe Fix llama for classification (#855) Ying Sheng 2024-07-31 15:48:31 -07:00
  • bc3eaac2b8 chore: update flashinfer to v0.1.3 (#850) Yineng Zhang 2024-08-01 02:37:05 +08:00
  • a78d98de19 misc: update e2e test paths config (#848) Yineng Zhang 2024-07-31 16:37:29 +08:00
  • 7d5ed7c6ee docs: update README.md (#843) Ikko Eltociear Ashimine 2024-07-31 11:48:18 +09:00
  • a6c7ebbbcb Add req slots leaking check (#842) Liangsheng Yin 2024-07-30 18:29:01 -07:00
  • bb0501c0d9 Fix List input bug (#838) yichuan~ 2024-07-31 04:40:51 +08:00
  • 6b0f2e9088 Add --max-total-tokens (#840) Liangsheng Yin 2024-07-30 13:33:55 -07:00
  • 1edd4e07d6 chore: bump v0.2.7 (#830) Yineng Zhang 2024-07-30 20:41:10 +10:00
  • 62c673c46f docs: add set up runner (#829) Yineng Zhang 2024-07-30 19:43:40 +10:00
  • 377c5dc9a9 misc: enable e2e test when push (#828) Yineng Zhang 2024-07-30 19:26:23 +10:00
  • f52eda35ea misc: update e2e test benchmark config (#825) Yineng Zhang 2024-07-30 19:19:23 +10:00
  • b579ecf028 Add awq_marlin (#826) Ying Sheng 2024-07-30 02:04:51 -07:00
  • e7487b08bc Adjust default mem fraction to avoid OOM (#823) Ying Sheng 2024-07-30 01:58:31 -07:00
  • ae5c0fc442 Support disable_ignore_eos in bench_serving.py (#824) Ying Sheng 2024-07-30 01:42:07 -07:00
  • a30d5d75bf feat: add pr e2e test (#822) Yineng Zhang 2024-07-30 18:31:26 +10:00
  • 17af39c5dc feat: add runner (#821) Yineng Zhang 2024-07-30 17:32:13 +10:00
  • daf593a385 Fix streaming bug (#820) ObjectNotFound 2024-07-30 15:32:07 +08:00
  • bece265f5a docs: update README (#819) Yineng Zhang 2024-07-30 16:17:50 +10:00
  • cdcbde5fc3 Code structure refactor (#807) Liangsheng Yin 2024-07-29 23:04:48 -07:00
  • 21e22b9e96 Fix LiteLLM kwargs (#817) Enrique Shockwave 2024-07-30 06:38:02 +01:00
  • a50c8a14b3 fix: use v0.2.5 for benchmark (#814) Yineng Zhang 2024-07-30 12:40:35 +10:00
  • db6089e6f3 Revert "Organize public APIs" (#815) Ying Sheng 2024-07-29 19:40:28 -07:00
  • 3520f75fb1 Remove inf value for chunked prefill size (#812) Liangsheng Yin 2024-07-29 18:34:25 -07:00
  • c8e9fed87a Organize public APIs (#809) Liangsheng Yin 2024-07-29 15:34:16 -07:00
  • 084fa54d37 Add support for OpenAI API : offline batch(file) processing (#699) yichuan~ 2024-07-30 04:07:18 +08:00
  • eba458bd19 Revert "Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118"" (#806) Ying Sheng 2024-07-29 12:20:42 -07:00
  • 3d1cb0af83 feat: add chat template for internlm2-chat (#802) Yineng Zhang 2024-07-30 05:18:03 +10:00
  • 7d352b4fdd Revert "fix: update flashinfer to 0.1.2 to fix sampling for cu118" (#805) Ying Sheng 2024-07-29 11:39:12 -07:00
  • 87064015d9 fix: update flashinfer to 0.1.2 to fix sampling for cu118 (#803) Yineng Zhang 2024-07-30 04:00:52 +10:00
  • 7cd4f244a4 Chunked prefill (#800) Liangsheng Yin 2024-07-29 03:32:58 -07:00
  • 98111fbe3e Revert "Chunked prefill support" (#799) Ying Sheng 2024-07-29 02:38:31 -07:00
  • 2ec39ab712 Chunked prefill support (#797) Liangsheng Yin 2024-07-29 02:21:50 -07:00
  • 8f6274c82b Add role documentation, add system begin & end tokens (#793) ObjectNotFound 2024-07-29 14:02:49 +08:00
  • 325a06c2de Fix logging (#796) Ying Sheng 2024-07-28 23:01:45 -07:00
  • 79f816292e Fix lazy import location (#795) Ying Sheng 2024-07-28 22:09:50 -07:00
  • b688fd858d Lazy-import third-party backends (#794) Eric Yoon 2024-07-29 13:57:41 +09:00
  • 5bd899243b Update README.md (#792) Ying Sheng 2024-07-28 21:57:23 -07:00
  • 8d908a937c Fix echo + logprob for OpenAI API when the prompt is a list (#791) Ying Sheng 2024-07-28 17:09:16 -07:00
  • dd7e8b9421 chore: add copyright for srt (#790) Yineng Zhang 2024-07-28 23:07:12 +10:00
  • 1f013d64eb docs: make badges center (#789) Yineng Zhang 2024-07-28 22:27:52 +10:00
  • 628e1fa760 docs: update README (#788) Yineng Zhang 2024-07-28 22:24:27 +10:00
  • c71880f896 Vectorize logprobs computation (#787) Ying Sheng 2024-07-28 05:22:14 -07:00
  • bcb6611a46 Update README.md Ying Sheng 2024-07-28 01:00:06 -07:00
  • fa2aa0db0a docs: update index (#786) Yineng Zhang 2024-07-28 17:22:00 +10:00
  • 6a387a69cc fix: exclude logo png in gitignore (#785) Yineng Zhang 2024-07-28 17:08:16 +10:00
  • 27f5ce0a6c fix: init readthedocs support (#784) Yineng Zhang 2024-07-28 16:55:54 +10:00
  • 948625799e docs: init readthedocs support (#783) Yineng Zhang 2024-07-28 16:50:31 +10:00
  • 68e5262699 fix: replace pillow with PIL in PACKAGE_LIST (#781) Yineng Zhang 2024-07-28 14:06:24 +10:00
  • bc1154c399 Bump version to 0.2.6 (#779) Lianmin Zheng 2024-07-27 20:29:33 -07:00
  • 752e643007 Allow disabling flashinfer sampling kernel (#778) Lianmin Zheng 2024-07-27 20:18:56 -07:00
  • 30db99b3d9 Rename prefill_token_logprobs -> input_token_logprobs; decode_token_logprobs -> output_token_logprobs (#776) Lianmin Zheng 2024-07-27 19:50:34 -07:00
  • 0a409bd438 Fix return_log_probs with cuda graph (#775) Lianmin Zheng 2024-07-27 19:15:09 -07:00
  • e4db4e5ba5 minor refactor: move check server args to server_args.py (#774) Mingyi 2024-07-27 19:03:40 -07:00
  • bbc07c4197 Move sampling logits to float32 (#773) Lianmin Zheng 2024-07-27 17:30:12 -07:00
  • a036d41980 Fix max new tokens (#772) Lianmin Zheng 2024-07-27 17:22:18 -07:00
  • f95e661757 Fix max_tokens for OpenAI chat completion API (#766) Lianmin Zheng 2024-07-27 15:44:27 -07:00
  • de854fb5c5 feat: add fake tag (#770) Yineng Zhang 2024-07-28 02:22:22 +10:00
  • f64b2a9bc0 Add slack invitation link. Lianmin Zheng 2024-07-27 06:29:15 -07:00
  • 9f95dcc64f Update readme (#769) Ying Sheng 2024-07-27 06:12:16 -07:00
  • 0736b27020 [Minor] Improve the code style in TokenizerManager (#767) Lianmin Zheng 2024-07-27 05:05:15 -07:00
  • 3fdab91912 Fix TransformerTokenizer init for chatglm2 & 3 (#761) Ke Bao 2024-07-27 17:44:46 +08:00
  • ba29504b21 Update supported models (#763) Liangsheng Yin 2024-07-26 22:53:53 -07:00
  • a72342f180 fix: not run workflows on fork repo (#762) Yineng Zhang 2024-07-27 14:51:33 +10:00
  • c3c74bf874 docs: update model support (#760) Yineng Zhang 2024-07-27 14:07:37 +10:00
  • d9fccfefe2 Fix context length (#757) Liangsheng Yin 2024-07-26 18:13:13 -07:00
  • 679ebcbbdc Deepseek v2 support (#693) Liangsheng Yin 2024-07-26 17:10:07 -07:00
  • 5bd06b4599 fix: use REPO_TOKEN (#755) Yineng Zhang 2024-07-27 05:56:30 +10:00
  • 9a61182732 fix: add release tag workflow (#754) Yineng Zhang 2024-07-27 05:48:38 +10:00
  • eeb2482186 feat: add release tag workflow (#753) Yineng Zhang 2024-07-27 05:37:02 +10:00
  • 3e455b016e misc: replace deprecated variable HUGGING_FACE_HUB_TOKEN with HF_TOKEN (#752) Yineng Zhang 2024-07-27 04:19:30 +10:00
  • 8628ab9c8b feat: add docker workflow (#751) Yineng Zhang 2024-07-27 03:54:51 +10:00
  • 1b77670f39 chore: bump v0.2.1 (#740) Yineng Zhang 2024-07-26 21:27:41 +10:00
  • 768e05d08f fix benchmark (#743) Yineng Zhang 2024-07-26 21:26:13 +10:00
  • 01fbb11bb7 docs: fix typo (#742) Yineng Zhang 2024-07-26 21:05:53 +10:00
  • 05d216da32 docs: add llama 3.1 405b instruction (#739) Yineng Zhang 2024-07-26 21:03:20 +10:00
  • 6b32bb1c0b misc: format (#741) Yineng Zhang 2024-07-26 21:00:51 +10:00
  • 40facad5f1 feat: support token ids in /v1/completions (#736) Toshiki Kataoka 2024-07-26 18:53:17 +09:00
  • da504445dc fix /generate without sampling_params (#734) Toshiki Kataoka 2024-07-26 17:27:56 +09:00
  • 252e0f7bbd fix: small bug for llama-405b fp16 (#733) Ying Sheng 2024-07-25 21:14:54 -07:00
  • 7f6f2f0f09 Update readme (#731) Ying Sheng 2024-07-25 09:13:37 -07:00
  • 7802df1e2b Update readme Ying Sheng 2024-07-25 08:14:36 -07:00
  • 1a491d00cb Bump version to 0.2.0 (#730) Ying Sheng 2024-07-25 08:03:36 -07:00
  • 8fbba3de3d Fix bugs (fp8 checkpoints, triton cache manager) (#729) Ying Sheng 2024-07-25 07:42:00 -07:00
  • ae0f6130cb Revert "fix: fp8 config" (#728) Ying Sheng 2024-07-25 07:25:33 -07:00
  • 6010589783 misc: update bug issue template (#727) Yineng Zhang 2024-07-25 20:52:37 +10:00
  • 926ac01b64 fix: resolve the logo display issue on the PyPI page (#726) Yineng Zhang 2024-07-25 20:47:46 +10:00
  • 25c881a005 chore: bump v0.1.25 (#725) Yineng Zhang 2024-07-25 20:04:35 +10:00
  • 04ec6ba2ac Fix dockerfile and triton cache manager (#720) Liangsheng Yin 2024-07-25 03:04:21 -07:00
  • d63f13c13b fix: fp8 config (#723) Ying Sheng 2024-07-25 02:01:56 -07:00
  • fded67441d misc: update build instruction (#724) Yineng Zhang 2024-07-25 17:08:11 +10:00
  • 6e45394051 chore: add close inactive issues workflow (#722) Yineng Zhang 2024-07-25 16:31:23 +10:00
  • 97e0f7d250 docs: update comment (#721) Yineng Zhang 2024-07-25 10:51:18 +10:00
  • d5146baec9 docs: update supported models (#719) Yineng Zhang 2024-07-25 09:34:01 +10:00
  • 459abad261 Bump version to 0.1.24 (#718) Ying Sheng 2024-07-24 15:55:01 -07:00
  • 30d8e130e7 Improve benchmark scripts (#717) Ying Sheng 2024-07-24 14:44:14 -07:00
  • 08a3bd19cc docs: update doc (#716) Ying Sheng 2024-07-24 13:38:06 -07:00