Commit Graph

  • 33d61356b8 misc: update issue template (#1024) Yineng Zhang 2024-08-11 15:34:30 +08:00
  • 94752ac811 feat: use FlashInfer rmsnorm and silu (#907) Yineng Zhang 2024-08-11 12:57:13 +08:00
  • 43fbb6d919 Fix input_ids && rename to fill_ids (#1021) Liangsheng Yin 2024-08-10 16:24:12 -07:00
  • 54fb1c80c0 Clean up unit tests (#1020) Lianmin Zheng 2024-08-10 15:09:03 -07:00
  • b68c4c073b fix: force max new tokens to be 1 for embedding request (#1019) Ying Sheng 2024-08-10 13:46:42 -07:00
  • e712837d38 misc: update test config (#990) Yineng Zhang 2024-08-11 02:20:30 +08:00
  • 7599badeaf Support embedding input as a list (#1014) Ying Sheng 2024-08-10 08:39:05 -07:00
  • 62757db6f0 Reduce the overhead when cache is disabled (#1010) Liangsheng Yin 2024-08-09 16:36:57 -07:00
  • 73fa2d49d5 Some warnings to crash when CI (#1009) Liangsheng Yin 2024-08-09 15:16:23 -07:00
  • 61728884d7 Fix benchmark latency (#1007) Mingyi 2024-08-09 13:18:58 -07:00
  • 9cf0a5bada Add skip_tokenizer_init args. (#959) gryffindor-rr 2024-08-10 03:14:13 +08:00
  • b16e856f11 Add openai embedding API (#997) Ying Sheng 2024-08-09 11:19:18 -07:00
  • 05c50a82b8 Minor bugfix on benchmark serving (#1005) Roger Wang 2024-08-09 09:53:50 -07:00
  • b568df5d03 fix: resolve correctness_test issue (#1002) Yineng Zhang 2024-08-09 21:21:42 +08:00
  • 10bca45bc6 bugfix: penalizers to be merged before reqs (#1001) Juwan Yoo 2024-08-09 04:46:24 -07:00
  • b91a4cb1b1 support models from www.modelscope.cn (#994) liuyhwangyh 2024-08-09 17:52:14 +08:00
  • 95a28019ba test: negative value testing for frequency, presence penalizers (#995) Juwan Yoo 2024-08-08 23:30:50 -07:00
  • e040a2450b Add e5-mistral embedding model - step 3/3 (#988) Ying Sheng 2024-08-08 16:31:19 -07:00
  • 9f662501a3 Move torch.compile configs into cuda_graph_runner.py (#993) Ying Sheng 2024-08-08 13:20:30 -07:00
  • ab7875941b feat: frequency, min_new_tokens, presence, and repetition penalties (#973) Juwan Yoo 2024-08-08 04:21:08 -07:00
  • 228cf47547 Create contributor_guide.md (#992) Ying Sheng 2024-08-08 03:58:47 -07:00
  • 3a79613c28 support more options about usage in stream mode (#985) yichuan~ 2024-08-08 17:41:57 +08:00
  • 1ac304eeb4 Adjust InputeMetadata and ScheduleBatch (#981) Liangsheng Yin 2024-08-08 01:11:22 -07:00
  • 20a4f927dc Add io struct for embedding models [unreachable code] - step 2/3 (#987) Ying Sheng 2024-08-08 00:52:31 -07:00
  • 0de7c2d09e Add e5-mistral modules [unreachable code] - step 1/3 (#983) Ying Sheng 2024-08-08 00:04:15 -07:00
  • 6ed4e3b8fb Fix chunked prefill (#984) Liangsheng Yin 2024-08-07 22:28:42 -07:00
  • 00023d622a [minor] Update type annotation in tokenizer_manager.py (#982) Ying Sheng 2024-08-07 18:48:45 -07:00
  • c62d560c03 #590 Increase default , track changes in examples and documentation (#971) foszto 2024-08-08 02:54:46 +02:00
  • 2b8257f325 Adjust max prefix len (#980) Liangsheng Yin 2024-08-07 17:41:26 -07:00
  • 7623091d97 RadixCache method adjust (#977) Liangsheng Yin 2024-08-07 15:52:24 -07:00
  • f724f1f1e9 PrefillAdder abstraction (#968) Liangsheng Yin 2024-08-07 13:47:28 -07:00
  • 6db27f7b3b misc: correct the int data type for token ids and indices (#969) Zhiqiang Xie 2024-08-07 13:40:07 -07:00
  • 4d929107ae Run purge-cache only in sgl-project (#976) Liangsheng Yin 2024-08-07 13:16:36 -07:00
  • fbe0c818c2 Purge self-runner's pip cache weekly (#975) Liangsheng Yin 2024-08-07 12:43:12 -07:00
  • dc9d06d886 chore: bump v0.2.11 (#970) Yineng Zhang 2024-08-07 20:47:53 +08:00
  • c31f084c71 chore: update vllm to 0.5.4 (#966) Yineng Zhang 2024-08-07 19:15:41 +08:00
  • a01ddd9605 misc: fix the req_to_token member change (#967) Liangsheng Yin 2024-08-07 01:52:10 -07:00
  • 7fa54a1ab3 Make req_pool_indices on CPU (#960) Liangsheng Yin 2024-08-07 01:41:25 -07:00
  • 05abd1261c misc: add compute capability in check_env (#965) Yineng Zhang 2024-08-07 16:39:36 +08:00
  • 5f6fa04a3f misc: simplify test (#964) yichuan~ 2024-08-07 16:23:27 +08:00
  • 58a0970853 misc: update issue template (#963) Yineng Zhang 2024-08-07 15:41:21 +08:00
  • ff68ae857a Show more error messages for warmup errors (#932) Ying Sheng 2024-08-06 23:57:06 -07:00
  • 795eab6dda Add support for Batch API test (#936) yichuan~ 2024-08-07 14:52:10 +08:00
  • 41bb1ab10d fix nsys cannot profile cuda kernel (#957) Meng, Peng 2024-08-07 11:51:21 +08:00
  • 87e8c090e9 Organize code (rename, movement) (#953) Liangsheng Yin 2024-08-06 20:50:32 -07:00
  • ad56e68495 Fix stuck in get_new_prefill_batch (#948) Liangsheng Yin 2024-08-06 01:05:58 -07:00
  • ffb15744b5 Support multiple args options (#941) yichuan~ 2024-08-06 02:12:53 +08:00
  • a9c833d580 Fix union operator (#940) Ke Bao 2024-08-06 00:46:34 +08:00
  • 94e0115186 Feat: add alternative choices selection methods (#835) Aidan Cooper 2024-08-05 11:27:49 +01:00
  • b216a545b3 Remove leftover auth_token (#934) Aidan Cooper 2024-08-05 11:25:48 +01:00
  • fde8340550 docs: update README (#935) Yineng Zhang 2024-08-05 18:06:06 +08:00
  • fd7926e46e Fix prompt len in parallel sampling (#928) yichuan~ 2024-08-05 15:56:08 +08:00
  • 399cad91f3 Update README.md (#927) Ying Sheng 2024-08-04 23:01:35 -07:00
  • 0a4f5f9bea Test regex in vision api (#926) Ying Sheng 2024-08-04 22:52:41 -07:00
  • 3bc99e6fe4 Test openai vision api (#925) Ying Sheng 2024-08-04 20:51:55 -07:00
  • ebf69964cd latency test enhancement - final part (#921) min-xu-et 2024-08-04 18:15:23 -07:00
  • 141e8c71a3 Bump version to 0.2.10 (#923) Ying Sheng 2024-08-04 16:52:51 -07:00
  • d53dcf9c98 Support more OpenAI API test (#916) yichuan~ 2024-08-05 07:43:09 +08:00
  • bb66cc4c52 Fix CI && python3.8 compatible (#920) Liangsheng Yin 2024-08-04 16:02:05 -07:00
  • 975adb802b Update hyperparameter_tuning.md (#918) Ying Sheng 2024-08-04 13:51:52 -07:00
  • 0d4f3a9fcd Make API Key OpenAI-compatible (#917) Ying Sheng 2024-08-04 13:35:44 -07:00
  • afd411d09f enhance latency test - part 2 (#915) min-xu-et 2024-08-04 12:27:25 -07:00
  • e1eae1fd15 Support MLA for DeepSeek-V2 with Triton - step 1 (#905) Ke Bao 2024-08-05 01:40:33 +08:00
  • f4d9953d9d misc: add triton in check_env PACKAGE_LIST (#914) Yineng Zhang 2024-08-04 21:20:59 +08:00
  • 4f00525057 fix: use e2e and unit test only for original repo or pr (#912) Yineng Zhang 2024-08-04 14:34:50 +08:00
  • 995af5a54b Improve the structure of CI (#911) Ying Sheng 2024-08-03 23:09:21 -07:00
  • 539856455d latency test enhancement - part 1 (#909) min-xu-et 2024-08-03 22:44:58 -07:00
  • 70cc0749ce Add model accuracy test - step 1 (#866) Ying Sheng 2024-08-03 18:20:50 -07:00
  • 7dd8a7e6d9 fixed an error handling in bench_latency.py (#904) min-xu-et 2024-08-03 17:42:17 -07:00
  • 947402c829 Reorder CI unit tests. (#908) Liangsheng Yin 2024-08-03 16:18:50 -07:00
  • 8c5382e62c Update README.md Ying Sheng 2024-08-03 12:58:41 -07:00
  • 001b0bdd08 Update the base image of the docker (#900) Ying Sheng 2024-08-02 21:54:57 -07:00
  • b906c01592 Bump version to 0.2.9.post1 (#899) Ying Sheng 2024-08-02 12:08:00 -07:00
  • 9319cd139c [minor] fixed code formatting doc (#896) min-xu-et 2024-08-02 09:39:28 -07:00
  • 046c2b339e chore: add multipart dep for fastapi (#895) Yineng Zhang 2024-08-02 22:50:19 +08:00
  • 6b8f66efe1 misc: update cuda graph capture exception log (#894) Yineng Zhang 2024-08-02 22:40:52 +08:00
  • 7937a886b2 docs: update setup runner (#884) Yineng Zhang 2024-08-02 19:03:53 +08:00
  • 2e218b9e04 fix: set env in runner (#891) Yineng Zhang 2024-08-02 18:48:56 +08:00
  • 30a9b2ef20 Bump version to v0.2.9 (#890) Ying Sheng 2024-08-02 01:45:48 -07:00
  • 3cadecf0c4 Increase openai client limit (#886) Ying Sheng 2024-08-02 00:47:23 -07:00
  • e90e3a50d4 Add benchmark: HumanEval (#889) Ying Sheng 2024-08-02 00:46:41 -07:00
  • fbd6b94d69 Fix the double BOS problem in the HF chat template (#888) Ying Sheng 2024-08-02 00:30:50 -07:00
  • 4c8093c8db Update workflow name (#883) Ying Sheng 2024-08-01 21:29:46 -07:00
  • ae7ee01a8e Add accuracy test to CI: MMLU (#882) Ying Sheng 2024-08-01 21:20:17 -07:00
  • 76e59088d8 Add more unit tests to CI (#880) Ying Sheng 2024-08-01 18:14:33 -07:00
  • 12ce3befb6 Update runner docs (#879) Liangsheng Yin 2024-08-01 17:37:47 -07:00
  • 4013a4e1b0 Implement served_model_name to customize model id when use local mode… (#749) 任嘉 2024-08-02 08:13:51 +08:00
  • 60340a3643 Improve the coverage of the openai api server test (#878) Ying Sheng 2024-08-01 16:01:30 -07:00
  • 70c78cfb03 Update runner docs (#876) Liangsheng Yin 2024-08-01 15:32:33 -07:00
  • 72b6ea88b4 Make scripts under /test/srt as unit tests (#875) Ying Sheng 2024-08-01 14:34:55 -07:00
  • e4d3333c6c bump to 0.2.8 (#877) Ying Sheng 2024-08-01 14:18:26 -07:00
  • 6f221d4ca0 Fix unit tests for the frontend language part (#872) Ying Sheng 2024-08-01 12:39:12 -07:00
  • aba6f51f88 misc: update unit test config (#873) Yineng Zhang 2024-08-02 03:27:05 +08:00
  • 7f6c690b67 misc: use pip cache purge and add unit test ci (#871) Yineng Zhang 2024-08-02 03:12:20 +08:00
  • 40e6f5131a Fix openai CI tests (#870) Ying Sheng 2024-08-01 09:39:09 -07:00
  • 4075677621 Add OpenAI backend to the CI test (#869) Ying Sheng 2024-08-01 09:25:24 -07:00
  • 9e8d2c7f74 misc: add cancel previous at e2e (#864) Yineng Zhang 2024-08-01 16:26:54 +08:00
  • c9bff5fcc8 misc: disable auto release (#862) Yineng Zhang 2024-08-01 15:46:51 +08:00
  • b04444ac01 Rename github workflows (#861) Ying Sheng 2024-08-01 00:39:55 -07:00
  • 3d617a21ba misc: update e2e test paths config (#860) Yineng Zhang 2024-08-01 15:38:24 +08:00