| Lianmin Zheng | 761b2cebd6 | [CI] merge all ci tests into one file (#1289) | 2024-09-01 02:36:56 -07:00 |
| Lianmin Zheng | 1b5d56f7f8 | [CI] Add more multi-gpu tests (#1280) | 2024-09-01 00:27:25 -07:00 |
| xiaobochen | d134c139a1 | Optimize the update flashinfer indices (#1262) | 2024-08-31 23:40:28 -07:00 |
| Christopher Chou | 51c554d812 | Allow more flexible assistant and system response (#1256) | 2024-08-30 11:51:44 -07:00 |
| Yineng Zhang | c411f32e1c | feat: replace GeluAndMul (#1234) | 2024-08-28 14:07:02 +00:00 |
| Lianmin Zheng | bf53bf5142 | [Fix] Fix llava on multi images (#1247) | 2024-08-28 06:33:05 -07:00 |
| Yineng Zhang | 66975360e7 | fix: increase max_new_tokens when testing generation models (#1244) | 2024-08-28 22:12:36 +10:00 |
| yichuan~ | 5ff25cdf5b | [Minor] add delete test and delete tmp file on ci server (#1227) | 2024-08-26 22:04:52 -07:00 |
| caiyueliang | 2f1d92834f | [FEAT] Support batches cancel (#1222); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-08-26 23:28:26 +00:00 |
| Liangsheng Yin | c61a1b6f97 | Torch compile CI throughput test (#1223) | 2024-08-26 13:52:58 -07:00 |
| havetc | 9935f97b3e | [FEAT] JSON constrained support (#1125); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-08-26 09:37:26 -07:00 |
| Mingyi | 97589a60a2 | [CI] Parallelize unit tests in CI (#1219) | 2024-08-26 04:54:02 +00:00 |
| Kaichen Zhang - NTU | 3579162ab1 | [Fix] Multi-images loading error (#1218) | 2024-08-26 03:58:51 +00:00 |
| Mingyi | 7514b9f8d3 | [CI] Fix CI (#1217) | 2024-08-26 02:56:42 +00:00 |
| Mingyi | 158e8f1e2d | improve the threshold and ports in tests (#1215) | 2024-08-25 19:02:08 -07:00 |
| Lianmin Zheng | 15f1a49d2d | Update CI workflows (#1210) | 2024-08-25 16:43:07 -07:00 |
| Ying Sheng | 308d024092 | [CI] Fix the issue of unit test hanging (#1211) | 2024-08-25 16:21:37 -07:00 |
| Ying Sheng | ab4990e4bf | [Minor] Temporarily skip flaky test (#1209) | 2024-08-25 14:49:23 -07:00 |
| Chayenne | 30b4f771b0 | Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186); Co-authored-by: Ying Sheng <sqy1415@gmail.com> | 2024-08-25 10:29:12 -07:00 |
| Kaichen Zhang - NTU | 66e7dcaf70 | [Fix] Fixing the multi-images error for llava-onevision (#1205) | 2024-08-25 10:28:23 -07:00 |
| Lianmin Zheng | bc4c7a3545 | Relax the assert in moe throughput test to fix the flaky CI (#1207) | 2024-08-25 10:27:02 -07:00 |
| Ying Sheng | 1cb4da5c5f | [Fix] the issue of random order when input is a list (#1199) | 2024-08-24 21:43:03 -07:00 |
| Lianmin Zheng | f6af3a6561 | Cleanup readme, llava examples, usage examples and nccl init (#1194) | 2024-08-24 08:02:23 -07:00 |
| Kaichen Zhang - NTU | a5b14ad043 | [Feat/WIP] add llava-onevision, with support for (1) siglip encoder, (2) qwen2 decoder (3) openai api compatible server. (#1123); Co-authored-by: Bo Li <drluodian@gmail.com> | 2024-08-23 14:11:16 -07:00 |
| Shan Yu | cd10654e7e | [Feat] Support update weights without restart server (#1157) | 2024-08-20 13:48:24 -07:00 |
| Juwan Yoo | d8476818ef | feat: allow streaming for multi-prompt and/or parallel sampling (#1134) | 2024-08-20 08:06:55 -07:00 |
| yichuan~ | b997a18d74 | [Feat]Add support for optional start len of logprobs (#1035); Co-authored-by: Ying Sheng <sqy1415@gmail.com>; Co-authored-by: Yineng Zhang <me@zhyncs.com>; Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>; Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> | 2024-08-18 23:45:41 -07:00 |
| Liangsheng Yin | 5d0d40d0eb | Fix CI accuracy && time out limit (#1133) | 2024-08-16 21:41:11 -07:00 |
| Liangsheng Yin | 3694f8f996 | Mixed style of chunked prefill (#1013) | 2024-08-16 09:13:00 +00:00 |
| Lianmin Zheng | e86b1ccbf0 | Enable chunked prefill by default (#1040) | 2024-08-14 21:56:20 -07:00 |
| Liangsheng Yin | 73cf6834f2 | Support stop_token_ids in sglang API (#1092) | 2024-08-15 00:31:39 +00:00 |
| Liangsheng Yin | a34dd86a7d | Use dtype to control generate (#1082); Co-authored-by: zhyncs <me@zhyncs.com> | 2024-08-14 15:58:07 +00:00 |
| Yineng Zhang | c8423ca311 | ci: update timeout and retry (#1086); Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> | 2024-08-14 00:27:35 -07:00 |
| Ying Sheng | 0909bb0d2f | [Feat] Add window attention for gemma-2 (#1056) | 2024-08-13 17:01:26 -07:00 |
| Lianmin Zheng | ad3e4f1619 | Update the mixtral to use the better FusedMoE layer (#1081) | 2024-08-13 15:44:25 -07:00 |
| Yineng Zhang | cebd78d83e | ci: add accuracy timeout (#1078) | 2024-08-13 22:12:58 +10:00 |
| Yineng Zhang | f7fb68d292 | ci: add moe test (#1053) | 2024-08-13 18:43:23 +10:00 |
| Lianmin Zheng | c877292cc1 | Re-organize CI tests (#1052) | 2024-08-12 03:39:01 -07:00 |
| Lianmin Zheng | 0c1c72a0b4 | Fix accuracy test (#1051) | 2024-08-12 19:48:40 +10:00 |
| Lianmin Zheng | 41598e0d8e | Add longer accuracy test on CI (#1049) | 2024-08-12 09:21:38 +00:00 |
| Ying Sheng | 32f6144323 | fix: Fix returned prefill logits and add output str test (#1046) | 2024-08-12 06:13:45 +00:00 |
| Lianmin Zheng | 14b6493087 | Delete the useless test/srt/test_throughput.py (#1045) | 2024-08-11 21:31:52 -07:00 |
| Lianmin Zheng | 8207637029 | Improve end-to-end throughput test and its coverage (#1039) | 2024-08-11 18:27:33 -07:00 |
| Lianmin Zheng | d84c5e70f7 | Test the case when max_new_tokens is very large (#1038) | 2024-08-11 16:41:03 -07:00 |
| Lianmin Zheng | 54fb1c80c0 | Clean up unit tests (#1020) | 2024-08-10 15:09:03 -07:00 |
| Ying Sheng | b68c4c073b | fix: force max new tokens to be 1 for embedding request (#1019) | 2024-08-10 13:46:42 -07:00 |
| Ying Sheng | 7599badeaf | Support embedding input as a list (#1014) | 2024-08-10 08:39:05 -07:00 |
| gryffindor-rr | 9cf0a5bada | Add skip_tokenizer_init args. (#959); Co-authored-by: lzhang <zhanglei@modelbest.cn> | 2024-08-09 12:14:13 -07:00 |
| Ying Sheng | b16e856f11 | Add openai embedding API (#997) | 2024-08-09 11:19:18 -07:00 |
| Juwan Yoo | 10bca45bc6 | bugfix: penalizers to be merged before reqs (#1001) | 2024-08-09 21:46:24 +10:00 |