Liangsheng Yin
|
73cf6834f2
|
Support stop_token_ids in sglang API (#1092)
|
2024-08-15 00:31:39 +00:00 |
|
Ying Sheng
|
96a2093ef0
|
[Fix] Compatibility of window attention and cuda graph (#1090)
|
2024-08-14 10:37:01 -07:00 |
|
Liangsheng Yin
|
a34dd86a7d
|
Use dtype to control generate (#1082)
Co-authored-by: zhyncs <me@zhyncs.com>
|
2024-08-14 15:58:07 +00:00 |
|
Lianmin Zheng
|
a59636bb5e
|
Update grok 1 model (#1095)
|
2024-08-14 04:40:44 -07:00 |
|
Lianmin Zheng
|
8f790ac100
|
Fix a bug in cuda graph runner (#1094)
|
2024-08-14 03:25:38 -07:00 |
|
rainred
|
616b59f384
|
[Feature] modify Runtime to support skip_tokenizer_init (#1088)
Co-authored-by: lzhang <zhanglei@modelbest.cn>
|
2024-08-14 00:28:04 -07:00 |
|
Liangsheng Yin
|
e205527cb1
|
Fix jump forward final state circular path bug. (#1084)
|
2024-08-13 21:14:05 -07:00 |
|
Ying Sheng
|
0909bb0d2f
|
[Feat] Add window attention for gemma-2 (#1056)
|
2024-08-13 17:01:26 -07:00 |
|
Lianmin Zheng
|
ad3e4f1619
|
Update the mixtral to use the better FusedMoE layer (#1081)
|
2024-08-13 15:44:25 -07:00 |
|
rainred
|
95f5fbf1a7
|
Fix create_abort_task, GenerateReqInput does not have rids. (#1079)
Co-authored-by: lzhang <zhanglei@modelbest.cn>
|
2024-08-13 12:47:22 +00:00 |
|
Yineng Zhang
|
65915f9f3e
|
fix: temporary solution for DeepSeek V2 H100 layout conversion issue (#1060)
Co-authored-by: ispobock <ISPObaoke@163.com>
|
2024-08-13 15:48:54 +10:00 |
|
Ke Bao
|
162f3ccb01
|
Fix layernorm input shape (#1066)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2024-08-13 15:48:07 +10:00 |
|
Yineng Zhang
|
6a38efa834
|
feat: replace all rmsnorm and silu (#1057)
|
2024-08-13 02:15:59 +10:00 |
|
Lianmin Zheng
|
c877292cc1
|
Re-organize CI tests (#1052)
|
2024-08-12 03:39:01 -07:00 |
|
Lianmin Zheng
|
41598e0d8e
|
Add longer accuracy test on CI (#1049)
|
2024-08-12 09:21:38 +00:00 |
|
Ying Sheng
|
32f6144323
|
fix: Fix returned prefill logits and add output str test (#1046)
|
2024-08-12 06:13:45 +00:00 |
|
Lianmin Zheng
|
fb1f28cbbb
|
Clean up the comments and names under python/sglang/srt/layers (#1047)
|
2024-08-12 05:54:37 +00:00 |
|
Liangsheng Yin
|
fb7421db0d
|
minor: some potential bugs (#1044)
|
2024-08-12 05:35:44 +00:00 |
|
Liangsheng Yin
|
7de6034534
|
Fix the prefix indices (#1037)
|
2024-08-11 17:57:02 -07:00 |
|
Lianmin Zheng
|
d84c5e70f7
|
Test the case when max_new_tokens is very large (#1038)
|
2024-08-11 16:41:03 -07:00 |
|
Lianmin Zheng
|
d785412077
|
Fix the case when max_new_tokens is too large (#1025)
|
2024-08-11 15:20:18 -07:00 |
|
Liangsheng Yin
|
7b6a5332ca
|
Fix triton args init (#1034)
|
2024-08-11 12:11:26 -07:00 |
|
Lianmin Zheng
|
4080e82244
|
Fix the case where r.prefix_indices is None (#1031)
|
2024-08-11 04:53:51 -07:00 |
|
Yineng Zhang
|
c245b78973
|
hotfix: add CustomOp abstraction (#1027)
|
2024-08-11 02:45:59 -07:00 |
|
Lianmin Zheng
|
9dae407812
|
Improve type annotation (#1029)
|
2024-08-11 02:44:59 -07:00 |
|
Liangsheng Yin
|
fcc0f5ed99
|
Fix wrong assert (#1028)
|
2024-08-11 09:22:16 +00:00 |
|
Lianmin Zheng
|
a97df79124
|
Clean up readme and arguments of chunked prefill (#1022)
|
2024-08-11 01:18:52 -07:00 |
|
Yineng Zhang
|
94752ac811
|
feat: use FlashInfer rmsnorm and silu (#907)
|
2024-08-11 14:57:13 +10:00 |
|
Liangsheng Yin
|
43fbb6d919
|
Fix input_ids && rename to fill_ids (#1021)
|
2024-08-10 16:24:12 -07:00 |
|
Lianmin Zheng
|
54fb1c80c0
|
Clean up unit tests (#1020)
|
2024-08-10 15:09:03 -07:00 |
|
Ying Sheng
|
b68c4c073b
|
fix: force max new tokens to be 1 for embedding request (#1019)
|
2024-08-10 13:46:42 -07:00 |
|
Ying Sheng
|
7599badeaf
|
Support embedding input as a list (#1014)
|
2024-08-10 08:39:05 -07:00 |
|
Liangsheng Yin
|
62757db6f0
|
Reduce the overhead when cache is disabled (#1010)
|
2024-08-09 16:36:57 -07:00 |
|
Liangsheng Yin
|
73fa2d49d5
|
Some warnings to crash when CI (#1009)
|
2024-08-09 15:16:23 -07:00 |
|
gryffindor-rr
|
9cf0a5bada
|
Add skip_tokenizer_init args. (#959)
Co-authored-by: lzhang <zhanglei@modelbest.cn>
|
2024-08-09 12:14:13 -07:00 |
|
Ying Sheng
|
b16e856f11
|
Add openai embedding API (#997)
|
2024-08-09 11:19:18 -07:00 |
|
Juwan Yoo
|
10bca45bc6
|
bugfix: penalizers to be merged before reqs (#1001)
|
2024-08-09 21:46:24 +10:00 |
|
liuyhwangyh
|
b91a4cb1b1
|
support models from www.modelscope.cn (#994)
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
|
2024-08-09 02:52:14 -07:00 |
|
Ying Sheng
|
e040a2450b
|
Add e5-mistral embedding model - step 3/3 (#988)
|
2024-08-08 16:31:19 -07:00 |
|
Ying Sheng
|
9f662501a3
|
Move torch.compile configs into cuda_graph_runner.py (#993)
|
2024-08-08 13:20:30 -07:00 |
|
Juwan Yoo
|
ab7875941b
|
feat: frequency, min_new_tokens, presence, and repetition penalties (#973)
|
2024-08-08 04:21:08 -07:00 |
|
yichuan~
|
3a79613c28
|
support more options about usage in stream mode (#985)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
|
2024-08-08 09:41:57 +00:00 |
|
Liangsheng Yin
|
1ac304eeb4
|
Adjust InputMetadata and ScheduleBatch (#981)
|
2024-08-08 01:11:22 -07:00 |
|
Ying Sheng
|
20a4f927dc
|
Add io struct for embedding models [unreachable code] - step 2/3 (#987)
|
2024-08-08 07:52:31 +00:00 |
|
Ying Sheng
|
0de7c2d09e
|
Add e5-mistral modules [unreachable code] - step 1/3 (#983)
|
2024-08-08 00:04:15 -07:00 |
|
Liangsheng Yin
|
6ed4e3b8fb
|
Fix chunked prefill (#984)
|
2024-08-07 22:28:42 -07:00 |
|
Ying Sheng
|
00023d622a
|
[minor] Update type annotation in tokenizer_manager.py (#982)
|
2024-08-08 01:48:45 +00:00 |
|
foszto
|
c62d560c03
|
#590 Increase default , track changes in examples and documentation (#971)
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
|
2024-08-08 00:54:46 +00:00 |
|
Liangsheng Yin
|
2b8257f325
|
Adjust max prefix len (#980)
|
2024-08-08 00:41:26 +00:00 |
|
Liangsheng Yin
|
7623091d97
|
RadixCache method adjust (#977)
|
2024-08-07 15:52:24 -07:00 |
|