Liangsheng Yin | 3694f8f996 | Mixed style of chunked prefill (#1013) | 2024-08-16 09:13:00 +00:00 |
Lianmin Zheng | 5a261bd055 | Fix the deadlock in multi-node tp (#1122) | 2024-08-16 01:39:24 -07:00 |
Yineng Zhang | 6aa8ad14f8 | fix: resolve Python.h header missing (#1119) | 2024-08-16 15:46:43 +10:00 |
Yineng Zhang | 26e9c12c15 | ci: compatible with fork repo (#1115) | 2024-08-16 04:26:44 +10:00 |
Lianmin Zheng | 87a0db82b8 | update hyperparameter guide (#1114) | 2024-08-15 10:54:24 -07:00 |
Yineng Zhang | 5bd953749b | chore: bump v0.2.13 (#1111) | 2024-08-16 03:50:43 +10:00 |
Lianmin Zheng | 0cb099e20a | set CUDA_DEVICE_MAX_CONNECTIONS=1 (#1113) | 2024-08-16 03:47:39 +10:00 |
Ying Sheng | 93d4e354d8 | [Fix] Window attention compatible with RadixAttention and chunked prefill (#1112) | 2024-08-15 10:33:20 -07:00 |
Yineng Zhang | 9195d1362a | misc: rm unused model_loader (#1110) | 2024-08-15 08:29:35 -07:00 |
Ying Sheng | 14cb544d56 | [Fix] fix flashinfer usage for window attention (#1107) | 2024-08-15 00:53:24 -07:00 |
Lianmin Zheng | e86b1ccbf0 | Enable chunked prefill by default (#1040) | 2024-08-14 21:56:20 -07:00 |
Ying Sheng | 8d2d876fc8 | [Fix] fix the typo bug for window attention (#1106) | 2024-08-14 21:56:01 -07:00 |
Lianmin Zheng | 326df4bab2 | Use a single workspace for flashinfer (#1077) | 2024-08-14 19:25:37 -07:00 |
Ying Sheng | 6767e2229f | Support jinja as chat template file (#1104) | 2024-08-14 17:43:14 -07:00 |
Liangsheng Yin | 73cf6834f2 | Support stop_token_ids in sglang API (#1092) | 2024-08-15 00:31:39 +00:00 |
Yineng Zhang | 1c2b5f5240 | docs: update nsys usage (#1103) | 2024-08-15 01:39:15 +08:00 |
Ying Sheng | 96a2093ef0 | [Fix] Compatibility of window attention and cuda graph (#1090) | 2024-08-14 10:37:01 -07:00 |
Liangsheng Yin | a34dd86a7d | Use dtype to control generate (#1082) Co-authored-by: zhyncs <me@zhyncs.com> | 2024-08-14 15:58:07 +00:00 |
Yineng Zhang | 67c0d832a6 | docs: update pr template (#1099) | 2024-08-14 22:25:39 +10:00 |
Lianmin Zheng | a59636bb5e | Update grok 1 model (#1095) | 2024-08-14 04:40:44 -07:00 |
Yineng Zhang | fe5024325b | docs: update README (#1098) | 2024-08-14 04:40:05 -07:00 |
Yineng Zhang | f14569f64a | ci: remove workflow path trigger (#1096) | 2024-08-14 20:36:24 +10:00 |
Lianmin Zheng | 8f790ac100 | Fix a bug in cuda graph runner (#1094) | 2024-08-14 03:25:38 -07:00 |
rainred | 616b59f384 | [Feature] modify Runtime to support skip_tokenizer_init (#1088) Co-authored-by: lzhang <zhanglei@modelbest.cn> | 2024-08-14 00:28:04 -07:00 |
Yineng Zhang | c8423ca311 | ci: update timeout and retry (#1086) Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> | 2024-08-14 00:27:35 -07:00 |
Liangsheng Yin | e205527cb1 | Fix jump forward final state circular path bug. (#1084) | 2024-08-13 21:14:05 -07:00 |
Ying Sheng | 0909bb0d2f | [Feat] Add window attention for gemma-2 (#1056) | 2024-08-13 17:01:26 -07:00 |
Lianmin Zheng | ad3e4f1619 | Update the mixtral to use the better FusedMoE layer (#1081) | 2024-08-13 15:44:25 -07:00 |
Lucien | 312e849255 | Example file for docker compose and k8s (#1006) | 2024-08-13 15:07:57 -07:00 |
rainred | 95f5fbf1a7 | Fix create_abort_task, GenerateReqInput does not have rids. (#1079) Co-authored-by: lzhang <zhanglei@modelbest.cn> | 2024-08-13 12:47:22 +00:00 |
Yineng Zhang | cebd78d83e | ci: add accuracy timeout (#1078) | 2024-08-13 22:12:58 +10:00 |
Yineng Zhang | 0076f11541 | fix: use devel for Triton's compiler requirements (#1074) | 2024-08-13 04:08:43 -07:00 |
Yineng Zhang | f7fb68d292 | ci: add moe test (#1053) | 2024-08-13 18:43:23 +10:00 |
Yineng Zhang | 396a13e6ad | ci: add cancel pr workflow (#1070) | 2024-08-13 18:16:50 +10:00 |
Yineng Zhang | 65915f9f3e | fix: temporary solution for DeepSeek V2 H100 layout conversion issue (#1060) Co-authored-by: ispobock <ISPObaoke@163.com> | 2024-08-13 15:48:54 +10:00 |
Ke Bao | 162f3ccb01 | Fix layernorm input shape (#1066) Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2024-08-13 15:48:07 +10:00 |
Yineng Zhang | 65e89baea9 | fix: not use the default port (#1068) | 2024-08-13 15:12:56 +10:00 |
Yineng Zhang | 6a38efa834 | feat: replace all rmsnorm and silu (#1057) | 2024-08-13 02:15:59 +10:00 |
Yineng Zhang | b0ad0c1bc8 | chore: bump v0.2.12 (#1048) | 2024-08-12 20:59:38 +10:00 |
Lianmin Zheng | c877292cc1 | Re-organize CI tests (#1052) | 2024-08-12 03:39:01 -07:00 |
Lianmin Zheng | 0c1c72a0b4 | Fix accuracy test (#1051) | 2024-08-12 19:48:40 +10:00 |
Lianmin Zheng | 41598e0d8e | Add longer accuracy test on CI (#1049) | 2024-08-12 09:21:38 +00:00 |
Yineng Zhang | 89f23a5178 | docs: update setup github runner (#1050) | 2024-08-12 18:11:38 +10:00 |
Yineng Zhang | cb99ba4fc6 | feat: update Dockerfile (#1033) Co-authored-by: vhain <vhain6512@gmail.com> | 2024-08-12 16:24:06 +10:00 |
Ying Sheng | 32f6144323 | fix: Fix returned prefill logits and add output str test (#1046) | 2024-08-12 06:13:45 +00:00 |
Lianmin Zheng | fb1f28cbbb | Clean up the comments and names under python/sglang/srt/layers (#1047) | 2024-08-12 05:54:37 +00:00 |
Liangsheng Yin | fb7421db0d | minor: some potential bugs (#1044) | 2024-08-12 05:35:44 +00:00 |
Lianmin Zheng | 14b6493087 | Delete the useless test/srt/test_throughput.py (#1045) | 2024-08-11 21:31:52 -07:00 |
Lianmin Zheng | 8207637029 | Improve end-to-end throughput test and its coverage (#1039) | 2024-08-11 18:27:33 -07:00 |
Liangsheng Yin | 7de6034534 | Fix the prefix indices (#1037) | 2024-08-11 17:57:02 -07:00 |