Lianmin Zheng | f0f8a7699b | Simplify the nan detection and greedy check in sampler (#1709) | 2024-10-18 20:21:24 -07:00
Lianmin Zheng | 2bcfba1b08 | Skip unnecessary penalizer (#1707) | 2024-10-18 17:54:03 -07:00
Lianmin Zheng | bc12d4033f | Add grouped free operations (#1706) | 2024-10-18 13:21:05 -07:00
Lianmin Zheng | 392f2863c8 | Add dtype for more operations (#1705) | 2024-10-18 12:18:15 -07:00
Lianmin Zheng | 6d0fa73ece | Simplify flashinfer utilities (#1704) | 2024-10-17 22:54:14 -07:00
Liangsheng Yin | 9e0dac1ad7 | Fix regex and logprob conflicts when chunked prefilling (#1703) | 2024-10-17 18:33:21 -07:00
Gleb Drozdov | a95d5589c3 | Add matched_stop token or str to distinguish between eos or stop str finish_reason generation (#1684) | 2024-10-17 18:06:52 +00:00
Lianmin Zheng | d17d19e5b8 | Fix mixed batch for multi modal models (#1702) | 2024-10-17 10:27:26 -07:00
Lianmin Zheng | dd3809fad8 | Fix engine unit test (#1701) | 2024-10-17 09:53:32 -07:00
Lianmin Zheng | 7feba41584 | Fix failed ci tests on long prompts; Better error messages for embedding models (#1700) | 2024-10-17 09:23:29 -07:00
Michael Feil | e5db40dcbc | ORJson. Faster Json serialization (#1694) | 2024-10-17 08:03:08 -07:00
wxsm | b170930534 | feat: radix tree code optimize (#1697) | 2024-10-17 08:01:27 -07:00
Jani Monoses | 5ab20cceba | Use SGLang imports for linear layer (#1696) | 2024-10-17 07:50:01 -07:00
Lianmin Zheng | 02f7f3e488 | Update the transformers version in CI (#1690) | 2024-10-16 19:03:55 -07:00
Zeng Zhongchao | 2782132be8 | Add date to logging messages (#1623) (#1679) | 2024-10-16 18:54:55 -07:00
Michael Feil | b0facb3316 | add orjson for jsonresponse (#1688) | 2024-10-16 18:14:30 -07:00
havetc | ecb8bad276 | Returning a per request metric for number of cached_tokens read (#1599) | 2024-10-16 11:49:22 -07:00
Lianmin Zheng | dbec2f1847 | Launch a thread to overlap CPU and GPU (#1687) | 2024-10-16 11:20:17 -07:00
Ke Bao | d10b933a36 | Fix srt dependency (#1685) | 2024-10-16 08:21:20 -07:00
Lianmin Zheng | 9116b2896f | Add a new event loop (#1677) | 2024-10-16 01:33:20 -07:00
Jani Monoses | a5114b6f91 | Add OLMo model (#1676) | 2024-10-16 00:11:18 -07:00
Liangsheng Yin | b6b4094621 | Fix filter_batch function call (#1681) | 2024-10-15 22:59:26 -07:00
Lianmin Zheng | f1088e0fc8 | Fix memory leak during abort (#1674) | 2024-10-15 08:15:08 -07:00
Lianmin Zheng | 175afed370 | Improve benchmark scripts (#1672) | 2024-10-14 21:53:01 -07:00
Lianmin Zheng | 4a292f670d | [Minor] Add some utility functions (#1671) | 2024-10-14 20:08:03 -07:00
Byron Hsu | 56503d9bc9 | [1/N] Remove CacheConfig import in all model files (#1658) | 2024-10-14 09:06:34 -07:00
Lianmin Zheng | 02bc95796d | Simplify chunked prefill (#1667) | 2024-10-14 06:47:50 -07:00
Lianmin Zheng | 24f3e1511c | [Minor] Improve style (#1666) | 2024-10-14 05:25:00 -07:00
Lianmin Zheng | 6790240cc3 | Fix unit test order to balance the tasks in CI (#1665) | 2024-10-14 02:01:44 -07:00
Shuo Yang | 061e546313 | Support double sparsity (#1459) | 2024-10-14 02:00:41 -07:00
Lianmin Zheng | 0c1e87964b | Move filter_batch out of stream_output (#1663) | 2024-10-14 01:15:34 -07:00
Lianmin Zheng | 869f1c02c4 | Add a test case to test retract (#1662) | 2024-10-13 20:32:37 -07:00
Ying Sheng | 2725f8da61 | [Minor] Rename no_eos_trim to no_stop_trim (#1661) | 2024-10-13 20:30:03 -07:00
Lianmin Zheng | da1ffed689 | Add output_ids into ScheduleBatch (#1659) | 2024-10-13 19:54:02 -07:00
Ying Sheng | 4876117171 | [Fix] fix eos trim inconsistency (#1650) | 2024-10-13 01:07:09 -07:00
Lianmin Zheng | 7ee6c259ff | Simplify the event loop and expose --num-continuous-decode-steps as an argument (#1652) | 2024-10-12 21:35:30 -07:00
Lianmin Zheng | 9610fcd469 | Fix the batch_is_full check for jump-forward decoding (#1654) | 2024-10-12 19:47:24 -07:00
Patrick Yi | 31fad29ab0 | Add get_tokenizer function for Engine class (#1653) | 2024-10-12 19:39:35 -07:00
Lianmin Zheng | 9da5a60b18 | Add an option to disable penalizer (#1651) | 2024-10-12 17:53:23 -07:00
Lianmin Zheng | 69aa937aa5 | Fix unit tests and type annotations (#1648) | 2024-10-12 14:49:24 -07:00
Zhang, Liangang | 5d638c92f5 | [Feature, Hardware] Enable SGLang on XPU GPUs via PyTorch (#1480) | 2024-10-12 18:10:32 +00:00
Lianmin Zheng | e37cdab0c6 | Fix ignore_eos (#1645) | 2024-10-12 00:36:28 -07:00
LI MOU | 1d9deeacdb | fix missing ignore_eos in v1/chat/completions (#1642) | 2024-10-11 21:37:20 -07:00
Byron Hsu | 862cd265e5 | [engine] support async and streaming (#1614) | 2024-10-11 15:26:25 -07:00
Lianmin Zheng | 00c7e6368b | Release v0.3.3.post1 (#1636) | 2024-10-11 07:56:16 -07:00
Lianmin Zheng | 23cc66f7b6 | Add back data parallelism (#1635) | 2024-10-11 07:22:48 -07:00
Lianmin Zheng | 5d09ca5735 | Fix constrained decoding (#1634) | 2024-10-11 06:26:20 -07:00
Lianmin Zheng | f13d86f920 | Add image_token in conversation.py (#1632) Co-authored-by: yizhang2077 <1109276519@qq.com> | 2024-10-11 05:07:51 -07:00
Lianmin Zheng | aba9eae4c6 | Fix the correctness test in bench_latency.py when tp > 1 and test_generation_models.py (#1631) | 2024-10-11 05:03:20 -07:00
科英 | bbd72bfc86 | Add the ability to enable and disable the Profiler via HTTP API. (#1626) | 2024-10-11 02:34:25 -07:00