Lifu Huang | 3cf1473a09 | Use monotonic clock for interval measurement (#6211) | 2025-05-17 16:49:18 -07:00
  Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
applesaucethebun | 2ce8793519 | Add typo checker in pre-commit (#6179) | 2025-05-11 12:55:00 +08:00
  Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
SEPLOS | 032f8faaab | Fix sglang frontend's incorrect dependency on torch (#4931) | 2025-03-30 13:00:24 -07:00
mlmz | f6ab4ca6bc | fix: fix ipython running error for Engine due to outlines nest_asyncio (#4582) | 2025-03-21 19:11:15 -07:00
  Co-authored-by: shuaills <shishuaiuoe@gmail.com>
Zhiqiang Xie | 9376ac361d | Memory pool fix for upstream change about eagle (#4170) | 2025-03-07 00:58:20 -08:00
Yueyang Pan | 25482edb5c | Online serving benchmarks of real datasets for hierarchical KV caching (#3211) | 2025-03-05 16:16:43 -08:00
  Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
Shi Shuai | 55de40f782 | [Docs]: Fix Multi-User Port Allocation Conflicts (#3601) | 2025-02-19 11:15:44 -08:00
  Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
  Co-authored-by: simveit <simp.veitner@gmail.com>
Jiada Li | 39416e394a | fix lockfile and port_registry file permission error (#3598) | 2025-02-15 19:14:45 -08:00
  Co-authored-by: jiada li <jiada@lmsys.us-northcentral1-a.compute.internal>
  Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Shi Shuai | 7443197a63 | [CI] Improve Docs CI Efficiency (#3587) | 2025-02-14 19:57:00 -08:00
  Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Jhin | 7b9b4f4426 | Docs fix about EAGLE and streaming output (#3166) | 2025-01-27 18:10:45 -08:00
  Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
  Co-authored-by: Chayenne <zhaochen20@outlook.com>
  Co-authored-by: Jhin <jhinpan@umich.edu>
Lianmin Zheng | 73401fd016 | Sync distributed package from vllm 0.6.4.post1 (#3010) | 2025-01-20 04:57:14 -08:00
fzyzcjy | 81d27c8e31 | Refactor to add TypeBasedDispatcher to simplify dispatching (#2958) | 2025-01-18 20:13:27 -08:00
SangBin Cho | 9208618b3e | [Core] in batch prefix caching by delay scheduling (#2442) | 2024-12-11 12:51:50 -08:00
Yineng Zhang | 75ae968959 | minor: update killall script (#2391) | 2024-12-08 04:21:00 +08:00
Lianmin Zheng | d4fc1a70e3 | Crash the server correctly during error (#2231) | 2024-11-28 00:22:39 -08:00
Chayenne | c77c1e05ba | fix black in pre-commit (#1940) | 2024-11-08 07:42:47 +08:00
Iñaki Arango | 1363b51983 | Escape backwards slash (#1902) | 2024-11-03 12:27:11 -08:00
geeker-smallwhite | 8ce202a493 | delete unused character (#1855) | 2024-10-31 19:33:55 +08:00
Lianmin Zheng | b548801ddb | Update docs (#1839) | 2024-10-30 02:49:08 -07:00
Chayenne | 539df95d2c | Imporve openai api documents (#1827) | 2024-10-30 00:39:41 -07:00
  Co-authored-by: Chayenne <zhaochenyang@g.ucla.edu>
Chayenne | ced362f7c6 | Simplify our docs with complicated functions into utils (#1807) | 2024-10-26 17:44:11 +00:00
  Co-authored-by: Chayenne <zhaochenyang@ucla.edu>
Lianmin Zheng | e4d68afcf0 | [Minor] Many cleanup (#1357) | 2024-09-09 04:14:11 -07:00
Lianmin Zheng | 1e495e0847 | [Fix] Fix select by ensuring each request has at least one token (#1318) | 2024-09-03 06:31:45 -07:00
Ying Sheng | 9f662501a3 | Move torch.compile configs into cuda_graph_runner.py (#993) | 2024-08-08 13:20:30 -07:00
Ying Sheng | 0d4f3a9fcd | Make API Key OpenAI-compatible (#917) | 2024-08-04 13:35:44 -07:00
Ying Sheng | 995af5a54b | Improve the structure of CI (#911) | 2024-08-03 23:09:21 -07:00
Ying Sheng | 79f816292e | Fix lazy import location (#795) | 2024-07-28 22:09:50 -07:00
Ying Sheng | fb9296f0ed | Higher priority for user input of max_prefill_tokens & format (#540) | 2024-06-12 21:48:40 -07:00
Lianmin Zheng | 2cea6146d8 | Improve logging & add logit cap (#471) | 2024-05-24 03:48:53 -07:00
Lianmin Zheng | 19d2135cb8 | Use model loader from vllm (#459) | 2024-05-21 09:13:37 -07:00
Lianmin Zheng | 8210ec60f4 | Improve error handling & abort disconnected requests (#449) | 2024-05-17 05:49:31 -07:00
Liangsheng Yin | 690d162d97 | Format code (#441) | 2024-05-14 22:40:46 +08:00
Yuanhan Zhang | 0992d85f92 | support llava video (#426) | 2024-05-13 16:57:00 -07:00
Lianmin Zheng | 562b8857d8 | Improve error handling (#433) | 2024-05-12 20:49:04 -07:00
Lianmin Zheng | 13662fd533 | Fix RuntimeEndpoint (#279) | 2024-03-11 05:24:24 -07:00
Alessio Dalla Piazza | d5ae2ebaa2 | Add Support for API Key Authentication (#230) | 2024-03-11 05:16:10 -07:00
Lianmin Zheng | faba293a0d | Improve gemma and documentations (#278) | 2024-03-11 04:43:39 -07:00
Srinivas Billa | 01b07ea3ac | Add SSL Cert Functionality (#224) | 2024-03-03 17:41:41 +08:00
Lianmin Zheng | c51020cf0c | Fix the chat template for llava-v1.6-34b & format code (#177) | 2024-02-11 05:50:13 -08:00
Ying Sheng | a6aa46dd3f | minor | 2024-02-08 04:35:25 +00:00
Srinivas Billa | 405f26b00b | Add Auth Token to RuntimeEndpoint (#162) | 2024-02-07 20:07:31 -08:00
Haotian Liu | d3fc86a43e | Improve Chinese character streaming when the last char is half Chinese word. (#95) | 2024-01-24 12:23:27 -08:00
Liangsheng Yin | 08ab2a1655 | Json Decode && Mutl-Turns (#4) | 2024-01-15 00:49:29 -08:00
Lianmin Zheng | 22085081bb | release initial code | 2024-01-08 04:37:50 +00:00
  Co-authored-by: Ying Sheng <sqy1415@gmail.com>
  Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
  Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
  Co-authored-by: parasol-aser <3848358+parasol-aser@users.noreply.github.com>
  Co-authored-by: LiviaSun <33578456+ChuyueSun@users.noreply.github.com>
  Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>