Simo Lin | 97c3823931 | [router] refactor router and worker management 3/n (#10727) | 2025-09-22 12:17:50 -07:00
Qiaolin Yu | e2ac7888b8 | [2/2] Support deterministic inference for temperature > 0 (#10678); Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>; Co-authored-by: hebiao064 <hebiaobuaa@gmail.com> | 2025-09-21 19:36:08 -07:00
Jimmy | 56321e9fc2 | [Router]fix: fix get_load missing api_key (#10385) | 2025-09-21 15:28:38 -04:00
Simo Lin | 1d1ce62495 | [router] refactor router and worker management 2.5/n (#10677) | 2025-09-19 20:54:40 -07:00
Simo Lin | 00eb5eb721 | [router] refactor router and worker management 2/n (#10666) | 2025-09-19 12:37:57 -07:00
Simo Lin | 873d858b28 | [router] refactor worker to builder pattern 5/n (#10653) | 2025-09-19 05:43:23 -04:00
Simo Lin | ac2a723bb3 | [router] refactor worker to builder pattern 3/n (#10647) | 2025-09-18 22:52:57 -07:00
Chang Su | 5fe39e85a2 | [router] fix router manager and router init in server (#10499) | 2025-09-15 22:23:26 -07:00
Simo Lin | 16e9335998 | [router] add router db connector for responses api (#10487) | 2025-09-15 22:04:56 -07:00
Simo Lin | 7eccbe992d | [router] fix service discovery and mcp ut (#10449) | 2025-09-14 21:07:23 -07:00
Jintao Zhang | f9ee6ae17a | [router]: Add Embedding routing logic (#10129); Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>; Co-authored-by: Waël Boukhobza <wawa_wael@live.fr> | 2025-09-14 18:44:35 -07:00
Keyang Ru | 366043db8e | [router] Add get and cancel method for response api (#10387) | 2025-09-12 16:19:38 -07:00
Simo Lin | 2f173ea074 | [router] allow one router to support different model families and serving mode (#10244) | 2025-09-12 16:18:27 -07:00
Frank Fang | 4634fd5953 | [router] Add Rerank Routing Logic in Regular Router (#10219) | 2025-09-12 09:10:18 -07:00
Keyang Ru | a23bdeaf04 | [router] Basic OAI Response api (#10346) | 2025-09-11 20:56:17 -07:00
Keyang Ru | dee197e11b | [router] Add OpenAI backend support - core function (#10254) | 2025-09-11 14:13:51 -07:00
Simo Lin | db37422c92 | [router] move to mcp sdk instead (#10057) | 2025-09-05 18:03:46 -07:00
Simo Lin | d966b902af | [router] move tokenizer, reasoning, tool initialization to server (#9996) | 2025-09-03 19:35:13 -07:00
Simo Lin | 9491d6e554 | [router] include rust benchamrks (#9932) | 2025-09-02 09:32:09 -07:00
Chang Su | 9a0cac1be0 | [router] add grpc pd and regular router init (#9893) | 2025-09-01 20:06:15 -07:00
Chang Su | 598c0bc19d | [router] add tokenizer download support from hf hub (#9882) | 2025-09-01 10:40:37 -07:00
Simo Lin | 5343058875 | [router] grpc router bootstraps (#9759) | 2025-08-28 12:07:06 -07:00
Simo Lin | 07c9d8fba2 | [router] add llama3.2 multi json streaming parser (#9735) | 2025-08-28 05:57:13 -07:00
Simo Lin | e1f7cf57dc | [router] additional llama32 parser unit test and multi json support (#9732) | 2025-08-27 20:34:11 -07:00
Simo Lin | 2bb9d454b5 | [router] additional pythonic parser unit test (#9730) | 2025-08-27 19:55:59 -07:00
Keyang Ru | 3f2d0cefcd | [router] Add MCP Tool Handler (#9615) | 2025-08-27 19:12:39 -07:00
Simo Lin | 07ee0ab750 | [router] add gpt-oss and glm4 tool parser (#9703); Co-authored-by: Chang Su <chang.s.su@oracle.com> | 2025-08-27 11:26:00 -07:00
Simo Lin | 5c06dcb75a | [router] add kimi-k2 tool parser (#9702); Co-authored-by: Chang Su <chang.s.su@oracle.com> | 2025-08-27 11:04:55 -07:00
Simo Lin | 6f6beca49d | [router] add step3 tool parser (#9695); Co-authored-by: Chang Su <chang.s.su@oracle.com> | 2025-08-27 10:44:52 -07:00
Simo Lin | 6e4e1c8cdc | [router] add deepseek tool parser (#9694); Co-authored-by: Chang Su <chang.s.su@oracle.com> | 2025-08-27 06:18:24 -07:00
Chang Su | 90313fb09a | [router] add token bucket rate limiter (#9656) | 2025-08-26 10:36:26 -07:00
Stefan He | cbc0e4d779 | Fix lint for router (#9636) | 2025-08-26 00:38:53 -07:00
Simo Lin | e2e378caba | [router] add ut for mistral, llama, pythonic, and streaming tool parser (#9632); Co-authored-by: Chang Su <chang.s.su@oracle.com> | 2025-08-25 22:02:15 -07:00
Keyang Ru | 5ef545e678 | [router] Move all protocols to spec.rs file (#9519) | 2025-08-22 14:18:47 -07:00
Chang Su | 53e2cd464f | [router] remove all tokenizer metrics for performance (#9474) | 2025-08-21 18:35:24 -07:00
Simo Lin | 78ae175866 | [router] add tokenizer benchmark (#9427) | 2025-08-21 11:09:39 -07:00
Chang Su | e65231022f | [router] add tokenizer integration test with real mini tokenizer (#9413) | 2025-08-20 17:56:23 -07:00
Keyang Ru | 3828db4309 | [router] Add IGW (Inference Gateway) Feature Flag (#9371); Co-authored-by: Yineng Zhang <me@zhyncs.com> | 2025-08-20 17:38:57 -07:00
Keyang Ru | 5ae5ecaa15 | [router] Implement OpenAI Responses API specification (#9367) | 2025-08-19 20:14:47 -07:00
Simo Lin | 5fbad308cd | [router] add tokenizer chat template support (#9370); Co-authored-by: Chang Su <chang.s.su@oracle.com> | 2025-08-19 20:14:02 -07:00
Keyang Ru | ce67b2d586 | [router]restructure protocol modules for better organization (#9321) | 2025-08-19 01:07:58 +00:00
Jeff Nettleton | ce3ca9b02f | [router] add cargo clippy in CI and fix-up linting errors (#9242) | 2025-08-17 11:03:56 -07:00
Simo Lin | 21b8846066 | [router] allow more health check configuration (#9198) | 2025-08-15 08:07:45 -07:00
Simo Lin | 9d68bdb240 | [router] Add Rust Binary Entrypoint for SGLang Router (#9089) | 2025-08-11 21:37:36 -07:00
Simo Lin | 067068f271 | [router] regular router circuit breaker (#8997) | 2025-08-10 21:19:30 -07:00
Simo Lin | 61a4680494 | [router] router circuit breaker core (#8941) | 2025-08-08 09:20:22 -07:00
Simo Lin | a69b637014 | [router] fix req handling order, improve serialization, remove retry (#8888) | 2025-08-06 23:24:39 -07:00
Simo Lin | 8c7bb39dfb | [router] PD Router Simplification and Reorganization (#8838) | 2025-08-05 21:20:38 -07:00
Simo Lin | 5d62b56f7e | [router] complete router oai spec (#8828) | 2025-08-05 18:30:19 -07:00
Simo Lin | 354ac43555 | [pd-router] Add Configurable Retry Logic for reduce backend pressure (#8744) | 2025-08-04 20:42:07 -07:00