Commit Graph

85 Commits

Author SHA1 Message Date
Chang Su
0c3db88978 [router][grpc] Add helpfer functions for decoder in router.rs and fix specs (#10971) 2025-09-26 20:10:45 -04:00
Simo Lin
aae7ead2d0 [router] remove old/oudated/useless comments across code base (#10968) 2025-09-26 10:48:50 -07:00
Simo Lin
be059b83d6 [router] grpc router regular mode import cleanup (#10963) 2025-09-26 04:06:59 -07:00
Simo Lin
5d4fe1ceee [router] add move grpc worker management from router to worker manager (#10960) 2025-09-26 03:57:57 -07:00
Simo Lin
1b011e68dc [router] move grpc client from router to worker and builder (#10958) 2025-09-26 03:13:47 -07:00
Simo Lin
1e57b9472d [router] add grpc client get and set (#10955) 2025-09-26 03:07:05 -07:00
Chang Su
37158f2018 router: Support parallel sampling num > 1 in grpc_server and non-stream handling (#10929) 2025-09-25 20:03:35 -07:00
Chang Su
5e21d6aec0 refactor: Move grpc/client.rs to grpc_client/sglang_scheduler.rs (#10924) 2025-09-25 17:21:22 -04:00
Chang Su
916784746b router: Fix constraint proto and build_constraint in grpc router (#10881) 2025-09-25 11:12:06 -04:00
Simo Lin
d511b2d905 [router] consolidate worker load monitoring (#10894) 2025-09-25 09:59:30 -04:00
Simo Lin
e738703547 [router] consolidate worker get loads (#10880) 2025-09-24 22:13:31 -04:00
Simo Lin
7a06ef984d [router] consolidate health endpoints and flush cache (#10876) 2025-09-24 15:23:21 -07:00
Chang Su
4a87ba217f router-grpc: Add tools processing and other paramters for apply_chat_template (#10877) 2025-09-24 15:23:06 -07:00
luna
c3faf2d6e6 [router] select first healthy worker on proxied get requests (#10827) 2025-09-24 11:45:41 -07:00
Chang Su
9209b209be router-grpc: Support jinja chat template content format detection (#10832) 2025-09-24 11:45:01 -07:00
Keyang Ru
f4e3ebeb05 [router] Support streaming for Openai Router Response api (#10822) 2025-09-23 14:56:28 -07:00
Chang Su
7ff93e613f router(grpc): Implement route for chat_cmpl endpoint (#10761) 2025-09-23 11:26:33 -07:00
Simo Lin
98c3b04ff2 [router] responses api POST and GET with local storage (#10581)
Co-authored-by: key4ng <rukeyang@gmail.com>
2025-09-23 09:12:02 -07:00
Simo Lin
c3a1d7759f [router] remove pd router draining channel (#10767) 2025-09-22 20:49:33 -07:00
Simo Lin
89971c4c3c [router] refactor router and worker management 4/n (#10756)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
2025-09-22 18:35:10 -07:00
Simo Lin
97c3823931 [router] refactor router and worker management 3/n (#10727) 2025-09-22 12:17:50 -07:00
Chang Su
60dbbd086a bugfix: Fix get_worker_urls_for_model in http/router.rs (#10754) 2025-09-22 14:10:31 -04:00
Qiaolin Yu
e2ac7888b8 [2/2] Support deterministic inference for temperature > 0 (#10678)
Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
2025-09-21 19:36:08 -07:00
Jimmy
56321e9fc2 [Router]fix: fix get_load missing api_key (#10385) 2025-09-21 15:28:38 -04:00
Simo Lin
1d1ce62495 [router] refactor router and worker management 2.5/n (#10677) 2025-09-19 20:54:40 -07:00
Simo Lin
00eb5eb721 [router] refactor router and worker management 2/n (#10666) 2025-09-19 12:37:57 -07:00
Simo Lin
36efd5be8a [router] refactor router and worker management 1/n (#10664) 2025-09-19 06:19:57 -07:00
Simo Lin
873d858b28 [router] refactor worker to builder pattern 5/n (#10653) 2025-09-19 05:43:23 -04:00
Simo Lin
4f2055ad56 [router] refactor worker to builder pattern 4/n (#10650) 2025-09-18 23:49:10 -07:00
Simo Lin
ac2a723bb3 [router] refactor worker to builder pattern 3/n (#10647) 2025-09-18 22:52:57 -07:00
Chang Su
5fe39e85a2 [router] fix router manager and router init in server (#10499) 2025-09-15 22:23:26 -07:00
Chang Su
35ef3f2902 [router] fix worker registration in multi model mode (#10486) 2025-09-15 21:05:00 -04:00
Chang Su
2689f0bf02 [router] multi model registration fix (#10481) 2025-09-15 15:22:21 -07:00
Simo Lin
7eccbe992d [router] fix service discovery and mcp ut (#10449) 2025-09-14 21:07:23 -07:00
Jintao Zhang
f9ee6ae17a [router]: Add Embedding routing logic (#10129)
Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
Co-authored-by: Waël Boukhobza <wawa_wael@live.fr>
2025-09-14 18:44:35 -07:00
Simo Lin
7c5a0a1b77 [router] add not implemented functions for multi model trait (#10394) 2025-09-12 16:44:18 -07:00
Keyang Ru
366043db8e [router] Add get and cancel method for response api (#10387) 2025-09-12 16:19:38 -07:00
Simo Lin
2f173ea074 [router] allow one router to support different model families and serving mode (#10244) 2025-09-12 16:18:27 -07:00
Frank Fang
4634fd5953 [router] Add Rerank Routing Logic in Regular Router (#10219) 2025-09-12 09:10:18 -07:00
Keyang Ru
a23bdeaf04 [router] Basic OAI Response api (#10346) 2025-09-11 20:56:17 -07:00
Keyang Ru
dee197e11b [router] Add OpenAI backend support - core function (#10254) 2025-09-11 14:13:51 -07:00
Simo Lin
4f8a982d52 [router] clean up dependency injector to use ctx (#10000) 2025-09-03 21:35:51 -07:00
Simo Lin
d966b902af [router] move tokenizer, reasoning, tool initialization to server (#9996) 2025-09-03 19:35:13 -07:00
Chang Su
11dcabc545 Grpc client (#9939) 2025-09-02 11:47:35 -07:00
Chang Su
9a0cac1be0 [router] add grpc pd and regular router init (#9893) 2025-09-01 20:06:15 -07:00
LukasBluebaum
9d9fa9a537 [router] Fix short timeout for the prefill client (#9803) 2025-09-01 19:57:04 -07:00
Simo Lin
5343058875 [router] grpc router bootstraps (#9759) 2025-08-28 12:07:06 -07:00
Bruce-x-1997
8b30bec265 [router] fix error response in pd_router (#9505)
Co-authored-by: bruce.xu <bruce.xu@gmicloud.ai>
2025-08-27 19:10:55 -07:00
Simo Lin
3578eb1e9b [router] address worker load tracking consistency (#9523)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
2025-08-26 06:40:51 -07:00
Bruce-x-1997
446c8e4cdb [router] ignore client error when record failure in pd_router (#9503)
Co-authored-by: bruce.xu <bruce.xu@gmicloud.ai>
2025-08-22 14:19:45 -07:00