Commit Graph

1503 Commits

Author SHA1 Message Date
Yineng Zhang
e8dbdf75bc fix typo (#2487) 2024-12-15 13:44:55 +08:00
yizhang2077
e04d3f2897 adapt tensorrt llm custom all reduce to sgl-kernel (#2481) 2024-12-15 13:15:59 +08:00
  Co-authored-by: Yineng Zhang <me@zhyncs.com>
Yineng Zhang
5f2595be43 hotfix: checking for HIP (#2485) 2024-12-15 02:47:26 +08:00
Ke Bao
0ba2c58947 Remove cuda graph batch size adjustment for dp attention (#2484) 2024-12-14 23:53:54 +08:00
Yineng Zhang
fccbfa3752 format: add clang-format for sgl-kernel (#2483) 2024-12-14 22:36:04 +08:00
Ke Bao
2f9bd0fafd Fix correctness issue for triton decoding kernel (#2479) 2024-12-14 16:50:54 +08:00
Lianmin Zheng
5282a4735f [Minor] Fix grok model loader (#2473) 2024-12-12 14:34:47 -08:00
Yineng Zhang
f0ed9c353e feat: support dev image (#2469) 2024-12-13 02:23:52 +08:00
Ata Fatahi
e3b3acfa6f Rename rust folder to sgl-router (#2464) 2024-12-12 09:40:41 -08:00
  Signed-off-by: Ata Fatahi <immrata@gmail.com>
Yineng Zhang
2673fa29d4 fix: set runtime path (#2466) 2024-12-12 18:05:48 +08:00
Yineng Zhang
dedaf8cd48 minor: update pypi tag (#2463) 2024-12-12 15:21:45 +08:00
Yineng Zhang
32ed016041 chore: bump v0.0.2 for sgl-kernel (#2462) 2024-12-12 14:58:05 +08:00
Ata Fatahi
6efa9e4a6d Bump sglang-router to 0.1.1 (#2459) 2024-12-11 17:40:03 -08:00
  Signed-off-by: Ata Fatahi <immrata@gmail.com>
Ata Fatahi
7791fd9948 Include version info into the router package (#2456) 2024-12-11 17:31:20 -08:00
  Signed-off-by: Ata Fatahi <immrata@gmail.com>
  Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
Ata Fatahi
2ac36b9a7b Make request payload size configurable (#2444) 2024-12-11 16:55:21 -08:00
  Signed-off-by: Ata Fatahi <immrata@gmail.com>
Byron Hsu
2d60a5ee75 Update v0.1.0.md 2024-12-11 13:48:18 -08:00
Byron Hsu
2e4a5907c9 [router] Release router 0.1.0 with dynamic scaling and fault tolerance (#2455) 2024-12-11 13:42:35 -08:00
Byron Hsu
c0ee46fe10 [router] Update doc for dynamic scaling and fault tolerance (#2454) 2024-12-11 13:11:42 -08:00
SangBin Cho
9208618b3e [Core] in batch prefix caching by delay scheduling (#2442) 2024-12-11 12:51:50 -08:00
Byron Hsu
864bf2ba00 [router] remove main.rs because only lib.rs is used for py binding (#2453) 2024-12-11 12:13:19 -08:00
Byron Hsu
a4cca7fc53 [router] Add retries based fault tolerance (#2452) 2024-12-11 12:13:08 -08:00
Fred Reiss
993956c6b1 Add support for IBM Granite 3.x models (#2437) 2024-12-11 06:30:23 -08:00
Lianmin Zheng
f8548295d6 Fix warmup in bench_offline_throughput.py (#2449) 2024-12-11 06:16:01 -08:00
Lianmin Zheng
959735fc9e Fix model loader for more quantization formats (#2448) 2024-12-11 05:21:23 -08:00
bjmsong
f67723940d decoding attention kernel benchmark (#2425) 2024-12-11 04:46:59 -08:00
  Co-authored-by: root <bjmsong@126.com>
Yineng Zhang
626a99ac13 chore: update ao v0.7.0 (#2447) 2024-12-11 04:44:28 -08:00
Ke Wen
ece724910a Make torch TP composable with torchao (#2436) 2024-12-11 04:21:42 -08:00
Byron Hsu
0fb88aaa77 [router] Use borrow if possible to save cost (#2441) 2024-12-11 01:38:50 -08:00
Byron Hsu
d4de9a6235 [router] Refactor: decouple select and send stage (#2440) 2024-12-11 00:51:21 -08:00
Yineng Zhang
7310aede97 fix: compatible with PEP 440 (#2435) 2024-12-11 06:48:45 +08:00
Yineng Zhang
5de9a58eca fix: use manylinux2014_x86_64 tag (#2434) 2024-12-11 06:17:41 +08:00
Yineng Zhang
56fcd8e8a5 feat: support sgl-kernel PyPI (#2433) 2024-12-11 06:06:19 +08:00
  Co-authored-by: Zhangyi <1109276519@qq.com>
Adarsh Shirawalmath
2b340adfb1 Typo fix in router.md (#2424) 2024-12-09 21:49:40 -08:00
Ying Sheng
8586b72da0 [feat] Enable chunked prefill for llava-onevision (#2412) 2024-12-09 09:52:38 -08:00
Lianmin Zheng
641b7d0ae0 [Minor] Improve code style (#2422) 2024-12-09 06:30:35 -08:00
Lianmin Zheng
0ce091a82d [Minor] Improve code style (#2419) 2024-12-09 03:05:59 -08:00
Lianmin Zheng
835f8afc77 Migrate llama_classification to use the /classify interface (#2417) 2024-12-08 23:30:51 -08:00
Xiaoyu Zhang
3844feb9bb Add a unittest for fused_moe (#2416) 2024-12-08 22:46:10 -08:00
Byron Hsu
27f7bed7a7 reduce watchdog interval to 5s (#2410) 2024-12-08 21:17:31 -08:00
Byron Hsu
6387098f5f [router] add health checking in router init (#2393) 2024-12-08 17:17:37 -08:00
Byron Hsu
2a717c5078 [Router] fix interrupt from terminal (#2413) 2024-12-08 16:58:41 -08:00
Byron Hsu
a1e697b25b [router] Improve cleanup logic (#2411) 2024-12-08 15:24:02 -08:00
Lianmin Zheng
a6ca736c8e Simplify stream_output (#2398) 2024-12-08 12:27:13 -08:00
Yineng Zhang
f62055b528 minor: add random flashinfer vs triton use case (#2409) 2024-12-09 04:15:21 +08:00
Yineng Zhang
74bc9184c3 minor: add random use case (#2408) 2024-12-09 03:21:35 +08:00
Yineng Zhang
0f8eb15323 feat: support custom task runner (#2407) 2024-12-09 02:29:55 +08:00
Yineng Zhang
67470bbb28 minor: update correct measurement unit (#2406) 2024-12-08 20:55:04 +08:00
Lianmin Zheng
cc858953a0 Fix recv_requests (#2405) 2024-12-08 04:08:04 -08:00
Yineng Zhang
6128f7cff5 fix: specify dtype with begin_forward aka plan (#2404) 2024-12-08 20:07:30 +08:00
Lianmin Zheng
a2486eb58f Fix a bug with logprob streaming + chunked prefill (#2403) 2024-12-08 03:55:27 -08:00