sglang

Author	SHA1	Message	Date
Yineng Zhang	e8dbdf75bc	fix typo (#2487)	2024-12-15 13:44:55 +08:00
yizhang2077	e04d3f2897	adapt tensorrt llm custom all reduce to sgl-kernel (#2481) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-12-15 13:15:59 +08:00
Yineng Zhang	5f2595be43	hotfix: checking for HIP (#2485)	2024-12-15 02:47:26 +08:00
Ke Bao	0ba2c58947	Remove cuda graph batch size adjustment for dp attention (#2484)	2024-12-14 23:53:54 +08:00
Yineng Zhang	fccbfa3752	format: add clang-format for sgl-kernel (#2483)	2024-12-14 22:36:04 +08:00
Ke Bao	2f9bd0fafd	Fix correctness issue for triton decoding kernel (#2479)	2024-12-14 16:50:54 +08:00
Lianmin Zheng	5282a4735f	[Minor] Fix grok model loader (#2473)	2024-12-12 14:34:47 -08:00
Yineng Zhang	f0ed9c353e	feat: support dev image (#2469)	2024-12-13 02:23:52 +08:00
Ata Fatahi	e3b3acfa6f	Rename rust folder to sgl-router (#2464) Signed-off-by: Ata Fatahi <immrata@gmail.com>	2024-12-12 09:40:41 -08:00
Yineng Zhang	2673fa29d4	fix: set runtime path (#2466)	2024-12-12 18:05:48 +08:00
Yineng Zhang	dedaf8cd48	minor: update pypi tag (#2463)	2024-12-12 15:21:45 +08:00
Yineng Zhang	32ed016041	chore: bump v0.0.2 for sgl-kernel (#2462)	2024-12-12 14:58:05 +08:00
Ata Fatahi	6efa9e4a6d	Bump sglang-router to 0.1.1 (#2459) Signed-off-by: Ata Fatahi <immrata@gmail.com>	2024-12-11 17:40:03 -08:00
Ata Fatahi	7791fd9948	Include version info into the router package (#2456) Signed-off-by: Ata Fatahi <immrata@gmail.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2024-12-11 17:31:20 -08:00
Ata Fatahi	2ac36b9a7b	Make request payload size configurable (#2444) Signed-off-by: Ata Fatahi <immrata@gmail.com>	2024-12-11 16:55:21 -08:00
Byron Hsu	2d60a5ee75	Update v0.1.0.md	2024-12-11 13:48:18 -08:00
Byron Hsu	2e4a5907c9	[router] Release router 0.1.0 with dynamic scaling and fault tolerance (#2455)	2024-12-11 13:42:35 -08:00
Byron Hsu	c0ee46fe10	[router] Update doc for dynamic scaling and fault tolerance (#2454)	2024-12-11 13:11:42 -08:00
SangBin Cho	9208618b3e	[Core] in batch prefix caching by delay scheduling (#2442)	2024-12-11 12:51:50 -08:00
Byron Hsu	864bf2ba00	[router] remove main.rs because only lib.rs is used for py binding (#2453)	2024-12-11 12:13:19 -08:00
Byron Hsu	a4cca7fc53	[router] Add retries based fault tolerance (#2452)	2024-12-11 12:13:08 -08:00
Fred Reiss	993956c6b1	Add support for IBM Granite 3.x models (#2437)	2024-12-11 06:30:23 -08:00
Lianmin Zheng	f8548295d6	Fix warmup in bench_offline_throughput.py (#2449)	2024-12-11 06:16:01 -08:00
Lianmin Zheng	959735fc9e	Fix model loader for more quantization formats (#2448)	2024-12-11 05:21:23 -08:00
bjmsong	f67723940d	decoding attention kernel benchmark (#2425) Co-authored-by: root <bjmsong@126.com>	2024-12-11 04:46:59 -08:00
Yineng Zhang	626a99ac13	chore: update ao v0.7.0 (#2447)	2024-12-11 04:44:28 -08:00
Ke Wen	ece724910a	Make torch TP composable with torchao (#2436)	2024-12-11 04:21:42 -08:00
Byron Hsu	0fb88aaa77	[router] Use borrow if possible to save cost (#2441)	2024-12-11 01:38:50 -08:00
Byron Hsu	d4de9a6235	[router] Refactor: decouple select and send stage (#2440)	2024-12-11 00:51:21 -08:00
Yineng Zhang	7310aede97	fix: compatible with PEP 440 (#2435)	2024-12-11 06:48:45 +08:00
Yineng Zhang	5de9a58eca	fix: use manylinux2014_x86_64 tag (#2434)	2024-12-11 06:17:41 +08:00
Yineng Zhang	56fcd8e8a5	feat: support sgl-kernel PyPI (#2433) Co-authored-by: Zhangyi <1109276519@qq.com>	2024-12-11 06:06:19 +08:00
Adarsh Shirawalmath	2b340adfb1	Typo fix in router.md (#2424)	2024-12-09 21:49:40 -08:00
Ying Sheng	8586b72da0	[feat] Enable chunked prefill for llava-onevision (#2412)	2024-12-09 09:52:38 -08:00
Lianmin Zheng	641b7d0ae0	[Minor] Improve code style (#2422)	2024-12-09 06:30:35 -08:00
Lianmin Zheng	0ce091a82d	[Minor] Improve code style (#2419)	2024-12-09 03:05:59 -08:00
Lianmin Zheng	835f8afc77	Migrate llama_classification to use the /classify interface (#2417)	2024-12-08 23:30:51 -08:00
Xiaoyu Zhang	3844feb9bb	Add a unittest for fused_moe (#2416)	2024-12-08 22:46:10 -08:00
Byron Hsu	27f7bed7a7	reduce watchdog interval to 5s (#2410)	2024-12-08 21:17:31 -08:00
Byron Hsu	6387098f5f	[router] add health checking in router init (#2393)	2024-12-08 17:17:37 -08:00
Byron Hsu	2a717c5078	[Router] fix interrupt from terminal (#2413)	2024-12-08 16:58:41 -08:00
Byron Hsu	a1e697b25b	[router] Improve cleanup logic (#2411)	2024-12-08 15:24:02 -08:00
Lianmin Zheng	a6ca736c8e	Simplify stream_output (#2398)	2024-12-08 12:27:13 -08:00
Yineng Zhang	f62055b528	minor: add random flashinfer vs triton use case (#2409)	2024-12-09 04:15:21 +08:00
Yineng Zhang	74bc9184c3	minor: add random use case (#2408)	2024-12-09 03:21:35 +08:00
Yineng Zhang	0f8eb15323	feat: support custom task runner (#2407)	2024-12-09 02:29:55 +08:00
Yineng Zhang	67470bbb28	minor: update correct measurement unit (#2406)	2024-12-08 20:55:04 +08:00
Lianmin Zheng	cc858953a0	Fix recv_requests (#2405)	2024-12-08 04:08:04 -08:00
Yineng Zhang	6128f7cff5	fix: specify dtype with begin_forward aka plan (#2404)	2024-12-08 20:07:30 +08:00
Lianmin Zheng	a2486eb58f	Fix a bug with logprob streaming + chunked prefill (#2403)	2024-12-08 03:55:27 -08:00

1503 Commits