sglang

Author	SHA1	Message	Date
Lianmin Zheng	3f0fe08d37	Let ModelRunner take InputMetadata as input, instead of ScheduleBatch (#1541)	2024-09-29 20:28:45 -07:00
Lianmin Zheng	f86c1e611f	Move scheduler code from tp_worker.py to scheduler.py (#1538)	2024-09-29 17:42:45 -07:00
Lianmin Zheng	067d8e16fc	Simplify bench_latency.py (#1503)	2024-09-24 17:42:07 -07:00
Lianmin Zheng	2854a5ea9f	Fix the overhead due to penalizer in bench_latency (#1496)	2024-09-23 07:38:14 -07:00
Lianmin Zheng	2cd7e181dd	Fix env vars in bench_latency (#1472)	2024-09-19 03:19:26 -07:00
Lianmin Zheng	5e62a6b706	Add bench_server_latency.py (#1452)	2024-09-18 00:56:06 -07:00
Lianmin Zheng	899cf5c438	Remove deprecated configs (#1431)	2024-09-15 08:52:18 -07:00
Lianmin Zheng	9ba1f09760	[Fix] Fix logprob and normalized_logprob (#1428)	2024-09-15 06:36:06 -07:00
Lianmin Zheng	9463bc1385	Enable torch.compile for triton backend (#1422)	2024-09-14 15:38:37 -07:00
Liangsheng Yin	70b6802982	Optimize conflicts between CUDA graph and vocab mask tensors (#1392)	2024-09-13 20:27:53 -07:00
Lianmin Zheng	3a6e8b6d78	[Minor] move triton attention kernels into a separate folder (#1379)	2024-09-10 15:15:08 -07:00
Liangsheng Yin	69b3bb9ae1	Unify forward mode (#1360)	2024-09-09 13:49:29 -07:00
Kai-Hsun Chen	c9b75917d5	[server] Passing `model_override_args` to `launch_server` via the CLI. (#1298) Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>	2024-09-09 02:14:25 -07:00
Lianmin Zheng	1b5d56f7f8	[CI] Add more multi-gpu tests (#1280)	2024-09-01 00:27:25 -07:00
Lianmin Zheng	79ece2c51f	Report median instead of mean in bench_latency.py (#1269)	2024-08-30 06:05:01 -07:00
Liangsheng Yin	381dd57bd6	Sampler cudagraph (#1253)	2024-08-28 18:58:52 -07:00
Yineng Zhang	f25f4dfde5	hotfix: revert sampler CUDA Graph (#1242)	2024-08-28 21:16:47 +10:00
Liangsheng Yin	1ece2cda3d	Fix bench latency benchmark (#1225)	2024-08-28 00:37:32 -07:00
Lianmin Zheng	f6af3a6561	Cleanup readme, llava examples, usage examples and nccl init (#1194)	2024-08-24 08:02:23 -07:00
Ying Sheng	5fafcac008	Fix benchmark script (#1185)	2024-08-22 09:03:25 +00:00
Liangsheng Yin	83e23c69b3	Improve code style of sampler (#1168)	2024-08-21 16:48:24 -07:00
Liangsheng Yin	a34dd86a7d	Use `dtype` to control generate (#1082) Co-authored-by: zhyncs <me@zhyncs.com>	2024-08-14 15:58:07 +00:00
Lianmin Zheng	a59636bb5e	Update grok 1 model (#1095)	2024-08-14 04:40:44 -07:00
Ying Sheng	0909bb0d2f	[Feat] Add window attention for gemma-2 (#1056)	2024-08-13 17:01:26 -07:00
Liangsheng Yin	43fbb6d919	Fix `input_ids` && rename to `fill_ids` (#1021)	2024-08-10 16:24:12 -07:00
Mingyi	61728884d7	Fix benchmark latency (#1007)	2024-08-09 13:18:58 -07:00
Yineng Zhang	b568df5d03	fix: resolve correctness_test issue (#1002)	2024-08-09 23:21:42 +10:00
Liangsheng Yin	87e8c090e9	Organize code (rename, movement) (#953)	2024-08-06 20:50:32 -07:00
min-xu-et	ebf69964cd	latency test enhancement - final part (#921)	2024-08-04 18:15:23 -07:00
min-xu-et	afd411d09f	enhance latency test - part 2 (#915)	2024-08-04 12:27:25 -07:00
min-xu-et	539856455d	latency test enhancement - part 1 (#909)	2024-08-03 22:44:58 -07:00
Liangsheng Yin	cdcbde5fc3	Code structure refactor (#807)	2024-07-29 23:04:48 -07:00
Ying Sheng	db6089e6f3	Revert "Organize public APIs" (#815)	2024-07-29 19:40:28 -07:00
Liangsheng Yin	c8e9fed87a	Organize public APIs (#809)	2024-07-29 15:34:16 -07:00
Liangsheng Yin	3de2f30a27	Flashinfer sample kernel (#617)	2024-07-17 13:24:43 -07:00
Lianmin Zheng	41d1f67704	Fix flush cache (#627)	2024-07-15 20:44:04 -07:00
Liangsheng Yin	564a898ad9	Optimize mem indices mangement (#619)	2024-07-13 23:39:37 -07:00
Lianmin Zheng	665815969a	Enable cuda graph by default (#612)	2024-07-13 05:29:46 -07:00
Lianmin Zheng	d9a6902986	Fix bench latency (#607)	2024-07-11 14:37:01 -07:00
Ying Sheng	dc1b8bcfaa	Format (#593)	2024-07-05 10:06:17 -07:00
Ying Sheng	5a57b8addd	Add Gemma2 (#592)	2024-07-05 09:48:54 -07:00
Ying Sheng	2a754e57b0	2x performance improvement for large prefill & Fix workspace conflicts (#579)	2024-07-03 16:14:57 -07:00
Ying Sheng	9ce89bc14b	Update benchmark script (#571)	2024-06-28 00:44:22 -07:00
Lianmin Zheng	eb1ae6ae0c	Add sglang.bench_latency for offline benchmark (#564)	2024-06-25 03:38:04 -07:00

44 Commits