sglang

Author	SHA1	Message	Date
Ying Sheng	e1e595d702	[feat] Refactor session control interface and add CI (#2173)	2024-11-25 12:32:51 -08:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
Byron Hsu	30af7dfb34	[router] add base_gpu_id server args & merged radix tree python reference (#2115)	2024-11-21 17:13:33 -08:00
Lianmin Zheng	56a347f7d3	Move test_session_id.py to playground (#2104)	2024-11-20 01:28:27 -08:00
Ke Bao	62832bb272	Support cuda graph for DP attention (#2061)	2024-11-17 16:29:20 -08:00
Chayenne	c77c1e05ba	fix black in pre-commit (#1940)	2024-11-08 07:42:47 +08:00
Xuehai Pan	a5e0defb5a	minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926)	2024-11-06 13:46:04 +00:00
Jani Monoses	916b3cdddc	Allow passing dtype and max_new_tokens to HF reference script (#1903)	2024-11-03 08:24:37 -08:00
Ying Sheng	c5325aba75	[Profile] Add pytorch profiler (#1604)	2024-10-07 14:37:16 -07:00
Lianmin Zheng	fb2d0680e0	[Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510)	2024-09-24 21:37:33 -07:00
Lianmin Zheng	2854a5ea9f	Fix the overhead due to penalizer in bench_latency (#1496)	2024-09-23 07:38:14 -07:00
Lianmin Zheng	167591e864	Better unit tests for adding a new model (#1488)	2024-09-22 01:50:37 -07:00
Ying Sheng	37963394aa	[Feature] Support LoRA path renaming and add LoRA serving benchmarks (#1433)	2024-09-15 12:46:04 -07:00
Ying Sheng	712216928f	[Feature] Initial support for multi-LoRA serving (#1307)	2024-09-12 16:46:14 -07:00
Lianmin Zheng	f64eae3a29	[Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308)	2024-09-02 21:44:45 -07:00
Ying Sheng	0909bb0d2f	[Feat] Add window attention for gemma-2 (#1056)	2024-08-13 17:01:26 -07:00
Ying Sheng	4075677621	Add OpenAI backend to the CI test (#869)	2024-08-01 09:25:24 -07:00

17 Commits