sglang

Author	SHA1	Message	Date
fzyzcjy	453d412cdb	Tiny update error hint (#5037)	2025-04-21 00:47:47 -07:00
fzyzcjy	dc86f25a57	Tiny remove duplicated code (#5021)	2025-04-21 00:47:32 -07:00
Chuyue Sun	08289eaa3e	Support o1 model on Azure (#4980) Co-authored-by: Shan Yu <shanyu1@g.ucla.edu>	2025-04-21 00:46:09 -07:00
Lucius	3b6d539f63	[Fix] Enhance DP Attention for IPv6 Compatibility (#4937)	2025-04-21 00:44:11 -07:00
lambert0312	c44f2869c9	Modify metrics service endpoint (#3443)	2025-04-21 00:35:38 -07:00
fzyzcjy	685d8980c3	Tiny add warning when cannot recognize bool env var (#5348)	2025-04-20 23:11:29 -07:00
Zhiqiang Xie	70645f4d7d	upstream hicache fixes (#5570)	2025-04-20 23:08:30 -07:00
Qingquan Song	188f0955fa	Add Speculative Decoding Eagle3 topk > 1 (#5318) Co-authored-by: Stefan He <hebiaobuaa@gmail.com> Co-authored-by: Yubo Wang <yubowang2019@gmail.com>	2025-04-20 22:58:28 -07:00
Lianmin Zheng	eef9433b46	Fix flush cache (#5590)	2025-04-20 22:56:40 -07:00
JieXin Liang	97cb762bb6	[misc] remove is_cuda_available (#5319)	2025-04-20 18:16:51 -07:00
fzyzcjy	1195182040	Tiny add Engine.flush_cache API (#5241)	2025-04-20 18:15:03 -07:00
fzyzcjy	5239d79568	Speedup shared expert weight construction by avoid cloning (#5188)	2025-04-20 18:12:01 -07:00
Sundara Raman Ramachandran	f08154193c	Perform Batch Tokenization. (#5141)	2025-04-20 18:10:37 -07:00
fzyzcjy	5fc4b6004e	Add sanity check for max_running_requests (#5016)	2025-04-20 17:56:49 -07:00
Brayden Zhong	b868526d94	Fix one more issue reported by torchfix (#4859)	2025-04-20 17:49:27 -07:00
Juwan Yoo	502524e2da	compressed_tensors: port w8a16 fp8 from vllm (#4852)	2025-04-20 17:48:31 -07:00
Enrique Shockwave	4c7640079c	check marlin format before attempting conversion (#4675)	2025-04-20 17:47:09 -07:00
kyle-pena-kuzco	9f3bd2ad39	Feat: Implement JSON Mode (response_format.type="json_object") (#4733) Co-authored-by: Kyle Pena <kylepena@kyles-macbook-pro.turkey-marlin.ts.net>	2025-04-20 17:41:22 -07:00
Yi Zhou	fac17acf08	add function call parser for DeepSeek V3 (#5224)	2025-04-20 17:38:08 -07:00
Adarsh Shirawalmath	8b39274e34	[Feature] Prefill assistant response - add continue_final_message parameter (#4226) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-04-20 17:37:18 -07:00
Byron Hsu	c951d312ed	[PD] Fix large page size + chunk prefill (#5588)	2025-04-20 17:21:54 -07:00
AmadeusW	dcb8232596	Fix ChatCompletionMessageGenericParam to allow for None content (#5452)	2025-04-20 17:15:38 -07:00
Yineng Zhang	66c0ff9e31	fix: use fa3 for gemma2 (#5586)	2025-04-20 17:02:09 -07:00
tarinkk	9a7e83e899	Fix enable chunked prefill for Llama4 (#5575)	2025-04-20 17:01:30 -07:00
lukec	417b44eba8	[Feat] upgrade pytorch2.6 (#5417)	2025-04-20 16:06:34 -07:00
fzyzcjy	475e2e378a	[PD] Fix server crash when using batch requests (#5531)	2025-04-20 16:02:23 -07:00
fzyzcjy	fba86b6b54	Tiny improve error message (#5526)	2025-04-20 16:00:15 -07:00
fzyzcjy	fa2f677e18	Fix torch memory saver not enabled in DP scenario (#5560)	2025-04-20 14:20:52 -07:00
fzyzcjy	463d4b7400	Fix DeepEP cannot run on latest master (#5567)	2025-04-20 14:19:42 -07:00
fzyzcjy	9924bbe153	Fix bench_serving fail when zero warmup requests (#5574)	2025-04-20 14:16:03 -07:00
Lianmin Zheng	fbdc94ba59	Release v0.4.5.post2 (#5582)	2025-04-20 14:12:37 -07:00
Baizhou Zhang	b54b5a96e4	[Doc]Add instruction for profiling with bench_one_batch (#5581)	2025-04-20 14:05:36 -07:00
JieXin Liang	bca832c7c6	[Fix] fix outlines and xgrammar (#4947)	2025-04-20 13:31:25 -07:00
Xiaoyu Zhang	d9dd529854	enable DeepSeek V3 shared_experts_fusion in sm90 (#5571)	2025-04-20 12:46:42 -07:00
fzyzcjy	0a0dd34e6a	Fix BumpAllocator error when no input_ids (#5564)	2025-04-20 02:20:53 -07:00
fzyzcjy	80ac527d22	[PD] Fix DeepSeek cannot be run on latest master (#5568)	2025-04-20 02:19:48 -07:00
JieXin Liang	99456bcacb	[perf] introduce deep gemm group_gemm_masked as bmm (#5432)	2025-04-20 00:38:27 -07:00
fzyzcjy	d07e797ace	Fix bench_one_batch producing unnatural results for expert parallel (#5149)	2025-04-20 00:38:04 -07:00
Zhiqiang Xie	e2574ee986	fix hicache write back (#5543)	2025-04-19 21:56:22 -07:00
Byron Hsu	ab4b5606e4	[PD] Support page size > 1 (#5561)	2025-04-19 21:54:27 -07:00
Yubo Wang	20f1c8e374	Fix sampler nan check when calling top_k_top_p_sampling_from_probs (#5546)	2025-04-19 21:47:23 -07:00
fzyzcjy	613b197e57	Remove one kernel in per_tensor_quant_mla_fp8 (#5549)	2025-04-19 15:08:15 -07:00
Xiaoyu Zhang	d58e354472	simplify the control logic for using shared experts fusion (#5504)	2025-04-19 13:17:35 -07:00
Xiaoyu Zhang	bf86c5e990	restruct compressed_tensors_w8a8_fp8 (#5475)	2025-04-19 04:52:15 -07:00
shangmingc	dca90f1db8	[PD] Remove the requirement of config file for mooncake backend (#5460)	2025-04-19 19:31:00 +08:00
ybyang	59dd090f1c	[PD] Fix no cache connect for recevier (#5534)	2025-04-19 14:55:28 +08:00
fzyzcjy	569b032c58	[PD] Tiny fix timeout error when generate (#5545)	2025-04-19 14:42:57 +08:00
fzyzcjy	f6a71139a8	Make profiler output file names consistent (#5548)	2025-04-18 22:57:11 -07:00
fzyzcjy	1e0806f30b	Fix DeepGEMM masked cannot be run on groups not being multiple or 4 (#5340)	2025-04-18 22:38:07 -07:00
Yineng Zhang	2c11f9c2eb	chore: upgrade sgl-kernel 0.0.9.post2 (#5540)	2025-04-18 21:17:23 -07:00

1953 Commits