sglang

Author	SHA1	Message	Date
Yury Sulsky	24161c5913	The Gemma template is missing a newline after the user role. (#6331) Co-authored-by: Yury Sulsky <ysulsky@tesla.com>	2025-05-18 01:57:27 -07:00
libra	11553c1a37	Add pipeline parallelism for Qwen2 and Qwen3 Model (#6250)	2025-05-18 00:42:55 -07:00
Mick	01dd39bac1	refactor: minor refactors regarding multimodal processing (#6187)	2025-05-17 22:53:20 -07:00
Lianmin Zheng	b3f3d610fd	Do not use FA3 for mistral (#6379)	2025-05-17 19:47:34 -07:00
Yineng Zhang	f07c6a009b	chore: upgrade sgl-kernel v0.1.3 (#6377)	2025-05-17 19:47:05 -07:00
Lianmin Zheng	4bb816d444	Fix CI tests (#6362)	2025-05-17 19:16:45 -07:00
ybyang	c250939ecb	[Fix Chat API] add request id for chat/completion for tracing (#6364)	2025-05-17 18:58:22 -07:00
ishandhanani	b6909aa223	fix: allow `launch_dummy_health_check_server` to start inside of running asyncio loop (#6330)	2025-05-17 18:32:41 -07:00
fzyzcjy	f87283573e	Add expert distribution APIs for engine (#6290)	2025-05-17 18:31:51 -07:00
fzyzcjy	73187152a4	Reland tiny refactor DefaultModelLoader.Source (#6041)	2025-05-17 17:11:20 -07:00
fzyzcjy	4086566516	Fix expert distribution recorder and profiler command stuck forever (#6284)	2025-05-17 17:10:44 -07:00
fzyzcjy	fd08c04821	Support custom DeepEP tuning config (#6257)	2025-05-17 17:09:42 -07:00
fzyzcjy	26ebb849eb	Tiny refactor bench_serving to extract RequestFuncOutput.init_new (#6108)	2025-05-17 17:08:52 -07:00
fzyzcjy	02973cd9a4	Tiny refactor bench_serving to improve extensibility (#6134)	2025-05-17 17:07:58 -07:00
fzyzcjy	6d95a35abf	Support outputing details for bench_serving (#6107)	2025-05-17 17:06:52 -07:00
fzyzcjy	01d2838c0f	Fix stop_profile does not wait for finishing (#4741)	2025-05-17 17:06:15 -07:00
xutizhou	e3b8a72291	[fix] illegal memory in _fwd_kernel_ep_scatter_2 and _fwd_kernel_ep_gather (#6348)	2025-05-17 17:01:42 -07:00
Lifu Huang	3cf1473a09	Use monotonic clock for interval measurement (#6211) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>	2025-05-17 16:49:18 -07:00
fzyzcjy	2716830802	Speed up when having padding tokens in DeepEP (#6175)	2025-05-17 16:44:05 -07:00
Chang Su	205d5cb407	perf: Optimize local attention memory allocation in FlashAttentionBackend (#6356)	2025-05-17 01:45:46 -07:00
fzyzcjy	2df9d40aa6	Minor code cleanup refactor for DeepSeek models (#6324)	2025-05-16 19:06:03 -07:00
fzyzcjy	8dc191f237	Fix one wasted kernel in DeepSeek and minor refactor (#6316)	2025-05-16 19:05:33 -07:00
Kiv Chen	64825b8395	model(vlm): mistral 3.1 (#5099) Co-authored-by: KivenChen <sleigh-queue-0y@icloud.com>	2025-05-16 18:36:18 -07:00
Lianmin Zheng	dcc0a45618	Fix amd ci (#6360)	2025-05-16 15:33:10 -07:00
Lianmin Zheng	c2b7ddca49	[Minor] cleanup unused imports (#6358)	2025-05-16 14:52:38 -07:00
Fr4nk1in	4bd2952a37	feat: add dp attention support for Qwen 2/3 MoE models, fixes #6088 (#6121) Co-authored-by: King.Zevin <zevin@mail.ustc.edu.cn> Co-authored-by: Yi Zhang <1109276519@qq.com>	2025-05-16 14:44:10 -07:00
Elfie Guo	6fc9357503	[2/2] Add python wrapper for CUTLASS FP8 Blockscale MoE Kernel. (#5694)	2025-05-16 13:14:07 -07:00
Baizhou Zhang	839fb31e5f	[Fix] Improve dependencies for Blackwell image (#6334)	2025-05-16 12:38:22 -07:00
Yury Sulsky	f19a9204cd	Support precomputed multimodal features for Qwen-VL and Gemma3 models. (#6136) Co-authored-by: Yury Sulsky <ysulsky@tesla.com>	2025-05-16 12:26:15 -07:00
Lianmin Zheng	e07a6977e7	Minor improvements of TokenizerManager / health check (#6327)	2025-05-15 15:29:25 -07:00
Qiaolin Yu	cd8d4b9dfc	Fix lora bench (#6302)	2025-05-15 10:09:55 -07:00
fzyzcjy	f194e14fb7	Reduce MoE memory usage (#6147)	2025-05-15 09:38:28 -07:00
Yi Liu	cfc9f9ab8d	Fix gpu mem check on CPU (#6317) Signed-off-by: yiliu30 <yi4.liu@intel.com>	2025-05-15 09:37:45 -07:00
JieXin Liang	9a405274e2	[misc] remove redundant platform codes (#6298)	2025-05-15 00:51:30 -07:00
quinnrong94	2e4babdb0a	[Feat] Support FlashMLA backend with MTP and FP8 KV cache (#6109) Co-authored-by: Yingyi <yingyihuang2000@outlook.com> Co-authored-by: neiltian <neiltian@tencent.com> Co-authored-by: lukec <118525388+sleepcoo@users.noreply.github.com> Co-authored-by: kexueyu <kexueyu@tencent.com> Co-authored-by: vincentmeng <vincentmeng@tencent.com> Co-authored-by: pengmeng <pengmeng@tencent.com>	2025-05-15 00:48:09 -07:00
Zilin Zhu	44a3783d13	[fix][RL] Remove the incorrect barrier in init_weights_update_group (#5914)	2025-05-14 19:15:21 -07:00
Junrong Lin	f3bf611054	feat: add flush cache to EngineBase and HttpServerEngineAdapter (#6009)	2025-05-14 19:15:02 -07:00
Hubert Lu	198b9056d1	[AMD] Fix Llama 4 Scout and Maverick accuracy issues on MI300X (#6274)	2025-05-14 22:07:29 +00:00
Lifu Huang	3e350a931e	[Bug] Fix accidental logger override caused by internVL. (#6282)	2025-05-13 23:29:25 -07:00
Ying Sheng	fb71725c98	Fix a bug in schedule_policy (#6276)	2025-05-13 18:04:00 -07:00
Chang Su	912788c095	perf: optimize local_block_table memory allocation (#6273)	2025-05-13 17:18:38 -07:00
Yineng Zhang	16267d4fa7	chore: bump v0.4.6.post4 (#6245)	2025-05-13 01:57:51 -07:00
JieXin Liang	17299f088a	[misc] deep_gemm fallback to NVRTC when NVCC not found (#6252)	2025-05-13 01:41:35 -07:00
Kiv Chen	5380cd7ea3	model(vlm): pixtral (#5084)	2025-05-13 00:16:10 -07:00
Cheng Wan	b2e95f62b4	Fix two issues related to `--moe-dense-tp-size=1` (#5657) Co-authored-by: liusy58 <liusy58@linux.alibaba.com> Co-authored-by: 颉沆 <xiehang.lsy@alibaba-inc.com>	2025-05-12 23:51:39 -07:00
Stefan He	1ab14c4c5c	[VERL Use Case] Add torch_memory_saver into deps (#6247)	2025-05-12 19:09:03 -07:00
Lianmin Zheng	ac2324c177	Skip the flaky test_stateful_custom_logit_processor (#6251)	2025-05-12 18:29:41 -07:00
Yineng Zhang	f24fc5b86d	fix typo (#6248)	2025-05-12 15:45:12 -07:00
Lianmin Zheng	d18c6b3358	Support incremental streaming of logprob/token_ids between scheduler and detokenizer (#6225) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-05-12 14:33:38 -07:00
shangmingc	f1c896007a	[PD] Add support for different TP sizes per DP rank (#5922) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-05-12 13:55:42 -07:00

2204 Commits