sglang

Author	SHA1	Message	Date
JieXin Liang	9e93ef3f8e	[fix] fix illegal mem access and clean up triton attention backend (#4571 )	2025-03-20 02:01:52 -07:00
Jinyan Chen	f44db16c8e	[Feature] Integrate DeepEP into SGLang (#4232 ) Co-authored-by: Cheng Wan <cwan39@gatech.edu> Co-authored-by: Xuting Zhou <xutingz@nvidia.com>	2025-03-19 08:16:31 -07:00
JieXin Liang	c0e9a36c5f	Optimize Triton decoding kernel for dynamic workload (#4553 )	2025-03-18 21:25:38 -07:00
aoshen524	588865f0e0	[Feature] Support Tensor Parallelism and Weight Slicing for Lora (#4274 ) Co-authored-by: ShenAo1111 <1377693092@qq.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-03-18 20:33:07 -07:00
Cheng Wan	3196999f63	Reduce computation and communication in DP attention (#4521 )	2025-03-18 13:41:36 -07:00
James Liu	9e0186f352	[Feature] Support EAGLE 3 (#4247 )	2025-03-18 07:35:23 -07:00
Yineng Zhang	c787298547	use sgl custom all reduce (#4441 )	2025-03-18 00:46:41 -07:00
Ke Bao	45212ce18b	Add deepseek v2 torch compile pr test (#4538 )	2025-03-18 00:29:24 -07:00
Mick	d373a48c98	fix: second_per_grid_ts should be used to get mrope position (#3682 )	2025-03-17 18:12:38 -07:00
Zhiqiang Xie	a98290aea3	Unit test for Hierarchical Caching (#4486 )	2025-03-17 17:45:00 -07:00
Lianmin Zheng	82dec1f70b	Remove redundant type conversion (#4513 )	2025-03-17 05:57:35 -07:00
Lianmin Zheng	5493c3343e	Fix data parallel + tensor parallel (#4499 )	2025-03-17 05:13:16 -07:00
Wei Wu	91ba98fe50	[Fix] Resolve GPU Memory Leak in update_weights_from_tensor (#4446 )	2025-03-17 08:54:30 +00:00
Xihuai Wang	927ca935a7	Constraint Decoding: Tool call with text (#4067 )	2025-03-17 01:06:46 -07:00
Stefan He	ef3c2dd08e	Support Online Quantization for W8A8 (#4485 )	2025-03-17 00:28:56 -07:00
萝卜菜	d6d21640d3	[Feature] Support Deepseek-VL2 (#2798 ) Co-authored-by: Edenzzzz <wtan45@wisc.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Yi Zhang <1109276519@qq.com>	2025-03-16 23:07:59 -07:00
Byron Hsu	8cc300f536	Fix router test (#4483 )	2025-03-16 22:49:47 -07:00
Rin Intachuen	d1112d8548	Add endpoint for file support, purely to speed up processing of input_embeds. (#2797 )	2025-03-16 18:30:37 -07:00
woodx	48efec7b05	Feature: support code completion (#3612 )	2025-03-16 18:26:19 -07:00
Mick	9d02bb3e2a	Urgent model support: support gemma-3-it (#4424 )	2025-03-16 17:37:32 -07:00
Ying Sheng	1b859295f4	[Eagle] Remove the greedy branch and some redundant code (#4363 ) Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-16 02:48:55 -07:00
Lianmin Zheng	2c4f5ccac1	Fix minor style (#4460 )	2025-03-15 21:51:12 -07:00
lukec	21d485f835	Fix test_create_kvindices unit test (#4452 )	2025-03-15 16:01:04 -07:00
Lu Changqi	0e0ec70200	Hierarchical Caching supports MLA (#4009 ) Signed-off-by: Changqi Lu <luchangqi.123@bytedance.com> Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-03-13 20:42:14 -07:00
Lianmin Zheng	f0afaf5289	Add a dummy grok test case (#4399 )	2025-03-13 15:29:48 -07:00
Qiaolin Yu	85d2365d33	Fix the output of hidden states after HTTP requests (#4269 )	2025-03-13 14:54:06 -07:00
Lianmin Zheng	a5a892ffd3	Fix auto merge & add back get_flat_data_by_layer (#4393 )	2025-03-13 08:46:25 -07:00
Lianmin Zheng	8e66fbecee	Improve DP attention (#4390 ) Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-03-13 08:23:56 -07:00
Lianmin Zheng	c76040e31b	Support page size > 1 (#4356 )	2025-03-12 22:22:39 -07:00
Lianmin Zheng	d40ee62b5d	Update nightly tests (#4352 )	2025-03-12 15:36:13 -07:00
Mick	01090e8ac3	model: Support Janus-pro (#3203 )	2025-03-12 11:02:11 -07:00
Mick	ff2ce0b86f	refactor: move image processors to separate files (#4229 )	2025-03-11 12:35:35 -07:00
lukec	dce303e279	linear support deepgemm (#4199 ) Co-authored-by: yinfan98 <1106310035@qq.com>	2025-03-11 00:38:37 -07:00
Lianmin Zheng	5524e7d057	Fix nightly eval for neuralmagic/Mixtral-8x7B-Instruct-v0.1-FP8 (#4279 )	2025-03-10 16:50:28 -07:00
HandH1998	2ac189edc8	Amd test fp8 (#4261 )	2025-03-10 10:12:09 -07:00
Lianmin Zheng	00d25a7f5e	Fix quantization and nightly tests (#4258 )	2025-03-10 03:06:21 -07:00
Lianmin Zheng	aa957102a9	Simplify tests & Fix trtllm custom allreduce registration (#4252 )	2025-03-10 01:24:22 -07:00
Lianmin Zheng	e8a69e4d0c	Clean up fp8 support (#4230 )	2025-03-09 21:46:35 -07:00
Lianmin Zheng	fbd560028a	Auto balance CI tests (#4238 )	2025-03-09 21:05:55 -07:00
Lianmin Zheng	730d084f2a	Minor style fix for sgl-kernel (#4243 )	2025-03-09 20:15:13 -07:00
Baizhou Zhang	9dfafa743c	Fix test of flashinfer mla with nextn (#4237 )	2025-03-09 12:45:39 -07:00
Baizhou Zhang	9fb48f951f	Support nextn for flashinfer mla attention backend (#4218 )	2025-03-09 00:01:54 -08:00
Lianmin Zheng	48473684cc	Split test_mla.py into two files (#4216 )	2025-03-08 15:40:49 -08:00
Lianmin Zheng	2cadd51d11	Test no vllm custom allreduce (#4210 )	2025-03-08 05:23:06 -08:00
Lianmin Zheng	08c4d764a5	lazy import attn backends (#4200 )	2025-03-08 00:41:35 -08:00
Lianmin Zheng	d4017a6b63	[EAGLE] many fixes for eagle (#4195 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-07 22:12:13 -08:00
HandH1998	c7f254468f	[Feature] DeepSeek V3/R1 INT8 Quantization (channel-wise) (#3888 ) Co-authored-by: yych0745 <1398089567@qq.com> Co-authored-by: sleepcoo <sleepcoo@gmail.com> Co-authored-by: b0urnee <2769086541@qq.com>	2025-03-06 20:54:52 -08:00
Pan Lyu	361971b859	Add Support for Qwen2-VL Multi-modal Embedding Models (#3694 )	2025-03-06 16:46:20 -08:00
Lianmin Zheng	bc1534ff32	Fix a draft model accuracy bug in eagle; support step=1; return logprob in eagle (#4134 ) Co-authored-by: Sehoon Kim <kssteven418@gmail.com> Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-06 06:13:59 -08:00
Lianmin Zheng	fcc2e37f69	Split the __init__ of scheduler as smaller functions. Improve the eagle tests (#4128 )	2025-03-06 00:13:20 -08:00

480 Commits