sglang

Author	SHA1	Message	Date
Lifu Huang	5c705b1dce	Add perf tests for LoRA (#8314 )	2025-07-26 14:55:22 -07:00
Mick	3212c2ad3f	vlm: optimize tensor transport (#6003 ) Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2025-07-26 17:41:01 +08:00
Stefan He	ce32bc2ba9	Extract update_weights from RL Engine to SGLang to keep simplicity and fix torch reduce (#8267 ) Co-authored-by: CuiBo 82354186+SuperCB@users.noreply.github.com Co-authored-by: GeLee 865038696@qq.com Co-authored-by: 杨睿 yangruipis@163.com	2025-07-26 02:00:59 -07:00
Lianmin Zheng	3ec0b21229	[CI] Fix flaky threshold (#8370 )	2025-07-25 16:41:56 -07:00
Chang Su	d8ee15643b	[Feat] Add reasoning parser for Qwen/Qwen3-235B-A22B-Thinking-2507 (#8363 )	2025-07-25 14:59:42 -07:00
Lianmin Zheng	ed2e313eb6	Clean up server_args, triton cache manager (#8332 )	2025-07-25 14:14:51 -07:00
Chang Su	f8260f2539	[Bugfix][Feat] Add XML-ish grammar in EBNFComposer and fix misc bugs in Qwen3 detector (#8357 )	2025-07-25 12:03:16 -07:00
Cheng Wan	c0fb25e949	DP Enhancement (#8280 )	2025-07-24 21:36:21 -07:00
li haoyang	28d4d47280	[Feature] Integrate quick allreduce and select the best allreduce implementation (#6619 ) Signed-off-by: Haoyang Li <Haoyang.Li@amd.com> Co-authored-by: ilmarkov <imarkov@redhat.com>	2025-07-24 20:48:42 -07:00
xianzhiT	624a3b8d1f	Fix incomplete tool call capture issue in streaming response of DeepSeek-V3 when enable MTP (#7562 )	2025-07-23 17:40:23 -07:00
Chang Su	01079e174f	feat(function call): complete utility method for KimiK2Detector and enhance documentation (#8043 )	2025-07-23 17:37:31 -07:00
Xinyuan Tong	38000a5f44	Fix gemma3n with hybrid swa (#8240 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-07-23 13:29:18 -07:00
xianzhiT	c87d4fec99	Fix the issue of incorrect finish reason in final stream response chunk returned during tool call (#7708 ) Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2025-07-23 13:28:53 -07:00
Lifu Huang	8abd3e77fe	Introduce Stable LoRA ID System for Overlapped Updates and Prefix Caching (#8261 )	2025-07-23 00:32:16 -07:00
Xiaoze Fan	7b68d27111	[Feature] Add a test for Layer-wise Prefill (#8231 ) Signed-off-by: jason-fxz <jason341132@qq.com>	2025-07-21 22:06:15 +08:00
Xinyuan Tong	8430bfe3e9	[Refactor] simplify multimodal data processing (#8107 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-07-20 21:43:09 -07:00
ronnie_zheng	93d124ef5a	[feature] enable NPU CI (#7935 ) Co-authored-by: Even Zhou <14368888+iforgetmyname@users.noreply.github.com>	2025-07-20 13:12:42 -07:00
Praneth Paruchuri	83c104b188	Feat: Support for Persimmon Model (#7983 )	2025-07-19 23:07:47 -07:00
Pavel Logachev	877e35d775	Add get_hidden_dim to qwen3.py for correct lora (#7312 )	2025-07-19 19:31:16 -07:00
Clay	cbdfb77123	Enable FlashInfer support encoder models and add head_dim padding workaround (#6230 )	2025-07-19 19:30:16 -07:00
Lifu Huang	4e3defe5a7	Support start up LoRA server without initial adapters (#8019 )	2025-07-19 15:38:09 -07:00
Lifu Huang	3de617a75b	Fix LoRA buffer contamination during adapter eviction (#8103 )	2025-07-19 13:14:08 -07:00
Lianmin Zheng	bb0e8a32b5	Clean up server args (#8161 )	2025-07-19 11:32:52 -07:00
Cheng Wan	15ad6c9086	[1/N] MoE Refactor: refactor `select_experts` (#7966 )	2025-07-19 00:51:15 -07:00
Binyao Jiang	b7e951a6db	Feat: Support audio in Phi4-mm model (#8048 )	2025-07-18 21:03:53 -07:00
Mick	3964b352c3	chore: tune mem fraction static for vlm (#6881 )	2025-07-18 17:19:27 -07:00
Lianmin Zheng	9c7a46180c	[Doc] Steps to add a new attention backend (#8155 )	2025-07-18 16:38:26 -07:00
Hubert Lu	7750b91ca8	[AMD] Add triton awq_dequantize kernel to support AWQ on ROCm (#7661 )	2025-07-18 14:27:25 -07:00
Hongbo Xu	1f76fc8747	[3/n] chore: decouple AWQ implementation from vLLM dependency (#8113 ) Co-authored-by: AniZpZ <zhuangsen.zp@antgroup.com>	2025-07-18 11:45:22 -07:00
Zhiqiang Xie	9d33fcfb8e	Hicache Storage Layer Prototype (#7704 )	2025-07-18 15:20:19 +08:00
Cheng Wan	02404a1e35	[ci] recover 8-gpu deepep test (#8105 )	2025-07-17 00:46:40 -07:00
Mick	4395c87a9b	refactor: unify names of the feature field of MultimodalDataItem (#8075 )	2025-07-16 17:52:38 -07:00
Peng Zhang	c28ad1990d	[1/n] chore: decouple quantization implementation from vLLM dependency (#7992 )	2025-07-16 15:56:26 -07:00
Xinyuan Tong	7498522f7d	update transformers to 4.53.2 (#8029 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-07-15 18:24:39 -07:00
Praneth Paruchuri	cb736df854	Support for Phi-1.5 & Phi-2 models (#7862 )	2025-07-13 18:43:40 -07:00
Lifu Huang	e2ed9d049a	Refactor dynamic LoRA update to fix incorrect handling of variant weight shapes (#7844 )	2025-07-13 18:36:01 -07:00
Peng Zhang	b5dd5e8741	chore: remove unnecessary limits on quantization methods in test script (#7997 )	2025-07-13 16:11:49 -07:00
Hanming Lu	9379da77de	SWA Prefix Cache (#7367 ) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-07-13 12:31:07 -07:00
Cheng Wan	475a249bb8	temporarily disable deepep-8-gpu and activate two small tests (#7961 )	2025-07-11 14:22:05 -07:00
Atream	615553079d	Support Kimi K2 (#7940 )	2025-07-11 00:02:21 -07:00
Binyao Jiang	2d54d4bb64	Feat: Support Phi-3.5-MoE in SGLang (#7907 )	2025-07-09 23:51:33 -07:00
Mick	b5e3d6031c	vlm: support video as an input modality (#5888 )	2025-07-09 23:48:35 -07:00
kyleliang-nv	dd445a41f5	[feature] Add start step profile argument in /start_profile (#7608 )	2025-07-09 18:42:15 -07:00
Cheng Wan	d487555f84	[CI] Add deepep tests to CI (#7872 )	2025-07-09 01:49:47 -07:00
Brayden Zhong	a37e1247c1	[Multimodal][Perf] Use `pybase64` instead of `base64` (#7724 )	2025-07-08 14:00:58 -07:00
Xinyuan Tong	136c6e0431	fix: Handles input_embeds in GenerateReqInput when n>1 (#7830 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-07-08 14:00:42 -07:00
Xinyuan Tong	43e20c0647	Support Mimo-VL (#7579 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-07-08 14:00:25 -07:00
Lifu Huang	2b0e1d1ce0	[Minor] Fix sporadic CI timeout caused by underestimated tests. (#7850 )	2025-07-08 01:01:49 -07:00
Yuan Luo	253454de9b	Integrate triton moe kernel (#7689 ) Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>	2025-07-06 20:05:49 -07:00
Hubert Lu	e00715eb66	[AMD] Add test_fused_moe.py and test_rope_rocm.py to AMD CI (#5246 )	2025-07-06 01:47:16 -07:00

824 Commits