sglang

Author	SHA1	Message	Date
Zijian	31d6dee5c4	Support VILA models (#6106)	2025-06-11 11:47:25 -07:00
Baizhou Zhang	2a5f0100e0	Fix GGuf and add back test_gguf.py (#7067)	2025-06-10 21:07:20 -07:00
Yudi Xue	14c18d25df	Frontend language separate reasoning support (#6031)	2025-06-10 17:11:29 -07:00
Brayden Zhong	ca9291181d	[Feature] Add Logit Bias (#6579) Co-authored-by: Cinjon Resnick <cinjon.resnick@gmail.com>	2025-06-10 15:39:25 -07:00
kyle-pena-kuzco	b56de8f943	Open AI API hidden states (#6716)	2025-06-10 14:37:29 -07:00
Yineng Zhang	2f58445531	Revert "Add sanity checks when a test file is not added to CI (#6947)" (#7063)	2025-06-10 12:43:25 -07:00
fzyzcjy	fe55947acd	Add sanity checks when a test file is not added to CI (#6947)	2025-06-10 12:34:57 -07:00
Baizhou Zhang	3b014bc13d	Fix test_lora.py CI (#7061)	2025-06-10 12:24:46 -07:00
Lianmin Zheng	019851d099	Fix eagle on AMD (#7051)	2025-06-10 05:22:40 -07:00
YanbingJiang	fcde67b016	CPU: map changes from developing branch in sgl-kernel (#6833) Co-authored-by: mingfeima <mingfei.ma@intel.com>	2025-06-10 01:08:15 -07:00
Emmanuel Ferdman	f40942ad63	Migrate to assertEqual (#6741) Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>	2025-06-09 16:47:39 -07:00
Lianmin Zheng	dc0705a504	Simplify prepare_extend_after_decode (#6987)	2025-06-09 16:39:21 -07:00
Sai Enduri	3465d7ae78	Update amd nightly models CI. (#6992)	2025-06-09 10:54:08 -07:00
Yineng Zhang	56ccd3c22c	chore: upgrade flashinfer v0.2.6.post1 jit (#6958) Co-authored-by: alcanderian <alcanderian@gmail.com> Co-authored-by: Qiaolin Yu <qy254@cornell.edu> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com> Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: ispobock <ispobaoke@gmail.com>	2025-06-09 09:22:39 -07:00
Pan Lyu	451ffe74d9	support qwen3 emebedding (#6990)	2025-06-09 01:32:49 -07:00
Sai Enduri	2c18642502	Enable more unit tests for AMD CI. (#6983)	2025-06-08 19:41:55 -07:00
Lianmin Zheng	9ecb18568b	Fix triton sliding window test case (#6981)	2025-06-08 17:20:46 -07:00
Ke Bao	cc74499d51	Fix draft extend ut stability with flush cache (#6979)	2025-06-08 17:09:32 -07:00
Lianmin Zheng	0c1f03a23d	Sync cuda graph runners (#6976)	2025-06-08 16:12:25 -07:00
Lianmin Zheng	20d3ad3b58	Fix CI and triton moe Configs (#6974)	2025-06-08 05:06:46 -07:00
Hubert Lu	4740288303	[AMD] Add more tests to per-commit-amd (#6926)	2025-06-08 01:08:37 -07:00
HAI	b819381fec	AITER backend extension and workload optimizations (#6838) Co-authored-by: wunhuang <wunhuang@amd.com> Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>	2025-06-05 23:00:18 -07:00
Zaili Wang	562f279a2d	[CPU] enable CI for PRs, add Dockerfile and auto build task (#6458) Co-authored-by: diwei sun <diwei.sun@intel.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-06-05 13:43:54 -07:00
Chang Su	8b2474898b	bugfix(OAI): Fix image_data processing for jinja chat templates (#6877)	2025-06-05 13:37:01 -07:00
fzyzcjy	0de5e7d40f	Support layerwise rebalancing experts (#6851)	2025-06-05 00:05:52 -07:00
zyksir	8e3797be1c	support 1 shot allreduce in 1-node and 2-node using mscclpp (#6277)	2025-06-04 22:11:24 -07:00
Lifu Huang	4474eaf552	Support LoRA in TestOpenAIVisionServer and fix fused kv_proj loading bug. (#6861)	2025-06-04 22:08:30 -07:00
ishandhanani	f0f84975f4	feat: add dp-rank to KV events (#6852)	2025-06-04 15:29:34 -07:00
Chanh Nguyen	3f1e433903	Decoder-only Scoring API (#6460) Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>	2025-06-04 14:14:54 -07:00
Xinyuan Tong	cf9815ba69	[Refactor] Multimodal data processing for VLM (#6659) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-06-04 11:22:33 -07:00
Marc Sun	37f1547587	[FEAT] Add transformers backend support (#5929)	2025-06-03 21:05:29 -07:00
jianan-gu	ff00895c46	Add CPU optimized kernels for topk and rope fusions (#6456)	2025-06-02 17:37:34 -07:00
Ke Bao	a2cb5913a0	Add draft extend CUDA graph for flashinfer backend (#6805)	2025-06-02 01:51:26 -07:00
Ravi Theja	c6a0cacc35	Update CI tests for Llama4 models (#6421)	2025-06-01 11:52:15 +08:00
Lianmin Zheng	2d72fc47cf	Improve profiler and integrate profiler in bench_one_batch_server (#6787)	2025-05-31 15:53:55 -07:00
Chang Su	e39bca0756	ci: relax test_function_call_required (#6786)	2025-05-30 19:18:42 -07:00
Jianan Ji	a2bb856543	Temporarily lower mmlu threshold for triton sliding window backend (#6785)	2025-05-30 18:40:50 -07:00
Chang Su	f18b068f15	feat(tool call): Enhance Llama32Detector for improved JSON parsing in non-stream (#6784)	2025-05-30 17:05:17 -07:00
Chao Yang	4fac524b14	update llama4 chat template and pythonic parser (#6679) Co-authored-by: Chang Su <chang.s.su@oracle.com>	2025-05-30 17:01:22 -07:00
Jianan Ji	22630ca242	Support sliding window in triton backend (#6509)	2025-05-30 01:11:53 -07:00
Chang Su	c673727e0e	refactor(tool call): Fix BaseFormatDetector tool_index issue and refactor `parse_streaming_increment` (#6715)	2025-05-29 00:08:45 -07:00
iLeGend	e06b076105	Fix PP for Qwen3 MoE (#6709)	2025-05-28 23:06:18 -07:00
shangmingc	d63e76f735	[CI] Fix setup of disaggregation with different tp (#6706) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-05-28 11:17:27 -07:00
shangmingc	c25231c679	[CI] Fix flaky pp single node test (#6689) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>	2025-05-28 00:40:26 -07:00
Sai Enduri	f4a8987f69	Update amd docker and nightly models. (#6687)	2025-05-28 00:08:08 -07:00
Chang Su	41ba767f0c	feat: Add warnings for invalid tool_choice and UTs (#6582)	2025-05-27 16:53:19 -07:00
Ke Bao	f127355a30	Add batch test for draft extend (#6672)	2025-05-27 16:32:05 -07:00
fzyzcjy	87068b5cc7	Support gathering expert distribution details (#6665)	2025-05-27 15:32:59 -07:00
Junrong Lin	2103b80607	[CI] update verlengine ci to 4-gpu test (#6007)	2025-05-27 14:32:23 -07:00
Sai Enduri	eb8f02dd87	Update nightly thresholds and dependencies. (#6635)	2025-05-26 11:44:13 -07:00

715 Commits