sglang

Author	SHA1	Message	Date
Chang Su	f2a5de284b	[Bugfix] Fix accuracy-test-1-gpu failure caused by `builtin_tools` (#9114 )	2025-08-12 09:56:13 -07:00
fzyzcjy	9aea255522	Fuse writing KV buffer into rope kernel (part 1: sgl-kernel) (#9077 )	2025-08-12 01:46:40 -07:00
fzyzcjy	5190ba7f42	Fuse two kernels of hidden states padding into quantization kernel (#9005 ) Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>	2025-08-12 01:20:13 -07:00
Chang Su	9c83d74da3	bugfix: Fix the commentary msg extraction in GptOssDetector (#9097 )	2025-08-11 23:53:10 -07:00
DarkSharpness	b4ac2b9c0c	[Fix] Fix dual chunk model default behavior (#9032 )	2025-08-11 23:50:23 -07:00
Jianwei Dong	83262dcb29	Fix mismatch between padded_scales shape and reshape dimensions in modelopt quantization (#8766 )	2025-08-11 23:44:40 -07:00
zixuanzhang226	c46c75f8c0	feat: add fused moe config for Qwen3-30B-A3B on B200 (#9087 )	2025-08-11 23:25:36 -07:00
Makcum888e	2aaf22c46c	Optimization for AscendPagedTokenToKVPoolAllocator (#8293 ) Co-authored-by: ronnie_zheng <zl19940307@163.com> Co-authored-by: VDV1985 <vladdv85@mail.ru>	2025-08-11 23:06:39 -07:00
Lifu Huang	5ded39cab2	Fix race condition in async lora unload (#9084 )	2025-08-11 22:59:29 -07:00
Chang Su	a218490136	(gpt-oss, oai, chat): Remove Harmony Integration and Implement Native GPT-OSS Tool Call Support (#9043 )	2025-08-11 18:59:18 -07:00
Zhiqiang Xie	0eec4cb6cc	HiCache, add bench long context plus minor fixs (#9086 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-11 16:54:52 -07:00
Faradawn Yang	ff1f68252c	[fix] Set Radix tree root node hash to None - Nvidia Dynamo Integration (#9030 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-11 14:20:39 -07:00
Zhiqiang Xie	9f78f391ae	HiCache Storage: generate hash when inserting new nodes (#9053 )	2025-08-11 14:18:59 -07:00
Faraz	f508cd3cb7	TRTLLM-MLA FP8 path (#8638 ) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-08-11 14:02:13 -07:00
Xiaoyu Zhang	44e86480e8	fuse allreduce and residual_rmsnorm (#8731 )	2025-08-11 13:50:53 -07:00
SijiaYang	90f44b74e6	fix: w4afp8 accuracy problem and rebase (#8752 ) Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com> Co-authored-by: Jinwu <ayrnb@users.noreply.github.com>	2025-08-11 13:41:19 -07:00
Liangsheng Yin	f9afa7dceb	Fix docs for clip max new tokens (#9082 )	2025-08-11 13:15:21 -07:00
Jimmy	0d9e89ec69	[PD]decode: add CLIP_MAX_NEW_TOKEN for pop_preallocated (#8866 )	2025-08-11 13:08:11 -07:00
633WHU	3bffe11279	Fix chunked prefill size validation for disabled state (#8973 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-11 11:05:29 -07:00
Baizhou Zhang	75e6a7cde1	Support radix cache for Lora feature (#7216 )	2025-08-11 10:14:11 -07:00
Chang Su	a6452b7188	bugfix: Fix output_ids extraction in detokenizer_manager (#9047 )	2025-08-11 03:17:32 -07:00
zhyncs	f4ae50e97c	fix: use flashinfer v0.2.11.post1	2025-08-11 02:49:25 -07:00
Yineng Zhang	84cb449eec	Revert "chore: upgrade flashinfer 0.2.11 (#9036 )" (#9057 )	2025-08-11 00:16:39 -07:00
Yineng Zhang	9d834fdcc1	Revert "feat: update flashinfer ar oneshot params (#8687 )" (#9054 )	2025-08-10 23:24:42 -07:00
Lianmin Zheng	2449a0afe2	Refactor the docs (#9031 )	2025-08-10 19:49:45 -07:00
Yineng Zhang	dd001a5477	chore: upgrade flashinfer 0.2.11 (#9036 )	2025-08-10 17:35:37 -07:00
Lianmin Zheng	4ea9d74a3e	Simplify health check (#9034 )	2025-08-10 17:35:05 -07:00
Yineng Zhang	dd949ace23	Revert "[1/2][resubmit] sgl-kernel: Fuse routed scaling factor into m… (#9035 )	2025-08-10 17:34:54 -07:00
Lianmin Zheng	f2887498f0	Simplify memory pool (#9033 )	2025-08-10 17:32:28 -07:00
Stefan He	8ecf6b9d24	Support Flatten Tensor Update Weights to speed up MOE Update Weights by 20% (#8079 )	2025-08-10 16:08:59 -07:00
YiXR	0418b9d4ea	[Optimization] Update estimated_num_new_pages logic in TokenToKVPoolAllocator (#8794 ) Signed-off-by: Xingrui Yi <yixingrui@linux.alibaba.com> Co-authored-by: Xingrui Yi <yixingrui@linux.alibaba.com>	2025-08-10 16:01:51 -07:00
Lianmin Zheng	b58ae7a2a0	Simplify frontend language (#9029 )	2025-08-10 10:59:30 -07:00
Lifu Huang	f8a173bb50	Improve LoRA Perf by Deprecating FlashInfer and Eliminating Redundant Tensor Ops (#8940 )	2025-08-10 01:04:45 -07:00
JiLi	6b847a9a05	Optimize: Cache CUDA device to reduce redundant calls during tensor l… (#8996 )	2025-08-10 00:32:57 -07:00
DarkSharpness	7ba5ad5766	[Fix] Fix flashinfer cpu <-> gpu synchronization (#8340 )	2025-08-10 03:11:40 +00:00
DarkSharpness	19bc77f05c	[Fix] Fix hicache backend (#8991 )	2025-08-09 17:16:25 -07:00
huangtingwei	86497d99f2	fix page first per layer pf2lf kernel (#8915 ) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-08-09 17:16:11 -07:00
cctry	5c31b35db2	[hicache] Optimization for DMA copy (#8245 )	2025-08-09 17:16:07 -07:00
Lianmin Zheng	ef48d5547e	Fix CI (#9013 )	2025-08-09 16:00:10 -07:00
Xiaoyu Zhang	a886564a18	fix flashinfer allreduce fusion import bug (#9007 )	2025-08-09 13:47:05 -07:00
Lianmin Zheng	9a44b643c6	Fix CI (#9012 )	2025-08-09 13:33:42 -07:00
Mick	41d71ca488	fix: fix obsolete qwen-audio processor arg (#9003 )	2025-08-09 13:18:36 -07:00
JieXin Liang	20cfc5a251	[perf] add kimi-k2 b200 fused moe config (#9010 )	2025-08-09 12:40:49 -07:00
Chaitanya Sri Krishna Lolla	323bc2f51a	Enable TBO on ROCm (#8329 )	2025-08-09 01:59:55 -07:00
Even Zhou	137e75daa1	[Feature] Optimize DeepSeek's DeepEP on Ascend NPU (#8355 ) Co-authored-by: ronnie_zheng <zl19940307@163.com> Co-authored-by: Hexq0210 <hexq0809521@gmail.com>	2025-08-09 01:35:00 -07:00
Trevor Morris	52e1f52f32	[bugfix] Fix missing args in bench one batch (#8877 )	2025-08-09 01:34:03 -07:00
Cheng Wan	5018809222	[DP] fix: engine crash when decode batch is padded (#8995 )	2025-08-09 01:29:29 -07:00
Yineng Zhang	326a901df4	chore: upgrade sgl-kernel 0.3.3 (#8998 )	2025-08-09 01:22:01 -07:00
Zhiqiang Xie	6e0b646832	HiCache Storage tp fix (#8878 )	2025-08-09 01:16:51 -07:00
Brayden Zhong	4a9f3eef90	Tiny Llama4 type error in constructor (#6752 )	2025-08-09 01:03:59 -07:00

3121 Commits