sglang

Author	SHA1	Message	Date
Chang Su	f2a5de284b	[Bugfix] Fix accuracy-test-1-gpu failure caused by `builtin_tools` (#9114)	2025-08-12 09:56:13 -07:00
Liangsheng Yin	445f9dca6e	Runtime check CUDA driver version to avoid unresolved green context symbols (#9021)	2025-08-12 09:26:10 -07:00
Yineng Zhang	3a9afe2a42	chore: bump sgl-kernel v0.3.4 (#9103)	2025-08-12 01:48:47 -07:00
fzyzcjy	9aea255522	Fuse writing KV buffer into rope kernel (part 1: sgl-kernel) (#9077)	2025-08-12 01:46:40 -07:00
Yichao Cheng	fcc11e5ed5	update support new models doc (#9096) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-08-12 01:21:02 -07:00
fzyzcjy	5190ba7f42	Fuse two kernels of hidden states padding into quantization kernel (#9005) Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>	2025-08-12 01:20:13 -07:00
Hsiang-Yu Tsou	5438886c87	docs: fix broken links in README.md (#9075)	2025-08-12 00:03:35 -07:00
Chang Su	9c83d74da3	bugfix: Fix the commentary msg extraction in GptOssDetector (#9097)	2025-08-11 23:53:10 -07:00
DarkSharpness	b4ac2b9c0c	[Fix] Fix dual chunk model default behavior (#9032)	2025-08-11 23:50:23 -07:00
Jianwei Dong	83262dcb29	Fix mismatch between padded_scales shape and reshape dimensions in modelopt quantization (#8766)	2025-08-11 23:44:40 -07:00
zixuanzhang226	c46c75f8c0	feat: add fused moe config for Qwen3-30B-A3B on B200 (#9087)	2025-08-11 23:25:36 -07:00
Makcum888e	2aaf22c46c	Optimization for AscendPagedTokenToKVPoolAllocator (#8293) Co-authored-by: ronnie_zheng <zl19940307@163.com> Co-authored-by: VDV1985 <vladdv85@mail.ru>	2025-08-11 23:06:39 -07:00
Lifu Huang	29a610b4d9	Fix broken CI TestRequestLengthValidation (#9095)	2025-08-11 22:59:56 -07:00
Lifu Huang	5ded39cab2	Fix race condition in async lora unload (#9084)	2025-08-11 22:59:29 -07:00
Keyang Ru	4093d460ce	[CI] migrate router to BM.A10.4 runner (#8992) Co-authored-by: key4ng <rukeyang@gamil.com>	2025-08-11 22:41:18 -07:00
Simo Lin	9d68bdb240	[router] Add Rust Binary Entrypoint for SGLang Router (#9089)	2025-08-11 21:37:36 -07:00
Chang Su	a218490136	(gpt-oss, oai, chat): Remove Harmony Integration and Implement Native GPT-OSS Tool Call Support (#9043)	2025-08-11 18:59:18 -07:00
Zhiqiang Xie	0eec4cb6cc	HiCache, add bench long context plus minor fixs (#9086) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-11 16:54:52 -07:00
Faradawn Yang	ff1f68252c	[fix] Set Radix tree root node hash to None - Nvidia Dynamo Integration (#9030) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-11 14:20:39 -07:00
Zhiqiang Xie	9f78f391ae	HiCache Storage: generate hash when inserting new nodes (#9053)	2025-08-11 14:18:59 -07:00
Faraz	f508cd3cb7	TRTLLM-MLA FP8 path (#8638) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-08-11 14:02:13 -07:00
Xiaoyu Zhang	44e86480e8	fuse allreduce and residual_rmsnorm (#8731)	2025-08-11 13:50:53 -07:00
Lianmin Zheng	8c07fabda7	Update hyperparameter_tuning.md (#9083)	2025-08-11 13:44:11 -07:00
SijiaYang	90f44b74e6	fix: w4afp8 accuracy problem and rebase (#8752) Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com> Co-authored-by: Jinwu <ayrnb@users.noreply.github.com>	2025-08-11 13:41:19 -07:00
Simo Lin	38907fe639	refactor(pd-router): extract common patterns to reduce code duplication (#9081)	2025-08-11 13:32:31 -07:00
Liangsheng Yin	f9afa7dceb	Fix docs for clip max new tokens (#9082)	2025-08-11 13:15:21 -07:00
Jimmy	0d9e89ec69	[PD]decode: add CLIP_MAX_NEW_TOKEN for pop_preallocated (#8866)	2025-08-11 13:08:11 -07:00
Hangzhi	3d64fda376	Fix broken Kimi models HuggingFace link (#9080)	2025-08-11 12:15:00 -07:00
633WHU	3bffe11279	Fix chunked prefill size validation for disabled state (#8973) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-11 11:05:29 -07:00
HAI	44426e54be	Update REVIEWERS (#9063)	2025-08-11 11:04:39 -07:00
ishandhanani	9f24dfefd1	chore(gb200): remove ToT flashinfer installation (#9079)	2025-08-11 11:02:15 -07:00
Yi Zhang	89f1d4f536	update deepep commit to support qwen3-coder (#9066)	2025-08-11 10:42:33 -07:00
Baizhou Zhang	75e6a7cde1	Support radix cache for Lora feature (#7216)	2025-08-11 10:14:11 -07:00
Simo Lin	6f81a710f7	[pd-router] add retry and circuit breakfor for pd router (#9051)	2025-08-11 05:53:26 -07:00
Chang Su	a6452b7188	bugfix: Fix output_ids extraction in detokenizer_manager (#9047)	2025-08-11 03:17:32 -07:00
zhyncs	f4ae50e97c	fix: use flashinfer v0.2.11.post1	2025-08-11 02:49:25 -07:00
Yineng Zhang	84cb449eec	Revert "chore: upgrade flashinfer 0.2.11 (#9036)" (#9057)	2025-08-11 00:16:39 -07:00
Cheng Wan	f003cd3548	[CI] Fix CI tests (#9050) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-10 23:52:05 -07:00
Yineng Zhang	9d834fdcc1	Revert "feat: update flashinfer ar oneshot params (#8687)" (#9054)	2025-08-10 23:24:42 -07:00
Zhiqiang Xie	b32792516a	REVIEWERS.md typo fix (#9048)	2025-08-10 22:33:37 -07:00
Simo Lin	067068f271	[router] regular router circuit breaker (#8997)	2025-08-10 21:19:30 -07:00
Lianmin Zheng	6beeff41c5	Update REVIEWERS.md (#9046)	2025-08-10 21:11:14 -07:00
Lianmin Zheng	2e8e7e353b	Improve docs and developer guide (#9044)	2025-08-10 21:05:18 -07:00
Lianmin Zheng	2449a0afe2	Refactor the docs (#9031)	2025-08-10 19:49:45 -07:00
Lianmin Zheng	0f229c07f1	Update release-docs.yml (#9037)	2025-08-10 18:52:11 -07:00
Yineng Zhang	dd001a5477	chore: upgrade flashinfer 0.2.11 (#9036)	2025-08-10 17:35:37 -07:00
Lianmin Zheng	4ea9d74a3e	Simplify health check (#9034)	2025-08-10 17:35:05 -07:00
Yineng Zhang	dd949ace23	Revert "[1/2][resubmit] sgl-kernel: Fuse routed scaling factor into m… (#9035)	2025-08-10 17:34:54 -07:00
Lianmin Zheng	f2887498f0	Simplify memory pool (#9033)	2025-08-10 17:32:28 -07:00
Stefan He	8ecf6b9d24	Support Flatten Tensor Update Weights to speed up MOE Update Weights by 20% (#8079)	2025-08-10 16:08:59 -07:00

4602 Commits