sglang

Author	SHA1	Message	Date
Zhiqiang Xie	0eec4cb6cc	HiCache, add bench long context plus minor fixes (#9086 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-11 16:54:52 -07:00
Faradawn Yang	ff1f68252c	[fix] Set Radix tree root node hash to None - Nvidia Dynamo Integration (#9030 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-11 14:20:39 -07:00
Zhiqiang Xie	9f78f391ae	HiCache Storage: generate hash when inserting new nodes (#9053 )	2025-08-11 14:18:59 -07:00
Faraz	f508cd3cb7	TRTLLM-MLA FP8 path (#8638 ) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-08-11 14:02:13 -07:00
Xiaoyu Zhang	44e86480e8	fuse allreduce and residual_rmsnorm (#8731 )	2025-08-11 13:50:53 -07:00
Lianmin Zheng	8c07fabda7	Update hyperparameter_tuning.md (#9083 )	2025-08-11 13:44:11 -07:00
SijiaYang	90f44b74e6	fix: w4afp8 accuracy problem and rebase (#8752 ) Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com> Co-authored-by: Jinwu <ayrnb@users.noreply.github.com>	2025-08-11 13:41:19 -07:00
Simo Lin	38907fe639	refactor(pd-router): extract common patterns to reduce code duplication (#9081 )	2025-08-11 13:32:31 -07:00
Liangsheng Yin	f9afa7dceb	Fix docs for clip max new tokens (#9082 )	2025-08-11 13:15:21 -07:00
Jimmy	0d9e89ec69	[PD]decode: add CLIP_MAX_NEW_TOKEN for pop_preallocated (#8866 )	2025-08-11 13:08:11 -07:00
Hangzhi	3d64fda376	Fix broken Kimi models HuggingFace link (#9080 )	2025-08-11 12:15:00 -07:00
633WHU	3bffe11279	Fix chunked prefill size validation for disabled state (#8973 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-11 11:05:29 -07:00
HAI	44426e54be	Update REVIEWERS (#9063 )	2025-08-11 11:04:39 -07:00
ishandhanani	9f24dfefd1	chore(gb200): remove ToT flashinfer installation (#9079 )	2025-08-11 11:02:15 -07:00
Yi Zhang	89f1d4f536	update deepep commit to support qwen3-coder (#9066 )	2025-08-11 10:42:33 -07:00
Baizhou Zhang	75e6a7cde1	Support radix cache for Lora feature (#7216 )	2025-08-11 10:14:11 -07:00
Simo Lin	6f81a710f7	[pd-router] add retry and circuit breaker for pd router (#9051 )	2025-08-11 05:53:26 -07:00
Chang Su	a6452b7188	bugfix: Fix output_ids extraction in detokenizer_manager (#9047 )	2025-08-11 03:17:32 -07:00
zhyncs	f4ae50e97c	fix: use flashinfer v0.2.11.post1	2025-08-11 02:49:25 -07:00
Yineng Zhang	84cb449eec	Revert "chore: upgrade flashinfer 0.2.11 (#9036 )" (#9057 )	2025-08-11 00:16:39 -07:00
Cheng Wan	f003cd3548	[CI] Fix CI tests (#9050 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-10 23:52:05 -07:00
Yineng Zhang	9d834fdcc1	Revert "feat: update flashinfer ar oneshot params (#8687 )" (#9054 )	2025-08-10 23:24:42 -07:00
Zhiqiang Xie	b32792516a	REVIEWERS.md typo fix (#9048 )	2025-08-10 22:33:37 -07:00
Simo Lin	067068f271	[router] regular router circuit breaker (#8997 )	2025-08-10 21:19:30 -07:00
Lianmin Zheng	6beeff41c5	Update REVIEWERS.md (#9046 )	2025-08-10 21:11:14 -07:00
Lianmin Zheng	2e8e7e353b	Improve docs and developer guide (#9044 )	2025-08-10 21:05:18 -07:00
Lianmin Zheng	2449a0afe2	Refactor the docs (#9031 )	2025-08-10 19:49:45 -07:00
Lianmin Zheng	0f229c07f1	Update release-docs.yml (#9037 )	2025-08-10 18:52:11 -07:00
Yineng Zhang	dd001a5477	chore: upgrade flashinfer 0.2.11 (#9036 )	2025-08-10 17:35:37 -07:00
Lianmin Zheng	4ea9d74a3e	Simplify health check (#9034 )	2025-08-10 17:35:05 -07:00
Yineng Zhang	dd949ace23	Revert "[1/2][resubmit] sgl-kernel: Fuse routed scaling factor into m… (#9035 )	2025-08-10 17:34:54 -07:00
Lianmin Zheng	f2887498f0	Simplify memory pool (#9033 )	2025-08-10 17:32:28 -07:00
Stefan He	8ecf6b9d24	Support Flatten Tensor Update Weights to speed up MOE Update Weights by 20% (#8079 )	2025-08-10 16:08:59 -07:00
YiXR	0418b9d4ea	[Optimization] Update estimated_num_new_pages logic in TokenToKVPoolAllocator (#8794 ) Signed-off-by: Xingrui Yi <yixingrui@linux.alibaba.com> Co-authored-by: Xingrui Yi <yixingrui@linux.alibaba.com>	2025-08-10 16:01:51 -07:00
Lifu Huang	e322a94d1f	Reduce CI duration of test_lora_update. (#9024 )	2025-08-10 15:34:04 -07:00
Lianmin Zheng	2c7f01bc89	Reorganize CI and test files (#9027 )	2025-08-10 12:30:06 -07:00
Lianmin Zheng	b58ae7a2a0	Simplify frontend language (#9029 )	2025-08-10 10:59:30 -07:00
Stefan He	6345069f6c	[RL] Add test for /abort_request (#7626 )	2025-08-10 09:14:19 -07:00
Simo Lin	ce9cf35327	[router] update pyo3 version to 0.25.1 (#9022 )	2025-08-10 06:45:51 -07:00
Lifu Huang	f8a173bb50	Improve LoRA Perf by Deprecating FlashInfer and Eliminating Redundant Tensor Ops (#8940 )	2025-08-10 01:04:45 -07:00
JiLi	6b847a9a05	Optimize: Cache CUDA device to reduce redundant calls during tensor l… (#8996 )	2025-08-10 00:32:57 -07:00
Simo Lin	473400e452	[router] upgrade kube version to latest (#9018 )	2025-08-09 22:49:45 -07:00
Simo Lin	dd665f967f	[router] upgrade rand to latest version (#9017 )	2025-08-09 22:49:30 -07:00
Simo Lin	3817a37d87	[router] upgrade to latest sgl kernel for router ci (#9019 )	2025-08-09 21:49:18 -07:00
DarkSharpness	7ba5ad5766	[Fix] Fix flashinfer cpu <-> gpu synchronization (#8340 )	2025-08-10 03:11:40 +00:00
DarkSharpness	19bc77f05c	[Fix] Fix hicache backend (#8991 )	2025-08-09 17:16:25 -07:00
huangtingwei	86497d99f2	fix page first per layer pf2lf kernel (#8915 ) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-08-09 17:16:11 -07:00
cctry	5c31b35db2	[hicache] Optimization for DMA copy (#8245 )	2025-08-09 17:16:07 -07:00
Lianmin Zheng	ef48d5547e	Fix CI (#9013 )	2025-08-09 16:00:10 -07:00
Xiaoyu Zhang	a886564a18	fix flashinfer allreduce fusion import bug (#9007 )	2025-08-09 13:47:05 -07:00

4585 Commits