sglang

Author	SHA1	Message	Date
Yineng Zhang	71fb8c9527	feat: update fa3 (#9126 )	2025-08-13 20:07:08 +08:00
Ke Bao	94f44b88d1	Update fa3 interface and add unit test (#9150 )	2025-08-13 20:05:02 +08:00
Kevin Xiang Li	3b3b3baf9f	Double vision prefill throughput by defaulting to optimal vision attention backend (#8484 ) Co-authored-by: Xiang (Kevin) Li <lik@nvidia.com>	2025-08-13 02:08:30 -07:00
kk	35e6bc92e3	Update docker file for MI35x base image update to support gpt-oss mxfp4 model (#9111 ) Co-authored-by: wunhuang <wunhuang@amd.com>	2025-08-13 00:55:31 -07:00
fzyzcjy	9394ed6386	Fix gpt-oss ~2x memory consumption issue (#9146 )	2025-08-13 00:11:43 -07:00
Stefan He	930fe467bd	Support Triton FP8 Gemm can handle hidden_dim not divisible by 16 (#9093 ) Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>	2025-08-12 21:21:55 -07:00
Trevor Morris	13c48dcf88	[1/2][resubmit again] sgl-kernel: Fuse routed scaling factor into moe_fused_gate (#9088 )	2025-08-12 20:12:38 -07:00
Elfie Guo	8723b4f146	Use FlashInfer's TRTLLM FP8 Blockscale GEMM (#8588 )	2025-08-12 20:08:40 -07:00
li chaoran	62f99e08b3	fix: wrong docker hub org name (#9137 ) Signed-off-by: mywaaagh_admin <pkwarcraft@gmail.com>	2025-08-12 19:26:19 -07:00
DarkSharpness	86a0be65d8	[Feature] Support custom set kv buffer kernel (#8884 )	2025-08-12 16:56:51 -07:00
huangtingwei	0edda32001	Support page first layout zero copy for mooncake store (#8651 ) Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>	2025-08-12 15:59:26 -07:00
Yineng Zhang	924827c3de	chore: use cp310 (#9130 )	2025-08-12 15:33:22 -07:00
Yineng Zhang	c81daf838d	fix: update Dockerfile (#9129 )	2025-08-12 15:01:29 -07:00
jacky.cheng	25caa7a8a9	[AMD] Support Wave attention backend with AMD GPU optimizations (#8660 ) Signed-off-by: Stanley Winata <stanley.winata@amd.com> Signed-off-by: Harsh Menon <harsh@nod-labs.com> Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com> Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com> Signed-off-by: xintin <gaurav.verma@amd.com> Co-authored-by: Harsh Menon <harsh@nod-labs.com> Co-authored-by: Stanley Winata <stanley.winata@amd.com> Co-authored-by: Stanley Winata <68087699+raikonenfnu@users.noreply.github.com> Co-authored-by: Stanley Winata <stanley@nod-labs.com> Co-authored-by: Ivan Butygin <ivan.butygin@gmail.com> Co-authored-by: nithinsubbiah <nithinsubbiah@gmail.com> Co-authored-by: Nithin Meganathan <18070964+nithinsubbiah@users.noreply.github.com> Co-authored-by: Ivan Butygin <ibutygin@amd.com>	2025-08-12 13:49:11 -07:00
Hangzhi	03d114496f	Fix typos in supported models documentation (#9119 )	2025-08-12 13:35:24 -07:00
ichernob	83123f481e	[Quantization] Supported w8a8 int8 quantized Gemma3 and Qwen-VL models (#8619 ) Co-authored-by: ronnie_zheng <zl19940307@163.com>	2025-08-12 13:31:18 -07:00
ronnie_zheng	48afa8f14f	[feat] Enable Ascend profiling on SGLang (#8610 ) Co-authored-by: liyou_b <2953090824@qq.com>	2025-08-12 13:28:31 -07:00
li chaoran	2ecbd8b8bf	[feat] add ascend readme and docker release (#8700 ) Signed-off-by: mywaaagh_admin <pkwarcraft@gmail.com> Signed-off-by: lichaoran <pkwarcraft@gmail.com> Co-authored-by: Even Zhou <even.y.zhou@outlook.com> Co-authored-by: ronnie_zheng <zl19940307@163.com>	2025-08-12 13:25:42 -07:00
Yineng Zhang	305b27c124	fix: update Dockerfile (#9125 )	2025-08-12 13:23:10 -07:00
Simo Lin	1ce30dd13e	[router] update router documentation (#9121 )	2025-08-12 13:16:34 -07:00
Jiaqi Gu	c9ee738515	Fuse writing KV buffer into rope kernel (part 2: srt) (#9014 ) Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>	2025-08-12 13:15:30 -07:00
ishandhanani	1f9ec65374	fix(docker): update sgl_kernel version to 0.3.4 in Dockerfile.gb200 (#9118 )	2025-08-12 13:12:33 -07:00
Chang Su	ad359d1c71	router: Fix user guide link README.md (#9122 )	2025-08-12 12:29:10 -07:00
Cheng Wan	5f5b3b2449	[5/n] DP Enhancement: Correct `num_token_non_padded` (#9107 )	2025-08-12 12:23:46 -07:00
Shangming Cai	4caca4f6b4	Fix typo in REVIEWERS (#9113 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-08-12 11:55:49 -07:00
Chang Su	f2a5de284b	[Bugfix] Fix accuracy-test-1-gpu failure caused by `builtin_tools` (#9114 )	2025-08-12 09:56:13 -07:00
Liangsheng Yin	445f9dca6e	Runtime check CUDA driver version to avoid unresolved green context symbols (#9021 )	2025-08-12 09:26:10 -07:00
Yineng Zhang	3a9afe2a42	chore: bump sgl-kernel v0.3.4 (#9103 )	2025-08-12 01:48:47 -07:00
fzyzcjy	9aea255522	Fuse writing KV buffer into rope kernel (part 1: sgl-kernel) (#9077 )	2025-08-12 01:46:40 -07:00
Yichao Cheng	fcc11e5ed5	update support new models doc (#9096 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-08-12 01:21:02 -07:00
fzyzcjy	5190ba7f42	Fuse two kernels of hidden states padding into quantization kernel (#9005 ) Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>	2025-08-12 01:20:13 -07:00
Hsiang-Yu Tsou	5438886c87	docs: fix broken links in README.md (#9075 )	2025-08-12 00:03:35 -07:00
Chang Su	9c83d74da3	bugfix: Fix the commentary msg extraction in GptOssDetector (#9097 )	2025-08-11 23:53:10 -07:00
DarkSharpness	b4ac2b9c0c	[Fix] Fix dual chunk model default behavior (#9032 )	2025-08-11 23:50:23 -07:00
Jianwei Dong	83262dcb29	Fix mismatch between padded_scales shape and reshape dimensions in modelopt quantization (#8766 )	2025-08-11 23:44:40 -07:00
zixuanzhang226	c46c75f8c0	feat: add fused moe config for Qwen3-30B-A3B on B200 (#9087 )	2025-08-11 23:25:36 -07:00
Makcum888e	2aaf22c46c	Optimization for AscendPagedTokenToKVPoolAllocator (#8293 ) Co-authored-by: ronnie_zheng <zl19940307@163.com> Co-authored-by: VDV1985 <vladdv85@mail.ru>	2025-08-11 23:06:39 -07:00
Lifu Huang	29a610b4d9	Fix broken CI TestRequestLengthValidation (#9095 )	2025-08-11 22:59:56 -07:00
Lifu Huang	5ded39cab2	Fix race condition in async lora unload (#9084 )	2025-08-11 22:59:29 -07:00
Keyang Ru	4093d460ce	[CI] migrate router to BM.A10.4 runner (#8992 ) Co-authored-by: key4ng <rukeyang@gamil.com>	2025-08-11 22:41:18 -07:00
Simo Lin	9d68bdb240	[router] Add Rust Binary Entrypoint for SGLang Router (#9089 )	2025-08-11 21:37:36 -07:00
Chang Su	a218490136	(gpt-oss, oai, chat): Remove Harmony Integration and Implement Native GPT-OSS Tool Call Support (#9043 )	2025-08-11 18:59:18 -07:00
Zhiqiang Xie	0eec4cb6cc	HiCache, add bench long context plus minor fixs (#9086 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-11 16:54:52 -07:00
Faradawn Yang	ff1f68252c	[fix] Set Radix tree root node hash to None - Nvidia Dynamo Integration (#9030 ) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-11 14:20:39 -07:00
Zhiqiang Xie	9f78f391ae	HiCache Storage: generate hash when inserting new nodes (#9053 )	2025-08-11 14:18:59 -07:00
Faraz	f508cd3cb7	TRTLLM-MLA FP8 path (#8638 ) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-08-11 14:02:13 -07:00
Xiaoyu Zhang	44e86480e8	fuse allreduce and residual_rmsnorm (#8731 )	2025-08-11 13:50:53 -07:00
Lianmin Zheng	8c07fabda7	Update hyperparameter_tuning.md (#9083 )	2025-08-11 13:44:11 -07:00
SijiaYang	90f44b74e6	fix: w4afp8 accuracy problem and rebase (#8752 ) Signed-off-by: yangsijia.614 <yangsijia.614@bytedance.com> Co-authored-by: Jinwu <ayrnb@users.noreply.github.com>	2025-08-11 13:41:19 -07:00
Simo Lin	38907fe639	refactor(pd-router): extract common patterns to reduce code duplication (#9081 )	2025-08-11 13:32:31 -07:00

4977 Commits