sglang

Author	SHA1	Message	Date
Liangsheng Yin	d658f0497e	[overlap-spec] fix stop condition and trimming (#11819)	2025-10-19 22:00:20 +08:00
Liangsheng Yin	57e25de756	Revert "Fix: Dynamic RoPE Cache Expansion to Prevent Position-ID Out-of-Bounds in EAGLE + Long-Sequence Workloads" (#11827)	2025-10-19 19:44:06 +08:00
fzyzcjy	12eb02e982	Change bf16 to fp8 for some gemms in attention for DeepSeek ckpt v2 (#11805)	2025-10-19 16:15:13 +08:00
fzyzcjy	002d037359	Avoid generation gets hanging when user specifies multiple event loops (#5162) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-10-19 16:12:49 +08:00
fzyzcjy	a27825ae01	Support not officially supported high sgl-kernel version with low srt version (#11786)	2025-10-19 16:11:59 +08:00
fzyzcjy	ce399e154c	Make single-batch overlap compatible with NextN (#11804)	2025-10-19 16:10:44 +08:00
fzyzcjy	ea6275dfbc	Tiny add hints when users send requests to wrong place (#11808)	2025-10-19 16:10:20 +08:00
narutolhy	eb7318f1c2	support tokenized batch request (#11091)	2025-10-19 07:05:02 +00:00
Lianmin Zheng	6058fb520c	Update CODEOWNERS for layer quantization path (#11818)	2025-10-18 21:17:17 -07:00
YAMY	80407b0493	Fix: Dynamic RoPE Cache Expansion to Prevent Position-ID Out-of-Bounds in EAGLE + Long-Sequence Workloads (#10788)	2025-10-19 11:37:43 +08:00
Liangsheng Yin	b288f4f440	Improve `send_sone` script (#11817)	2025-10-19 11:28:16 +08:00
tazjin	6d6ea5af0c	fix: do not wrap invalid grammar objects during constrained generation (#11328)	2025-10-19 10:54:33 +08:00
Marin	1dacedd2db	make sure logit bias is applied during eagle spec decoding verification (#11555)	2025-10-19 10:53:33 +08:00
ybyang	b5e14b2b78	[1/2][feature] support openai like classification api (#11618)	2025-10-18 19:32:48 -07:00
ybyang	d513ee93ef	[2/2] [feature] support openai like classification api in router (#11670)	2025-10-18 19:31:08 -07:00
Simo Lin	a7ae61ed77	[router] Add Configurable L0 and L1 Tokenizer Caching (#11688)	2025-10-18 18:33:53 -07:00
kyleliang-nv	fda0cb2a30	Fix Dockerfile not installing correct version of DeepEP for arm build (#11773)	2025-10-18 15:06:05 -07:00
Qiaolin Yu	ebda73dc72	Use cutlass fp4 gemm by default (#11813)	2025-10-18 14:10:15 -07:00
b8zhong	f4f8a1b4d8	ci: update `lmms-eval` to speed up multimodal CI (#11000)	2025-10-19 02:51:19 +08:00
Kindyaa	c44e985dc2	feat(example/fastapi): support --startup-timeout using Qwen3-Next-80B-A3B-Instruct as example (#11710) Co-authored-by: chenan01 <chenan01@cheche-MacBook-Pro.local> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-19 02:50:34 +08:00
b8zhong	f9a7d9b3dc	support server arg override KV cache to bf16 to avoid slow cases (#11749)	2025-10-19 02:49:48 +08:00
Liangsheng Yin	a93f10a722	[overlap-spec] support page size > 1 (#11772)	2025-10-19 02:09:13 +08:00
Teng Ma	585e1223f0	[HiCache] feat: add more eviction policy (#11506)	2025-10-18 15:49:45 +00:00
fzyzcjy	a7043c6f0d	Bump torch_memory_saver to avoid installing pre-release versions (#11797)	2025-10-18 01:20:42 -07:00
Lianmin Zheng	67e34c56d7	Fix install instructions and pyproject.tomls (#11781)	2025-10-18 01:08:01 -07:00
Yuwei An	1d726528f7	Eager Compiler for Torch Compile (#11803) Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>	2025-10-18 15:18:52 +08:00
Minglei Zhu	f4488e9dd9	set default attention backend for deterministic inference (#11801)	2025-10-18 00:01:24 -07:00
Zilin Zhu	e68a2b5b2f	[RL] use cpu group to prepare_mlp_sync_batch_raw when the server is offloaded (#10152)	2025-10-18 14:29:35 +08:00
Zilin Zhu	31b9f19e54	[RL] support weight update with DP attention (#11669)	2025-10-18 14:26:19 +08:00
Qiaolin Yu	547003bdd0	fix command line usage of profiling (#11793)	2025-10-18 12:54:36 +08:00
Jimmy	f7ab955455	fix(glm45): disable reduce scatter (#11665) Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-10-18 12:19:20 +08:00
fzyzcjy	dbbd4e1891	Try add back no-commit-to-branch (#11799)	2025-10-18 12:05:12 +08:00
Chang Su	ca240eefb4	[router][grpc] Support parallel queue puts in grpc_request_manager and remove mutex for grpc_client (#11798)	2025-10-17 20:49:43 -07:00
fzyzcjy	6c7c92eb02	Enable lint on main (#11794)	2025-10-17 19:08:50 -07:00
Cheng Wan	5b214b50b6	[Refactor] move `deep_gemm_wrapper` out of `quantization` (#11784)	2025-10-17 18:57:54 -07:00
Minglei Zhu	13219e1e48	completely remove mixed mode deterministic test as prefix mode could cover it (#11783) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-10-17 17:46:03 -07:00
fzyzcjy	33e9bbec35	Make single-batch overlap compatible with offloading (#11614)	2025-10-18 08:45:54 +08:00
fzyzcjy	dcb8f090ad	Super tiny fix CI (#11788)	2025-10-17 17:41:58 -07:00
Lianmin Zheng	9eefe2c0b7	Set CUDA_VISIBLE_DEVICES to achieve one GPU per process (#9170) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Cheng Wan <cwan@x.ai> Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>	2025-10-17 17:30:06 -07:00
Zilin Zhu	69fe3c9726	Manually flip deepep_mode for cuda_graph (#11666)	2025-10-18 08:05:48 +08:00
fzyzcjy	8af8491298	Support casting bf16 NextN moe to fp8 (#11613)	2025-10-18 08:02:15 +08:00
fzyzcjy	505329cab0	Support shared experts overlap in cutlass moe (#11611)	2025-10-18 07:59:40 +08:00
fzyzcjy	8a382fd399	Super tiny fix missing input throughput (#11607)	2025-10-18 07:58:48 +08:00
Chang Su	627974405d	[Lint] Add `python/sglang` to ruff F401 checks and remove unused imports in files (#11685)	2025-10-17 16:49:46 -07:00
Antonin Vidon	2614adf9ca	[Fix] Skip visual layers when applying LoRA to Qwen2VL modules (#11519)	2025-10-17 17:39:57 -05:00
Lianmin Zheng	fdd7c69d65	[Auto Sync] Update common.py (20251017) (#11782) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>	2025-10-17 15:03:42 -07:00
Lianmin Zheng	b9a54e0968	[minor] sync code on python/sglang/test/test_deterministic.py and improve ci tests (#11777) Co-authored-by: Stefan He <hebiaobuaa@gmail.com> Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>	2025-10-17 14:25:22 -07:00
Baizhou Zhang	20b8d2306c	Cleaning indexer for DeepSeek V3.2 (#11682)	2025-10-17 13:47:21 -07:00
Chang Su	d1984e218c	[router][grpc] Remove timeout for connections and remove `max_tokens` deprecation warning log (#11775)	2025-10-17 12:36:36 -07:00
Yineng Zhang	b79f75fd53	[Auto Sync] Update scheduler.py (20251017) (#11738)	2025-10-17 12:36:07 -07:00

6055 Commits