sglang

Author	SHA1	Message	Date
narutolhy	d17986f8c6	Enable optional FP32 compute for LM Head (#10729 ) Thanks to MiniMax Team and Chenyang Zhao's support.	2025-09-29 20:45:17 -07:00
Lianmin Zheng	dda34c2f93	Fix mem fraction static for nightly tests (#11076 )	2025-09-29 12:57:41 -07:00
Lianmin Zheng	a17e70f5cc	Use more general heuristics to set the default value of --mem-fraction-static (#10975 ) Co-authored-by: sglang-bot <sglangbot@gmail.com>	2025-09-29 10:11:03 -07:00
Zhihao Zhang	24f7cb1ece	[speculative decoding] rename lookahead to ngram (#11010 ) Co-authored-by: a4zhangfei <a4zhangfei@qq.com>	2025-09-28 21:06:59 -07:00
huangtingwei	e05555fad8	[HiCacheStorage] mooncake store support page_first_direct layout (#10591 )	2025-09-28 20:45:48 -07:00
Mick	2e7633982c	fix: show failed models in nightly ci (#10986 )	2025-09-28 12:38:29 -07:00
Tejesh Anand	8cc27fdc46	Use jsonschema to constrain required or specific tool choice (#10550 )	2025-09-27 13:18:50 -04:00
Mick	777eb53897	ci: refactor nightly test (#10495 )	2025-09-26 15:24:30 -07:00
Mick	fff7fbabe6	ci: fix rate-limit of huggingface with hf auth login (#10947 )	2025-09-26 11:02:44 -07:00
hzh0425	7ec5b4e89c	[PD-HiCache]: Support Async Offloading KVCache In Decode Side (#10192 ) Signed-off-by: Shangming Cai <csmthu@gmail.com> Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-09-25 23:20:49 -07:00
eraser00	0ac6114694	Replace the Kimi-K2 generated tool call idx with history tool call count (#10612 ) Co-authored-by: eraser00 <eraser00@github.com>	2025-09-25 18:47:40 -07:00
Lianmin Zheng	f68dd998b9	Rename customer label -> custom label (#10899 ) Co-authored-by: Yingchun Lai <laiyingchun@apache.org> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-25 16:19:53 -07:00
Lianmin Zheng	35ec2a45a8	[minor] Remove deprecated function `get_ip` (#10883 )	2025-09-25 16:18:04 -07:00
kushanam	d7b20dd65d	chore: Initial support for input config files (#10534 ) Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-09-24 14:45:52 -07:00
Xinyuan Tong	71f24ef8f6	feat: add cache_salt support to request (#10718 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-09-23 23:30:25 -07:00
Lianmin Zheng	b1f0fc1c0b	Add CI timeout guidelines (#10829 )	2025-09-23 22:08:02 -07:00
Shangming Cai	23632d350c	Fix latest main ci (#10799 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-09-23 12:46:13 -07:00
Shangming Cai	d21c35224d	Fix hicache mooncake backend CI (#10792 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-09-23 02:04:44 -07:00
Even Zhou	d27a6f7092	[Feature] Add MLAProcess for DeepSeek MLA on NPU (#10130 )	2025-09-22 17:17:48 -07:00
Vedant Jhaveri	2f555c4cee	[Generative Score API] Added test_scores_api.py to github CICD to run per commit (#10755 ) Co-authored-by: Vedant Jhaveri <vjhaveri@linkedin.com> Co-authored-by: Sundara Raman Ramachandran <sundar24295@gmail.com>	2025-09-22 14:41:57 -07:00
Lifu Huang	2101d93b4f	Fix CI TestChunkedSGMV (#10737 )	2025-09-22 16:09:58 +08:00
Shangming Cai	70e4b21853	Fix flaky logprobs test (#10728 ) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-09-22 00:46:26 -07:00
Yineng Zhang	2f18602f13	fix: disable gpt-oss b200 ut (#10716 )	2025-09-21 17:02:25 -07:00
Xinyuan Tong	12d6cf18f0	Refactors radix cache for extra key support (#10317 ) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>	2025-09-22 02:16:16 +08:00
Lifu Huang	08ecd0aa2a	[3/4] Speed up CSGMV backend perf by 10% through dynamic chunking + kernel optimization (#10592 )	2025-09-20 22:47:48 -07:00
Yineng Zhang	ba94b82986	fix: update run_suite (#10685 )	2025-09-20 01:22:06 -07:00
huangtingwei	7f399e4bce	[HiCacheStorage]support page_first_direct layout for generic set&get (#10522 )	2025-09-19 05:47:16 -07:00
Zhihao Zhang	e7bc600304	[Feature] Speculative decoding support lookahead (#9873 ) Co-authored-by: a4zhangfei <a4zhangfei@qq.com> Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>	2025-09-18 16:42:41 -07:00
yuk.igalaxy	9a5c42f9ad	feat: Add FlexAttention Backend for Efficient Sparse Attention (#9947 ) Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-09-18 11:49:17 -07:00
penguin_wwy	93f75778be	[RL] Add destroy process group api (#9979 )	2025-09-19 00:31:56 +08:00
Yineng Zhang	564050766d	fix: update dsv3 fp4 ut (#10584 )	2025-09-17 14:34:58 -07:00
Teng Ma	77098aea7b	[HiCache] Add tests for hicache storage mooncake backend (#10171 ) Signed-off-by: Shangming Cai <csmthu@gmail.com> Co-authored-by: hzh0425 <hzh0425@apache.org> Co-authored-by: Shangming Cai <csmthu@gmail.com>	2025-09-18 01:07:16 +08:00
harrisonlimh	14fdd52740	feat: add priority based scheduling with priority based request acceptance and preemption (#8746 )	2025-09-16 17:10:10 -07:00
Night	f1c692f6f8	Add Logprobs unit test with a loose threshold (#10230 ) Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Ryan <ryan@ryanmini.mynetworksettings.com>	2025-09-16 13:04:40 +08:00
Lifu Huang	3f41b48c40	[2/2] Introduce Chunked-SGMV kernels and corresponding LoRA backend for improved performance (#10286 )	2025-09-15 16:04:03 -07:00
fzyzcjy	3b25dc127a	[1/2] Speed up trtllm_mla attention backend (>10% e2e) (#10473 )	2025-09-15 11:53:21 -07:00
Praneth Paruchuri	a45d9a4ee8	model: support solar (#8189 )	2025-09-16 02:21:13 +08:00
Lianmin Zheng	50dc0c1e9c	Run tests based on labels (#10456 )	2025-09-15 00:29:20 -07:00
Jintao Zhang	f9ee6ae17a	[router]: Add Embedding routing logic (#10129 ) Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com> Co-authored-by: Waël Boukhobza <wawa_wael@live.fr>	2025-09-14 18:44:35 -07:00
Yineng Zhang	dcee42c200	feat: add dsv3 fp4 cutlass moe etp ut (#10433 )	2025-09-14 18:44:09 -07:00
Cheng Wan	2f8ba6fe82	[Fix] MoE: fix w8a8_fp8 MoE and add tests to cover this code path (#10429 )	2025-09-14 17:34:28 -07:00
Feng Su	4c21b09074	[Feature] Sglang Tracing: Fine-Grained Tracking for Request Latency - Part 1 (#9962 ) Signed-off-by: Feng Su <sufeng@linux.alibaba.com> Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com> Signed-off-by: Peng Wang <rocking@linux.alibaba.com>	2025-09-15 02:08:02 +08:00
Sundara Raman Ramachandran	94d0f656fb	[Performance] Dynamic Batch Tokenizer (#9382 )	2025-09-14 01:56:04 +08:00
Yineng Zhang	9d775b1a2d	feat: add deepseek v3 fp4 ut (#10391 )	2025-09-12 15:43:29 -07:00
Yi Zhang	fe6cdf8972	add qwen3-next ut (#10355 )	2025-09-12 18:06:48 +08:00
amysaq2023	30d20ce84f	Support loading weights from remote instance (#8215 ) Signed-off-by: Anqi Shen <amy.saq@antgroup.com> Co-authored-by: Chayenne <74843776+zhaochenyang20@users.noreply.github.com>	2025-09-12 17:40:22 +08:00
EduardDurech	46d8fb1c98	model: support Apertus (#9774 )	2025-09-11 20:49:10 -07:00
Shu Wang	3df05f4d6a	[NVIDIA] [3/N] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#9199 )	2025-09-11 20:18:43 -07:00
Minglei Zhu	46ccbed2cd	update GLM nightly test threshold (#10331 )	2025-09-11 14:54:58 -07:00
Zaili Wang	ef959d7b85	[CPU] fix OOM when mem-fraction is not set (#9090 )	2025-09-10 23:52:22 -07:00

993 Commits