sglang-bot
|
1053e1be17
|
chore: bump SGLang version to 0.5.4 (#12027)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-10-23 18:01:40 -07:00 |
|
Teng Ma
|
96a5e4dd79
|
[Feature] Support loading weights from ckpt engine worker (#11755)
Signed-off-by: Yang Kaiyong <yangkaiyong.yky@antgroup.com>
Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>
Co-authored-by: Yang Kaiyong <yangkaiyong.yky@antgroup.com>
Co-authored-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Co-authored-by: Xuchun Shang <xuchun.shang@gmail.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-23 09:23:30 -07:00 |
|
Zaili Wang
|
007b849b0e
|
[CPU] misc updates (#11906)
|
2025-10-22 21:10:05 -07:00 |
|
Baizhou Zhang
|
983ef22cf3
|
[Doc] Update deterministic inference flag in server_arguments.md (#11978)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-22 14:12:15 -07:00 |
|
Minglei Zhu
|
200a3c0bb1
|
[Documentation] add doc for deterministic inference (#11956)
|
2025-10-22 12:36:15 -05:00 |
|
Zhiyu
|
80b2b3207a
|
Enable native ModelOpt quantization support (3/3) (#10154)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
|
2025-10-21 21:44:29 -07:00 |
|
Baizhou Zhang
|
ef4a8097b8
|
Rename flashmla kernel options of nsa backend for better readability (#11876)
|
2025-10-21 13:14:16 -07:00 |
|
ybyang
|
dbb16bedd5
|
Support Thinking Budget (via custom_logit_processor for OpenAI API) [Fix #6572] (#11416)
Signed-off-by: ybyang <ybyang7@iflytek.com>
Co-authored-by: YorkSu <york_su@qq.com>
|
2025-10-21 16:27:56 +08:00 |
|
Neelabh Sinha
|
852c0578fd
|
[FEATURE] Add OpenAI-Compatible LoRA Adapter Selection (#11570)
|
2025-10-21 15:44:33 +08:00 |
|
Meng, Hengyu
|
b113c72e7a
|
Init attention backend for Intel XPU (#10656)
Co-authored-by: guangyey <guangye.yu@intel.com>
Co-authored-by: DiweiSun <105627594+DiweiSun@users.noreply.github.com>
|
2025-10-21 11:41:28 +08:00 |
|
DarkSharpness
|
276e7b3e4e
|
[Feature] New structural tag support (#10691)
|
2025-10-20 18:25:58 +08:00 |
|
Sai Enduri
|
e53bf44243
|
Update amd gpu install docs. (#11849)
|
2025-10-20 00:03:26 -07:00 |
|
Shane A
|
d383e6616e
|
[Model] Add Olmo 3 model support (#11396)
|
2025-10-19 23:59:16 -07:00 |
|
Shangming Cai
|
a2ba0bc3df
|
Tiny clean up for PD module and doc (#11747)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-20 11:52:42 +08:00 |
|
Baizhou Zhang
|
44f0ece9fc
|
[Doc] Update documents for FA4 (#11778)
|
2025-10-19 17:40:38 -07:00 |
|
ybyang
|
b5e14b2b78
|
[1/2][feature] support openai like classification api (#11618)
|
2025-10-18 19:32:48 -07:00 |
|
b8zhong
|
f9a7d9b3dc
|
support server arg override KV cache to bf16 to avoid slow cases (#11749)
|
2025-10-19 02:49:48 +08:00 |
|
Lianmin Zheng
|
67e34c56d7
|
Fix install instructions and pyproject.tomls (#11781)
|
2025-10-18 01:08:01 -07:00 |
|
Qiaolin Yu
|
547003bdd0
|
fix command line usage of profiling (#11793)
|
2025-10-18 12:54:36 +08:00 |
|
Lianmin Zheng
|
9eefe2c0b7
|
Set CUDA_VISIBLE_DEVICES to achieve one GPU per process (#9170)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Cheng Wan <cwan@x.ai>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
|
2025-10-17 17:30:06 -07:00 |
|
Lianmin Zheng
|
b9a54e0968
|
[minor] sync code on python/sglang/test/test_deterministic.py and improve ci tests (#11777)
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
|
2025-10-17 14:25:22 -07:00 |
|
Keyang Ru
|
2bc3fcd420
|
[doc] update router document (#11767)
|
2025-10-17 10:26:54 -07:00 |
|
sglang-bot
|
85ebeecf06
|
chore: bump SGLang version to 0.5.3.post3 (#11693)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-10-16 13:14:55 -07:00 |
|
Lianmin Zheng
|
cd7e1bd591
|
Sync code and test CI; rename some env vars (#11686)
|
2025-10-15 18:37:03 -07:00 |
|
sglang-bot
|
baf277a9bf
|
chore: bump SGLang version to 0.5.3.post2 (#11680)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-10-15 16:49:14 -07:00 |
|
Fan Yin
|
5464457251
|
[sgl-kernel] Optimize gguf test (#11667)
|
2025-10-15 15:45:53 -07:00 |
|
Yineng Zhang
|
ab9187a20b
|
docs: update sglang installation guide (#11659)
|
2025-10-15 00:35:48 -07:00 |
|
b8zhong
|
6bc503af73
|
[Doc] Update support matrix for attn and hybrid attn (#11293)
|
2025-10-14 22:43:11 -07:00 |
|
Xun Sun
|
a40229f6f8
|
[1/N] Introduce Mooncake Backend and Mooncake EP to Support Elastic EP (#10423)
Co-authored-by: Hank Han <hanhan7630@outlook.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-14 19:40:54 -07:00 |
|
Simo Lin
|
e0c2af2ac2
|
[router] update router doc to latest features (#11639)
|
2025-10-14 18:32:30 -07:00 |
|
Lianmin Zheng
|
d314bf6010
|
Update install.md (#11631)
|
2025-10-14 14:34:46 -07:00 |
|
Wenyi Xu
|
642fa966f2
|
[Docs] [Router]: Update sg-router doc on circuit breaker (#11449)
|
2025-10-14 02:18:14 -07:00 |
|
Chenxi Li
|
28f80b1244
|
Implement LRU eviction policy for LoRA adapters (#11041)
|
2025-10-13 20:18:25 -07:00 |
|
Xiaoyu Zhang
|
88a6f9dab5
|
bench_serving support PD Disaggregation (#11542)
|
2025-10-13 19:43:26 -07:00 |
|
Neelabh Sinha
|
aaf7af1b17
|
[FEATURE] Add Profile Trace Merger for Distributed Traces (#11413)
|
2025-10-14 09:20:17 +08:00 |
|
Liangsheng Yin
|
acc2327bbd
|
Move deep gemm related arguments to sglang.srt.environ (#11547)
|
2025-10-14 00:34:35 +08:00 |
|
hzh0425
|
318424e2c8
|
[HICache]: Support 3FS-Store with page_first_direct layout (#11460)
|
2025-10-13 15:47:22 +08:00 |
|
Jonah Bernard
|
8e776c78a1
|
docs(router): add token-bucket rate limiting to the docs (#11485)
|
2025-10-12 20:03:27 -07:00 |
|
Lianmin Zheng
|
2ac46e94ef
|
Sync changes on io_struct.py and deterministic ops (#11498)
|
2025-10-12 16:03:10 -07:00 |
|
Glen Liu
|
47c606d3dc
|
[Feature] support regex strings as a stopping condition (#10635)
|
2025-10-12 10:53:15 +08:00 |
|
ykcombat
|
f5754d1256
|
[Documentation][Configuration] Server args and documentation of PD-Multiplexing. (#11427)
|
2025-10-11 21:36:07 +08:00 |
|
Zaili Wang
|
f19613e6c3
|
Dedicated toml files for CPU/XPU (#10734)
|
2025-10-10 00:44:55 -07:00 |
|
sglang-bot
|
758b887ad1
|
chore: bump SGLang version to 0.5.3.post1 (#11324)
|
2025-10-09 15:19:59 -07:00 |
|
Netanel Haber
|
d6837aea4d
|
model: Support Hybrid Mamba2 NemotronHForCausalLM (nvidia/NVIDIA-Nemotron-Nano-9B-v2) (#10909)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
|
2025-10-09 00:37:38 +08:00 |
|
Kevin Xiang Li
|
e3bb7f5ae6
|
benchmark: enhance configurable multimodal benchmarking in bench_serving (#9812)
Co-authored-by: Xiang (Kevin) Li <lik@nvidia.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2025-10-08 01:31:36 -07:00 |
|
Shangming Cai
|
0a7c4bded7
|
[Doc] Update mooncake nvlink transport doc for PD disaggregation (#11321)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-08 00:59:29 -07:00 |
|
Cheng Wan
|
3c06b673af
|
[8/N] MoE Refactor: deprecate EPMoE (#11211)
|
2025-10-07 21:51:41 -07:00 |
|
Adarsh Shirawalmath
|
7c3f07dbcb
|
[Feature] Add /tokenize and /detokenize OpenAI compatible endpoints (#9545)
|
2025-10-08 12:38:48 +08:00 |
|
Xinyuan Tong
|
c4d77774e1
|
update sampling_params documentation with defaults (#11315)
|
2025-10-07 18:36:26 -07:00 |
|
Xinyuan Tong
|
e3c7f09146
|
Update tool parser and related documentation (#11223)
|
2025-10-07 11:03:40 -07:00 |
|