sglang-bot
|
758b887ad1
|
chore: bump SGLang version to 0.5.3.post1 (#11324)
|
2025-10-09 15:19:59 -07:00 |
|
Netanel Haber
|
d6837aea4d
|
model: Support Hybrid Mamba2 NemotronHForCausalLM (nvidia/NVIDIA-Nemotron-Nano-9B-v2) (#10909)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
|
2025-10-09 00:37:38 +08:00 |
|
Kevin Xiang Li
|
e3bb7f5ae6
|
benchmark: enhance configurable multimodal benchmarking in bench_serving (#9812)
Co-authored-by: Xiang (Kevin) Li <lik@nvidia.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2025-10-08 01:31:36 -07:00 |
|
Shangming Cai
|
0a7c4bded7
|
[Doc] Update mooncake nvlink transport doc for PD disaggregation (#11321)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-08 00:59:29 -07:00 |
|
Cheng Wan
|
3c06b673af
|
[8/N] MoE Refactor: deprecate EPMoE (#11211)
|
2025-10-07 21:51:41 -07:00 |
|
Adarsh Shirawalmath
|
7c3f07dbcb
|
[Feature] Add /tokenize and /detokenize OpenAI compatible endpoints (#9545)
|
2025-10-08 12:38:48 +08:00 |
|
Xinyuan Tong
|
c4d77774e1
|
update sampling_params documentation with defaults (#11315)
|
2025-10-07 18:36:26 -07:00 |
|
Xinyuan Tong
|
e3c7f09146
|
Update tool parser and related documentation (#11223)
|
2025-10-07 11:03:40 -07:00 |
|
hzh0425
|
df08bf9b9f
|
[Doc]: Best Practice for HICache (#11001)
Co-authored-by: 晟海 <huangtingwei.htw@antgroup.com>
Co-authored-by: huangtingwei <141888744+huangtingwei9988@users.noreply.github.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2025-10-08 00:59:21 +08:00 |
|
ykwd
|
69efdd27bc
|
[Doc] HiCache Design Documents (#11027)
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2025-10-08 00:35:45 +08:00 |
|
Wenyi Xu
|
0958a39704
|
[Docs] [Router] Update Observability and Common Issues Section (#11302)
|
2025-10-07 08:03:09 -07:00 |
|
Lianmin Zheng
|
708f4ff490
|
Rename max_micro_batch_size -> pp_max_micro_batch_size (#11279)
|
2025-10-06 15:50:56 -07:00 |
|
sglang-bot
|
a4a3d82393
|
chore: bump SGLang version to 0.5.3 (#11263)
|
2025-10-06 20:07:02 +08:00 |
|
sglang-bot
|
0b13cbb7c9
|
chore: bump SGLang version to 0.5.3rc2 (#11259)
Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
|
2025-10-06 01:12:10 -07:00 |
|
Lianmin Zheng
|
d645ae90a3
|
Rename runner labels (#11228)
|
2025-10-05 18:05:41 -07:00 |
|
Praneth Paruchuri
|
fad7ca73f8
|
model: support starcoder2 (#10609)
|
2025-10-04 00:11:19 +08:00 |
|
Matt Nappo
|
8c57490210
|
[Feature] Option to save model weights to CPU when memory saver mode is enabled (#10873)
Co-authored-by: molocule <34072934+molocule@users.noreply.github.com>
|
2025-10-03 16:48:19 +08:00 |
|
fzyzcjy
|
5e786cca3a
|
Support single batch overlap (#10422)
|
2025-10-02 18:04:36 +08:00 |
|
Xinyuan Tong
|
a9ce2bcb3c
|
[Doc] Update multimodal language models documentation (#11111)
Co-authored-by: Mick <mickjagger19@icloud.com>
|
2025-09-30 22:10:31 -07:00 |
|
narutolhy
|
d17986f8c6
|
Enable optional FP32 compute for LM Head (#10729)
Thanks to MiniMax Team and Chenyang Zhao's support.
|
2025-09-29 20:45:17 -07:00 |
|
Lianmin Zheng
|
dda34c2f93
|
Fix mem fraction static for nightly tests (#11076)
|
2025-09-29 12:57:41 -07:00 |
|
Lianmin Zheng
|
f68dd998b9
|
Rename customer label -> custom label (#10899)
Co-authored-by: Yingchun Lai <laiyingchun@apache.org>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-25 16:19:53 -07:00 |
|
kushanam
|
d7b20dd65d
|
chore: Initial support for input config files (#10534)
Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
|
2025-09-24 14:45:52 -07:00 |
|
Lianmin Zheng
|
b1f0fc1c0b
|
Add CI timeout guidelines (#10829)
|
2025-09-23 22:08:02 -07:00 |
|
Even Zhou
|
d27a6f7092
|
[Feature] Add MLAProcess for DeepSeek MLA on NPU (#10130)
|
2025-09-22 17:17:48 -07:00 |
|
Adarsh Shirawalmath
|
592caab66a
|
[Docs, minor] Fix LLM doc matrix (#10753)
|
2025-09-23 01:29:55 +08:00 |
|
Lifu Huang
|
08ecd0aa2a
|
[3/4] Speed up CSGMV backend perf by 10% through dynamic chunking + kernel optimization (#10592)
|
2025-09-20 22:47:48 -07:00 |
|
Zaili Wang
|
6fd4816d9f
|
Fix sgl_kernel import failure on devices other than CUDA (#10610)
|
2025-09-18 11:38:02 -07:00 |
|
Philip Kiely - Baseten
|
7f028b07c4
|
Fix formatting in long code blocks (#10528)
|
2025-09-16 12:02:05 -07:00 |
|
Zaili Wang
|
925dbb3218
|
[CPU] fix CPU backend sel. issue for Llama4 (#10511)
|
2025-09-16 02:57:45 -07:00 |
|
Lifu Huang
|
3f41b48c40
|
[2/2] Introduce Chunked-SGMV kernels and corresponding LoRA backend for improved performance (#10286)
|
2025-09-15 16:04:03 -07:00 |
|
Praneth Paruchuri
|
a45d9a4ee8
|
model: support solar (#8189)
|
2025-09-16 02:21:13 +08:00 |
|
Yineng Zhang
|
86a32bb5cd
|
chore: bump v0.5.3rc0 (#10468)
|
2025-09-15 03:55:18 -07:00 |
|
Lianmin Zheng
|
50dc0c1e9c
|
Run tests based on labels (#10456)
|
2025-09-15 00:29:20 -07:00 |
|
Vincent Zhong
|
0b14159fc4
|
Add reasoning examples for GPT-OSS in Markdown examples (#9626)
|
2025-09-15 11:27:40 +08:00 |
|
Feng Su
|
4c21b09074
|
[Feature] Sglang Tracing: Fine-Grained Tracking for Request Latency - Part 1 (#9962)
Signed-off-by: Feng Su <sufeng@linux.alibaba.com>
Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
Signed-off-by: Peng Wang <rocking@linux.alibaba.com>
|
2025-09-15 02:08:02 +08:00 |
|
Shu Wang
|
3df05f4d6a
|
[NVIDIA] [3/N] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#9199)
|
2025-09-11 20:18:43 -07:00 |
|
Zaili Wang
|
7bc5fb0d78
|
[CPU][doc] add torch.compile param in example commands (#10349)
|
2025-09-11 19:22:46 -07:00 |
|
Yineng Zhang
|
b0d25e72c4
|
chore: bump v0.5.2 (#10221)
|
2025-09-11 16:09:20 -07:00 |
|
Yi Zhang
|
760b788a58
|
add qwen3-next doc (#10327)
|
2025-09-11 14:29:11 -07:00 |
|
Zaili Wang
|
ef959d7b85
|
[CPU] fix OOM when mem-fraction is not set (#9090)
|
2025-09-10 23:52:22 -07:00 |
|
Glen Liu
|
ebd0e1c18b
|
[doc] add walkthrough for implementing and hosting a simple llama wrapper m… (#10093)
|
2025-09-10 12:05:06 +08:00 |
|
Shakhizat Nurgaliyev
|
2fe17735a6
|
Updated Nvidia Jetson docs (#4422)
|
2025-09-09 11:41:21 +08:00 |
|
geray
|
ba066ca02f
|
Update link for EAGLE speculative decoding (#10191)
|
2025-09-09 11:09:50 +08:00 |
|
Baizhou Zhang
|
8ad700f735
|
Cleaning codes for speculative attention mode (#10149)
|
2025-09-08 17:38:06 -07:00 |
|
Teng Ma
|
a02071a12c
|
[Bench] feat: mooncake trace integration (#9839)
Signed-off-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
Signed-off-by: Teng Ma <sima.mt@alibaba-inc.com>
Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com>
|
2025-09-09 02:50:54 +08:00 |
|
Yineng Zhang
|
b7d1f17b8d
|
Revert "enable auto-round quantization model (#6226)" (#10148)
|
2025-09-07 22:31:11 -07:00 |
|
Weiwei
|
c8295d2353
|
enable auto-round quantization model (#6226)
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
|
2025-09-07 22:05:35 -07:00 |
|
Cao E
|
7577f0e40f
|
Add graph runner support with torch compile on CPU (#7843)
|
2025-09-07 21:33:58 -07:00 |
|
eigen
|
b0fcbb74d0
|
[DOC]: some minor updates (#10134)
|
2025-09-07 14:58:15 -07:00 |
|