Simo Lin
|
f5d30dae89
|
[router] Refactor StopSequenceDecoder to Use Sequence for Incremental Decoding (#11676)
|
2025-10-15 16:31:03 -07:00 |
|
Chang Su
|
2479b89405
|
[router][grpc] Simplify model_id determination (#11684)
|
2025-10-15 15:56:58 -07:00 |
|
Fan Yin
|
5464457251
|
[sgl-kernel] Optimize gguf test (#11667)
|
2025-10-15 15:45:53 -07:00 |
|
Qi Yuhang
|
6c01844f45
|
[sgl-kernel][3/N]Support Expert Specialization Grouped GEMM (#11674)
|
2025-10-15 13:39:31 -07:00 |
|
Chang Su
|
f226d3da2a
|
Fix missing json imports in serving_responses.py (#11681)
|
2025-10-15 13:01:55 -07:00 |
|
Keyang Ru
|
d2478cd4ff
|
[router] Fix response api related spec (#11621)
|
2025-10-15 09:59:38 -07:00 |
|
Chang Su
|
30ea4c462b
|
[tool call] Fix prev_tool_call_arr management in base_format_detector.py (#11367)
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
|
2025-10-15 09:51:51 -07:00 |
|
Shangming Cai
|
6d0364681c
|
Fix 1-step draft model forward (#11653)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
|
2025-10-15 19:11:33 +08:00 |
|
Liangsheng Yin
|
8221f9ae8b
|
Tiny cleanup some eagle unused codes (#11660)
|
2025-10-15 17:24:08 +08:00 |
|
Yineng Zhang
|
ab9187a20b
|
docs: update sglang installation guide (#11659)
|
2025-10-15 00:35:48 -07:00 |
|
Stefan He
|
6b143d62a2
|
Clean up some Qwen3-Next and deterministic code (#11585)
|
2025-10-15 15:19:37 +08:00 |
|
b8zhong
|
6bc503af73
|
[Doc] Update support matrix for attn and hybrid attn (#11293)
|
2025-10-14 22:43:11 -07:00 |
|
Zheng Wengang
|
b2c8566920
|
[BugFix][Qwen3-VL]: fix cu_seqlens in qwen3-vl (#11458)
|
2025-10-14 22:16:49 -07:00 |
|
fzyzcjy
|
32803fb279
|
Super tiny improve FA3 import error message (#11590)
|
2025-10-14 22:06:31 -07:00 |
|
Yineng Zhang
|
91fc5bb5a9
|
feat: add add_chunked_prefix_cache_attention_backend (#11636)
|
2025-10-14 21:48:13 -07:00 |
|
Lifu Huang
|
780fbf2f38
|
[Fix] Fix accuracy bug in CSGMV kernel caching key. (#11579)
|
2025-10-14 20:25:56 -07:00 |
|
Jinwu
|
825432fce6
|
[1/N]Support DeepSeek-R1 w4a8 normal deepep (#8247)
Co-authored-by: Hank Han <hanhan7630@outlook.com>
|
2025-10-14 20:10:53 -07:00 |
|
Xun Sun
|
a40229f6f8
|
[1/N] Introduce Mooncake Backend and Mooncake EP to Support Elastic EP (#10423)
Co-authored-by: Hank Han <hanhan7630@outlook.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-14 19:40:54 -07:00 |
|
Simo Lin
|
74737b2863
|
[router] upgrade to 0.2.0 (#11642)
|
2025-10-14 22:10:30 -04:00 |
|
Simo Lin
|
40e0082d8d
|
[router] add worker self discovery for metadata (#11638)
|
2025-10-14 22:07:25 -04:00 |
|
Sahithi Chigurupati
|
e9e120ac7a
|
fix: upgrade transformers to 4.57.1 (#11628)
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
Co-authored-by: zhyncs <me@zhyncs.com>
|
2025-10-14 18:35:05 -07:00 |
|
Simo Lin
|
e0c2af2ac2
|
[router] update router doc to latest features (#11639)
|
2025-10-14 18:32:30 -07:00 |
|
cctry
|
1d7f783501
|
Refactor kv cache free (#11351)
|
2025-10-14 17:45:19 -07:00 |
|
Simo Lin
|
325951460f
|
[router][grpc] add warm up to grpc server (#11627)
Co-authored-by: Chang Su <chang.s.su@oracle.com>
|
2025-10-14 16:11:16 -07:00 |
|
Yineng Zhang
|
86373b9e48
|
fix: Update SGL_KERNEL_VERSION to 0.3.15 (#11633)
|
2025-10-14 14:45:28 -07:00 |
|
Lianmin Zheng
|
d314bf6010
|
Update install.md (#11631)
|
2025-10-14 14:34:46 -07:00 |
|
DarkSharpness
|
e28c9e526f
|
[Minor] Update xgrammar dependency (#11622)
|
2025-10-14 13:46:50 -07:00 |
|
Lianmin Zheng
|
b98cf39866
|
[Auto Sync] Update collector.py (20251014) (#11625)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Byron Hsu <byronhsu1230@gmail.com>
|
2025-10-14 13:34:33 -07:00 |
|
Lianmin Zheng
|
27d710457c
|
[Auto Sync] Update scheduler.py, server_args.py (20251014) (#11623)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Stefan He <hebiaobuaa@gmail.com>
|
2025-10-14 13:20:03 -07:00 |
|
Baizhou Zhang
|
c224a4c6cc
|
Fix log for chunked prefix cache (#11624)
|
2025-10-14 11:49:33 -07:00 |
|
Simo Lin
|
49345a68cf
|
[router] update router readme to latest features (#11619)
|
2025-10-14 11:47:38 -07:00 |
|
strgrb
|
94d26d850d
|
use non_blocking h2d in ForwardBatch.prepare_mlp_sync_batch. (#11605)
|
2025-10-14 11:30:59 -07:00 |
|
Simo Lin
|
9e8a15a74c
|
[router] add chang and keyang to sgl router author (#11620)
|
2025-10-14 11:10:49 -07:00 |
|
Simo Lin
|
3962e39d7c
|
[router] cleanup app context and move to startup (#11617)
|
2025-10-14 10:19:28 -07:00 |
|
Keyang Ru
|
eb8cac6fe2
|
[router] add py binding and readme for openai router and history backend (#11453)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-14 09:42:34 -07:00 |
|
Liangsheng Yin
|
5ea96ac7cc
|
Reduce one step decode for draft model. (#11561)
|
2025-10-14 23:52:04 +08:00 |
|
yinghui
|
56222658ec
|
move eagle draft post process to cuda graph (#11434)
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
|
2025-10-14 22:50:53 +08:00 |
|
Alex Chi Z
|
dc965db0e0
|
make radix cache deterministic (#10721)
Signed-off-by: Alex Chi Z <iskyzh@gmail.com>
|
2025-10-14 21:01:52 +08:00 |
|
Scott Lee
|
817e46f412
|
Refactor spec decoding metrics calculation into separate TokenizerManager utility function (#11586)
|
2025-10-14 20:45:49 +08:00 |
|
Liangsheng Yin
|
5a33c3aae7
|
Optimize Triton Draft Backend (#11556)
|
2025-10-14 20:08:32 +08:00 |
|
sglang-bot
|
9767a1e41b
|
Update release-docker-dev.yml (#11603)
|
2025-10-14 03:06:48 -07:00 |
|
Sai Enduri
|
1d08653972
|
[AMD CI] Add image and weights caching. (#11593)
|
2025-10-14 02:51:35 -07:00 |
|
Simo Lin
|
a04efc4933
|
[router] when given both local tokenizer and chat template, log all (#11601)
|
2025-10-14 02:22:58 -07:00 |
|
Wenyi Xu
|
642fa966f2
|
[Docs] [Router]: Update sg-router doc on circuit breaker (#11449)
|
2025-10-14 02:18:14 -07:00 |
|
Simo Lin
|
da7fac1b75
|
[router] allow router launch server to use grpc mode (#11600)
|
2025-10-14 01:42:43 -07:00 |
|
Simo Lin
|
28ad2297a0
|
[router] delete useless table content comment in spec (#11597)
|
2025-10-14 01:08:18 -07:00 |
|
Lianmin Zheng
|
f7f9f8eceb
|
Update news section in README.md (#11598)
|
2025-10-14 00:49:39 -07:00 |
|
Simo Lin
|
4b62af92ef
|
[router] change worker api to async instead of sync (#11566)
|
2025-10-14 00:32:21 -07:00 |
|
Simo Lin
|
0b9915c132
|
[router] update generate spec to align with sgl io struct (#11591)
|
2025-10-14 02:51:33 -04:00 |
|
Chang Su
|
27ef1459e6
|
[router][protocols] Add Axum validate extractor and use it for /v1/chat/completions endpoint (#11588)
|
2025-10-13 22:51:15 -07:00 |
|