Liu-congo
|
c80a96dae9
|
[BugFix] test_mla_fp8.py fails on Cublas 12.9 (#11360)
Signed-off-by: Liu-congo <1502632128@qq.com>
|
2025-10-10 21:14:24 -07:00 |
|
Stefan He
|
eae9a9fb9d
|
Fix batch invariant ops (#11368)
|
2025-10-10 20:49:08 -07:00 |
|
wxsm
|
2674c1d280
|
fix: Change dsv32 hack temporary path to use system temp directory (#11445)
|
2025-10-10 19:59:41 -07:00 |
|
Lianmin Zheng
|
61055cb309
|
Reorder PD disagg CI tests (#11438)
|
2025-10-10 17:56:49 -07:00 |
|
Chang Su
|
92777135a0
|
[router][grpc] Consolidate parser checks for chat completions (#11439)
|
2025-10-10 20:44:29 -04:00 |
|
Simo Lin
|
c495833186
|
[router] leverage RAII to actively cancel request during client disconnect (#11399)
|
2025-10-10 20:43:38 -04:00 |
|
Simo Lin
|
2eeb27515a
|
[router] disable rate limiter by default (#11435)
|
2025-10-10 20:43:07 -04:00 |
|
cctry
|
b36afed4a7
|
Separate allocation logic from scheduler (#11313)
|
2025-10-10 17:38:54 -07:00 |
|
JinYan Su
|
9aa4502d11
|
feat(mooncake): support GB suffix for global_segment_size (#10745)
Signed-off-by: Jinyang Su <751080330@qq.com>
Co-authored-by: huangtingwei <141888744+huangtingwei9988@users.noreply.github.com>
|
2025-10-10 17:38:25 -07:00 |
|
Keyang Ru
|
a0835c3a62
|
[router] Fix ci nvcc not found error (#11411)
|
2025-10-10 15:43:16 -07:00 |
|
Scott Lee
|
55b14656e6
|
Revert "Add metrics for speculative decoding (acceptance rate, average acceptance length)" (#11433)
|
2025-10-10 12:54:57 -07:00 |
|
Lianmin Zheng
|
b4408e6098
|
Revert "fix: fix video input for qwen3-vl" (#11437)
|
2025-10-10 12:44:40 -07:00 |
|
Cheng Wan
|
52fcbbb8bd
|
Revert "perf: optimize qwen-vl with symm mem allreduce" (#11436)
|
2025-10-10 12:30:05 -07:00 |
|
Sahithi Chigurupati
|
af96ca1136
|
[CI] Merge build-dev into workflow matrix (#11345)
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
|
2025-10-10 11:13:42 -07:00 |
|
Teng Ma
|
9082a7d323
|
[HiCache] feat: add multi tenant with prefix tag (#9256)
|
2025-10-11 00:23:28 +08:00 |
|
Yuan Luo
|
3b9d97f335
|
perf: optimize qwen-vl with symm mem allreduce (#11381)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-10-10 22:24:45 +08:00 |
|
Mick
|
a1a20b4c7c
|
fix: fix video input for qwen3-vl (#11361)
|
2025-10-10 04:35:35 -07:00 |
|
Yineng Zhang
|
4299aebdbb
|
chore: update pyproject (#11420)
|
2025-10-10 00:56:30 -07:00 |
|
Scott Lee
|
0babd48736
|
Add metrics for speculative decoding (acceptance rate, average acceptance length) (#11144)
|
2025-10-10 00:46:44 -07:00 |
|
Zaili Wang
|
f19613e6c3
|
Dedicated toml files for CPU/XPU (#10734)
|
2025-10-10 00:44:55 -07:00 |
|
ziruiliu
|
8df4945559
|
fix file and object naming scheme in HiCacheNixl to avoid data corruption (#10969)
Signed-off-by: Zirui Liu <ziliu@ddn.com>
|
2025-10-10 00:23:10 -07:00 |
|
hzh0425
|
ee3bd8a1c8
|
feat(hicache): Support passing prefix keys for l3 store. (#9045)
Co-authored-by: pansicheng <sicheng.pan.chn@gmail.com>
Co-authored-by: Zhiqiang Xie <xiezhq@stanford.edu>
|
2025-10-10 00:22:05 -07:00 |
|
Yineng Zhang
|
d8467db727
|
fix: reinstall torch in deps install (#11414)
|
2025-10-09 22:58:18 -07:00 |
|
Yuan Luo
|
b5044fbf12
|
Replace pad with cat for better performance (#11388)
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
|
2025-10-10 12:03:17 +08:00 |
|
Shangming Cai
|
70fbb3adf6
|
[CI] Refactor PD disaggregation test suite (#11363)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
|
2025-10-09 18:50:39 -07:00 |
|
Glen Liu
|
9a7e7a6576
|
[Bug Fix] prevent lora adapter from being loaded into LoRAManager if it is already loaded (#11365)
|
2025-10-09 18:43:03 -07:00 |
|
Yingchun Lai
|
0fe87213bb
|
fix: fix gpu-proc affinity set incorrectly when pp_size > 1 (#11389)
|
2025-10-09 18:40:05 -07:00 |
|
Xinyuan Tong
|
1f106ee365
|
[grammar] Avoid server crash when grammar backend is None (#11401)
|
2025-10-09 18:38:10 -07:00 |
|
Lianmin Zheng
|
9b8ebb2798
|
move more files under srt/utils (#11285)
|
2025-10-09 16:46:15 -07:00 |
|
sglang-bot
|
758b887ad1
|
chore: bump SGLang version to 0.5.3.post1 (#11324)
|
2025-10-09 15:19:59 -07:00 |
|
Keyang Ru
|
eb7d9261c0
|
[router] conversation item API: create, retrieve and delete (#11369)
|
2025-10-09 17:43:16 -04:00 |
|
Yineng Zhang
|
44cb060785
|
chore: upgrade flashinfer 0.4.0 (#11364)
|
2025-10-09 14:17:54 -07:00 |
|
Simo Lin
|
88bb627d0d
|
[router] change grpc client from mutable to clone (#11394)
|
2025-10-09 11:00:24 -07:00 |
|
Chang Su
|
b520958ec8
|
[router][grpc] Replace fake health check with correct ones (#11387)
|
2025-10-09 09:13:57 -07:00 |
|
shaharmor98
|
fa7e2c3049
|
fix bench_serving mishandling of internal states (#11376)
Signed-off-by: Shahar Mor <smor@nvidia.com>
|
2025-10-09 19:24:50 +08:00 |
|
shaharmor98
|
8f2cd177af
|
add code pp support for nixl (#11375)
Signed-off-by: Shahar Mor <smor@nvidia.com>
|
2025-10-09 19:24:32 +08:00 |
|
Chang Su
|
ab926dd697
|
[router][grpc] Fix streaming bugs: empty tool names, state pollution, and panics (#11373)
|
2025-10-09 06:53:23 -04:00 |
|
Trevor Morris
|
a4b424c632
|
[DeepSeek-V3.2] Include indexer kv cache when estimating kv cache size (#11309)
|
2025-10-08 23:59:46 -07:00 |
|
Chang Su
|
a0557642ea
|
[router][lint] Add unused_qualifications to cargo lint warnings (#11366)
|
2025-10-08 22:17:11 -07:00 |
|
Keyang Ru
|
84768d1017
|
[router] Refactor OpenAI router: split monolithic file and move location (#11359)
|
2025-10-09 00:46:39 -04:00 |
|
Simo Lin
|
368fd20622
|
[router][grpc] disable health check generation and increase timeout (#11353)
|
2025-10-08 19:23:08 -07:00 |
|
Sundara Raman Ramachandran
|
53bd00d975
|
[Generative Score API] Multi-Item scoring with custom attention mask. (#10979)
|
2025-10-08 18:47:32 -07:00 |
|
Yineng Zhang
|
e22b13c569
|
[Auto Sync] Update scheduler.py (20251009) (#11350)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Junxiong Wang <junxiong@together.ai>
|
2025-10-08 17:39:04 -07:00 |
|
Mick
|
a3c2ea4451
|
fix: fix revision for sgl-flash-attn in sgl-kernel (#11327)
|
2025-10-08 15:50:44 -07:00 |
|
Chang Su
|
fccac7d126
|
[router][grpc] Add dependencies in Cargo.toml to support chat template rendering (#11342)
|
2025-10-08 15:38:37 -07:00 |
|
Keyang Ru
|
7ac6b900f4
|
[router] Support history management using conversation (#11339)
|
2025-10-08 15:24:02 -07:00 |
|
Chang Su
|
a1080b72a0
|
[router] Fix all unused_qualifications (#11341)
|
2025-10-08 13:55:27 -07:00 |
|
Chang Su
|
a65ca73911
|
[router][grpc] Cleanup debug logs in grpc_server and grpc_router (#11340)
|
2025-10-08 13:26:19 -07:00 |
|
Simo Lin
|
677aa0e25f
|
[router] improve reasoning parser lock and reduce req cloning (#11336)
|
2025-10-08 11:18:15 -07:00 |
|
Simo Lin
|
01c9ee1ab4
|
[router] refactor generate to use new pipeline arch (#11323)
|
2025-10-08 09:38:50 -07:00 |
|