Commit Graph

  • 7f399e4bce [HiCacheStorage]support page_first_direct layout for generic set&get (#10522) huangtingwei 2025-09-19 20:47:16 +08:00
  • 873d858b28 [router] refactor worker to builder pattern 5/n (#10653) Simo Lin 2025-09-19 05:43:23 -04:00
  • 3fa3c22ae2 Fix fast decode plan for flashinfer v0.4.0rc1 and upgrade sgl-kernel 0.3.11 (#10634) Baizhou Zhang 2025-09-19 01:25:29 -07:00
  • 4f2055ad56 [router] refactor worker to builder pattern 4/n (#10650) Simo Lin 2025-09-19 02:49:10 -04:00
  • 616a3e20df [sgl-kernel] Support moe_sum_reduce cuda kernel (#10321) Yuan Luo 2025-09-19 14:12:09 +08:00
  • ac2a723bb3 [router] refactor worker to builder pattern 3/n (#10647) Simo Lin 2025-09-19 01:52:57 -04:00
  • 56b991b12d [Feature]feat(get_ip): unify get_ip_xxx (#10081) Jimmy 2025-09-19 13:35:26 +08:00
  • 780d6a22cd [router] refactor worker to builder pattern 2/n (#10633) Simo Lin 2025-09-19 00:47:56 -04:00
  • 8b713c7248 Hicache L3 backend mooncake optimization configuration reading method (#10319) FlyPanda 2025-09-19 12:25:01 +08:00
  • 5bfafdfcb4 chore: bump sgl-kernel 0.3.11 (#10630) Yineng Zhang 2025-09-18 18:43:20 -07:00
  • 8c52de6fab feat: add fused moe config for Qwen3-Next-80B-A3B-Instruct on B200 (#10631) zixuanzhang226 2025-09-18 17:31:58 -07:00
  • c1815a99b7 model support: Sarashina2VisionForCausalLM (#10632) Chang Su 2025-09-18 17:30:38 -07:00
  • 4e6c4923a0 [Performance] Qwen3-Next: speed up update_mamba_state_after_mtp_verify by 10x; e2e up to 3.54% faster (#10586) Binyao Jiang 2025-09-18 17:13:59 -07:00
  • b91cb67e7a [Performance] Qwen3-Next: replace arange to cached query_start_loc_li… (#10553) Binyao Jiang 2025-09-18 17:02:42 -07:00
  • e7bc600304 [Feature] Speculative decoding support lookahead (#9873) Zhihao Zhang 2025-09-19 07:42:41 +08:00
  • 2a2ff9a840 refactor: use registry for _get_attention_backend_from_str (#10629) Yineng Zhang 2025-09-18 16:27:59 -07:00
  • 5291f32d75 [router] refactor worker to builder pattern 1/n (#10628) Simo Lin 2025-09-18 16:25:40 -04:00
  • 67073dde85 Garbage collector regression in the online server (#10621) brayden-hai 2025-09-18 11:52:35 -07:00
  • 9a5c42f9ad feat: Add FlexAttention Backend for Efficient Sparse Attention (#9947) yuk.igalaxy 2025-09-19 02:49:17 +08:00
  • 388c05d544 Fix bias handling in TritonMoeQuantInfo within quantization/mxfp4.py (#10579) yhyang201 2025-09-19 02:44:43 +08:00
  • fc809665fd [Performance] qwen3-next improve causal conv1d in prefill phase (#10595) Jinyan Chen 2025-09-19 02:42:49 +08:00
  • 6fd4816d9f Fix sgl_kernel import failure on devices other than CUDA (#10610) Zaili Wang 2025-09-19 02:38:02 +08:00
  • 1344ebc833 support qwen3-next-fp8 deepep (#10622) Yi Zhang 2025-09-19 02:36:22 +08:00
  • e07b21ceaf update deepep version for qwen3-next deepep moe (#10624) Yi Zhang 2025-09-19 02:35:22 +08:00
  • 52f248cd31 Feat/add heartbeat mechanism for nixl conn (#10222) shaharmor98 2025-09-18 20:20:42 +03:00
  • 93f75778be [RL] Add destroy process group api (#9979) penguin_wwy 2025-09-19 00:31:56 +08:00
  • 4039c626e2 fix deepep assert when PD disaggregation == null (#8274) AlphaBaby 2025-09-19 00:31:01 +08:00
  • db71c38fcd Scale kkt after reduction (#10604) Yi Zhang 2025-09-18 20:51:40 +08:00
  • 7a68b4225a [improvement] add average input/output token length for hicache benchmark stats output (#10525) zhannngchen 2025-09-18 15:38:03 +08:00
  • 60fc5b51f6 chore: upgrade mooncake 0.3.6 (#10596) Shangming Cai 2025-09-18 15:19:30 +08:00
  • a13dd1e492 [PD] Improve disaggregation common backend and refactor mooncake backend (#10273) Shangming Cai 2025-09-18 13:58:07 +08:00
  • d500eb9173 aiter v0.1.5.post2 (#10563) HAI 2025-09-17 22:10:45 -07:00
  • 1ccd59c715 [HICache] introduce evict policy (#10190) Xuchun Shang 2025-09-18 11:10:20 +08:00
  • c32fb7a24d [ROCm] Fix fp8 quantization accuracy issue. (#10558) sogalin 2025-09-18 08:44:59 +08:00
  • 1ba137e98f Enable trtllm mla prefix extend (#10526) Shu Wang 2025-09-17 18:44:11 -05:00
  • de28f8e741 vlm: remove redundant d2h movement of mm feature tensors (#9987) Kevin Xiang Li 2025-09-17 15:00:39 -07:00
  • 564050766d fix: update dsv3 fp4 ut (#10584) Yineng Zhang 2025-09-17 14:34:58 -07:00
  • b73ac629cd [BugFix] Fix incorrect hidden_states_tensor in pd disaggregation + eagle (#9976) Ziming Huang 2025-09-18 01:37:14 +08:00
  • 77098aea7b [HiCache] Add tests for hicache storage mooncake backend (#10171) Teng Ma 2025-09-18 01:07:16 +08:00
  • 5ccf0b03bd [bench] Fix random seed in bench_one_batch_server (#10548) Liangsheng Yin 2025-09-17 19:30:32 +08:00
  • a77564e0fb CUDA Arch Independent (#8813) EduardDurech 2025-09-17 08:01:45 +02:00
  • 4f9e71df3c Remove duplicated code (#10545) Yichen Yan 2025-09-17 11:48:22 +08:00
  • 541551cefe [bugfix]hicache bench_long_context.py run failed (#10523) zhannngchen 2025-09-17 11:27:06 +08:00
  • 124097fc5b enable prefix cache with dp (#10459) Shu Wang 2025-09-16 20:26:58 -05:00
  • e1d45bc280 Fix decord dependency for aarch64 docker build (#10529) kyleliang-nv 2025-09-16 17:34:37 -07:00
  • 14fdd52740 feat: add priority based scheduling with priority based request acceptance and preemption (#8746) harrisonlimh 2025-09-16 17:10:10 -07:00
  • f949ad5794 [Auto Sync] Update activation.py, chunk_cache.py, utils.py (20250917) (#10538) Lianmin Zheng 2025-09-16 17:06:43 -07:00
  • c49484a658 [Auto Sync] Update scheduler_profiler_mixin.py, rpd_utils.p... (20250916) (#10494) Lianmin Zheng 2025-09-16 17:02:20 -07:00
  • a2f7218a2e support using fa4 on deepseek on blackwell (#9928) cicirori 2025-09-17 07:16:06 +08:00
  • 311de47bb7 [2/2] Speed up trtllm_mla attention backend (#10474) fzyzcjy 2025-09-17 06:49:22 +08:00
  • 373080ea6c skip vision_model for lora (#10530) gongwei-130 2025-09-16 12:34:42 -07:00
  • 7f028b07c4 Fix formatting in long code blocks (#10528) Philip Kiely - Baseten 2025-09-16 12:02:05 -07:00
  • 0abb41c70d adjust import setuptools_rust (#10524) ybyang 2025-09-16 23:01:58 +08:00
  • 925dbb3218 [CPU] fix CPU backend sel. issue for Llama4 (#10511) Zaili Wang 2025-09-16 17:57:45 +08:00
  • 8df7353af3 Support sgl-router parallel_batch in bench_one_batch_server (#10506) fzyzcjy 2025-09-16 17:52:57 +08:00
  • ae4be601c2 Fix CI when sgl-kernel is changed but srt is not changed (#10515) fzyzcjy 2025-09-16 17:49:54 +08:00
  • 9b876889b7 Update CUTLASS. Refine KernelSchedule for fp8 (grouped) gemm. (#10491) Qi Yuhang 2025-09-16 17:47:37 +08:00
  • c0c6f543e4 chore: upgrade sgl-kernel 0.3.10 (#10500) Yineng Zhang 2025-09-16 02:00:53 -07:00
  • edd6a07bc0 Minor fix lint introduced by #10466 (#10507) Shangming Cai 2025-09-16 16:38:25 +08:00
  • b6dd4bcb81 feat: update support for qwen3next model (#10466) cao1zhg 2025-09-16 16:09:56 +08:00
  • b2435be682 Cache the result of is_blackwell platform check (#10498) b8zhong 2025-09-15 22:30:28 -07:00
  • 5fe39e85a2 [router] fix router manager and router init in server (#10499) Chang Su 2025-09-15 22:23:26 -07:00
  • fa5d0bf6a5 Remove wrong imports from sglang.python (#10493) Liangsheng Yin 2025-09-16 13:12:21 +08:00
  • 16e9335998 [router] add router db connector for responses api (#10487) Simo Lin 2025-09-16 01:04:56 -04:00
  • f1c692f6f8 Add Logprobs unit test with a loose threshold (#10230) Night 2025-09-16 01:04:40 -04:00
  • 80572c8345 [ModelOpt] Respect kv_cache_quant_algo in ModelOpt checkpoints (#10336) brayden-hai 2025-09-15 20:16:49 -07:00
  • 4bb08f6e07 [Hicache] Evaluate Per-Round Metrics in Multiturn Bench (#10203) ykwd 2025-09-16 10:34:40 +08:00
  • ec272dda9c Temporary work-around for rocm 7.0.0 alpha with enabling data-parallel issue (#10434) kk 2025-09-16 10:08:04 +08:00
  • a220c14f81 fix crash of DeepSeek-V3 update_weights_from_disk (#8863) scut-cbq 2025-09-16 09:45:15 +08:00
  • 35ef3f2902 [router] fix worker registration in multi model mode (#10486) Chang Su 2025-09-15 18:05:00 -07:00
  • 31fb19a0a2 [Auto Sync] Update registry.py (20250915) (#10484) Lianmin Zheng 2025-09-15 17:34:28 -07:00
  • 3f41b48c40 [2/2] Introduce Chunked-SGMV kernels and corresponding LoRA backend for improved performance (#10286) Lifu Huang 2025-09-15 16:04:03 -07:00
  • 2689f0bf02 [router] multi model registration fix (#10481) Chang Su 2025-09-15 15:22:21 -07:00
  • 5207424014 chore: bump v0.3.10 sgl-kernel (#10478) Yineng Zhang 2025-09-15 15:20:09 -07:00
  • c3c26f76b3 [Env] minimal version for organizing envs (#10479) Liangsheng Yin 2025-09-16 03:51:25 +08:00
  • 2cf811a9da Fix --dataset-path in bench_one_batch_server (#10475) Liangsheng Yin 2025-09-16 02:55:02 +08:00
  • 3b25dc127a [1/2] Speed up trtllm_mla attention backend (>10% e2e) (#10473) fzyzcjy 2025-09-16 02:53:21 +08:00
  • 5c08d7d21d fix: resolve sgl-kernel ut (#10476) Yineng Zhang 2025-09-15 11:42:48 -07:00
  • a45d9a4ee8 model: support solar (#8189) Praneth Paruchuri 2025-09-15 23:51:13 +05:30
  • 28c79dc84a fix: gpt-oss streaming dropping normal content when tools are provided but not used (#9657) Jonas 2025-09-15 20:02:32 +02:00
  • 1fcccda4b2 fix(internvl): fix accuracy issue of normalization (#10375) Kevin Tuan 2025-09-16 01:56:01 +08:00
  • 79acec4fe7 [CI] Fix runner for sgl-kernel (#9887) Sahithi Chigurupati 2025-09-15 10:55:48 -07:00
  • b1721edbac [PD metrics] Add latency Histogram metrics of each stage for generate requests (#8710) Yingchun Lai 2025-09-16 01:52:49 +08:00
  • 57234d0c9c [bugfix] fix typo (#10471) Jiayi Yan 2025-09-15 22:29:20 +08:00
  • b93acd7020 [router] minor code clean up in server startup (#10470) Chang Su 2025-09-15 07:28:25 -07:00
  • 86a32bb5cd chore: bump v0.5.3rc0 (#10468) Yineng Zhang 2025-09-15 03:55:18 -07:00
  • 5afd036533 feat: support pip install sglang (#10465) Yineng Zhang 2025-09-15 03:09:17 -07:00
  • 059c13de5c Fix trtllm_moe wrong correction bias (#10440) fzyzcjy 2025-09-15 16:02:05 +08:00
  • 50dc0c1e9c Run tests based on labels (#10456) Lianmin Zheng 2025-09-15 00:29:20 -07:00
  • 76becc1dbc Add rtx5880 moe triton (#10439) Jimmy_L 2025-09-15 15:12:10 +08:00
  • 2a37b24d23 [HotFix]: Hot fix import path in 3fs_bench_client.py (#10463) hzh0425 2025-09-15 14:45:46 +08:00
  • 51d32b6d49 fixed bug: sgl_kernel object has no attribute 'fwd' v0.5.2 maxiao1 2025-09-15 06:01:38 +00:00
  • f73aae0bfc Update GITHUB_TOKEN secret for documentation push (#10458) Lianmin Zheng 2025-09-14 21:59:13 -07:00
  • 69b35793a0 [router] fix logger ordering git ctx (#10457) Chang Su 2025-09-14 21:37:21 -07:00
  • 957482c8f2 [router] add dependency for router (#10401) ooapex 2025-09-15 12:14:14 +08:00
  • 3795b6a43f fix(server_args): Skip chunked_prefill_size validation when disaggregation mode is decode (#10358) Jimmy 2025-09-15 12:13:35 +08:00
  • 7eccbe992d [router] fix service discovery and mcp ut (#10449) Simo Lin 2025-09-15 00:07:23 -04:00
  • 0549f21c60 fix: fix max_new_tokens uninitialized error (#9343) Mick 2025-09-15 12:06:55 +08:00
  • b354e3c90d [CI] Fix token key in label-pr.yml workflow (#10452) Lianmin Zheng 2025-09-14 20:45:53 -07:00
  • 65e6f48ce4 Update permissions in label-pr.yml (#10450) Lianmin Zheng 2025-09-14 20:41:43 -07:00