Commit Graph

  • 2ff572e28c [CI][Router] Fix bench_one_batch_server for pd router test (#7731) Shangming Cai 2025-07-03 14:18:24 +08:00
  • 84f2e4a0f8 fix awq and dsv3 fused gemm compatible (#7735) AniZpZ 2025-07-03 13:56:57 +08:00
  • 8f844db699 [CPU] fix all_reduce and all_gather (#6770) Chunyuan WU 2025-07-03 13:39:45 +08:00
  • 36cc3ffdc7 [CPU] [sgl-kernel] set dispatch key of initialize to CatchAll (#7734) Chunyuan WU 2025-07-03 13:39:24 +08:00
  • 1bebd3154e Fix num_tokens_pre_allocated in disaggregation log (#7714) Ziming Huang 2025-07-03 13:31:49 +08:00
  • d3c275b117 Support updating weights at once by stopping all requests (#6698) Albert 2025-07-03 13:26:06 +08:00
  • b044400dd3 Support non-contiguous query input for extend/decode attention (#7462) YanbingJiang 2025-07-03 10:59:45 +08:00
  • 40e5cb7a9c [CPU] Bind threads and numa node for each TP rank (#6549) Chunyuan WU 2025-07-03 10:57:59 +08:00
  • 8e64140e35 [b200] support trt-llm allreduce fuse rms_norm_add kernel (#7621) Xiaoyu Zhang 2025-07-03 10:36:20 +08:00
  • 82f021e22e [router] add --log-level to sgl-router (#6512) Zilin Zhu 2025-07-03 10:33:04 +08:00
  • 0626f678de [RL] support update_weights_from_distributed with different group and multiple weights (#7292) Zilin Zhu 2025-07-03 10:29:11 +08:00
  • 09e699bba4 [RL] add --skip-warmup (#7416) Zilin Zhu 2025-07-03 09:50:43 +08:00
  • b116b21a46 [AMD] Temporarily disable test_no_overlap_scheduler and test_vision_chunked_prefill (#7717) Hubert Lu 2025-07-02 12:39:18 -07:00
  • 88f484ce4c Apply dsv3 router gemm kernel for deepseek-r1 fp4 (#7677) Baizhou Zhang 2025-07-02 12:30:18 -07:00
  • 8e03b641ba [1/n] apply wna16marlin kernel in moe weight only quantization (#7683) AniZpZ 2025-07-02 14:21:25 +08:00
  • b3fa5dc3c8 Fix GPTQMarlinMoE (#7697) Kyungmin Lee 2025-07-02 14:34:43 +09:00
  • 00aec6ad6c Apply dsv3_fused_a_gemm kernel (#7635) Ke Bao 2025-07-02 13:32:05 +08:00
  • 1a08358aed Improve error handling for requests with unloaded LoRA path(s) (#7642) Lifu Huang 2025-07-01 20:05:34 -07:00
  • f18a8fddd4 chore: upgrade flashinfer v0.2.7.post1 (#7698) Yineng Zhang 2025-07-01 14:05:57 -07:00
  • a7efbb2757 fix(model loader): use safe_open to prevent file handle leaks. (#7684) Simon_CQK 2025-07-02 04:18:35 +08:00
  • 93b6785d78 add description for llama4 eagle3 (#7688) Yi Zhang 2025-07-01 16:19:19 +08:00
  • f9eb04ddb2 upgrade sgl kernel to 0.2.1 for main (#7676) Zhiqiang Xie 2025-07-01 00:00:13 -07:00
  • 3a911b854d Refactor mm processors and Enable mixed modality processing (#7629) Xinyuan Tong 2025-06-30 23:14:48 -07:00
  • 886d344964 support llama4 eagle3 (#6985) lukec 2025-07-01 13:34:10 +08:00
  • 637bfee448 chore: bump sgl-kernel v0.2.1 (#7675) Yineng Zhang 2025-06-30 22:12:33 -07:00
  • 6005eceee3 [CPU] remove process_group from inputs of shm_allreduce and shm_allgather (#7486) Chunyuan WU 2025-07-01 12:54:11 +08:00
  • ff2e9c9479 Add small requirements for benchmark/parse_result tools (#7671) Xiaoyu Zhang 2025-07-01 12:52:20 +08:00
  • 3e34e9004f Fix: sync prepare_fp8_layer_for_marlin with latest vllm changes (#7648) narutolhy 2025-06-30 21:51:01 -07:00
  • 7349717e4b [doc] update lws doc for pd (#7318) ybyang 2025-07-01 10:39:04 +08:00
  • 392e441ad1 chore: upgrade flashinfer v0.2.7 jit (#7663) Yineng Zhang 2025-06-30 13:26:26 -07:00
  • 7248272ccc Add dsv3 router gemm kernel (#7627) Baizhou Zhang 2025-06-29 23:31:55 -07:00
  • 22352d47a9 Improve streaming, log_level, memory report, weight loading, and benchmark script (#7632) Lianmin Zheng 2025-06-29 23:16:19 -07:00
  • c5131f7a2f [CPU] add c++ kernel to bind CPU cores and memory node (#7524) Chunyuan WU 2025-06-30 10:45:25 +08:00
  • 78700893ee [EAGLE] remove a wrong adjustment for page_size > 1 & topk > 1 in server_args.py (#7643) Lianmin Zheng 2025-06-29 19:25:28 -07:00
  • 663c04f76e Update CODEOWNERS (#7640) Lianmin Zheng 2025-06-29 16:58:43 -07:00
  • 3b3f1e3aeb [AMD] Add unit-test-sgl-kernel-amd to AMD CI (#7539) Hubert Lu 2025-06-29 15:50:09 -07:00
  • b691dcc490 [misc] reduce weird rope_scaling_factor warning (#7176) JieXin Liang 2025-06-30 06:42:45 +08:00
  • 0c9c6c75a8 Move files related to EPLB (#7580) fzyzcjy 2025-06-30 06:39:38 +08:00
  • e3f9b54819 [bugfix] fix runtime dropping panic in editable (#7628) Simo Lin 2025-06-29 15:38:28 -07:00
  • b3cff3651e Fix sgl-router startup crash (#7619) finetune 2025-06-29 23:41:34 +02:00
  • 8f335b5bd6 Fix stream reasoning parser and Adds Kimi reasoning parser (#7432) Xinyuan Tong 2025-06-29 14:39:05 -07:00
  • b2264076dc Add @mickqian as the CODEOWNERS of multimodal (#7636) Lianmin Zheng 2025-06-29 09:27:33 -07:00
  • 04b35190e2 Add dsv3 fused a gemm to sgl-kernel (#7630) Ke Bao 2025-06-29 17:52:24 +08:00
  • 071a1f51ae [Minor] clean up multimodal processor and tokenizer manager (#7624) Lianmin Zheng 2025-06-29 02:50:14 -07:00
  • 7c0db3a6c5 [bugfix] Remove PR comment posting from Rust benchmark workflow (#7625) Simo Lin 2025-06-28 22:10:01 -07:00
  • c45e49d817 oai: Adds support for OpenAI chat completions API in bench_serving (#7036) Xinyuan Tong 2025-06-28 15:59:20 -07:00
  • d80539291b docs: add gb200 nvl72 and a16z grant (#7620) Yineng Zhang 2025-06-28 02:08:09 -07:00
  • 00c7b1ad07 Let EP prefill support new DeepGEMM (#7310) fzyzcjy 2025-06-28 16:45:30 +08:00
  • 82eccae44e Let ep_scatter support arbitrary strides / ue8m0 format (#7309) fzyzcjy 2025-06-28 16:38:33 +08:00
  • a8c10aeeee fix unit tests (#7618) Yineng Zhang 2025-06-28 00:32:41 -07:00
  • eb429b88a4 [PD] Respect sampling_params.max_new_tokens when PD disaggregation is activated (#7598) Shangming Cai 2025-06-28 13:22:01 +08:00
  • 49538d111b Support dynamic LoRA loading / unloading in engine/server API (#7446) Lifu Huang 2025-06-27 21:00:27 -07:00
  • cfe2edac38 [BUG] fix local_rank in initialize_dp_attention (#7584) Sheng Qi 2025-06-28 11:01:01 +08:00
  • 2373faa317 Fix flakiness in LoRA batch test. (#7552) Lifu Huang 2025-06-27 19:51:43 -07:00
  • 9efb2993da Tiny add logs for expert location updater (#7308) fzyzcjy 2025-06-28 10:12:33 +08:00
  • a5317b2fd3 [CPU] add optimizations for INT8 and FP8 DeepSeek (#6769) Chunyuan WU 2025-06-28 10:04:29 +08:00
  • eb6c2c1663 Hybrid kv cache for LLaMA4 (#6563) tarinkk 2025-06-27 21:58:55 -04:00
  • 357921aa51 Fix: Minicpm (#7612) Xinyuan Tong 2025-06-27 17:32:29 -07:00
  • c071198c1d [router] add centralized configuration module for sgl-router (#7588) Simo Lin 2025-06-27 15:42:02 -07:00
  • d7374d7467 Fix broken CI TestVILAServer (#7610) Lifu Huang 2025-06-27 15:01:03 -07:00
  • ce3a3e8783 Move multimodal processors into a separate folder (#7581) Lianmin Zheng 2025-06-27 11:58:24 -07:00
  • 41650b0d70 feat: support compatibility between MTP and two-batch-overlap (#7225) Qiaolin Yu 2025-06-27 01:10:27 -07:00
  • 1b95162008 Updates transformers and timm dependencies (#7577) Xinyuan Tong 2025-06-27 00:30:17 -07:00
  • 29bd4c8135 [CI] Add CI Testing for Prefill-Decode Disaggregation with Router (#7540) Keyang Ru 2025-06-27 00:18:56 -07:00
  • 031f64aa1b Add e2e test for multi instance multi stage memory release/resume occupuation (#7208) Ata Fatahi 2025-06-26 17:40:38 -07:00
  • 3d7cdb2ebd Fix MTP error when enabling two-batch overlap (#7569) fzyzcjy 2025-06-27 06:40:54 +08:00
  • 604efe07e1 Updates Gemma3n MLP layer to adapt latest transformers version (#7573) Xinyuan Tong 2025-06-26 15:07:22 -07:00
  • 1b8cf77b01 [Fix] incorrect assert in EPLB (#7575) Cheng Wan 2025-06-26 14:59:20 -07:00
  • bb9b608c86 [PD][NIXL] Set is_sorted=False to fix NIXL_ERR_NOT_FOUND (#7330) Trevor Morris 2025-06-26 10:39:39 -07:00
  • 69183f8808 chore: bump v0.4.8.post1 (#7559) Yineng Zhang 2025-06-26 02:21:12 -07:00
  • 9b00990bea chore: remove vlm unnecessary import (#7541) Xinyuan Tong 2025-06-26 01:38:15 -07:00
  • 4d67025a1d chore: improve ci bug reporting (#7542) Mick 2025-06-26 16:32:44 +08:00
  • 0e05fe8cf4 Update seed in CPU UTs to avoid flaky failure with single test (#7544) YanbingJiang 2025-06-26 12:25:50 +08:00
  • 2390a2bc8d Add Tencent HunYuanMoEV1 model support (#7549) Meng, Peng 2025-06-26 11:59:53 +08:00
  • 16d76b9f23 [CMake] Fix sgl-kernel CMakeLists for Blackwell (#7543) Ruihang Lai 2025-06-25 22:00:46 -04:00
  • 5c2142579a [PD] Raise error for incompatible mooncake version and some minor fixes (#7527) Shangming Cai 2025-06-26 09:55:24 +08:00
  • b8df43ab9c Fix gathered_buffer issues in tbo (#7531) Qiaolin Yu 2025-06-25 14:42:21 -07:00
  • a1c1ebe935 Fix FP8 KV Cache Support in FA3 Backend (#7148) Yuhong Guo 2025-06-25 17:14:40 +08:00
  • fe2a0f962f minor: 'role' must be system/assistant/tool, but case insensitive for now (#7499) mlmz 2025-06-25 17:11:03 +08:00
  • 20beb3702b feat: add return hidden_states at async generation (#7507) eigen 2025-06-25 05:10:09 -04:00
  • 00fbd8a484 Fix typo of flash_cache (#7513) Stefan He 2025-06-25 02:04:41 -07:00
  • 802815e40b take aiter get_rope back (#7521) valarLip 2025-06-25 17:03:33 +08:00
  • 4c6675c4fc enable aiter fp8 blockscale quant (#7520) valarLip 2025-06-25 17:02:31 +08:00
  • e21aa1df67 [PD] Add different TP sizes support for no-MLA models (#6793) Hongbo Xu 2025-06-25 17:00:22 +08:00
  • f3cbd24541 feat: send kvmetrics from sglang scheduler (#6721) zixuanzhang226 2025-06-25 01:57:49 -07:00
  • 506a2d5934 npu fused op (#7386) ll819214 2025-06-25 16:54:20 +08:00
  • a07f8ae4b7 [CI] Upgrade mooncake to v0.3.4.post2 to fix potential slice failed bug (#7522) Shangming Cai 2025-06-25 16:49:22 +08:00
  • 7eb47b0f3d [CPU] [BF16] Call fused_experts_cpu, weight_packed_linear and bmm_cpu kernel in DeepSeek model (#6641) Chunyuan WU 2025-06-25 16:43:33 +08:00
  • bc2e5645c4 fix: force synchronization between TP workers when update_weights (#6626) DangKai 2025-06-25 16:35:59 +08:00
  • 3abc30364d [ci] add router benchmark script and CI (#7498) Simo Lin 2025-06-25 01:28:25 -07:00
  • afeed46530 clean duplicate code (#7512) linzhuo 2025-06-25 16:22:20 +08:00
  • 587b4c6e92 EPLB support for MTP (#7510) yilian49 2025-06-25 01:16:56 -07:00
  • 7b9a174a7a [PD][Spec] Fix hidden state transfer for spec decode (#7516) Shangming Cai 2025-06-25 15:42:07 +08:00
  • 03c039c48e [OAI] patch origin request_id logic (#7508) ybyang 2025-06-25 11:09:38 +08:00
  • 57ab776910 Fuse sorted_token_ids padding to moe_align_block_size kernel (#7437) Ke Bao 2025-06-25 08:44:27 +08:00
  • 112b496a6c misc: Improvement to serving_chat.py and add more ut (#7489) Chang Su 2025-06-24 17:19:51 -07:00
  • 3562256bb2 fix: Add --model as an alias for --model-path in server_args (#7505) Chang Su 2025-06-24 12:08:08 -07:00
  • 5f527834a8 [PD] NIXL: Register kv args in advance and cleanup finished requests (#6717) Trevor Morris 2025-06-24 11:26:09 -07:00
  • 9f1787fa60 Support multi-thread model weight loading (#7277) xianzhiT 2025-06-25 01:39:10 +08:00
  • 8ecad0b16f [benchmark] fbgemm benchmark support bandwidth report and support fbgemm_cutlass_gmm (#7422) Xiaoyu Zhang 2025-06-25 00:44:55 +08:00