456 Commits

Author SHA1 Message Date
Yineng Zhang
b3839a7f99 fix: resolve transfer_kv_all_layer_direct_lf_pf import error (#10360) 2025-09-11 23:53:23 -07:00
Keyang Ru
7b141f816c [router][ci] Add gpu utilization analyze with nvml (#10345) 2025-09-11 19:26:02 -07:00
Yineng Zhang
b0d25e72c4 chore: bump v0.5.2 (#10221) 2025-09-11 16:09:20 -07:00
Keyang Ru
1ee11df8ac [router][ci] add gpu process check and free port before start server (#10338) 2025-09-11 14:24:16 -07:00
Keyang Ru
480d1b8b20 [router] add benchmark for regular router and pd router (#10280) 2025-09-11 12:04:11 -07:00
Yineng Zhang
bfe01a5eef chore: upgrade v0.3.9.post2 sgl-kernel (#10297) 2025-09-11 04:10:29 -07:00
Hank Han
3dd6420a4d [CI] add pyproject.toml to deepseek w4a8 ci (#10314) 2025-09-11 02:10:50 -07:00
BourneSun0527
4aa1e69bc7 [chore]Add sgl-router to npu images (#10229) 2025-09-10 23:51:16 -07:00
Even Zhou
5b64f006ec [Feature] Support DeepEP normal & Redundant Experts on NPU (#9881) 2025-09-10 20:35:26 -07:00
Hubert Lu
91b3555d2d Add tests to AMD CI for MI35x (#9662) 2025-09-10 12:50:05 -07:00
Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>
Yineng Zhang
f3817cb0b2 chore: bump v0.3.9 sgl-kernel (#10208) 2025-09-09 01:40:05 -07:00
Yineng Zhang
cdc56ef6c1 feat: use sgl-kernel cu129 as default (#10188) 2025-09-08 22:01:17 -07:00
Even Zhou
b67c277f86 [Bugfix] Qwen3MoE aclrtMemcpy failed with NPUGraph (#10013) 2025-09-07 21:50:49 -07:00
Cao E
7577f0e40f Add graph runner support with torch compile on CPU (#7843) 2025-09-07 21:33:58 -07:00
Keyang Ru
9eb50ecc9c [router] Improve the router e2e tests (#10102) 2025-09-06 16:19:28 -07:00
Keyang Ru
b3e7a2cee4 increase the rust e2e timeout (#10116) 2025-09-06 16:17:34 -07:00
hzh0425
1a3d6f31da Modify ci workflow for auto-partitioning in 2-GPU backend tests (#10029) 2025-09-06 10:28:42 +08:00
Keyang Ru
21b9a4b435 [router] Introduce router integration tests (#10086) 2025-09-05 18:52:53 -07:00
Simo Lin
bde73ee43f [router] add rust cache in benchmark ci (#10080) 2025-09-05 09:59:36 -07:00
Keyang Ru
4f0e28d7fc [router] add rust cache for rust unit test (#10079) 2025-09-05 09:58:59 -07:00
Keyang Ru
045ab92dc0 [router] add py binding unit tests to coverage 80% (#10043) 2025-09-05 08:40:21 -07:00
Chang Su
8b3b995ac9 [router] fix release workflow to include protobuf (#10055) 2025-09-04 22:09:30 -07:00
Simo Lin
9491d6e554 [router] include rust benchamrks (#9932) 2025-09-02 09:32:09 -07:00
Yineng Zhang
8766b3aca8 fix: update router deps (#9921) 2025-09-02 03:28:58 -07:00
LukasBluebaum
9d9fa9a537 [router] Fix short timeout for the prefill client (#9803) 2025-09-01 19:57:04 -07:00
Sai Enduri
4750cddf68 Update docker build workflows for gfx942 ROCm 7.0. (#9794) 2025-09-01 00:37:12 -07:00
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
Chang Su
fd5ce576a4 Tool parser.benchmark (#9835) 2025-08-30 21:08:11 -07:00
Lianmin Zheng
646076b71e Update guidelines for syncing code between repos (#9831) 2025-08-30 16:10:35 -07:00
Lianmin Zheng
0d04008936 [CI] Code sync tools (#9830) 2025-08-30 16:02:29 -07:00
Lianmin Zheng
05e4787243 [CI] Fix the trigger condition for PR test workflows (#9761) 2025-08-30 15:47:10 -07:00
Hubert Lu
711390a971 [AMD] Support Hierarchical Caching on AMD GPUs (#8236) 2025-08-28 15:27:07 -07:00
Chang Su
28684f909d [router] upgrade kernel version in pd ci (#9720) 2025-08-27 16:02:41 -07:00
Yineng Zhang
b962a296ed chore: upgrade sgl-kernel 0.3.7 (#9708) 2025-08-27 14:00:31 -07:00
PGFLMG
aa3eba8eb4 [sgl-kernel] misc: update deepgemm version for sgl-kernel (#9340) 2025-08-27 12:01:30 -07:00
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
ZhengdQin
f92b729d52 [new feat] ascend backend support fia fusion kernel (#8328) 2025-08-25 23:13:08 -07:00
Co-authored-by: Even Zhou <even.y.zhou@outlook.com>
DiweiSun
029e0af31d ci: enhance xeon ci (#9395) 2025-08-21 03:35:17 -07:00
Keyang Ru
3828db4309 [router] Add IGW (Inference Gateway) Feature Flag (#9371) 2025-08-20 17:38:57 -07:00
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Lianmin Zheng
f20b6a3f2b [minor] Sync style changes (#9376) 2025-08-19 21:35:01 -07:00
Chang Su
7638f5e44e [router] Implement gRPC SGLangSchedulerClient (#9364) 2025-08-19 16:44:11 -07:00
Hubert Lu
c6c379ab31 [AMD] Reorganize hip-related header files in sgl-kernel (#9320) 2025-08-18 16:53:44 -07:00
Jeff Nettleton
ce3ca9b02f [router] add cargo clippy in CI and fix-up linting errors (#9242) 2025-08-17 11:03:56 -07:00
Even Zhou
fda762a27d [Bugfix] Change vLLM install order & Add A2 support (#9232) 2025-08-16 22:36:14 -07:00
Sai Enduri
740f063035 Fix Custom All Reduce CI job. (#9258) 2025-08-16 16:29:43 -07:00
Hank Han
81da16f6d3 [CI] add deepseek w4a8 test on h20 ci (#7758) 2025-08-16 01:54:13 -07:00
kk
983aa4967b Fix nan value generated after custom all reduce (#8663) 2025-08-15 12:33:54 -07:00
Co-authored-by: wunhuang <wunhuang@amd.com>
Hubert Lu
9c3e95d98b [AMD] Expand test coverage for AMD CI and enable apply_token_bitmask_inplace_cuda in sgl-kernel (#8268) 2025-08-15 12:32:51 -07:00
Shangming Cai
8ca07bd948 [CI] Fix sgl-router disaggregation test (#9222) 2025-08-15 02:24:44 -07:00
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Hongbo Xu
2cc9eeab01 [4/n]decouple quantization implementation from vLLM dependency (#9191) 2025-08-14 12:05:46 -07:00
Co-authored-by: AniZpZ <aniz1905@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
DiweiSun
2f20f43026 Swap xeon ci to gnr server (#9042) 2025-08-13 12:39:19 -07:00
li chaoran
62f99e08b3 fix: wrong docker hub org name (#9137) 2025-08-12 19:26:19 -07:00
Signed-off-by: mywaaagh_admin <pkwarcraft@gmail.com>