422 Commits

Author SHA1 Message Date
ZhengdQin
f92b729d52 [new feat] ascend backend support fia fusion kernel (#8328)
Co-authored-by: Even Zhou <even.y.zhou@outlook.com>
2025-08-25 23:13:08 -07:00
DiweiSun
029e0af31d ci: enhance xeon ci (#9395) 2025-08-21 03:35:17 -07:00
Keyang Ru
3828db4309 [router] Add IGW (Inference Gateway) Feature Flag (#9371)
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-08-20 17:38:57 -07:00
Lianmin Zheng
f20b6a3f2b [minor] Sync style changes (#9376) 2025-08-19 21:35:01 -07:00
Chang Su
7638f5e44e [router] Implement gRPC SGLangSchedulerClient (#9364) 2025-08-19 16:44:11 -07:00
Hubert Lu
c6c379ab31 [AMD] Reorganize hip-related header files in sgl-kernel (#9320) 2025-08-18 16:53:44 -07:00
Jeff Nettleton
ce3ca9b02f [router] add cargo clippy in CI and fix-up linting errors (#9242) 2025-08-17 11:03:56 -07:00
Even Zhou
fda762a27d [Bugfix] Change vLLM install order & Add A2 support (#9232) 2025-08-16 22:36:14 -07:00
Sai Enduri
740f063035 Fix Custom All Reduce CI job. (#9258) 2025-08-16 16:29:43 -07:00
Hank Han
81da16f6d3 [CI] add deepseek w4a8 test on h20 ci (#7758) 2025-08-16 01:54:13 -07:00
kk
983aa4967b Fix nan value generated after custom all reduce (#8663)
Co-authored-by: wunhuang <wunhuang@amd.com>
2025-08-15 12:33:54 -07:00
Hubert Lu
9c3e95d98b [AMD] Expand test coverage for AMD CI and enable apply_token_bitmask_inplace_cuda in sgl-kernel (#8268) 2025-08-15 12:32:51 -07:00
Shangming Cai
8ca07bd948 [CI] Fix sgl-router disaggregation test (#9222)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-08-15 02:24:44 -07:00
Hongbo Xu
2cc9eeab01 [4/n]decouple quantization implementation from vLLM dependency (#9191)
Co-authored-by: AniZpZ <aniz1905@gmail.com>
Co-authored-by: Yineng Zhang <me@zhyncs.com>
2025-08-14 12:05:46 -07:00
DiweiSun
2f20f43026 Swap xeon ci to gnr server (#9042) 2025-08-13 12:39:19 -07:00
li chaoran
62f99e08b3 fix: wrong docker hub org name (#9137)
Signed-off-by: mywaaagh_admin <pkwarcraft@gmail.com>
2025-08-12 19:26:19 -07:00
li chaoran
2ecbd8b8bf [feat] add ascend readme and docker release (#8700)
Signed-off-by: mywaaagh_admin <pkwarcraft@gmail.com>
Signed-off-by: lichaoran <pkwarcraft@gmail.com>
Co-authored-by: Even Zhou <even.y.zhou@outlook.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2025-08-12 13:25:42 -07:00
Jiaqi Gu
c9ee738515 Fuse writing KV buffer into rope kernel (part 2: srt) (#9014)
Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
2025-08-12 13:15:30 -07:00
Keyang Ru
4093d460ce [CI] migrate router to BM.A10.4 runner (#8992)
Co-authored-by: key4ng <rukeyang@gamil.com>
2025-08-11 22:41:18 -07:00
Lianmin Zheng
2449a0afe2 Refactor the docs (#9031) 2025-08-10 19:49:45 -07:00
Lianmin Zheng
0f229c07f1 Update release-docs.yml (#9037) 2025-08-10 18:52:11 -07:00
Lianmin Zheng
2c7f01bc89 Reorganize CI and test files (#9027) 2025-08-10 12:30:06 -07:00
Simo Lin
3817a37d87 [router] upgrade to latest sgl kernel for router ci (#9019) 2025-08-09 21:49:18 -07:00
Lianmin Zheng
ef48d5547e Fix CI (#9013) 2025-08-09 16:00:10 -07:00
Lianmin Zheng
9a44b643c6 Fix CI (#9012) 2025-08-09 13:33:42 -07:00
ishandhanani
de8b8b6e5c chore(deps): update minimum python to 3.10 (#8984) 2025-08-09 00:30:23 -07:00
DiweiSun
7c0db868a1 Molly/ci gnr server (#8667) 2025-08-08 20:01:16 -07:00
Lianmin Zheng
706bd69cc5 Clean up server_args.py to have a dedicated function for model specific adjustments (#8983) 2025-08-08 19:56:50 -07:00
Lianmin Zheng
6642e3a295 [Fix] Add a workflow to cancel all pending CI runs (#8988) 2025-08-08 16:09:50 -07:00
Lianmin Zheng
67a7d1f699 Create cancel-all-pr-test-runs (#8986) 2025-08-08 15:53:51 -07:00
ishandhanani
7d3af603e7 chore(ci): update Python version from 3.9 to 3.10 in sgl-kernel workflow (#8981) 2025-08-08 14:03:17 -07:00
ishandhanani
4e7f025219 chore(gb200): update to CUDA 12.9 and improve build process (#8772) 2025-08-08 13:42:47 -07:00
Yineng Zhang
1ac16add8b chore: support blackwell cu129 image (#8928) 2025-08-07 14:24:57 -07:00
Simo Lin
16a4c66d25 [router] update pd router ci summary step with new threshold (#8916) 2025-08-07 07:15:38 -07:00
Simo Lin
89e6521c61 [router] re-enable pd router benchmark CI (#8912) 2025-08-07 06:29:36 -07:00
fzyzcjy
b114a8105b Support B200 in CI (#8861) 2025-08-06 21:42:44 +08:00
Yineng Zhang
aeac900ca2 fix: resolve ci issue (#8859) 2025-08-06 02:28:14 -07:00
Yineng Zhang
3ae8e3ea8f chore: upgrade torch 2.8.0 (#8836) 2025-08-05 17:32:01 -07:00
kk
32d9e39a29 Fix potential memory fault issue and ncclSystemError in CI test (#8681)
Co-authored-by: wunhuang <wunhuang@amd.com>
2025-08-05 12:19:37 -07:00
Yineng Zhang
194561f27a feat: support sgl-kernel cu129 (#8800) 2025-08-05 02:33:47 -07:00
Even Zhou
fee0ab0fba [CI] Ascend NPU CI enhancement (#8294)
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2025-08-03 22:16:38 -07:00
Liangsheng Yin
7a27e798ca [CI] Do not trigger pd-disaggregation CI in draft PR (#8737) 2025-08-04 05:12:20 +08:00
Yineng Zhang
5ce5093b97 chore: bump sgl-kernel 0.3.0 with torch 2.8.0 (#8718) 2025-08-03 02:31:50 -07:00
li chaoran
fe5086fd8b chore: speedup NPU CI by cache (#8270)
Signed-off-by: mywaaagh_admin <pkwarcraft@gmail.com>
Co-authored-by: ronnie_zheng <zl19940307@163.com>
2025-07-31 17:29:50 -07:00
Simo Lin
aee0ef52f5 [router] update router pypi version (#8628) 2025-07-31 11:24:12 -07:00
Simo Lin
ae807774f5 [ci] fix genai-bench execution cmd (#8629) 2025-07-31 10:40:54 -07:00
yihong
09f1a247ce fix: fork should not run pypi router (#8604)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-07-31 02:37:13 -07:00
Simo Lin
a9fd80336d [router] allow longer time out for router e2e (#8560) 2025-07-29 23:43:37 -07:00
Lianmin Zheng
263c9236a0 Always trigger pr-test (#8527) 2025-07-29 04:05:19 -07:00
Lianmin Zheng
69712e6f55 Rename the last step in pr-test.yml as pr-test-finish (#8486) 2025-07-28 19:06:13 -07:00