Yuhong Guo
|
7d8c0ce7ce
|
[Build] Support build sgl-kernel with ccache (#5020)
|
2025-04-03 00:22:37 -07:00 |
|
Yineng Zhang
|
19e96e5923
|
bump v0.4.4.post3 (#4878)
|
2025-03-28 23:21:24 -07:00 |
|
Jiří Suchomel
|
f60f293195
|
[k8s] Clarified the usage of shared memory. (#4341)
|
2025-03-27 08:53:19 -07:00 |
|
Yineng Zhang
|
8bf6d7f406
|
support cmake for sgl-kernel (#4706)
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
Co-authored-by: yinfan98 <1106310035@qq.com>
|
2025-03-27 01:42:28 -07:00 |
|
Yineng Zhang
|
1099f6c974
|
bump v0.4.4.post2 (#4669)
|
2025-03-26 19:58:00 -07:00 |
|
Jinyan Chen
|
f44db16c8e
|
[Feature] Integrate DeepEP into SGLang (#4232)
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
Co-authored-by: Xuting Zhou <xutingz@nvidia.com>
|
2025-03-19 08:16:31 -07:00 |
|
Yineng Zhang
|
ba80c102f9
|
bump v0.4.4.post1 (#4402)
|
2025-03-13 17:53:46 -07:00 |
|
Yineng Zhang
|
6aaeb84872
|
chore: bump v0.4.4 (#4041)
|
2025-03-13 02:49:58 -07:00 |
|
Peter Pan
|
0e90ae628a
|
[docker] Distributed Serving with k8s Statefulset (good example for DeepSeek-R1) (#3631)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Co-authored-by: Kebe <kebe.liu@daocloud.io>
|
2025-03-08 23:41:20 -08:00 |
|
Kebe
|
4a893d142d
|
Refactor Dockerfile: unify CUDA logic and reduce image size by ~2.6 GB (#3749)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-03-08 03:01:13 -08:00 |
|
Yineng Zhang
|
eb61f5c9af
|
Revert "ROCm: Flex Attention Enablement with custom backends (#4178)" (#4186)
|
2025-03-07 10:27:52 -08:00 |
|
HAI
|
0beea4503f
|
ROCm: Flex Attention Enablement with custom backends (#4178)
Co-authored-by: linsun12 <linsun12@amd.com>
|
2025-03-07 04:38:53 -08:00 |
|
Lianmin Zheng
|
9c58e68b4c
|
Release v0.4.3.post4 (#4140)
|
2025-03-06 12:50:28 -08:00 |
|
kk
|
b16af90bc3
|
AMD/ROCm: update base image string (#4137)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: yichiche <yichiche@amd.com>
|
2025-03-06 03:38:54 -08:00 |
|
Yineng Zhang
|
fc671f66c1
|
chore: bump v0.4.3.post3 (#4114)
|
2025-03-05 17:26:10 -08:00 |
|
HAI
|
51d25405a7
|
ROCm: update aiter and its usage to fused moe (bfloat16, fp8, fp8 block-quant) (#4053)
|
2025-03-04 03:00:46 -08:00 |
|
Andrew Smith
|
1df6eabd5d
|
feat: Add SageMaker support (#3740)
|
2025-02-21 19:31:09 +08:00 |
|
HAI
|
5c54ef0352
|
AMD/ROCm: update AITER repo to ROCm/aiter (#3747)
|
2025-02-21 00:18:08 -08:00 |
|
Peter Pan
|
bb3e526823
|
[k8s] remove unnecessary hostIPC for security concern (#3700)
|
2025-02-20 02:11:21 +08:00 |
|
Yineng Zhang
|
a5375adc3a
|
chore: bump v0.4.3.post2 (#3645)
Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
|
2025-02-18 02:48:30 +08:00 |
|
Yineng Zhang
|
2e6be53e7d
|
fix Dockerfile.rocm
|
2025-02-17 22:13:03 +08:00 |
|
Yineng Zhang
|
e782eb7e6a
|
chore: bump v0.4.3.post1 (#3638)
|
2025-02-17 21:58:19 +08:00 |
|
Shenggui Li
|
c9565e49e7
|
[docker] added rdma support (#3619)
|
2025-02-17 15:36:16 +08:00 |
|
HAI
|
d973c78e79
|
ROCm docker: triton update (#3584)
|
2025-02-14 10:26:32 -08:00 |
|
Jesse Lopez
|
6ce6eabbcc
|
Copy config files for MI300X to support in virtualized environments (#3505)
|
2025-02-15 01:23:32 +08:00 |
|
Yineng Zhang
|
ac963be234
|
update flashinfer-python (#3557)
|
2025-02-14 09:52:56 +08:00 |
|
Yineng Zhang
|
e0b9a423c8
|
chore: bump v0.4.3 (#3556)
|
2025-02-14 09:43:14 +08:00 |
|
Yineng Zhang
|
cddb1cdf8f
|
chore: bump v0.4.2.post4 (#3459)
|
2025-02-10 14:12:16 +08:00 |
|
Yineng Zhang
|
fa1b40e00d
|
use nvcr.io/nvidia/tritonserver:24.04-py3-min as base image (#3457)
|
2025-02-10 13:52:33 +08:00 |
|
Yineng Zhang
|
c1f5f99f60
|
chore: bump v0.4.2.post3 (#3369)
|
2025-02-07 08:20:03 -08:00 |
|
Yineng Zhang
|
7aad8d1854
|
chore: bump v0.4.2.post2 (#3313)
|
2025-02-05 17:35:02 +08:00 |
|
Yineng Zhang
|
6186a8f889
|
update flashinfer install index url (#3293)
|
2025-02-05 00:44:35 +08:00 |
|
HAI
|
2c1a695ff1
|
ROCm: sgl-kernel enablement starting with sgl_moe_align_block (#3287)
|
2025-02-04 21:44:44 +08:00 |
|
HAI
|
566d61d90f
|
ROCm: bump 6.3.0 (#3259)
|
2025-02-03 04:13:40 +08:00 |
|
HAI
|
17dbf976c5
|
update ENV to ROCm dockers (#3248)
|
2025-02-01 17:27:43 +08:00 |
|
Yineng Zhang
|
cf0f7eafe6
|
chore: bump v0.4.2.post1 (#3233)
|
2025-01-31 20:35:55 +08:00 |
|
Yineng Zhang
|
b49d6d0fee
|
support 12.5 CUDA runtime (#3231)
|
2025-01-31 20:31:38 +08:00 |
|
Yineng Zhang
|
cf142b6eb8
|
fix: update Dockerfile for cu118 (#3181)
|
2025-01-27 23:46:44 +08:00 |
|
Yineng Zhang
|
4ab43cfb3e
|
chore: bump v0.4.2 (#3180)
|
2025-01-27 21:42:05 +08:00 |
|
Byron Hsu
|
c0bf9bf15c
|
[devcontainer] add non-root user (#2989)
|
2025-01-22 17:47:54 -08:00 |
|
Yineng Zhang
|
e94fb7cb10
|
chore: bump v0.4.1.post7 (#3009)
|
2025-01-20 21:50:55 +08:00 |
|
Yineng Zhang
|
3fc2b62589
|
update docker dev image (#2985)
|
2025-01-19 23:45:39 +08:00 |
|
Byron Hsu
|
53cc91e504
|
[devcontainer] Fix mount and GPU & Support rust dev (#2978)
|
2025-01-19 16:34:01 +08:00 |
|
Yineng Zhang
|
b3e99dfb22
|
chore: bump v0.4.1.post6 (#2899)
|
2025-01-15 16:23:42 +08:00 |
|
kk
|
b8cd09f27a
|
update ROCm docker for layernorm kernel optimization (#2885)
Co-authored-by: wunhuang <wunhuang@amd.com>
|
2025-01-14 16:59:43 +08:00 |
|
kk
|
e808c1df3e
|
Integrate ROCm ater package for ck moe function feasibility (#2854)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Lin, Soga <soga.lin@amd.com>
|
2025-01-13 08:23:07 +00:00 |
|
sogalin
|
a18ab81ddd
|
Update base image for ROCm (#2852)
Co-authored-by: HAI <hixiao@gmail.com>
|
2025-01-13 14:39:44 +08:00 |
|
Yineng Zhang
|
f624901cdd
|
chore: bump v0.4.1.post5 (#2840)
|
2025-01-11 23:10:02 +08:00 |
|
Yineng Zhang
|
2f0d386496
|
chore: bump v0.4.1.post4 (#2713)
|
2025-01-06 01:29:54 +08:00 |
|
kk
|
148254d4db
|
Improve moe reduce sum kernel performance (#2705)
Co-authored-by: wunhuang <wunhuang@amd.com>
|
2025-01-02 01:11:06 -08:00 |
|