10 Commits

Author SHA1 Message Date
Teng Ma
96a5e4dd79 [Feature] Support loading weights from ckpt engine worker (#11755)
Signed-off-by: Yang Kaiyong <yangkaiyong.yky@antgroup.com>
Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>
Co-authored-by: Yang Kaiyong <yangkaiyong.yky@antgroup.com>
Co-authored-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Co-authored-by: Xuchun Shang <xuchun.shang@gmail.com>
Co-authored-by: Shangming Cai <csmthu@gmail.com>
2025-10-23 09:23:30 -07:00
Lianmin Zheng
9eefe2c0b7 Set CUDA_VISIBLE_DEVICES to achieve one GPU per process (#9170)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Cheng Wan <cwan@x.ai>
Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
2025-10-17 17:30:06 -07:00
Lianmin Zheng
cd7e1bd591 Sync code and test CI; rename some env vars (#11686)
2025-10-15 18:37:03 -07:00
Liangsheng Yin
acc2327bbd Move deep gemm related arguments to sglang.srt.environ (#11547)
2025-10-14 00:34:35 +08:00
Shu Wang
3df05f4d6a [NVIDIA] [3/N] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#9199)
2025-09-11 20:18:43 -07:00
Liangsheng Yin
f9afa7dceb Fix docs for clip max new tokens (#9082)
2025-08-11 13:15:21 -07:00
Yueyang Pan
98c00a2df1 Fix torch profiler bugs for bench_offline_throughput.py (#6557)
2025-06-09 20:33:41 +08:00
HAI
b819381fec AITER backend extension and workload optimizations (#6838)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
2025-06-05 23:00:18 -07:00
Baizhou Zhang
791b3bfabb [Feature] Support Flashinfer fp8 blockwise GEMM kernel on Blackwell (#6479)
2025-05-28 16:03:43 -07:00
Brayden Zhong
12319a6787 [Docs] Add docs for SGLANG_ and SGL_ environment variables (#6206)
2025-05-13 01:45:41 +08:00