sglang

Author	SHA1	Message	Date
Xiaoyu Zhang	e0cd65c2b6	[hotfix] fix test_sampling_scaling_penalties.py ci test (#3084)	2025-01-24 00:33:59 +08:00
Xiaoyu Zhang	f1b6861828	use flashinfer vec_dtypes in sgl_kernel (#3083)	2025-01-23 22:19:04 +08:00
Yineng Zhang	0da0989ad4	sync flashinfer and update sgl-kernel tests (#3081)	2025-01-23 21:13:55 +08:00
Yineng Zhang	07a22cbba3	use env variable to control the build conf on the CPU build node (#3080)	2025-01-23 20:46:49 +08:00
Yineng Zhang	3d0bfa3e17	update version setup for sgl-kernel (#3079)	2025-01-23 19:45:25 +08:00
Yineng Zhang	1f6cf0d4b9	fix build error for sgl-kernel (#3078)	2025-01-23 19:16:35 +08:00
Lianmin Zheng	553f5a3ffe	Remove torch dependency in sgl-kernel (#3074)	2025-01-23 17:23:37 +08:00
Xiaoyu Zhang	ac2dc35d0e	support lightning_attention_decode in sgl-kernel for MiniMax-Text-01 (#3030)	2025-01-23 15:29:20 +08:00
Yineng Zhang	3e032c07cc	use v0.6.4.post1 for sgl-kernel ci (#3071)	2025-01-23 14:19:38 +08:00
Yineng Zhang	44e12ce463	docs: update developer guide for sgl-kernel (#3069)	2025-01-23 14:08:25 +08:00
Yineng Zhang	a547aad61f	docs: add developer guide for sgl-kernel (#3068)	2025-01-23 13:47:53 +08:00
Lianmin Zheng	ea535dc574	Revert "disable custom allreduce on HIP" (#3067)	2025-01-22 21:33:35 -08:00
Ke Wen	862bcff833	Support loading of larger models with on-the-fly quantization (#3061)	2025-01-22 21:33:17 -08:00
Lianmin Zheng	8b84e69f25	Fix tp token sync for dp attention (#3062)	2025-01-22 18:51:40 -08:00
Byron Hsu	5de50653cd	[router] make error actionable (#3063)	2025-01-22 17:56:21 -08:00
Byron Hsu	c0bf9bf15c	[devcontainer] add non-root user (#2989)	2025-01-22 17:47:54 -08:00
Lianmin Zheng	022614d26e	Add some flags to allow sync token ids across TP ranks (#3060)	2025-01-22 15:05:51 -08:00
lukec	b8ab989ff4	Fix the FP8 E4M3 parsing offline scales failure bug (#3045)	2025-01-22 14:19:33 -08:00
Baizhou Zhang	b3393e941f	[Doc] Update doc of profiling with PyTorch Profiler (#3038)	2025-01-22 14:17:26 -08:00
Hui Liu	ddc2001fb0	disable custom allreduce on HIP (#3058)	2025-01-22 13:57:22 -08:00
Yineng Zhang	806a3002c1	add notice about flashinfer in sgl-kernel (#3057)	2025-01-23 02:47:36 +08:00
nstream-ai-devx	0d2148efaa	fix rotary_embedding rope_scaling for phi (#3055)	2025-01-23 02:15:32 +08:00
Yineng Zhang	bf669606eb	feat: integrate bmm_fp8 kernel into sgl-kernel (#3056)	2025-01-23 00:39:38 +08:00
Yineng Zhang	b2bd8f444c	minor: update header and use pytest (#3054)	2025-01-22 23:45:18 +08:00
Yineng Zhang	9d9b482a39	feat: integrate activation kernels into sgl-kernel (#3053)	2025-01-22 23:25:45 +08:00
Yineng Zhang	7353fb9b97	feat: integrate norm kernels into sgl-kernel (#3052)	2025-01-22 21:32:48 +08:00
Yineng Zhang	bcda0c9ee6	sync the upstream updates of flashinfer (#3051)	2025-01-22 20:33:13 +08:00
Yineng Zhang	9f8f2c7f74	update norm cu (#3048)	2025-01-22 18:58:44 +08:00
Ke Bao	6fc37bd8ee	Fix sgl-kernel compile for sm80 (#3046)	2025-01-22 16:49:08 +08:00
Lianmin Zheng	3d8f1c9bcf	Use int64 as indices for set_kv_buffer (#3039)	2025-01-21 19:46:09 -08:00
Yineng Zhang	a42213dbd4	fix pr-test-sgl-kernel (#3036)	2025-01-22 00:56:42 +08:00
Ke Bao	0ac019f171	Support sm90 Int8 gemm (#3035)	2025-01-21 22:21:54 +08:00
Yineng Zhang	5a0d680a14	feat: add flashinfer as 3rdparty and use rmsnorm as example (#3033)	2025-01-21 20:44:49 +08:00
Lianmin Zheng	a4331cd260	Add accuracy and latency tests of eagle into CI (#3027)	2025-01-21 02:55:14 -08:00
Yineng Zhang	ec1c21cdc4	upgrade torch version for sgl-kernel (#3026)	2025-01-21 14:32:08 +08:00
Yineng Zhang	6c856b4f3a	minor: update Makefile for sgl-kernel (#3025)	2025-01-21 13:08:15 +08:00
Lianmin Zheng	287d07a669	Misc fixes for eagle (flush_cache, CPU overhead) (#3014)	2025-01-20 20:27:38 -08:00
Hui Liu	d2571dd5c7	Enable Cohere2 Models (#3018)	2025-01-20 19:21:41 -08:00
996_icu	b730aa6b9e	[EAGLE] Fix some boundary situation when retract reqs and req's max token = 1 (#2939) Co-authored-by: josephyou <josephyou@tencent.com>	2025-01-20 17:46:43 -08:00
Lianmin Zheng	60b2a44a80	Fix flaky tests in test_programs.py (#3022)	2025-01-20 16:50:39 -08:00
Hongpeng Guo	949b3fbfce	[Doc] Update doc of custom logit processor (#3021) Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>	2025-01-20 16:50:25 -08:00
Hui Liu	da4e8b3892	enable kv_scale remap (#3017)	2025-01-20 14:40:45 -08:00
Enrique Shockwave	af6c5357d5	deepseek v3 and r1 chat template (#3015)	2025-01-20 14:40:12 -08:00
Byron Hsu	3ad4cd4915	bump router to 0.1.3 (#3020)	2025-01-20 14:38:06 -08:00
Byron Hsu	3a8428ecaa	[router] Expose worker startup interval (#3019)	2025-01-20 14:36:54 -08:00
Byron Hsu	0311ce8e1c	[router] Expose worker startup secs & Return error instead of panic for router init (#3016)	2025-01-20 12:45:13 -08:00
Ke Bao	5dfcacfcb1	Add compile flags for cutlass 3.x (#3013) Co-authored-by: HandH1998 <1335248067@qq.com>	2025-01-21 00:04:12 +08:00
Ke Bao	41a0ccd4f1	Add clang-format check to sgl-kernel ci (#3012)	2025-01-20 23:22:19 +08:00
Yineng Zhang	e94fb7cb10	chore: bump v0.4.1.post7 (#3009)	2025-01-20 21:50:55 +08:00
Byron Hsu	b5caa22dfb	[kernel] port rope cuda kernel to sgl-kernel (#2993) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-01-20 20:58:51 +08:00

1847 Commits