sglang

Author	SHA1	Message	Date
Baizhou Zhang	90a4b7d98a	[Feature]Support ragged prefill in flashinfer mla backend (#3967) Co-authored-by: Yineng Zhang <me@zhyncs.com> Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>	2025-02-28 18:13:56 -08:00
fzyzcjy	e3e0bc50a9	[Feature] SPMD for SGLang + Verl (#3852)	2025-02-28 09:53:10 -08:00
Chang Su	eec3f6d1eb	[Bugfix] Fix tokenizer_manager not getting 400 when req is too long (#3678) Co-authored-by: voidxb <unkown>	2025-02-27 22:59:43 -08:00
KCFindstr	bc20e93f2d	[feat] Add Vertex AI compatible prediction route for /generate (#3866)	2025-02-27 19:42:15 -08:00
Qiaolin Yu	d6898dd253	Add return hidden state in the native API (#3897) Co-authored-by: Beichen-Ma <mabeichen12@gmail.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-26 22:06:54 -08:00
JC1DA	7551498a69	[Feature] Support llguidance for constrained decoding (#3298)	2025-02-26 10:41:49 -08:00
Chaitanya Sri Krishna Lolla	6ce9dbe828	[ROCm] Enable Fused MLA Triton kernel for DeepSeekV3 (#3237) Co-authored-by: HAI <hixiao@gmail.com>	2025-02-24 18:14:31 -08:00
Lianmin Zheng	d7934cde45	Fix CI and install docs (#3821)	2025-02-24 16:17:38 -08:00
laixin	1a6e97577a	Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3730) Co-authored-by: HandH1998 <1335248067@qq.com>	2025-02-24 05:43:35 -08:00
Lianmin Zheng	27a46317b6	Fix dependency (#3813)	2025-02-24 03:50:58 -08:00
Zhiyu	c66b2c9cf1	Add support for nvidia modelopt fp8 kv cache (#3223)	2025-02-22 07:04:58 +08:00
Andrew Smith	1df6eabd5d	feat: Add SageMaker support (#3740)	2025-02-21 19:31:09 +08:00
aoshen524	e79f7420be	[Fix] Fix bugs and refactor codes in lora for better scalability. (#3652) Co-authored-by: ShenAo1111 <1377693092@qq.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-20 11:51:57 -08:00
Yineng Zhang	e782eb7e6a	chore: bump v0.4.3.post1 (#3638)	2025-02-17 21:58:19 +08:00
Yineng Zhang	e319153be8	update unit test (#3636)	2025-02-17 21:06:10 +08:00
Yineng Zhang	32b44d2fca	add mtp unit test (#3634)	2025-02-17 19:04:07 +08:00
Mick	bcc213df61	Model: Support Qwen 2.5 vl (#3258)	2025-02-16 00:58:53 -08:00
Yineng Zhang	6718b10996	fix eagle unit test (#3591)	2025-02-15 23:10:48 +08:00
Yineng Zhang	e0b9a423c8	chore: bump v0.4.3 (#3556)	2025-02-14 09:43:14 +08:00
Yineng Zhang	70f894b810	feat: support flashinfer mla attention for deepseek v3 (#3550)	2025-02-14 08:50:14 +08:00
Ke Bao	7e6d5fc694	Support Eagle cuda graph for Triton backend (#3500)	2025-02-12 02:27:45 +08:00
Jackmin801	5f0e7de339	[Feat] Return hidden states (experimental) (#3364) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-10 15:54:37 -08:00
Ke Bao	2d61132374	Support Eagle2 for Triton backend (#3466)	2025-02-10 20:00:42 +08:00
Baizhou Zhang	c45cab1c00	[Fix] Fix accuracy bug and refactor codes for lora (#3413)	2025-02-10 13:29:00 +08:00
Yineng Zhang	60abdb3e7c	minor: cleanup test_eagle_infer (#3415)	2025-02-09 09:34:30 +08:00
Ying Sheng	7b4e61fff3	[Fix] Fix eagle with disable cuda graph (#3411)	2025-02-09 08:40:00 +08:00
Yineng Zhang	6222e1c228	add disable cuda graph unit test for eagle 2 (#3412)	2025-02-09 08:02:56 +08:00
Yineng Zhang	2b1808cec4	update unit test in AMD CI (#3366)	2025-02-07 17:25:16 +08:00
Ke Bao	a322051e31	Support custom mask for Triton attention (#3317)	2025-02-06 01:16:02 +08:00
Ke Bao	de5533341e	Update Triton extend backend interface (#3309)	2025-02-05 18:12:22 +08:00
Ke Bao	a07364ccc5	Update Triton decode backend interface (#3292)	2025-02-04 23:26:04 +08:00
Yineng Zhang	d39899e85c	upgrade flashinfer v0.2.0.post2 (#3288) Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>	2025-02-04 21:41:40 +08:00
Baizhou Zhang	70817a7eae	[Feature] Define backends and add Triton backend for Lora (#3161) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-02-03 22:09:13 -08:00
Xiaoyu Zhang	3c8ac78dc1	optimize test_fused_moe style (#3268)	2025-02-03 18:56:18 +08:00
Ke Bao	5317902670	Add test for fp8 torch compile (#3246)	2025-02-01 16:07:54 +08:00
Byron Hsu	734daedd8f	[fix] Clamp logprob with dtype min to prevent `-inf` (#3224)	2025-01-31 17:04:04 +08:00
Byron Hsu	20453cef62	[test] Lower number of top logprobs to get rid of `-inf` (#3212)	2025-01-30 18:01:23 +08:00
Mick	9f635ea50d	[Fix] Address remaining issues of supporting MiniCPMV (#2977)	2025-01-28 00:22:13 -08:00
Byron Hsu	988d0a4bfc	[kernel] Use sgl_kernel rope (#3169) Co-authored-by: zhyncs <me@zhyncs.com>	2025-01-28 14:33:11 +08:00
Byron Hsu	27aeb4b7d8	[test] deduplicate test_session_control (#3183)	2025-01-28 13:17:06 +08:00
Lianmin Zheng	f8ca66fb49	Update thresholds in test_nightly_gsm8k_eval.py (#3176)	2025-01-27 03:02:09 -08:00
Lianmin Zheng	52c03f16b9	Add activation parameters to fused_moe (#3170)	2025-01-27 00:23:37 -08:00
yizhang2077	1e3e521544	add unit test for block wise fp8 (#3156)	2025-01-27 15:32:04 +08:00
Lianmin Zheng	af02f99b7c	Add more logprob tests (#3162)	2025-01-26 22:24:55 -08:00
YAMY	b045841bae	Feature/function calling update (#2700) Co-authored-by: Mingyuan Ma <mamingyuan2001@berkeley.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: shuaills <shishuaiuoe@gmail.com>	2025-01-26 09:57:51 -08:00
Lianmin Zheng	f4a92f4b56	Temporarily skip the openai frontend tests (#3151)	2025-01-26 04:17:35 -08:00
Lianmin Zheng	d1a0863251	Add a test case for cached_tokens (#3145)	2025-01-26 01:39:28 -08:00
Lianmin Zheng	da6f8081f6	Fix CI tests (#3132)	2025-01-25 17:43:39 -08:00
Lianmin Zheng	a4331cd260	Add accuracy and latency tests of eagle into CI (#3027)	2025-01-21 02:55:14 -08:00
Lianmin Zheng	287d07a669	Misc fixes for eagle (flush_cache, CPU overhead) (#3014)	2025-01-20 20:27:38 -08:00

417 Commits