sglang

Author	SHA1	Message	Date
Byron Hsu	20453cef62	[test] Lower number of top logprobs to get rid of `-inf` (#3212 )	2025-01-30 18:01:23 +08:00
Mick	9f635ea50d	[Fix] Address remaining issues of supporting MiniCPMV (#2977 )	2025-01-28 00:22:13 -08:00
Byron Hsu	988d0a4bfc	[kernel] Use sgl_kernel rope (#3169 ) Co-authored-by: zhyncs <me@zhyncs.com>	2025-01-28 14:33:11 +08:00
Byron Hsu	27aeb4b7d8	[test] deduplicate test_session_control (#3183 )	2025-01-28 13:17:06 +08:00
Lianmin Zheng	f8ca66fb49	Update thresholds in test_nightly_gsm8k_eval.py (#3176 )	2025-01-27 03:02:09 -08:00
Lianmin Zheng	52c03f16b9	Add activation parameters to fused_moe (#3170 )	2025-01-27 00:23:37 -08:00
yizhang2077	1e3e521544	add unit test for block wise fp8 (#3156 )	2025-01-27 15:32:04 +08:00
Lianmin Zheng	af02f99b7c	Add more logprob tests (#3162 )	2025-01-26 22:24:55 -08:00
YAMY	b045841bae	Feature/function calling update (#2700 ) Co-authored-by: Mingyuan Ma <mamingyuan2001@berkeley.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: shuaills <shishuaiuoe@gmail.com>	2025-01-26 09:57:51 -08:00
Lianmin Zheng	f4a92f4b56	Temporarily skip the openai frontend tests (#3151 )	2025-01-26 04:17:35 -08:00
Lianmin Zheng	d1a0863251	Add a test case for cached_tokens (#3145 )	2025-01-26 01:39:28 -08:00
Lianmin Zheng	da6f8081f6	Fix CI tests (#3132 )	2025-01-25 17:43:39 -08:00
Lianmin Zheng	a4331cd260	Add accuracy and latency tests of eagle into CI (#3027 )	2025-01-21 02:55:14 -08:00
Lianmin Zheng	287d07a669	Misc fixes for eagle (flush_cache, CPU overhead) (#3014 )	2025-01-20 20:27:38 -08:00
Hongpeng Guo	583697cd71	[Enhancement] Custom Logit Processor Improvement (#2998 ) Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>	2025-01-20 02:00:35 -08:00
Lianmin Zheng	51e87f6f21	Skip flaky custom_logit_processor tests (#3004 )	2025-01-20 00:28:47 -08:00
Lianmin Zheng	03464890e0	Separate two entry points: Engine and HTTP server (#2996 ) Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>	2025-01-19 22:09:24 -08:00
Lianmin Zheng	cd493b5afc	Improve metrics, logging, and importing orders (#2992 )	2025-01-19 18:36:59 -08:00
Lianmin Zheng	61f42b5732	Move sgl.Runtime under sglang/lang (#2990 )	2025-01-19 17:10:29 -08:00
Hongpeng Guo	e403d23757	[Feature] Add sampler custom logits processor (#2396 ) Signed-off-by: Hongpeng Guo <hpguo@anyscale.com>	2025-01-19 14:46:53 -08:00
Enrique Shockwave	3bcf5ecea7	support regex in xgrammar backend (#2983 )	2025-01-20 04:34:41 +08:00
Chang Su	4d4cdb3fe7	Frontend: better error message handling for FINISH_ABORT in scheduler.py (#2956 )	2025-01-18 19:37:30 -08:00
Mick	3d93f84a00	[Feature] Support minicpmv v2.6 (#2785 ) Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-01-18 14:14:19 -08:00
bjmsong	d3024f4fc8	support e4m3 kvcache in qwen2 & add kv scaling facotr json (#2894 ) Co-authored-by: bjmsong <bjmsong@126.com>	2025-01-18 11:43:22 +08:00
Ke Bao	d47c5101f1	Add ut for qwen model (#2947 )	2025-01-18 00:03:54 +08:00
Chang Su	a8ccacc8b8	[Frontend] Fix request length check and add option to disallow auto truncation in scheduler (#2876 )	2025-01-16 14:51:19 -08:00
Lianmin Zheng	8f2c522aba	Improve benchmark scripts and error message printing (#2922 )	2025-01-16 06:24:31 -08:00
yizhang2077	767c9dec03	adapt custom allreduce for tensorrt llm (#2511 )	2025-01-16 04:57:35 +08:00
Ke Bao	bfbda62c8b	Add ut for w8a8 int8 quantization (#2897 )	2025-01-15 18:29:14 +08:00
fzyzcjy	923f518337	CUDA-graph-compatible releasing and resuming KV cache and model weight memory (#2630 )	2025-01-13 11:38:51 -08:00
Lianmin Zheng	6249e4a19e	Revert "Integration of TurboMind AWQ" (#2866 )	2025-01-13 04:44:39 -08:00
bjmsong	17de02f98d	Integration of TurboMind AWQ (#2828 ) Co-authored-by: root <bjmsong@126.com>	2025-01-13 20:14:16 +08:00
Lianmin Zheng	51ab3ccf47	Collect more metrics: num_requests_total (#2859 )	2025-01-13 03:57:39 -08:00
Lianmin Zheng	67008f4b32	Use only one GPU for MLA CI tests (#2858 )	2025-01-13 03:55:33 -08:00
Lianmin Zheng	72c7776355	Fix linear.py and improve weight loading (#2851 ) Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-01-13 01:39:14 -08:00
bjmsong	0bb0f76311	Support FP8 E4M3 KV Cache (#2786 ) Co-authored-by: root <bjmsong@126.com>	2025-01-12 21:17:11 -08:00
Shi Shuai	c4f9707e16	Improve: Token-In Token-Out Usage for RLHF (#2843 )	2025-01-11 15:14:26 -08:00
Lianmin Zheng	f1769586d6	Update threshold in test_nightly_gsm8k_eval.py (#2836 )	2025-01-10 20:37:34 -08:00
justdoit	a47bf39123	[Eagle2] Fix multiple concurrent request crashes (#2730 )	2025-01-10 14:00:43 -08:00
Chang Su	f290bd4332	[Bugfix] Fix embedding model hangs with `--enable-metrics` (#2822 )	2025-01-10 13:14:51 -08:00
JJJJOHNSON	694e41925e	[eagle2] fix end check when target model verify (#2723 )	2025-01-07 21:46:02 -08:00
Lianmin Zheng	b22f3f6475	Fix nightly accuracy tests (#2780 )	2025-01-07 21:02:35 -08:00
Lianmin Zheng	6fb5768372	Disable math eval on nightly CI temporarily (#2779 )	2025-01-07 18:17:34 -08:00
libra	bdb3929dbb	Refactor SchedulePolicy to improve code organization (#2571 )	2025-01-04 00:05:16 +08:00
Lianmin Zheng	0f9cc6d8d3	Fix package loss for small models (#2717 ) Co-authored-by: sdli1995 < mmlmonkey@163.com>	2025-01-02 18:25:26 -08:00
Shi Shuai	dd2e2d275f	Docs: Update documentation workflow and contribution guide (#2704 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-01-02 09:18:31 -08:00
yukavio	815dce0554	Eagle speculative decoding part 4: Add EAGLE2 worker (#2150 ) Co-authored-by: kavioyu <kavioyu@tencent.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-01-02 03:22:34 -08:00
fzyzcjy	9183c23eca	Speed up `update_weights_from_tensor` (#2695 )	2025-01-02 02:05:19 -08:00
Xiaotong Jiang	a4d6d6f1dd	[feat]: Add math eval to CI nightly run (#2663 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-01-01 15:29:35 -08:00
Lianmin Zheng	21ec66e59e	Minor follow-up fixes for the logprob refactor (#2670 )	2024-12-30 05:42:08 -08:00

381 Commits