sglang

Author	SHA1	Message	Date
Mick	583d6af71b	example: add vlm to token in & out example (#3941) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-04 22:18:26 -08:00
Lianmin Zheng	77a3954bf7	Simplify eagle tests and TP sync in grammar backend (#4066)	2025-03-04 13:40:40 -08:00
Xihuai Wang	95575aa76a	Reasoning parser (#4000) Co-authored-by: Lucas Pickup <lupickup@microsoft.com>	2025-03-03 21:16:36 -08:00
Qiaolin Yu	57a404fd55	Remove outdated test utils and fix links for the doc of sampling params (#3999)	2025-03-03 09:41:38 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032)	2025-03-03 07:02:14 -08:00
Lianmin Zheng	1a8f995c46	remove cache configs in model definitions (#4031)	2025-03-03 05:00:50 -08:00
Lianmin Zheng	66301e124f	Improve code styles (#4021)	2025-03-03 03:20:23 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
fzyzcjy	e3e0bc50a9	[Feature] SPMD for SGLang + Verl (#3852)	2025-02-28 09:53:10 -08:00
lukec	21463e321a	Expert Parallelism (EP) Support for DeepSeek V3/R1 (#3602) Co-authored-by: laixin <xielx@shanghaitech.edu.cn> Co-authored-by: HandH1998 <1335248067@qq.com> Co-authored-by: laixin <q865809639@gmail.com>	2025-02-26 02:29:37 -08:00
Shenggui Li	c0bb9eb3b3	[improve] made timeout configurable (#3803)	2025-02-25 00:26:08 -08:00
aoshen524	e79f7420be	[Fix] Fix bugs and refactor codes in lora for better scalability. (#3652) Co-authored-by: ShenAo1111 <1377693092@qq.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-20 11:51:57 -08:00
Baizhou Zhang	70817a7eae	[Feature] Define backends and add Triton backend for Lora (#3161) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-02-03 22:09:13 -08:00
Lianmin Zheng	53cef81587	Improve weight loading and code style (#3174)	2025-01-27 03:00:41 -08:00
Lianmin Zheng	a4331cd260	Add accuracy and latency tests of eagle into CI (#3027)	2025-01-21 02:55:14 -08:00
Lianmin Zheng	287d07a669	Misc fixes for eagle (flush_cache, CPU overhead) (#3014)	2025-01-20 20:27:38 -08:00
Lianmin Zheng	60b2a44a80	Fix flaky tests in test_programs.py (#3022)	2025-01-20 16:50:39 -08:00
Lianmin Zheng	03464890e0	Separate two entry points: Engine and HTTP server (#2996) Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>	2025-01-19 22:09:24 -08:00
Lianmin Zheng	61f42b5732	Move sgl.Runtime under sglang/lang (#2990)	2025-01-19 17:10:29 -08:00
Mick	3d93f84a00	[Feature] Support minicpmv v2.6 (#2785) Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: yizhang2077 <1109276519@qq.com>	2025-01-18 14:14:19 -08:00
bjmsong	d3024f4fc8	support e4m3 kvcache in qwen2 & add kv scaling facotr json (#2894) Co-authored-by: bjmsong <bjmsong@126.com>	2025-01-18 11:43:22 +08:00
Lianmin Zheng	8f2c522aba	Improve benchmark scripts and error message printing (#2922)	2025-01-16 06:24:31 -08:00
Lianmin Zheng	f65c13b559	Remove normalized_prompt_logprobs from the engine to make code easier to maintain (#2902)	2025-01-15 04:54:14 -08:00
Lianmin Zheng	b22f3f6475	Fix nightly accuracy tests (#2780)	2025-01-07 21:02:35 -08:00
Lianmin Zheng	bdc1acf6cd	Misc fix for min_p_sampling, --cuda-graph-bs (#2761)	2025-01-07 02:52:53 -08:00
Xingyao Wang	1acbaf1b5a	Add generator-style run_batch function (#2513) Co-authored-by: openhands <openhands@all-hands.dev>	2025-01-06 15:04:55 -08:00
HandH1998	53aed988cb	Refactor MoE (#2575) Co-authored-by: zhyncs <me@zhyncs.com>	2024-12-26 00:02:14 +08:00
Lianmin Zheng	a6ca736c8e	Simplify stream_output (#2398)	2024-12-08 12:27:13 -08:00
Lianmin Zheng	a2486eb58f	Fix a bug with logprob streaming + chunked prefill (#2403)	2024-12-08 03:55:27 -08:00
Lianmin Zheng	ccaf1f997c	[CI] Print summary on github actions (#2274)	2024-11-29 23:48:54 -08:00
Chayenne	7d5d1d3d29	udate weights from disk (#2265)	2024-11-30 01:17:00 +00:00
bjmsong	01017d4c20	Support LoRA in Completion API (#2243) Co-authored-by: root <bjmsong@126.com>	2024-11-29 16:13:38 -08:00
Lianmin Zheng	b2ccf36d4d	Fix memory leak during abort (#2238)	2024-11-28 02:22:15 -08:00
Lianmin Zheng	d4fc1a70e3	Crash the server correctly during error (#2231)	2024-11-28 00:22:39 -08:00
Lianmin Zheng	fb6e04a0c2	Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default (#2222)	2024-11-27 02:52:46 -08:00
Lianmin Zheng	6997e28f6e	Revert "Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default" (#2221)	2024-11-27 02:02:01 -08:00
Lianmin Zheng	a0e58740a8	Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default (#2217)	2024-11-27 01:13:41 -08:00
Lianmin Zheng	8e1adb8441	Allow overwrite flashinfer use_tensorcore (#2169)	2024-11-24 20:58:17 -08:00
Lianmin Zheng	c211e7b669	Simplify batch update (#2154)	2024-11-24 04:47:10 -08:00
Byron Hsu	cbedd1db1d	[router] cache-aware load-balancing router v1 (#2114)	2024-11-23 08:34:48 -08:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
Yineng Zhang	4f8c3aeafc	minor: update gsm8k threshold (#2125)	2024-11-22 19:23:58 +08:00
bjmsong	ad30d5cf9a	Benchmark with Pytorch Profiler easily (#2110) Co-authored-by: root <bjmsong@126.com>	2024-11-21 23:29:50 -08:00
Lianmin Zheng	dfec7fca06	Rename sglang.bench_latency to sglang.bench_one_batch (#2118)	2024-11-21 20:07:48 -08:00
James Xu	f6f713797b	Add support for Qwen2-VL-based embedding models (#2055)	2024-11-21 14:24:25 -08:00
Lianmin Zheng	7d671e4ad2	Enable overlap by default (#2067)	2024-11-19 22:07:58 -08:00
Lianmin Zheng	b110453802	Simplify logits penalizer (#2086)	2024-11-18 17:48:28 -08:00
Yineng Zhang	766192610e	feat: update torch 2.5.1 (#2069)	2024-11-18 21:29:13 +08:00
ws	29ebe3dff4	fix: align enable_overlap_scheduler naming between code and docs (#2038)	2024-11-15 03:39:10 -08:00
Lianmin Zheng	aae5434bdf	Fix unit tests (#2034)	2024-11-14 11:08:37 -08:00

168 Commits