sglang

Author	SHA1	Message	Date
fzyzcjy	6b7038babd	Speedup warmup when DP > 1 (#4695)	2025-03-24 21:08:05 -07:00
Wei Wu	91ba98fe50	[Fix] Resolve GPU Memory Leak in update_weights_from_tensor (#4446)	2025-03-17 08:54:30 +00:00
Yinghai Lu	c614dbdf95	Nicer standalone engine inferface (#4480)	2025-03-17 01:42:04 -07:00
Rin Intachuen	d1112d8548	Add endpoint for file support, purely to speed up processing of input_embeds. (#2797)	2025-03-16 18:30:37 -07:00
woodx	48efec7b05	Feature: support code completion (#3612)	2025-03-16 18:26:19 -07:00
Ying Sheng	1b859295f4	[Eagle] Remove the greedy branch and some redundant code (#4363) Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-16 02:48:55 -07:00
Wang Ran (汪然)	158430473e	Fix typos (#4368)	2025-03-15 21:27:58 -07:00
wangyu	1ce4878d31	feat(remote_model): support variable remote backend for model loader (#3964) Signed-off-by: wangyu <wangyu.steph@bytedance.com>	2025-03-14 00:40:44 -07:00
Wang Ran (汪然)	91b19949d7	typo: Update http_server.py (#4350)	2025-03-12 15:05:30 -07:00
Yineng Zhang	1cf63485c1	upgrade flashinfer 0.2.3 (#4317) Co-authored-by: qingquansong <qsong@linkedin.com>	2025-03-11 15:37:17 -07:00
Lianmin Zheng	d4017a6b63	[EAGLE] many fixes for eagle (#4195) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Sehoon Kim <sehoon@x.ai>	2025-03-07 22:12:13 -08:00
Pan Lyu	361971b859	Add Support for Qwen2-VL Multi-modal Embedding Models (#3694)	2025-03-06 16:46:20 -08:00
Jhin	70b3c6eeb1	Add update_weights_from_disk endpoint to Engine (#4102) Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-03-05 12:25:18 -08:00
Xihuai Wang	95575aa76a	Reasoning parser (#4000) Co-authored-by: Lucas Pickup <lupickup@microsoft.com>	2025-03-03 21:16:36 -08:00
Lianmin Zheng	935cda944b	Misc clean up; Remove the support of jump forward (#4032)	2025-03-03 07:02:14 -08:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Qiaolin Yu	40782f05d7	Refactor: Move return_hidden_states to the generate input (#3985) Co-authored-by: Beichen-Ma <mabeichen12@gmail.com>	2025-03-01 17:51:29 -08:00
Chayenne	930da877c4	rename FunctionCallReqInput to ParseFunctionCallReq (#3976)	2025-02-28 18:46:25 -08:00
fzyzcjy	e3e0bc50a9	[Feature] SPMD for SGLang + Verl (#3852)	2025-02-28 09:53:10 -08:00
KCFindstr	bc20e93f2d	[feat] Add Vertex AI compatible prediction route for /generate (#3866)	2025-02-27 19:42:15 -08:00
Yineng Zhang	564bdf29f7	upgrade flashinfer v0.2.2.post1 (#3934)	2025-02-27 09:53:48 -08:00
Wang Ran (汪然)	60b771c815	Improve: fix typos (#3801) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-24 16:51:23 -08:00
Lianmin Zheng	62bbd34393	Revert "Extract generation_manager from tokenizer_manager" (#3829)	2025-02-24 14:49:16 -08:00
Lianmin Zheng	f2388f6b95	Revert "Rename TokenizerManager to StdOrchestrator" (#3828)	2025-02-24 14:47:59 -08:00
fzyzcjy	45360b2fa9	Improve: Rename TokenizerManager to StdOrchestrator (#3116)	2025-02-23 00:30:58 -08:00
fzyzcjy	3f41b18455	Improve: Extract generation_manager from tokenizer_manager (#3115)	2025-02-22 23:25:45 -08:00
Andrew Smith	1df6eabd5d	feat: Add SageMaker support (#3740)	2025-02-21 19:31:09 +08:00
Yineng Zhang	75d171a9c5	chore: update flashinfer v0.2.1.post2 (#3644)	2025-02-18 02:47:42 +08:00
Mick	7711ac6ed0	doc: emphasize and notify the usage of chat_template (#3589) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-15 00:10:32 -08:00
Shenggui Li	fb4c9c3a30	[fix] added support for vlm in offline inference (#3548)	2025-02-15 05:27:29 +08:00
Yineng Zhang	70f894b810	feat: support flashinfer mla attention for deepseek v3 (#3550)	2025-02-14 08:50:14 +08:00
Ata Fatahi	b8318aec48	Make NCCL NVLS configurable (#3502)	2025-02-12 03:25:06 +08:00
Yineng Zhang	d39899e85c	upgrade flashinfer v0.2.0.post2 (#3288) Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>	2025-02-04 21:41:40 +08:00
YAMY	b045841bae	Feature/function calling update (#2700) Co-authored-by: Mingyuan Ma <mamingyuan2001@berkeley.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: shuaills <shishuaiuoe@gmail.com>	2025-01-26 09:57:51 -08:00
Lianmin Zheng	1dda8c5e4c	Return more infos for computing average acceptance length (#3152)	2025-01-26 04:51:54 -08:00
Lianmin Zheng	03464890e0	Separate two entry points: Engine and HTTP server (#2996) Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>	2025-01-19 22:09:24 -08:00

36 Commits