sglang

Author	SHA1	Message	Date
Lianmin Zheng	86d10d220f	Update grok.py and tiktoken tokenizer (#9532 )	2025-08-23 05:40:18 -07:00
blzheng	ebbb75e917	[CPU] Fix TP padding issue on Phi-4 (#8289 )	2025-08-17 16:25:26 -07:00
PGFLMG	b7cd743038	[Feat] QWen-1M context support[2/2]: Update block sparse attention backend (#5949 )	2025-08-06 23:49:36 -07:00
Wenchen Lo	ea93079b30	model: adapt mllama4 to VisionAttention (#8512 ) Co-authored-by: root <mickjagger19@icloud.com>	2025-08-02 00:39:40 -07:00
Chang Su	51c38163c1	model: support Step3V (#8583 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: nnnobody-code <nnnobody@foxmail.com> Co-authored-by: ispobock <ispobaoke@gmail.com> Co-authored-by: Qiaolin-Yu <qy254@cornell.edu> Co-authored-by: Qiaolin-Yu <liin1211@outlook.com> Co-authored-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2025-07-31 02:41:00 -07:00
Lianmin Zheng	8d2cf38c79	[Minor] Remove redundant print (#8005 )	2025-07-14 10:55:13 -07:00
Atream	615553079d	Support Kimi K2 (#7940 )	2025-07-11 00:02:21 -07:00
Lianmin Zheng	14229ccf8f	Move mem_fraction_static adjustment for multimodal models to `server_args.py` & Fix session control & Other cleanups (#7748 )	2025-07-04 16:33:33 -07:00
Xinyuan Tong	d6864ce6d6	[New Model] Devstral support (#6547 ) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-05-26 19:27:48 -07:00
Mick	01dd39bac1	refactor: minor refactors regarding multimodal processing (#6187 )	2025-05-17 22:53:20 -07:00
Lianmin Zheng	e07a6977e7	Minor improvements of TokenizerManager / health check (#6327 )	2025-05-15 15:29:25 -07:00
Lianmin Zheng	e8e18dcdcc	Revert "fix some typos" (#6244 )	2025-05-12 12:53:26 -07:00
applesaucethebun	d738ab52f8	fix some typos (#6209 ) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-13 01:42:38 +08:00
xm:D	3409aaab32	Support InternVL3 (#5350 ) Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-05-01 22:38:59 -07:00
liwenju0	8fefdd32c7	[Feature] add support kimi vl model (#5383 ) Co-authored-by: wenju.li <wenju.li@deepctr.cn>	2025-04-29 21:31:19 -07:00
Lianmin Zheng	5641a09458	Revert "[Model] Support `ArcticForCausalLM` architecture (Snowflake/snowflake-arctic-instruct)" (#5754 )	2025-04-25 15:50:28 -07:00
Brayden Zhong	43fb95c2fa	[Model] Support `ArcticForCausalLM` architecture (Snowflake/snowflake-arctic-instruct) (#5078 ) Co-authored-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com>	2025-04-25 15:24:09 +08:00
Mick	34ef6c8135	[VLM] Adopt fast image processor by default (#5065 )	2025-04-11 21:46:58 -07:00
Adarsh Shirawalmath	f8f9244a61	[Bug Fix] Add partial rotary factor support for Phi-4 and upgrade to transformers v4.50.0 (#3984 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-03-22 14:27:39 -07:00
萝卜菜	d6d21640d3	[Feature] Support Deepseek-VL2 (#2798 ) Co-authored-by: Edenzzzz <wtan45@wisc.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Yi Zhang <1109276519@qq.com>	2025-03-16 23:07:59 -07:00
Mick	9d02bb3e2a	Urgent model support: support gemma-3-it (#4424 )	2025-03-16 17:37:32 -07:00
wangyu	1ce4878d31	feat(remote_model): support variable remote backend for model loader (#3964 ) Signed-off-by: wangyu <wangyu.steph@bytedance.com>	2025-03-14 00:40:44 -07:00
Mick	01090e8ac3	model: Support Janus-pro (#3203 )	2025-03-12 11:02:11 -07:00
Mick	ff2ce0b86f	refactor: move image processors to separate files (#4229 )	2025-03-11 12:35:35 -07:00
Mick	bcc213df61	Model: Support Qwen 2.5 vl (#3258 )	2025-02-16 00:58:53 -08:00
Yunmeng	656aed58c6	Remove vllm dependency in model config (#2809 )	2025-01-09 17:51:56 +08:00
Yineng Zhang	85e1a6f3aa	Update model_loader deps and qqq quantization deps (#2220 ) (#2318 ) Co-authored-by: HandH1998 <1335248067@qq.com>	2024-12-02 23:22:13 +08:00
Lianmin Zheng	4936be8acc	Revert "Revert "[FEAT] Support GGUF format"" (#2287 )	2024-11-30 22:14:48 -08:00
Lianmin Zheng	7e4c6dd8da	Revert "[FEAT] Support GGUF format" (#2285 )	2024-11-30 19:03:26 -08:00
Yang Zheng	883c955489	[FEAT] Support GGUF format (#2215 ) Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>	2024-11-30 00:44:48 -08:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077 )	2024-11-22 22:16:53 +08:00
Lianmin Zheng	2ce32db6fb	Let reward model take text inputs instead of message lists (#1907 ) Co-authored-by: Kyle Corbitt <kyle@corbt.com>	2024-11-03 13:27:12 -08:00
Ran Chen	146f613405	Fix incorrect context length for llama3.2-11b (#1873 )	2024-11-02 00:04:50 -07:00
Hui Liu	9ce8e1a93c	move max_position_embeddings to the last (#1799 )	2024-10-25 19:30:50 -07:00
Lianmin Zheng	8f8f96a621	Fix the perf regression due to additional_stop_token_ids (#1773 )	2024-10-23 16:45:21 -07:00
Lianmin Zheng	0d800090b4	Fix missing additional_stop_token_ids (#1769 )	2024-10-23 12:18:59 -07:00
Lianmin Zheng	80a905475d	Fix stop condition for <|eom_id|> (#1766 )	2024-10-23 10:47:12 -07:00
Yineng Zhang	cbbc82b7b8	Support qwen2 vl model (#1721 ) Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: ispobock <ISPObaoke@163.com>	2024-10-19 21:44:38 -07:00
Lianmin Zheng	fb2d0680e0	[Fix] Fix clean_up_tokenization_spaces in tokenizer (#1510 )	2024-09-24 21:37:33 -07:00
Lianmin Zheng	3a6e8b6d78	[Minor] move triton attention kernels into a separate folder (#1379 )	2024-09-10 15:15:08 -07:00
Jani Monoses	474317f2b6	Support Phi3 mini and medium (#1299 )	2024-09-02 21:49:40 -07:00
Kai-Hsun Chen	0836055324	[Chore] Rename model_overide_args to model_override_args (#1284 ) Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-09-01 03:14:56 -07:00
Lianmin Zheng	79ece2c51f	Report median instead of mean in bench_latency.py (#1269 )	2024-08-30 06:05:01 -07:00
김종곤	b7f8341014	EXAONE 3.0 Model Support (#1258 ) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-08-30 08:08:28 +00:00
Lianmin Zheng	bf53bf5142	[Fix] Fix llava on multi images (#1247 )	2024-08-28 06:33:05 -07:00
Lianmin Zheng	902278008a	[Minor] Improve the function organization in TokenizerManager & improve loggers (#1208 )	2024-08-25 14:46:34 -07:00
Lianmin Zheng	bea2bb9eea	Improve multi-node stability (#1171 )	2024-08-20 22:35:05 -07:00
Lianmin Zheng	a8ae640328	Improve docs and warnings (#1164 )	2024-08-20 08:31:29 -07:00
Lianmin Zheng	3c1f5a9220	Fix duplicated imports in hf_transformers_utils.py (#1141 )	2024-08-17 18:03:00 -07:00
Lianmin Zheng	57d0bd91ec	Improve benchmark (#1140 )	2024-08-17 17:43:23 -07:00

66 Commits