sglang

Author	SHA1	Message	Date
ryang	a6ae3af15e	Support XiaomiMiMo inference with mtp (#6059)	2025-05-22 14:14:49 -07:00
HAI	5c0b38f369	aiter attention-backend (default enabled on AMD/ROCm) (#6381)	2025-05-20 22:52:41 -07:00
Kiv Chen	5380cd7ea3	model(vlm): pixtral (#5084)	2025-05-13 00:16:10 -07:00
Lianmin Zheng	e8e18dcdcc	Revert "fix some typos" (#6244)	2025-05-12 12:53:26 -07:00
applesaucethebun	d738ab52f8	fix some typos (#6209) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-13 01:42:38 +08:00
Lifu Huang	6e2da51561	Replace time.time() to time.perf_counter() for benchmarking. (#6178) Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>	2025-05-11 14:32:49 -07:00
XinyuanTong	9d8ec2e67e	Fix and Clean up chat-template requirement for VLM (#6114) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-05-11 00:14:09 +08:00
Qiaolin Yu	3042f1da61	Fix flaky issues of lora and add multi batch tests (#5957)	2025-05-04 13:11:40 -07:00
Qiaolin Yu	7bcd8b1cb2	Fix lora batch processing when input lora_path contains None (#5930)	2025-04-30 19:42:42 -07:00
Qiaolin Yu	8c0cfca87d	Feat: support cuda graph for LoRA (#4115) Co-authored-by: Beichen Ma <mabeichen12@gmail.com>	2025-04-28 23:30:44 -07:00
Lianmin Zheng	849c83a0c0	[CI] test chunked prefill more (#5798)	2025-04-28 10:57:17 -07:00
DavidBao	d8fbc7c096	[feature] support for roberta embedding models (#5730)	2025-04-26 18:47:06 -07:00
Ravi Theja	7d9679b74d	Add MMMU benchmark results (#4491) Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>	2025-04-25 15:23:53 +08:00
Xiaoyu Zhang	bf86c5e990	restruct compressed_tensors_w8a8_fp8 (#5475)	2025-04-19 04:52:15 -07:00
woodx	3bface15e6	Feat/support encoder model (like bert) (#4887)	2025-04-17 01:50:48 -07:00
eigen	8f783c1943	[Model Support] unsloth/Phi-4-mini bnb model (#4982) Co-authored-by: yhyang201 <yhyang201@gmail.com> Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-04-16 19:58:20 -07:00
saienduri	7f875f1293	update grok test (#5171)	2025-04-09 11:09:47 -07:00
fzyzcjy	39efad4fbc	Tiny disable model that does not work (#5175)	2025-04-08 18:42:37 -07:00
Baizhou Zhang	42873eac09	[Fix] Improve Lora tests and reduce CI runtime (#4925)	2025-03-30 19:40:14 -07:00
Lianmin Zheng	4ede6770cd	Fix retract for page size > 1 (#4914)	2025-03-30 02:57:15 -07:00
chaobo jia	ef9a378a20	[Feature] add multi-rank support for Lora (#4492) Co-authored-by: rudy152 <czh1137892874@gmail.com>	2025-03-28 09:38:44 -07:00
Qiaolin Yu	9fdc6d6abc	Fix the lora adapter when lora path is none (#4799) Co-authored-by: Beichen Ma <mabeichen12@gmail.com>	2025-03-27 21:03:08 -07:00
Pan Lyu	c913ed4046	support clip embedding model (#4506)	2025-03-27 00:18:15 -07:00
fzyzcjy	15ddd84322	Add retry for flaky tests in CI (#4755)	2025-03-25 16:53:12 -07:00
Ximingwang-09	22c3702e1e	[Model] Support Qwen2ForSequenceClassification (#4609) Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-03-24 19:13:44 -07:00
aoshen524	588865f0e0	[Feature] Support Tensor Parallelism and Weight Slicing for Lora (#4274) Co-authored-by: ShenAo1111 <1377693092@qq.com> Co-authored-by: Baizhou Zhang <sobereddiezhang@gmail.com>	2025-03-18 20:33:07 -07:00
Lianmin Zheng	f0afaf5289	Add a dummy grok test case (#4399)	2025-03-13 15:29:48 -07:00
Lianmin Zheng	e8a69e4d0c	Clean up fp8 support (#4230)	2025-03-09 21:46:35 -07:00
Pan Lyu	361971b859	Add Support for Qwen2-VL Multi-modal Embedding Models (#3694)	2025-03-06 16:46:20 -08:00
Lianmin Zheng	77a3954bf7	Simplify eagle tests and TP sync in grammar backend (#4066)	2025-03-04 13:40:40 -08:00
fzyzcjy	e3e0bc50a9	[Feature] SPMD for SGLang + Verl (#3852)	2025-02-28 09:53:10 -08:00
Lianmin Zheng	27a46317b6	Fix dependency (#3813)	2025-02-24 03:50:58 -08:00
aoshen524	e79f7420be	[Fix] Fix bugs and refactor codes in lora for better scalability. (#3652) Co-authored-by: ShenAo1111 <1377693092@qq.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>	2025-02-20 11:51:57 -08:00
Baizhou Zhang	c45cab1c00	[Fix] Fix accuracy bug and refactor codes for lora (#3413)	2025-02-10 13:29:00 +08:00
Baizhou Zhang	70817a7eae	[Feature] Define backends and add Triton backend for Lora (#3161) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2025-02-03 22:09:13 -08:00
Lianmin Zheng	a4331cd260	Add accuracy and latency tests of eagle into CI (#3027)	2025-01-21 02:55:14 -08:00
Lianmin Zheng	61f42b5732	Move sgl.Runtime under sglang/lang (#2990)	2025-01-19 17:10:29 -08:00
Ke Bao	d47c5101f1	Add ut for qwen model (#2947)	2025-01-18 00:03:54 +08:00
Fred Reiss	993956c6b1	Add support for IBM Granite 3.x models (#2437)	2024-12-11 06:30:23 -08:00
Jani Monoses	db674e3d24	Add OLMo2 model. (#2233)	2024-11-28 00:15:20 -08:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
James Xu	f6f713797b	Add support for Qwen2-VL-based embedding models (#2055)	2024-11-21 14:24:25 -08:00
Tanjiro	8c280cee55	add phi-3 small support (#2062) Co-authored-by: Tushar Goel <114812108+AI-Tushar@users.noreply.github.com>	2024-11-17 18:47:43 -08:00
Xiaoyu Zhang	eff468dd5a	fix test_embedding_models prompt length too long's bug (#2015)	2024-11-12 23:21:16 +08:00
Chayenne	c77c1e05ba	fix black in pre-commit (#1940)	2024-11-08 07:42:47 +08:00
Xuehai Pan	a5e0defb5a	minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926)	2024-11-06 13:46:04 +00:00
Chayenne	704f8e8ed1	Add Reward API Docs etc (#1910) Co-authored-by: Chayenne <zhaochenyang@g.ucla.edu>	2024-11-03 22:33:03 -08:00
Lianmin Zheng	2ce32db6fb	Let reward model take text inputs instead of message lists (#1907) Co-authored-by: Kyle Corbitt <kyle@corbt.com>	2024-11-03 13:27:12 -08:00
DanielC12321	5e00ddebc0	Add new model: Gpt2 (#1833)	2024-10-29 17:52:33 -07:00
Lianmin Zheng	00611286a1	Fix sliding window attention and gemma-2 unit tests in CI (#1746)	2024-10-21 13:47:12 -07:00

79 Commits