sglang

Author	SHA1	Message	Date
Lianmin Zheng	048685430d	Improve process creation (#1534)	2024-09-29 02:36:12 -07:00
Lianmin Zheng	4e4459b91f	Multiple minor fixes (#1530)	2024-09-28 14:43:35 -07:00
Kylin	f42e9bfb52	[bugfix] Add modelscope package to avoid docker image without modelscope (#1520)	2024-09-28 12:43:22 -07:00
Ying Sheng	b1e330bcb0	[Event] Update meeting link (#1529)	2024-09-27 13:30:04 -07:00
Ying Sheng	37c5899fc2	Release v0.3.2 (#1512)	2024-09-25 14:17:09 +08:00
TianyiQ	3c93187caf	Add support for tie_word_embeddings when loading weights + support for SmolLM (#1508)	2024-09-24 21:50:20 -07:00
Lianmin Zheng	167591e864	Better unit tests for adding a new model (#1488)	2024-09-22 01:50:37 -07:00
Yineng Zhang	82136eb0b5	chore: bump v0.3.1.post3 (#1483)	2024-09-21 11:17:45 +08:00
Niklas Muennighoff	014982b5e0	Add OLMoE (#1476)	2024-09-20 10:32:49 +08:00
Lianmin Zheng	5ce55aee15	Release v0.3.1.post2 (#1470)	2024-09-19 02:03:38 -07:00
Lianmin Zheng	2d346a57c2	Fix padding in the cuda graph (#1469)	2024-09-19 01:52:15 -07:00
Ying Sheng	8f527e2940	[Event] Add public meeting invite to README (#1458)	2024-09-18 23:53:22 +08:00
Ke Bao	c6b6d2e71b	Enable MLA by default (#1447)	2024-09-17 11:42:48 +00:00
Lianmin Zheng	90a26be31c	Release 0.3.1.post1 (#1445)	2024-09-17 01:47:31 -07:00
Lianmin Zheng	e79f6cd73d	Release v0.3.1 (#1430)	2024-09-15 23:03:16 +09:00
Lianmin Zheng	9463bc1385	Enable torch.compile for triton backend (#1422)	2024-09-14 15:38:37 -07:00
hxer7963	c33d82a211	Add Support for XVERSE Models (Dense and MoE) to sglang (#1397) Co-authored-by: will he <hexin@xverse.cn> Co-authored-by: root <root@localhost.localdomain> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2024-09-12 01:47:52 -07:00
William	2a71be5e25	Fix README format (#1399)	2024-09-11 23:46:51 -07:00
Vectory	224200e3c2	BaiChuan2 Model (#1367) Co-authored-by: wanpenghan <wanpenghan@sohu-inc.com>	2024-09-11 03:55:24 -07:00
Lianmin Zheng	46094e0c1b	Deprecate --disable-flashinfer and introduce --attention-backend (#1380)	2024-09-10 17:11:16 -07:00
William	e72275cf7f	Support MiniCPM3 (#1371)	2024-09-10 19:57:52 +10:00
Lianmin Zheng	8d1095dbf0	[Docs] Improve documentations (#1368)	2024-09-09 20:48:28 -07:00
Yineng Zhang	5ab9418f5b	[Doc] update news (#1327)	2024-09-04 04:21:21 -07:00
Yineng Zhang	a63c8275c6	chore: bump v0.3.0 (#1320)	2024-09-04 04:32:18 +08:00
Lianmin Zheng	c500f96bb1	Update README.md for llava-onevision instructions (#1313)	2024-09-03 01:43:08 -07:00
Lianmin Zheng	f64eae3a29	[Fix] Reduce memory usage for loading llava model & Remove EntryClassRemapping (#1308)	2024-09-02 21:44:45 -07:00
Lianmin Zheng	9999442756	Release v0.2.15 (#1295)	2024-09-01 22:22:38 -07:00
Byron Hsu	4a9f8ea43b	[doc] Fix more broken links (#1294)	2024-09-01 14:46:36 -07:00
Byron Hsu	6cc9c52521	[doc] fix quick start link (#1282)	2024-08-31 22:54:34 -07:00
Lianmin Zheng	79ece2c51f	Report median instead of mean in bench_latency.py (#1269)	2024-08-30 06:05:01 -07:00
김종곤	55f5976b42	Update README.md - Supported Models add Exaone 3.0 (#1267) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-08-30 18:49:07 +10:00
Yineng Zhang	13ac95b894	chore: bump v0.2.14.post2 (#1250)	2024-08-28 18:46:33 +00:00
Lianmin Zheng	bf53bf5142	[Fix] Fix llava on multi images (#1247)	2024-08-28 06:33:05 -07:00
Yineng Zhang	f25f4dfde5	hotfix: revert sampler CUDA Graph (#1242)	2024-08-28 21:16:47 +10:00
Lianmin Zheng	184ae1c683	Update README.md (#1239)	2024-08-28 02:15:52 -07:00
Yineng Zhang	198974cd1a	feat: support sm75 with FlashInfer v0.1.6 (#1233)	2024-08-28 18:39:12 +10:00
Dr. Artificial曾小健	c8a9e79186	Fix readme (#1236)	2024-08-27 23:51:41 -07:00
Yineng Zhang	c5fe11a8e1	chore: bump v0.2.14 (#1155)	2024-08-27 00:28:24 +10:00
Chayenne	30b4f771b0	Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-25 10:29:12 -07:00
Lianmin Zheng	b20daf982a	Update README.md (#1198)	2024-08-24 14:50:05 -07:00
Lianmin Zheng	f6af3a6561	Cleanup readme, llava examples, usage examples and nccl init (#1194)	2024-08-24 08:02:23 -07:00
Kaichen Zhang - NTU	a5b14ad043	[Feat/WIP] add llava-onevision, with support for (1) siglip encoder, (2) qwen2 decoder (3) openai api compatible server. (#1123) Co-authored-by: Bo Li <drluodian@gmail.com>	2024-08-23 14:11:16 -07:00
Zhanghao Wu	ac1b74fa85	[Docs] Fix rendering of details in README (#1179)	2024-08-22 07:05:33 +08:00
Yineng Zhang	350a81609b	fix: resolve README render (#1166)	2024-08-21 03:23:52 +10:00
Lianmin Zheng	a8ae640328	Improve docs and warnings (#1164)	2024-08-20 08:31:29 -07:00
Zhanghao Wu	d8627ed16d	[Docs] Add instruction for running on clouds and kubernetes with SkyPilot (#1144) Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>	2024-08-19 14:01:55 +08:00
Yineng Zhang	5bd953749b	chore: bump v0.2.13 (#1111)	2024-08-16 03:50:43 +10:00
Yineng Zhang	fe5024325b	docs: update README (#1098)	2024-08-14 04:40:05 -07:00
Lucien	312e849255	Example file for docker compose and k8s (#1006)	2024-08-13 15:07:57 -07:00
Yineng Zhang	b0ad0c1bc8	chore: bump v0.2.12 (#1048)	2024-08-12 20:59:38 +10:00

165 Commits