sglang

Author	SHA1	Message	Date
Lianmin Zheng	4936be8acc	Revert "Revert "[FEAT] Support GGUF format"" (#2287)	2024-11-30 22:14:48 -08:00
Lianmin Zheng	7e4c6dd8da	Revert "[FEAT] Support GGUF format" (#2285)	2024-11-30 19:03:26 -08:00
Yang Zheng	883c955489	[FEAT] Support GGUF format (#2215) Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>	2024-11-30 00:44:48 -08:00
Ying Sheng	8b48496aaf	Revert "Revert "Add simple CPU offloading support"" (#2253) Co-authored-by: Jani Monoses <jani.monoses@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-11-28 23:58:54 -08:00
Ying Sheng	4057ea82c9	Revert "Add simple CPU offloading support" (#2252) We'll re-add the commit to correctly ack Kaichao's authorship	2024-11-28 23:36:55 -08:00
Lianmin Zheng	d4fc1a70e3	Crash the server correctly during error (#2231)	2024-11-28 00:22:39 -08:00
Lianmin Zheng	fed4c6946a	Release v0.3.6.post2 (#2214) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2024-11-27 03:35:30 -08:00
Lianmin Zheng	fb6e04a0c2	Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default (#2222)	2024-11-27 02:52:46 -08:00
Lianmin Zheng	6997e28f6e	Revert "Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default" (#2221)	2024-11-27 02:02:01 -08:00
Lianmin Zheng	a0e58740a8	Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default (#2217)	2024-11-27 01:13:41 -08:00
HAI	10189d08dd	[Performance]: Process affinity to CPU cores with multiple sockets support (#2171)	2024-11-25 14:57:32 -08:00
Lianmin Zheng	8e1adb8441	Allow overwrite flashinfer use_tensorcore (#2169)	2024-11-24 20:58:17 -08:00
Yineng Zhang	e3938b2f9c	feat: update other MoE models deps (#2156)	2024-11-24 21:36:34 +08:00
Yineng Zhang	b509db5832	feat: remove the dependency on FusedMoE (#2153)	2024-11-24 20:09:27 +08:00
Jani Monoses	d98fa1e93d	Add simple CPU offloading support. (#2081)	2024-11-23 06:23:53 +00:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
Yineng Zhang	766192610e	feat: update torch 2.5.1 (#2069)	2024-11-18 21:29:13 +08:00
Lianmin Zheng	df7fe4521a	Crash the CI jobs on model import errors (#2072)	2024-11-17 22:18:11 -08:00
Lianmin Zheng	11f881d173	Deprecate --disable-flashinfer and --disable-flashinfer-sampling (#2065)	2024-11-17 16:20:58 -08:00
Lianmin Zheng	38625e2139	Remove monkey_patch_vllm_dummy_weight_loader (#2064)	2024-11-17 15:48:12 -08:00
Lianmin Zheng	c1f401fc58	Revert "chore: update torch v2.5.1" (#2063)	2024-11-17 15:29:38 -08:00
Yineng Zhang	3b878863f7	chore: update torch v2.5.1 (#1849)	2024-11-18 00:06:00 +08:00
Lianmin Zheng	f719d9aebc	Launch dp ranks in parallel (#2053) Co-authored-by: Haotian Liu <6631389+haotian-liu@users.noreply.github.com>	2024-11-16 17:39:39 -08:00
HAI	2ffe0a7363	Add get_amdgpu_memory_capacity() (#2049)	2024-11-15 22:51:48 -08:00
Lianmin Zheng	b01df48cf2	[Fix] Adjust default chunked prefill size and cuda graph max bs according to GPU memory capacity (#2044)	2024-11-15 06:21:57 -08:00
Lianmin Zheng	1929c06762	Simplify prometheus metrics (#1981) Co-authored-by: Mohit Reddy <mohitreddy1996@users.noreply.github.com>	2024-11-10 04:39:32 -08:00
Lianmin Zheng	9c939a3d8b	Clean up metrics code (#1972)	2024-11-09 15:43:20 -08:00
Lianmin Zheng	a509552087	[minor] Improve code style and compatibility (#1961)	2024-11-08 02:19:41 -08:00
Lianmin Zheng	0abbf289a8	Unify the model type checking (#1905)	2024-11-03 12:25:39 -08:00
Lianmin Zheng	86fc0d79d0	Add a watch dog thread (#1816)	2024-10-27 02:00:50 -07:00
Liangsheng Yin	a628dd8e31	Set `ZMQ` buffer size heuristic (#1801)	2024-10-25 23:15:56 -07:00
Liangsheng Yin	1e8903414a	Fix possible ZMQ hanging (#1800)	2024-10-25 23:07:07 -07:00
Liangsheng Yin	94cde10920	Llama3.2 vision model support (#1551)	2024-10-21 15:01:21 -07:00
Yineng Zhang	cbbc82b7b8	Support qwen2 vl model (#1721) Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: ispobock <ISPObaoke@163.com>	2024-10-19 21:44:38 -07:00
Yineng Zhang	8bee20f80b	Update vllm to 0.6.3 (#1711) (#1720) Co-authored-by: Ke Bao <ISPObaoke@163.com>	2024-10-19 20:45:41 -07:00
Zeng Zhongchao	2782132be8	Add date to logging messages (#1623) (#1679)	2024-10-16 18:54:55 -07:00
Michael Feil	b0facb3316	add orjson for jsonresponse (#1688)	2024-10-16 18:14:30 -07:00
Lianmin Zheng	9116b2896f	Add a new event loop (#1677)	2024-10-16 01:33:20 -07:00
Ying Sheng	4876117171	[Fix] fix eos trim inconsistency (#1650)	2024-10-13 01:07:09 -07:00
Zhang, Liangang	8275049ce3	Add device support (#1607)	2024-10-11 02:05:58 -07:00
Ying Sheng	c5325aba75	[Profile] Add pytorch profiler (#1604)	2024-10-07 14:37:16 -07:00
Lianmin Zheng	ebbc42d989	Optimize broadcast & Reorg code (#1598)	2024-10-07 13:19:23 -07:00
Lianmin Zheng	6a5b352aaf	Use is_flashinfer_available to replace is_hip for flashinfer check (#1596) Co-authored-by: Zhang Liangang <liangang.zhang@intel.com>	2024-10-06 22:54:05 -07:00
Lianmin Zheng	b6aad70ab1	[Fix] Fix the case where prompt_len = 0 (#1593)	2024-10-06 20:30:02 -07:00
Lianmin Zheng	9244f27f0a	[Minor] Improve the style and fix flaky tests (#1584)	2024-10-06 00:10:48 -07:00
Lianmin Zheng	114bbc8651	Use ipc instead of tcp in zmq (#1566)	2024-10-04 00:45:52 -07:00
Xinyu Yang	acaffd233f	[Fix] fix ipv6 url when warm up model (#1537)	2024-09-29 11:02:40 -07:00
Lianmin Zheng	048685430d	Improve process creation (#1534)	2024-09-29 02:36:12 -07:00
Ying Sheng	9aa6553d2a	[Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525)	2024-09-27 23:32:11 -07:00
Yineng Zhang	b4408b0d16	feat: update linear deps 1/N (#1305)	2024-09-19 20:53:11 +08:00

115 Commits