sglang

Author	SHA1	Message	Date
Lianmin Zheng	a0e58740a8	Use an env var SGLANG_SET_CPU_AFFINITY to set cpu affinity; turn it off by default (#2217)	2024-11-27 01:13:41 -08:00
HAI	10189d08dd	[Performance]: Process affinity to CPU cores with multiple sockets support (#2171)	2024-11-25 14:57:32 -08:00
Lianmin Zheng	8e1adb8441	Allow overwrite flashinfer use_tensorcore (#2169)	2024-11-24 20:58:17 -08:00
Yineng Zhang	e3938b2f9c	feat: update other MoE models deps (#2156)	2024-11-24 21:36:34 +08:00
Yineng Zhang	b509db5832	feat: remove the dependency on FusedMoE (#2153)	2024-11-24 20:09:27 +08:00
Jani Monoses	d98fa1e93d	Add simple CPU offloading support. (#2081)	2024-11-23 06:23:53 +00:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
Yineng Zhang	766192610e	feat: update torch 2.5.1 (#2069)	2024-11-18 21:29:13 +08:00
Lianmin Zheng	df7fe4521a	Crash the CI jobs on model import errors (#2072)	2024-11-17 22:18:11 -08:00
Lianmin Zheng	11f881d173	Deprecate --disable-flashinfer and --disable-flashinfer-sampling (#2065)	2024-11-17 16:20:58 -08:00
Lianmin Zheng	38625e2139	Remove monkey_patch_vllm_dummy_weight_loader (#2064)	2024-11-17 15:48:12 -08:00
Lianmin Zheng	c1f401fc58	Revert "chore: update torch v2.5.1" (#2063)	2024-11-17 15:29:38 -08:00
Yineng Zhang	3b878863f7	chore: update torch v2.5.1 (#1849)	2024-11-18 00:06:00 +08:00
Lianmin Zheng	f719d9aebc	Launch dp ranks in parallel (#2053) Co-authored-by: Haotian Liu <6631389+haotian-liu@users.noreply.github.com>	2024-11-16 17:39:39 -08:00
HAI	2ffe0a7363	Add get_amdgpu_memory_capacity() (#2049)	2024-11-15 22:51:48 -08:00
Lianmin Zheng	b01df48cf2	[Fix] Adjust default chunked prefill size and cuda graph max bs according to GPU memory capacity (#2044)	2024-11-15 06:21:57 -08:00
Lianmin Zheng	1929c06762	Simplify prometheus metrics (#1981) Co-authored-by: Mohit Reddy <mohitreddy1996@users.noreply.github.com>	2024-11-10 04:39:32 -08:00
Lianmin Zheng	9c939a3d8b	Clean up metrics code (#1972)	2024-11-09 15:43:20 -08:00
Lianmin Zheng	a509552087	[minor] Improve code style and compatibility (#1961)	2024-11-08 02:19:41 -08:00
Lianmin Zheng	0abbf289a8	Unify the model type checking (#1905)	2024-11-03 12:25:39 -08:00
Lianmin Zheng	86fc0d79d0	Add a watch dog thread (#1816)	2024-10-27 02:00:50 -07:00
Liangsheng Yin	a628dd8e31	Set `ZMQ` buffer size heuristic (#1801)	2024-10-25 23:15:56 -07:00
Liangsheng Yin	1e8903414a	Fix possible ZMQ hanging (#1800)	2024-10-25 23:07:07 -07:00
Liangsheng Yin	94cde10920	Llama3.2 vision model support (#1551)	2024-10-21 15:01:21 -07:00
Yineng Zhang	cbbc82b7b8	Support qwen2 vl model (#1721) Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: ispobock <ISPObaoke@163.com>	2024-10-19 21:44:38 -07:00
Yineng Zhang	8bee20f80b	Update vllm to 0.6.3 (#1711) (#1720) Co-authored-by: Ke Bao <ISPObaoke@163.com>	2024-10-19 20:45:41 -07:00
Zeng Zhongchao	2782132be8	Add date to logging messages (#1623) (#1679)	2024-10-16 18:54:55 -07:00
Michael Feil	b0facb3316	add orjson for jsonresponse (#1688)	2024-10-16 18:14:30 -07:00
Lianmin Zheng	9116b2896f	Add a new event loop (#1677)	2024-10-16 01:33:20 -07:00
Ying Sheng	4876117171	[Fix] fix eos trim inconsistency (#1650)	2024-10-13 01:07:09 -07:00
Zhang, Liangang	8275049ce3	Add device support (#1607)	2024-10-11 02:05:58 -07:00
Ying Sheng	c5325aba75	[Profile] Add pytorch profiler (#1604)	2024-10-07 14:37:16 -07:00
Lianmin Zheng	ebbc42d989	Optimize broadcast & Reorg code (#1598)	2024-10-07 13:19:23 -07:00
Lianmin Zheng	6a5b352aaf	Use is_flashinfer_available to replace is_hip for flashinfer check (#1596) Co-authored-by: Zhang Liangang <liangang.zhang@intel.com>	2024-10-06 22:54:05 -07:00
Lianmin Zheng	b6aad70ab1	[Fix] Fix the case where prompt_len = 0 (#1593)	2024-10-06 20:30:02 -07:00
Lianmin Zheng	9244f27f0a	[Minor] Improve the style and fix flaky tests (#1584)	2024-10-06 00:10:48 -07:00
Lianmin Zheng	114bbc8651	Use ipc instead of tcp in zmq (#1566)	2024-10-04 00:45:52 -07:00
Xinyu Yang	acaffd233f	[Fix] fix ipv6 url when warm up model (#1537)	2024-09-29 11:02:40 -07:00
Lianmin Zheng	048685430d	Improve process creation (#1534)	2024-09-29 02:36:12 -07:00
Ying Sheng	9aa6553d2a	[Feature] Support reward model LxzGordon/URM-LLaMa-3.1-8B (#1525)	2024-09-27 23:32:11 -07:00
Yineng Zhang	b4408b0d16	feat: update linear deps 1/N (#1305)	2024-09-19 20:53:11 +08:00
HAI	3a6e04185b	[Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1420)	2024-09-17 07:43:52 +00:00
Lianmin Zheng	27b557aea7	Clean up model loader (#1440)	2024-09-16 18:16:27 -07:00
Ying Sheng	712216928f	[Feature] Initial support for multi-LoRA serving (#1307)	2024-09-12 16:46:14 -07:00
lxww302	a362340b33	fix: multimodal_config in monkey_patch_vllm_dummy_weight_loader (#1260)	2024-08-30 16:43:41 +10:00
Lianmin Zheng	bf53bf5142	[Fix] Fix llava on multi images (#1247)	2024-08-28 06:33:05 -07:00
Lianmin Zheng	902278008a	[Minor] Improve the function organization in TokenizerManager & improve loggers (#1208)	2024-08-25 14:46:34 -07:00
Chayenne	30b4f771b0	Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model (#1186) Co-authored-by: Ying Sheng <sqy1415@gmail.com>	2024-08-25 10:29:12 -07:00
Lianmin Zheng	f6af3a6561	Cleanup readme, llava examples, usage examples and nccl init (#1194)	2024-08-24 08:02:23 -07:00
Lianmin Zheng	bea2bb9eea	Improve multi-node stability (#1171)	2024-08-20 22:35:05 -07:00

306 Commits