sglang

Author	SHA1	Message	Date
Jani Monoses	a5114b6f91	Add OLMo model (#1676)	2024-10-16 00:11:18 -07:00
Liangsheng Yin	b6b4094621	Fix filter_batch function call (#1681)	2024-10-15 22:59:26 -07:00
Lianmin Zheng	f1088e0fc8	Fix memory leak during abort (#1674)	2024-10-15 08:15:08 -07:00
Lianmin Zheng	175afed370	Improve benchmark scripts (#1672)	2024-10-14 21:53:01 -07:00
Lianmin Zheng	4a292f670d	[Minor] Add some utility functions (#1671)	2024-10-14 20:08:03 -07:00
Byron Hsu	56503d9bc9	[1/N] Remove `CacheConfig` import in all model files (#1658)	2024-10-14 09:06:34 -07:00
Lianmin Zheng	02bc95796d	Simplify chunked prefill (#1667)	2024-10-14 06:47:50 -07:00
Lianmin Zheng	24f3e1511c	[Minor] Improve style (#1666)	2024-10-14 05:25:00 -07:00
Lianmin Zheng	6790240cc3	Fix unit test order to balance the tasks in CI (#1665)	2024-10-14 02:01:44 -07:00
Shuo Yang	061e546313	Support double sparsity (#1459)	2024-10-14 02:00:41 -07:00
Lianmin Zheng	0c1e87964b	Move filter_batch out of stream_output (#1663)	2024-10-14 01:15:34 -07:00
Lianmin Zheng	869f1c02c4	Add a test case to test retract (#1662)	2024-10-13 20:32:37 -07:00
Ying Sheng	2725f8da61	[Minor] Rename no_eos_trim to no_stop_trim (#1661)	2024-10-13 20:30:03 -07:00
Lianmin Zheng	da1ffed689	Add output_ids into ScheduleBatch (#1659)	2024-10-13 19:54:02 -07:00
Ying Sheng	4876117171	[Fix] fix eos trim inconsistency (#1650)	2024-10-13 01:07:09 -07:00
Lianmin Zheng	7ee6c259ff	Simplify the event loop and expose `--num-continuous-decode-steps` as an argument (#1652)	2024-10-12 21:35:30 -07:00
Lianmin Zheng	9610fcd469	Fix the batch_is_full check for jump-forward decoding (#1654)	2024-10-12 19:47:24 -07:00
Patrick Yi	31fad29ab0	Add get_tokenizer function for Engine class (#1653)	2024-10-12 19:39:35 -07:00
Lianmin Zheng	9da5a60b18	Add an option to disable penalizer (#1651)	2024-10-12 17:53:23 -07:00
Lianmin Zheng	69aa937aa5	Fix unit tests and type annotations (#1648)	2024-10-12 14:49:24 -07:00
Zhang, Liangang	5d638c92f5	[Feature, Hardware] Enable SGLang on XPU GPUs via PyTorch (#1480)	2024-10-12 18:10:32 +00:00
Lianmin Zheng	e37cdab0c6	Fix ignore_eos (#1645)	2024-10-12 00:36:28 -07:00
LI MOU	1d9deeacdb	fix missing ignore_eos in v1/chat/completions (#1642)	2024-10-11 21:37:20 -07:00
Byron Hsu	862cd265e5	[engine] support async and streaming (#1614)	2024-10-11 15:26:25 -07:00
Lianmin Zheng	00c7e6368b	Release v0.3.3.post1 (#1636)	2024-10-11 07:56:16 -07:00
Lianmin Zheng	23cc66f7b6	Add back data parallelism (#1635)	2024-10-11 07:22:48 -07:00
Lianmin Zheng	5d09ca5735	Fix constrained decoding (#1634)	2024-10-11 06:26:20 -07:00
Lianmin Zheng	f13d86f920	Add image_token in conversation.py (#1632) Co-authored-by: yizhang2077 <1109276519@qq.com>	2024-10-11 05:07:51 -07:00
Lianmin Zheng	aba9eae4c6	Fix the correctness test in bench_latency.py when tp > 1 and test_generation_models.py (#1631)	2024-10-11 05:03:20 -07:00
科英	bbd72bfc86	Add the ability to enable and disable the Profiler via HTTP API. (#1626)	2024-10-11 02:34:25 -07:00
Yiding-Lu	b503881bd2	[Bug] Fix the Image Input of Batch Generation (#1579)	2024-10-11 02:25:04 -07:00
glen-amd	58093b868f	Nit about the decorator of `PortArgs.init_new` (#1611)	2024-10-11 02:17:47 -07:00
Zhang, Liangang	8275049ce3	Add device support (#1607)	2024-10-11 02:05:58 -07:00
HAI	e11ab79e68	[Performance, hardware] MoE tuning update to AMD MI300x GPUs (#1619)	2024-10-10 22:48:15 -07:00
Byron Hsu	01fdb2f377	Fix test_vision_openai_server on CI (#1620)	2024-10-10 16:34:13 -07:00
Amos You	c996e8ccd4	[Minor] Fix logging typo (#1615)	2024-10-08 21:11:19 -07:00
Lianmin Zheng	7b69d91b4f	Release v0.3.3 (#1605)	2024-10-08 12:58:41 -07:00
Byron Hsu	e8613df071	[Engine] Fix generate hanging issue after the first call (#1606)	2024-10-08 04:26:56 +00:00
Ying Sheng	c5325aba75	[Profile] Add pytorch profiler (#1604)	2024-10-07 14:37:16 -07:00
Lianmin Zheng	ebbc42d989	Optimize broadcast & Reorg code (#1598)	2024-10-07 13:19:23 -07:00
Jani Monoses	3ff641132e	Remove references to squeezellm (#1603)	2024-10-07 11:30:41 -07:00
Lianmin Zheng	2b302b9393	Fix the port_args in bench_latency (#1597)	2024-10-07 00:44:38 -07:00
Ke Bao	68f8b60d22	Fix chunked prefill condition (#1594)	2024-10-07 06:34:14 +00:00
Lianmin Zheng	6a5b352aaf	Use is_flashinfer_available to replace is_hip for flashinfer check (#1596) Co-authored-by: Zhang Liangang <liangang.zhang@intel.com>	2024-10-06 22:54:05 -07:00
Byron Hsu	565b05f02f	Use `atexit` hook to implicitly shutdown `Runtime` (#1595)	2024-10-07 05:18:45 +00:00
Lianmin Zheng	b6aad70ab1	[Fix] Fix the case where prompt_len = 0 (#1593)	2024-10-06 20:30:02 -07:00
Byron Hsu	551a3a9d38	Provide an offline engine API (#1567)	2024-10-06 20:27:03 -07:00
Lianmin Zheng	91877a9f9c	Fix modality for image inputs (#1592)	2024-10-06 15:43:32 -07:00
Ying Sheng	c98e84c21e	[Minor, Performance] Use torch.argmax for greedy sampling (#1589)	2024-10-06 13:15:05 -07:00
Ying Sheng	9c064bf78a	[LoRA, Performance] Speedup multi-LoRA serving - Step 1 (#1587)	2024-10-06 10:33:44 -07:00

784 Commits