sglang

Author	SHA1	Message	Date
fzyzcjy	fa2f677e18	Fix torch memory saver not enabled in DP scenario (#5560)	2025-04-20 14:20:52 -07:00
Xiaoyu Zhang	d58e354472	simplify the control logic for using shared experts fusion (#5504)	2025-04-19 13:17:35 -07:00
fzyzcjy	f6a71139a8	Make profiler output file names consistent (#5548)	2025-04-18 22:57:11 -07:00
fzyzcjy	53dcf38876	Introduce moe_dense_tp_size to fix dense layer errors in DeepSeek V3 + 4x8xH100 (#4836)	2025-04-17 21:38:26 -07:00
Baizhou Zhang	6fb29ffd9e	Deprecate enable-flashinfer-mla and enable-flashmla (#5480)	2025-04-17 01:43:33 -07:00
Baizhou Zhang	4fb05583ef	Deprecate disable-mla (#5481)	2025-04-17 01:43:14 -07:00
Lianmin Zheng	177320a582	Clean up imports (#5467)	2025-04-16 15:26:49 -07:00
Cheng Wan	6aca583420	Fix several minor issues in PD disaggregation (#5444)	2025-04-15 23:04:41 -07:00
Baizhou Zhang	a42736bbb8	Support MHA with chunked prefix cache for DeepSeek chunked prefill (#5113)	2025-04-15 22:01:22 -07:00
ybyang	dd83e7e9c3	[Bug fix] need record start time in pd mode (#5425)	2025-04-16 10:11:16 +08:00
shangmingc	ffde65a094	[PD] Fix dynamic port support and MLA buffer for Mooncake (#5415) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Co-authored-by: ybyang <ybyang7@iflytek.com>	2025-04-15 19:29:31 +08:00
Byron Hsu	a9499885e9	[PD] Add transfer backend abstraction (#5328)	2025-04-14 01:39:39 +08:00
Liangsheng Yin	f765579046	Fix typo: infight -> inflight (#5357)	2025-04-14 01:25:30 +08:00
tianlian yi	bc92107b03	Support server based rollout in Verlengine (#4848) Co-authored-by: Jin Pan <jpan236@wisc.edu> Co-authored-by: Chayenne <zhaochen20@outlook.com> Co-authored-by: Jinn <47354855+jhinpan@users.noreply.github.com>	2025-04-12 10:07:52 -07:00
Mick	34ef6c8135	[VLM] Adopt fast image processor by default (#5065)	2025-04-11 21:46:58 -07:00
Mick	e53a0b3d5b	[fix] fix mrope positions not picked up (#5265)	2025-04-11 01:29:45 -07:00
Cheng Wan	038bc5d521	Support `--enable-llama4-multimodal` (#5254)	2025-04-11 01:24:14 -07:00
Ke Bao	1078396f47	Update deps for mllama4 (#5215)	2025-04-10 09:12:44 -07:00
Teng Ma	4c31ae9f6d	[PD] Support KV transfer with mooncake (#4880) Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com> Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com> Co-authored-by: Xuchun Shang <xuchun.shang@linux.alibaba.com> Co-authored-by: shangmingc <csmthu@gmail.com>	2025-04-10 14:23:23 +08:00
Stefan He	5db37c8626	[metrics] Add in queue metrics (#4444)	2025-04-09 17:19:27 -07:00
Mick	fbebcb7aa4	model: support mllama4 (#5144)	2025-04-09 09:28:44 -07:00
fzyzcjy	61970b08d8	Let `bench_one_batch` support `enable_dp_attention` (#4058)	2025-04-08 23:44:25 -07:00
fzyzcjy	466899e69c	Fix multimodal hashing error (#5174)	2025-04-08 18:42:26 -07:00
XinyuanTong	d09a51f1f6	[feat&refactor] Enhance multimodal input support with refactor io_struct (#4938) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-04-08 14:48:07 -07:00
huangtingwei	27f8e6b9c1	fix multimodal hash feature (#5083)	2025-04-07 22:43:23 -07:00
mlmz	7c5658c189	feat: disable grammar restrictions within reasoning sections (#4984) Co-authored-by: tianhaoyu <thy@mail.ecust.edu.cn> Co-authored-by: DarkSharpness <2040703891@qq.com>	2025-04-07 21:46:47 -07:00
Chang Su	f04c80dc42	Add Llama4 support (#5092) Co-authored-by: Cheng Wan <cwan39@gatech.edu> Co-authored-by: fzyzcjy <ch271828n@outlook.com> Co-authored-by: ispobock <ispobaoke@163.com>	2025-04-07 00:29:36 -07:00
Baizhou Zhang	efbae697b3	[Revision] Replace enable_flashinfer_mla argument with attention_backend (#5052)	2025-04-05 01:23:02 -07:00
Xiaoyu Zhang	924ca7c92c	Add DeepSeek V3/R1 shared experts fusion (#4918)	2025-04-04 01:59:29 -07:00
Ravi Theja	69df9761dd	Add LlavaLlamaForCausaLM in MultiModal Processors (#5039) Co-authored-by: Ravi Theja Desetty <ravitheja@Ravis-MacBook-Pro.local>	2025-04-03 15:41:12 -07:00
Lianmin Zheng	74885a848b	Revert "Replace enable_flashinfer_mla argument with attention_backend" (#5048)	2025-04-03 13:30:56 -07:00
Baizhou Zhang	e8999b13b7	Replace enable_flashinfer_mla argument with attention_backend (#5005)	2025-04-03 02:53:58 -07:00
Kaiyu Yang	31da75abed	Update tokenizer_manager.py (#5008)	2025-04-02 13:56:19 -07:00
Zhiqiang Xie	e119f04215	Large page size aligned hierarchical caching (#4581)	2025-04-01 22:38:15 -07:00
XinyuanTong	9eb49e878b	[VLM RLHF] Take Image input for verl vlm rollout (#4915) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com> Co-authored-by: GeLee <leege233@gmail.com>	2025-04-01 20:03:17 -07:00
Zhiqiang Xie	12047f5e94	Prevent memory leak of retract_decode when page_size > 1 (#4977)	2025-04-01 15:30:45 -07:00
Jinyan Chen	23c764b18a	[Feature] Support DeepEP Low Latency (#4767) Co-authored-by: sleepcoo <sleepcoo@gmail.com> Co-authored-by: laixinn <xielx@shanghaitech.edu.cn> Co-authored-by: ch-wan <cwan39@gatech.edu>	2025-04-01 09:23:25 -07:00
Mick	5cb552b1d4	refactor: multimodal data (#4754)	2025-03-31 09:57:51 -07:00
Zhiqiang Xie	a169b9f813	Fix oom error for large page size (#4913) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-03-30 21:34:21 -07:00
Baizhou Zhang	e62d60fe6d	[Fix] avoid stream sync and torch compile in prefill for fa3 backend (#4932)	2025-03-30 13:53:44 -07:00
Lianmin Zheng	4ede6770cd	Fix retract for page size > 1 (#4914)	2025-03-30 02:57:15 -07:00
Lianmin Zheng	b26bc86b36	Support page size > 1 + eagle (#4908)	2025-03-30 00:46:23 -07:00
fzyzcjy	b1cfb4e972	Fix BadRequestError wrong arguments and remove openai dependency (#4882)	2025-03-29 08:16:21 -07:00
Fr4nk1in	c483377ed7	Fix wrong variable name when stopping memory profile (#4772)	2025-03-28 10:35:02 -07:00
Lianmin Zheng	74e0ac1dbd	Clean up `import vllm` in quantization/__init__.py (#4834)	2025-03-28 10:34:10 -07:00
fzyzcjy	8c04f0f2e1	Support with_stack and record_shapes in profiler (#4740) Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>	2025-03-27 23:01:42 -07:00
fzyzcjy	265e756494	Super tiny remove unused code (#4750)	2025-03-27 22:32:14 -07:00
fzyzcjy	53a2c3b466	Support controlling nsys start and end range programmatically (#4688)	2025-03-27 22:21:13 -07:00
XinyuanTong	42a45df043	[Fix] `self.worker` assignment in `TpModelWorker` and refactor references (#4788) Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>	2025-03-27 20:28:38 -07:00
tarinkk	7f19e083c1	Support (1 <= dp < tp) in the dp attention in DeepEP (#4770) Co-authored-by: Cheng Wan <cwan39@gatech.edu>	2025-03-27 17:09:35 -07:00

889 Commits