Commit Graph

169 Commits

SHA1 | Author | Date

0beea4503f | HAI | 2025-03-07 04:38:53 -08:00
    ROCm: Flex Attention Enablement with custom backends (#4178)
    Co-authored-by: linsun12 <linsun12@amd.com>
98c73d71cb | Lianmin Zheng | 2025-03-06 01:51:12 -08:00
    [Minor] make the __init__ function of model_runner.py shorter (#4132)
aee30630d8 | Zhiqiang Xie | 2025-03-05 21:39:07 -08:00
    Add a pointer to the real KV cache pool (#4113)
d3d4d76758 | Ying Sheng | 2025-03-05 08:06:07 -08:00
    [Eagle] Refactor eagle speculative decoding (#3986)
    Co-authored-by: Ke Bao <ISPObaoke@163.com>
e074d84e5b | Lianmin Zheng | 2025-03-04 21:23:47 -08:00
    [Minor] more code cleanup (#4077)
61261b3996 | Chen Shengzhi | 2025-03-04 04:05:56 -08:00
    [XCCL] Use xccl for xpu backend since xccl is ready in latest PyTorch. (#3954)
ac2387279e | Lianmin Zheng | 2025-03-03 00:12:04 -08:00
    Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
    Co-authored-by: SangBin Cho <rkooo567@gmail.com>
    Co-authored-by: dhou-xai <dhou@x.ai>
    Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
90a4b7d98a | Baizhou Zhang | 2025-02-28 18:13:56 -08:00
    [Feature]Support ragged prefill in flashinfer mla backend (#3967)
    Co-authored-by: Yineng Zhang <me@zhyncs.com>
    Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>
e3e0bc50a9 | fzyzcjy | 2025-02-28 09:53:10 -08:00
    [Feature] SPMD for SGLang + Verl (#3852)
c0bb9eb3b3 | Shenggui Li | 2025-02-25 00:26:08 -08:00
    [improve] made timeout configurable (#3803)
b110084654 | Baizhou Zhang | 2025-02-24 04:07:25 -08:00
    Refactor flashinfer logic for deepseek v3 and fix accuracy bug (#3785)
714f3e6362 | Yineng Zhang | 2025-02-18 02:06:43 +08:00
    feat: support flashinfer mla with prefix cache (#3643)
70f894b810 | Yineng Zhang | 2025-02-14 08:50:14 +08:00
    feat: support flashinfer mla attention for deepseek v3 (#3550)
70817a7eae | Baizhou Zhang | 2025-02-03 22:09:13 -08:00
    [Feature] Define backends and add Triton backend for Lora (#3161)
    Co-authored-by: Ying Sheng <sqy1415@gmail.com>
013021b6a1 | Yineng Zhang | 2025-02-03 20:52:30 +08:00
    refactor EAGLE 2 (#3269)
    Co-authored-by: Ying Sheng <sqy1415@gmail.com>
    Co-authored-by: merrymercy <lianminzheng@gmail.com>
    Co-authored-by: Ying1123 <sqy1415@gmail.com>
53cef81587 | Lianmin Zheng | 2025-01-27 03:00:41 -08:00
    Improve weight loading and code style (#3174)
862bcff833 | Ke Wen | 2025-01-22 21:33:17 -08:00
    Support loading of larger models with on-the-fly quantization (#3061)
89cd923581 | Lianmin Zheng | 2025-01-20 04:03:15 -08:00
    Roll back to use vllm custom allreduce (#3006)
7906d1d298 | Lianmin Zheng | 2025-01-18 20:20:23 -08:00
    Remove the unused write_with_records (#2972)
3d93f84a00 | Mick | 2025-01-18 14:14:19 -08:00
    [Feature] Support minicpmv v2.6 (#2785)
    Co-authored-by: Chayenne <zhaochen20@outlook.com>
    Co-authored-by: yizhang2077 <1109276519@qq.com>
5dc54f1a62 | Yineng Zhang | 2025-01-17 22:31:51 +08:00
    feat: remove vllm distributed (#2907)
    Co-authored-by: Zhangyi <1109276519@qq.com>
63051738a9 | Chunyuan WU | 2025-01-16 21:22:53 -08:00
    Enable CPU device on SGLang (#2806)
bc6915e3b9 | Lianmin Zheng | 2025-01-16 12:51:11 -08:00
    Improve type annotation and styles (#2926)
8b6ce52e92 | Lianmin Zheng | 2025-01-16 11:15:00 -08:00
    Support multi-node DP attention (#2925)
    Co-authored-by: dhou-xai <dhou@x.ai>
46d4431889 | Lianmin Zheng | 2025-01-13 14:24:00 -08:00
    Add a new api configure_logging to allow dumping the requests (#2875)
923f518337 | fzyzcjy | 2025-01-13 11:38:51 -08:00
    CUDA-graph-compatible releasing and resuming KV cache and model weight memory (#2630)
0bb0f76311 | bjmsong | 2025-01-12 21:17:11 -08:00
    Support FP8 E4M3 KV Cache (#2786)
    Co-authored-by: root <bjmsong@126.com>
f290bd4332 | Chang Su | 2025-01-10 13:14:51 -08:00
    [Bugfix] Fix embedding model hangs with --enable-metrics (#2822)
8a6906127a | Lianmin Zheng | 2025-01-07 23:29:10 -08:00
    Improve linear.py to load sharded weights & remove the dependency of Parameters from vllm (#2784)
    Co-authored-by: SangBin Cho <rkooo567@gmail.com>
bdc1acf6cd | Lianmin Zheng | 2025-01-07 02:52:53 -08:00
    Misc fix for min_p_sampling, --cuda-graph-bs (#2761)
9dec582dab | Lianmin Zheng | 2025-01-06 16:35:45 -08:00
    Remove --modelopt-config in server_args (#2758)
287427e2e6 | Zhiyu | 2025-01-06 14:54:52 -08:00
    Enable Nvidia's ModelOpt fp8 quantized models (#2535)
ad20b7957e | Lianmin Zheng | 2025-01-02 02:09:08 -08:00
    Eagle speculative decoding part 3: small modifications to the general scheduler (#2709)
    Co-authored-by: kavioyu <kavioyu@tencent.com>
9183c23eca | fzyzcjy | 2025-01-02 02:05:19 -08:00
    Speed up update_weights_from_tensor (#2695)
21ec66e59e | Lianmin Zheng | 2024-12-30 05:42:08 -08:00
    Minor follow-up fixes for the logprob refactor (#2670)
9c6ba2484f | Lianmin Zheng | 2024-12-30 04:51:38 -08:00
    Refactor logprob computation to return the real logprob used in sampling (#2664)
fd28640dc5 | fzyzcjy | 2024-12-28 13:30:27 -08:00
    Add update_weights_from_tensor (#2631)
6e5305158c | HandH1998 | 2024-12-28 00:01:13 +08:00
    update sgl_moe_align_block_size usage (#2617)
60e2fdcf4f | Yineng Zhang | 2024-12-26 06:29:08 +08:00
    use sgl-kernel moe_align_block_size (#2581)
    Co-authored-by: ispobock <ispobaoke@163.com>
    Co-authored-by: HandH1998 <1335248067@qq.com>
bd6196163e | Lianmin Zheng | 2024-12-16 19:21:11 -08:00
    Small fix for the order of apply_torchao_config (#2495)
ba36b5520a | Lianmin Zheng | 2024-12-16 15:04:16 -08:00
    Revert "Small fixes for torchao quant" (#2493)
82699474fd | Jerry Zhang | 2024-12-16 14:08:12 -08:00
    Small fixes for torchao quant (#2476)
8586b72da0 | Ying Sheng | 2024-12-09 09:52:38 -08:00
    [feat] Enable chunked prefill for llava-onevision (#2412)
641b7d0ae0 | Lianmin Zheng | 2024-12-09 06:30:35 -08:00
    [Minor] Improve code style (#2422)
0ce091a82d | Lianmin Zheng | 2024-12-09 03:05:59 -08:00
    [Minor] Improve code style (#2419)
3d32e4a32c | xiaobochen | 2024-12-06 15:05:21 +08:00
    Resubmit MoE-EP (#2371)
2b0fc5941d | Lianmin Zheng | 2024-12-04 19:02:08 -08:00
    [Minor] Code style improvements (#2355)
9cc733b38c | Jerry Zhang | 2024-12-04 17:26:42 -08:00
    move apply_torchao_config_ to model_runner (#2342)
07ec07ad1f | Lianmin Zheng | 2024-12-03 01:58:25 -08:00
    Improve torch compile for fused moe (#2327)
aa47f64223 | Ying Sheng | 2024-12-02 23:11:13 -08:00
    Revert "[feat] Enable chunked prefill for llava-onevision" (#2329)