sglang

Author	SHA1	Message	Date
Mick	9d02bb3e2a	Urgent model support: support gemma-3-it (#4424 )	2025-03-16 17:37:32 -07:00
JieXin Liang	1a3fa75f2f	[Fix] use `torch.cat` instead of `torch.concat` to prevent entering the `Autograd` backends. (#4466 )	2025-03-16 00:02:47 -07:00
Yineng Zhang	65b7c9b78f	cleanup deps 2/n (#4464 )	2025-03-15 23:06:17 -07:00
Michael Feil	1fd0cf8a7b	Update comment in qwen2.py (#4447 )	2025-03-15 21:14:29 -07:00
Lianmin Zheng	c6d7f8d370	Add some fused elementwise kernels for grok-1 (#4398 ) Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <69857889+hanming-lu@users.noreply.github.com>	2025-03-13 13:39:10 -07:00
Lianmin Zheng	8e66fbecee	Improve DP attention (#4390 ) Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: SangBin Cho <rkooo567@gmail.com>	2025-03-13 08:23:56 -07:00
Lianmin Zheng	45de89719c	Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367 )	2025-03-12 23:45:52 -07:00
Meng, Hengyu	71046fcd71	[XPU][CPU] Enable the native path of DeepSeek (#4086 ) Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com>	2025-03-12 22:26:29 -07:00
Mick	01090e8ac3	model: Support Janus-pro (#3203 )	2025-03-12 11:02:11 -07:00
yych0745	6f43a9b9f4	remove the unused readline dependency from the Qwen2 model implementa… (#4340 )	2025-03-12 02:47:27 -07:00
lambert0312	481f608b8e	Add INT8 support MTP NextN function (#3911 )	2025-03-12 01:37:16 -07:00
Yineng Zhang	d1da58e275	unify is_cuda and is_hip (#4321 )	2025-03-11 18:12:56 -07:00
Mick	ff2ce0b86f	refactor: move image processors to separate files (#4229 )	2025-03-11 12:35:35 -07:00
shimin	ac69885056	fix the input_ids is None error (#4144 )	2025-03-10 01:38:37 -07:00
DavidChan	4455b26e76	[Bug fixed] fixed the crash when enable the dp-attention on the single card (#3958 )	2025-03-10 00:50:34 -07:00
Baizhou Zhang	9fb48f951f	Support nextn for flashinfer mla attention backend (#4218 )	2025-03-09 00:01:54 -08:00
HandH1998	c7f254468f	[Feature] DeepSeek V3/R1 INT8 Quantization (channel-wise) (#3888 ) Co-authored-by: yych0745 <1398089567@qq.com> Co-authored-by: sleepcoo <sleepcoo@gmail.com> Co-authored-by: b0urnee <2769086541@qq.com>	2025-03-06 20:54:52 -08:00
Qubitium-ModelCloud	56a724eba3	[QUANT] Add GPTQModel Dynamic Quantization + `lm_head` Quantization (#3790 ) Signed-off-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>	2025-03-05 01:11:00 -08:00
Lianmin Zheng	e074d84e5b	[Minor] more code cleanup (#4077 )	2025-03-04 21:23:47 -08:00
Xiuyu Li	9545bfb28a	fix: support gelu_new activation function in gpt2 (#3712 )	2025-03-04 04:09:52 -08:00
Ke Bao	9fafa62db7	Share target model embed and head weights for nextn (#4033 )	2025-03-03 13:30:04 -08:00
Lianmin Zheng	1a8f995c46	remove cache configs in model definitions (#4031 )	2025-03-03 05:00:50 -08:00
Zhousx	7fbab730bd	[feat] add small vocab table for eagle's draft model[1]. (#3822 ) Co-authored-by: Achazwl <323163497@qq.com> Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-03-02 18:58:45 -08:00
Baizhou Zhang	90a4b7d98a	[Feature]Support ragged prefill in flashinfer mla backend (#3967 ) Co-authored-by: Yineng Zhang <me@zhyncs.com> Co-authored-by: pankajroark <pankajroark@users.noreply.github.com>	2025-02-28 18:13:56 -08:00
fzyzcjy	e3e0bc50a9	[Feature] SPMD for SGLang + Verl (#3852 )	2025-02-28 09:53:10 -08:00
Nicolas Castet	127998cc41	Fix allgather ops inside cuda graphs (#3709 )	2025-02-25 08:39:10 -08:00
Yueyang Pan	7036d6fc67	[Bug]: Add missing clamp to llavavid (#3787 )	2025-02-24 19:10:15 -08:00
Chaitanya Sri Krishna Lolla	6ce9dbe828	[ROCm] Enable Fused MLA Triton kernel for DeepSeekV3 (#3237 ) Co-authored-by: HAI <hixiao@gmail.com>	2025-02-24 18:14:31 -08:00
laixin	1a6e97577a	Feature DeepSeek V3/R1 INT8 Quantization (block-wise) (#3730 ) Co-authored-by: HandH1998 <1335248067@qq.com>	2025-02-24 05:43:35 -08:00
Baizhou Zhang	b110084654	Refactor flashinfer logic for deepseek v3 and fix accuracy bug (#3785 )	2025-02-24 04:07:25 -08:00
fzyzcjy	a3339d8cac	Bug: Fix weight loader error when LM head weights are tied (#3766 )	2025-02-21 17:53:12 -08:00
Chayenne	14d90617b0	Bug: fix lm head weights in Qwen models (#3777 )	2025-02-21 16:49:31 -08:00
fzyzcjy	d37f95511d	Improve: Tiny fix Olmo2 (#3348 )	2025-02-21 16:09:35 -08:00
Zhiyu	c66b2c9cf1	Add support for nvidia modelopt fp8 kv cache (#3223 )	2025-02-22 07:04:58 +08:00
simveit	20b765a26e	Model: Support Qwen 72B RM model. (#3772 )	2025-02-21 14:38:21 -08:00
Mick	424848d26f	fix: remove dependency on latest transformers impl (#3635 )	2025-02-19 01:14:11 +08:00
Yineng Zhang	714f3e6362	feat: support flashinfer mla with prefix cache (#3643 )	2025-02-18 02:06:43 +08:00
Mick	bcc213df61	Model: Support Qwen 2.5 vl (#3258 )	2025-02-16 00:58:53 -08:00
Mick	7711ac6ed0	doc: emphasize and notify the usage of chat_template (#3589 ) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-02-15 00:10:32 -08:00
Ke Bao	862dd76c76	Support NextN (MTP) speculative decoding for DeepSeek-V3/R1 (#3582 )	2025-02-15 05:28:34 +08:00
Yineng Zhang	70f894b810	feat: support flashinfer mla attention for deepseek v3 (#3550 )	2025-02-14 08:50:14 +08:00
Liangsheng Yin	8616357a97	Fix deepseek awq v3 (#3450 )	2025-02-12 22:09:52 +08:00
Chayenne	40022d075a	Feature: Fix the binding error in Llama (#3355 )	2025-02-06 20:19:24 -08:00
Yineng Zhang	8db776f049	support QuickGELU (#3250 )	2025-02-01 19:31:47 +08:00
Mick	9f635ea50d	[Fix] Address remaining issues of supporting MiniCPMV (#2977 )	2025-01-28 00:22:13 -08:00
Yineng Zhang	2f79f58873	feat: use sgl-kernel 0.0.3 in sglang (#3179 )	2025-01-27 21:39:52 +08:00
Lianmin Zheng	52c03f16b9	Add activation parameters to fused_moe (#3170 )	2025-01-27 00:23:37 -08:00
Hui Liu	8e48ca8cc1	enable kv_scale for Gemma2 (#3113 )	2025-01-25 18:29:14 -08:00
Ke Wen	862bcff833	Support loading of larger models with on-the-fly quantization (#3061 )	2025-01-22 21:33:17 -08:00
Hui Liu	d2571dd5c7	Enable Cohere2 Models (#3018 )	2025-01-20 19:21:41 -08:00

275 Commits