sglang

Author	SHA1	Message	Date
Lianmin Zheng	177320a582	Clean up imports (#5467)	2025-04-16 15:26:49 -07:00
fzyzcjy	15e91d721b	Tiny fix DeepseekScalingRotaryEmbedding always use forward_native (#5406)	2025-04-15 01:33:47 -07:00
Ke Bao	5e0a9b0981	Apply deepseek cuda rope (#5385) Co-authored-by: Yineng Zhang <me@zhyncs.com>	2025-04-14 15:22:43 -07:00
Chang Su	f04c80dc42	Add Llama4 support (#5092) Co-authored-by: Cheng Wan <cwan39@gatech.edu> Co-authored-by: fzyzcjy <ch271828n@outlook.com> Co-authored-by: ispobock <ispobaoke@163.com>	2025-04-07 00:29:36 -07:00
Yuhong Guo	87fafa0105	Revert PR 4764 & 4813 related to R1 RoPE (#4959)	2025-03-31 20:56:58 -07:00
strgrb	668ecc6c5b	Fix ut mla-test-1-gpu-amd (#4813) Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>	2025-03-27 08:27:51 -07:00
strgrb	886fcbdd09	Use apply_rope_with_cos_sin_cache_inplace for DeepSeek (#4764) Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>	2025-03-27 01:45:37 -07:00
Kyungmin Lee	2a206b22ed	Fix RotaryEmbedding when using Triton backend for EXAONE-3.5-2.4B (#4064)	2025-03-23 17:58:12 -07:00
Adarsh Shirawalmath	f8f9244a61	[Bug Fix] Add partial rotary factor support for Phi-4 and upgrade to transformers v4.50.0 (#3984) Co-authored-by: Chayenne <zhaochen20@outlook.com>	2025-03-22 14:27:39 -07:00
Xiaoyu Zhang	804d250a0d	remove useless backend forward in rotary_embedding (#4500)	2025-03-17 23:54:00 -07:00
Mick	d373a48c98	fix: second_per_grid_ts should be used to get mrope position (#3682)	2025-03-17 18:12:38 -07:00
Mick	9d02bb3e2a	Urgent model support: support gemma-3-it (#4424)	2025-03-16 17:37:32 -07:00
Yineng Zhang	977d7cd26a	cleanup deps 1/n (#4400) Co-authored-by: sleepcoo <sleepcoo@gmail.com>	2025-03-14 00:00:33 -07:00
Lianmin Zheng	45de89719c	Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367)	2025-03-12 23:45:52 -07:00
Meng, Hengyu	71046fcd71	[XPU][CPU] Enable the native path of DeepSeek (#4086) Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com>	2025-03-12 22:26:29 -07:00
JieXin Liang	0540fef7a1	[Fix] fix _yarn_linear_ramp_mask with device parameter (#4337)	2025-03-12 02:28:19 -07:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Yineng Zhang	4eb4b401cc	update and simplify CustomOp (#3249)	2025-02-01 18:56:44 +08:00
Byron Hsu	988d0a4bfc	[kernel] Use sgl_kernel rope (#3169) Co-authored-by: zhyncs <me@zhyncs.com>	2025-01-28 14:33:11 +08:00
nstream-ai-devx	0d2148efaa	fix rotary_embedding rope_scaling for phi (#3055)	2025-01-23 02:15:32 +08:00
Yineng Zhang	44a9669770	keep rotary_embedding only (#2997)	2025-01-20 13:21:36 +08:00
Yineng Zhang	2c05f81f15	fix custom op version compatibility (#2988)	2025-01-20 04:21:29 +08:00
Yineng Zhang	5a176c92df	fix deepseek v2 with cpu device (#2975)	2025-01-19 21:33:27 +08:00
Yineng Zhang	2add697d7a	feat: remove vllm get_rope (#2964)	2025-01-18 19:38:01 +08:00
Chunyuan WU	63051738a9	Enable CPU device on SGLang (#2806)	2025-01-16 21:22:53 -08:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
yizhang2077	def55bc876	Qwen2vl support cuda graph and disable radix cache (#1780)	2024-10-25 10:45:17 -04:00
Yineng Zhang	cbbc82b7b8	Support qwen2 vl model (#1721) Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: ispobock <ISPObaoke@163.com>	2024-10-19 21:44:38 -07:00

28 Commits