28 Commits

Author SHA1 Message Date
Lianmin Zheng
177320a582 Clean up imports (#5467) 2025-04-16 15:26:49 -07:00
fzyzcjy
15e91d721b Tiny fix DeepseekScalingRotaryEmbedding always use forward_native (#5406) 2025-04-15 01:33:47 -07:00
Ke Bao
5e0a9b0981 Apply deepseek cuda rope (#5385) 2025-04-14 15:22:43 -07:00
Co-authored-by: Yineng Zhang <me@zhyncs.com>
Chang Su
f04c80dc42 Add Llama4 support (#5092) 2025-04-07 00:29:36 -07:00
Co-authored-by: Cheng Wan <cwan39@gatech.edu>
Co-authored-by: fzyzcjy <ch271828n@outlook.com>
Co-authored-by: ispobock <ispobaoke@163.com>
Yuhong Guo
87fafa0105 Revert PR 4764 & 4813 related to R1 RoPE (#4959) 2025-03-31 20:56:58 -07:00
strgrb
668ecc6c5b Fix ut mla-test-1-gpu-amd (#4813) 2025-03-27 08:27:51 -07:00
Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
strgrb
886fcbdd09 Use apply_rope_with_cos_sin_cache_inplace for DeepSeek (#4764) 2025-03-27 01:45:37 -07:00
Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
Kyungmin Lee
2a206b22ed Fix RotaryEmbedding when using Triton backend for EXAONE-3.5-2.4B (#4064) 2025-03-23 17:58:12 -07:00
Adarsh Shirawalmath
f8f9244a61 [Bug Fix] Add partial rotary factor support for Phi-4 and upgrade to transformers v4.50.0 (#3984) 2025-03-22 14:27:39 -07:00
Co-authored-by: Chayenne <zhaochen20@outlook.com>
Xiaoyu Zhang
804d250a0d remove useless backend forward in rotary_embedding (#4500) 2025-03-17 23:54:00 -07:00
Mick
d373a48c98 fix: second_per_grid_ts should be used to get mrope position (#3682) 2025-03-17 18:12:38 -07:00
Mick
9d02bb3e2a Urgent model support: support gemma-3-it (#4424) 2025-03-16 17:37:32 -07:00
Yineng Zhang
977d7cd26a cleanup deps 1/n (#4400) 2025-03-14 00:00:33 -07:00
Co-authored-by: sleepcoo <sleepcoo@gmail.com>
Lianmin Zheng
45de89719c Revert "[XPU][CPU] Enable the native path of DeepSeek" (#4367) 2025-03-12 23:45:52 -07:00
Meng, Hengyu
71046fcd71 [XPU][CPU] Enable the native path of DeepSeek (#4086) 2025-03-12 22:26:29 -07:00
Co-authored-by: Zhang, Liangang <liangang.zhang@intel.com>
JieXin Liang
0540fef7a1 [Fix] fix _yarn_linear_ramp_mask with device parameter (#4337) 2025-03-12 02:28:19 -07:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) 2025-03-03 00:12:04 -08:00
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
Yineng Zhang
4eb4b401cc update and simplify CustomOp (#3249) 2025-02-01 18:56:44 +08:00
Byron Hsu
988d0a4bfc [kernel] Use sgl_kernel rope (#3169) 2025-01-28 14:33:11 +08:00
Co-authored-by: zhyncs <me@zhyncs.com>
nstream-ai-devx
0d2148efaa fix rotary_embedding rope_scaling for phi (#3055) 2025-01-23 02:15:32 +08:00
Yineng Zhang
44a9669770 keep rotary_embedding only (#2997) 2025-01-20 13:21:36 +08:00
Yineng Zhang
2c05f81f15 fix custom op version compatibility (#2988) 2025-01-20 04:21:29 +08:00
Yineng Zhang
5a176c92df fix deepseek v2 with cpu device (#2975) 2025-01-19 21:33:27 +08:00
Yineng Zhang
2add697d7a feat: remove vllm get_rope (#2964) 2025-01-18 19:38:01 +08:00
Chunyuan WU
63051738a9 Enable CPU device on SGLang (#2806) 2025-01-16 21:22:53 -08:00
Xuehai Pan
62a4a339eb docs: fix module docstrings and copyright headers (#2077) 2024-11-22 22:16:53 +08:00
yizhang2077
def55bc876 Qwen2vl support cuda graph and disable radix cache (#1780) 2024-10-25 10:45:17 -04:00
Yineng Zhang
cbbc82b7b8 Support qwen2 vl model (#1721) 2024-10-19 21:44:38 -07:00
Co-authored-by: yizhang2077 <1109276519@qq.com>
Co-authored-by: ispobock <ISPObaoke@163.com>