Commit Graph

30 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| ll819214 | 506a2d5934 | npu fused op (#7386) — Co-authored-by: Li Junwen <lijunwen13@hisilicon.com> | 2025-06-25 01:54:20 -07:00 |
| YanbingJiang | 094c116f7d | Update python API of activation, topk, norm and rope and remove vllm dependency (#6614) — Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>; jianan-gu <jianan.gu@intel.com>; sdp <sdp@gnr799219.jf.intel.com> | 2025-06-17 22:11:50 -07:00 |
| Yijie Zhu | a39d928782 | support qwen2 running on ascend npu device (#7022) — Co-authored-by: 刁莹煜 <diaoyingyu1@hisilicon.com> | 2025-06-17 11:24:10 -07:00 |
| woodx | e30ef368ab | Feat/support rerank (#6058) | 2025-06-16 10:50:01 -07:00 |
| JieXin Liang | 97cb762bb6 | [misc] remove is_cuda_available (#5319) | 2025-04-20 18:16:51 -07:00 |
| Lianmin Zheng | 177320a582 | Clean up imports (#5467) | 2025-04-16 15:26:49 -07:00 |
| Yineng Zhang | 65b7c9b78f | cleanup deps 2/n (#4464) | 2025-03-15 23:06:17 -07:00 |
| Xiuyu Li | 9545bfb28a | fix: support gelu_new activation function in gpt2 (#3712) | 2025-03-04 04:09:52 -08:00 |
| Yineng Zhang | 8db776f049 | support QuickGELU (#3250) | 2025-02-01 19:31:47 +08:00 |
| Yineng Zhang | 4eb4b401cc | update and simplify CustomOp (#3249) | 2025-02-01 18:56:44 +08:00 |
| Yineng Zhang | 2f79f58873 | feat: use sgl-kernel 0.0.3 in sglang (#3179) | 2025-01-27 21:39:52 +08:00 |
| Yineng Zhang | 5dc54f1a62 | feat: remove vllm distributed (#2907) — Co-authored-by: Zhangyi <1109276519@qq.com> | 2025-01-17 22:31:51 +08:00 |
| Xuehai Pan | 62a4a339eb | docs: fix module docstrings and copyright headers (#2077) | 2024-11-22 22:16:53 +08:00 |
| Yineng Zhang | 766192610e | feat: update torch 2.5.1 (#2069) | 2024-11-18 21:29:13 +08:00 |
| Lianmin Zheng | c1f401fc58 | Revert "chore: update torch v2.5.1" (#2063) | 2024-11-17 15:29:38 -08:00 |
| Yineng Zhang | 3b878863f7 | chore: update torch v2.5.1 (#1849) | 2024-11-18 00:06:00 +08:00 |
| Lianmin Zheng | ebbc42d989 | Optimize broadcast & Reorg code (#1598) | 2024-10-07 13:19:23 -07:00 |
| Lianmin Zheng | 6a5b352aaf | Use is_flashinfer_available to replace is_hip for flashinfer check (#1596) — Co-authored-by: Zhang Liangang <liangang.zhang@intel.com> | 2024-10-06 22:54:05 -07:00 |
| Yineng Zhang | b4408b0d16 | feat: update linear deps 1/N (#1305) | 2024-09-19 20:53:11 +08:00 |
| HAI | aa2750beb3 | [Bugfix] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1419) (#1453) | 2024-09-18 02:01:35 -07:00 |
| HAI | 3a6e04185b | [Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1420) | 2024-09-17 07:43:52 +00:00 |
| Yineng Zhang | c411f32e1c | feat: replace GeluAndMul (#1234) | 2024-08-28 14:07:02 +00:00 |
| Yineng Zhang | 198974cd1a | feat: support sm75 with FlashInfer v0.1.6 (#1233) | 2024-08-28 18:39:12 +10:00 |
| Yineng Zhang | 3602692c7c | feat: replace get_act_fn for gpt_bigcode (#1231) | 2024-08-27 21:15:31 +10:00 |
| Yineng Zhang | c9064e6fd9 | feat: use gelu_tanh_and_mul (#1193) | 2024-08-24 01:58:16 -07:00 |
| Yineng Zhang | 1fb9459908 | fix: custom op fallback forward native when lower sm80 (#1177) | 2024-08-21 14:26:35 -07:00 |
| Lianmin Zheng | a59636bb5e | Update grok 1 model (#1095) | 2024-08-14 04:40:44 -07:00 |
| Lianmin Zheng | fb1f28cbbb | Clean up the comments and names under python/sglang/srt/layers (#1047) | 2024-08-12 05:54:37 +00:00 |
| Yineng Zhang | c245b78973 | hotfix: add CustomOp abstraction (#1027) | 2024-08-11 02:45:59 -07:00 |
| Yineng Zhang | 94752ac811 | feat: use FlashInfer rmsnorm and silu (#907) | 2024-08-11 14:57:13 +10:00 |