Commit Graph

44 Commits

Author SHA1 Message Date
maxiao1
7993ed8ddd Adapt to deepseekv3.2
2025-10-03 20:01:17 +08:00
maxiao
852a49c5cc adapt to dsv32 on dcu 2025-09-30 18:37:31 +08:00
maxiao
8f7453e3af adapt to ds3.2 2025-09-30 17:44:54 +08:00
Baizhou Zhang
f111649580 Replace os.environ in layernorm.py (#10684) 2025-09-20 00:20:33 -07:00
Baizhou Zhang
8ecef73f12 [1/2] Support deterministic inference with flashinfer attention backend (#10645)
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2025-09-19 23:34:29 -07:00
Shangming Cai
f5f6b3b4b5 Refactor fused_add_rmsnorm import logic (#10207)
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-09-09 00:23:58 -07:00
ssshinigami
5dd8c6444b [Bug fix] Fix Gemma 2 and fix Gemma 3 multimodal with bs > 1 on NPU (#9871)
Co-authored-by: Maksim <makcum888e@mail.ru>
2025-09-08 01:19:40 -07:00
Huaiyu, Zheng
ee21817c6b enable llama3.1-8B on xpu (#9434) 2025-09-07 22:34:20 -07:00
fzyzcjy
bc5fc332f7 Fix slow fused add RMSNorm (#10141) 2025-09-07 20:20:39 -07:00
sogalin
8d114f254b Fix RMSNorm API CALL mismatch issue. (#10032)
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
2025-09-05 20:45:13 -07:00
VDV1985
ba861293cf [feat]Ascend NPU Gemma-3-12b and Gemma-3-27b support (#8909) 2025-08-31 00:25:07 -07:00
strgrb
88fbc31b50 Support trtllm_allreduce_fusion in flashinfer for cuda<12.8 (#9339)
Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
2025-08-20 16:54:30 -07:00
RunningLeon
b7094a5ef1 model: support intern-s1 (#8350)
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: zxy <zhou0493@e.ntu.edu.sg>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2025-07-26 13:48:51 -07:00
Xiaoyu Zhang
2e7ab862e3 Fix illegal memory in trtllm allreduce fusion (#7864) 2025-07-08 11:47:17 -07:00
Xiaoyu Zhang
8e64140e35 [b200] support trt-llm allreduce fuse rms_norm_add kernel (#7621) 2025-07-02 19:36:20 -07:00
ll819214
506a2d5934 npu fused op (#7386)
Co-authored-by: Li Junwen <lijunwen13@hisilicon.com>
2025-06-25 01:54:20 -07:00
YanbingJiang
094c116f7d Update python API of activation, topk, norm and rope and remove vllm dependency (#6614)
Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
Co-authored-by: sdp <sdp@gnr799219.jf.intel.com>
2025-06-17 22:11:50 -07:00
Yijie Zhu
a39d928782 support qwen2 running on ascend npu device (#7022)
Co-authored-by: 刁莹煜 <diaoyingyu1@hisilicon.com>
2025-06-17 11:24:10 -07:00
HAI
b819381fec AITER backend extension and workload optimizations (#6838)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
2025-06-05 23:00:18 -07:00
JieXin Liang
180ff5eecc [fix] recover auto-dispatch for rmsnorm and rope (#6745) 2025-06-03 21:44:20 -07:00
applesaucethebun
2ce8793519 Add typo checker in pre-commit (#6179)
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-11 12:55:00 +08:00
michael-amd
93c6fb12c7 Fix: deepseek forward absorb (#5723)
Co-authored-by: ispobock <ispobaoke@163.com>
2025-04-25 13:48:55 -07:00
HAI
b0feda090c Revert "Support aiter RMSNorm in AMD" (#5646) 2025-04-22 15:20:24 -07:00
michael-amd
968ef51562 Support aiter RMSNorm in AMD (#5510)
Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com>
2025-04-21 17:40:39 -07:00
JieXin Liang
97cb762bb6 [misc] remove is_cuda_available (#5319) 2025-04-20 18:16:51 -07:00
Lianmin Zheng
177320a582 Clean up imports (#5467) 2025-04-16 15:26:49 -07:00
Mick
9d02bb3e2a Urgent model support: support gemma-3-it (#4424) 2025-03-16 17:37:32 -07:00
Yineng Zhang
65b7c9b78f cleanup deps 2/n (#4464) 2025-03-15 23:06:17 -07:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00
Yineng Zhang
4eb4b401cc update and simplify CustomOp (#3249) 2025-02-01 18:56:44 +08:00
Yineng Zhang
2f79f58873 feat: use sgl-kernel 0.0.3 in sglang (#3179) 2025-01-27 21:39:52 +08:00
Xuehai Pan
62a4a339eb docs: fix module docstrings and copyright headers (#2077) 2024-11-22 22:16:53 +08:00
Yineng Zhang
766192610e feat: update torch 2.5.1 (#2069) 2024-11-18 21:29:13 +08:00
Lianmin Zheng
c1f401fc58 Revert "chore: update torch v2.5.1" (#2063) 2024-11-17 15:29:38 -08:00
Yineng Zhang
3b878863f7 chore: update torch v2.5.1 (#1849) 2024-11-18 00:06:00 +08:00
Lianmin Zheng
6a5b352aaf Use is_flashinfer_available to replace is_hip for flashinfer check (#1596)
Co-authored-by: Zhang Liangang <liangang.zhang@intel.com>
2024-10-06 22:54:05 -07:00
HAI
aa2750beb3 [Bugfix] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1419) (#1453) 2024-09-18 02:01:35 -07:00
HAI
3a6e04185b [Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1420) 2024-09-17 07:43:52 +00:00
Yineng Zhang
b1a540ec42 feat: update GemmaRMSNorm (#1232) 2024-08-28 22:47:34 +10:00
Yineng Zhang
198974cd1a feat: support sm75 with FlashInfer v0.1.6 (#1233) 2024-08-28 18:39:12 +10:00
Yineng Zhang
1fb9459908 fix: custom op fallback forward native when lower sm80 (#1177) 2024-08-21 14:26:35 -07:00
Lianmin Zheng
fb1f28cbbb Clean up the comments and names under python/sglang/srt/layers (#1047) 2024-08-12 05:54:37 +00:00
Yineng Zhang
c245b78973 hotfix: add CustomOp abstraction (#1027) 2024-08-11 02:45:59 -07:00
Yineng Zhang
94752ac811 feat: use FlashInfer rmsnorm and silu (#907) 2024-08-11 14:57:13 +10:00