sglang

Author	SHA1	Message	Date
maxiao1	7993ed8ddd	Adapt to DeepSeek V3.2	2025-10-03 20:01:17 +08:00
maxiao	852a49c5cc	adapt to dsv32 on dcu	2025-09-30 18:37:31 +08:00
maxiao	8f7453e3af	adapt to ds3.2	2025-09-30 17:44:54 +08:00
Baizhou Zhang	f111649580	Replace os.environ in layernorm.py (#10684)	2025-09-20 00:20:33 -07:00
Baizhou Zhang	8ecef73f12	[1/2] Support deterministic inference with flashinfer attention backend (#10645) Co-authored-by: hebiao064 <hebiaobuaa@gmail.com> Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>	2025-09-19 23:34:29 -07:00
Shangming Cai	f5f6b3b4b5	Refactor fused_add_rmsnorm import logic (#10207) Signed-off-by: Shangming Cai <csmthu@gmail.com>	2025-09-09 00:23:58 -07:00
ssshinigami	5dd8c6444b	[Bug fix] Fix Gemma 2 and fix Gemma 3 multimodal with bs > 1 on NPU (#9871) Co-authored-by: Maksim <makcum888e@mail.ru>	2025-09-08 01:19:40 -07:00
Huaiyu, Zheng	ee21817c6b	enable llama3.1-8B on xpu (#9434)	2025-09-07 22:34:20 -07:00
fzyzcjy	bc5fc332f7	Fix slow fused add RMSNorm (#10141)	2025-09-07 20:20:39 -07:00
sogalin	8d114f254b	Fix RMSNorm API CALL mismatch issue. (#10032) Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>	2025-09-05 20:45:13 -07:00
VDV1985	ba861293cf	[feat]Ascend NPU Gemma-3-12b and Gemma-3-27b support (#8909)	2025-08-31 00:25:07 -07:00
strgrb	88fbc31b50	Support trtllm_allreduce_fusion in flashinfer for cuda<12.8 (#9339) Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>	2025-08-20 16:54:30 -07:00
RunningLeon	b7094a5ef1	model: support intern-s1 (#8350) Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: zxy <zhou0493@e.ntu.edu.sg> Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com> Co-authored-by: Mick <mickjagger19@icloud.com> Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>	2025-07-26 13:48:51 -07:00
Xiaoyu Zhang	2e7ab862e3	Fix illegal memory in trtllm allreduce fusion (#7864)	2025-07-08 11:47:17 -07:00
Xiaoyu Zhang	8e64140e35	[b200] support trt-llm allreduce fuse rms_norm_add kernel (#7621)	2025-07-02 19:36:20 -07:00
ll819214	506a2d5934	npu fused op (#7386) Co-authored-by: Li Junwen <lijunwen13@hisilicon.com>	2025-06-25 01:54:20 -07:00
YanbingJiang	094c116f7d	Update python API of activation, topk, norm and rope and remove vllm dependency (#6614) Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com> Co-authored-by: jianan-gu <jianan.gu@intel.com> Co-authored-by: sdp <sdp@gnr799219.jf.intel.com>	2025-06-17 22:11:50 -07:00
Yijie Zhu	a39d928782	support qwen2 running on ascend npu device (#7022) Co-authored-by: 刁莹煜 <diaoyingyu1@hisilicon.com>	2025-06-17 11:24:10 -07:00
HAI	b819381fec	AITER backend extension and workload optimizations (#6838) Co-authored-by: wunhuang <wunhuang@amd.com> Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>	2025-06-05 23:00:18 -07:00
JieXin Liang	180ff5eecc	[fix] recover auto-dispatch for rmsnorm and rope (#6745)	2025-06-03 21:44:20 -07:00
applesaucethebun	2ce8793519	Add typo checker in pre-commit (#6179) Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-05-11 12:55:00 +08:00
michael-amd	93c6fb12c7	Fix: deepseek forward absorb (#5723) Co-authored-by: ispobock <ispobaoke@163.com>	2025-04-25 13:48:55 -07:00
HAI	b0feda090c	Revert "Support aiter RMSNorm in AMD" (#5646)	2025-04-22 15:20:24 -07:00
michael-amd	968ef51562	Support aiter RMSNorm in AMD (#5510) Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com>	2025-04-21 17:40:39 -07:00
JieXin Liang	97cb762bb6	[misc] remove is_cuda_available (#5319)	2025-04-20 18:16:51 -07:00
Lianmin Zheng	177320a582	Clean up imports (#5467)	2025-04-16 15:26:49 -07:00
Mick	9d02bb3e2a	Urgent model support: support gemma-3-it (#4424)	2025-03-16 17:37:32 -07:00
Yineng Zhang	65b7c9b78f	cleanup deps 2/n (#4464)	2025-03-15 23:06:17 -07:00
Lianmin Zheng	ac2387279e	Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988) Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: dhou-xai <dhou@x.ai> Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>	2025-03-03 00:12:04 -08:00
Yineng Zhang	4eb4b401cc	update and simplify CustomOp (#3249)	2025-02-01 18:56:44 +08:00
Yineng Zhang	2f79f58873	feat: use sgl-kernel 0.0.3 in sglang (#3179)	2025-01-27 21:39:52 +08:00
Xuehai Pan	62a4a339eb	docs: fix module docstrings and copyright headers (#2077)	2024-11-22 22:16:53 +08:00
Yineng Zhang	766192610e	feat: update torch 2.5.1 (#2069)	2024-11-18 21:29:13 +08:00
Lianmin Zheng	c1f401fc58	Revert "chore: update torch v2.5.1" (#2063)	2024-11-17 15:29:38 -08:00
Yineng Zhang	3b878863f7	chore: update torch v2.5.1 (#1849)	2024-11-18 00:06:00 +08:00
Lianmin Zheng	6a5b352aaf	Use is_flashinfer_available to replace is_hip for flashinfer check (#1596) Co-authored-by: Zhang Liangang <liangang.zhang@intel.com>	2024-10-06 22:54:05 -07:00
HAI	aa2750beb3	[Bugfix] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1419) (#1453)	2024-09-18 02:01:35 -07:00
HAI	3a6e04185b	[Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1420)	2024-09-17 07:43:52 +00:00
Yineng Zhang	b1a540ec42	feat: update GemmaRMSNorm (#1232)	2024-08-28 22:47:34 +10:00
Yineng Zhang	198974cd1a	feat: support sm75 with FlashInfer v0.1.6 (#1233)	2024-08-28 18:39:12 +10:00
Yineng Zhang	1fb9459908	fix: custom op fallback forward native when lower sm80 (#1177)	2024-08-21 14:26:35 -07:00
Lianmin Zheng	fb1f28cbbb	Clean up the comments and names under python/sglang/srt/layers (#1047)	2024-08-12 05:54:37 +00:00
Yineng Zhang	c245b78973	hotfix: add CustomOp abstraction (#1027)	2024-08-11 02:45:59 -07:00
Yineng Zhang	94752ac811	feat: use FlashInfer rmsnorm and silu (#907)	2024-08-11 14:57:13 +10:00

44 Commits