maxiao1
7993ed8ddd
Adapt to deepseekv3.2
2025-10-03 20:01:17 +08:00
maxiao
852a49c5cc
adapt to dsv32 on dcu
2025-09-30 18:37:31 +08:00
maxiao
8f7453e3af
adapt to ds3.2
2025-09-30 17:44:54 +08:00
Baizhou Zhang
f111649580
Replace os.environ in layernorm.py (#10684)
2025-09-20 00:20:33 -07:00
Baizhou Zhang
8ecef73f12
[1/2] Support deterministic inference with flashinfer attention backend (#10645)
...
Co-authored-by: hebiao064 <hebiaobuaa@gmail.com>
Co-authored-by: Qiaolin-Yu <liin1211@outlook.com>
2025-09-19 23:34:29 -07:00
Shangming Cai
f5f6b3b4b5
Refactor fused_add_rmsnorm import logic (#10207)
...
Signed-off-by: Shangming Cai <csmthu@gmail.com>
2025-09-09 00:23:58 -07:00
ssshinigami
5dd8c6444b
[Bug fix] Fix Gemma 2 and fix Gemma 3 multimodal with bs > 1 on NPU (#9871)
...
Co-authored-by: Maksim <makcum888e@mail.ru>
2025-09-08 01:19:40 -07:00
Huaiyu, Zheng
ee21817c6b
enable llama3.1-8B on xpu (#9434)
2025-09-07 22:34:20 -07:00
fzyzcjy
bc5fc332f7
Fix slow fused add RMSNorm (#10141)
2025-09-07 20:20:39 -07:00
sogalin
8d114f254b
Fix RMSNorm API CALL mismatch issue. (#10032)
...
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
2025-09-05 20:45:13 -07:00
VDV1985
ba861293cf
[feat]Ascend NPU Gemma-3-12b and Gemma-3-27b support (#8909)
2025-08-31 00:25:07 -07:00
strgrb
88fbc31b50
Support trtllm_allreduce_fusion in flashinfer for cuda<12.8 (#9339)
...
Co-authored-by: Zhang Kaihong <zhangkaihong.zkh@alibaba-inc.com>
2025-08-20 16:54:30 -07:00
RunningLeon
b7094a5ef1
model: support intern-s1 (#8350)
...
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: zxy <zhou0493@e.ntu.edu.sg>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Mick <mickjagger19@icloud.com>
Co-authored-by: Xinyuan Tong <115166877+JustinTong0323@users.noreply.github.com>
2025-07-26 13:48:51 -07:00
Xiaoyu Zhang
2e7ab862e3
Fix illegal memory in trtllm allreduce fusion (#7864)
2025-07-08 11:47:17 -07:00
Xiaoyu Zhang
8e64140e35
[b200] support trt-llm allreduce fuse rms_norm_add kernel (#7621)
2025-07-02 19:36:20 -07:00
ll819214
506a2d5934
npu fused op (#7386)
...
Co-authored-by: Li Junwen <lijunwen13@hisilicon.com>
2025-06-25 01:54:20 -07:00
YanbingJiang
094c116f7d
Update python API of activation, topk, norm and rope and remove vllm dependency (#6614)
...
Co-authored-by: Wu, Chunyuan <chunyuan.wu@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
Co-authored-by: sdp <sdp@gnr799219.jf.intel.com>
2025-06-17 22:11:50 -07:00
Yijie Zhu
a39d928782
support qwen2 running on ascend npu device (#7022)
...
Co-authored-by: 刁莹煜 <diaoyingyu1@hisilicon.com>
2025-06-17 11:24:10 -07:00
HAI
b819381fec
AITER backend extension and workload optimizations (#6838)
...
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Hubert Lu <Hubert.Lu@amd.com>
2025-06-05 23:00:18 -07:00
JieXin Liang
180ff5eecc
[fix] recover auto-dispatch for rmsnorm and rope (#6745)
2025-06-03 21:44:20 -07:00
applesaucethebun
2ce8793519
Add typo checker in pre-commit (#6179)
...
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-11 12:55:00 +08:00
michael-amd
93c6fb12c7
Fix: deepseek forward absorb (#5723)
...
Co-authored-by: ispobock <ispobaoke@163.com>
2025-04-25 13:48:55 -07:00
HAI
b0feda090c
Revert "Support aiter RMSNorm in AMD" (#5646)
2025-04-22 15:20:24 -07:00
michael-amd
968ef51562
Support aiter RMSNorm in AMD (#5510)
...
Co-authored-by: JieXin Liang <Alcanderian@users.noreply.github.com>
2025-04-21 17:40:39 -07:00
JieXin Liang
97cb762bb6
[misc] remove is_cuda_available (#5319)
2025-04-20 18:16:51 -07:00
Lianmin Zheng
177320a582
Clean up imports (#5467)
2025-04-16 15:26:49 -07:00
Mick
9d02bb3e2a
Urgent model support: support gemma-3-it (#4424)
2025-03-16 17:37:32 -07:00
Yineng Zhang
65b7c9b78f
cleanup deps 2/n (#4464)
2025-03-15 23:06:17 -07:00
Lianmin Zheng
ac2387279e
Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
...
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00
Yineng Zhang
4eb4b401cc
update and simplify CustomOp (#3249)
2025-02-01 18:56:44 +08:00
Yineng Zhang
2f79f58873
feat: use sgl-kernel 0.0.3 in sglang (#3179)
2025-01-27 21:39:52 +08:00
Xuehai Pan
62a4a339eb
docs: fix module docstrings and copyright headers (#2077)
2024-11-22 22:16:53 +08:00
Yineng Zhang
766192610e
feat: update torch 2.5.1 (#2069)
2024-11-18 21:29:13 +08:00
Lianmin Zheng
c1f401fc58
Revert "chore: update torch v2.5.1" (#2063)
2024-11-17 15:29:38 -08:00
Yineng Zhang
3b878863f7
chore: update torch v2.5.1 (#1849)
2024-11-18 00:06:00 +08:00
Lianmin Zheng
6a5b352aaf
Use is_flashinfer_available to replace is_hip for flashinfer check (#1596)
...
Co-authored-by: Zhang Liangang <liangang.zhang@intel.com>
2024-10-06 22:54:05 -07:00
HAI
aa2750beb3
[Bugfix] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1419) (#1453)
2024-09-18 02:01:35 -07:00
HAI
3a6e04185b
[Feature, Hardware] Enable SGLang on AMD GPUs via PyTorch for ROCm (#1420)
2024-09-17 07:43:52 +00:00
Yineng Zhang
b1a540ec42
feat: update GemmaRMSNorm (#1232)
2024-08-28 22:47:34 +10:00
Yineng Zhang
198974cd1a
feat: support sm75 with FlashInfer v0.1.6 (#1233)
2024-08-28 18:39:12 +10:00
Yineng Zhang
1fb9459908
fix: custom op fallback forward native when lower sm80 (#1177)
2024-08-21 14:26:35 -07:00
Lianmin Zheng
fb1f28cbbb
Clean up the comments and names under python/sglang/srt/layers (#1047)
2024-08-12 05:54:37 +00:00
Yineng Zhang
c245b78973
hotfix: add CustomOp abstraction (#1027)
2024-08-11 02:45:59 -07:00
Yineng Zhang
94752ac811
feat: use FlashInfer rmsnorm and silu (#907)
2024-08-11 14:57:13 +10:00