25 Commits

Author SHA1 Message Date
Yineng Zhang
d1da58e275 unify is_cuda and is_hip (#4321) 2025-03-11 18:12:56 -07:00
Lianmin Zheng
e8a69e4d0c Clean up fp8 support (#4230) 2025-03-09 21:46:35 -07:00
HandH1998
0dd6cda288 Apply sgl w8a8 fp8 kernel (#3148) 2025-03-09 00:03:32 -08:00
HAI
13bc39c5d6 ROCm: enable trillion-parameter MoE models with INT4-FP8 single node (#4152) 2025-03-06 15:33:02 -08:00
HAI
51d25405a7 ROCm: update aiter and its usage to fused moe (bloat16, fp8, fp8 block-quant) (#4053) 2025-03-04 03:00:46 -08:00
Lianmin Zheng
ac2387279e Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: dhou-xai <dhou@x.ai>
Co-authored-by: Hanming Lu <hanming_lu@berkeley.edu>
2025-03-03 00:12:04 -08:00
HAI
5c54ef0352 AMD/ROCm: update AITER repo to ROCm/aiter (#3747) 2025-02-21 00:18:08 -08:00
Ke Bao
c02e313914 Fix block wise fp8 torch compile (#3232) 2025-01-31 19:56:02 +08:00
Lianmin Zheng
52c03f16b9 Add activation parameters to fused_moe (#3170) 2025-01-27 00:23:37 -08:00
Yineng Zhang
5dc54f1a62 feat: remove vllm distributed (#2907)
Co-authored-by: Zhangyi <1109276519@qq.com>
2025-01-17 22:31:51 +08:00
Yineng Zhang
bf8d07a6f9 feat: patch linear base (#2915) 2025-01-16 18:00:03 +08:00
kk
42f3909963 Unify sglang coding style (#2856)
Co-authored-by: Lin, Soga <soga.lin@amd.com>
2025-01-13 02:12:44 -08:00
kk
e808c1df3e Integrate ROCm ater package for ck moe function feasibility (#2854)
Co-authored-by: wunhuang <wunhuang@amd.com>
Co-authored-by: Lin, Soga <soga.lin@amd.com>
2025-01-13 08:23:07 +00:00
Lianmin Zheng
8a6906127a Improve linear.py to load sharded weights & remove the dependency of Parameters from vllm (#2784)
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2025-01-07 23:29:10 -08:00
HAI
e6f523b5f2 fix typo in python/sglang/srt/layers/quantization/fp8.py (#2655) 2024-12-29 23:45:02 -08:00
HAI
30828e7192 AMD: set weights and scaling numbers properly for block FP8 (#2637) 2024-12-29 03:23:39 -08:00
Xiaoyu Zhang
9254a33ad4 avoid fused_moe_triton padding circular import (#2624) 2024-12-28 14:01:35 +08:00
HandH1998
53aed988cb Refactor MoE (#2575)
Co-authored-by: zhyncs <me@zhyncs.com>
2024-12-26 00:02:14 +08:00
Ke Bao
e835a50021 Reorg moe code (#2563) 2024-12-24 01:10:22 +08:00
HAI
95f93f493a Fp8 MoE optimizations on AMD (#2388) 2024-12-07 21:18:26 +08:00
Yineng Zhang
d332aa3b0c fix: resolve fp8 moe issue (#2387) 2024-12-07 19:28:53 +08:00
Yineng Zhang
84d96b3ae5 Move FP8 to SGLang (#2370)
Co-authored-by: HaiShaw <hixiao@gmail.com>
2024-12-06 15:42:10 +08:00
Lianmin Zheng
fb1f28cbbb Clean up the comments and names under python/sglang/srt/layers (#1047) 2024-08-12 05:54:37 +00:00
Yineng Zhang
dd7e8b9421 chore: add copyright for srt (#790) 2024-07-28 23:07:12 +10:00
Ying Sheng
2d96da813e refactor model loader: initial refactor (#655) 2024-07-19 09:27:06 -07:00