| Author | Commit | Message | Date |
|--------|--------|---------|------|
| Yineng Zhang | d1da58e275 | unify is_cuda and is_hip (#4321) | 2025-03-11 18:12:56 -07:00 |
| Lianmin Zheng | e8a69e4d0c | Clean up fp8 support (#4230) | 2025-03-09 21:46:35 -07:00 |
| HandH1998 | 0dd6cda288 | Apply sgl w8a8 fp8 kernel (#3148) | 2025-03-09 00:03:32 -08:00 |
| HAI | 13bc39c5d6 | ROCm: enable trillion-parameter MoE models with INT4-FP8 single node (#4152) | 2025-03-06 15:33:02 -08:00 |
| HAI | 51d25405a7 | ROCm: update aiter and its usage to fused moe (bloat16, fp8, fp8 block-quant) (#4053) | 2025-03-04 03:00:46 -08:00 |
| Lianmin Zheng | ac2387279e | Support penalty in overlap mode; return logprob with chunked prefill; improve benchmark scripts (#3988); Co-authored-by: SangBin Cho <rkooo567@gmail.com>, dhou-xai <dhou@x.ai>, Hanming Lu <hanming_lu@berkeley.edu> | 2025-03-03 00:12:04 -08:00 |
| HAI | 5c54ef0352 | AMD/ROCm: update AITER repo to ROCm/aiter (#3747) | 2025-02-21 00:18:08 -08:00 |
| Ke Bao | c02e313914 | Fix block wise fp8 torch compile (#3232) | 2025-01-31 19:56:02 +08:00 |
| Lianmin Zheng | 52c03f16b9 | Add activation parameters to fused_moe (#3170) | 2025-01-27 00:23:37 -08:00 |
| Yineng Zhang | 5dc54f1a62 | feat: remove vllm distributed (#2907); Co-authored-by: Zhangyi <1109276519@qq.com> | 2025-01-17 22:31:51 +08:00 |
| Yineng Zhang | bf8d07a6f9 | feat: patch linear base (#2915) | 2025-01-16 18:00:03 +08:00 |
| kk | 42f3909963 | Unify sglang coding style (#2856); Co-authored-by: Lin, Soga <soga.lin@amd.com> | 2025-01-13 02:12:44 -08:00 |
| kk | e808c1df3e | Integrate ROCm ater package for ck moe function feasibility (#2854); Co-authored-by: wunhuang <wunhuang@amd.com>, Lin, Soga <soga.lin@amd.com> | 2025-01-13 08:23:07 +00:00 |
| Lianmin Zheng | 8a6906127a | Improve linear.py to load sharded weights & remove the dependency of Parameters from vllm (#2784); Co-authored-by: SangBin Cho <rkooo567@gmail.com> | 2025-01-07 23:29:10 -08:00 |
| HAI | e6f523b5f2 | fix typo in python/sglang/srt/layers/quantization/fp8.py (#2655) | 2024-12-29 23:45:02 -08:00 |
| HAI | 30828e7192 | AMD: set weights and scaling numbers properly for block FP8 (#2637) | 2024-12-29 03:23:39 -08:00 |
| Xiaoyu Zhang | 9254a33ad4 | avoid fused_moe_triton padding circular import (#2624) | 2024-12-28 14:01:35 +08:00 |
| HandH1998 | 53aed988cb | Refactor MoE (#2575); Co-authored-by: zhyncs <me@zhyncs.com> | 2024-12-26 00:02:14 +08:00 |
| Ke Bao | e835a50021 | Reorg moe code (#2563) | 2024-12-24 01:10:22 +08:00 |
| HAI | 95f93f493a | Fp8 MoE optimizations on AMD (#2388) | 2024-12-07 21:18:26 +08:00 |
| Yineng Zhang | d332aa3b0c | fix: resolve fp8 moe issue (#2387) | 2024-12-07 19:28:53 +08:00 |
| Yineng Zhang | 84d96b3ae5 | Move FP8 to SGLang (#2370); Co-authored-by: HaiShaw <hixiao@gmail.com> | 2024-12-06 15:42:10 +08:00 |
| Lianmin Zheng | fb1f28cbbb | Clean up the comments and names under python/sglang/srt/layers (#1047) | 2024-08-12 05:54:37 +00:00 |
| Yineng Zhang | dd7e8b9421 | chore: add copyright for srt (#790) | 2024-07-28 23:07:12 +10:00 |
| Ying Sheng | 2d96da813e | refactor model loader [unreachable code]: initial refactor (#655) | 2024-07-19 09:27:06 -07:00 |