| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| bjmsong | 17de02f98d | Integration of TurboMind AWQ (#2828). Co-authored-by: root <bjmsong@126.com> | 2025-01-13 20:14:16 +08:00 |
| kk | 42f3909963 | Unify sglang coding style (#2856). Co-authored-by: Lin, Soga <soga.lin@amd.com> | 2025-01-13 02:12:44 -08:00 |
| Lianmin Zheng | 72c7776355 | Fix linear.py and improve weight loading (#2851). Co-authored-by: SangBin Cho <rkooo567@gmail.com> | 2025-01-13 01:39:14 -08:00 |
| kk | e808c1df3e | Integrate ROCm ater package for ck moe function feasibility (#2854). Co-authored-by: wunhuang <wunhuang@amd.com>; Co-authored-by: Lin, Soga <soga.lin@amd.com> | 2025-01-13 08:23:07 +00:00 |
| Ke Bao | 85b2e05770 | Add int8 quant kernel (#2848) | 2025-01-13 13:16:58 +08:00 |
| Ke Bao | b5fb4ef58a | Update modelopt config and fix running issue (#2792) | 2025-01-08 18:04:30 +08:00 |
| Lianmin Zheng | 8a6906127a | Improve linear.py to load sharded weights & remove the dependency of Parameters from vllm (#2784). Co-authored-by: SangBin Cho <rkooo567@gmail.com> | 2025-01-07 23:29:10 -08:00 |
| Lianmin Zheng | bdc1acf6cd | Misc fix for min_p_sampling, --cuda-graph-bs (#2761) | 2025-01-07 02:52:53 -08:00 |
| Zhiyu | 287427e2e6 | Enable Nvidia's ModelOpt fp8 quantized models (#2535) | 2025-01-06 14:54:52 -08:00 |
| HAI | e6f523b5f2 | fix typo in python/sglang/srt/layers/quantization/fp8.py (#2655) | 2024-12-29 23:45:02 -08:00 |
| HandH1998 | afa0341e57 | Update Triton configs for block fp8 kernels (#2641) | 2024-12-29 22:53:47 +08:00 |
| HAI | 30828e7192 | AMD: set weights and scaling numbers properly for block FP8 (#2637) | 2024-12-29 03:23:39 -08:00 |
| Yineng Zhang | 7863e4368a | add configs for block fp8 related kernels (#2628). Co-authored-by: HandH1998 <1335248067@qq.com> | 2024-12-28 23:12:04 +08:00 |
| Xiaoyu Zhang | 9254a33ad4 | avoid fused_moe_triton padding circular import (#2624) | 2024-12-28 14:01:35 +08:00 |
| Yineng Zhang | 635a042623 | docs: update deepseek v3 example (#2592) | 2024-12-26 17:43:37 +08:00 |
| HandH1998 | 53aed988cb | Refactor MoE (#2575). Co-authored-by: zhyncs <me@zhyncs.com> | 2024-12-26 00:02:14 +08:00 |
| Ke Bao | e835a50021 | Reorg moe code (#2563) | 2024-12-24 01:10:22 +08:00 |
| HAI | 95f93f493a | Fp8 MoE optimizations on AMD (#2388) | 2024-12-07 21:18:26 +08:00 |
| Yineng Zhang | d332aa3b0c | fix: resolve fp8 moe issue (#2387) | 2024-12-07 19:28:53 +08:00 |
| Yineng Zhang | 84d96b3ae5 | Move FP8 to SGLang (#2370). Co-authored-by: HaiShaw <hixiao@gmail.com> | 2024-12-06 15:42:10 +08:00 |
| HAI | b2986d7aa5 | Adding SGLang FP8 Utils (#2348) | 2024-12-04 03:01:33 -08:00 |
| Lianmin Zheng | 1228f7ca69 | Fix gptq for moe layers (#2300). Co-authored-by: root <me@zhyncs.com> | 2024-12-03 23:12:33 +08:00 |
| Yineng Zhang | 55842eb81a | feat: fused_moe fp8 monkey patch (#2174) | 2024-11-25 17:06:36 +08:00 |
| Lianmin Zheng | be0124bda0 | Rename triton_fused_moe -> fused_moe_triton (#2163) | 2024-11-24 08:12:35 -08:00 |
| Yineng Zhang | b509db5832 | feat: remove the dependency on FusedMoE (#2153) | 2024-11-24 20:09:27 +08:00 |
| Chayenne | c77c1e05ba | fix black in pre-commit (#1940) | 2024-11-08 07:42:47 +08:00 |
| Xuehai Pan | a5e0defb5a | minor: Add basic editorconfig and pre-commit hooks to enforce style for whitespaces (#1926) | 2024-11-06 13:46:04 +00:00 |
| Ke Bao | 16eb33ffe2 | Update vocab embedding deps and add TP switch (#1856) | 2024-10-31 20:13:07 -07:00 |
| Jani Monoses | 3ff641132e | Remove references to squeezellm (#1603) | 2024-10-07 11:30:41 -07:00 |
| Yineng Zhang | b4408b0d16 | feat: update linear deps 1/N (#1305) | 2024-09-19 20:53:11 +08:00 |
| Lianmin Zheng | fb1f28cbbb | Clean up the comments and names under python/sglang/srt/layers (#1047) | 2024-08-12 05:54:37 +00:00 |
| Yineng Zhang | dd7e8b9421 | chore: add copyright for srt (#790) | 2024-07-28 23:07:12 +10:00 |
| Ying Sheng | 2d96da813e | refactor model loader [unreachable code]: initial refactor (#655) | 2024-07-19 09:27:06 -07:00 |