| Xiaoyu Zhang | e9c6ce461d | sgl scaled_fp8_quant support output padding (#4861) | 2025-04-02 23:53:57 +08:00 |
| Stefan He | e0917e6bd0 | Remove vllm ops scaled fp8 quant and accelerate per token quant by 20-28% (#4215); Co-authored-by: Stefan He <bhe@linkedin.com> | 2025-03-12 00:08:03 -07:00 |
| Yineng Zhang | d1da58e275 | unify is_cuda and is_hip (#4321) | 2025-03-11 18:12:56 -07:00 |
| kk | 4885b90802 | Use forward_cuda to execute custom op for hip platform (#3305); Co-authored-by: wunhuang <wunhuang@amd.com> | 2025-02-05 02:58:17 +00:00 |
| HAI | 566d61d90f | ROCm: bump 6.3.0 (#3259) | 2025-02-03 04:13:40 +08:00 |
| Yineng Zhang | 4eb4b401cc | update and simplify CustomOp (#3249) | 2025-02-01 18:56:44 +08:00 |