* opencl: add `set_rows` for `f16` and `f32`
* opencl: better choose workgroup size for `set_rows`