Ma Mingfei
|
5ad296bda1
|
Optimize prefill performance on cpu backend (#8750)
|
2025-08-28 17:21:55 -07:00 |
|
YanbingJiang
|
1fe691a429
|
Fix FP8 block quantization when N or K is not multiples of 128 (#8648)
|
2025-08-01 15:57:19 -07:00 |
|
Chunyuan WU
|
128f16a817
|
[CPU]convert topk_weights to fp32 for INT8 and FP8 paths (for llama4) and fix LmHead weight pack (#7818)
|
2025-07-08 19:27:24 -07:00 |
|
YanbingJiang
|
fcde67b016
|
CPU: map changes from developing branch in sgl-kernel (#6833)
Co-authored-by: mingfeima <mingfei.ma@intel.com>
|
2025-06-10 01:08:15 -07:00 |
|
Chunyuan WU
|
3ded6235c9
|
Add fp8 fused_experts kernel for CPU in sgl-kernel and add UT (#6404)
|
2025-05-23 02:01:55 -07:00 |
|
blzheng
|
cfe48c5902
|
[CPU] Fix build issue (#6419)
|
2025-05-21 11:17:10 -07:00 |
|
Chunyuan WU
|
5dd62c3a6f
|
Add fp8 shared_expert kernel for CPU in sgl-kernel and add UT (#6339)
Co-authored-by: Jiang, Yanbing <yanbing.jiang@intel.com>
Co-authored-by: mingfeima <mingfei.ma@intel.com>
|
2025-05-18 12:42:15 -07:00 |
|
Ma Mingfei
|
a73c4df438
|
Add optimized native kernels in sgl-kernel (#5150)
Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com>
Co-authored-by: YanbingJiang <yanbing.jiang@intel.com>
Co-authored-by: blzheng <beilei.zheng@intel.com>
|
2025-04-08 09:37:46 -07:00 |
|