8 Commits

Author | SHA1 | Message | Date
Ke Bao | 2f9bd0fafd | Fix correctness issue for triton decoding kernel (#2479) | 2024-12-14 16:50:54 +08:00
Ke Bao | 61dec545b0 | Remove unused vars in the triton backend (#2401) | 2024-12-08 03:37:03 -08:00
Ke Bao | 7dc66fcb40 | Optimize Triton decoding kernel for long context (#2394) | 2024-12-08 01:17:37 -08:00
Ke Bao | c77762d57f | Fix Triton decode kernel & ut (#1819) | 2024-10-27 10:54:38 -07:00
Liangsheng Yin | 99ec439da4 | Organize Attention Backends (#1547) | 2024-09-30 15:54:18 -07:00
Lianmin Zheng | 9ba1f09760 | [Fix] Fix logprob and normalized_logprob (#1428) | 2024-09-15 06:36:06 -07:00
Lianmin Zheng | fec185ce0c | Refactor attention backend (#1381) | 2024-09-11 11:44:26 -07:00
Byron Hsu | 8c0efa514d | remove assertion in triton attention and add an unit test (#1385) | 2024-09-11 03:22:07 -07:00