Commit Graph

19 Commits

Author SHA1 Message Date
Ke Bao
4fc5f2f977 Add unit test for triton swa kernel (#8853) 2025-08-06 16:10:38 +08:00
Lianmin Zheng
e8e18dcdcc Revert "fix some typos" (#6244) 2025-05-12 12:53:26 -07:00
applesaucethebun
d738ab52f8 fix some typos (#6209) 2025-05-13 01:42:38 +08:00
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
woodx
3bface15e6 Feat/support encoder model (like bert) (#4887) 2025-04-17 01:50:48 -07:00
fzyzcjy
15ddd84322 Add retry for flaky tests in CI (#4755) 2025-03-25 16:53:12 -07:00
JieXin Liang
9e93ef3f8e [fix] fix illegal mem access and clean up triton attention backend (#4571) 2025-03-20 02:01:52 -07:00
JieXin Liang
c0e9a36c5f Optimize Triton decoding kernel for dynamic workload (#4553) 2025-03-18 21:25:38 -07:00
Ke Bao
2d61132374 Support Eagle2 for Triton backend (#3466) 2025-02-10 20:00:42 +08:00
Ke Bao
a322051e31 Support custom mask for Triton attention (#3317) 2025-02-06 01:16:02 +08:00
Ke Bao
de5533341e Update Triton extend backend interface (#3309) 2025-02-05 18:12:22 +08:00
Ke Bao
a07364ccc5 Update Triton decode backend interface (#3292) 2025-02-04 23:26:04 +08:00
Ke Bao
2f9bd0fafd Fix correctness issue for triton decoding kernel (#2479) 2024-12-14 16:50:54 +08:00
Ke Bao
61dec545b0 Remove unused vars in the triton backend (#2401) 2024-12-08 03:37:03 -08:00
Ke Bao
7dc66fcb40 Optimize Triton decoding kernel for long context (#2394) 2024-12-08 01:17:37 -08:00
Ke Bao
c77762d57f Fix Triton decode kernel & ut (#1819) 2024-10-27 10:54:38 -07:00
Liangsheng Yin
99ec439da4 Organize Attention Backends (#1547) 2024-09-30 15:54:18 -07:00
Lianmin Zheng
9ba1f09760 [Fix] Fix logprob and normalized_logprob (#1428) 2024-09-15 06:36:06 -07:00
Lianmin Zheng
fec185ce0c Refactor attention backend (#1381) 2024-09-11 11:44:26 -07:00
Byron Hsu
8c0efa514d remove assertion in triton attention and add an unit test (#1385) 2024-09-11 03:22:07 -07:00