callmelaoyi | b86953acf9 | 2026-01-05 10:24:51 +08:00
[Kernel] Qwen3-next: optimize recompute_w_u_fwd & chunk_fwd_o (#74)
Co-authored-by: yuanjizhong <yuanjizhong@baidu.com>

Xinyu Dong | 75d0bdae2f | 2025-12-22 21:50:27 +08:00
Merge pull request #40 from ldh2020/v0.11.0dev
[Kernel] Optimize the performance of Qwen3-Next

hanhaowen | a4b9e92ca1 | 2025-12-22 17:37:19 +08:00
[Kernel] Replace the native torch solve_tril with the solve_tril_fwd kernel op

ldh2020 | 004e164bdb | 2025-12-21 11:18:00 +08:00
[Kernel] Optimize the recurrent op

ldh2020 | fce97df908 | 2025-12-16 16:24:47 +08:00
[Kernel] Use the l2norm kernel op instead of the Triton op

ldh2020 | 9bb2ee06a4 | 2025-12-12 17:01:50 +08:00
[Bugfix] Fix a bug in torch_solve_tril

chenyili | 7c22d621fb | 2025-12-10 17:51:24 +08:00
Commit the vllm0.11.0 development branch