12 Commits

Author SHA1 Message Date
Cheng Wan
295895120d [6/N] MoE Refactor: Cleanup MoE-related configs (#8849) 2025-08-14 21:14:53 -07:00
jacky.cheng
25caa7a8a9 [AMD] Support Wave attention backend with AMD GPU optimizations (#8660) 2025-08-12 13:49:11 -07:00
Signed-off-by: Stanley Winata <stanley.winata@amd.com>
Signed-off-by: Harsh Menon <harsh@nod-labs.com>
Signed-off-by: nithinsubbiah <nithinsubbiah@gmail.com>
Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
Signed-off-by: xintin <gaurav.verma@amd.com>
Co-authored-by: Harsh Menon <harsh@nod-labs.com>
Co-authored-by: Stanley Winata <stanley.winata@amd.com>
Co-authored-by: Stanley Winata <68087699+raikonenfnu@users.noreply.github.com>
Co-authored-by: Stanley Winata <stanley@nod-labs.com>
Co-authored-by: Ivan Butygin <ivan.butygin@gmail.com>
Co-authored-by: nithinsubbiah <nithinsubbiah@gmail.com>
Co-authored-by: Nithin Meganathan <18070964+nithinsubbiah@users.noreply.github.com>
Co-authored-by: Ivan Butygin <ibutygin@amd.com>
Simo Lin
1ce30dd13e [router] update router documentation (#9121) 2025-08-12 13:16:34 -07:00
Zhiqiang Xie
0eec4cb6cc HiCache, add bench long context plus minor fixs (#9086) 2025-08-11 16:54:52 -07:00
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Faraz
f508cd3cb7 TRTLLM-MLA FP8 path (#8638) 2025-08-11 14:02:13 -07:00
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Lianmin Zheng
8c07fabda7 Update hyperparameter_tuning.md (#9083) 2025-08-11 13:44:11 -07:00
Liangsheng Yin
f9afa7dceb Fix docs for clip max new tokens (#9082) 2025-08-11 13:15:21 -07:00
Jimmy
0d9e89ec69 [PD]decode: add CLIP_MAX_NEW_TOKEN for pop_preallocated (#8866) 2025-08-11 13:08:11 -07:00
Hangzhi
3d64fda376 Fix broken Kimi models HuggingFace link (#9080) 2025-08-11 12:15:00 -07:00
Baizhou Zhang
75e6a7cde1 Support radix cache for Lora feature (#7216) 2025-08-11 10:14:11 -07:00
Lianmin Zheng
2e8e7e353b Improve docs and developer guide (#9044) 2025-08-10 21:05:18 -07:00
Lianmin Zheng
2449a0afe2 Refactor the docs (#9031) 2025-08-10 19:49:45 -07:00